[klibc] Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
Stefan Kanthak
stefan.kanthak at nexgo.de
Mon Aug 19 17:35:51 PDT 2019
"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM:
> On 8/14/19 9:42 PM, Stefan Kanthak wrote:
>> Hi,
>>
>> both
>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
>> and
>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
>> use the following code sequences for shift counts greater than 31:
>>
>> 1:                              1:
>>         xorl %edx,%edx                  shrl %cl,%edx
>>         shl %cl,%eax                    xorl %eax,%eax
>>            ^
>>         xchgl %edx,%eax                 xchgl %edx,%eax
>>         ret                             ret
>>
>> At least and especially on Intel processors XCHG was and
>> still is a rather slow instruction and should be avoided.
>> Use the following better code sequences instead:
>>
>> 1:                              1:
>>         shll %cl,%eax                   shrl %cl,%edx
>>         movl %eax,%edx                  movl %edx,%eax
>>         xorl %eax,%eax                  xorl %edx,%edx
>>         ret                             ret
>>
>> regards
>> Stefan Kanthak
>>
>
> XCHG is slow for register-memory operations due to implicit locking, but
> should be fine for register-register.
"but should be fine" is not enough: XCHG is of course slow for
register-register operations too (3 micro-ops on modern Intel
processors), otherwise I would not have spent time writing in.
See https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
or Agner Fog's http://www.agner.org/optimize/instruction_tables.pdf
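For anyone who wants to convince themselves that the two variants are
interchangeable, here is a minimal C sketch that models the register
contents for shift counts 32..63. It is not klibc code: the function names
(shl_xchg, shl_mov, lshr_xchg, lshr_mov, sequences_agree, check_samples)
are made up for illustration, and the model only assumes that the x86
shifts mask the count in %cl to 5 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the modelled %edx:%eax pair into one 64-bit result. */
static uint64_t pair(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Left shift, count > 31, XCHG variant. */
static uint64_t shl_xchg(uint32_t eax, uint32_t edx, unsigned cl)
{
    uint32_t tmp;
    edx = 0;                          /* xorl  %edx,%edx */
    eax <<= (cl & 31);                /* shll  %cl,%eax (count masked) */
    tmp = eax; eax = edx; edx = tmp;  /* xchgl %edx,%eax */
    return pair(edx, eax);
}

/* Left shift, count > 31, MOV variant. */
static uint64_t shl_mov(uint32_t eax, uint32_t edx, unsigned cl)
{
    eax <<= (cl & 31);                /* shll  %cl,%eax  */
    edx = eax;                        /* movl  %eax,%edx */
    eax = 0;                          /* xorl  %eax,%eax */
    return pair(edx, eax);
}

/* Logical right shift, count > 31, XCHG variant. */
static uint64_t lshr_xchg(uint32_t eax, uint32_t edx, unsigned cl)
{
    uint32_t tmp;
    edx >>= (cl & 31);                /* shrl  %cl,%edx  */
    eax = 0;                          /* xorl  %eax,%eax */
    tmp = eax; eax = edx; edx = tmp;  /* xchgl %edx,%eax */
    return pair(edx, eax);
}

/* Logical right shift, count > 31, MOV variant. */
static uint64_t lshr_mov(uint32_t eax, uint32_t edx, unsigned cl)
{
    edx >>= (cl & 31);                /* shrl  %cl,%edx  */
    eax = edx;                        /* movl  %edx,%eax */
    edx = 0;                          /* xorl  %edx,%edx */
    return pair(edx, eax);
}

/* Returns 1 if both variants match the plain 64-bit shift for 32..63. */
static int sequences_agree(uint64_t x)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    unsigned cl;

    for (cl = 32; cl < 64; cl++) {
        if (shl_xchg(lo, hi, cl) != (x << cl)) return 0;
        if (shl_mov(lo, hi, cl) != (x << cl)) return 0;
        if (lshr_xchg(lo, hi, cl) != (x >> cl)) return 0;
        if (lshr_mov(lo, hi, cl) != (x >> cl)) return 0;
    }
    return 1;
}

/* Convenience wrapper: aborts if any sample value disagrees. */
static void check_samples(void)
{
    assert(sequences_agree(0x0123456789abcdefULL));
    assert(sequences_agree(0xffffffffffffffffULL));
    assert(sequences_agree(1ULL));
}
```

Calling check_samples() from a test program exercises all 32 counts for
each sample value. The model says nothing about speed or code size; it
only shows that the results are the same.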
> Remember, too, that klibc is optimized for size.
Remember that the linker aligns functions on 16-byte boundaries!
With XCHG, these functions have a code size of 29 bytes; with MOV
they grow by 1 byte to 30 bytes, so either way they occupy 32 bytes.
>> PS: I doubt that a current GCC emits calls of the routines
>> in the /usr/klibc/arch/i386 subdirectory any more.
>
> Which, of course, is even better.
... and means that you can get rid of this subdirectory!
regards
Stefan