[klibc] Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
hpa at zytor.com
Mon Aug 19 18:27:33 PDT 2019
On August 19, 2019 5:35:51 PM PDT, Stefan Kanthak <stefan.kanthak at nexgo.de> wrote:
>"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM:
>
>> On 8/14/19 9:42 PM, Stefan Kanthak wrote:
>>> Hi,
>>>
>>> both
>>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
>>> and
>>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
>>> use the following code sequences for shift counts greater than 31:
>>>
>>> 1:                          1:
>>>     xorl %edx,%edx              shrl %cl,%edx
>>>     shl %cl,%eax                xorl %eax,%eax
>>>        ^
>>>     xchgl %edx,%eax             xchgl %edx,%eax
>>>     ret                         ret
>>>
>>> At least and especially on Intel processors XCHG was and
>>> still is a rather slow instruction and should be avoided.
>>> Use the following better code sequences instead:
>>>
>>> 1:                          1:
>>>     shll %cl,%eax               shrl %cl,%edx
>>>     movl %eax,%edx              movl %edx,%eax
>>>     xorl %eax,%eax              xorl %edx,%edx
>>>     ret                         ret
>>>
>>> regards
>>> Stefan Kanthak
>>>
>>
>> XCHG is slow for register-memory operations due to implicit locking, but
>> should be fine for register-register.
>
>"but should be fine" is not enough: XCHG is of course slow for
>register-register operations too, otherwise I would not have spent
>time to write in.
>See
>https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
>or Agner Fog's http://www.agner.org/optimize/instruction_tables.pdf
>
>> Remember, too, that klibc is optimized for size.
>
>Remember that the linker aligns functions on 16 byte boundaries!
>With XCHG, these functions have a code size of 29 bytes; with MOV
>they grow by 1 byte.
>
>>> PS: I doubt that a current GCC emits calls of the routines
>>> in the /usr/klibc/arch/i386 subdirectory any more.
>>
>> Which, of course, is even better.
>
>... and means that you can get rid of this subdirectory!
>
>regards
>Stefan
Leaving it for compatibility with old gcc... but let's not churn the code: old gcc is by definition slow.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
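
The two points argued in the thread can be checked with a small C model. This is only a sketch, not klibc code: `struct regs`, `shl_xchg`, `shl_mov`, `shr_xchg`, `shr_mov`, `check_sequences` and `padded_size` are hypothetical names introduced for illustration. It assumes the EDX:EAX register pair holds the 64-bit value (EAX low, EDX high) and that 32-bit shifts use only the low 5 bits of CL. The 29- and 30-byte sizes and the 16-byte alignment are taken from the message as stated and are not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the EDX:EAX pair; EAX is the low half. */
struct regs { uint32_t eax, edx; };

/* Left shift, XCHG variant: xorl %edx,%edx; shll %cl,%eax; xchgl %edx,%eax */
struct regs shl_xchg(struct regs r, unsigned cl)
{
    r.edx = 0;
    r.eax <<= (cl & 31);            /* only the low 5 bits of CL are used */
    uint32_t tmp = r.eax;           /* xchgl swaps the two registers */
    r.eax = r.edx;
    r.edx = tmp;
    return r;
}

/* Left shift, MOV variant: shll %cl,%eax; movl %eax,%edx; xorl %eax,%eax */
struct regs shl_mov(struct regs r, unsigned cl)
{
    r.eax <<= (cl & 31);
    r.edx = r.eax;
    r.eax = 0;
    return r;
}

/* Logical right shift, XCHG variant: shrl %cl,%edx; xorl %eax,%eax; xchgl */
struct regs shr_xchg(struct regs r, unsigned cl)
{
    r.edx >>= (cl & 31);
    r.eax = 0;
    uint32_t tmp = r.eax;
    r.eax = r.edx;
    r.edx = tmp;
    return r;
}

/* Logical right shift, MOV variant: shrl %cl,%edx; movl %edx,%eax; xorl %edx,%edx */
struct regs shr_mov(struct regs r, unsigned cl)
{
    r.edx >>= (cl & 31);
    r.eax = r.edx;
    r.edx = 0;
    return r;
}

/* Compare both variants with a plain 64-bit shift for counts 32..63. */
void check_sequences(void)
{
    const uint64_t v = 0x0123456789abcdefULL;
    const struct regs in = { (uint32_t)v, (uint32_t)(v >> 32) };

    for (unsigned n = 32; n < 64; n++) {
        struct regs a = shl_xchg(in, n), b = shl_mov(in, n);
        assert(a.eax == b.eax && a.edx == b.edx);
        assert((((uint64_t)b.edx << 32) | b.eax) == (v << n));

        a = shr_xchg(in, n);
        b = shr_mov(in, n);
        assert(a.eax == b.eax && a.edx == b.edx);
        assert((((uint64_t)b.edx << 32) | b.eax) == (v >> n));
    }
}

/* Round a function size up to the next multiple of the alignment. */
unsigned padded_size(unsigned size, unsigned align)
{
    return (size + align - 1) / align * align;
}
```

Under these assumptions, `check_sequences()` passes for every count from 32 to 63, and `padded_size(29, 16)` and `padded_size(30, 16)` both give 32, so the extra byte would not change the padded footprint if functions really are aligned to 16 bytes. The model says nothing about timing; that part of the argument rests on the instruction tables cited above.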