[klibc] Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

H. Peter Anvin hpa at zytor.com
Mon Aug 19 15:51:01 PDT 2019


On 8/14/19 9:42 PM, Stefan Kanthak wrote:
> Hi,
> 
> both
> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
> and
> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
> use the following code sequences for shift counts greater than 31:
> 
> 1:                         1:
>      xorl  %edx,%edx            shrl  %cl,%edx
>      shl   %cl,%eax             xorl  %eax,%eax
>         ^
>      xchgl %edx,%eax            xchgl %edx,%eax
>      ret                        ret
> 
> At least and especially on Intel processors XCHG was and
> still is a rather slow instruction and should be avoided.
> Use the following better code sequences instead:
> 
> 1:                         1:
>      shll  %cl,%eax             shrl  %cl,%edx
>      movl  %eax,%edx            movl  %edx,%eax
>      xorl  %eax,%eax            xorl  %edx,%edx
>      ret                        ret
> 
> regards
> Stefan Kanthak
> 

XCHG is slow for register-memory operations due to implicit locking, but
should be fine for register-register. Remember, too, that klibc is
optimized for size.
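
For readers following along, here is a rough C model of what the count > 31
path computes in both the original and the proposed sequences. It is only a
sketch: the function names are made up for illustration, and it assumes the
usual i386 convention of the 64-bit value in %edx:%eax with the count in %cl.
The hardware masks a 32-bit shift count to 5 bits, which is why shifting by
%cl directly works for counts 32..63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the count > 31 path of the 64-bit left shift:
 * the old low word, shifted by (count - 32), becomes the new high word
 * and the new low word is zero. */
static uint64_t shl64_high_path(uint32_t lo, uint32_t hi, unsigned count)
{
    (void)hi;                              /* old high word is shifted out */
    uint32_t new_hi = lo << (count & 31);  /* shll %cl,%eax masks count to 5 bits */
    return (uint64_t)new_hi << 32;         /* low word cleared (xorl) */
}

/* Hypothetical model of the count > 31 path of the 64-bit logical right
 * shift: the old high word, shifted by (count - 32), becomes the new low
 * word and the new high word is zero. */
static uint64_t shr64_high_path(uint32_t lo, uint32_t hi, unsigned count)
{
    (void)lo;                              /* old low word is shifted out */
    uint32_t new_lo = hi >> (count & 31);  /* shrl %cl,%edx masks count to 5 bits */
    return (uint64_t)new_lo;               /* high word cleared (xorl) */
}

/* Compare both models against the compiler's native 64-bit shifts
 * for every count handled by this path. */
static void check_high_paths(uint32_t lo, uint32_t hi)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    for (unsigned count = 32; count < 64; count++) {
        assert(shl64_high_path(lo, hi, count) == v << count);
        assert(shr64_high_path(lo, hi, count) == v >> count);
    }
}
```

The XCHG and MOV variants differ only in how the result lands in the right
register, not in what is computed. On size: xchgl %edx,%eax has a one-byte
encoding because one operand is %eax, whereas a register-to-register movl
takes two bytes, so each proposed sequence comes out one byte longer.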

> PS: I doubt that a current GCC emits calls to the routines
>     in the /usr/klibc/arch/i386 subdirectory any more.

Which, of course, is even better.

	-hpa


