Discussion:
[openssl.org #3541] [PATCH] BN_nist_mod_521 fails on Windows ARM
Gilles Khouzam via RT
2014-09-23 15:13:14 UTC
Permalink
When compiling OpenSSL optimized on ARM using the Microsoft compiler, the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c). The compiler seems to think that val and temp represent the same item when they are clearly one index apart. I've coded a fix that simply avoid using temporary variables and uses the indices into the t_d array directly. The code is simply a refactor of the existing code and does generate very effective neon instructions for the loop.

ectest test which was always failing before in ARM on Windows is now succeeding (as well as all the other tests).


Gilles Khouzam
Senior Development Lead
Microsoft OSG
Andy Polyakov via RT
2014-09-23 19:49:49 UTC
Permalink
Post by Gilles Khouzam via RT
When compiling OpenSSL optimized on ARM using the Microsoft compiler,
Out of curiosity, which version? There is record of it generating bad
code specifically in bn_nist.c. At one occasion it was reported that it
generates bad code when optimization is switched off.
Post by Gilles Khouzam via RT
the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c).
The compiler seems to think that val and temp represent the same item
when they are clearly one index apart. I've coded a fix that simply
avoid using temporary variables and uses the indices into the t_d
array directly. The code is simply a refactor of the existing code
and does generate very effective neon instructions for the loop.
ectest test which was always failing before in ARM on Windows is now
succeeding (as well as all the other tests).
A recall looking at code generated at x86 and not liking the result with
code similar to what you suggest. Which is why those temporary values
were added. I wonder if you could test following loop.

for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
{
t_d[i] = (val>>BN_NIST_521_RSHIFT |
(val=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
}
t_d[i] = val>>BN_NIST_521_RSHIFT;

BTW, is there interest to adapt ARM/NEON assembly for Windows?


______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org
Andy Polyakov via RT
2014-09-23 21:01:28 UTC
Permalink
Post by Gilles Khouzam via RT
When compiling OpenSSL optimized on ARM using the Microsoft compiler,
the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c).
The compiler seems to think that val and temp represent the same item
when they are clearly one index apart. I've coded a fix that simply
avoid using temporary variables and uses the indices into the t_d
array directly. The code is simply a refactor of the existing code
and does generate very effective neon instructions for the loop.
ectest test which was always failing before in ARM on Windows is now
succeeding (as well as all the other tests).
I recall looking at code generated at x86 and not liking the result with
code similar to what you suggest. Which is why those temporary values
were added. I wonder if you could test following loop.
for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
{
t_d[i] = (val>>BN_NIST_521_RSHIFT |
(val=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
}
t_d[i] = val>>BN_NIST_521_RSHIFT;
Other compilers issue warnings, so try this instead

for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
{
t_d[i] = (val>>BN_NIST_521_RSHIFT |
(tmp=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
val=tmp;
}
t_d[i] = val>>BN_NIST_521_RSHIFT;

If it doesn't work, then we'll go for removing temporary values.


______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org
Gilles Khouzam via RT
2014-09-24 02:40:43 UTC
Permalink
Thanks Andy,

With the change suggested, the tests do succeed.

-----Original Message-----
From: Andy Polyakov via RT [mailto:***@openssl.org]
Sent: Tuesday, September 23, 2014 14:01
To: Gilles Khouzam
Cc: openssl-***@openssl.org
Subject: Re: [openssl.org #3541] [PATCH] BN_nist_mod_521 fails on Windows ARM
Post by Gilles Khouzam via RT
When compiling OpenSSL optimized on ARM using the Microsoft compiler,
the wrong code is being emitted for BN_nist_mod_521 (in bn_nist.c).
The compiler seems to think that val and temp represent the same item
when they are clearly one index apart. I've coded a fix that simply
avoid using temporary variables and uses the indices into the t_d
array directly. The code is simply a refactor of the existing code
and does generate very effective neon instructions for the loop.
ectest test which was always failing before in ARM on Windows is now
succeeding (as well as all the other tests).
I recall looking at code generated at x86 and not liking the result
with code similar to what you suggest. Which is why those temporary
values were added. I wonder if you could test following loop.
for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
{
t_d[i] = (val>>BN_NIST_521_RSHIFT |
(val=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
}
t_d[i] = val>>BN_NIST_521_RSHIFT;
Other compilers issue warnings, so try this instead

for (val=t_d[0],i=0; i<BN_NIST_521_TOP-1; i++)
{
t_d[i] = (val>>BN_NIST_521_RSHIFT |
(tmp=t_d[i+1])<<BN_NIST_521_LSHIFT) & BN_MASK2;
val=tmp;
}
t_d[i] = val>>BN_NIST_521_RSHIFT;

If it doesn't work, then we'll go for removing temporary values.



______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org
Loading...