[openssl.org #3554] [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

Discussion:

David Leon Gil via RT

2014-10-01 07:09:04 UTC

All internal exports: Zeroize XMM registers that may contain secret
data before returning. (At 4x pxors per cycle, the overhead is
negligible.)

_ctr32: Zeroize $key0 and $ctr.

aesni_ecb_encrypt: If $win64, saves and restores xmm registers with
callee-save status under the Win64 ABI. The code is adapted fairly
directly from _ctr32.

The Win64 fix is untested! I don't have a Windows development machine
at the moment.

(By the way, is OPENSSL_wipe_cpu used or tested anywhere?)

Andy Polyakov via RT

2014-10-01 13:24:29 UTC

Permalink

Post by David Leon Gil via RT
All internal exports: Zeroize XMM registers that may contain secret
data before returning. (At 4x pxors per cycle, the overhead is
negligible.)
_ctr32: Zeroize $key0 and $ctr.

Question is why just aesni module? Why not everywhere? Why not demand
that compiler does it too? Why just registers, and not stack too? The
answer is that it doesn't make much sense, because the code you are
trying to "protect" against resides in same process context and can read
all the secrets from memory much more reliably than from registers or
stack. I'm not saying that it makes no sense to clean up, only that *if*
we do choose to do it, then it should be done for right reason and
consistently.

Post by David Leon Gil via RT
(By the way, is OPENSSL_wipe_cpu used or tested anywhere?)

No. For reference, the original idea was to seek opportunity to deploy
it on border to libcrypto shared library, i.e. not upon return from some
specific subroutines, but automatically upon any return from within
libcrypto to application code. The idea was never realized though. The
only case when it makes sense is when some other code called afterwards
refers to register or ex-local variable used in libcrypto as
uninitialized and unintentionally sends that data to another process or
over network. Even then one can argue that it's that code vulnerability
that should be fixed. It might make sense to use OPENSSL_wipe_cpu in
specialized cases, when OpenSSL components are used in non-multi-task
environment, but that would be something for specific developer to
reason around and do.

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org

Andy Polyakov via RT

2014-10-01 13:40:04 UTC

Permalink

Post by Andy Polyakov via RT

Well, I'm being a little bit inconsistent here, because there are
*stack* cleanups in some other modules, most notably in BN. But question
why registers and why in just aesni still stands. If you can present
coherent argument and consensus is reached, then it would have to be
implemented universally, not only in aesni-x86_64 module.

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org

David Leon Gil via RT

2014-10-01 14:08:30 UTC

Permalink

Post by Andy Polyakov via RT
If you can present
coherent argument and consensus is reached, then it would have to be
implemented universally, not only in aesni-x86_64 module.

So, hopefully my cross-posted message convinced you. To summarize the
argument briefly:

- Library users may be performing a mix of private cryptographic
operations and operations controlled by untrusted code.
- A large "API" may be exposed to the untrusted code.
- It's easier to sanitize secrets from memory and registers when we
know they contain secrets than it is to ensure that *no* other
functions may leak register contents to untrusted code.
- The cost is negligible. (And it's lower for us than library clients:
They have no way of knowing what registers have been used, so they
would need to do the equivalent of OPENSSL_wipe_cpu.)

The reasons for targeting AES-NI first: I was working on another patch
(not yet submitted) to that file.

The reason for doing this one-file-at-a-time: A single huge patch
would likely see little meaningful review (reviewing assembler is
fairly tiring even at the scope of a single file).

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org

Andy Polyakov via RT

2014-10-01 18:02:58 UTC

Permalink

Post by David Leon Gil via RT

Post by Andy Polyakov via RT
If you can present
coherent argument and consensus is reached, then it would have to be
implemented universally, not only in aesni-x86_64 module.

So, hopefully my cross-posted message convinced you.

No, not really. What I meant was that as long as you can't ask even
compiler to wipe used registers and stack frame, it doesn't really make
sense to strive for this in assembly. Or in other words if we set up for
such quest, then we should make corresponding cases with all compiler
developers/manufacturers.

Post by David Leon Gil via RT
To summarize the
- Library users may be performing a mix of private cryptographic
operations and operations controlled by untrusted code.

Even case for compiler-generated code.

Post by David Leon Gil via RT
- A large "API" may be exposed to the untrusted code.

Even case for compiler-generated code.

Post by David Leon Gil via RT
- It's easier to sanitize secrets from memory and registers when we
know they contain secrets than it is to ensure that *no* other
functions may leak register contents to untrusted code.

Even case for compiler-generated code.

Post by David Leon Gil via RT
They have no way of knowing what registers have been used, so they
would need to do the equivalent of OPENSSL_wipe_cpu.)

Even would be case for compiler-generated code.

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org

David Leon Gil

2014-10-02 00:30:51 UTC

Permalink

Post by Andy Polyakov via RT
Or in other words if we set up for
such quest, then we should make corresponding cases with all compiler
developers/manufacturers.

Agreed.

Post by Andy Polyakov via RT

Post by David Leon Gil via RT
They have no way of knowing what registers have been used, so they
would need to do the equivalent of OPENSSL_wipe_cpu.)

Even would be case for compiler-generated code.

Agreed.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org

David Leon Gil via RT

2014-10-02 00:31:27 UTC

Permalink

Post by Andy Polyakov via RT
Or in other words if we set up for
such quest, then we should make corresponding cases with all compiler
developers/manufacturers.

Agreed.

Post by Andy Polyakov via RT

Post by David Leon Gil via RT
They have no way of knowing what registers have been used, so they
would need to do the equivalent of OPENSSL_wipe_cpu.)

Even would be case for compiler-generated code.

Agreed.

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org

David Leon Gil via RT

2014-10-02 00:37:11 UTC

Permalink

(Many thanks to Lutz for pointing out that I omitted the subject line;
hopefully this isn't a duplicate.)

---------- Forwarded message ----------
Date: Wed, 1 Oct 2014 09:45:10 -0400
Subject: Re: [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

Question is why just aesni module? Why not everywhere?

It should be done everywhere. But I only have limited time to spend on
this, so...one file at a time.

Why not demand that compiler does it too?

See, e.g., http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html
for someone who is demanding just that.

I have heard there is some work on a compiler pass to do so, in fact.

Why just registers, and not stack too?

You're right; I was neglecting the stack on Win64. (Some of the
assembler already appears to clean parts of the stack in that case; on
other platforms the stack isn't used IIRC.)

The answer is that it doesn't make much sense, because the code you are
trying to "protect" against resides in same process context

Maybe yes, maybe no; a lot of things can reside in the same process.
Motivating example: WebCrypto provides an API to perform encryption
with keys that aren't accessible to the JS.

(Thanks for the background on wipe_cpu.)

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-***@openssl.org
Automated List Manager ***@openssl.org