1. 30 Apr, 2016 2 commits
    • David Benjamin's avatar
      Make CRYPTO_is_NEON_capable aware of the buggy CPU. · 1b5bcb59
      David Benjamin authored
      If we're to allow the buggy CPU workaround to fire when __ARM_NEON__ is set,
      CRYPTO_is_NEON_capable also needs to be aware of it. Also add an API to export
      this value out of BoringSSL, so we can get some metrics on how prevalent this
      chip is.
      
      BUG=chromium:606629
      
      Change-Id: I97d65a47a6130689098b32ce45a8c57c468aa405
      Reviewed-on: https://boringssl-review.googlesource.com/7796
      
      Reviewed-by: default avatarAdam Langley <[email protected]>
      1b5bcb59
    • David Benjamin's avatar
      Don't set a default armcap state in dynamic armcap modes. · b69307a1
      David Benjamin authored
      The getauxval (and friends) code would be filling that in anyway. The default
      only serves to enable NEON even if the OS is old enough to be missing getauxval
      (and everything else).
      
      Notably, this unbreaks the has_buggy_neon code when __ARM_NEON__ is set, as is
      the case in Chrome for Android, as of M50.  Before, the default
      OPENSSL_armcap_P value was getting in the way.
      
      Arguably, this doesn't make a whole lot of sense. We're saying we'll let the
      CPU run compiler-generated NEON code, but not our hand-crafted stuff. But, so
      far, we only have evidence of the hand-written NEON tickling the bug and not
      the compiler-generated stuff, so avoid the unintentional regression. (Naively,
      I would expect the hand-crafted NEON is better at making full use of the
      pipeline and is thus more likely to tickle the CPU bug.)
      
      This is not the fix for M50, as in the associated Chromium bug, but it will fix
      master and M51. M50 will instead want to revert
      https://codereview.chromium.org/1730823002.
      
      BUG=chromium:606629
      
      Change-Id: I394f97fea2f09891dd8fa30e0ec6fc6b1adfab7a
      Reviewed-on: https://boringssl-review.googlesource.com/7794
      
      Reviewed-by: default avatarAdam Langley <[email protected]>
      b69307a1
  2. 01 Apr, 2016 9 commits
  3. 30 Mar, 2016 7 commits
  4. 29 Mar, 2016 8 commits
  5. 28 Mar, 2016 3 commits
    • David Benjamin's avatar
      Import chacha-x86.pl fix. · 762e1d03
      David Benjamin authored
      Patch from https://mta.openssl.org/pipermail/openssl-dev/2016-March/005625.html.
      
      Upstream has yet to make a decision on aliasing requirements for their
      assembly. If they choose to go with the stricter aliasing requirement rather
      than land this patch, we'll probably want to tweak EVP_AEAD's API guarantees
      accordingly and then undiverge.
      
      In the meantime, import this to avoid a regression on x86 from when we had
      compiler-vectorized code on GCC platforms.  Per our assembly coverage tools and
      pending multi-CPU-variant tests, we have good coverage here. Unlike Poly1305
      (which is currently waiting on yet another upstream bugfix), where there is
      risk of missed carries everywhere, it is much more difficult to accidentally
      make a ChaCha20 implementation that fails based on the data passed into it.
      
      This restores a sizeable speed improvement on x86.
      
      Before:
      Did 1131000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000205us (1130768.2 ops/sec): 18.1 MB/s
      Did 161000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1006136us (160018.1 ops/sec): 216.0 MB/s
      Did 28000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1023264us (27363.4 ops/sec): 224.2 MB/s
      Did 1166000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000447us (1165479.0 ops/sec): 18.6 MB/s
      Did 160000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1004818us (159232.8 ops/sec): 215.0 MB/s
      Did 30000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1016977us (29499.2 ops/sec): 241.7 MB/s
      
      After:
      Did 2208000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000031us (2207931.6 ops/sec): 35.3 MB/s
      Did 402000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001717us (401310.9 ops/sec): 541.8 MB/s
      Did 97000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005394us (96479.6 ops/sec): 790.4 MB/s
      Did 2444000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000089us (2443782.5 ops/sec): 39.1 MB/s
      Did 459000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000563us (458741.7 ops/sec): 619.3 MB/s
      Did 97000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1007942us (96235.7 ops/sec): 788.4 MB/s
      
      Change-Id: I976da606dae062a776e0cc01229ec03a074035d1
      Reviewed-on: https://boringssl-review.googlesource.com/7561
      
      Reviewed-by: default avatarSteven Valdez <[email protected]>
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      762e1d03
    • David Benjamin's avatar
      Remove unnecessary include. · 17d729e6
      David Benjamin authored
      Change-Id: I24d0179ca5019e82ca1494c8773f373f8c09ce82
      Reviewed-on: https://boringssl-review.googlesource.com/7566
      
      Reviewed-by: default avatarSteven Valdez <[email protected]>
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      17d729e6
    • David Benjamin's avatar
      Fix typo in comment. · 2aca2264
      David Benjamin authored
      Change-Id: I0effe99d244c4ccdbb0e34db6e01a59c9463cb15
      Reviewed-on: https://boringssl-review.googlesource.com/7572
      
      Reviewed-by: default avatarSteven Valdez <[email protected]>
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      2aca2264
  6. 27 Mar, 2016 5 commits
  7. 26 Mar, 2016 6 commits
    • David Benjamin's avatar
      Work around Android devices without AT_HWCAP2. · a2d4c0c4
      David Benjamin authored
      Some ARMv8 Android devices don't have AT_HWCAP2. This means, when running in
      32-bit mode (ARM capability APIs on Linux are different between AArch32 and
      AArch64), we can't discover the various nice instructions.
      
      On a Nexus 6P, this gives a, uh, minor performance win when running in 32-bit
      mode.
      
      Before:
      Did 1085000 AES-128-GCM (16 bytes) seal operations in 1000003us (1084996.7 ops/sec): 17.4 MB/s
      Did 60000 AES-128-GCM (1350 bytes) seal operations in 1013416us (59205.7 ops/sec): 79.9 MB/s
      Did 11000 AES-128-GCM (8192 bytes) seal operations in 1019778us (10786.7 ops/sec): 88.4 MB/s
      Did 1009000 AES-256-GCM (16 bytes) seal operations in 1000650us (1008344.6 ops/sec): 16.1 MB/s
      Did 49000 AES-256-GCM (1350 bytes) seal operations in 1015698us (48242.7 ops/sec): 65.1 MB/s
      Did 9394 AES-256-GCM (8192 bytes) seal operations in 1071104us (8770.4 ops/sec): 71.8 MB/s
      Did 1557000 SHA-1 (16 bytes) operations in 1000317us (1556506.6 ops/sec): 24.9 MB/s
      Did 762000 SHA-1 (256 bytes) operations in 1000527us (761598.6 ops/sec): 195.0 MB/s
      Did 45000 SHA-1 (8192 bytes) operations in 1013773us (44388.6 ops/sec): 363.6 MB/s
      Did 1459000 SHA-256 (16 bytes) operations in 1000271us (1458604.7 ops/sec): 23.3 MB/s
      Did 538000 SHA-256 (256 bytes) operations in 1000990us (537467.9 ops/sec): 137.6 MB/s
      Did 26000 SHA-256 (8192 bytes) operations in 1008403us (25783.3 ops/sec): 211.2 MB/s
      
      After:
      Did 1890000 AES-128-GCM (16 bytes) seal operations in 1000068us (1889871.5 ops/sec): 30.2 MB/s
      Did 509000 AES-128-GCM (1350 bytes) seal operations in 1000112us (508943.0 ops/sec): 687.1 MB/s
      Did 110000 AES-128-GCM (8192 bytes) seal operations in 1007966us (109130.7 ops/sec): 894.0 MB/s
      Did 1960000 AES-256-GCM (16 bytes) seal operations in 1000303us (1959406.3 ops/sec): 31.4 MB/s
      Did 460000 AES-256-GCM (1350 bytes) seal operations in 1001873us (459140.0 ops/sec): 619.8 MB/s
      Did 97000 AES-256-GCM (8192 bytes) seal operations in 1005337us (96485.1 ops/sec): 790.4 MB/s
      Did 1927000 SHA-1 (16 bytes) operations in 1000429us (1926173.7 ops/sec): 30.8 MB/s
      Did 1151000 SHA-1 (256 bytes) operations in 1000425us (1150511.0 ops/sec): 294.5 MB/s
      Did 87000 SHA-1 (8192 bytes) operations in 1003089us (86732.1 ops/sec): 710.5 MB/s
      Did 2357390 SHA-256 (16 bytes) operations in 1000116us (2357116.6 ops/sec): 37.7 MB/s
      Did 1410000 SHA-256 (256 bytes) operations in 1000176us (1409751.9 ops/sec): 360.9 MB/s
      Did 101000 SHA-256 (8192 bytes) operations in 1007007us (100297.2 ops/sec): 821.6 MB/s
      
      BUG=chromium:596156
      
      Change-Id: Iacc1f8d8a07e991d4615f2e12c5c54923fb31aa2
      Reviewed-on: https://boringssl-review.googlesource.com/7507
      
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      a2d4c0c4
    • David Benjamin's avatar
      Rewrite ARM feature detection. · 054e151b
      David Benjamin authored
      This removes the thread-unsafe SIGILL-based detection and the
      multi-consumer-hostile CRYPTO_set_NEON_capable API. (Changing
      OPENSSL_armcap_P after initialization is likely to cause problems.)
      
      The right way to detect ARM features on Linux is getauxval. On aarch64,
      we should be able to rely on this, so use it straight. Split this out
      into its own file. The #ifdefs in the old cpu-arm.c meant it shared all
      but no code with its arm counterpart anyway.
      
      Unfortunately, various versions of Android have different missing APIs, so, on
      arm, we need a series of workarounds. Previously, we used a SIGILL fallback
      based on OpenSSL's logic, but this is inherently not thread-safe. (SIGILL also
      does not tell us if the OS knows how to save and restore NEON state.) Instead,
      base the behavior on Android NDK's cpu-features library, what Chromium
      currently uses with CRYPTO_set_NEON_capable:
      
      - Android before API level 20 does not provide getauxval. Where missing,
        we can read from /proc/self/auxv.
      
      - On some versions of Android, /proc/self/auxv is also not readable, so
        use /proc/cpuinfo's Features line.
      
      - Linux only advertises optional features in /proc/cpuinfo. ARMv8 makes NEON
        mandatory, so /proc/cpuinfo can't be used without additional effort.
      
      Finally, we must blacklist a particular chip because the NEON unit is broken
      (https://crbug.com/341598).
      
      Unfortunately, this means CRYPTO_library_init now depends on /proc being
      available, which will require some care with Chromium's sandbox. The
      simplest solution is to just call CRYPTO_library_init before entering
      the sandbox.
      
      It's worth noting that Chromium's current EnsureOpenSSLInit function already
      depends on /proc/cpuinfo to detect the broken CPU, by way of base::CPU.
      android_getCpuFeatures also interally depends on it. We were already relying on
      both of those being stateful and primed prior to entering the sandbox.
      
      BUG=chromium:589200
      
      Change-Id: Ic5d1c341aab5a614eb129d8aa5ada2809edd6af8
      Reviewed-on: https://boringssl-review.googlesource.com/7506
      
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      054e151b
    • Brian Smith's avatar
      Fix build when using Visual Studio 2015 Update 1. · dc6c1b83
      Brian Smith authored
      Many of the compatibility issues are described at
      https://msdn.microsoft.com/en-us/library/mt612856.aspx. The macros
      that suppressed warnings on a per-function basis no longer work in
      Update 1, so replace them with #pragmas. Update 1 warns when |size_t|
      arguments to |printf| are casted, so stop doing that casting.
      Unfortunately, this requires an ugly hack to continue working in
      MSVC 2013 as MSVC 2013 doesn't support "%zu". Finally, Update 1 has new
      warnings, some of which need to be suppressed.
      
      ---
      
      Updated by davidben to give up on suppressing warnings in crypto/x509 and
      crypto/x509v3 as those directories aren't changed much from upstream. In each
      of these cases, upstream opted just blindly initialize the variable, so do the
      same. Also switch C4265 to level 4, per Microsoft's recommendation and work
      around a bug in limits.h that happens to get fixed by Google include order
      style.
      
      (limits.h is sensitive to whether corecrt.h, pulled in by stddef.h and some
      other headers, is included before it. The reason it affected just one file is
      we often put the file's header first, which means base.h is pulling in
      stddef.h. Relying on this is ugly, but it's no worse than what everything else
      is doing and this doesn't seem worth making something as tame as limits.h so
      messy to use.)
      
      Change-Id: I02d1f935356899f424d3525d03eca401bfa3e6cd
      Reviewed-on: https://boringssl-review.googlesource.com/7480
      
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      dc6c1b83
    • David Benjamin's avatar
      Add tests for RSA objects with only n and d. · db50299b
      David Benjamin authored
      Conscrypt, thanks to Java's RSAPrivateKeySpec API, must be able to use RSA keys
      with only modulus and exponent. This is kind of silly and breaks the blinding
      code so they, both in OpenSSL and BoringSSL, had to explicitly turn blinding
      off.
      
      Add a test for this as we're otherwise sure to break it on accident.
      
      We may wish to avoid the silly rsa->flags modification, I'm not sure. For now,
      keep the requirement in so other consumers do not accidentally rely on this.
      
      (Also add a few missing ERR_clear_error calls. Functions which are expected to
      fail should be followed by an ERR_clear_error so later unexpected failures
      don't get confused.)
      
      BUG=boringssl:12
      
      Change-Id: I674349821f1f59292b8edd085f21dc37e8bcaa75
      Reviewed-on: https://boringssl-review.googlesource.com/7560
      
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      db50299b
    • Brian Smith's avatar
      Clarify lifecycle of |BN_BLINDING|. · cbf56a56
      Brian Smith authored
      In |bn_blinding_update| the condition |b->e != NULL| would never be
      true (probably), but the test made reasoning about the correctness of
      the code confusing. That confusion was amplified by the circuitous and
      unusual way in which |BN_BLINDING|s are constructed. Clarify all this
      by simplifying the construction of |BN_BLINDING|s, making it more like
      the construction of other structures.
      
      Also, make counter unsigned as it is no longer ever negative.
      
      Change-Id: I6161dcfeae19a80c780ccc6762314079fca1088b
      Reviewed-on: https://boringssl-review.googlesource.com/7530
      
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      cbf56a56
    • Brian Smith's avatar
      Always cache Montgomery contexts in RSA. · 24493a4f
      Brian Smith authored
      Simplify the code by always caching Montgomery contexts in the RSA
      structure, regardless of the |RSA_FLAG_CACHE_PUBLIC| and
      |RSA_FLAG_CACHE_PRIVATE| flags. Deprecate those flags.
      
      Now that we do this no more than once per key per RSA exponent, the
      private key exponents better because the initialization of the
      Montgomery contexts isn't perfectly side-channel protected.
      
      Change-Id: I4fbcfec0f2f628930bfeb811285b0ae3d103ac5e
      Reviewed-on: https://boringssl-review.googlesource.com/7521
      
      Reviewed-by: default avatarDavid Benjamin <[email protected]>
      24493a4f