Commit Graph

354 Commits

Author SHA1 Message Date
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Jia He
9369693a2c crypto: arm64/poly1305 - move data to rodata section
When objtool gains support for ARM in the future, it may encounter issues
disassembling the following data in the .text section:
> .Lzeros:
> .long   0,0,0,0,0,0,0,0
> .asciz  "Poly1305 for ARMv8, CRYPTOGAMS by \@dot-asm"
> .align  2

Move it to .rodata which is a more appropriate section for read-only data.

There is a limit on how far the label can be from the instruction, hence
use "adrp" plus the low 12 bits of the label's offset to avoid a
compilation error.

Signed-off-by: Jia He <justin.he@arm.com>
Tested-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-17 13:55:49 +08:00
Herbert Xu
b0cd6f4c3f Revert "crypto: arm64/poly1305 - move data to rodata section"
This reverts commit 47d9625209.

It causes build issues as detected by the kernel test robot.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408040817.OWKXtCv6-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-06 13:45:59 +08:00
Jia He
47d9625209 crypto: arm64/poly1305 - move data to rodata section
When objtool gains support for ARM in the future, it may encounter issues
disassembling the following data in the .text section:
> .Lzeros:
> .long   0,0,0,0,0,0,0,0
> .asciz  "Poly1305 for ARMv8, CRYPTOGAMS by \@dot-asm"
> .align  2

Move it to .rodata which is a more appropriate section for read-only data.

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-02 21:11:20 +08:00
Jeff Johnson
b568826eff crypto: arm64 - add missing MODULE_DESCRIPTION() macros
With ARCH=arm64, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/crct10dif-ce.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/poly1305-neon.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/aes-neon-bs.o

Add the missing invocations of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-06-21 22:04:16 +10:00
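For illustration, a minimal sketch of the kind of one-line addition such a
warning calls for; the description string below is hypothetical, not the
wording used in the actual patch:

  /* Hypothetical module footer for one of the listed objects; the
   * MODULE_DESCRIPTION() line is the substance of the fix. */
  #include <linux/module.h>

  MODULE_DESCRIPTION("CRC-T10DIF checksum using arm64 NEON instructions");
  MODULE_LICENSE("GPL");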
Mark Brown
a720de9fba crypto: arm64/crct10dif - Raise priority of NEON crct10dif implementation
The NEON implementation of crct10dif is registered with a priority of 100,
which is identical to that used by the generic C implementation. Raise the
priority to 150, half way between the generic implementation and the PMULL
based one, so that the NEON version will be preferred over the generic
implementation.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-31 17:34:56 +08:00
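As a sketch of where such a change lands, an shash registration carrying the
raised priority; the driver name, sizes and omitted handlers are placeholders,
and only .cra_priority reflects the change described above:

  #include <crypto/internal/hash.h>

  static struct shash_alg crct10dif_neon_alg = {
          /* .init/.update/.final handlers omitted in this sketch */
          .digestsize           = 2,            /* CRC-T10DIF is a 16-bit CRC */
          .descsize             = sizeof(u16),
          .base.cra_name        = "crct10dif",
          .base.cra_driver_name = "crct10dif-arm64-neon",   /* illustrative */
          .base.cra_priority    = 150,  /* generic C is 100; PMULL variant is higher */
          .base.cra_blocksize   = 1,
          .base.cra_module      = THIS_MODULE,
  };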
Ard Biesheuvel
571e557cba crypto: arm64/aes-ce - Simplify round key load sequence
Tweak the round key logic so that the keys can be loaded using a single
branchless sequence of overlapping loads. This is shorter and
simpler, and puts the conditional branches based on the key size further
apart, which might benefit microarchitectures that cannot record taken
branches at every instruction. For these branches, use test-bit-branch
instructions that don't clobber the condition flags.

Note that none of this has any impact on performance, positive or
otherwise (and the branch prediction benefit would only benefit AES-192
which nobody uses). It does make for nicer code, though.

While at it, use \@ to generate the labels inside the macros, which is
more robust than using fixed numbers, which could clash inadvertently.
Also, bring aes-neon.S in line with these changes, including the switch
to test-and-branch instructions, to avoid surprises in the future when
we might start relying on the condition flags being preserved in the
chaining mode wrappers in aes-modes.S.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Linus Torvalds
c8e7699616 This update includes the following changes:
API:
 
 - Avoid unnecessary copying in scomp for trivial SG lists.
 
 Algorithms:
 
 - Optimise NEON CCM implementation on ARM64.
 
 Drivers:
 
 - Add queue stop/query debugfs support in hisilicon/qm.
 -
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmXxDbcACgkQxycdCkmx
 i6dCzw//dI5CuszRMalZu8fwvh7K48d4GoY/5uuc1vPK6yLyc5dl6CzPsrKhmBGO
 Ip2fUCFoO+ViWl5tw8SjOOEnPRztETrIGlOM5PHOIxRuSA6lbcsdo+EQja4WBDo1
 fGUNaNVW9eY9XsaUw46uwxaicd+xOqRGJURKB2XuzSHACln/QrNy74aVdbc6VhaK
 Jnh4o2blsEh7jcVAK2ahNmmDm739w8C462Go4UIl8xcM693HtjUUFx7TALpI0iC+
 BWxctAV9IzA/siUwztvwwlo+V1KZbjZFD5XfC/erkV0gFs7ll8IcuDFnZ1suMvt2
 Z+6OVhghkkvMJJEl1qKNxVJITOUdXH0yKWqTbEHuFlV7VV+9hjUlWlvRVm7ZIrXf
 ZXi/OpBJRAIBB6fJqy1RAvfIZUR8Dl8i8RKLVawdqFLbB+XCOiIK7a8+5hCKaWsU
 0DVFNCfiJFPIiByzxXV+4Jt/hzxx/qseIbTiUShdoXeHOVWMgH5H255MWJv5qjVy
 aSwgQLX6MlY9Pj+w5cPHAUjOiGjiAa3iKwsbnmo5lszsPl4KR407gaJnV7VI4s6R
 fhqAgIjJ4Ik2lHzQRM89QU2OOogVBPs+FHEkk98lak5CATpjUq97iLbTharAg3dc
 l8FLoSB9+NM+RCbnzUXuOIvoU/okSM4+j6a3xAnLQH6OxBOtk3Y=
 =Uw2R
 -----END PGP SIGNATURE-----

Merge tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:

   - Avoid unnecessary copying in scomp for trivial SG lists

  Algorithms:

   - Optimise NEON CCM implementation on ARM64

  Drivers:

   - Add queue stop/query debugfs support in hisilicon/qm

   - Intel qat updates and cleanups"

* tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (79 commits)
  Revert "crypto: remove CONFIG_CRYPTO_STATS"
  crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem
  crypto: tcrypt - add ffdhe2048(dh) test
  crypto: iaa - fix the missing CRYPTO_ALG_ASYNC in cra_flags
  crypto: hisilicon/zip - fix the missing CRYPTO_ALG_ASYNC in cra_flags
  hwrng: hisi - use dev_err_probe
  MAINTAINERS: Remove T Ambarus from few mchp entries
  crypto: iaa - Fix comp/decomp delay statistics
  crypto: iaa - Fix async_disable descriptor leak
  dt-bindings: rng: atmel,at91-trng: add sam9x7 TRNG
  dt-bindings: crypto: add sam9x7 in Atmel TDES
  dt-bindings: crypto: add sam9x7 in Atmel SHA
  dt-bindings: crypto: add sam9x7 in Atmel AES
  crypto: remove CONFIG_CRYPTO_STATS
  crypto: dh - Make public key test FIPS-only
  crypto: rockchip - fix to check return value
  crypto: jitter - fix CRYPTO_JITTERENTROPY help text
  crypto: qat - make ring to service map common for QAT GEN4
  crypto: qat - fix ring to service map for dcc in 420xx
  crypto: qat - fix ring to service map for dcc in 4xxx
  ...
2024-03-15 14:46:54 -07:00
Ard Biesheuvel
1c0cf6d196 crypto: arm64/neonbs - fix out-of-bounds access on short input
The bit-sliced implementation of AES-CTR operates on blocks of 128
bytes, and will fall back to the plain NEON version for tail blocks or
inputs that are shorter than 128 bytes to begin with.

It will call straight into the plain NEON asm helper, which performs all
memory accesses in granules of 16 bytes (the size of a NEON register).
For this reason, the associated plain NEON glue code will copy inputs
shorter than 16 bytes into a temporary buffer, given that this is a rare
occurrence and it is not worth the effort to work around this in the asm
code.

The fallback from the bit-sliced NEON version fails to take this into
account, potentially resulting in out-of-bounds accesses. So clone the
same workaround, and use a temp buffer for short in/outputs.

Fixes: fc074e1300 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk")
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:37:24 +08:00
Ard Biesheuvel
f691d444f9 crypto: arm64/aes-ccm - Merge finalization into en/decrypt asm helpers
The C glue code already infers whether or not the current iteration is
the final one, by comparing walk.nbytes with walk.total. This means we
can easily inform the asm helpers of this as well, by conditionally
passing a pointer to the original IV, which is used in the finalization
of the MAC. This removes the need for a separate call into the asm code
to perform the finalization.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
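A fragment-level sketch of the glue-code shape this describes, inside the
existing CCM walk loop; ce_aes_ccm_encrypt() is named after the real helper,
but the signature shown and the surrounding variables (orig_iv, mac, rounds,
ctx) are assumptions:

  /* The final iteration is the one where this chunk covers everything
   * that is left, i.e. walk.nbytes == walk.total.  Only then is the
   * original IV passed, so the asm helper can finalize the MAC itself. */
  u8 *final_iv = (walk.nbytes == walk.total) ? orig_iv : NULL;

  ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr, walk.nbytes,
                     ctx->key_enc, rounds, mac, walk.iv, final_iv);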
Ard Biesheuvel
7150528849 crypto: arm64/aes-ccm - Merge encrypt and decrypt tail handling
The encryption and decryption code paths are mostly identical, except
for a small difference where the plaintext input into the MAC is taken
from either the input or the output block.

We can factor this in quite easily using a vector bit select, and a few
additional XORs, without the need for branches. This way, we can use the
same tail handling logic on the encrypt and decrypt code paths, allowing
further consolidation of the asm helpers in a subsequent patch.

(In the main loop, adding just a handful of ALU instructions results in
a noticeable performance hit [around 5% on Apple M2], so those routines
are kept separate)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
565def1542 crypto: arm64/aes-ccm - Cache round keys and unroll AES loops
The CCM code as originally written attempted to use as few NEON
registers as possible, to avoid having to eagerly preserve/restore the
entire NEON register file at every call to kernel_neon_begin/end. At
that time, this API took a number of NEON registers as a parameter, and
only preserved that many registers.

Today, the NEON register file is restored lazily, and the old API is
long gone. This means we can use as many NEON registers as we can make
meaningful use of, which means in the AES case that we can keep all
round keys in registers rather than reloading each of them for each AES
block processed.

On Cortex-A53, this results in a speedup of more than 50%. (From 4
cycles per byte to 2.6 cycles per byte)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
948ffc66e5 crypto: arm64/aes-ccm - Reuse existing MAC update for AAD input
CCM combines the counter (CTR) encryption mode with a MAC based on the
same block cipher. This MAC construction is a bit clunky: it invokes the
block cipher in a way that cannot be parallelized, resulting in poor CPU
pipeline efficiency.

The arm64 CCM code mitigates this by interleaving the encryption and MAC
at the AES round level, resulting in a substantial speedup. But this
approach does not apply to the additional authenticated data (AAD) which
is not encrypted.

This means the special asm routine dealing with the AAD is not any
better than the MAC update routine used by the arm64 AES block
encryption driver, so let's reuse that, and drop the special AES-CCM
version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
c131098d6d crypto: arm64/aes-ccm - Replace bytewise tail handling with NEON permute
Implement the CCM tail handling using a single sequence that uses
permute vectors and overlapping loads and stores, rather than going over
the tail byte by byte in a loop, and using scalar operations. This is
more efficient, even though the measured speedup is only around 1-2% on
the CPUs I have tried.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
97c4c10daf crypto: arm64/aes-ccm - Pass short inputs via stack buffer
In preparation for optimizing the CCM core asm code using permutation
vectors and overlapping loads and stores, ensure that inputs shorter
than the size of an AES block are passed via a buffer on the stack, in a
way that positions the data at the end of a 16-byte buffer. This removes
the need for the asm code to reason about a rare corner case where the
tail of the data cannot be read/written using a single NEON load/store
instruction.

While at it, tweak the copyright header and authorship to bring it up to
date.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
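A small sketch, under assumed names, of the bounce-buffer idea: data shorter
than one AES block is staged at the end of a 16-byte stack buffer so the asm
code can always use a single (possibly overlapping) 16-byte access:

  #include <crypto/aes.h>         /* AES_BLOCK_SIZE */
  #include <linux/string.h>       /* memcpy */
  #include <linux/types.h>

  /* Sketch only: return a pointer whose last 'len' bytes are the payload,
   * backed by a caller-provided 16-byte stack buffer when 'src' is short. */
  static const u8 *ccm_stage_short_input(const u8 *src, unsigned int len,
                                         u8 buf[AES_BLOCK_SIZE])
  {
          if (len >= AES_BLOCK_SIZE)
                  return src;

          memcpy(buf + AES_BLOCK_SIZE - len, src, len);
          return buf + AES_BLOCK_SIZE - len;
  }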
Ard Biesheuvel
88c6d50f64 crypto: arm64/aes-ccm - Keep NEON enabled during skcipher walk
Now that kernel mode NEON no longer disables preemption, we no longer
have to take care to disable and re-enable use of the NEON when calling
into the skcipher walk API. So just keep it enabled until done.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
f722002441 crypto: arm64/aes-ccm - Revert "Rewrite skcipher walker loop"
This reverts commit 57ead1bf1c, which updated the CCM code to only
rely on walk.nbytes to check for failures returned from the skcipher
walk API, mostly for the common good rather than to fix a particular
problem in the code.

This change introduces a problem of its own: the skcipher walk is
started with the 'atomic' argument set to false, which means that the
skcipher walk API is permitted to sleep. Subsequently, it invokes
skcipher_walk_done() with preemption disabled on the final iteration of
the loop. This appears to work by accident, but it is arguably a bad
example, and providing a better example was the point of the original
patch.

Given that future changes to the CCM code will rely on the original
behavior of entering the loop even for zero sized inputs, let's just
revert this change entirely, and proceed from there.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Herbert Xu
29fa12e918 crypto: arm64/sm4 - Remove cfb(sm4)
Remove the unused CFB implementation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-08 11:59:45 +08:00
Eric Biggers
1be7505933 crypto: arm64/sha512 - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha512_block_data_order()
got this backwards.  Fix this, albeit without changing the name in the
perlasm since that is OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:26 +08:00
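The naming convention this and the following clean-ups restore, in sketch
form (names and bodies are illustrative only, not the driver code):

  #include <linux/types.h>

  /* The double-underscore function is the low-level callee (standing in
   * here for the asm block transform); the plainly-named function wraps
   * it, never the other way around. */
  static void __sha256_block_neon(u32 *state, const u8 *data, int blocks)
  {
          /* low-level block transform */
  }

  static void sha256_block_neon(u32 *state, const u8 *data, int blocks)
  {
          __sha256_block_neon(state, data, blocks);
  }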
Eric Biggers
455951b5e1 crypto: arm64/sha256 - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha256_block_data_order()
and __sha256_block_neon() got this backwards.  Fix this, albeit without
changing the names in the perlasm since that is OpenSSL code.  No change
in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:26 +08:00
Eric Biggers
5f720a3df3 crypto: arm64/sha512-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha512_ce_transform() and
__sha512_block_data_order() got this backwards.  Fix this, albeit
without changing "sha512_block_data_order" in the perlasm since that is
OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
ba30d31121 crypto: arm64/sha2-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha2_ce_transform() and
__sha256_block_data_order() got this backwards.  Fix this, albeit
without changing "sha256_block_data_order" in the perlasm since that is
OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
1f9f3a5218 crypto: arm64/sha1-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha1_ce_transform() got this
backwards.  Fix this.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
ddefde7b2a crypto: arm64/nhpoly1305 - implement ->digest
Implement the ->digest method to improve performance on single-page
messages by reducing the number of indirect calls.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
1efcbf0eff crypto: arm64/sha2-ce - implement ->digest for sha256
Implement a ->digest function for sha256-ce.  This improves the
performance of crypto_shash_digest() with this algorithm by reducing the
number of indirect calls that are made.  This only adds ~112 bytes of
code, mostly for the inlined init, as the finup function is tail-called.

For now, don't bother with this for sha224, since sha224 is rarely used.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
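A sketch of what such a ->digest hook can look like for sha256-ce, assuming
the driver's existing finup handler (sha256_ce_finup below is an assumed
name, declared only for the sake of the sketch):

  #include <crypto/internal/hash.h>
  #include <crypto/sha256_base.h>

  /* the driver's existing finup handler (assumed) */
  static int sha256_ce_finup(struct shash_desc *desc, const u8 *data,
                             unsigned int len, u8 *out);

  /* One-shot digest: inline the init and tail-call finup, cutting the
   * number of indirect calls for single-buffer messages. */
  static int sha256_ce_digest(struct shash_desc *desc, const u8 *data,
                              unsigned int len, u8 *out)
  {
          sha256_base_init(desc);
          return sha256_ce_finup(desc, data, len, out);
  }

  /* wired up in the shash_alg as:  .digest = sha256_ce_digest, */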
Masahiro Yamada
ac2d838fb7 crypto: arm64/aes - remove Makefile hack
Do it more simply. This also fixes single target builds.

[before]

  $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i
    [snip]
  make[4]: *** No rule to make target 'arch/arm64/crypto/aes-glue-ce.i'.  Stop.

[after]

  $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i
    [snip]
    CPP     arch/arm64/crypto/aes-glue-ce.i

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-08-11 19:19:27 +08:00
Herbert Xu
f573db7aa5 crypto: arm64/sha256-glue - Include module.h
Include module.h in arch/arm64/crypto/sha256-glue.c as it uses
various macros (such as MODULE_AUTHOR) that are defined there.

Also fix the ordering of types.h.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305191953.PIB1w80W-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-19 20:56:59 +08:00
Eric Biggers
47446d7cd4 crypto: arm64/aes-neonbs - fix crash with CFI enabled
aesbs_ecb_encrypt(), aesbs_ecb_decrypt(), aesbs_xts_encrypt(), and
aesbs_xts_decrypt() are called via indirect function calls.  Therefore
they need to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause
their type hashes to be emitted when the kernel is built with
CONFIG_CFI_CLANG=y.  Otherwise, the code crashes with a CFI failure if
the compiler doesn't happen to optimize out the indirect calls.

Fixes: c50d32859e ("arm64: Add types to indirect called assembly functions")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-03-14 17:06:44 +08:00
Herbert Xu
4e4a08868f crypto: arm64/sm4-gcm - Fix possible crash in GCM cryption
An often overlooked aspect of the skcipher walker API is that an
error is not just indicated by a non-zero return value, but by the
fact that walk->nbytes is zero.

Thus it is an error to call skcipher_walk_done after getting back
walk->nbytes == 0 from the previous interaction with the walker.

This is because when walk->nbytes is zero the walker is left in
an undefined state and any further calls to it may try to free
uninitialised stack memory.

The sm4 arm64 gcm code gets this wrong and ends up calling
skcipher_walk_done even when walk->nbytes is zero.

This patch rewrites the loop in a form that resembles other callers.

Reported-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Tianjia Zhang
3b9d902153 crypto: arm64/sm4-ccm - Rewrite skcipher walker loop
The fact that errors in the skcipher walker API are indicated
not only by a non-zero return value, but also by the fact that
walk->nbytes is zero, causes the layout of the skcipher walker
loop to be sufficiently different from the usual layout, which
is not a problem in itself, but is likely to cause confusion
and make the code harder to maintain.

This patch rewrites the skcipher walker loop and separates the
cryption of the last chunk from the loop to avoid incorrect calls
to the skcipher walker API. In addition to following the usual
convention of checking walk->nbytes, this makes the loop's
execution logic clearer and easier to understand.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Herbert Xu
57ead1bf1c crypto: arm64/aes-ccm - Rewrite skcipher walker loop
An often overlooked aspect of the skcipher walker API is that an
error is not just indicated by a non-zero return value, but by the
fact that walk->nbytes is zero.

Thus it is an error to call skcipher_walk_done after getting back
walk->nbytes == 0 from the previous interaction with the walker.

This is because when walk->nbytes is zero the walker is left in
an undefined state and any further calls to it may try to free
uninitialised stack memory.

The arm64 ccm code has to deal with zero-length messages, and
it needs to process data even when walk->nbytes == 0 is returned.
It doesn't have this bug because there is an explicit check for
walk->nbytes != 0 prior to the skcipher_walk_done call.

However, the loop is still sufficiently different from the usual
layout and it appears to have been copied into other code which
then ended up with this bug.  This patch rewrites it to follow the
usual convention of checking walk->nbytes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
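For reference, a sketch of the conventional loop shape these walker commits
converge on: the loop condition itself checks walk.nbytes, so
skcipher_walk_done() is never called after the walker has already reported
zero bytes. The processing call and tail policy below are placeholders; req,
AES_BLOCK_SIZE and process_chunk() are assumed to come from the surrounding
driver:

  struct skcipher_walk walk;
  int err;

  err = skcipher_walk_aead_encrypt(&walk, req, false);

  while (walk.nbytes) {
          unsigned int tail = walk.nbytes % AES_BLOCK_SIZE;

          if (walk.nbytes == walk.total)
                  tail = 0;       /* final chunk: process everything */

          process_chunk(walk.dst.virt.addr, walk.src.virt.addr,
                        walk.nbytes - tail);            /* placeholder */

          err = skcipher_walk_done(&walk, tail);
  }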
Ard Biesheuvel
9e3457112b crypto: arm64/gcm - add RFC4106 support
Add support for RFC4106 ESP encapsulation to the accelerated GCM
implementation. This results in a ~10% speedup for IPsec frames of
typical size (~1420 bytes) on Cortex-A53.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-20 18:29:31 +08:00
Tianjia Zhang
736f88689c crypto: arm64/sm4 - fix possible crash with CFI enabled
The SM4 CCM/GCM assembly functions for encryption and decryption are
called via indirect function calls.  Therefore they need to use
SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause their type hashes
to be emitted when the kernel is built with CONFIG_CFI_CLANG=y.
Otherwise, the code crashes with a CFI failure (if the compiler didn't
happen to optimize out the indirect call).

Fixes: 67fa3a7fdf ("crypto: arm64/sm4 - add CE implementation for CCM mode")
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-30 17:57:42 +08:00
Ard Biesheuvel
a428636d4c crypto: arm64/ghash-ce - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to set up the stack frame so
that return address protections will be enabled automatically when
configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
489a4a05fe crypto: arm64/crct10dif - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to set up the stack frame so
that return address protections will be enabled automatically when
configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
7d709af180 crypto: arm64/aes-modes - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to create the stack frames in
the AES chaining mode wrappers so that they will get PAC and/or shadow
call stack protection when configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
67ab02dce3 crypto: arm64/aes-neonbs - use frame_push/pop consistently
Use the frame_push and frame_pop macros consistently to create the stack
frame, so that we will get PAC and/or shadow call stack handling as well
when enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Herbert Xu
14386d4713 crypto: Prepare to move crypto_tfm_ctx
The helper crypto_tfm_ctx is only used by the Crypto API algorithm
code and should really be in algapi.h.  However, for historical
reasons many files relied on it to be in crypto.h.  This patch
changes those files to use algapi.h instead in preparation for a
move.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-02 18:12:40 +08:00
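In sketch form, what the affected files change; the context struct and helper
below are placeholders, only the include swap and crypto_tfm_ctx() use are
the point:

  #include <crypto/algapi.h>      /* was: relying on <linux/crypto.h> */

  struct my_cipher_ctx {          /* placeholder per-tfm context */
          u32 rounds;
  };

  static struct my_cipher_ctx *my_cipher_ctx(struct crypto_tfm *tfm)
  {
          return crypto_tfm_ctx(tfm);     /* now provided via algapi.h */
  }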
Eric Biggers
be8f6b6496 crypto: arm64/sm3 - fix possible crash with CFI enabled
sm3_neon_transform() is called via indirect function calls.  Therefore
it needs to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause
its type hash to be emitted when the kernel is built with
CONFIG_CFI_CLANG=y.  Otherwise, the code crashes with a CFI failure (if
the compiler didn't happen to optimize out the indirect call).

Fixes: c50d32859e ("arm64: Add types to indirect called assembly functions")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-25 17:39:19 +08:00
Eric Biggers
e5e1c67e2f crypto: arm64/nhpoly1305 - eliminate unnecessary CFI wrapper
Since the CFI implementation now supports indirect calls to assembly
functions, take advantage of that rather than use a wrapper function.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-25 17:39:19 +08:00
Tianjia Zhang
824db5cd1e crypto: arm64 - Fix unused variable compilation warnings of cpu_feature
The cpu feature table referenced by MODULE_DEVICE_TABLE is only used when
the code is built as a module, so an unused-variable warning is emitted
when it is built in-tree (built-in). The warning can be removed by
adding the __maybe_unused attribute.

Fixes: 03c9a333fe ("crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL")
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-18 16:59:34 +08:00
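A sketch of the fix, modeled on the ghash driver named in the Fixes tags;
the array name and feature are taken from that context but should be read as
illustrative rather than as the exact patch:

  #include <linux/cpufeature.h>
  #include <linux/module.h>

  /* Only referenced by module autoloading, so built-in (=y) builds would
   * otherwise warn about an unused variable. */
  static const struct cpu_feature __maybe_unused ghash_cpu_feature[] = {
          { cpu_feature(PMULL) }, { }
  };
  MODULE_DEVICE_TABLE(cpu, ghash_cpu_feature);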
Ard Biesheuvel
61c581a46a crypto: move gf128mul library into lib/crypto
The gf128mul library does not depend on the crypto API at all, so it can
be moved into lib/crypto. This will allow us to use it in other library
code in a subsequent patch without having to depend on CONFIG_CRYPTO.

While at it, change the Kconfig symbol name to align with other crypto
library implementations. However, the source file name is retained, as
it is reflected in the module .ko filename, and changing this might
break things for users.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-11 18:14:59 +08:00
Tianjia Zhang
ae1b83c7d5 crypto: arm64/sm4 - add CE implementation for GCM mode
This patch is a CE-optimized assembly implementation for GCM mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz; the data comes from the 224 and 224
modes of tcrypt and compares the performance before and after this patch (the
driver used before this patch is gcm_base(ctr-sm4-ce,ghash-generic)).
The columns are blocks of different lengths. The data is tabulated and the
unit is Mb/s:

Before (gcm_base(ctr-sm4-ce,ghash-generic)):

gcm(sm4)     |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    |  25.24   64.65   104.66   116.69   123.81   125.12   129.67   130.62
  GCM dec    |  25.40   64.80   104.74   116.70   123.81   125.21   129.68   130.59
  GCM mb enc |  24.95   64.06   104.20   116.38   123.55   124.97   129.63   130.61
  GCM mb dec |  24.92   64.00   104.13   116.34   123.55   124.98   129.56   130.48

After:

gcm-sm4-ce   |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    | 108.62  397.18   971.60  1283.92  1522.77  1513.39  1777.00  1806.96
  GCM dec    | 116.36  398.14  1004.27  1319.11  1624.21  1635.43  1932.54  1974.20
  GCM mb enc | 107.13  391.79   962.05  1274.94  1514.76  1508.57  1769.07  1801.58
  GCM mb dec | 113.40  389.36   988.51  1307.68  1619.10  1631.55  1931.70  1970.86

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:35:44 +08:00
Tianjia Zhang
67fa3a7fdf crypto: arm64/sm4 - add CE implementation for CCM mode
This patch is a CE-optimized assembly implementation for CCM mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz; the data comes from the 223 and 225
modes of tcrypt and compares the performance before and after this patch (the
driver used before this patch is ccm_base(ctr-sm4-ce,cbcmac-sm4-ce)).
The columns are blocks of different lengths. The data is tabulated and the
unit is Mb/s:

Before (rfc4309(ccm_base(ctr-sm4-ce,cbcmac-sm4-ce))):

ccm(sm4)     |     16      64     256     512    1024    1420    4096    8192
-------------+---------------------------------------------------------------
  CCM enc    |  35.07  125.40  336.47  468.17  581.97  619.18  712.56  736.01
  CCM dec    |  34.87  124.40  335.08  466.75  581.04  618.81  712.25  735.89
  CCM mb enc |  34.71  123.96  333.92  465.39  579.91  617.49  711.45  734.92
  CCM mb dec |  34.42  122.80  331.02  462.81  578.28  616.42  709.88  734.19

After (rfc4309(ccm-sm4-ce)):

ccm-sm4-ce   |     16      64     256     512    1024    1420    4096    8192
-------------+---------------------------------------------------------------
  CCM enc    |  77.12  249.82  569.94  725.17  839.27  867.71  952.87  969.89
  CCM dec    |  75.90  247.26  566.29  722.12  836.90  865.95  951.74  968.57
  CCM mb enc |  75.98  245.25  562.91  718.99  834.76  864.70  950.17  967.90
  CCM mb dec |  75.06  243.78  560.58  717.13  833.68  862.70  949.35  967.11

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:35:32 +08:00
Tianjia Zhang
6b5360a5e0 crypto: arm64/sm4 - add CE implementation for cmac/xcbc/cbcmac
This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz; the data comes from the 300 mode of
tcrypt and compares the performance before and after this patch (the driver
used before this patch is XXXmac(sm4-ce)). The columns are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

update-size    |      16      64     256    1024    2048    4096    8192
---------------+--------------------------------------------------------
cmac(sm4-ce)   |  293.33  403.69  503.76  527.78  531.10  535.46  535.81
xcbc(sm4-ce)   |  292.83  402.50  504.02  529.08  529.87  536.55  538.24
cbcmac(sm4-ce) |  318.42  415.79  497.12  515.05  523.15  521.19  523.01

After:

update-size    |      16      64     256    1024    2048    4096    8192
---------------+--------------------------------------------------------
cmac-sm4-ce    |  371.99  675.28  903.56  971.65  980.57  990.40  991.04
xcbc-sm4-ce    |  372.11  674.55  903.47  971.61  980.96  990.42  991.10
cbcmac-sm4-ce  |  371.63  675.33  903.23  972.07  981.42  990.93  991.45

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:43 +08:00
Tianjia Zhang
01f633113b crypto: arm64/sm4 - add CE implementation for XTS mode
This patch is a CE-optimized assembly implementation for XTS mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz; the data comes from the 218 mode of
tcrypt and compares the performance before and after this patch (the driver
used before this patch is xts(ecb-sm4-ce)). The columns are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

xts(ecb-sm4-ce) |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
        XTS enc |  117.17   430.56   732.92  1134.98  2007.03  2136.23  2347.20
        XTS dec |  116.89   429.02   733.40  1132.96  2006.13  2130.50  2347.92

After:

xts-sm4-ce      |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
        XTS enc |  224.68   798.91  1248.08  1714.60  2413.73  2467.84  2612.62
        XTS dec |  229.85   791.34  1237.79  1720.00  2413.30  2473.84  2611.95

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:42 +08:00
Tianjia Zhang
b1863fd074 crypto: arm64/sm4 - add CE implementation for CTS-CBC mode
This patch is a CE-optimized assembly implementation for CTS-CBC mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz; the data comes from the 218 mode of
tcrypt and compares the performance before and after this patch (the driver
used before this patch is cts(cbc-sm4-ce)). The columns are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

cts(cbc-sm4-ce) |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
    CTS-CBC enc |  286.09   297.17   457.97   627.75   868.58   900.80   957.69
    CTS-CBC dec |  286.67   285.63   538.35   947.08  2241.03  2577.32  3391.14

After:

cts-cbc-sm4-ce  |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
    CTS-CBC enc |  288.19   428.80   593.57   741.04   911.73   931.80   950.00
    CTS-CBC dec |  292.22   468.99   838.23  1380.76  2741.17  3036.42  3409.62

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:42 +08:00
Tianjia Zhang
45089dbe59 crypto: arm64/sm4 - export reusable CE acceleration functions
In the accelerated implementation of the SM4 algorithm using the Crypto
Extension instructions, there are some functions that can be reused by the
upcoming accelerated implementation of the GCM/CCM modes, and the
CBC/CFB encryption is also reused by the optimized SVE SM4 implementation.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:42 +08:00
Tianjia Zhang
cb9ba02b07 crypto: arm64/sm4 - simplify sm4_ce_expand_key() of CE implementation
Use a 128-bit swap mask and tbl instruction to simplify the implementation
for generating SM4 rkey_dec.

Also fix an issue where calls to the sm4_ce_expand_key() function were not
wrapped in kernel_neon_begin/end().

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:31 +08:00
Tianjia Zhang
ce41fefd24 crypto: arm64/sm4 - refactor and simplify CE implementation
This patch does not add new features, but only refactors and simplifies the
implementation of the Crypto Extension acceleration of the SM4 algorithm:

Extract the SM4 Crypto Extension optimized macros for reuse in the
subsequent optimization of the CCM/GCM modes.

Encryption in CBC and CFB modes processes four blocks at a time instead of
one, allowing the ld1 instruction to load 64 bytes of data at a time, which
reduces unnecessary memory accesses.

The CBC/CFB/CTR code makes full use of free registers to reduce redundant
memory accesses, and rearranges some instructions to improve out-of-order
execution capabilities.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:31 +08:00