"ARM"="Arm", "NEON"="Neon"
Refer to:
https://www.arm.com/company/policies/trademarks/arm-trademark-list/arm-trademark
https://www.arm.com/company/policies/trademarks/arm-trademark-list/neon-trademark
NOTE: These changes are only applied to change log entries for 2.0.x and
later, since the change log is a historical record and Arm's new
trademark policy did not go into effect until late 2017.
diff --git a/BUILDING.md b/BUILDING.md
index a4ae1e0..6828809 100644
--- a/BUILDING.md
+++ b/BUILDING.md
@@ -398,8 +398,8 @@
Building libjpeg-turbo for iOS
------------------------------
-iOS platforms, such as the iPhone and iPad, use ARM processors, and all
-currently supported models include NEON instructions. Thus, they can take
+iOS platforms, such as the iPhone and iPad, use Arm processors, and all
+currently supported models include Neon instructions. Thus, they can take
advantage of libjpeg-turbo's SIMD extensions to significantly accelerate JPEG
compression/decompression. This section describes how to build libjpeg-turbo
for these platforms.
@@ -412,7 +412,7 @@
it should be installed in your `PATH`.
-### ARMv7 (32-bit)
+### Armv7 (32-bit)
**gas-preprocessor.pl required**
@@ -465,7 +465,7 @@
make
-### ARMv7s (32-bit)
+### Armv7s (32-bit)
**gas-preprocessor.pl required**
@@ -493,13 +493,13 @@
#### Xcode 5 and later (Clang)
-Same as the ARMv7 build procedure for Xcode 5 and later, except replace the
+Same as the Armv7 build procedure for Xcode 5 and later, except replace the
compiler flags as follows:
export CFLAGS="-Wall -mfloat-abi=softfp -arch armv7s -miphoneos-version-min=6.0"
-### ARMv8 (64-bit)
+### Armv8 (64-bit)
**gas-preprocessor.pl required if using Xcode < 6**
@@ -523,7 +523,7 @@
[additional CMake flags] {source_directory}
make
-Once built, lipo can be used to combine the ARMv7, v7s, and/or v8 variants into
+Once built, lipo can be used to combine the Armv7, v7s, and/or v8 variants into
a universal library.
@@ -534,7 +534,7 @@
[Android NDK](https://developer.android.com/tools/sdk/ndk).
-### ARMv7 (32-bit)
+### Armv7 (32-bit)
The following is a general recipe script that can be modified for your specific
needs.
@@ -559,7 +559,7 @@
make
-### ARMv8 (64-bit)
+### Armv8 (64-bit)
The following is a general recipe script that can be modified for your specific
needs.
@@ -742,21 +742,21 @@
make udmg
-This creates a Mac package/disk image that contains universal x86-64/i386/ARM
+This creates a Mac package/disk image that contains universal x86-64/i386/Arm
binaries. The following CMake variables control which architectures are
included in the universal binaries. Setting any of these variables to an empty
string excludes that architecture from the package.
* `OSX_32BIT_BUILD`: Directory containing an i386 (32-bit) Mac build of
libjpeg-turbo (default: *{source_directory}*/osxx86)
-* `IOS_ARMV7_BUILD`: Directory containing an ARMv7 (32-bit) iOS build of
+* `IOS_ARMV7_BUILD`: Directory containing an Armv7 (32-bit) iOS build of
libjpeg-turbo (default: *{source_directory}*/iosarmv7)
-* `IOS_ARMV7S_BUILD`: Directory containing an ARMv7s (32-bit) iOS build of
+* `IOS_ARMV7S_BUILD`: Directory containing an Armv7s (32-bit) iOS build of
libjpeg-turbo (default: *{source_directory}*/iosarmv7s)
-* `IOS_ARMV8_BUILD`: Directory containing an ARMv8 (64-bit) iOS build of
+* `IOS_ARMV8_BUILD`: Directory containing an Armv8 (64-bit) iOS build of
libjpeg-turbo (default: *{source_directory}*/iosarmv8)
-You should first use CMake to configure i386, ARMv7, ARMv7s, and/or ARMv8
+You should first use CMake to configure i386, Armv7, Armv7s, and/or Armv8
sub-builds of libjpeg-turbo (see "Build Recipes" and "Building libjpeg-turbo
for iOS" above) in build directories that match those specified in the
aforementioned CMake variables. Next, configure the primary build of
diff --git a/ChangeLog.md b/ChangeLog.md
index e496281..b04ba36 100644
--- a/ChangeLog.md
+++ b/ChangeLog.md
@@ -20,8 +20,8 @@
- Fixed an issue whereby `jpeg_skip_scanlines()` always returned 0 when
skipping past the end of an image.
-3. The ARM 64-bit (ARMv8) NEON SIMD extensions can now be built using MinGW
-toolchains targetting ARM64 (AArch64) Windows binaries.
+3. The Arm 64-bit (Armv8) Neon SIMD extensions can now be built using MinGW
+toolchains targeting Arm64 (AArch64) Windows binaries.
4. Fixed unexpected visual artifacts that occurred when using
`jpeg_crop_scanline()` and interblock smoothing while decompressing only the DC
@@ -94,7 +94,7 @@
(unlike the decompressor) is not generally exposed to arbitrary data exploits,
this issue did not likely pose a security risk.
-6. The ARM 64-bit (ARMv8) NEON SIMD assembly code now stores constants in a
+6. The Arm 64-bit (Armv8) Neon SIMD assembly code now stores constants in a
separate read-only data section rather than in the text section, to support
execute-only memory layouts.
@@ -380,7 +380,7 @@
now produces bitwise-identical results to the unmerged algorithms.
12. The SIMD function symbols for x86[-64]/ELF, MIPS/ELF, macOS/x86[-64] (if
-libjpeg-turbo is built with YASM), and iOS/ARM[64] builds are now private.
+libjpeg-turbo is built with YASM), and iOS/Arm[64] builds are now private.
This prevents those symbols from being exposed in applications or shared
libraries that link statically with libjpeg-turbo.
diff --git a/README.md b/README.md
index c88d3f5..1ff632e 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
==========
libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate
-baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and
+baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and
MIPS systems, as well as progressive JPEG compression on x86 and x86-64
systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg,
all else being equal. On other types of systems, libjpeg-turbo can still
diff --git a/cmakescripts/BuildPackages.cmake b/cmakescripts/BuildPackages.cmake
index 27e4c9e..b98fa93 100644
--- a/cmakescripts/BuildPackages.cmake
+++ b/cmakescripts/BuildPackages.cmake
@@ -137,13 +137,13 @@
"Directory containing 32-bit (i386) Mac build to include in universal binaries (default: ${DEFAULT_OSX_32BIT_BUILD})")
set(DEFAULT_IOS_ARMV7_BUILD ${CMAKE_SOURCE_DIR}/iosarmv7)
set(IOS_ARMV7_BUILD ${DEFAULT_IOS_ARMV7_BUILD} CACHE PATH
- "Directory containing ARMv7 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7_BUILD})")
+ "Directory containing Armv7 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7_BUILD})")
set(DEFAULT_IOS_ARMV7S_BUILD ${CMAKE_SOURCE_DIR}/iosarmv7s)
set(IOS_ARMV7S_BUILD ${DEFAULT_IOS_ARMV7S_BUILD} CACHE PATH
- "Directory containing ARMv7s iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7S_BUILD})")
+ "Directory containing Armv7s iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7S_BUILD})")
set(DEFAULT_IOS_ARMV8_BUILD ${CMAKE_SOURCE_DIR}/iosarmv8)
set(IOS_ARMV8_BUILD ${DEFAULT_IOS_ARMV8_BUILD} CACHE PATH
- "Directory containing ARMv8 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV8_BUILD})")
+ "Directory containing Armv8 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV8_BUILD})")
set(OSX_APP_CERT_NAME "" CACHE STRING
"Name of the Developer ID Application certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo DMG. Leave this blank to generate an unsigned DMG.")
diff --git a/jchuff.c b/jchuff.c
index cb05055..db85ce1 100644
--- a/jchuff.c
+++ b/jchuff.c
@@ -34,10 +34,10 @@
* memory footprint by 64k, which is important for some mobile applications
* that create many isolated instances of libjpeg-turbo (web browsers, for
* instance.) This may improve performance on some mobile platforms as well.
- * This feature is enabled by default only on ARM processors, because some x86
+ * This feature is enabled by default only on Arm processors, because some x86
* chips have a slow implementation of bsr, and the use of clz/bsr cannot be
* shown to have a significant performance impact even on the x86 chips that
- * have a fast implementation of it. When building for ARMv6, you can
+ * have a fast implementation of it. When building for Armv6, you can
* explicitly disable the use of clz/bsr by adding -mthumb to the compiler
* flags (this defines __thumb__).
*/
diff --git a/jcphuff.c b/jcphuff.c
index 8c4efaf..a8b94be 100644
--- a/jcphuff.c
+++ b/jcphuff.c
@@ -43,10 +43,10 @@
* memory footprint by 64k, which is important for some mobile applications
* that create many isolated instances of libjpeg-turbo (web browsers, for
* instance.) This may improve performance on some mobile platforms as well.
- * This feature is enabled by default only on ARM processors, because some x86
+ * This feature is enabled by default only on Arm processors, because some x86
* chips have a slow implementation of bsr, and the use of clz/bsr cannot be
* shown to have a significant performance impact even on the x86 chips that
- * have a fast implementation of it. When building for ARMv6, you can
+ * have a fast implementation of it. When building for Armv6, you can
* explicitly disable the use of clz/bsr by adding -mthumb to the compiler
* flags (this defines __thumb__).
*/
diff --git a/release/ReadMe.txt b/release/ReadMe.txt
index 0a08711..0d1888d 100644
--- a/release/ReadMe.txt
+++ b/release/ReadMe.txt
@@ -1,4 +1,4 @@
-libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and MIPS systems, as well as progressive JPEG compression on x86 and x86-64 systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines. In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
+libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and MIPS systems, as well as progressive JPEG compression on x86 and x86-64 systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines. In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
libjpeg-turbo implements both the traditional libjpeg API as well as the less powerful but more straightforward TurboJPEG API. libjpeg-turbo also features colorspace extensions that allow it to compress from/decompress to 32-bit and big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java interface.
diff --git a/release/deb-control.in b/release/deb-control.in
index c41c9a7..b82bdac 100644
--- a/release/deb-control.in
+++ b/release/deb-control.in
@@ -9,7 +9,7 @@
Installed-Size: {__SIZE}
Description: A SIMD-accelerated JPEG codec that provides both the libjpeg and TurboJPEG APIs
libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate
- baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and
+ baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and
MIPS systems, as well as progressive JPEG compression on x86 and x86-64
systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg,
all else being equal. On other types of systems, libjpeg-turbo can still
diff --git a/release/makemacpkg.in b/release/makemacpkg.in
index bbbfe6f..ae80bec 100755
--- a/release/makemacpkg.in
+++ b/release/makemacpkg.in
@@ -223,15 +223,15 @@
}
if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV7" != "" ]; then
- install_ios $BUILDDIRARMV7 ARMv7 armv7 arm
+ install_ios $BUILDDIRARMV7 Armv7 armv7 arm
fi
if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV7S" != "" ]; then
- install_ios $BUILDDIRARMV7S ARMv7s armv7s arm
+ install_ios $BUILDDIRARMV7S Armv7s armv7s arm
fi
if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV8" != "" ]; then
- install_ios $BUILDDIRARMV8 ARMv8 armv8 arm64
+ install_ios $BUILDDIRARMV8 Armv8 armv8 arm64
fi
install_name_tool -id $LIBDIR/$LIBJPEG_DSO_NAME $PKGROOT/$LIBDIR/$LIBJPEG_DSO_NAME
diff --git a/release/rpm.spec.in b/release/rpm.spec.in
index 83a1669..f8db764 100644
--- a/release/rpm.spec.in
+++ b/release/rpm.spec.in
@@ -52,7 +52,7 @@
%description
libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate
-baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and
+baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and
MIPS systems, as well as progressive JPEG compression on x86 and x86-64
systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg,
all else being equal. On other types of systems, libjpeg-turbo can still
diff --git a/simd/CMakeLists.txt b/simd/CMakeLists.txt
index 5c8009a..ba0bd13 100644
--- a/simd/CMakeLists.txt
+++ b/simd/CMakeLists.txt
@@ -205,7 +205,7 @@
###############################################################################
-# ARM (GAS)
+# Arm (GAS)
###############################################################################
elseif(CPU_TYPE STREQUAL "arm64" OR CPU_TYPE STREQUAL "arm")
diff --git a/simd/arm/jsimd.c b/simd/arm/jsimd.c
index 45f9b04..709656c 100644
--- a/simd/arm/jsimd.c
+++ b/simd/arm/jsimd.c
@@ -13,7 +13,7 @@
*
* This file contains the interface between the "normal" portions
* of the library and the SIMD implementations when running on a
- * 32-bit ARM architecture.
+ * 32-bit Arm architecture.
*/
#define JPEG_INTERNALS
@@ -118,7 +118,7 @@
#if defined(__ARM_NEON__)
simd_support |= JSIMD_NEON;
#elif defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
- /* We still have a chance to use NEON regardless of globally used
+ /* We still have a chance to use Neon regardless of globally used
* -mcpu/-mfpu options passed to gcc by performing runtime detection via
* /proc/cpuinfo parsing on linux/android */
while (!parse_proc_cpuinfo(bufsize)) {
diff --git a/simd/arm/jsimd_neon.S b/simd/arm/jsimd_neon.S
index af929fe..f8f0dad 100644
--- a/simd/arm/jsimd_neon.S
+++ b/simd/arm/jsimd_neon.S
@@ -1,5 +1,5 @@
/*
- * ARMv7 NEON optimizations for libjpeg-turbo
+ * Armv7 Neon optimizations for libjpeg-turbo
*
* Copyright (C) 2009-2011, Nokia Corporation and/or its subsidiary(-ies).
* All Rights Reserved.
@@ -229,7 +229,7 @@
ROW7L .req d30
ROW7R .req d31
- /* Load and dequantize coefficients into NEON registers
+ /* Load and dequantize coefficients into Neon registers
* with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
@@ -261,7 +261,7 @@
vld1.16 {d0, d1, d2, d3}, [ip, :128] /* load constants */
add ip, ip, #16
vmul.s16 q15, q15, q3
- vpush {d8-d15} /* save NEON registers */
+ vpush {d8-d15} /* save Neon registers */
/* 1-D IDCT, pass 1, left 4x8 half */
vadd.s16 d4, ROW7L, ROW3L
vadd.s16 d5, ROW5L, ROW1L
@@ -507,7 +507,7 @@
vqrshrn.s16 d17, q9, #2
vqrshrn.s16 d18, q10, #2
vqrshrn.s16 d19, q11, #2
- vpop {d8-d15} /* restore NEON registers */
+ vpop {d8-d15} /* restore Neon registers */
vqrshrn.s16 d20, q12, #2
/* Transpose the final 8-bit samples and do signed->unsigned conversion */
vtrn.16 q8, q9
@@ -688,7 +688,7 @@
* function from jidctfst.c
*
* Normally 1-D AAN DCT needs 5 multiplications and 29 additions.
- * But in ARM NEON case some extra additions are required because VQDMULH
+ * But in Arm Neon case some extra additions are required because VQDMULH
* instruction can't handle the constants larger than 1. So the expressions
* like "x * 1.082392200" have to be converted to "x * 0.082392200 + x",
* which introduces an extra addition. Overall, there are 6 extra additions
@@ -718,7 +718,7 @@
TMP3 .req r2
TMP4 .req ip
- /* Load and dequantize coefficients into NEON registers
+ /* Load and dequantize coefficients into Neon registers
* with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
@@ -749,7 +749,7 @@
vmul.s16 q13, q13, q1
vld1.16 {d0}, [ip, :64] /* load constants */
vmul.s16 q15, q15, q3
- vpush {d8-d13} /* save NEON registers */
+ vpush {d8-d13} /* save Neon registers */
/* 1-D IDCT, pass 1 */
vsub.s16 q2, q10, q14
vadd.s16 q14, q10, q14
@@ -842,7 +842,7 @@
vadd.s16 q14, q5, q3
vsub.s16 q9, q5, q3
vsub.s16 q13, q10, q2
- vpop {d8-d13} /* restore NEON registers */
+ vpop {d8-d13} /* restore Neon registers */
vadd.s16 q10, q10, q2
vsub.s16 q11, q12, q1
vadd.s16 q12, q12, q1
@@ -913,7 +913,7 @@
*
* NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
* requires much less arithmetic operations and hence should be faster.
- * The primary purpose of this particular NEON optimized function is
+ * The primary purpose of this particular Neon optimized function is
* bit exact compatibility with jpeg-6b.
*
* TODO: a bit better instructions scheduling can be achieved by expanding
@@ -1016,7 +1016,7 @@
adr TMP4, jsimd_idct_4x4_neon_consts
vld1.16 {d0, d1, d2, d3}, [TMP4, :128]
- /* Load all COEF_BLOCK into NEON registers with the following allocation:
+ /* Load all COEF_BLOCK into Neon registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
* 0 | d4 | d5
@@ -1126,7 +1126,7 @@
*
* NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
* requires much less arithmetic operations and hence should be faster.
- * The primary purpose of this particular NEON optimized function is
+ * The primary purpose of this particular Neon optimized function is
* bit exact compatibility with jpeg-6b.
*/
@@ -1173,7 +1173,7 @@
adr TMP2, jsimd_idct_2x2_neon_consts
vld1.16 {d0}, [TMP2, :64]
- /* Load all COEF_BLOCK into NEON registers with the following allocation:
+ /* Load all COEF_BLOCK into Neon registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
* 0 | d4 | d5
@@ -1499,7 +1499,7 @@
adr ip, jsimd_ycc_\colorid\()_neon_consts
vld1.16 {d0, d1, d2, d3}, [ip, :128]
- /* Save ARM registers and handle input arguments */
+ /* Save Arm registers and handle input arguments */
push {r4, r5, r6, r7, r8, r9, r10, lr}
ldr NUM_ROWS, [sp, #(4 * 8)]
ldr INPUT_BUF0, [INPUT_BUF]
@@ -1507,7 +1507,7 @@
ldr INPUT_BUF2, [INPUT_BUF, #8]
.unreq INPUT_BUF
- /* Save NEON registers */
+ /* Save Neon registers */
vpush {d8-d15}
/* Initially set d10, d11, d12, d13 to 0xFF */
@@ -1814,7 +1814,7 @@
adr ip, jsimd_\colorid\()_ycc_neon_consts
vld1.16 {d0, d1, d2, d3}, [ip, :128]
- /* Save ARM registers and handle input arguments */
+ /* Save Arm registers and handle input arguments */
push {r4, r5, r6, r7, r8, r9, r10, lr}
ldr NUM_ROWS, [sp, #(4 * 8)]
ldr OUTPUT_BUF0, [OUTPUT_BUF]
@@ -1822,7 +1822,7 @@
ldr OUTPUT_BUF2, [OUTPUT_BUF, #8]
.unreq OUTPUT_BUF
- /* Save NEON registers */
+ /* Save Neon registers */
vpush {d8-d15}
/* Outer loop over scanlines */
@@ -2017,7 +2017,7 @@
adr TMP, jsimd_fdct_ifast_neon_consts
vld1.16 {d0}, [TMP, :64]
- /* Load all DATA into NEON registers with the following allocation:
+ /* Load all DATA into Neon registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
* 0 | d16 | d17 | q8
@@ -2112,8 +2112,8 @@
*
* Note: the code uses 2 stage pipelining in order to improve instructions
* scheduling and eliminate stalls (this provides ~15% better
- * performance for this function on both ARM Cortex-A8 and
- * ARM Cortex-A9 when compared to the non-pipelined variant).
+ * performance for this function on both Arm Cortex-A8 and
+ * Arm Cortex-A9 when compared to the non-pipelined variant).
* The instructions which belong to the second stage use different
* indentation for better readiability.
*/
diff --git a/simd/arm64/jsimd.c b/simd/arm64/jsimd.c
index 0e6c7b9..808c0e3 100644
--- a/simd/arm64/jsimd.c
+++ b/simd/arm64/jsimd.c
@@ -12,7 +12,7 @@
*
* This file contains the interface between the "normal" portions
* of the library and the SIMD implementations when running on a
- * 64-bit ARM architecture.
+ * 64-bit Arm architecture.
*/
#define JPEG_INTERNALS
@@ -114,8 +114,8 @@
*/
/*
- * ARMv8 architectures support NEON extensions by default.
- * It is no longer optional as it was with ARMv7.
+ * Armv8 architectures support Neon extensions by default.
+ * It is no longer optional as it was with Armv7.
*/
diff --git a/simd/arm64/jsimd_neon.S b/simd/arm64/jsimd_neon.S
index 70cef2c..3ed5f58 100644
--- a/simd/arm64/jsimd_neon.S
+++ b/simd/arm64/jsimd_neon.S
@@ -1,5 +1,5 @@
/*
- * ARMv8 NEON optimizations for libjpeg-turbo
+ * Armv8 Neon optimizations for libjpeg-turbo
*
* Copyright (C) 2009-2011, Nokia Corporation and/or its subsidiary(-ies).
* All Rights Reserved.
@@ -611,7 +611,7 @@
shrn2 v5.8h, v15.4s, #16 /* wsptr[DCTSIZE*3] = (int)DESCALE(tmp13 + tmp0, CONST_BITS+PASS1_BITS+3) */
shrn2 v6.8h, v17.4s, #16 /* wsptr[DCTSIZE*4] = (int)DESCALE(tmp13 - tmp0, CONST_BITS+PASS1_BITS+3) */
movi v0.16b, #(CENTERJSAMPLE)
- /* Prepare pointers (dual-issue with NEON instructions) */
+ /* Prepare pointers (dual-issue with Neon instructions) */
ldp TMP1, TMP2, [OUTPUT_BUF], 16
sqrshrn v28.8b, v2.8h, #(CONST_BITS+PASS1_BITS+3-16)
ldp TMP3, TMP4, [OUTPUT_BUF], 16
@@ -992,7 +992,7 @@
* function from jidctfst.c
*
* Normally 1-D AAN DCT needs 5 multiplications and 29 additions.
- * But in ARM NEON case some extra additions are required because VQDMULH
+ * But in Arm Neon case some extra additions are required because VQDMULH
* instruction can't handle the constants larger than 1. So the expressions
* like "x * 1.082392200" have to be converted to "x * 0.082392200 + x",
* which introduces an extra addition. Overall, there are 6 extra additions
@@ -1024,7 +1024,7 @@
instruction ensures that those bits are set to zero. */
uxtw x3, w3
- /* Load and dequantize coefficients into NEON registers
+ /* Load and dequantize coefficients into Neon registers
* with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
@@ -1037,7 +1037,7 @@
* 6 | d28 | d29 ( v22.8h )
* 7 | d30 | d31 ( v23.8h )
*/
- /* Save NEON registers used in fast IDCT */
+ /* Save Neon registers used in fast IDCT */
get_symbol_loc TMP5, Ljsimd_idct_ifast_neon_consts
ld1 {v16.8h, v17.8h}, [COEF_BLOCK], 32
ld1 {v0.8h, v1.8h}, [DCT_TABLE], 32
@@ -1142,7 +1142,7 @@
add v20.8h, v20.8h, v1.8h
/* Descale to 8-bit and range limit */
movi v0.16b, #0x80
- /* Prepare pointers (dual-issue with NEON instructions) */
+ /* Prepare pointers (dual-issue with Neon instructions) */
ldp TMP1, TMP2, [OUTPUT_BUF], 16
sqshrn v28.8b, v16.8h, #5
ldp TMP3, TMP4, [OUTPUT_BUF], 16
@@ -1221,7 +1221,7 @@
*
* NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
* requires much less arithmetic operations and hence should be faster.
- * The primary purpose of this particular NEON optimized function is
+ * The primary purpose of this particular Neon optimized function is
* bit exact compatibility with jpeg-6b.
*
* TODO: a bit better instructions scheduling can be achieved by expanding
@@ -1291,7 +1291,7 @@
instruction ensures that those bits are set to zero. */
uxtw x3, w3
- /* Save all used NEON registers */
+ /* Save all used Neon registers */
sub sp, sp, 64
mov x9, sp
/* Load constants (v3.4h is just used for padding) */
@@ -1300,7 +1300,7 @@
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [TMP4]
- /* Load all COEF_BLOCK into NEON registers with the following allocation:
+ /* Load all COEF_BLOCK into Neon registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
* 0 | v4.4h | v5.4h
@@ -1434,7 +1434,7 @@
*
* NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
* requires much less arithmetic operations and hence should be faster.
- * The primary purpose of this particular NEON optimized function is
+ * The primary purpose of this particular Neon optimized function is
* bit exact compatibility with jpeg-6b.
*/
@@ -1483,7 +1483,7 @@
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
ld1 {v14.4h}, [TMP2]
- /* Load all COEF_BLOCK into NEON registers with the following allocation:
+ /* Load all COEF_BLOCK into Neon registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
* 0 | v4.4h | v5.4h
@@ -1857,7 +1857,7 @@
/* Load constants to d1, d2, d3 (v0.4h is just used for padding) */
get_symbol_loc x15, Ljsimd_ycc_rgb_neon_consts
- /* Save NEON registers */
+ /* Save Neon registers */
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
ld1 {v0.4h, v1.4h}, [x15], 16
@@ -2142,7 +2142,7 @@
.endm
/* TODO: expand macros and interleave instructions if some in-order
- * ARM64 processor actually can dual-issue LOAD/STORE with ALU */
+ * AArch64 processor actually can dual-issue LOAD/STORE with ALU */
.macro do_rgb_to_yuv_stage2_store_load_stage1 fast_ld3
do_rgb_to_yuv_stage2
do_load \bpp, 8, \fast_ld3
@@ -2182,7 +2182,7 @@
ldr OUTPUT_BUF2, [OUTPUT_BUF, #16]
.unreq OUTPUT_BUF
- /* Save NEON registers */
+ /* Save Neon registers */
sub sp, sp, #64
mov x9, sp
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
@@ -2396,13 +2396,13 @@
get_symbol_loc TMP, Ljsimd_fdct_islow_neon_consts
ld1 {v0.8h, v1.8h}, [TMP]
- /* Save NEON registers */
+ /* Save Neon registers */
sub sp, sp, #64
mov x10, sp
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x10], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x10], 32
- /* Load all DATA into NEON registers with the following allocation:
+ /* Load all DATA into Neon registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
* 0 | d16 | d17 | v16.8h
@@ -2629,7 +2629,7 @@
st1 {v16.8h, v17.8h, v18.8h, v19.8h}, [DATA], 64
st1 {v20.8h, v21.8h, v22.8h, v23.8h}, [DATA]
- /* Restore NEON registers */
+ /* Restore Neon registers */
ld1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
ld1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
@@ -2681,7 +2681,7 @@
get_symbol_loc TMP, Ljsimd_fdct_ifast_neon_consts
ld1 {v0.4h}, [TMP]
- /* Load all DATA into NEON registers with the following allocation:
+ /* Load all DATA into Neon registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
* ---------+--------
* 0 | d16 | d17 | v0.8h
@@ -3066,7 +3066,7 @@
.endif
sub sp, sp, 272
sub BUFFER, BUFFER, #0x1 /* BUFFER=buffer-- */
- /* Save ARM registers */
+ /* Save Arm registers */
stp x19, x20, [sp]
get_symbol_loc x15, Ljsimd_huff_encode_one_block_neon_consts
ldr PUT_BUFFER, [x0, #0x10]