Merge branch 'master' into dev
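
The documentation changes in this merge reduce to a simple selection rule for compression: the accurate integer DCT is the default, and the legacy fast method should not be used with quality levels above 97. A minimal sketch of that rule follows. The enum and function names are hypothetical stand-ins invented for illustration so that the snippet compiles without jpeglib.h, and the 0-100 quality range is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for libjpeg's J_DCT_METHOD so that this sketch
 * compiles without jpeglib.h.  The enumerator order mirrors the enum shown in
 * the jpeglib.h hunk, but these names are not part of any library. */
typedef enum {
  SKETCH_DCT_ISLOW,   /* accurate integer method */
  SKETCH_DCT_IFAST,   /* less accurate integer method [legacy feature] */
  SKETCH_DCT_FLOAT    /* floating-point method [legacy feature] */
} sketch_dct_method;

/* Pick a forward DCT method for compression at the given JPEG quality.
 * prefer_fast is the caller's request for the legacy fast method.  It is
 * honored only at quality levels of 97 or below, because the documentation
 * says not to use the fast method above that level. */
sketch_dct_method sketch_choose_fdct(int quality, int prefer_fast)
{
  assert(quality >= 0 && quality <= 100);   /* assumed valid range */
  if (prefer_fast && quality <= 97)
    return SKETCH_DCT_IFAST;
  return SKETCH_DCT_ISLOW;
}
```

With the real library, the chosen value would correspond to assigning JDCT_ISLOW or JDCT_IFAST to the dct_method field of the compression object before starting compression.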
diff --git a/ChangeLog.md b/ChangeLog.md
index 5560622..c4f9490 100644
--- a/ChangeLog.md
+++ b/ChangeLog.md
@@ -388,7 +388,7 @@
 
 1. Added AVX2 SIMD implementations of the colorspace conversion, chroma
 downsampling and upsampling, integer quantization and sample conversion, and
-slow integer DCT/IDCT algorithms.  When using the slow integer DCT/IDCT
+accurate integer DCT/IDCT algorithms.  When using the accurate integer DCT/IDCT
 algorithms on AVX2-equipped CPUs, the compression of RGB images is
 approximately 13-36% (avg. 22%) faster (relative to libjpeg-turbo 1.5.x) with
 64-bit code and 11-21% (avg. 17%) faster with 32-bit code, and the
@@ -498,10 +498,10 @@
 
 13. Added Loongson MMI SIMD implementations of the RGB-to-YCbCr and
 YCbCr-to-RGB colorspace conversion, 4:2:0 chroma downsampling, 4:2:0 fancy
-chroma upsampling, integer quantization, and slow integer DCT/IDCT algorithms.
-When using the slow integer DCT/IDCT, this speeds up the compression of RGB
-images by approximately 70-100% and the decompression of RGB images by
-approximately 2-3.5x.
+chroma upsampling, integer quantization, and accurate integer DCT/IDCT
+algorithms.  When using the accurate integer DCT/IDCT, this speeds up the
+compression of RGB images by approximately 70-100% and the decompression of RGB
+images by approximately 2-3.5x.
 
 14. Fixed a build error when building with older MinGW releases (regression
 caused by 1.5.1[7].)
@@ -833,8 +833,8 @@
 disabled by setting the `JSIMD_NOHUFFENC` environment variable to `1`.
 
 13. Added ARM 64-bit (ARMv8) NEON SIMD implementations of the commonly-used
-compression algorithms (including the slow integer forward DCT and h2v2 & h2v1
-downsampling algorithms, which are not accelerated in the 32-bit NEON
+compression algorithms (including the accurate integer forward DCT and h2v2 &
+h2v1 downsampling algorithms, which are not accelerated in the 32-bit NEON
 implementation.)  This speeds up the compression of full-color JPEGs by about
 75% on average on a Cavium ThunderX processor and by about 2-2.5x on average on
 Cortex-A53 and Cortex-A57 cores.
@@ -965,8 +965,8 @@
 
 7. Fixed an extremely rare bug in the Huffman encoder that caused 64-bit
 builds of libjpeg-turbo to incorrectly encode a few specific test images when
-quality=98, an optimized Huffman table, and the slow integer forward DCT were
-used.
+quality=98, an optimized Huffman table, and the accurate integer forward DCT
+were used.
 
 8. The Windows (CMake) build system now supports building only static or only
 shared libraries.  This is accomplished by adding either `-DENABLE_STATIC=0` or
@@ -1125,8 +1125,8 @@
 The accuracy of this implementation now matches the accuracy of the SSE/SSE2
 implementation.  Note, however, that the floating point DCT/IDCT algorithms are
 mainly a legacy feature.  They generally do not produce significantly better
-accuracy than the slow integer DCT/IDCT algorithms, and they are quite a bit
-slower.
+accuracy than the accurate integer DCT/IDCT algorithms, and they are quite a
+bit slower.
 
 8. Added a new output colorspace (`JCS_RGB565`) to the libjpeg API that allows
 for decompressing JPEG images into RGB565 (16-bit) pixels.  If dithering is not
@@ -1536,8 +1536,8 @@
 
 2. Despite the above, the fast integer forward DCT still degrades somewhat for
 JPEG qualities greater than 95, so the TurboJPEG wrapper will now automatically
-use the slow integer forward DCT when generating JPEG images of quality 96 or
-greater.  This reduces compression performance by as much as 15% for these
+use the accurate integer forward DCT when generating JPEG images of quality 96
+or greater.  This reduces compression performance by as much as 15% for these
 high-quality images but is necessary to ensure that the images are perceptually
 lossless.  It also ensures that the library can avoid the performance pitfall
 created by [1].
diff --git a/README.md b/README.md
index da57040..924ebd8 100644
--- a/README.md
+++ b/README.md
@@ -287,12 +287,13 @@
   (and slightly faster) floating point IDCT algorithm introduced in libjpeg
   v8a as opposed to the algorithm used in libjpeg v6b.  It should be noted,
   however, that this algorithm basically brings the accuracy of the floating
-  point IDCT in line with the accuracy of the slow integer IDCT.  The floating
-  point DCT/IDCT algorithms are mainly a legacy feature, and they do not
-  produce significantly more accuracy than the slow integer algorithms (to put
-  numbers on this, the typical difference in PNSR between the two algorithms
-  is less than 0.10 dB, whereas changing the quality level by 1 in the upper
-  range of the quality scale is typically more like a 1.0 dB difference.)
+  point IDCT in line with the accuracy of the accurate integer IDCT.  The
+  floating point DCT/IDCT algorithms are mainly a legacy feature, and they do
+  not produce significantly more accuracy than the accurate integer algorithms
+  (to put numbers on this, the typical difference in PSNR between the two
+  algorithms is less than 0.10 dB, whereas changing the quality level by 1 in
+  the upper range of the quality scale is typically more like a 1.0 dB
+  difference.)
 
 - If the floating point algorithms in libjpeg-turbo are not implemented using
   SIMD instructions on a particular platform, then the accuracy of the
@@ -340,7 +341,7 @@
 correct results whenever the fast integer forward DCT is used along with a JPEG
 quality of 98-100.  Thus, libjpeg-turbo must use the non-SIMD quantization
 function in those cases.  This causes performance to drop by as much as 40%.
-It is therefore strongly advised that you use the slow integer forward DCT
+It is therefore strongly advised that you use the accurate integer forward DCT
 whenever encoding images with a JPEG quality of 98 or higher.
 
 
diff --git a/cjpeg.1 b/cjpeg.1
index 5782e8b..569dc3f 100644
--- a/cjpeg.1
+++ b/cjpeg.1
@@ -1,4 +1,4 @@
-.TH CJPEG 1 "18 December 2019"
+.TH CJPEG 1 "4 November 2020"
 .SH NAME
 cjpeg \- compress an image file to a JPEG file
 .SH SYNOPSIS
@@ -160,31 +160,40 @@
 unable to view an arithmetic coded JPEG file at all.
 .TP
 .B \-dct int
-Use integer DCT method (default).
+Use accurate integer DCT method (default).
 .TP
 .B \-dct fast
-Use fast integer DCT (less accurate).
-In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
-method when using the x86/x86-64 SIMD extensions (results may vary with other
-SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)
+Use less accurate integer DCT method [legacy feature].
+When the Independent JPEG Group's software was first released in 1991, the
+compression time for a 1-megapixel JPEG image on a mainstream PC was measured
+in minutes.  Thus, the \fBfast\fR integer DCT algorithm provided noticeable
+performance benefits.  On modern CPUs running libjpeg-turbo, however, the
+compression time for a 1-megapixel JPEG image is measured in milliseconds, and
+thus the performance benefits of the \fBfast\fR algorithm are much less
+noticeable.  On modern x86/x86-64 CPUs that support AVX2 instructions, the
+\fBfast\fR and \fBint\fR methods have similar performance.  On other types of
+CPUs, the \fBfast\fR method is generally about 5-15% faster than the \fBint\fR
+method.
+
 For quality levels of 90 and below, there should be little or no perceptible
-difference between the two algorithms.  For quality levels above 90, however,
-the difference between the fast and the int methods becomes more pronounced.
-With quality=97, for instance, the fast method incurs generally about a 1-3 dB
-loss (in PSNR) relative to the int method, but this can be larger for some
-images.  Do not use the fast method with quality levels above 97.  The
-algorithm often degenerates at quality=98 and above and can actually produce a
-more lossy image than if lower quality levels had been used.  Also, in
-libjpeg-turbo, the fast method is not fully accelerated for quality levels
-above 97, so it will be slower than the int method.
+quality difference between the two algorithms.  For quality levels above 90,
+however, the difference between the \fBfast\fR and \fBint\fR methods becomes
+more pronounced.  With quality=97, for instance, the \fBfast\fR method incurs
+generally about a 1-3 dB loss in PSNR relative to the \fBint\fR method, but
+this can be larger for some images.  Do not use the \fBfast\fR method with
+quality levels above 97.  The algorithm often degenerates at quality=98 and
+above and can actually produce a more lossy image than if lower quality levels
+had been used.  Also, in libjpeg-turbo, the \fBfast\fR method is not fully
+accelerated for quality levels above 97, so it will be slower than the
+\fBint\fR method.
 .TP
 .B \-dct float
-Use floating-point DCT method.
-The float method is mainly a legacy feature.  It does not produce significantly
-more accurate results than the int method, and it is much slower.  The float
-method may also give different results on different machines due to varying
-roundoff behavior, whereas the integer methods should give the same results on
-all machines.
+Use floating-point DCT method [legacy feature].
+The \fBfloat\fR method does not produce significantly more accurate results
+than the \fBint\fR method, and it is much slower.  The \fBfloat\fR method may
+also give different results on different machines due to varying roundoff
+behavior, whereas the integer methods should give the same results on all
+machines.
 .TP
 .BI \-icc " file"
 Embed ICC color management profile contained in the specified file.
diff --git a/cjpeg.c b/cjpeg.c
index a74ecf9..e5a9211 100644
--- a/cjpeg.c
+++ b/cjpeg.c
@@ -5,7 +5,7 @@
  * Copyright (C) 1991-1998, Thomas G. Lane.
  * Modified 2003-2011 by Guido Vollbeding.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2010, 2013-2014, 2017, 2019, D. R. Commander.
+ * Copyright (C) 2010, 2013-2014, 2017, 2019-2020, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -176,15 +176,15 @@
   fprintf(stderr, "  -arithmetic    Use arithmetic coding\n");
 #endif
 #ifdef DCT_ISLOW_SUPPORTED
-  fprintf(stderr, "  -dct int       Use integer DCT method%s\n",
+  fprintf(stderr, "  -dct int       Use accurate integer DCT method%s\n",
           (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
 #endif
 #ifdef DCT_IFAST_SUPPORTED
-  fprintf(stderr, "  -dct fast      Use fast integer DCT (less accurate)%s\n",
+  fprintf(stderr, "  -dct fast      Use less accurate integer DCT method [legacy feature]%s\n",
           (JDCT_DEFAULT == JDCT_IFAST ? " (default)" : ""));
 #endif
 #ifdef DCT_FLOAT_SUPPORTED
-  fprintf(stderr, "  -dct float     Use floating-point DCT method%s\n",
+  fprintf(stderr, "  -dct float     Use floating-point DCT method [legacy feature]%s\n",
           (JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : ""));
 #endif
   fprintf(stderr, "  -icc FILE      Embed ICC profile contained in FILE\n");
diff --git a/djpeg.1 b/djpeg.1
index f6a6f43..31431b9 100644
--- a/djpeg.1
+++ b/djpeg.1
@@ -1,4 +1,4 @@
-.TH DJPEG 1 "18 December 2019"
+.TH DJPEG 1 "4 November 2020"
 .SH NAME
 djpeg \- decompress a JPEG file to an image file
 .SH SYNOPSIS
@@ -121,32 +121,40 @@
 Switches for advanced users:
 .TP
 .B \-dct int
-Use integer DCT method (default).
+Use accurate integer DCT method (default).
 .TP
 .B \-dct fast
-Use fast integer DCT (less accurate).
-In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
-method when using the x86/x86-64 SIMD extensions (results may vary with other
-SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)  If
-the JPEG image was compressed using a quality level of 85 or below, then there
-should be little or no perceptible difference between the two algorithms.  When
-decompressing images that were compressed using quality levels above 85,
-however, the difference between the fast and int methods becomes more
-pronounced.  With images compressed using quality=97, for instance, the fast
-method incurs generally about a 4-6 dB loss (in PSNR) relative to the int
-method, but this can be larger for some images.  If you can avoid it, do not
-use the fast method when decompressing images that were compressed using
-quality levels above 97.  The algorithm often degenerates for such images and
-can actually produce a more lossy output image than if the JPEG image had been
-compressed using lower quality levels.
+Use less accurate integer DCT method [legacy feature].
+When the Independent JPEG Group's software was first released in 1991, the
+decompression time for a 1-megapixel JPEG image on a mainstream PC was measured
+in minutes.  Thus, the \fBfast\fR integer DCT algorithm provided noticeable
+performance benefits.  On modern CPUs running libjpeg-turbo, however, the
+decompression time for a 1-megapixel JPEG image is measured in milliseconds,
+and thus the performance benefits of the \fBfast\fR algorithm are much less
+noticeable.  On modern x86/x86-64 CPUs that support AVX2 instructions, the
+\fBfast\fR and \fBint\fR methods have similar performance.  On other types of
+CPUs, the \fBfast\fR method is generally about 5-15% faster than the \fBint\fR
+method.
+
+If the JPEG image was compressed using a quality level of 85 or below, then
+there should be little or no perceptible quality difference between the two
+algorithms.  When decompressing images that were compressed using quality
+levels above 85, however, the difference between the \fBfast\fR and \fBint\fR
+methods becomes more pronounced.  With images compressed using quality=97, for
+instance, the \fBfast\fR method incurs generally about a 4-6 dB loss in PSNR
+relative to the \fBint\fR method, but this can be larger for some images.  If
+you can avoid it, do not use the \fBfast\fR method when decompressing images
+that were compressed using quality levels above 97.  The algorithm often
+degenerates for such images and can actually produce a more lossy output image
+than if the JPEG image had been compressed using lower quality levels.
 .TP
 .B \-dct float
-Use floating-point DCT method.
-The float method is mainly a legacy feature.  It does not produce significantly
-more accurate results than the int method, and it is much slower.  The float
-method may also give different results on different machines due to varying
-roundoff behavior, whereas the integer methods should give the same results on
-all machines.
+Use floating-point DCT method [legacy feature].
+The \fBfloat\fR method does not produce significantly more accurate results
+than the \fBint\fR method, and it is much slower.  The \fBfloat\fR method may
+also give different results on different machines due to varying roundoff
+behavior, whereas the integer methods should give the same results on all
+machines.
 .TP
 .B \-dither fs
 Use Floyd-Steinberg dithering in color quantization.
@@ -282,12 +290,6 @@
 .B \-dither none
 may give acceptable results in two-pass mode, but is seldom tolerable in
 one-pass mode.
-.PP
-If you are fortunate enough to have very fast floating point hardware,
-\fB\-dct float\fR may be even faster than \fB\-dct fast\fR.  But on most
-machines \fB\-dct float\fR is slower than \fB\-dct int\fR; in this case it is
-not worth using, because its theoretical accuracy advantage is too small to be
-significant in practice.
 .SH ENVIRONMENT
 .TP
 .B JPEGMEM
diff --git a/djpeg.c b/djpeg.c
index 7aa346b..d47984e 100644
--- a/djpeg.c
+++ b/djpeg.c
@@ -149,15 +149,15 @@
 #endif
   fprintf(stderr, "Switches for advanced users:\n");
 #ifdef DCT_ISLOW_SUPPORTED
-  fprintf(stderr, "  -dct int       Use integer DCT method%s\n",
+  fprintf(stderr, "  -dct int       Use accurate integer DCT method%s\n",
           (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
 #endif
 #ifdef DCT_IFAST_SUPPORTED
-  fprintf(stderr, "  -dct fast      Use fast integer DCT (less accurate)%s\n",
+  fprintf(stderr, "  -dct fast      Use less accurate integer DCT method [legacy feature]%s\n",
           (JDCT_DEFAULT == JDCT_IFAST ? " (default)" : ""));
 #endif
 #ifdef DCT_FLOAT_SUPPORTED
-  fprintf(stderr, "  -dct float     Use floating-point DCT method%s\n",
+  fprintf(stderr, "  -dct float     Use floating-point DCT method [legacy feature]%s\n",
           (JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : ""));
 #endif
   fprintf(stderr, "  -dither fs     Use F-S dithering (default)\n");
diff --git a/jfdctint.c b/jfdctint.c
index b47c306..c95a3a7 100644
--- a/jfdctint.c
+++ b/jfdctint.c
@@ -4,11 +4,11 @@
  * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1991-1996, Thomas G. Lane.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2015, D. R. Commander.
+ * Copyright (C) 2015, 2020, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
- * This file contains a slow-but-accurate integer implementation of the
+ * This file contains a slower but more accurate integer implementation of the
  * forward DCT (Discrete Cosine Transform).
  *
  * A 2-D DCT can be done by 1-D DCT on each row followed by 1-D DCT
diff --git a/jidctint.c b/jidctint.c
index b9974d4..bb08748 100644
--- a/jidctint.c
+++ b/jidctint.c
@@ -5,11 +5,11 @@
  * Copyright (C) 1991-1998, Thomas G. Lane.
  * Modification developed 2002-2018 by Guido Vollbeding.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2015, D. R. Commander.
+ * Copyright (C) 2015, 2020, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
- * This file contains a slow-but-accurate integer implementation of the
+ * This file contains a slower but more accurate integer implementation of the
  * inverse DCT (Discrete Cosine Transform).  In the IJG code, this routine
  * must also perform dequantization of the input coefficients.
  *
diff --git a/jmorecfg.h b/jmorecfg.h
index 78f4bf1..fb3a9cf 100644
--- a/jmorecfg.h
+++ b/jmorecfg.h
@@ -5,7 +5,7 @@
  * Copyright (C) 1991-1997, Thomas G. Lane.
  * Modified 1997-2009 by Guido Vollbeding.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2009, 2011, 2014-2015, 2018, D. R. Commander.
+ * Copyright (C) 2009, 2011, 2014-2015, 2018, 2020, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -238,9 +238,9 @@
 
 /* Capability options common to encoder and decoder: */
 
-#define DCT_ISLOW_SUPPORTED     /* slow but accurate integer algorithm */
-#define DCT_IFAST_SUPPORTED     /* faster, less accurate integer method */
-#define DCT_FLOAT_SUPPORTED     /* floating-point: accurate, fast on fast HW */
+#define DCT_ISLOW_SUPPORTED     /* accurate integer method */
+#define DCT_IFAST_SUPPORTED     /* less accurate int method [legacy feature] */
+#define DCT_FLOAT_SUPPORTED     /* floating-point method [legacy feature] */
 
 /* Encoder capability options: */
 
diff --git a/jpeglib.h b/jpeglib.h
index 33f8ad2..d7664f0 100644
--- a/jpeglib.h
+++ b/jpeglib.h
@@ -5,7 +5,7 @@
  * Copyright (C) 1991-1998, Thomas G. Lane.
  * Modified 2002-2009 by Guido Vollbeding.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2009-2011, 2013-2014, 2016-2017, D. R. Commander.
+ * Copyright (C) 2009-2011, 2013-2014, 2016-2017, 2020, D. R. Commander.
  * Copyright (C) 2015, Google, Inc.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
@@ -244,9 +244,9 @@
 /* DCT/IDCT algorithm options. */
 
 typedef enum {
-  JDCT_ISLOW,             /* slow but accurate integer algorithm */
-  JDCT_IFAST,             /* faster, less accurate integer method */
-  JDCT_FLOAT              /* floating-point: accurate, fast on fast HW */
+  JDCT_ISLOW,             /* accurate integer method */
+  JDCT_IFAST,             /* less accurate integer method [legacy feature] */
+  JDCT_FLOAT              /* floating-point method [legacy feature] */
 } J_DCT_METHOD;
 
 #ifndef JDCT_DEFAULT            /* may be overridden in jconfig.h */
diff --git a/libjpeg.txt b/libjpeg.txt
index c233ecb..3c680b5 100644
--- a/libjpeg.txt
+++ b/libjpeg.txt
@@ -969,30 +969,38 @@
 
 J_DCT_METHOD dct_method
         Selects the algorithm used for the DCT step.  Choices are:
-                JDCT_ISLOW: slow but accurate integer algorithm
-                JDCT_IFAST: faster, less accurate integer method
-                JDCT_FLOAT: floating-point method
+                JDCT_ISLOW: accurate integer method
+                JDCT_IFAST: less accurate integer method [legacy feature]
+                JDCT_FLOAT: floating-point method [legacy feature]
                 JDCT_DEFAULT: default method (normally JDCT_ISLOW)
                 JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
-        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
-        with other SIMD implementations, or when using libjpeg-turbo without
-        SIMD extensions.)  For quality levels of 90 and below, there should be
-        little or no perceptible difference between the two algorithms.  For
-        quality levels above 90, however, the difference between JDCT_IFAST and
+        When the Independent JPEG Group's software was first released in 1991,
+        the compression time for a 1-megapixel JPEG image on a mainstream PC
+        was measured in minutes.  Thus, JDCT_IFAST provided noticeable
+        performance benefits.  On modern CPUs running libjpeg-turbo, however,
+        the compression time for a 1-megapixel JPEG image is measured in
+        milliseconds, and thus the performance benefits of JDCT_IFAST are much
+        less noticeable.  On modern x86/x86-64 CPUs that support AVX2
+        instructions, JDCT_IFAST and JDCT_ISLOW have similar performance.  On
+        other types of CPUs, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW.
+
+        For quality levels of 90 and below, there should be little or no
+        perceptible quality difference between the two algorithms.  For quality
+        levels above 90, however, the difference between JDCT_IFAST and
         JDCT_ISLOW becomes more pronounced.  With quality=97, for instance,
-        JDCT_IFAST incurs generally about a 1-3 dB loss (in PSNR) relative to
+        JDCT_IFAST incurs generally about a 1-3 dB loss in PSNR relative to
         JDCT_ISLOW, but this can be larger for some images.  Do not use
         JDCT_IFAST with quality levels above 97.  The algorithm often
         degenerates at quality=98 and above and can actually produce a more
         lossy image than if lower quality levels had been used.  Also, in
         libjpeg-turbo, JDCT_IFAST is not fully accelerated for quality levels
-        above 97, so it will be slower than JDCT_ISLOW.  JDCT_FLOAT is mainly a
-        legacy feature.  It does not produce significantly more accurate
-        results than the ISLOW method, and it is much slower.  The FLOAT method
-        may also give different results on different machines due to varying
-        roundoff behavior, whereas the integer methods should give the same
-        results on all machines.
+        above 97, so it will be slower than JDCT_ISLOW.
+
+        JDCT_FLOAT does not produce significantly more accurate results than
+        JDCT_ISLOW, and it is much slower.  JDCT_FLOAT may also give different
+        results on different machines due to varying roundoff behavior, whereas
+        the integer methods should give the same results on all machines.
 
 J_COLOR_SPACE jpeg_color_space
 int num_components
@@ -1270,31 +1278,39 @@
 
 J_DCT_METHOD dct_method
         Selects the algorithm used for the DCT step.  Choices are:
-                JDCT_ISLOW: slow but accurate integer algorithm
-                JDCT_IFAST: faster, less accurate integer method
-                JDCT_FLOAT: floating-point method
+                JDCT_ISLOW: accurate integer method
+                JDCT_IFAST: less accurate integer method [legacy feature]
+                JDCT_FLOAT: floating-point method [legacy feature]
                 JDCT_DEFAULT: default method (normally JDCT_ISLOW)
                 JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
-        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
-        with other SIMD implementations, or when using libjpeg-turbo without
-        SIMD extensions.)  If the JPEG image was compressed using a quality
-        level of 85 or below, then there should be little or no perceptible
-        difference between the two algorithms.  When decompressing images that
-        were compressed using quality levels above 85, however, the difference
+        When the Independent JPEG Group's software was first released in 1991,
+        the decompression time for a 1-megapixel JPEG image on a mainstream PC
+        was measured in minutes.  Thus, JDCT_IFAST provided noticeable
+        performance benefits.  On modern CPUs running libjpeg-turbo, however,
+        the decompression time for a 1-megapixel JPEG image is measured in
+        milliseconds, and thus the performance benefits of JDCT_IFAST are much
+        less noticeable.  On modern x86/x86-64 CPUs that support AVX2
+        instructions, JDCT_IFAST and JDCT_ISLOW have similar performance.  On
+        other types of CPUs, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW.
+
+        If the JPEG image was compressed using a quality level of 85 or below,
+        then there should be little or no perceptible quality difference
+        between the two algorithms.  When decompressing images that were
+        compressed using quality levels above 85, however, the difference
         between JDCT_IFAST and JDCT_ISLOW becomes more pronounced.  With images
         compressed using quality=97, for instance, JDCT_IFAST incurs generally
-        about a 4-6 dB loss (in PSNR) relative to JDCT_ISLOW, but this can be
+        about a 4-6 dB loss in PSNR relative to JDCT_ISLOW, but this can be
         larger for some images.  If you can avoid it, do not use JDCT_IFAST
         when decompressing images that were compressed using quality levels
         above 97.  The algorithm often degenerates for such images and can
         actually produce a more lossy output image than if the JPEG image had
-        been compressed using lower quality levels.  JDCT_FLOAT is mainly a
-        legacy feature.  It does not produce significantly more accurate
-        results than the ISLOW method, and it is much slower.  The FLOAT method
-        may also give different results on different machines due to varying
-        roundoff behavior, whereas the integer methods should give the same
-        results on all machines.
+        been compressed using lower quality levels.
+
+        JDCT_FLOAT does not produce significantly more accurate results than
+        JDCT_ISLOW, and it is much slower.  JDCT_FLOAT may also give different
+        results on different machines due to varying roundoff behavior, whereas
+        the integer methods should give the same results on all machines.
 
 boolean do_fancy_upsampling
         If TRUE, do careful upsampling of chroma components.  If FALSE,
diff --git a/simd/arm64/jsimd_neon.S b/simd/arm64/jsimd_neon.S
index add422c..933482c 100644
--- a/simd/arm64/jsimd_neon.S
+++ b/simd/arm64/jsimd_neon.S
@@ -6,7 +6,7 @@
  * Author:  Siarhei Siamashka <siarhei.siamashka@nokia.com>
  * Copyright (C) 2013-2014, Linaro Limited.  All Rights Reserved.
  * Author:  Ragesh Radhakrishnan <ragesh.r@linaro.org>
- * Copyright (C) 2014-2016, D. R. Commander.  All Rights Reserved.
+ * Copyright (C) 2014-2016, 2020, D. R. Commander.  All Rights Reserved.
  * Copyright (C) 2015-2016, 2018, Matthieu Darbois.  All Rights Reserved.
  * Copyright (C) 2016, Siarhei Siamashka.  All Rights Reserved.
  *
@@ -2373,7 +2373,7 @@
 /*
  * jsimd_fdct_islow_neon
  *
- * This file contains a slow-but-accurate integer implementation of the
+ * This file contains a slower but more accurate integer implementation of the
  * forward DCT (Discrete Cosine Transform). The following code is based
  * directly on the IJG''s original jfdctint.c; see the jfdctint.c for
  * more details.
diff --git a/simd/i386/jfdctint-avx2.asm b/simd/i386/jfdctint-avx2.asm
index 97de230..23cf733 100644
--- a/simd/i386/jfdctint-avx2.asm
+++ b/simd/i386/jfdctint-avx2.asm
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (AVX2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2009, 2016, 2018, D. R. Commander.
+; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
@@ -103,7 +103,7 @@
 %endmacro
 
 ; --------------------------------------------------------------------------
-; In-place 8x8x16-bit slow integer forward DCT using AVX2 instructions
+; In-place 8x8x16-bit accurate integer forward DCT using AVX2 instructions
 ; %1-%4: Input/output registers
 ; %5-%8: Temp registers
 ; %9:    Pass (1 or 2)
diff --git a/simd/i386/jfdctint-mmx.asm b/simd/i386/jfdctint-mmx.asm
index 3ade9d4..34a43b9 100644
--- a/simd/i386/jfdctint-mmx.asm
+++ b/simd/i386/jfdctint-mmx.asm
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (MMX)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2016, D. R. Commander.
+; Copyright (C) 2016, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
diff --git a/simd/i386/jfdctint-sse2.asm b/simd/i386/jfdctint-sse2.asm
index 71b684c..6f8e18c 100644
--- a/simd/i386/jfdctint-sse2.asm
+++ b/simd/i386/jfdctint-sse2.asm
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (SSE2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2016, D. R. Commander.
+; Copyright (C) 2016, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
diff --git a/simd/i386/jidctint-avx2.asm b/simd/i386/jidctint-avx2.asm
index c371985..199c7df 100644
--- a/simd/i386/jidctint-avx2.asm
+++ b/simd/i386/jidctint-avx2.asm
@@ -2,7 +2,7 @@
 ; jidctint.asm - accurate integer IDCT (AVX2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2009, 2016, 2018, D. R. Commander.
+; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; inverse DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jidctint.c; see the jidctint.c for
 ; more details.
@@ -113,7 +113,7 @@
 %endmacro
 
 ; --------------------------------------------------------------------------
-; In-place 8x8x16-bit slow integer inverse DCT using AVX2 instructions
+; In-place 8x8x16-bit accurate integer inverse DCT using AVX2 instructions
 ; %1-%4:  Input/output registers
 ; %5-%12: Temp registers
 ; %9:     Pass (1 or 2)
diff --git a/simd/i386/jidctint-mmx.asm b/simd/i386/jidctint-mmx.asm
index 4f07f56..f15c8d3 100644
--- a/simd/i386/jidctint-mmx.asm
+++ b/simd/i386/jidctint-mmx.asm
@@ -2,7 +2,7 @@
 ; jidctint.asm - accurate integer IDCT (MMX)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2016, D. R. Commander.
+; Copyright (C) 2016, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; inverse DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jidctint.c; see the jidctint.c for
 ; more details.
diff --git a/simd/i386/jidctint-sse2.asm b/simd/i386/jidctint-sse2.asm
index e442fdd..43e3201 100644
--- a/simd/i386/jidctint-sse2.asm
+++ b/simd/i386/jidctint-sse2.asm
@@ -2,7 +2,7 @@
 ; jidctint.asm - accurate integer IDCT (SSE2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2016, D. R. Commander.
+; Copyright (C) 2016, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; inverse DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jidctint.c; see the jidctint.c for
 ; more details.
diff --git a/simd/jsimd.h b/simd/jsimd.h
index 97a0062..c7d805b 100644
--- a/simd/jsimd.h
+++ b/simd/jsimd.h
@@ -2,7 +2,7 @@
  * simd/jsimd.h
  *
  * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
- * Copyright (C) 2011, 2014-2016, 2018, D. R. Commander.
+ * Copyright (C) 2011, 2014-2016, 2018, 2020, D. R. Commander.
  * Copyright (C) 2013-2014, MIPS Technologies, Inc., California.
  * Copyright (C) 2014, Linaro Limited.
  * Copyright (C) 2015-2016, 2018, Matthieu Darbois.
@@ -951,7 +951,7 @@
 EXTERN(void) jsimd_convsamp_float_dspr2
   (JSAMPARRAY sample_data, JDIMENSION start_col, FAST_FLOAT *workspace);
 
-/* Slow Integer Forward DCT */
+/* Accurate Integer Forward DCT */
 EXTERN(void) jsimd_fdct_islow_mmx(DCTELEM *data);
 
 extern const int jconst_fdct_islow_sse2[];
@@ -1060,7 +1060,7 @@
 EXTERN(void) jsimd_idct_12x12_pass2_dspr2
   (int *workspace, int *output);
 
-/* Slow Integer Inverse DCT */
+/* Accurate Integer Inverse DCT */
 EXTERN(void) jsimd_idct_islow_mmx
   (void *dct_table, JCOEFPTR coef_block, JSAMPARRAY output_buf,
    JDIMENSION output_col);
diff --git a/simd/mips64/jfdctint-mmi.c b/simd/mips64/jfdctint-mmi.c
index a0ea692..7f4dfe9 100644
--- a/simd/mips64/jfdctint-mmi.c
+++ b/simd/mips64/jfdctint-mmi.c
@@ -1,7 +1,7 @@
 /*
  * Loongson MMI optimizations for libjpeg-turbo
  *
- * Copyright (C) 2014, 2018, D. R. Commander.  All Rights Reserved.
+ * Copyright (C) 2014, 2018, 2020, D. R. Commander.  All Rights Reserved.
  * Copyright (C) 2016-2017, Loongson Technology Corporation Limited, BeiJing.
  *                          All Rights Reserved.
  * Authors:  ZhuChen     <zhuchen@loongson.cn>
@@ -28,7 +28,7 @@
  * 3. This notice may not be removed or altered from any source distribution.
  */
 
-/* SLOW INTEGER FORWARD DCT */
+/* ACCURATE INTEGER FORWARD DCT */
 
 #include "jsimd_mmi.h"
 
diff --git a/simd/mips64/jidctint-mmi.c b/simd/mips64/jidctint-mmi.c
index 419c638..cd3db98 100644
--- a/simd/mips64/jidctint-mmi.c
+++ b/simd/mips64/jidctint-mmi.c
@@ -1,7 +1,7 @@
 /*
  * Loongson MMI optimizations for libjpeg-turbo
  *
- * Copyright (C) 2014-2015, 2018, D. R. Commander.  All Rights Reserved.
+ * Copyright (C) 2014-2015, 2018, 2020, D. R. Commander.  All Rights Reserved.
  * Copyright (C) 2016-2017, Loongson Technology Corporation Limited, BeiJing.
  *                          All Rights Reserved.
  * Authors:  ZhuChen     <zhuchen@loongson.cn>
@@ -28,7 +28,7 @@
  * 3. This notice may not be removed or altered from any source distribution.
  */
 
-/* SLOW INTEGER INVERSE DCT */
+/* ACCURATE INTEGER INVERSE DCT */
 
 #include "jsimd_mmi.h"
 
diff --git a/simd/powerpc/jfdctint-altivec.c b/simd/powerpc/jfdctint-altivec.c
index 6e63cc1..3d4f017 100644
--- a/simd/powerpc/jfdctint-altivec.c
+++ b/simd/powerpc/jfdctint-altivec.c
@@ -1,7 +1,7 @@
 /*
  * AltiVec optimizations for libjpeg-turbo
  *
- * Copyright (C) 2014, D. R. Commander.  All Rights Reserved.
+ * Copyright (C) 2014, 2020, D. R. Commander.  All Rights Reserved.
  *
  * This software is provided 'as-is', without any express or implied
  * warranty.  In no event will the authors be held liable for any damages
@@ -20,7 +20,7 @@
  * 3. This notice may not be removed or altered from any source distribution.
  */
 
-/* SLOW INTEGER FORWARD DCT */
+/* ACCURATE INTEGER FORWARD DCT */
 
 #include "jsimd_altivec.h"
 
diff --git a/simd/powerpc/jidctint-altivec.c b/simd/powerpc/jidctint-altivec.c
index 0e5dd58..60e619f 100644
--- a/simd/powerpc/jidctint-altivec.c
+++ b/simd/powerpc/jidctint-altivec.c
@@ -1,7 +1,7 @@
 /*
  * AltiVec optimizations for libjpeg-turbo
  *
- * Copyright (C) 2014-2015, D. R. Commander.  All Rights Reserved.
+ * Copyright (C) 2014-2015, 2020, D. R. Commander.  All Rights Reserved.
  *
  * This software is provided 'as-is', without any express or implied
  * warranty.  In no event will the authors be held liable for any damages
@@ -20,7 +20,7 @@
  * 3. This notice may not be removed or altered from any source distribution.
  */
 
-/* SLOW INTEGER INVERSE DCT */
+/* ACCURATE INTEGER INVERSE DCT */
 
 #include "jsimd_altivec.h"
 
diff --git a/simd/x86_64/jfdctint-avx2.asm b/simd/x86_64/jfdctint-avx2.asm
index 6ad4cf0..e56258b 100644
--- a/simd/x86_64/jfdctint-avx2.asm
+++ b/simd/x86_64/jfdctint-avx2.asm
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (64-bit AVX2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2009, 2016, 2018, D. R. Commander.
+; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
@@ -103,7 +103,7 @@
 %endmacro
 
 ; --------------------------------------------------------------------------
-; In-place 8x8x16-bit slow integer forward DCT using AVX2 instructions
+; In-place 8x8x16-bit accurate integer forward DCT using AVX2 instructions
 ; %1-%4: Input/output registers
 ; %5-%8: Temp registers
 ; %9:    Pass (1 or 2)
diff --git a/simd/x86_64/jfdctint-sse2.asm b/simd/x86_64/jfdctint-sse2.asm
index 5d0de3c..ec1f383 100644
--- a/simd/x86_64/jfdctint-sse2.asm
+++ b/simd/x86_64/jfdctint-sse2.asm
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (64-bit SSE2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2009, 2016, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
diff --git a/simd/x86_64/jidctint-avx2.asm b/simd/x86_64/jidctint-avx2.asm
index fee7c93..ca7e317 100644
--- a/simd/x86_64/jidctint-avx2.asm
+++ b/simd/x86_64/jidctint-avx2.asm
@@ -2,7 +2,7 @@
 ; jidctint.asm - accurate integer IDCT (64-bit AVX2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2009, 2016, 2018, D. R. Commander.
+; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
 ; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
@@ -15,7 +15,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; inverse DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jidctint.c; see the jidctint.c for
 ; more details.
@@ -114,7 +114,7 @@
 %endmacro
 
 ; --------------------------------------------------------------------------
-; In-place 8x8x16-bit slow integer inverse DCT using AVX2 instructions
+; In-place 8x8x16-bit accurate integer inverse DCT using AVX2 instructions
 ; %1-%4:  Input/output registers
 ; %5-%12: Temp registers
 ; %9:     Pass (1 or 2)
diff --git a/simd/x86_64/jidctint-sse2.asm b/simd/x86_64/jidctint-sse2.asm
index d26ea5d..7aa869b 100644
--- a/simd/x86_64/jidctint-sse2.asm
+++ b/simd/x86_64/jidctint-sse2.asm
@@ -2,7 +2,7 @@
 ; jidctint.asm - accurate integer IDCT (64-bit SSE2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2009, 2016, 2020, D. R. Commander.
 ; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
@@ -15,7 +15,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; inverse DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jidctint.c; see the jidctint.c for
 ; more details.
diff --git a/usage.txt b/usage.txt
index 1d9efcd..f7fa3c0 100644
--- a/usage.txt
+++ b/usage.txt
@@ -168,35 +168,43 @@
                         be unable to view an arithmetic coded JPEG file at
                         all.
 
-        -dct int        Use integer DCT method (default).
-        -dct fast       Use fast integer DCT (less accurate).
-                        In libjpeg-turbo, the fast method is generally about
-                        5-15% faster than the int method when using the
-                        x86/x86-64 SIMD extensions (results may vary with other
-                        SIMD implementations, or when using libjpeg-turbo
-                        without SIMD extensions.)  For quality levels of 90 and
-                        below, there should be little or no perceptible
-                        difference between the two algorithms.  For quality
-                        levels above 90, however, the difference between
-                        the fast and the int methods becomes more pronounced.
-                        With quality=97, for instance, the fast method incurs
-                        generally about a 1-3 dB loss (in PSNR) relative to
-                        the int method, but this can be larger for some images.
-                        Do not use the fast method with quality levels above
-                        97.  The algorithm often degenerates at quality=98 and
-                        above and can actually produce a more lossy image than
-                        if lower quality levels had been used.  Also, in
-                        libjpeg-turbo, the fast method is not fully accerated
-                        for quality levels above 97, so it will be slower than
-                        the int method.
-        -dct float      Use floating-point DCT method.
-                        The float method is mainly a legacy feature.  It does
-                        not produce significantly more accurate results than
-                        the int method, and it is much slower.  The float
-                        method may also give different results on different
-                        machines due to varying roundoff behavior, whereas the
-                        integer methods should give the same results on all
-                        machines.
+        -dct int        Use accurate integer DCT method (default).
+        -dct fast       Use less accurate integer DCT method [legacy feature].
+                        When the Independent JPEG Group's software was first
+                        released in 1991, the compression time for a
+                        1-megapixel JPEG image on a mainstream PC was measured
+                        in minutes.  Thus, the fast integer DCT algorithm
+                        provided noticeable performance benefits.  On modern
+                        CPUs running libjpeg-turbo, however, the compression
+                        time for a 1-megapixel JPEG image is measured in
+                        milliseconds, and thus the performance benefits of the
+                        fast algorithm are much less noticeable.  On modern
+                        x86/x86-64 CPUs that support AVX2 instructions, the
+                        fast and int methods have similar performance.  On
+                        other types of CPUs, the fast method is generally about
+                        5-15% faster than the int method.
+
+                        For quality levels of 90 and below, there should be
+                        little or no perceptible quality difference between the
+                        two algorithms.  For quality levels above 90, however,
+                        the difference between the fast and int methods becomes
+                        more pronounced.  With quality=97, for instance, the
+                        fast method generally incurs about a 1-3 dB loss in
+                        PSNR relative to the int method, but this can be larger
+                        for some images.  Do not use the fast method with
+                        quality levels above 97.  The algorithm often
+                        degenerates at quality=98 and above and can actually
+                        produce a more lossy image than if lower quality levels
+                        had been used.  Also, in libjpeg-turbo, the fast method
+                        is not fully accelerated for quality levels above 97,
+                        so it will be slower than the int method.
+        -dct float      Use floating-point DCT method [legacy feature].
+                        The float method does not produce significantly more
+                        accurate results than the int method, and it is much
+                        slower.  The float method may also give different
+                        results on different machines due to varying roundoff
+                        behavior, whereas the integer methods should give the
+                        same results on all machines.
 
         -restart N      Emit a JPEG restart marker every N MCU rows, or every
                         N MCU blocks if "B" is attached to the number.
@@ -318,36 +326,45 @@
 
 Switches for advanced users:
 
-        -dct int        Use integer DCT method (default).
-        -dct fast       Use fast integer DCT (less accurate).
-                        In libjpeg-turbo, the fast method is generally about
-                        5-15% faster than the int method when using the
-                        x86/x86-64 SIMD extensions (results may vary with other
-                        SIMD implementations, or when using libjpeg-turbo
-                        without SIMD extensions.)  If the JPEG image was
-                        compressed using a quality level of 85 or below, then
-                        there should be little or no perceptible difference
-                        between the two algorithms.  When decompressing images
-                        that were compressed using quality levels above 85,
-                        however, the difference between the fast and int
-                        methods becomes more pronounced.  With images
-                        compressed using quality=97, for instance, the fast
-                        method incurs generally about a 4-6 dB loss (in PSNR)
-                        relative to the int method, but this can be larger for
-                        some images.  If you can avoid it, do not use the fast
-                        method when decompressing images that were compressed
-                        using quality levels above 97.  The algorithm often
-                        degenerates for such images and can actually produce
-                        a more lossy output image than if the JPEG image had
-                        been compressed using lower quality levels.
-        -dct float      Use floating-point DCT method.
-                        The float method is mainly a legacy feature.  It does
-                        not produce significantly more accurate results than
-                        the int method, and it is much slower.  The float
-                        method may also give different results on different
-                        machines due to varying roundoff behavior, whereas the
-                        integer methods should give the same results on all
-                        machines.
+        -dct int        Use accurate integer DCT method (default).
+        -dct fast       Use less accurate integer DCT method [legacy feature].
+                        When the Independent JPEG Group's software was first
+                        released in 1991, the decompression time for a
+                        1-megapixel JPEG image on a mainstream PC was measured
+                        in minutes.  Thus, the fast integer DCT algorithm
+                        provided noticeable performance benefits.  On modern
+                        CPUs running libjpeg-turbo, however, the decompression
+                        time for a 1-megapixel JPEG image is measured in
+                        milliseconds, and thus the performance benefits of the
+                        fast algorithm are much less noticeable.  On modern
+                        x86/x86-64 CPUs that support AVX2 instructions, the
+                        fast and int methods have similar performance.  On
+                        other types of CPUs, the fast method is generally about
+                        5-15% faster than the int method.
+
+                        If the JPEG image was compressed using a quality level
+                        of 85 or below, then there should be little or no
+                        perceptible quality difference between the two
+                        algorithms.  When decompressing images that were
+                        compressed using quality levels above 85, however, the
+                        difference between the fast and int methods becomes
+                        more pronounced.  With images compressed using
+                        quality=97, for instance, the fast method generally
+                        incurs about a 4-6 dB loss in PSNR relative to the
+                        int method, but this can be larger for some images.  If
+                        you can avoid it, do not use the fast method when
+                        decompressing images that were compressed using quality
+                        levels above 97.  The algorithm often degenerates for
+                        such images and can actually produce a more lossy
+                        output image than if the JPEG image had been compressed
+                        using lower quality levels.
+        -dct float      Use floating-point DCT method [legacy feature].
+                        The float method does not produce significantly more
+                        accurate results than the int method, and it is much
+                        slower.  The float method may also give different
+                        results on different machines due to varying roundoff
+                        behavior, whereas the integer methods should give the
+                        same results on all machines.
 
         -dither fs      Use Floyd-Steinberg dithering in color quantization.
         -dither ordered Use ordered dithering in color quantization.