The Independent JPEG Group's JPEG software v4
diff --git a/CHANGELOG b/CHANGELOG
index 24d7a49..02c63dc 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,5 +1,50 @@
 CHANGELOG for Independent JPEG Group's JPEG software
 
+Version 4  10-Dec-92
+--------------------
+
+Revised user interface: switches now use names instead of single letters.
+(Old switch letters are acceptable abbreviations of new switch names, EXCEPT
+for djpeg's old -g, -D, -1 switches.)  cjpeg has several new switches.
+
+Provision for smoothing the input image added to cjpeg.  This helps a lot with
+converting dithered GIFs to JPEG.
+
+Decoder upsampling now uses interpolation instead of pixel replication; this
+improves rendering of sharp colored edges.
+
+The decompressor will now try to continue after detecting an error in the
+compressed data, instead of just aborting.  If the input file has restart
+markers, full synchronization will usually be regained at the next undamaged
+restart marker.  (But you're still out of luck if any of the header markers
+are corrupt.)
+
+Substantial improvements in speed; DCT accuracy improved too.
+
+Numerous minor changes to improve portability.  egetopt.c, which was by far
+the worst portability problem, is gone altogether.
+
+A few bugfixes, sigh (mostly affecting DOS implementations).
+Bugfix: on DOS machines, cjpeg -o would fail on grayscale input files.
+Bugfix: one-pass quantization to more than 64 color levels would fail on
+16-bit-int machines.  This could only happen with quantized grayscale output.
+
+A couple of changes affect code that calls the JPEG subroutine library:
+
+1. The parameter struct tag names are now capitalized (Compress_info_struct,
+Compress_methods_struct, Decompress_info_struct, Decompress_methods_struct,
+and External_methods_struct).  This makes it easier to live with brain-damaged
+compilers with short identifier lengths.  (All identifiers used in the JPEG
+code are now unique within the first 16 characters.)
+
+2. If you are not calling jselerror(), you need to initialize three new fields
+in the emethods structure, typically as follows:
+    e_methods.num_warnings = 0; /* no warnings emitted yet */
+    e_methods.first_warning_level = 0; /* display first corrupt-data warning */
+    e_methods.more_warning_level = 3; /* but suppress additional ones */
+These fields control handling of corrupt-data warnings.
+
+
 Version 3  17-Mar-92
 --------------------
 
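Note on item 2 of the Version 4 entry above: the three new fields can be
pictured with a small stand-alone model.  This is only a sketch, not the
library's code.  The struct below is a hypothetical stand-in for the emethods
structure; only the three field names and their suggested initial values come
from the CHANGELOG.  The trace_level field, the demo_* names, and the rule
"show a warning when trace_level is at least the required level" are
assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for the warning-related part of the emethods
 * structure.  The three *_warning* field names come from the CHANGELOG;
 * trace_level and all demo_* names are assumptions for illustration. */
struct demo_emethods {
    int  trace_level;          /* assumed: verbosity requested by the caller */
    long num_warnings;         /* corrupt-data warnings seen so far */
    int  first_warning_level;  /* level needed to show the first warning */
    int  more_warning_level;   /* level needed to show any later warning */
};

/* Initialize the three new fields with the values the CHANGELOG suggests. */
void demo_init_warnings(struct demo_emethods *em)
{
    em->num_warnings = 0;          /* no warnings emitted yet */
    em->first_warning_level = 0;   /* display first corrupt-data warning */
    em->more_warning_level = 3;    /* but suppress additional ones */
}

/* Assumed policy: the first warning is checked against first_warning_level,
 * every later one against more_warning_level.  Every call is counted.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
int demo_warn(struct demo_emethods *em, const char *msg)
{
    int needed = (em->num_warnings == 0) ? em->first_warning_level
                                         : em->more_warning_level;

    em->num_warnings++;
    if (em->trace_level >= needed) {
        fprintf(stderr, "warning: %s\n", msg);
        return 1;
    }
    return 0;
}
```

Under these assumptions, a caller running at trace level 0 sees the first
corrupt-data warning and none of the later ones, while num_warnings still
counts them all.  Raising the trace level to 3 or more would show every
warning.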
diff --git a/README b/README
index 3dc1140..9223de2 100644
--- a/README
+++ b/README
@@ -1,24 +1,28 @@
 The Independent JPEG Group's JPEG software
 ==========================================
 
-README for release 3 of  17-Mar-92
-==================================
+README for release 4 of 10-Dec-92
+=================================
 
-This distribution contains the third official release of the Independent JPEG
+This distribution contains the fourth public release of the Independent JPEG
 Group's free JPEG software.  You are welcome to redistribute this software and
 to use it for any purpose, subject to the conditions under LEGAL ISSUES, below.
 
-For installation instructions, see file SETUP; for usage instructions, see
-file USAGE (or the cjpeg.1 and djpeg.1 manual pages).
+For installation instructions, see file SETUP.
+
+For usage instructions, see file USAGE (or the cjpeg.1 and djpeg.1 manual
+pages; but USAGE contains a "hints" section not found in the manual pages).
+Useful information can also be found in the JPEG FAQ (Frequently Asked
+Questions) article; see ARCHIVE LOCATIONS below to obtain the FAQ article.
 
 This software is still undergoing revision.  Updated versions may be obtained
 by FTP or UUCP to UUNET and other archive sites; see ARCHIVE LOCATIONS below
 for details.
 
-If you intend to become a serious user of this software, please contact
-jpeg-info@uunet.uu.net to be added to our electronic mailing list.  Then
-you'll be notified of updates and have a chance to participate in discussions,
-etc.
+Serious users of this software (particularly those incorporating it into
+larger programs) should contact jpeg-info@uunet.uu.net to be added to our
+electronic mailing list.  Mailing list members are notified of updates and
+have a chance to participate in technical discussions, etc.
 
 This software is the work of Tom Lane, Philip Gladstone, Luis Ortiz,
 Lee Crocker, Ge' Weijers, and other members of the Independent JPEG Group.
@@ -32,7 +36,9 @@
 an accurate implementation of the JPEG standard.  (See LEGAL ISSUES for even
 more disclaimers.)
 
-Please report any problems with this software to jpeg-info@uunet.uu.net.
+Despite that, we believe that this software is pretty good, and if you find
+any problems with it, we'd like to know about them.  Please report problems
+by e-mail to jpeg-info@uunet.uu.net.
 
 
 WHAT'S HERE
@@ -40,15 +46,15 @@
 
 This distribution contains C software to implement JPEG image compression and
 decompression.  JPEG (pronounced "jay-peg") is a standardized compression
-method for full-color and gray-scale images.  JPEG is intended for
+method for full-color and gray-scale images.  JPEG is intended for compressing
 "real-world" scenes; cartoons and other non-realistic images are not its
 strong suit.  JPEG is lossy, meaning that the output image is not necessarily
-identical to the input image.  Hence you should not use JPEG if you have to
-have identical output bits.  However, on typical images of real-world scenes,
-very good compression levels can be obtained with no visible change, and
-amazingly high compression levels can be obtained if you can tolerate a
-low-quality image.  For more details, see the references, or just experiment
-with various compression settings.
+identical to the input image.  Hence you must not use JPEG if you have to have
+identical output bits.  However, on typical images of real-world scenes, very
+good compression levels can be obtained with no visible change, and amazingly
+high compression levels are possible if you can tolerate a low-quality image.
+For more details, see the references, or just experiment with various
+compression settings.
 
 The software implements JPEG baseline and extended-sequential compression
 processes.  Provision is made for supporting all variants of these processes,
@@ -57,14 +63,19 @@
 LEGAL ISSUES.  At present we have made no provision for supporting the
 progressive, hierarchical, or lossless processes defined in the standard.
 
-The present software is not far beyond the prototype stage.  It does not
-support all possible variants of the JPEG standard, and some functions have
-rather slow and/or crude implementations.  However, it is useful already.
+In order to support file conversion and viewing software, we have included
+considerable functionality beyond the bare JPEG coding/decoding capability;
+for example, the color quantization modules are not strictly part of JPEG
+decoding, but they are essential for output to colormapped file formats or
+colormapped displays.  These extra functions can be compiled out if not
+required for a particular application.
 
 The emphasis in designing this software has been on achieving portability and
-flexibility, while also making it fast enough to be useful.  We have not yet
-undertaken serious performance measurement or tuning; we intend to do so in
-the future.
+flexibility, while also making it fast enough to be useful.  In particular,
+the software is not intended to be read as a tutorial on JPEG.  (See the
+REFERENCES section for introductory material.)  While we hope that the entire
+package will someday be industrial-strength code, much remains to be done in
+performance tuning and in improving the capabilities of individual modules.
 
 
 This software can be used on several levels:
@@ -99,27 +110,28 @@
 The "official" archive site for this software is ftp.uu.net (Internet
 address 137.39.1.9 or 192.48.96.9).  The most recent released version can
 always be found there in directory graphics/jpeg.  This particular version
-will be archived as jpegsrc.v3.tar.Z.  If you are on the Internet, you can
+will be archived as jpegsrc.v4.tar.Z.  If you are on the Internet, you can
 retrieve files from UUNET by anonymous FTP.  If you don't have FTP access,
 UUNET's archives are also available via UUCP; contact postmaster@uunet.uu.net
 for information on retrieving files that way.
 
-Various other Internet sites maintain copies of the UUNET file, which may or
-may not be up-to-date.  In Europe, try nic.funet.fi (128.214.6.100; look in
-directory pub/graphics/programs/jpeg).
+Numerous Internet sites maintain copies of the UUNET files; in particular,
+you can probably find a copy at any site that archives comp.sources.misc
+submissions.  However, only ftp.uu.net is guaranteed to have the latest
+official version.
 
 You can also obtain this software from CompuServe, in the GRAPHSUPPORT forum
-(GO PICS), library 10; this version will be file jpsrc3.zip.
+(GO PICS), library 15; this version will be file jpsrc4.zip.  Again,
+CompuServe is not guaranteed to have the very latest version.
 
-If you are not reasonably handy at configuring and installing portable C
-programs, you may have some difficulty installing this package.  You may
-prefer to obtain a pre-built executable version.  A collection of pre-built
-executables for various machines is currently available for anonymous FTP at
-procyon.cis.ksu.edu (129.130.10.80 --- this number is due to change soon);
-look under /pub/JPEG.  The administrators of this system ask that FTP traffic
-be limited to non-prime hours.  For more information on this archive, please
-contact Steve Davis (strat@cis.ksu.edu).  This collection is not maintained by
-the Independent JPEG Group, and programs in it may not be the latest version.
+The JPEG FAQ (Frequently Asked Questions) article is a useful source of
+general information about JPEG.  It is updated constantly and therefore
+is not included in this distribution.  The FAQ is posted every two weeks
+to Usenet newsgroups comp.graphics, news.answers, and other groups.  You
+can always obtain the latest version from the news.answers archive at
+rtfm.mit.edu (18.172.1.27).  By FTP, fetch /pub/usenet/news.answers/jpeg-faq.
+If you don't have FTP, send e-mail to mail-server@rtfm.mit.edu with body
+"send usenet/news.answers/jpeg-faq".
 
 
 SUPPORTING SOFTWARE
@@ -138,8 +150,10 @@
 these can be found in the contrib directory at export.lcs.mit.edu.  Actually,
 xv version 2.00 and up incorporates our software and thus can read and write
 JPEG files directly.  (NOTE: since xv internally reduces all images to 8
-bits/pixel, a JPEG file written by xv will not be very high quality; you may
-also prefer xloadimage for viewing if you have a 24-bit display.  Caveat user.)
+bits/pixel, a JPEG file written by xv will not be very high quality; and xv
+cannot fully exploit a 24-bit display.  These problems are expected to go away
+in the next xv release, planned for early 1993.  In the meantime, use
+xloadimage for 24-bit displays.)
 
 For DOS machines, Lee Crocker's free Piclab program is a useful companion to
 the JPEG software.  The latest version, currently 1.91, is available by FTP
@@ -157,26 +171,25 @@
 program JPG2GIF can read our files (at least ones produced with our default
 option settings).
 
-Unfortunately, many commercial JPEG implementations are also incompatible as
-of this writing, especially programs released before summer 1991.  The root of
-the problem is that the ISO JPEG committee failed to specify a concrete file
-format.  Some vendors "filled in the blanks" on their own, creating
-proprietary formats that no one else could read.  (For example, none of the
-early commercial JPEG implementations for the Macintosh were able to exchange
-compressed files.)
+Some commercial JPEG implementations are also incompatible as of this writing,
+especially programs released before summer 1991.  The root of the problem is
+that the ISO JPEG committee failed to specify a concrete file format.  Some
+vendors "filled in the blanks" on their own, creating proprietary formats that
+no one else could read.  (For example, none of the early commercial JPEG
+implementations for the Macintosh were able to exchange compressed files.)
 
 The file format we have adopted is called JFIF (see REFERENCES).  This format
 has been agreed to by a number of major commercial JPEG vendors, and we expect
 that it will become the de facto standard.  JFIF is a minimal representation;
-work is also going forward to incorporate JPEG compression into the TIFF
+work is also going forward to incorporate JPEG compression into the TIFF 6.0
 standard, for use in "high end" applications that need to record a lot of
-additional data about an image.  We intend to support JPEG-in-TIFF in the
-future.  We hope that these two formats will be sufficient and that other,
-incompatible JPEG file formats will not proliferate.
+additional data about an image.  We intend to support TIFF 6.0 in the future.
+We hope that these two formats will be sufficient and that other, incompatible
+JPEG file formats will not proliferate.
 
 Indeed, part of the reason for developing and releasing this free software is
 to help force rapid convergence to de facto standards for JPEG file formats.
-SUPPORT STANDARD, NON-PROPRIETARY FORMATS: demand JFIF or JPEG-in-TIFF!
+SUPPORT STANDARD, NON-PROPRIETARY FORMATS: demand JFIF or TIFF 6.0!
 
 
 USING JPEG AS A SUBROUTINE IN A LARGER PROGRAM
@@ -195,17 +208,18 @@
 compression) or go to memory or the screen (for decompression).  For this
 purpose you must provide image reading or writing routines that match the
 interface used by the image file I/O modules (jrdXXX or jwrXXX); again,
-example.c shows a skeleton of what is required.
+example.c shows a skeleton of what is required.  In this situation, you
+won't need any of the non-JPEG image file I/O modules used by cjpeg and djpeg.
 
 By default, any error detected inside the JPEG routines will cause a message
 to be printed on stderr, followed by exit().  You can override this behavior
 by supplying your own message-printing and/or error-exit routines; again,
 example.c shows how.
 
-Mechanics: we recommend you create libjpeg.a as shown in the Makefile, then
-link that with your surrounding program.  (If your linker is at all
-reasonable, only the code you actually need will get loaded.)  Include the
-files jconfig.h and jpegdata.h in C files that need to call the JPEG routines.
+We recommend you create libjpeg.a as shown in the Makefile, then link that
+with your surrounding program.  (If your linker is at all reasonable, only the
+code you actually need will get loaded.)  Include the files jconfig.h and
+jpegdata.h in C files that need to call the JPEG routines.
 
 CAUTION: some people have tried to compile JPEG and their surrounding code
 with different compilers, e.g., cc for JPEG and c++ or gcc for the rest.  This
@@ -228,52 +242,75 @@
 REFERENCES
 ==========
 
-The best and most readily available introduction to the JPEG compression
-algorithm is Wallace's article in the April '91 CACM:
+We highly recommend reading one or more of these references before trying to
+understand the innards of any JPEG software.
+
+The best short technical introduction to the JPEG compression algorithm is
 	Wallace, Gregory K.  "The JPEG Still Picture Compression Standard",
 	Communications of the ACM, April 1991 (vol. 34 no. 4), pp. 30-44.
 (Adjacent articles in that issue discuss MPEG motion picture compression,
-applications of JPEG, and related topics.)  We highly recommend reading that
-article before trying to understand the innards of any JPEG software.
-If you don't have the CACM issue handy, a PostScript file containing a revised
-version of the article is available at ftp.uu.net, graphics/jpeg/wallace.ps.Z.
-The file (actually a preprint for an article to appear in IEEE Trans. Consumer
-Electronics) omits the sample images that appeared in CACM, but it includes
-corrections and some added material.  Note: the Wallace article is copyright
-ACM and IEEE, and it may not be used for commercial purposes.
+applications of JPEG, and related topics.)  If you don't have the CACM issue
+handy, a PostScript file containing a revised version of the article is
+available at ftp.uu.net, graphics/jpeg/wallace.ps.Z.  The file (actually a
+preprint for an article to appear in IEEE Trans. Consumer Electronics) omits
+the sample images that appeared in CACM, but it includes corrections and some
+added material.  Note: the Wallace article is copyright ACM and IEEE, and it
+may not be used for commercial purposes.
 
-For more detail about the JPEG standard you pretty much have to go to the
-draft standard (which is not nearly as intelligible as Wallace's article).
-The standard is not now available electronically; you must order a paper copy
-through ISO.  In the US, copies may be ordered from ANSI Sales at (212)
-642-4900.  The standard is divided into two parts: Part 1 is the actual
-specification, and Part 2 covers compliance testing methods.  The current
-"committee draft" version of Part 1 is titled "Digital Compression and Coding
-of Continuous-tone Still Images, Part 1: Requirements and guidelines" and has
-document number ISO/IEC CD 10918-1.  (The alternate number SC2 N2215 should
-also be mentioned when ordering.)  This draft is expected to be superseded by
-the Draft International Standard version around the end of November 1991.
-Ordering info will be the same as above, but replace "CD" with "DIS" in the
-document number (alternate number not yet known).  The committee draft of
-Part 2 is expected to be available around the end of December 1991.  It will
-be titled "Digital Compression and Coding of Continuous-tone Still Images,
-Part 2: Compliance testing" and will have document number ISO/IEC CD 10918-2
-(alternate number not yet known).
+A somewhat less technical, more leisurely introduction to JPEG can be found in
+"The Data Compression Book" by Mark Nelson, published by M&T Books (Redwood
+City, CA), 1991, ISBN 1-55851-216-0.  This book provides good explanations and
+example C code for a multitude of compression methods including JPEG.  It is
+an excellent source if you are comfortable reading C code but don't know much
+about data compression in general.  The book's JPEG sample code is far from
+industrial-strength, but when you are ready to look at a full implementation,
+you've got one here...
+
+A new textbook about JPEG is "JPEG Still Image Data Compression Standard" by
+William B. Pennebaker and Joan L. Mitchell, published by Van Nostrand
+Reinhold, 1993, ISBN 0-442-01272-1.  Price US$59.95.  This book includes the
+complete text of the ISO JPEG standards (DIS 10918-1 and draft DIS 10918-2).
+This is by far the most complete exposition of JPEG in existence, and I highly
+recommend it.  If you read the entire book, you will probably know more about
+JPEG than I do.
+
+The JPEG standard itself is not available electronically; you must order a
+paper copy through ISO.  (Unless you are concerned about having a certified
+official copy, I recommend buying the Pennebaker and Mitchell book instead;
+it's much cheaper and includes a great deal of useful explanatory material.)
+In the US, copies of the standard may be ordered from ANSI Sales at (212)
+642-4900.  It's not cheap: as of 1992, Part 1 is $95 and Part 2 is $47, plus
+7% shipping/handling.  The standard is divided into two parts: Part 1 is the
+actual specification, and Part 2 covers compliance testing methods.
+As of early 1992, Part 1 has Draft International Standard status.  It is
+titled "Digital Compression and Coding of Continuous-tone Still Images, Part
+1: Requirements and guidelines" and has document number ISO/IEC DIS 10918-1.
+Part 2 is still at Committee Draft status.  It is titled "Digital Compression
+and Coding of Continuous-tone Still Images, Part 2: Compliance testing" and
+has document number ISO/IEC CD 10918-2.  (NOTE: I'm told that the final
+version of Part 2 will differ considerably from the CD draft.)
 
 The JPEG standard does not specify all details of an interchangeable file
 format.  For the omitted details we follow the "JFIF" conventions, revision
-1.01.  A copy of the JFIF spec is available from:
+1.02.  A copy of the JFIF spec is available from:
 	Literature Department
 	C-Cube Microsystems, Inc.
 	399A West Trimble Road
 	San Jose, CA  95131
 	(408) 944-6300
-The same source can supply copies of the draft JPEG-in-TIFF documents
-(Appendixes O and P to the TIFF spec).  PostScript versions of these
-documents can also be obtained by e-mail from the C-Cube mail server,
-netlib@c3.pla.ca.us.  Send the message "send jfif_ps from jpeg" to obtain the
-JFIF document; "send app_o_ps from jpeg" and "send app_p_ps from jpeg" will
-produce the TIFF documents.  Send the message "help" if you have trouble.
+A PostScript version of this document is available at ftp.uu.net, file
+graphics/jpeg/jfif.ps.Z.  It can also be obtained by e-mail from the C-Cube
+mail server, netlib@c3.pla.ca.us.  Send the message "send jfif_ps from jpeg"
+to the server to obtain the JFIF document; send the message "help" if you have
+trouble.
+
+The TIFF 6.0 file format specification can be obtained by FTP from sgi.com
+(192.48.153.1), file graphics/tiff/TIFF6.ps.Z; or you can order a printed copy
+from Aldus Corp. at (206) 628-6593.  It should be noted that the TIFF 6.0 spec
+of 3-June-92 has a number of serious problems in its JPEG features.  A
+clarification note will probably be needed to ensure that TIFF JPEG files are
+compatible across different implementations.  The IJG does not intend to
+support TIFF 6.0 until these problems are resolved.
 
 If you want to understand this implementation, start by reading the
 "architecture" documentation file.  Please read "codingrules" if you want to
@@ -305,10 +342,10 @@
 full responsibility for any undesirable consequences; the authors accept
 NO LIABILITY for damages of any kind.
 
-Permission is NOT granted for the use of any author's name or author's company
-name in advertising or publicity relating to this software or products derived
-from it.  This software may be referred to only as "the Independent JPEG
-Group's software".
+Permission is NOT granted for the use of any IJG author's name or company name
+in advertising or publicity relating to this software or products derived from
+it.  This software may be referred to only as "the Independent JPEG Group's
+software".
 
 We specifically permit and encourage the use of this software as the basis of
 commercial products, provided that all warranty or liability claims are
@@ -321,7 +358,7 @@
 by the usual distribution terms of the Free Software Foundation; principally,
 that you must include source code if you redistribute it.  (See the file
 ansi2knr.c for full details.)  However, since ansi2knr.c is not needed as part
-of any product generated from the JPEG code, this does not limit you more than
+of any program generated from the JPEG code, this does not limit you more than
 the foregoing paragraphs do.
 
 
@@ -331,7 +368,7 @@
 one or more licenses.  For this reason, support for arithmetic coding has been
 removed from the free JPEG software.  (Since arithmetic coding provides only a
 marginal gain over the unpatented Huffman mode, it is unlikely that very many
-people will choose to use it.  If you do obtain the necessary licenses,
+implementors will support it.  If you do obtain the necessary licenses,
 contact jpeg-info@uunet.uu.net for a copy of our arithmetic coding modules.)
 So far as we are aware, there are no patent restrictions on the remaining
 code.
@@ -346,9 +383,13 @@
 TO DO
 =====
 
+The next major release will probably be a significant rewrite to allow use of
+this code in conjunction with Sam Leffler's free TIFF library (assuming the
+bugs in the TIFF 6.0 specification get resolved).
+
 Many of the modules need fleshing out to provide more complete
 implementations, or to provide faster paths for common cases.
-Improving the speed will be the next big work item for the JPEG group.
+Speeding things up is still high on our priority list.
 
 We'd appreciate it if people would compile and check out the code on as wide a
 variety of systems as possible, and report any portability problems
diff --git a/SETUP b/SETUP
index 580a194..dd633ae 100644
--- a/SETUP
+++ b/SETUP
@@ -29,16 +29,18 @@
 old versions of Borland C).  We include several standard makefiles in the
 distribution:
 
-	makefile.ansi: for Unix systems with ANSI-compatible C compilers.
-	makefile.unix: for Unix systems with non-ANSI C compilers.
-	makefile.mc5:  for Microsoft C 5.x under MS-DOS.
-	makefile.mc6:  for Microsoft C 6.x under MS-DOS.
-	makefile.bcc:  for Borland C (Turbo C) under MS-DOS.
-	makefile.pwc:  for Mix Software's Power C under MS-DOS.
-	makefile.manx: for Manx Aztec C on Amigas.
-	makefile.sas:  for SAS C on Amigas.
-	makefile.mms:  for VAX/VMS systems with MMS.
-	makefile.vms:  for VAX/VMS systems without MMS.
+	makefile.ansi:	for Unix systems with ANSI-compatible C compilers.
+	makefile.unix:	for Unix systems with non-ANSI C compilers.
+	makefile.mc5:	for Microsoft C 5.x under MS-DOS.
+	makefile.mc6:	for Microsoft C 6.x and up under MS-DOS.
+	makefile.bcc:	for Borland C (Turbo C) under MS-DOS.
+	makefile.manx:	for Manx Aztec C on Amigas.
+	makefile.sas:	for SAS C on Amigas.
+	makcjpeg.st:	project file for Atari ST/STE/TT Pure C or Turbo C.
+	makdjpeg.st:	project file for Atari ST/STE/TT Pure C or Turbo C.
+	makljpeg.st:	project file for Atari ST/STE/TT Pure C or Turbo C.
+	makefile.mms:	for VAX/VMS systems with MMS.
+	makefile.vms:	for VAX/VMS systems without MMS.
 
 If you don't see a makefile for your system, we recommend starting from either
 makefile.ansi or makefile.unix, depending on whether your compiler accepts
@@ -76,8 +78,13 @@
 ======================
 
 Look over jconfig.h and adjust #defines to reflect the properties of your
-system and C compiler.  (If you prefer, you can usually leave jconfig.h
-unmodified and add -Dsymbol switches to the Makefile's CFLAGS definition.)
+system and C compiler.  If you prefer, you can usually leave jconfig.h
+unmodified and add -Dsymbol switches to the Makefile's CFLAGS definition.
+(This is already done if you used a compiler-specific makefile in step 1.)
+However, putting the switches in the Makefile is a bad idea if you are going
+to incorporate the JPEG software into other programs --- you'd need to include
+the same -D switches in the other programs' Makefiles.  Better to change
+jconfig.h.
 
 If you have an ANSI-compliant C compiler, no changes should be necessary
 except perhaps for RIGHT_SHIFT_IS_UNSIGNED and TWO_FILE_COMMANDLINE.  For
@@ -100,16 +107,19 @@
 STEP 3: SELECT SYSTEM-DEPENDENT FILES
 =====================================
 
+A few places in the JPEG software are so system-dependent that we have to
+provide several different implementations and let you select the one you need.
+
 The only system-dependent file in the current version is jmemsys.c.  This file
 controls use of temporary files for big images that won't fit in main memory.
-You'll notice there is no file by that name in the distribution; you must
+You'll notice there is no file named jmemsys.c in the distribution; you must
 select one of the provided versions and copy, rename, or link it to jmemsys.c.
 Here are the provided versions:
 
 	jmemansi.c	This is a reasonably portable version that should
 			work on most ANSI and near-ANSI C compilers.  It uses
 			the ANSI-standard library routine tmpfile(), which not
-			all pre-ANSI systems have.  On some systems tmpfile()
+			all non-ANSI systems have.  On some systems tmpfile()
 			may put the temporary file in a non-optimal location;
 			if you don't like what it does, use jmemname.c.
 
@@ -151,7 +161,8 @@
 "insufficient memory" failures.  On most Unix machines (and other systems with
 virtual memory), just set DEFAULT_MAX_MEM to several million and forget it.
 At the other end of the spectrum, for MS-DOS machines you probably can't go
-much above 300K to 400K.
+much above 300K to 400K.  (On MS-DOS the value refers to conventional memory;
+extended/expanded memory is handled separately by jmemdos.c.)
 
 
 STEP 4: MAKE
@@ -165,7 +176,7 @@
 If your compiler complains about big_sarray_control and big_barray_control
 being undefined structures, you should be able to shut it up by adding
 -DINCOMPLETE_TYPES_BROKEN to CFLAGS (or add #define INCOMPLETE_TYPES_BROKEN
-to jconfig.h).
+to jconfig.h).  If you don't have a getenv() library routine, define NO_GETENV.
 
 There are a fair number of routines that do not use all of their parameters;
 some compilers will issue warnings about this, which you can ignore.  Any
@@ -179,7 +190,7 @@
 several forms:
 	testorig.jpg	A reduced section of the well-known Lenna picture.
 	testimg.ppm	The output of djpeg testorig.jpg
-	testimg.gif	The output of djpeg -G testorig.jpg
+	testimg.gif	The output of djpeg -gif testorig.jpg
 	testimg.jpg	The output of cjpeg testimg.ppm
 (The two .jpg files aren't identical since JPEG is lossy.)  If you can
 generate duplicates of the testimg.* files then you probably have working
@@ -191,24 +202,94 @@
 compare these to testimg.* with whatever binary file comparison tool you have.
 The files should be bit-for-bit identical.
 
+If the cjpeg test run fails with "Missing Huffman code table entry", it's a
+good bet that you needed to define RIGHT_SHIFT_IS_UNSIGNED.  Go back to step 2
+and run ckconfig.c.  (This is a good plan for any other test failure, too.)
+
 If your choice of jmemsys.c was anything other than jmemnobs.c, you should
-also test that temporary-file usage works.  Try "djpeg -G -m 0 testorig.jpg"
+test that temporary-file usage works.  Try "djpeg -gif -max 0 testorig.jpg"
 and make sure its output matches testimg.gif.  If you have any really large
-images handy, try compressing them with -o and/or decompressing with -G
-to make sure your DEFAULT_MAX_MEM setting is not too large.
+images handy, try compressing them with -optimize and/or decompressing with
+-gif to make sure your DEFAULT_MAX_MEM setting is not too large.
 
 NOTE: this is far from an exhaustive test of the JPEG software; some modules,
-such as fast color quantization, are not exercised at all.  It's just a quick
+such as 1-pass color quantization, are not exercised at all.  It's just a quick
 test to give you some confidence that you haven't missed something major.
 
-If the test passes, you can copy the executable files cjpeg and djpeg to
-wherever you normally install programs.  Read the file USAGE to learn more
-about using the programs.
+
+STEP 6: INSTALLATION
+====================
+
+Once you're done with the above steps, you can install the software by copying
+the executable files (cjpeg and djpeg) to wherever you normally install
+programs.  On Unix systems, you'll also want to put cjpeg.1 and djpeg.1 in the
+corresponding manual directory.  (The makefiles don't support this step since
+there's such a wide variety of installation procedures on different systems.)
+
+To learn to use the programs, read the file USAGE (or manual pages cjpeg(1)
+and djpeg(1) on Unix).
+
+
+OPTIMIZATION
+============
+
+Unless you own a Cray, you'll probably be interested in making the JPEG
+software go as fast as possible.  This section covers some machine-dependent
+optimizations you may want to try.  We suggest that before trying any of this,
+you first get the basic installation to pass the self-test (step 5 above).
+Repeat the self-test after any optimization to make sure that you haven't
+broken anything.
+
+The JPEG DCT routines perform a lot of multiplications.  These multiplications
+must yield 32-bit results, but none of their input values are more than 16
+bits wide.  On many machines, notably the 680x0 and 80x86 CPUs, a 16x16=>32
+bit multiply instruction is faster than a full 32x32=>32 bit multiply.
+Unfortunately there is no portable way to specify such a multiplication in C,
+but some compilers can generate one when you use the right combination of
+casts.  See the MULTIPLY macro definitions in jfwddct.c and jrevdct.c.
+If your compiler makes "int" be 32 bits and "short" be 16 bits, defining
+SHORTxSHORT_32 is fairly likely to work.  When experimenting with alternate
+definitions, be sure to test not only whether the code still works (use the
+self-test step), but also whether it is actually faster --- on some compilers,
+alternate definitions may compute the right answer, yet be slower than the
+default.  Timing cjpeg on a large PPM input file is the best way to check
+this, as the DCT will be the largest fraction of the runtime in that mode.
+(Note: some of the distributed compiler-specific makefiles already contain
+-D switches to select an appropriate MULTIPLY definition.)
+
+If access to "short" arrays is slow on your machine, it may be a win to define
+type DCTELEM as int rather than as JCOEF (which is normally defined as short).
+This will cause the DCT routines to operate on int arrays instead of short
+arrays.  If shorts are slow and you have lots of memory to burn, you might
+even make JCOEF itself be int.
+
+If your compiler can compile function calls in-line, make sure the INLINE
+macro in jconfig.h is defined as the keyword that marks a function
+inline-able.  Some compilers have a switch that tells the compiler to inline
+any function it thinks is profitable (e.g., -finline-functions for gcc).
+Enabling such a switch is likely to make the compiled code bigger but faster.
+
+In general, it's worth trying the maximum optimization level of your compiler,
+and experimenting with any optional optimizations such as loop unrolling.
+(Unfortunately, far too many compilers have optimizer bugs ... be prepared to
+back off if the code fails self-test.)  If you do any experimentation along
+these lines, please report the optimal settings to jpeg-info@uunet.uu.net so
+we can mention them in future releases.  Be sure to specify your machine and
+compiler version.
 
 
 OPTIONAL STUFF
 ==============
 
+Progress monitor:
+
+If you like, you can #define PROGRESS_REPORT (in jconfig.h or in the Makefile)
+to enable display of percent-done progress reports.  The routines provided in
+jcmain.c/jdmain.c merely print percentages to stderr, but you can customize
+them to do something fancier.
+
+Utah RLE file format support:
+
 We distribute the software with support for RLE image files (Utah Raster
 Toolkit format) disabled, because the RLE support won't compile without the
 Utah library.  If you have URT version 3.0, you can enable RLE support as
@@ -221,13 +302,21 @@
 	    the directory containing the URT "librle.a" file (typically the
 	    "lib" subdirectory of the URT distribution).
 
+JPEG library:
+
 If you want to incorporate the JPEG code as subroutines in a larger program,
-we recommend that you make libjpeg.a.  (See file README for more info.)
+we recommend that you make libjpeg.a, then link that into your surrounding
+program.  See file README for more info.
 
 CAUTION: When you use the JPEG code as subroutines, we recommend that you make
 any required configuration changes by modifying jconfig.h, not by adding -D
 switches to the Makefile.  Otherwise you must be sure to provide the same -D
-switches when compiling any program that includes the JPEG .h files.
+switches when compiling any program that includes the JPEG .h files, to ensure
+that the parameter structures are interpreted the same way.  (This is only
+critical for the first few symbols mentioned in jconfig.h, down through
+NEED_FAR_POINTERS.)
+
+Removing code:
 
 If you need to make a smaller version of the JPEG software, some optional
 functions can be removed at compile time.  See the xxx_SUPPORTED #defines in
@@ -242,9 +331,8 @@
 ==========================
 
 We welcome reports on changes needed for systems not mentioned here.
-Submit 'em to jpeg-info@uunet.uu.net.  Also, ckconfig.c is fairly new and not
-yet thoroughly tested; if it's wrong about how to configure the JPEG software
-for your system, please let us know.
+Submit 'em to jpeg-info@uunet.uu.net.  Also, if ckconfig.c is wrong about
+how to configure the JPEG software for your system, please let us know.
 
 
 Amiga:
@@ -258,6 +346,26 @@
 Change jmemname.c if you don't like this.
 
 
+Atari:
+
+The project files provided should work as-is with Pure C.  For Turbo C, change
+library filenames "PC..." to "TC..." in the project files for cjpeg.ttp and
+djpeg.ttp.  Don't forget to select a jmemsys.c file; see Step 3 (we recommend
+jmemansi.c).  Also adjust the DEFAULT_MAX_MEM setting --- you probably want it
+to be a couple hundred K less than your normal free memory.  Note that you
+must make jpeg.lib before making cjpeg.ttp or djpeg.ttp.  You'll have to
+perform the self-test (Step 5) by hand.
+
+There is a bug in some older versions of the Turbo C library which causes the
+space used by temporary files created with "tmpfile()" not to be freed after
+an abnormal program exit.  If you check your disk afterwards, you will find
+cluster chains that are allocated but not used by a file.  This should not
+happen in cjpeg or djpeg, since we enable a signal catcher to explicitly close
+temp files before exiting.  But if you use the JPEG library with your own
+code, be sure to supply a signal catcher, or else use a different
+system-dependent memory manager.
+
+
 Cray:
 
 Should you be so fortunate as to be running JPEG on a Cray YMP, there is a
@@ -265,14 +373,19 @@
 insert a line reading "#pragma novector" just before the loop	
     for (i = 1; i <= (int) htbl->bits[l]; i++)
       huffsize[p++] = (char) l;
-in fix_huff_tbl (in V2, line 42 of jchuff.c and line 38 of jdhuff.c).  The
+in fix_huff_tbl (in V3, line 42 of jchuff.c and line 38 of jdhuff.c).  The
 usual symptom of not adding this line is a core-dump.  See Cray's SPR 48222.
 
 
 HP/Apollo DOMAIN:
 
-At least in version 10.3.5, the C compiler is ANSI but the system include
-files are not.  Use makefile.ansi and add -DNONANSI_INCLUDES to CFLAGS.
+With system release 10.4 or later, makefile.ansi should work OK.  If you have
+version 10.3.anything, you need to figure out whether you have the ANSI C
+compiler (version 6.7 or later) and whether you've installed the ANSI C
+include files (if so, the first line of <stdio.h> will mention ANSI C).
+If you have the ANSI C compiler but not the ANSI C include files, use
+makefile.ansi and add -DNONANSI_INCLUDES to CFLAGS.  If you have both,
+then makefile.ansi should work as is.  If neither, use makefile.unix.
 
 
 HP-UX:
@@ -290,6 +403,14 @@
 convert the code to K&R style (i.e., use makefile.unix).
 
 
+Macintosh MPW:
+
+We don't directly support MPW in the current release, but Larry Rosenstein
+reports that the JPEG code can be ported without very much trouble.  There
+are useful notes and conversion scripts in his kit for porting PBMPLUS to MPW.
+You can obtain the kit by FTP to ftp.apple.com, file /pub/lsr/pbmplus-port*.
+
+
 Macintosh Think C:
 
 You'll have to prepare project files for cjpeg and djpeg; we don't include
@@ -306,8 +427,8 @@
 options except const; you must also #define const.)
 
 jcmain and jdmain are set up to provide the usual command-line interface
-by means of Think's ccommand() library routine.  Anybody want to write a
-more Mac-like interface for us?
+by means of Think's ccommand() library routine.  A more Mac-like interface
+is in the works.
 
 
 MS-DOS, generic comments:
@@ -329,35 +450,68 @@
 can't use jmemdos.c for some reason --- for example, because you don't have
 a Microsoft-compatible assembler to assemble jmemdosa.asm --- you'll have
 to fall back to jmemansi.c or jmemname.c.  IMPORTANT: if you use either of
-those files, you will have to compile in a large-data memory model in order
-to get the right stdio library.  Too bad.
+the latter two files, you will have to compile in a large-data memory model
+in order to get the right stdio library.  Too bad.
 
 None of the above advice applies if you are using a 386 flat-memory-space
-environment, such as DJGPP or Watcom C.  For these compilers, do NOT define
-NEED_FAR_POINTERS, and do NOT use jmemdos.c.  Use jmemnobs.c if the
-environment supplies adequate virtual memory, otherwise use jmemansi.c or
-jmemname.c.
+environment, such as DJGPP or Watcom C.  (And you should use one if you have
+it, as performance will be much better than 8086-compatible code!)  For
+flat-memory-space compilers, do NOT define NEED_FAR_POINTERS, and do NOT use
+jmemdos.c.  Use jmemnobs.c if the environment supplies adequate virtual
+memory, otherwise use jmemansi.c or jmemname.c.
+
+Most MS-DOS compilers treat stdin/stdout as text files, so you must use
+two-file command line style.  But if your compiler has the setmode() library
+routine, you can define USE_SETMODE to get one-file style.  (Don't forget to
+change the "make test" script in the Makefile if you do so.)
+
+If you add more switches to CFLAGS in the DOS-specific makefiles, you are
+likely to run up against DOS's 128-byte command line length limit.  In that
+case, remove some "-Dsymbol" switches from CFLAGS and instead put
+corresponding "#define symbol" lines at the head of jinclude.h.
+
+
+MS-DOS, Borland C:
+
+Be sure to convert all the source files to DOS text format (CR/LF newlines).
+Although Borland C will often work OK with unmodified Unix (LF newlines)
+source files, sometimes it will give bogus compile errors.
+"Illegal character '#'" is the most common such error.
+
+Some versions of Borland's MAKE erroneously display the warning message about
+creating jmemsys.c, even after you have done so.  If this happens to you,
+delete the four lines beginning with "jmemsys.c:" from the Makefile.
 
 
 MS-DOS, DJGPP:
 
-The file egetopt.c conflicts with some library routines in DJGPP 1.05.
-Remove #include "egetopt.c" from jcmain.c and jdmain.c, and in each of
-those files change the egetopt(...) call to getopt(...).  This will be
-fixed more cleanly in some future version.  Use makefile.ansi, and put
-"-DTWO_FILE_COMMANDLINE" (but *not* -DMSDOS) in CFLAGS.
+Use makefile.ansi and jmemnobs.c, and put "-UMSDOS" in CFLAGS to undo the
+compiler's automatic definition of MSDOS.  Also put either "-DUSE_SETMODE" or
+"-DTWO_FILE_COMMANDLINE" in CFLAGS, depending on whether you prefer one-file
+or two-file command line style.  (If you choose two-file style, change the
+"make test" section of the Makefile accordingly.)  You'll also need to put the
+object-file lists into response files in order to circumvent DOS's 128-byte
+command line length limit at the final linking step.
 
 
 MS-DOS, Microsoft C:
 
-Some versions of MS C fail with an "out of macro expansion space" error
+Old versions of MS C fail with an "out of macro expansion space" error
 because they can't cope with the macro TRACEMS8 (defined in jpegdata.h).
 If this happens to you, the easiest solution is to change TRACEMS8 to
 expand to nothing.  You'll lose the ability to dump out JPEG coefficient
-tables with djpeg -d -d, but at least you can compile.
+tables with djpeg -debug -debug, but at least you can compile.
 
-makefile.mc6 (MS C 6.x makefile) has not been tested since jmemdosa.asm
-was added; we'd appreciate hearing whether it works or not.
+Original MS C 6.0 is buggy; it compiles incorrect code unless you turn off
+optimization (remove -O from CFLAGS).  That problem seems to have been fixed
+in 6.00A and later versions.  6.00A still generates a bogus "conditional
+expression is constant" warning in jrdppm.c, but the emitted code seems OK.
+
+
+SGI:
+
+Use makefile.ansi, but set "AR2= ar -ts" rather than "AR2= ranlib".  Also
+make any changes recommended by ckconfig.c.
 
 
 Sun:
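
The 16x16=>32 multiply trick described in the OPTIMIZATION section of SETUP
above can be sketched roughly as follows.  This is only an illustration, not
the actual MULTIPLY definitions from jfwddct.c or jrevdct.c: the names
sketch_multiply_wide and sketch_multiply_narrow are hypothetical, and the
sketch assumes a compiler where "int" is 32 bits and "short" is 16 bits.
Whether the narrow form really turns into a faster instruction depends on
your compiler, so time it as the section recommends.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not the IJG MULTIPLY macro. */

/* Baseline: both operands are already 32 bits wide, so the compiler
   sees a full 32x32=>32 bit multiplication. */
int32_t sketch_multiply_wide(int32_t var, int32_t constant)
{
    return var * constant;
}

/* Narrow form: cast both operands to 16 bits before multiplying.
   C promotes them back to int for the multiply (assumed 32 bits here),
   but the casts tell the compiler that only 16 bits of each operand
   matter, which may let it pick a 16x16=>32 bit instruction. */
int32_t sketch_multiply_narrow(int32_t var, int32_t constant)
{
    /* The trick is only valid when both inputs fit in 16 bits. */
    assert(var >= INT16_MIN && var <= INT16_MAX);
    assert(constant >= INT16_MIN && constant <= INT16_MAX);
    return (int32_t) ((int16_t) var * (int16_t) constant);
}
```

Both functions return the same product for any pair of 16-bit inputs, so the
self-test should still pass after switching; only the speed may differ.
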
diff --git a/USAGE b/USAGE
index 9ee86fe..ebce6df 100644
--- a/USAGE
+++ b/USAGE
@@ -22,10 +22,11 @@
 hints on incorporating the JPEG software into other programs.
 
 If you are on a Unix machine you may prefer to read the Unix-style manual
-pages in files cjpeg.1 and djpeg.1.
+pages in files cjpeg.1 and djpeg.1.  But also see the HINTS section below,
+which is not present in either manual page.
 
-NOTE: at some point we will probably redesign the user interface, so the
-command line switches described here will change.
+NOTE: the switch syntax has been redesigned since the v3 release of
+cjpeg/djpeg.  Switch names are now words instead of single letters.
 
 
 GENERAL USAGE
@@ -58,155 +59,244 @@
 of some Targa-format files.  You have to tell djpeg which format to generate.
 
 The only JPEG file format currently supported is the JFIF format.  Support for
-the TIFF/JPEG format will probably be added at some future date.
+the TIFF 6.0 JPEG format will probably be added at some future date.
+
+All switch names may be abbreviated; for example, -grayscale may be written
+-gray or -gr.  Most of the "basic" switches can be abbreviated to as little as
+one letter.  Upper and lower case are equivalent (-GIF is the same as -gif).
+British spellings are also accepted (e.g., -greyscale), though for brevity
+these are not mentioned below.
 
 
 CJPEG DETAILS
 
-The command line switches for cjpeg are:
+The basic command line switches for cjpeg are:
 
-	-Q quality	Scale quantization tables to adjust image quality.
+	-quality N	Scale quantization tables to adjust image quality.
 			Quality is 0 (worst) to 100 (best); default is 75.
 			(See below for more info.)
 
-	-o		Perform optimization of entropy encoding parameters.
-			Without this, default encoding parameters are used.
-			-o usually makes the JPEG file a little smaller, but
-			cjpeg runs somewhat slower and needs much more memory.
-			Image quality and speed of decompression are unaffected
-			by -o.
+	-grayscale	Create monochrome JPEG file from color input.
+			Be sure to use this switch when compressing a grayscale
+			GIF file, because cjpeg isn't bright enough to notice
+			whether a GIF file uses only shades of gray.  By
+			saying -grayscale, you'll get a smaller JPEG file that
+			takes less time to process.
 
-	-T		Input file is Targa format.  Targa files that contain
+	-optimize	Perform optimization of entropy encoding parameters.
+			Without this, default encoding parameters are used.
+			-optimize usually makes the JPEG file a little smaller,
+			but cjpeg runs somewhat slower and needs much more
+			memory.  Image quality and speed of decompression are
+			unaffected by -optimize.
+
+	-targa		Input file is Targa format.  Targa files that contain
 			an "identification" field will not be automatically
 			recognized by cjpeg; for such files you must specify
-			-T to force cjpeg to treat the input as Targa format.
+			-targa to make cjpeg treat the input as Targa format.
 
-	-I		Generate noninterleaved JPEG file (not yet supported).
+The -quality switch lets you trade off compressed file size against quality of
+the reconstructed image: the higher the quality setting, the larger the JPEG
+file, and the closer the output image will be to the original input.  Normally
+you want to use the lowest quality setting (smallest file) that decompresses
+into something visually indistinguishable from the original image.  For this
+purpose the quality setting should be between 50 and 95; the default of 75 is
+often about right.  If you see defects at -quality 75, then go up 5 or 10
+counts at a time until you are happy with the output image.  (The optimal
+setting will vary from one image to another.)
 
-	-a		Use arithmetic coding rather than Huffman coding.
-			(Not currently supported for legal reasons.)
+-quality 100 will generate a quantization table of all 1's, eliminating loss
+in the quantization step (but there is still information loss in subsampling,
+as well as roundoff error).  This setting is mainly of interest for
+experimental purposes.  Quality values above about 95 are NOT recommended for
+normal use; the compressed file size goes up dramatically for hardly any gain
+in output image quality.
 
-	-d		Enable debug printout.  More -d's give more printout.
-			Also, version information is printed at startup.
+In the other direction, quality values below 50 will produce very small files
+of low image quality.  Settings around 5 to 10 might be useful in preparing an
+index of a large image library, for example.  Try -quality 2 (or so) for some
+amusing Cubist effects.  (Note: quality values below about 25 generate 2-byte
+quantization tables, which are considered optional in the JPEG standard.
+cjpeg emits a warning message when you give such a quality value, because
+some commercial JPEG programs may be unable to decode the resulting file.)
 
-	-m memory	Set limit for amount of memory to use in processing
+Switches for advanced users:
+
+	-maxmemory N	Set limit for amount of memory to use in processing
 			large images.  Value is in thousands of bytes, or
 			millions of bytes if "M" is attached to the number.
-			For example, -m 4m selects 4000000 bytes.  If more
+			For example, -max 4m selects 4000000 bytes.  If more
 			space is needed, temporary files will be used.
 
-The -Q switch lets you trade off compressed file size against quality of the
-reconstructed image: the higher the -Q setting, the larger the JPEG file, and
-the closer the output image will be to the original input.  Normally you want
-to use the lowest -Q setting (smallest file) that decompresses into something
-visually indistinguishable from the original image.  For this purpose the -Q
-setting should be between 50 and 95; the default of 75 is often about right.
-If you see defects at -Q 75, then go up 5 or 10 counts at a time until you are
-happy with the output image.  (The optimal setting will vary from one image to
-another.)
+	-restart N	Emit a JPEG restart marker every N MCU rows, or every
+			N MCU blocks if "B" is attached to the number.
+			-restart 0 (the default) means no restart markers.
 
--Q 100 will generate a quantization table of all 1's, eliminating loss in the
-quantization step (but there is still information loss in subsampling, as well
-as roundoff error).  This setting is mainly of interest for experimental
-purposes.  -Q values above about 95 are NOT recommended for normal use; the
-compressed file size goes up dramatically for hardly any gain in output image
-quality.
+	-smooth N	Smooth the input image to eliminate dithering noise.
+			N, ranging from 1 to 100, indicates the strength of
+			smoothing.  0 (the default) means no smoothing.
 
-In the other direction, -Q values below 50 will produce very small files of
-low image quality.  Settings around 5 to 10 might be useful in preparing an
-index of a large image library, for example.  Try -Q 2 (or so) for some
-amusing Cubist effects.  (Note: -Q values below about 25 generate 2-byte
-quantization tables, which are considered optional in the JPEG standard.
-cjpeg emits a warning message when you give such a -Q value, because some
-commercial JPEG programs may be unable to decode the resulting file.)
+	-verbose	Enable debug printout.  More -v's give more printout.
+	or  -debug	Also, version information is printed at startup.
+
+The -restart option inserts extra markers that allow a JPEG decoder to
+resynchronize after a transmission error.  Without restart markers, any damage
+to a compressed file will usually ruin the image from the point of the error
+to the end of the image; with restart markers, the damage is usually confined
+to the portion of the image up to the next restart marker.  Of course, the
+restart markers occupy extra space.  We recommend -restart 1 for images that
+will be transmitted across unreliable networks such as Usenet.
+
+The -smooth option filters the input to eliminate fine-scale noise.  This is
+often useful when converting GIF files to JPEG: a moderate smoothing factor of
+10 to 50 gets rid of dithering patterns in the input file, resulting in a
+smaller JPEG file and a better-looking image.  Too large a smoothing factor
+will visibly blur the image, however.
+
+Switches for wizards:
+
+	-arithmetic	Use arithmetic coding rather than Huffman coding.
+			(Not currently supported for legal reasons.)
+
+	-nointerleave	Generate noninterleaved JPEG file (not yet supported).
+
+	-qtables file	Use the quantization tables given in the specified
+			file.  The file should contain one to four tables
+			(64 values each) as plain text.  Comments preceded by
+			'#' may be included in the file.  The tables are
+			implicitly numbered 0,1,etc.  If -quality N is also
+			specified, the values in the file are scaled according
+			to cjpeg's quality scaling curve.
+
+	-sample HxV[,...]	Set JPEG sampling factors.  If you specify
+			fewer H/V pairs than there are components, the
+			remaining components are set to 1x1 sampling.  The
+			default setting is equivalent to "-sample 2x2".
+
+The "wizard" switches are intended for experimentation with JPEG.  If you
+don't know what you are doing, DON'T USE THEM.  You can easily produce files
+with worse image quality and/or poorer compression than you'll get from the
+default settings.  Furthermore, these switches should not be used when making
+files intended for general use, because not all JPEG implementations will
+support unusual JPEG parameter settings.
 
 
 DJPEG DETAILS
 
-The command line switches for djpeg are:
+The basic command line switches for djpeg are:
 
-	-G		Select GIF output format (implies -q, with default
-			of 256 colors).
+	-colors N	Reduce image to at most N colors.  This reduces the
+	or -quantize N	number of colors used in the output image, so that it
+			can be displayed on a colormapped display or stored in
+			a colormapped file format.  For example, if you have
+			an 8-bit display, you'd need to reduce to 256 or fewer
+			colors.  (-colors is the recommended name; -quantize
+			is provided only for backwards compatibility.)
 
-	-P		Select PPM or PGM output format (this is the default).
-			PGM is emitted if the JPEG file is gray-scale or if -g
-			is specified.
+	-gif		Select GIF output format.  Since GIF does not support
+			more than 256 colors, -colors 256 is assumed (unless
+			you specify a smaller number of colors).
 
-	-R		Select RLE output format.  Requires URT library.
+	-pnm		Select PBMPLUS (PPM/PGM) output format (this is the
+			default format).  PGM is emitted if the JPEG file is
+			gray-scale or if -grayscale is specified; otherwise
+			PPM is emitted.
 
-	-T		Select Targa output format.  Gray-scale format is
-			emitted if the JPEG file is gray-scale or if -g is
-			specified; otherwise, colormapped format is emitted
-			if -q is specified; otherwise, 24-bit full-color
-			format is emitted.
+	-rle		Select RLE output format.  (Requires URT library.)
 
-	-g		Force gray-scale output even if input is color.
+	-targa		Select Targa output format.  Gray-scale format is
+			emitted if the JPEG file is gray-scale or if
+			-grayscale is specified; otherwise, colormapped format
+			is emitted if -colors is specified; otherwise, 24-bit
+			full-color format is emitted.
 
-	-q N		Quantize to N colors.  This reduces the number of
-			colors in the output image so that it can be displayed
-			on a colormapped display or stored in a colormapped
-			file format.  For example, if you have an 8-bit
-			display, you'd need to quantize to 256 or fewer colors.
+Switches for advanced users:
 
-	-D		Do not use dithering in color quantization.
+	-blocksmooth	Perform cross-block smoothing.  This is quite
+			memory-intensive and only seems to improve the image
+			at very low quality settings (-quality 10 to 20 or so).
+			At normal quality settings it may make things worse.
+
+	-grayscale	Force gray-scale output even if JPEG file is color.
+			Useful for viewing on monochrome displays.
+
+	-maxmemory N	Set limit for amount of memory to use in processing
+			large images.  Value is in thousands of bytes, or
+			millions of bytes if "M" is attached to the number.
+			For example, -max 4m selects 4000000 bytes.  If more
+			space is needed, temporary files will be used.
+
+	-nodither	Do not use dithering in color quantization.
 			By default, Floyd-Steinberg dithering is applied when
 			quantizing colors, but on some images dithering may
 			result in objectionable "graininess".  If that
-			happens, you can turn off dithering with -D.
-			-D is ignored unless you also say -q or -G.
+			happens, you can turn off dithering with -nodither.
+			-nodither is ignored unless you also say -colors N.
 
-	-1		Use one-pass instead of two-pass color quantization.
+	-onepass	Use one-pass instead of two-pass color quantization.
 			The one-pass method is faster and needs less memory,
-			but it produces a lower-quality image.
-			-1 is ignored unless you also say -q or -G.  Also,
+			but it produces a lower-quality image.  -onepass is
+			ignored unless you also say -colors N.  Also,
 			the one-pass method is always used for gray-scale
 			output (the two-pass method is no improvement then).
 
-	-b		Perform cross-block smoothing.  This is quite
-			memory-intensive and only seems to improve the image
-			at very low quality settings (-Q 10 to 20 or so).
-			At normal -Q settings it may make the image worse.
-
-	-d		Enable debug printout.  More -d's give more printout.
-			Also, version information is printed at startup.
-
-	-m memory	Set limit for amount of memory to use in processing
-			large images.  Value is in thousands of bytes, or
-			millions of bytes if "M" is attached to the number.
-			For example, -m 4m selects 4000000 bytes.  If more
-			space is needed, temporary files will be used.
+	-verbose	Enable debug printout.  More -v's give more printout.
+	or  -debug	Also, version information is printed at startup.
 
 
 HINTS
 
+Color GIF files are not the ideal input for JPEG; JPEG is really intended for
+compressing full-color (24-bit) images.  In particular, don't try to convert
+cartoons, line drawings, and other images that have only a few distinct
+colors.  GIF works great on these, JPEG does not.  If you want to convert a
+GIF to JPEG, you should experiment with cjpeg's -quality and -smooth options
+to get a satisfactory conversion.  -smooth 10 or so is often helpful.
+
 Avoid running an image through a series of JPEG compression/decompression
 cycles.  Image quality loss will accumulate; after ten or so cycles the image
 may be noticeably worse than it was after one cycle.  It's best to use a
 lossless format while manipulating an image, then convert to JPEG format when
 you are ready to file the image away.
 
-The -o option to cjpeg is worth using when you are making a "final" version
-for posting or archiving.  It's also a win when you are using low -Q settings
-to make very small JPEG files; the percentage improvement is often a lot more
-than it is on larger files.
+The -optimize option to cjpeg is worth using when you are making a "final"
+version for posting or archiving.  It's also a win when you are using low
+quality settings to make very small JPEG files; the percentage improvement
+is often a lot more than it is on larger files.
 
-The default memory usage limit (-m) is set when the software is compiled.
-If you get an "insufficient memory" error, try specifying a smaller -m value,
-even -m 0 to use the absolute minimum space.  You may want to recompile with
-a smaller default value if this happens often.
+When making images to be posted on Usenet, we recommend using cjpeg's option
+-restart 1.  This option limits the damage done to a compressed image by
+netnews transmission errors.
 
-djpeg with two-pass color quantization requires a good deal of space; on
-MS-DOS machines it may run out of memory even with -m 0.  In that case you
-can still decompress, with some loss of image quality, by specifying -1
-for one-pass quantization.
+The default memory usage limit (-maxmemory) is set when the software is
+compiled.  If you get an "insufficient memory" error, try specifying a smaller
+-maxmemory value, even -maxmemory 0 to use the absolute minimum space.  You
+may want to recompile with a smaller default value if this happens often.
+
+On machines that have "environment" variables, you can define the environment
+variable JPEGMEM to set the default memory limit.  The value is specified as
+described for the -maxmemory switch.  JPEGMEM overrides the default value
+specified when the program was compiled, and itself is overridden by an
+explicit -maxmemory switch.
+
+On MS-DOS machines, -maxmemory is the amount of main (conventional) memory to
+use.  (Extended or expanded memory is also used if available.)  Most
+DOS-specific versions of this software do their own memory space estimation
+and do not need -maxmemory.
+
+djpeg with two-pass color quantization requires a good deal of memory; on
+MS-DOS machines it may run out of memory even with -maxmemory 0.  In that case
+you can still decompress, with some loss of image quality, by specifying
+-onepass for one-pass quantization.
 
 If more space is needed than will fit in the available main memory (as
-determined by -m), temporary files will be used.  (MS-DOS versions will try to
-get extended or expanded memory first.)  The temporary files are often rather
-large: in typical cases they occupy three bytes per pixel, for example
-3*800*600 = 1.44Mb for an 800x600 image.  If you don't have enough free disk
-space, leave out -o (for cjpeg) or specify -1 (for djpeg).  On MS-DOS, the
-temporary files are created in the directory named by the TMP or TEMP
-environment variable, or in the current directory if neither of those exist.
-Amiga implementations put the temp files in the directory named by JPEGTMP:,
-so be sure to assign JPEGTMP: to a disk partition with adequate free space.
+determined by -maxmemory), temporary files will be used.  (MS-DOS versions
+will try to get extended or expanded memory first.)  The temporary files are
+often rather large: in typical cases they occupy three bytes per pixel, for
+example 3*800*600 = 1.44MB for an 800x600 image.  If you don't have enough
+free disk space, leave out -optimize (for cjpeg) or specify -onepass (for
+djpeg).  On MS-DOS, the temporary files are created in the directory named by
+the TMP or TEMP environment variable, or in the current directory if neither
+of those exists.  Amiga implementations put the temp files in the directory
+named by JPEGTMP:, so be sure to assign JPEGTMP: to a disk partition with
+adequate free space.
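
The memory-limit rules described in USAGE above can be sketched as follows:
a value counts thousands of bytes, or millions if "M" is attached, and the
compiled-in default is overridden by JPEGMEM, which is itself overridden by
an explicit switch.  This is a minimal illustration, not the parsing code in
jcmain.c or jdmain.c: both function names are hypothetical, and the sketch
makes no attempt to guard against arithmetic overflow on huge values.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: convert a memory-limit string such as "500" or
   "4m" into a byte count.  Returns -1 if the string is malformed. */
long sketch_parse_maxmemory(const char *arg)
{
    char *end;
    long value;

    if (arg == NULL || !isdigit((unsigned char) arg[0]))
        return -1L;
    value = strtol(arg, &end, 10);
    if (*end == 'm' || *end == 'M') {
        if (end[1] != '\0')
            return -1L;            /* trailing junk after the "M" */
        return value * 1000000L;   /* millions of bytes */
    }
    if (*end != '\0')
        return -1L;                /* unknown suffix */
    return value * 1000L;          /* thousands of bytes */
}

/* Hypothetical helper: apply the precedence rule.  Pass NULL for a
   source that is absent; malformed values are ignored. */
long sketch_effective_limit(long compiled_default,
                            const char *jpegmem_env,
                            const char *switch_arg)
{
    long limit = compiled_default;
    long parsed;

    assert(compiled_default >= 0);
    if (jpegmem_env != NULL &&
        (parsed = sketch_parse_maxmemory(jpegmem_env)) >= 0)
        limit = parsed;            /* JPEGMEM beats the compiled default */
    if (switch_arg != NULL &&
        (parsed = sketch_parse_maxmemory(switch_arg)) >= 0)
        limit = parsed;            /* an explicit switch beats JPEGMEM */
    return limit;
}
```

A caller might pass getenv("JPEGMEM") as the second argument and the text
following the switch as the third, using NULL when no switch was given.
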
diff --git a/ansi2knr.c b/ansi2knr.c
index 04e4789..13310d9 100644
--- a/ansi2knr.c
+++ b/ansi2knr.c
@@ -1,9 +1,3 @@
-/*
- * Received from Peter Deutsch (ghost@aladdin.com)
- * Fri, 26 Apr 91 10:10:10 PDT
- * Small portability improvements by Tom Lane
- */
-
 /* Copyright (C) 1989, 1991 Aladdin Enterprises.  All rights reserved.
    Distributed by Free Software Foundation, Inc.
 
@@ -225,13 +219,12 @@
 #define isidchar(ch) (isalnum(ch) || (ch) == '_')
 #define isidfirstchar(ch) (isalpha(ch) || (ch) == '_')
 
-int
 main(argc, argv)
     int argc;
     char *argv[];
 {	FILE *in, *out;
-#define bufsize 500			/* arbitrary size */
-	char buf[bufsize+1];
+#define bufsize 5000			/* arbitrary size */
+	char *buf;
 	char *line;
 	switch ( argc )
 	   {
@@ -253,6 +246,7 @@
 		exit(1);
 	   }
 	fprintf(out, "#line 1 \"%s\"\n", argv[1]);
+	buf = malloc(bufsize);
 	line = buf;
 	while ( fgets(line, (unsigned)(buf + bufsize - line), in) != NULL )
 	   {	switch ( test1(buf) )
@@ -262,7 +256,9 @@
 			break;
 		case -1:		/* maybe the start of a function */
 			line = buf + strlen(buf);
-			continue;
+			if ( line != buf + (bufsize - 1) ) /* overflow check */
+				continue;
+			/* falls through */
 		default:		/* not a function */
 			fputs(buf, out);
 			break;
@@ -270,6 +266,7 @@
 		line = buf;
 	   }
 	if ( line != buf ) fputs(buf, out);
+	free(buf);
 	fclose(out);
 	fclose(in);
 	return 0;
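
The overflow check added to ansi2knr.c above relies on a property of fgets:
it stores at most one character fewer than the size it is given, so a buffer
of bufsize bytes is full exactly when the accumulated string length reaches
bufsize - 1.  The following sketch shows that bookkeeping in isolation.  It
is not code from ansi2knr.c: sketch_append and sketch_must_flush are
hypothetical names, and sketch_append merely imitates the length bound of
fgets on an in-memory string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fgets(buf + used, bufsize - used, in):
   copies at most (room - 1) characters of piece and NUL-terminates.
   Returns the new string length held in buf. */
size_t sketch_append(char *buf, size_t bufsize, size_t used,
                     const char *piece)
{
    size_t room;
    size_t n;

    assert(used < bufsize);        /* there must be space for the NUL */
    room = bufsize - used;         /* remaining bytes, NUL included */
    n = strlen(piece);
    if (n > room - 1)
        n = room - 1;              /* same bound that fgets enforces */
    memcpy(buf + used, piece, n);
    buf[used + n] = '\0';
    return used + n;
}

/* Nonzero when no further text can be appended, which is the case
   where the patched loop stops accumulating and writes the buffer out. */
int sketch_must_flush(size_t used, size_t bufsize)
{
    return used == bufsize - 1;
}
```

With an 8-byte buffer, appending "abc" and then "defgh" leaves "abcdefg" in
the buffer and sketch_must_flush reports that it is time to write it out.
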
diff --git a/architecture b/architecture
index e27ed18..bf3385f 100644
--- a/architecture
+++ b/architecture
@@ -1,5 +1,5 @@
 
-	JPEG SYSTEM ARCHITECTURE		29-FEB-92
+	JPEG SYSTEM ARCHITECTURE		1-DEC-92
 
 
 This file provides an overview of the "architecture" of the portable JPEG
@@ -84,7 +84,7 @@
    grayscale (unless we support mapping color inputs to grayscale, which
    would most easily be done here).  Gamma adjustment may also be needed here.
 
-3. Subsampling (reduction of number of samples in some color components).
+3. Downsampling (reduction of number of samples in some color components).
    This step operates independently on each color component.
 
 4. MCU extraction (creation of a single sequence of 8x8 sample blocks).
@@ -132,7 +132,7 @@
 
 D. A memory management module to deal with small-memory machines.  This must
    create the illusion of virtual memory for certain large data structures
-   (e.g., the subsampled image or the transformed coefficients).
+   (e.g., the downsampled image or the transformed coefficients).
    The interface to this must be defined to minimize the overhead incurred,
    especially on virtual-memory machines where the module won't do much.
 
@@ -175,10 +175,10 @@
 
 6. Inverse DCT transformation of each 8x8 block.
 
-7. De-subsampling.  At this point a pixel image of the original dimensions
+7. Upsampling.  At this point a pixel image of the original dimensions
    has been recreated.
 
-8. Post-subsampling smoothing.  This can be combined with de-subsampling,
+8. Post-upsampling smoothing.  This can be combined with upsampling,
    by using a convolution-like calculation to generate each output pixel
    directly from one or more input pixels.
 
@@ -279,7 +279,7 @@
 as "((value)&0xFF)" on signed-char machines and "(value)" elsewhere.
 
 With these conventions, JSAMPLE values can be assumed to be >= 0.  This should
-simplify correct rounding during subsampling, etc.  The JPEG draft's
+simplify correct rounding during downsampling, etc.  The JPEG draft's
 specification that sample values run from -128..127 will be accommodated by
 subtracting 128 just as the sample value is copied into the source array for
 the DCT step (this will be an array of signed shorts or longs).  Similarly,
@@ -459,9 +459,9 @@
 row of MCUs; this may be from one to four blocks high (eight to thirty-two
 samples) depending on the sampling factors.  The height and width of an MCU
 row may be different in each component.  (Note that the height and width of an
-MCU row changes at the subsampling and de-subsampling steps.  An unsubsampled
+MCU row changes at the downsampling and upsampling steps.  An unsubsampled
 image has the same size in each component.  The preceding statements apply to
-the subsampled dimensions.)
+the downsampled dimensions.)
 
 For example, consider a 1024-pixel-wide image using (2h:2v)(1h:1v)(1h:1v)
 subsampling.  In the noninterleaved case, an MCU row of Y would contain 8x1024
@@ -502,7 +502,7 @@
 For similar reasons, one MCU is also the best chunk size for the frequency
 coefficient quantization and dequantization steps.
 
-For subsampling and desubsampling, the best chunk size is to have each call
+For downsampling and upsampling, the best chunk size is to have each call
 transform Vk sample rows from or to Vmax sample rows (Vk = this component's
 vertical sampling factor, Vmax = largest vertical sampling factor).  There are
 eight such chunks in each MCU row.  Using a whole MCU row as the chunk size
@@ -578,12 +578,12 @@
    subroutine instead of a method.  Depends on whether we want to be able to
    use alternative, optimized methods.)
 
-4. Subsampling: this will be applied to one component at a time.
-	subsample_init: initialize (precalculate convolution factors, for
-			example).  This will be called once per scan.
-	subsample: Given a sample array, reduce it to a smaller number of
-		   samples using specified sampling factors.
-	subsample_term: clean up at the end of a scan.
+4. Downsampling: this will be applied to one component at a time.
+	downsample_init: initialize (precalculate convolution factors, for
+			 example).  This will be called once per scan.
+	downsample: Given a sample array, reduce it to a smaller number of
+		    samples using specified sampling factors.
+	downsample_term: clean up at the end of a scan.
    If the current component has vertical sampling factor Vk and the largest
    sampling factor is Vmax, then the input is always Vmax sample rows (whose
    width is a multiple of Hmax) and the output is always Vk sample rows.
@@ -621,11 +621,11 @@
    MCU extraction; hard to see a need for multiple instantiations here.)
 
 8. Entropy encoding (Huffman or arithmetic).
-	entropy_encoder_init: prepare for one scan.
+	entropy_encode_init: prepare for one scan.
 	entropy_encode: accepts an MCU's worth of quantized coefficients,
 			encodes and outputs them.
-	entropy_encoder_term: finish up at end of a scan (dump any buffered
-			      bytes, for example).
+	entropy_encode_term: finish up at end of a scan (dump any buffered
+			     bytes, for example).
    The data output by this module will be sent to the entropy_output method
    provided by the pipeline controller.  (It will probably be worth using
    buffering to pass multiple bytes per call of the output method.)  The
@@ -819,6 +819,7 @@
 			  after read_file_header and again after each scan;
 			  it returns TRUE if it finds SOS, FALSE if EOI.
 	read_jpeg_data: fetch data for entropy decoder.
+	resync_to_restart: try to recover from bogus data (see below).
 	read_scan_trailer: finish up after one scan, prepare for another call
 			   of read_scan_header (may be a no-op).
 	read_file_trailer: finish up at end of file (probably a no-op).
@@ -836,6 +837,19 @@
    opened the input file and possibly read part of the header before
    read_file_header is called.)
 
+   When reading a file with a nonzero restart interval, the entropy decoder
+   expects to see a correct sequence of restart markers.  In some cases, these
+   markers may be synthesized by the file-format module (a TIFF reader might
+   do so, for example, using tile boundary pointers to determine where the
+   restart intervals fall).  If the incoming data is corrupted, the entropy
+   decoder will read as far as the next JPEG marker, which may or may not be
+   the expected next restart marker.  If it isn't, resync_to_restart is called
+   to try to locate a good place to resume reading.  We make this heuristic a
+   file-format-dependent operation since some file formats may have special
+   info that's not available to the entropy decoder (again, TIFF is an
+   example).  Note that resync_to_restart is NOT called at the end of a scan;
+   it is read_scan_trailer's responsibility to resync there.
+
    NOTE: for JFIF/raw-JPEG file format, the read_jpeg_data routine is actually
    supplied by the user interface; the jrdjfif module uses read_jpeg_data
    internally to scan the input stream.  This makes it possible for the user
@@ -847,17 +861,18 @@
    header module, if indeed it is practical at all.
 
 2. Entropy (Huffman or arithmetic) decoding of the coefficient sequence.
-	entropy_decoder_init: prepare for one scan.
+	entropy_decode_init: prepare for one scan.
 	entropy_decode: decodes and returns an MCU's worth of quantized
 			coefficients per call.
-	entropy_decoder_term: finish up after a scan (may be a no-op).
+	entropy_decode_term: finish up after a scan (may be a no-op).
    This will read raw data by calling the read_jpeg_data method (I don't see
    any reason to provide a further level of indirection).
    (This hides which entropy encoding method is in use.)
 
 3. Quantization descaling and zigzag reordering of the elements in each 8x8
-   block.  (This can probably be a plain subroutine called once per block;
-   hard to see a need for multiple instantiations here.)
+   block.  This will be folded into entropy_decode for efficiency reasons:
+   many of the coefficients are zeroes, and this can be exploited most easily
+   within entropy_decode since the encoding explicitly skips zeroes.
 
 4. MCU disassembly (conversion of a possibly interleaved sequence of 8x8
    blocks back to separate components in pixel map order).
@@ -867,7 +882,7 @@
 			  coefficient image array.
 	disassemble_term: clean up at the end of a scan.
    Probably this should be called once per MCU row and should call the
-   preceding two objects repeatedly to obtain the row's data.  The output is
+   entropy decoder repeatedly to obtain the row's data.  The output is
    always a multiple of an MCU's dimensions.
    (An object on the grounds that multiple instantiations might be useful.)
 
@@ -889,14 +904,14 @@
    labor in compression.  Note that the actual IDCT code is a separate source
    file.
 
-7. De-subsampling and smoothing: this will be applied to one component at a
+7. Upsampling and smoothing: this will be applied to one component at a
    time.  Note that cross-pixel smoothing, which was a separate step in the
    prototype code, will now be performed simultaneously with expansion.
-	unsubsample_init: initialize (precalculate convolution factors, for
-			  example).  This will be called once per scan.
-	unsubsample: Given a sample array, enlarge it by specified sampling
-		     factors.
-	unsubsample_term: clean up at the end of a scan.
+	upsample_init: initialize (precalculate convolution factors, for
+		       example).  This will be called once per scan.
+	upsample: Given a sample array, enlarge it by specified sampling
+		  factors.
+	upsample_term: clean up at the end of a scan.
    If the current component has vertical sampling factor Vk and the largest
    sampling factor is Vmax, then the input is always Vk sample rows (whose
    width is a multiple of Hk) and the output is always Vmax sample rows.
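The architecture notes above describe downsampling as turning Vmax input rows
into Vk output rows, and point out that non-negative JSAMPLE values simplify
rounding.  A minimal sketch for the case Vmax = 2, Vk = 1 with 2:1 horizontal
reduction follows.  It assumes a plain 2x2 box average, which is one possible
filter and not necessarily the one the library uses; downsample_h2v2 and the
local JSAMPLE typedef are illustrative names only.

```c
#include <assert.h>

typedef unsigned char JSAMPLE;   /* local stand-in for the library's type */

/* Reduce two input rows to one output row of half the width by averaging
 * each 2x2 group of samples.  Because every sample is >= 0, adding 2 before
 * the shift rounds to nearest without any sign special-casing. */
void downsample_h2v2(const JSAMPLE *row0, const JSAMPLE *row1,
                     JSAMPLE *out, int out_width)
{
    int i;

    assert(out_width >= 0);
    for (i = 0; i < out_width; i++) {
        int sum = row0[2 * i] + row0[2 * i + 1]
                + row1[2 * i] + row1[2 * i + 1];
        out[i] = (JSAMPLE) ((sum + 2) >> 2);
    }
}
```

Each call consumes Vmax = 2 rows and produces Vk = 1 row, which is the chunk
size the notes recommend; eight such calls cover one MCU row.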
diff --git a/cjpeg.1 b/cjpeg.1
index c2ca517..c76a912 100644
--- a/cjpeg.1
+++ b/cjpeg.1
@@ -1,16 +1,46 @@
-.TH CJPEG 1 "28 February 1992"
+.TH CJPEG 1 "4 November 1992"
 .SH NAME
 cjpeg \- compress an image file to a JPEG file
 .SH SYNOPSIS
 .B cjpeg
 [
-.BI \-Q " quality"
+.BI \-quality " N"
 ]
 [
-.B \-oTIad
+.B \-grayscale
 ]
 [
-.BI \-m " memory"
+.B \-optimize
+]
+[
+.B \-targa
+]
+[
+.BI \-maxmemory " N"
+]
+[
+.BI \-restart " N"
+]
+[
+.BI \-smooth " N"
+]
+[
+.B \-verbose
+]
+[
+.B \-debug
+]
+[
+.B \-arithmetic
+]
+[
+.B \-nointerleave
+]
+[
+.BI \-qtables " file"
+]
+[
+.BI \-sample " HxV[,...]"
 ]
 [
 .I filename
@@ -21,106 +51,190 @@
 .B cjpeg
 compresses the named image file, or the standard input if no file is
 named, and produces a JPEG/JFIF file on the standard output.
-The currently supported image file formats are: PPM (PBMPLUS color
+The currently supported input file formats are: PPM (PBMPLUS color
 format), PGM (PBMPLUS gray-scale format), GIF, Targa, and RLE (Utah Raster
 Toolkit format).  (RLE is supported only if the URT library is available.)
 .SH OPTIONS
+All switch names may be abbreviated; for example,
+.B \-grayscale
+may be written
+.B \-gray
+or
+.BR \-gr .
+Most of the "basic" switches can be abbreviated to as little as one letter.
+Upper and lower case are equivalent (thus
+.B \-GIF
+is the same as
+.BR \-gif ).
+British spellings are also accepted (e.g.,
+.BR \-greyscale ),
+though for brevity these are not mentioned below.
+.PP
+The basic switches are:
 .TP
-.BI \-Q " quality"
+.BI \-quality " N"
 Scale quantization tables to adjust image quality.  Quality is 0 (worst) to
 100 (best); default is 75.  (See below for more info.)
 .TP
-.B \-o
+.B \-grayscale
+Create monochrome JPEG file from color input.  Be sure to use this switch when
+compressing a grayscale GIF file, because
+.B cjpeg
+isn't bright enough to notice whether a GIF file uses only shades of gray.
+By saying
+.BR \-grayscale ,
+you'll get a smaller JPEG file that takes less time to process.
+.TP
+.B \-optimize
 Perform optimization of entropy encoding parameters.  Without this, default
 encoding parameters are used.
-.B \-o
+.B \-optimize
 usually makes the JPEG file a little smaller, but
 .B cjpeg
 runs somewhat slower and needs much more memory.  Image quality and speed of
 decompression are unaffected by
-.BR \-o .
+.BR \-optimize .
 .TP
-.B \-T
+.B \-targa
 Input file is Targa format.  Targa files that contain an "identification"
 field will not be automatically recognized by
 .BR cjpeg ;
 for such files you must specify
-.B \-T
-to force
+.B \-targa
+to make
 .B cjpeg
-to treat the input as Targa format.
-.TP
-.B \-I
-Generate noninterleaved JPEG file (not yet supported).
-.TP
-.B \-a
-Use arithmetic coding rather than Huffman coding (not currently
-supported for legal reasons).
-.TP
-.B \-d
-Enable debug printout.  More
-.BR \-d 's
-give more output.  Also, version information is printed at startup.
-.TP
-.BI \-m " memory"
-Set limit for amount of memory to use in processing large images.  Value is
-in thousands of bytes, or millions of bytes if "M" is attached to the
-number.  For example,
-.B \-m 4m
-selects 4000000 bytes.  If more space is needed, temporary files will be used.
+treat the input as Targa format.
 .PP
 The
-.B \-Q
+.B \-quality
 switch lets you trade off compressed file size against quality of the
-reconstructed image: the higher the
-.B \-Q
-setting, the larger the JPEG file, and the closer the output image will be to
-the original input.  Normally you want to use the lowest
-.B \-Q
-setting (smallest file) that decompresses into something visually
-indistinguishable from the original image.  For this purpose the
-.B \-Q
-setting should be between 50 and 95; the default of 75 is often about right.
-If you see defects at 
-.B \-Q
+reconstructed image: the higher the quality setting, the larger the JPEG file,
+and the closer the output image will be to the original input.  Normally you
+want to use the lowest quality setting (smallest file) that decompresses into
+something visually indistinguishable from the original image.  For this
+purpose the quality setting should be between 50 and 95; the default of 75 is
+often about right.  If you see defects at
+.B \-quality
 75, then go up 5 or 10 counts at a time until you are happy with the output
 image.  (The optimal setting will vary from one image to another.)
 .PP
-.B \-Q
+.B \-quality
 100 will generate a quantization table of all 1's, eliminating loss in the
 quantization step (but there is still information loss in subsampling, as well
 as roundoff error).  This setting is mainly of interest for experimental
-purposes.  
-.B \-Q
-values above about 95 are
+purposes.  Quality values above about 95 are
 .B not
 recommended for normal use; the compressed file size goes up dramatically for
 hardly any gain in output image quality.
 .PP
-In the other direction, 
-.B \-Q
-values below 50 will produce very small files of low image quality.  Settings
-around 5 to 10 might be useful in preparing an index of a large image library,
-for example.  Try
-.B \-Q
-2 (or so) for some amusing Cubist effects.  (Note: 
-.B \-Q
+In the other direction, quality values below 50 will produce very small files
+of low image quality.  Settings around 5 to 10 might be useful in preparing an
+index of a large image library, for example.  Try
+.B \-quality
+2 (or so) for some amusing Cubist effects.  (Note: quality
 values below about 25 generate 2-byte quantization tables, which are
-considered optional in the JPEG standard.  
+considered optional in the JPEG standard.
 .B cjpeg
-emits a warning message when you give such a
-.B \-Q
-value, because some commercial JPEG programs may be unable to decode the
-resulting file.)
+emits a warning message when you give such a quality value, because some
+commercial JPEG programs may be unable to decode the resulting file.)
+.PP
+Switches for advanced users:
+.TP
+.BI \-maxmemory " N"
+Set limit for amount of memory to use in processing large images.  Value is
+in thousands of bytes, or millions of bytes if "M" is attached to the
+number.  For example,
+.B \-max 4m
+selects 4000000 bytes.  If more space is needed, temporary files will be used.
+.TP
+.BI \-restart " N"
+Emit a JPEG restart marker every N MCU rows, or every N MCU blocks if "B" is
+attached to the number.
+.B \-restart 0
+(the default) means no restart markers.
+.TP
+.BI \-smooth " N"
+Smooth the input image to eliminate dithering noise.  N, ranging from 1 to
+100, indicates the strength of smoothing.  0 (the default) means no smoothing.
+.TP
+.B \-verbose
+Enable debug printout.  More
+.BR \-v 's
+give more output.  Also, version information is printed at startup.
+.TP
+.B \-debug
+Same as
+.BR \-verbose .
+.PP
+The
+.B \-restart
+option inserts extra markers that allow a JPEG decoder to resynchronize after
+a transmission error.  Without restart markers, any damage to a compressed
+file will usually ruin the image from the point of the error to the end of the
+image; with restart markers, the damage is usually confined to the portion of
+the image up to the next restart marker.  Of course, the restart markers
+occupy extra space.  We recommend
+.B \-restart 1
+for images that will be transmitted across unreliable networks such as Usenet.
+.PP
+The
+.B \-smooth
+option filters the input to eliminate fine-scale noise.  This is often useful
+when converting GIF files to JPEG: a moderate smoothing factor of 10 to 50
+gets rid of dithering patterns in the input file, resulting in a smaller JPEG
+file and a better-looking image.  Too large a smoothing factor will visibly
+blur the image, however.
+.PP
+Switches for wizards:
+.TP
+.B \-arithmetic
+Use arithmetic coding rather than Huffman coding.  (Not currently
+supported for legal reasons.)
+.TP
+.B \-nointerleave
+Generate noninterleaved JPEG file (not yet supported).
+.TP
+.BI \-qtables " file"
+Use the quantization tables given in the specified file.  The file should
+contain one to four tables (64 values each) as plain text.  Comments preceded
+by '#' may be included in the file.  The tables are implicitly numbered
+0,1,etc.  If
+.B \-quality
+N is also specified, the values in the file are scaled according to
+.BR cjpeg 's
+quality scaling curve.
+.TP
+.BI \-sample " HxV[,...]"
+Set JPEG sampling factors.  If you specify fewer H/V pairs than there are
+components, the remaining components are set to 1x1 sampling.  The default
+setting is equivalent to \fB\-sample 2x2\fR.
+.PP
+The "wizard" switches are intended for experimentation with JPEG.  If you
+don't know what you are doing, \fBdon't use them\fR.  You can easily produce
+files with worse image quality and/or poorer compression than you'll get from
+the default settings.  Furthermore, these switches should not be used when
+making files intended for general use, because not all JPEG implementations
+will support unusual JPEG parameter settings.
 .SH EXAMPLES
 .LP
 This example compresses the PPM file foo.ppm with a quality factor of
 60 and saves the output as foo.jpg:
 .IP
-.B cjpeg \-Q
+.B cjpeg \-quality
 .I 60 foo.ppm
 .B >
 .I foo.jpg
+.SH ENVIRONMENT
+.TP
+.B JPEGMEM
+If this environment variable is set, its value is the default memory limit.
+The value is specified as described for the
+.B \-maxmemory
+switch.
+.B JPEGMEM
+overrides the default value specified when the program was compiled, and
+itself is overridden by an explicit
+.BR \-maxmemory .
 .SH SEE ALSO
 .BR djpeg (1)
 .br
@@ -137,8 +251,8 @@
 Not all variants of Targa file format are supported.
 .PP
 The
-.B -T
+.B \-targa
 switch is not a bug, it's a feature.  (It would be a bug if the Targa format
 designers had not been clueless.)
 .PP
-Not as fast as we'd like.
+Still not as fast as we'd like.
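The cjpeg.1 text above says that switch names may be abbreviated and that case
is ignored.  One way such matching could work is a case-insensitive prefix test
with a minimum length per switch.  The sketch below is an assumption made for
illustration: switch_matches and its minchars parameter are hypothetical and
are not taken from the cjpeg source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical matcher: return 1 if `arg` is a case-insensitive prefix of
 * `keyword` that is at least `minchars` characters long, else 0. */
int switch_matches(const char *arg, const char *keyword, int minchars)
{
    int n = 0;

    assert(arg != NULL && keyword != NULL);
    while (arg[n] != '\0') {
        if (keyword[n] == '\0')
            return 0;                     /* arg is longer than keyword */
        if (tolower((unsigned char) arg[n]) !=
            tolower((unsigned char) keyword[n]))
            return 0;                     /* mismatch, ignoring case */
        n++;
    }
    return n >= minchars;                 /* long enough to be unambiguous? */
}
```

Under this scheme a British spelling such as greyscale would simply be a second
keyword checked with the same function.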
diff --git a/ckconfig.c b/ckconfig.c
index 63fa116..687d2c6 100644
--- a/ckconfig.c
+++ b/ckconfig.c
@@ -218,8 +218,9 @@
     printf("\nMost of the changes recommended by this program can be made either\n");
     printf("by editing jconfig.h, or by adding -Dsymbol switches to the CFLAGS\n");
     printf("line in your Makefile.  (Some PC compilers expect /Dsymbol instead.)\n");
-    printf("The CFLAGS method is simpler, but if your system doesn't use makefiles,\n");
-    printf("or if your compiler doesn't support -D, then you must change jconfig.h.\n");
+    printf("The CFLAGS method is simpler, but if your compiler doesn't support -D,\n");
+    printf("then you must change jconfig.h.  Also, it's best to change jconfig.h\n");
+    printf("if you plan to use the JPEG software as a library for other programs.\n");
     any_changes = 1;
   }
   printf("\n");			/* blank line before each problem report */
diff --git a/djpeg.1 b/djpeg.1
index df4064c..be5a815 100644
--- a/djpeg.1
+++ b/djpeg.1
@@ -1,16 +1,43 @@
-.TH DJPEG 1 "28 February 1992"
+.TH DJPEG 1 "2 August 1992"
 .SH NAME
 djpeg \- decompress a JPEG file to an image file
 .SH SYNOPSIS
 .B djpeg
 [
-.B \-GPRTgD1bd
+.BI \-colors " N"
 ]
 [
-.BI \-q " colors"
+.B \-gif
 ]
 [
-.BI \-m " memory"
+.B \-pnm
+]
+[
+.B \-rle
+]
+[
+.B \-targa
+]
+[
+.B \-blocksmooth
+]
+[
+.B \-grayscale
+]
+[
+.BI \-maxmemory " N"
+]
+[
+.B \-nodither
+]
+[
+.B \-onepass
+]
+[
+.B \-verbose
+]
+[
+.B \-debug
 ]
 [
 .I filename
@@ -20,92 +47,130 @@
 .LP
 .B djpeg
 decompresses the named JPEG file, or the standard input if no file is named,
-and produces an image file on the standard output.  PPM, GIF, Targa, or RLE
-output format can be selected.  (RLE is supported only if the URT library is
-available.)
+and produces an image file on the standard output.  PBMPLUS (PPM/PGM), GIF,
+Targa, or RLE (Utah Raster Toolkit) output format can be selected.  (RLE is
+supported only if the URT library is available.)
 .SH OPTIONS
+All switch names may be abbreviated; for example,
+.B \-grayscale
+may be written
+.B \-gray
+or
+.BR \-gr .
+Most of the "basic" switches can be abbreviated to as little as one letter.
+Upper and lower case are equivalent (thus
+.B \-GIF
+is the same as
+.BR \-gif ).
+British spellings are also accepted (e.g.,
+.BR \-greyscale ),
+though for brevity these are not mentioned below.
+.PP
+The basic switches are:
 .TP
-.B \-G
-Select GIF output format (implies
-.BR \-q ,
-with default of 256 colors).
+.BI \-colors " N"
+Reduce image to at most N colors.  This reduces the number of colors used in
+the output image, so that it can be displayed on a colormapped display or
+stored in a colormapped file format.  For example, if you have an 8-bit
+display, you'd need to reduce to 256 or fewer colors.
 .TP
-.B \-P
-Select PPM or PGM output format (this is the default).  PGM is emitted if the
-JPEG file is gray-scale or if
-.B \-g
-is specified.
+.BI \-quantize " N"
+Same as
+.BR \-colors .
+.B \-colors
+is the recommended name;
+.B \-quantize
+is provided only for backwards compatibility.
 .TP
-.B \-R
-Select RLE output format.  Requires URT library.
+.B \-gif
+Select GIF output format.  Since GIF does not support more than 256 colors,
+.B \-colors 256
+is assumed (unless you specify a smaller number of colors).
 .TP
-.B \-T
+.B \-pnm
+Select PBMPLUS (PPM/PGM) output format (this is the default format).
+PGM is emitted if the JPEG file is gray-scale or if
+.B \-grayscale
+is specified; otherwise PPM is emitted.
+.TP
+.B \-rle
+Select RLE output format.  (Requires URT library.)
+.TP
+.B \-targa
 Select Targa output format.  Gray-scale format is emitted if the JPEG file is
 gray-scale or if
-.B \-g
+.B \-grayscale
 is specified; otherwise, colormapped format is emitted if
-.B \-q
+.B \-colors
 is specified; otherwise, 24-bit full-color format is emitted.
+.PP
+Switches for advanced users:
 .TP
-.B \-g
-Force gray-scale output even if input is color.
+.B \-blocksmooth
+Perform cross-block smoothing.  This is quite memory-intensive and only seems
+to improve the image at very low quality settings (\fB\-quality\fR 10 to 20 or
+so).  At normal quality settings it may make the image worse.
 .TP
-.BI \-q " N"
-Quantize to N colors.  This reduces the number of colors in the output image
-so that it can be displayed on a colormapped display or stored in a
-colormapped file format.  For example, if you have an 8-bit display, you'd
-need to quantize to 256 or fewer colors.
+.B \-grayscale
+Force gray-scale output even if JPEG file is color.
+Useful for viewing on monochrome displays.
 .TP
-.B \-D
+.BI \-maxmemory " N"
+Set limit for amount of memory to use in processing large images.  Value is
+in thousands of bytes, or millions of bytes if "M" is attached to the
+number.  For example,
+.B \-max 4m
+selects 4000000 bytes.  If more space is needed, temporary files will be used.
+.TP
+.B \-nodither
 Do not use dithering in color quantization.  By default, Floyd-Steinberg
 dithering is applied when quantizing colors, but on some images dithering may
 result in objectionable "graininess".  If that happens, you can turn off
 dithering with
-.BR \-D .
-.B \-D
+.BR \-nodither .
+.B \-nodither
 is ignored unless you also say
-.B \-q
-or
-.BR \-G .
+.B \-colors
+.IR N .
 .TP
-.B \-1
+.B \-onepass
 Use one-pass instead of two-pass color quantization.  The one-pass method is
 faster and needs less memory, but it produces a lower-quality image.
-.B \-1
+.B \-onepass
 is ignored unless you also say
-.B \-q
-or
-.BR \-G .
+.B \-colors
+.IR N .
 Also, the one-pass method is always used for gray-scale output (the two-pass
 method is no improvement then).
 .TP
-.B \-b
-Perform cross-block smoothing.  This is quite memory-intensive and only seems
-to improve the image at low quality settings (\fB\-Q\fR 10 to 20 or so).
-At normal
-.B \-Q
-settings it may make the image worse.
-.TP
-.B \-d
+.B \-verbose
 Enable debug printout.  More
-.BR \-d 's
+.BR \-v 's
 give more output.  Also, version information is printed at startup.
 .TP
-.BI \-m " memory"
-Set limit for amount of memory to use in processing large images.  Value is
-in thousands of bytes, or millions of bytes if "M" is attached to the
-number.  For example,
-.B \-m 4m
-selects 4000000 bytes.  If more space is needed, temporary files will be used.
+.B \-debug
+Same as
+.BR \-verbose .
 .SH EXAMPLES
 .LP
-This example decompresses the JPEG file foo.jpg, quantizes to 256 colors,
-and saves the output in GIF format in foo.gif:
+This example decompresses the JPEG file foo.jpg, automatically quantizes to
+256 colors, and saves the output in GIF format in foo.gif:
 .IP
-.B djpeg \-G
+.B djpeg \-gif
 .I foo.jpg
 .B >
 .I foo.gif
+.SH ENVIRONMENT
+.TP
+.B JPEGMEM
+If this environment variable is set, its value is the default memory limit.
+The value is specified as described for the
+.B \-maxmemory
+switch.
+.B JPEGMEM
+overrides the default value specified when the program was compiled, and
+itself is overridden by an explicit
+.BR \-maxmemory .
 .SH SEE ALSO
 .BR cjpeg (1)
 .br
@@ -119,4 +184,4 @@
 .SH BUGS
 Arithmetic coding is not supported for legal reasons.
 .PP
-Not as fast as we'd like.
+Still not as fast as we'd like.
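The djpeg.1 text above gives the memory limit in thousands of bytes, or
millions with an "M" suffix, and states that an explicit -maxmemory overrides
JPEGMEM, which in turn overrides the compiled-in default.  A minimal sketch of
both rules follows.  parse_maxmemory and effective_limit are hypothetical
names, and rejecting unknown suffixes is a choice made for this sketch and not
documented behavior.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser: "500" -> 500000 bytes, "4m" or "4M" -> 4000000 bytes.
 * Returns -1 for malformed input.  Overflow is not checked in this sketch. */
long parse_maxmemory(const char *text)
{
    long value = 0;
    char suffix = '\0';
    int fields;

    assert(text != NULL);
    fields = sscanf(text, "%ld%c", &value, &suffix);
    if (fields < 1 || value < 0)
        return -1L;
    if (fields == 1)
        return value * 1000L;             /* plain number: thousands */
    if (suffix == 'm' || suffix == 'M')
        return value * 1000000L;          /* M suffix: millions */
    return -1L;                           /* unknown suffix (sketch choice) */
}

/* Precedence: explicit switch, then JPEGMEM value, then compiled default.
 * Pass NULL for a source that is absent. */
long effective_limit(const char *switch_arg, const char *env_value,
                     long compiled_default)
{
    if (switch_arg != NULL)
        return parse_maxmemory(switch_arg);
    if (env_value != NULL)
        return parse_maxmemory(env_value);
    return compiled_default;
}
```

A caller would typically pass getenv("JPEGMEM") as env_value; it is taken as a
parameter here so the precedence logic can be exercised without touching the
environment.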
diff --git a/egetopt.c b/egetopt.c
deleted file mode 100644
index dbfd7fd..0000000
--- a/egetopt.c
+++ /dev/null
@@ -1,286 +0,0 @@
-/*
- * egetopt.c -- Extended 'getopt'.
- *
- * A while back, a public-domain version of getopt() was posted to the
- * net.  A bit later, a gentleman by the name of Keith Bostic made some
- * enhancements and reposted it.
- *
- * In recent weeks (i.e., early-to-mid 1988) there's been some
- * heated discussion in comp.lang.c about the merits and drawbacks
- * of getopt(), especially with regard to its handling of '?'.
- *
- * In light of this, I have taken Mr. Bostic's public-domain getopt()
- * and have made some changes that I hope will be considered to be
- * improvements.  I call this routine 'egetopt' ("Extended getopt").
- * The default behavior of this routine is the same as that of getopt(),
- * but it has some optional features that make it more useful.  These
- * options are controlled by the settings of some global variables.
- * By not setting any of these extra global variables, you will have
- * the same functionality as getopt(), which should satisfy those
- * purists who believe getopt() is perfect and can never be improved.
- * If, on the other hand, you are someone who isn't satisfied with the
- * status quo, egetopt() may very well give you the added capabilities
- * you want.
- *
- * Look at the enclosed README file for a description of egetopt()'s
- * new features.
- *
- * The code was originally posted to the net as getopt.c by ...
- *
- *	Keith Bostic
- *	ARPA: keith@seismo 
- *	UUCP: seismo!keith
- *
- * Current version: added enhancements and comments, reformatted code.
- *
- *	Lloyd Zusman
- *	Master Byte Software
- *	Los Gatos, California
- *	Internet:	ljz@fx.com
- *	UUCP:		...!ames!fxgrp!ljz
- *
- *    	May, 1988
- *
- * Modified for use in free JPEG code:
- *
- *	Ed Hanway
- *	UUCP:	uunet!sisd!jeh
- *
- *	October, 1991
- */
-
-/* The original egetopt.c was written not to need stdio.h.
- * For the JPEG code this is an unnecessary and unportable assumption.
- * Also, we make all the variables and routines "static" to avoid
- * possible conflicts with a system-library version of getopt.
- *
- * In the JPEG code, this file is compiled by #including it in jcmain.c
- * or jdmain.c.  Since ANSI2KNR does not process include files, we can't
- * rely on it to convert function definitions to K&R style.  Hence we
- * provide both styles of function header with an explicit #ifdef PROTO (ick).
- */
-
-#define GVAR static		/* make empty to export these variables */
-
-/*
- * None of these constants are referenced in the executable portion of
- * the code ... their sole purpose is to initialize global variables.
- */
-#define BADCH		(int)'?'
-#define NEEDSEP		(int)':'
-#define MAYBESEP	(int)'\0'
-#define EMSG		""
-#define START		"-"
-
-/*
- * Here are all the pertinent global variables.
- */
-GVAR int opterr = 1;		/* if true, output error message */
-GVAR int optind = 1;		/* index into parent argv vector */
-GVAR int optopt;		/* character checked for validity */
-GVAR int optbad = BADCH;	/* character returned on error */
-GVAR int optchar = 0;		/* character that begins returned option */
-GVAR int optneed = NEEDSEP;	/* flag for mandatory argument */
-GVAR int optmaybe = MAYBESEP;	/* flag for optional argument */
-GVAR const char *optarg;	/* argument associated with option */
-GVAR const char *optstart = START; /* list of characters that start options */
-
-
-/*
- * Macros.
- */
-
-/*
- * Conditionally print out an error message and return (depends on the
- * setting of 'opterr').
- */
-#define TELL(S)	{ \
-	if (opterr) \
-		fprintf(stderr, "%s%s%c\n", *nargv, (S), optopt); \
-	return (optbad); \
-}
-
-/*
- * This works similarly to index() and strchr().  I include it so that you
- * don't need to be concerned as to which one your system has.
- */
-
-#ifdef PROTO
-LOCAL const char *
-_sindex (const char *string, int ch)
-#else
-LOCAL const char *
-_sindex (string, ch)
-     const char *string;
-     int ch;
-#endif
-{
-	if (string != NULL) {
-		for (; *string != '\0'; ++string) {
-			if (*string == (char)ch) {
-				return (string);
-			}
-		}
-	}
-
-	return (NULL);
-}
-
-/*
- * Here it is:
- */
-
-#ifdef PROTO
-LOCAL int
-egetopt (int nargc, char **nargv, const char *ostr)
-#else
-LOCAL int
-egetopt (nargc, nargv, ostr)
-     int nargc;
-     char **nargv;
-     const char *ostr;
-#endif
-{
-	static const char *place = EMSG; /* option letter processing */
-	register const char *oli;	 /* option letter list index */
-	register const char *osi = NULL; /* option start list index */
-
-	if (nargv == (char **)NULL) {
-		return (EOF);
-	}
-
-	if (nargc <= optind || nargv[optind] == NULL) {
-		return (EOF);
-	}
-
-	if (place == NULL) {
-		place = EMSG;
-	}
-
-	/*
-	 * Update scanning pointer.
-	 */
-	if (*place == '\0') {
-		place = nargv[optind];
-		if (place == NULL) {
-			return (EOF);
-		}
-		osi = _sindex(optstart, *place);
-		if (osi != NULL) {
-			optchar = (int)*osi;
-		}
-		if (optind >= nargc || osi == NULL || *++place == '\0') {
-		    	return (EOF);
-		}
-
-		/*
-		 * Two adjacent, identical flag characters were found.
-		 * This takes care of "--", for example.
-		 */
-		if (*place == place[-1]) {
-			++optind;
-			return (EOF);
-		}
-	}
-
-	/*
-	 * If the option is a separator or the option isn't in the list,
-	 * we've got an error.
-	 */
-	optopt = (int)*place++;
-	oli = _sindex(ostr, optopt);
-	if (optopt == optneed || optopt == optmaybe || oli == NULL) {
-		/*
-		 * If we're at the end of the current argument, bump the
-		 * argument index.
-		 */
-		if (*place == '\0') {
-			++optind;
-		}
-		TELL(": illegal option -- ");	/* byebye */
-	}
-
-	/*
-	 * If there is no argument indicator, then we don't even try to
-	 * return an argument.
-	 */
-	++oli;
-	if (*oli == '\0' || (*oli != optneed && *oli != optmaybe)) {
-		/*
-		 * If we're at the end of the current argument, bump the
-		 * argument index.
-		 */
-		if (*place == '\0') {
-			++optind;
-		}
-		optarg = NULL;
-	}
-	/*
-	 * If we're here, there's an argument indicator.  It's handled
-	 * differently depending on whether it's a mandatory or an
-	 * optional argument.
-	 */
-	else {
-		/*
-		 * If there's no white space, use the rest of the
-		 * string as the argument.  In this case, it doesn't
-		 * matter if the argument is mandatory or optional.
-		 */
-		if (*place != '\0') {
-			optarg = place;
-		}
-		/*
-		 * If we're here, there's whitespace after the option.
-		 *
-		 * Is it a mandatory argument?  If so, return the
-		 * next command-line argument if there is one.
-		 */
-		else if (*oli == optneed) {
-			/*
-			 * If we're at the end of the argument list, there
-			 * isn't an argument and hence we have an error.
-			 * Otherwise, make 'optarg' point to the argument.
-			 */
-			if (nargc <= ++optind) {
-				place = EMSG;
-				TELL(": option requires an argument -- ");
-			}
-			else {
-				optarg = nargv[optind];
-			}
-		}
-		/*
-		 * If we're here it must have been an optional argument.
-		 */
-		else {
-			if (nargc <= ++optind) {
-				place = EMSG;
-				optarg = NULL;
-			}
-			else {
-				optarg = nargv[optind];
-				if (optarg == NULL) {
-					place = EMSG;
-				}
-				/*
-				 * If the next item begins with a flag
-				 * character, we treat it like a new
-				 * argument.  This is accomplished by
-				 * decrementing 'optind' and returning
-				 * a null argument.
-				 */
-				else if (_sindex(optstart, *optarg) != NULL) {
-					--optind;
-					optarg = NULL;
-				}
-			}
-		}
-		place = EMSG;
-		++optind;
-	}
-
-	/*
-	 * Return option letter.
-	 */
-	return (optopt);
-}
diff --git a/example.c b/example.c
index 2cd3afb..3e7a5dd 100644
--- a/example.c
+++ b/example.c
@@ -185,9 +185,9 @@
    * call to jpeg_compress; typically, making them local data in the
    * calling routine is the best strategy.
    */
-  struct compress_info_struct cinfo;
-  struct compress_methods_struct c_methods;
-  struct external_methods_struct e_methods;
+  struct Compress_info_struct cinfo;
+  struct Compress_methods_struct c_methods;
+  struct External_methods_struct e_methods;
 
   /* Initialize the system-dependent method pointers. */
   cinfo.methods = &c_methods;	/* links to method structs */
@@ -314,7 +314,7 @@
 /*
  * The error_exit() routine should not return to its caller.  The default
  * routine calls exit(), but here we assume that we want to return to
- * read_JPEG_data, which has set up a setjmp context for the purpose.
+ * read_JPEG_file, which has set up a setjmp context for the purpose.
  * You should make sure that the free_all method is called, either within
  * error_exit or after the return to the outer-level routine.
  */
@@ -525,9 +525,9 @@
    * call to jpeg_decompress; typically, making them local data in the
    * calling routine is the best strategy.
    */
-  struct decompress_info_struct cinfo;
-  struct decompress_methods_struct dc_methods;
-  struct external_methods_struct e_methods;
+  struct Decompress_info_struct cinfo;
+  struct Decompress_methods_struct dc_methods;
+  struct External_methods_struct e_methods;
 
   /* Select the input and output files.
    * In this example we want to open the input file before doing anything else,
@@ -554,6 +554,10 @@
   emethods = &e_methods;	/* save struct addr for possible access */
   e_methods.error_exit = error_exit; /* supply error-exit routine */
   e_methods.trace_message = trace_message; /* supply trace-message routine */
+  e_methods.trace_level = 0;	/* default = no tracing */
+  e_methods.num_warnings = 0;	/* no warnings emitted yet */
+  e_methods.first_warning_level = 0; /* display first corrupt-data warning */
+  e_methods.more_warning_level = 3; /* but suppress additional ones */
 
   /* prepare setjmp context for possible exit from error_exit */
   if (setjmp(setjmp_buffer)) {
@@ -614,6 +618,9 @@
   /* Here we assume only the input file need be closed. */
   fclose(cinfo.input_file);
 
+  /* You might want to test e_methods.num_warnings to see if bad data was
+   * detected.  In this example, we just blindly forge ahead.
+   */
   return 1;			/* indicate success */
 
   /* Note: if you want to decompress more than one image, we recommend you
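Note on the example.c hunks above: code that fills in the external-methods struct by hand now also has to set the warning-related fields. The sketch below illustrates the intent stated in the added comments (show the first corrupt-data warning, suppress later ones). It uses a cut-down stand-in struct, not the library's real External_methods_struct; only the four field names and their initial values come from the patch. The helper names init_warning_fields and report_warning, and the rule "display when trace_level is at least the threshold", are assumptions made for illustration.

```c
#include <stdio.h>

/* Stand-in for External_methods_struct: only the fields touched by the
 * patch are modeled here; the real struct has many more members. */
struct Demo_external_methods {
  int trace_level;          /* 0 = no tracing */
  long num_warnings;        /* count of warnings seen so far */
  int first_warning_level;  /* threshold for the first warning */
  int more_warning_level;   /* threshold for all later warnings */
};

/* Hypothetical helper: same initial values as the patched example.c. */
static void init_warning_fields(struct Demo_external_methods *em)
{
  em->trace_level = 0;
  em->num_warnings = 0;
  em->first_warning_level = 0;
  em->more_warning_level = 3;
}

/* Hypothetical helper. ASSUMED gating rule: a warning is displayed when
 * trace_level >= the applicable threshold.  Returns 1 if displayed. */
static int report_warning(struct Demo_external_methods *em, const char *msg)
{
  int threshold = (em->num_warnings == 0) ? em->first_warning_level
                                          : em->more_warning_level;
  em->num_warnings++;
  if (em->trace_level >= threshold) {
    fprintf(stderr, "warning: %s\n", msg);
    return 1;
  }
  return 0;
}
```

After decompression, a caller can test num_warnings against zero to decide whether the output came from damaged input, which is the check the new comment near "return 1" suggests.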
diff --git a/jbsmooth.c b/jbsmooth.c
index c18addd..223143e 100644
--- a/jbsmooth.c
+++ b/jbsmooth.c
@@ -27,7 +27,7 @@
 		     JBLOCKROW output)
 {
   QUANT_TBL_PTR Qptr = cinfo->quant_tbl_ptrs[compptr->quant_tbl_no];
-  long blocks_in_row = compptr->subsampled_width / DCTSIZE;
+  long blocks_in_row = compptr->downsampled_width / DCTSIZE;
   long col;
 
   /* First, copy the block row as-is.
diff --git a/jcarith.c b/jcarith.c
index f686747..108a9fb 100644
--- a/jcarith.c
+++ b/jcarith.c
@@ -7,12 +7,12 @@
  *
  * This file contains arithmetic entropy encoding routines.
  * These routines are invoked via the methods entropy_encode,
- * entropy_encoder_init/term, and entropy_optimize.
+ * entropy_encode_init/term, and entropy_optimize.
  */
 
 #include "jinclude.h"
 
-#ifdef ARITH_CODING_SUPPORTED
+#ifdef C_ARITH_CODING_SUPPORTED
 
 
 /*
@@ -39,4 +39,4 @@
   }
 }
 
-#endif /* ARITH_CODING_SUPPORTED */
+#endif /* C_ARITH_CODING_SUPPORTED */
diff --git a/jccolor.c b/jccolor.c
index 6fb1512..0da5e1b 100644
--- a/jccolor.c
+++ b/jccolor.c
@@ -16,16 +16,88 @@
 static JSAMPARRAY pixel_row;	/* Workspace for a pixel row in input format */
 
 
+/**************** RGB -> YCbCr conversion: most common case **************/
+
+/*
+ * YCbCr is defined per CCIR 601-1, except that Cb and Cr are
+ * normalized to the range 0..MAXJSAMPLE rather than -0.5 .. 0.5.
+ * The conversion equations to be implemented are therefore
+ *	Y  =  0.29900 * R + 0.58700 * G + 0.11400 * B
+ *	Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B  + MAXJSAMPLE/2
+ *	Cr =  0.50000 * R - 0.41869 * G - 0.08131 * B  + MAXJSAMPLE/2
+ * (These numbers are derived from TIFF 6.0 section 21, dated 3-June-92.)
+ *
+ * To avoid floating-point arithmetic, we represent the fractional constants
+ * as integers scaled up by 2^16 (about 4 digits precision); we have to divide
+ * the products by 2^16, with appropriate rounding, to get the correct answer.
+ *
+ * For even more speed, we avoid doing any multiplications in the inner loop
+ * by precalculating the constants times R,G,B for all possible values.
+ * For 8-bit JSAMPLEs this is very reasonable (only 256 entries per table);
+ * for 12-bit samples it is still acceptable.  It's not very reasonable for
+ * 16-bit samples, but if you want lossless storage you shouldn't be changing
+ * colorspace anyway.
+ * The MAXJSAMPLE/2 offsets and the rounding fudge-factor of 0.5 are included
+ * in the tables to save adding them separately in the inner loop.
+ */
+
+#ifdef SIXTEEN_BIT_SAMPLES
+#define SCALEBITS	14	/* avoid overflow */
+#else
+#define SCALEBITS	16	/* speedier right-shift on some machines */
+#endif
+#define ONE_HALF	((INT32) 1 << (SCALEBITS-1))
+#define FIX(x)		((INT32) ((x) * (1L<<SCALEBITS) + 0.5))
+
+/* We allocate one big table and divide it up into eight parts, instead of
+ * doing eight alloc_small requests.  This lets us use a single table base
+ * address, which can be held in a register in the inner loops on many
+ * machines (more than can hold all eight addresses, anyway).
+ */
+
+static INT32 * rgb_ycc_tab;	/* => table for RGB to YCbCr conversion */
+#define R_Y_OFF		0			/* offset to R => Y section */
+#define G_Y_OFF		(1*(MAXJSAMPLE+1))	/* offset to G => Y section */
+#define B_Y_OFF		(2*(MAXJSAMPLE+1))	/* etc. */
+#define R_CB_OFF	(3*(MAXJSAMPLE+1))
+#define G_CB_OFF	(4*(MAXJSAMPLE+1))
+#define B_CB_OFF	(5*(MAXJSAMPLE+1))
+#define R_CR_OFF	B_CB_OFF		/* B=>Cb, R=>Cr are the same */
+#define G_CR_OFF	(6*(MAXJSAMPLE+1))
+#define B_CR_OFF	(7*(MAXJSAMPLE+1))
+#define TABLE_SIZE	(8*(MAXJSAMPLE+1))
+
+
 /*
  * Initialize for colorspace conversion.
  */
 
 METHODDEF void
-colorin_init (compress_info_ptr cinfo)
+rgb_ycc_init (compress_info_ptr cinfo)
 {
+  INT32 i;
+
   /* Allocate a workspace for the result of get_input_row. */
   pixel_row = (*cinfo->emethods->alloc_small_sarray)
 		(cinfo->image_width, (long) cinfo->input_components);
+
+  /* Allocate and fill in the conversion tables. */
+  rgb_ycc_tab = (INT32 *) (*cinfo->emethods->alloc_small)
+				(TABLE_SIZE * SIZEOF(INT32));
+
+  for (i = 0; i <= MAXJSAMPLE; i++) {
+    rgb_ycc_tab[i+R_Y_OFF] = FIX(0.29900) * i;
+    rgb_ycc_tab[i+G_Y_OFF] = FIX(0.58700) * i;
+    rgb_ycc_tab[i+B_Y_OFF] = FIX(0.11400) * i     + ONE_HALF;
+    rgb_ycc_tab[i+R_CB_OFF] = (-FIX(0.16874)) * i;
+    rgb_ycc_tab[i+G_CB_OFF] = (-FIX(0.33126)) * i;
+    rgb_ycc_tab[i+B_CB_OFF] = FIX(0.50000) * i    + ONE_HALF*(MAXJSAMPLE+1);
+/*  B=>Cb and R=>Cr tables are the same
+    rgb_ycc_tab[i+R_CR_OFF] = FIX(0.50000) * i    + ONE_HALF*(MAXJSAMPLE+1);
+*/
+    rgb_ycc_tab[i+G_CR_OFF] = (-FIX(0.41869)) * i;
+    rgb_ycc_tab[i+B_CR_OFF] = (-FIX(0.08131)) * i;
+  }
 }
 
 
@@ -34,38 +106,16 @@
  * JPEG colorspace.
  */
 
-
-/*
- * This version handles RGB -> YCbCr conversion.
- * YCbCr is defined per CCIR 601-1, except that Cb and Cr are
- * normalized to the range 0..MAXJSAMPLE rather than -0.5 .. 0.5.
- * The conversion equations to be implemented are therefore
- *	Y  =  0.29900 * R + 0.58700 * G + 0.11400 * B
- *	Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B
- *	Cr =  0.50000 * R - 0.41869 * G - 0.08131 * B
- * where Cb and Cr must be incremented by MAXJSAMPLE/2 to create a
- * nonnegative output value.
- * (These numbers are derived from TIFF Appendix O, draft of 4/10/91.)
- *
- * To avoid floating-point arithmetic, we represent the fractional constants
- * as integers scaled up by 2^14 (about 4 digits precision); we have to divide
- * the products by 2^14, with appropriate rounding, to get the correct answer.
- *
- * For even more speed, we could avoid any multiplications in the inner loop
- * by precalculating the constants times R,G,B for all possible values.
- * This is not currently implemented.
- */
-
-#define SCALEBITS	14
-#define ONE_HALF	((INT32) 1 << (SCALEBITS-1))
-#define FIX(x)		((INT32) ((x) * (1L<<SCALEBITS) + 0.5))
-
-
 METHODDEF void
 get_rgb_ycc_rows (compress_info_ptr cinfo,
 		  int rows_to_read, JSAMPIMAGE image_data)
 {
-  register INT32 r, g, b;
+#ifdef SIXTEEN_BIT_SAMPLES
+  register UINT16 r, g, b;
+#else
+  register int r, g, b;
+#endif
+  register INT32 * ctab = rgb_ycc_tab;
   register JSAMPROW inptr0, inptr1, inptr2;
   register JSAMPROW outptr0, outptr1, outptr2;
   register long col;
@@ -82,36 +132,103 @@
     outptr0 = image_data[0][row];
     outptr1 = image_data[1][row];
     outptr2 = image_data[2][row];
-    for (col = width; col > 0; col--) {
-      r = GETJSAMPLE(*inptr0++);
-      g = GETJSAMPLE(*inptr1++);
-      b = GETJSAMPLE(*inptr2++);
+    for (col = 0; col < width; col++) {
+      r = GETJSAMPLE(inptr0[col]);
+      g = GETJSAMPLE(inptr1[col]);
+      b = GETJSAMPLE(inptr2[col]);
       /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations
        * must be too; we do not need an explicit range-limiting operation.
        * Hence the value being shifted is never negative, and we don't
        * need the general RIGHT_SHIFT macro.
        */
       /* Y */
-      *outptr0++ = (JSAMPLE)
-	((  FIX(0.29900)*r  + FIX(0.58700)*g + FIX(0.11400)*b
-	  + ONE_HALF) >> SCALEBITS);
+      outptr0[col] = (JSAMPLE)
+		((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])
+		 >> SCALEBITS);
       /* Cb */
-      *outptr1++ = (JSAMPLE)
-	(((-FIX(0.16874))*r - FIX(0.33126)*g + FIX(0.50000)*b
-	  + ONE_HALF*(MAXJSAMPLE+1)) >> SCALEBITS);
+      outptr1[col] = (JSAMPLE)
+		((ctab[r+R_CB_OFF] + ctab[g+G_CB_OFF] + ctab[b+B_CB_OFF])
+		 >> SCALEBITS);
       /* Cr */
-      *outptr2++ = (JSAMPLE)
-	((  FIX(0.50000)*r  - FIX(0.41869)*g - FIX(0.08131)*b
-	  + ONE_HALF*(MAXJSAMPLE+1)) >> SCALEBITS);
+      outptr2[col] = (JSAMPLE)
+		((ctab[r+R_CR_OFF] + ctab[g+G_CR_OFF] + ctab[b+B_CR_OFF])
+		 >> SCALEBITS);
+    }
+  }
+}
+
+
+/**************** Cases other than RGB -> YCbCr **************/
+
+
+/*
+ * Fetch some rows of pixels from get_input_row and convert to the
+ * JPEG colorspace.
+ * This version handles RGB->grayscale conversion, which is the same
+ * as the RGB->Y portion of RGB->YCbCr.
+ * We assume rgb_ycc_init has been called (we only use the Y tables).
+ */
+
+METHODDEF void
+get_rgb_gray_rows (compress_info_ptr cinfo,
+		   int rows_to_read, JSAMPIMAGE image_data)
+{
+#ifdef SIXTEEN_BIT_SAMPLES
+  register UINT16 r, g, b;
+#else
+  register int r, g, b;
+#endif
+  register INT32 * ctab = rgb_ycc_tab;
+  register JSAMPROW inptr0, inptr1, inptr2;
+  register JSAMPROW outptr;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+
+  for (row = 0; row < rows_to_read; row++) {
+    /* Read one row from the source file */
+    (*cinfo->methods->get_input_row) (cinfo, pixel_row);
+    /* Convert colorspace */
+    inptr0 = pixel_row[0];
+    inptr1 = pixel_row[1];
+    inptr2 = pixel_row[2];
+    outptr = image_data[0][row];
+    for (col = 0; col < width; col++) {
+      r = GETJSAMPLE(inptr0[col]);
+      g = GETJSAMPLE(inptr1[col]);
+      b = GETJSAMPLE(inptr2[col]);
+      /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations
+       * must be too; we do not need an explicit range-limiting operation.
+       * Hence the value being shifted is never negative, and we don't
+       * need the general RIGHT_SHIFT macro.
+       */
+      /* Y */
+      outptr[col] = (JSAMPLE)
+		((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])
+		 >> SCALEBITS);
     }
   }
 }
 
 
 /*
+ * Initialize for colorspace conversion.
+ */
+
+METHODDEF void
+colorin_init (compress_info_ptr cinfo)
+{
+  /* Allocate a workspace for the result of get_input_row. */
+  pixel_row = (*cinfo->emethods->alloc_small_sarray)
+		(cinfo->image_width, (long) cinfo->input_components);
+}
+
+
+/*
  * Fetch some rows of pixels from get_input_row and convert to the
  * JPEG colorspace.
- * This version handles grayscale (no conversion).
+ * This version handles grayscale output with no conversion.
+ * The source can be either plain grayscale or YCbCr (since Y == gray).
  */
 
 METHODDEF void
@@ -196,6 +313,10 @@
     break;
   }
 
+  /* Standard init/term methods (may override below) */
+  cinfo->methods->colorin_init = colorin_init;
+  cinfo->methods->colorin_term = colorin_term;
+
   /* Check num_components, set conversion method based on requested space */
   switch (cinfo->jpeg_color_space) {
   case CS_GRAYSCALE:
@@ -203,6 +324,11 @@
       ERREXIT(cinfo->emethods, "Bogus JPEG colorspace");
     if (cinfo->in_color_space == CS_GRAYSCALE)
       cinfo->methods->get_sample_rows = get_grayscale_rows;
+    else if (cinfo->in_color_space == CS_RGB) {
+      cinfo->methods->colorin_init = rgb_ycc_init;
+      cinfo->methods->get_sample_rows = get_rgb_gray_rows;
+    } else if (cinfo->in_color_space == CS_YCbCr)
+      cinfo->methods->get_sample_rows = get_grayscale_rows;
     else
       ERREXIT(cinfo->emethods, "Unsupported color conversion request");
     break;
@@ -210,9 +336,10 @@
   case CS_YCbCr:
     if (cinfo->num_components != 3)
       ERREXIT(cinfo->emethods, "Bogus JPEG colorspace");
-    if (cinfo->in_color_space == CS_RGB)
+    if (cinfo->in_color_space == CS_RGB) {
+      cinfo->methods->colorin_init = rgb_ycc_init;
       cinfo->methods->get_sample_rows = get_rgb_ycc_rows;
-    else if (cinfo->in_color_space == CS_YCbCr)
+    } else if (cinfo->in_color_space == CS_YCbCr)
       cinfo->methods->get_sample_rows = get_noconvert_rows;
     else
       ERREXIT(cinfo->emethods, "Unsupported color conversion request");
@@ -231,7 +358,4 @@
     ERREXIT(cinfo->emethods, "Unsupported JPEG colorspace");
     break;
   }
-
-  cinfo->methods->colorin_init = colorin_init;
-  cinfo->methods->colorin_term = colorin_term;
 }
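Note on the jccolor.c changes above: the table-driven conversion can be tried in isolation. The following is a minimal sketch for 8-bit samples only, assuming <stdint.h> types in place of the library's INT32/JSAMPLE and a static array in place of alloc_small. The names MAXSAMPLE, OFF, ycc_tab, build_ycc_tab and rgb_to_ycc are hypothetical; the coefficients, the shared B=>Cb / R=>Cr section, and the folding of the rounding and offset terms into the tables follow the patch.

```c
#include <stdint.h>

#define MAXSAMPLE  255                  /* 8-bit samples assumed */
#define SCALEBITS  16
#define ONE_HALF   ((int32_t) 1 << (SCALEBITS - 1))
#define FIX(x)     ((int32_t) ((x) * (1L << SCALEBITS) + 0.5))

/* Eight table sections; R=>Cr reuses the B=>Cb section (both are 0.5*x). */
enum { R_Y, G_Y, B_Y, R_CB, G_CB, B_CB, G_CR, B_CR, NUM_SECTIONS };
#define R_CR       B_CB
#define OFF(s)     ((s) * (MAXSAMPLE + 1))

static int32_t ycc_tab[NUM_SECTIONS * (MAXSAMPLE + 1)];

/* Precompute coefficient * sample for every possible sample value.
 * The rounding term and the chroma offset are folded into one table
 * per output component so the inner loop needs no extra additions. */
static void build_ycc_tab(void)
{
  int32_t i;

  for (i = 0; i <= MAXSAMPLE; i++) {
    ycc_tab[i + OFF(R_Y)]  = FIX(0.29900) * i;
    ycc_tab[i + OFF(G_Y)]  = FIX(0.58700) * i;
    ycc_tab[i + OFF(B_Y)]  = FIX(0.11400) * i + ONE_HALF;
    ycc_tab[i + OFF(R_CB)] = (-FIX(0.16874)) * i;
    ycc_tab[i + OFF(G_CB)] = (-FIX(0.33126)) * i;
    ycc_tab[i + OFF(B_CB)] = FIX(0.50000) * i + ONE_HALF * (MAXSAMPLE + 1);
    ycc_tab[i + OFF(G_CR)] = (-FIX(0.41869)) * i;
    ycc_tab[i + OFF(B_CR)] = (-FIX(0.08131)) * i;
  }
}

/* Convert one pixel; r, g, b must be in 0..MAXSAMPLE.
 * Each sum is non-negative, so a plain right shift is safe. */
static void rgb_to_ycc(int r, int g, int b, unsigned char out[3])
{
  out[0] = (unsigned char)
    ((ycc_tab[r + OFF(R_Y)] + ycc_tab[g + OFF(G_Y)] + ycc_tab[b + OFF(B_Y)])
     >> SCALEBITS);
  out[1] = (unsigned char)
    ((ycc_tab[r + OFF(R_CB)] + ycc_tab[g + OFF(G_CB)] + ycc_tab[b + OFF(B_CB)])
     >> SCALEBITS);
  out[2] = (unsigned char)
    ((ycc_tab[r + OFF(R_CR)] + ycc_tab[g + OFF(G_CR)] + ycc_tab[b + OFF(B_CR)])
     >> SCALEBITS);
}
```

Call build_ycc_tab() once, then rgb_to_ycc() per pixel. White (255,255,255) maps to Y=255 with both chroma values at 128, and black maps to Y=0 with the same neutral chroma, which is a quick sanity check that the offsets were folded in correctly.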
diff --git a/jcdeflts.c b/jcdeflts.c
index fe25f74..48f4ed4 100644
--- a/jcdeflts.c
+++ b/jcdeflts.c
@@ -39,7 +39,7 @@
 
 
 /*
- * Table setup routines
+ * Huffman table setup routines
  */
 
 LOCAL void
@@ -50,10 +50,8 @@
   if (*htblptr == NULL)
     *htblptr = (HUFF_TBL *) (*cinfo->emethods->alloc_small) (SIZEOF(HUFF_TBL));
   
-  memcpy((void *) (*htblptr)->bits, (const void *) bits,
-	 SIZEOF((*htblptr)->bits));
-  memcpy((void *) (*htblptr)->huffval, (const void *) val,
-	 SIZEOF((*htblptr)->huffval));
+  MEMCOPY((*htblptr)->bits, bits, SIZEOF((*htblptr)->bits));
+  MEMCOPY((*htblptr)->huffval, val, SIZEOF((*htblptr)->huffval));
 
   /* Initialize sent_table FALSE so table will be written to JPEG file.
    * In an application where we are writing non-interchange JPEG files,
@@ -141,43 +139,19 @@
 }
 
 
-/* This is the sample quantization table given in the JPEG spec section K.1,
- * but expressed in zigzag order (as are all of our quant. tables).
- * The spec says that the values given produce "good" quality, and
- * when divided by 2, "very good" quality.  (These two settings are
- * selected by quality=50 and quality=75 in j_set_quality, below.)
+/*
+ * Quantization table setup routines
  */
 
-
-static const QUANT_VAL std_luminance_quant_tbl[DCTSIZE2] = {
-  16,  11,  12,  14,  12,  10,  16,  14,
-  13,  14,  18,  17,  16,  19,  24,  40,
-  26,  24,  22,  22,  24,  49,  35,  37,
-  29,  40,  58,  51,  61,  60,  57,  51,
-  56,  55,  64,  72,  92,  78,  64,  68,
-  87,  69,  55,  56,  80, 109,  81,  87,
-  95,  98, 103, 104, 103,  62,  77, 113,
- 121, 112, 100, 120,  92, 101, 103,  99
-};
-
-static const QUANT_VAL std_chrominance_quant_tbl[DCTSIZE2] = {
-  17,  18,  18,  24,  21,  24,  47,  26,
-  26,  47,  99,  66,  56,  66,  99,  99,
-  99,  99,  99,  99,  99,  99,  99,  99,
-  99,  99,  99,  99,  99,  99,  99,  99,
-  99,  99,  99,  99,  99,  99,  99,  99,
-  99,  99,  99,  99,  99,  99,  99,  99,
-  99,  99,  99,  99,  99,  99,  99,  99,
-  99,  99,  99,  99,  99,  99,  99,  99
-};
-
-
-LOCAL void
-add_quant_table (compress_info_ptr cinfo, int which_tbl,
-		 const QUANT_VAL *basic_table, int scale_factor,
-		 boolean force_baseline)
-/* Define a quantization table equal to the basic_table times */
-/* a scale factor (given as a percentage) */
+GLOBAL void
+j_add_quant_table (compress_info_ptr cinfo, int which_tbl,
+		   const QUANT_VAL *basic_table, int scale_factor,
+		   boolean force_baseline)
+/* Define a quantization table equal to the basic_table times
+ * a scale factor (given as a percentage).
+ * If force_baseline is TRUE, the computed quantization table entries
+ * are limited to 1..255 for JPEG baseline compatibility.
+ */
 {
   QUANT_TBL_PTR * qtblptr = & cinfo->quant_tbl_ptrs[which_tbl];
   int i;
@@ -202,24 +176,20 @@
 }
 
 
-GLOBAL void
-j_set_quality (compress_info_ptr cinfo, int quality, boolean force_baseline)
-/* Set or change the 'quality' (quantization) setting. */
-/* The 'quality' factor should be 0 (terrible) to 100 (very good). */
-/* Quality 50 corresponds to the JPEG basic tables given above; */
-/* quality 100 results in no quantization scaling at all. */
-/* If force_baseline is TRUE, quantization table entries are limited */
-/* to 0..255 for JPEG baseline compatibility; this is only an issue */
-/* for quality settings below 24. */
+GLOBAL int
+j_quality_scaling (int quality)
+/* Convert a user-specified quality rating to a percentage scaling factor
+ * for an underlying quantization table, using our recommended scaling curve.
+ * The input 'quality' factor should be 0 (terrible) to 100 (very good).
+ */
 {
   /* Safety limit on quality factor.  Convert 0 to 1 to avoid zero divide. */
   if (quality <= 0) quality = 1;
   if (quality > 100) quality = 100;
 
-  /* Convert quality rating to a percentage scaling of the basic tables.
-   * The basic table is used as-is (scaling 100) for a quality of 50.
+  /* The basic table is used as-is (scaling 100) for a quality of 50.
    * Qualities 50..100 are converted to scaling percentage 200 - 2*Q;
-   * note that at Q=100 the scaling is 0, which will cause add_quant_table
+   * note that at Q=100 the scaling is 0, which will cause j_add_quant_table
    * to make all the table entries 1 (hence, no quantization loss).
    * Qualities 1..50 are converted to scaling percentage 5000/Q.
    */
@@ -228,9 +198,53 @@
   else
     quality = 200 - quality*2;
 
+  return quality;
+}
+
+
+GLOBAL void
+j_set_quality (compress_info_ptr cinfo, int quality, boolean force_baseline)
+/* Set or change the 'quality' (quantization) setting, using default tables.
+ * This is the standard quality-adjusting entry point for typical user
+ * interfaces; only those who want detailed control over quantization tables
+ * would use the preceding two routines directly.
+ */
+{
+  /* This is the sample quantization table given in the JPEG spec section K.1,
+   * but expressed in zigzag order (as are all of our quant. tables).
+   * The spec says that the values given produce "good" quality, and
+   * when divided by 2, "very good" quality.  (These two settings are
+   * selected by quality=50 and quality=75 respectively.)
+   */
+  static const QUANT_VAL std_luminance_quant_tbl[DCTSIZE2] = {
+    16,  11,  12,  14,  12,  10,  16,  14,
+    13,  14,  18,  17,  16,  19,  24,  40,
+    26,  24,  22,  22,  24,  49,  35,  37,
+    29,  40,  58,  51,  61,  60,  57,  51,
+    56,  55,  64,  72,  92,  78,  64,  68,
+    87,  69,  55,  56,  80, 109,  81,  87,
+    95,  98, 103, 104, 103,  62,  77, 113,
+    121, 112, 100, 120,  92, 101, 103,  99
+    };
+  static const QUANT_VAL std_chrominance_quant_tbl[DCTSIZE2] = {
+    17,  18,  18,  24,  21,  24,  47,  26,
+    26,  47,  99,  66,  56,  66,  99,  99,
+    99,  99,  99,  99,  99,  99,  99,  99,
+    99,  99,  99,  99,  99,  99,  99,  99,
+    99,  99,  99,  99,  99,  99,  99,  99,
+    99,  99,  99,  99,  99,  99,  99,  99,
+    99,  99,  99,  99,  99,  99,  99,  99,
+    99,  99,  99,  99,  99,  99,  99,  99
+    };
+
+  /* Convert user 0-100 rating to percentage scaling */
+  quality = j_quality_scaling(quality);
+
   /* Set up two quantization tables using the specified quality scaling */
-  add_quant_table(cinfo, 0, std_luminance_quant_tbl, quality, force_baseline);
-  add_quant_table(cinfo, 1, std_chrominance_quant_tbl, quality, force_baseline);
+  j_add_quant_table(cinfo, 0, std_luminance_quant_tbl,
+		    quality, force_baseline);
+  j_add_quant_table(cinfo, 1, std_chrominance_quant_tbl,
+		    quality, force_baseline);
 }
 
 
@@ -347,8 +361,12 @@
   /* By default, use the simpler non-cosited sampling alignment */
   cinfo->CCIR601_sampling = FALSE;
 
+  /* No input smoothing */
+  cinfo->smoothing_factor = 0;
+
   /* No restart markers */
   cinfo->restart_interval = 0;
+  cinfo->restart_in_rows = 0;
 
   /* Install default do-nothing progress monitoring method. */
   cinfo->methods->progress_monitor = progress_monitor;
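Note on the jcdeflts.c changes above: the quality curve described in the j_quality_scaling comments is easy to check by hand. The sketch below restates it as a standalone function and adds a per-entry scaling step. The function names quality_to_scale and scale_entry are hypothetical. The rounding (+50 before dividing by 100) is an assumption, since the body of j_add_quant_table is not shown in these hunks; the clamp to 1..255 under force_baseline follows the new header comment.

```c
/* Map a 0..100 quality rating to a percentage scaling factor:
 * below 50 the scale is 5000/Q, from 50 upward it is 200 - 2*Q. */
static int quality_to_scale(int quality)
{
  if (quality <= 0) quality = 1;      /* avoid division by zero */
  if (quality > 100) quality = 100;

  if (quality < 50)
    return 5000 / quality;
  return 200 - quality * 2;
}

/* Scale one basic-table entry by a percentage.
 * ASSUMPTION: round to nearest by adding 50 before the division.
 * Entries never drop below 1; with force_baseline they are capped at 255. */
static long scale_entry(int basic, int scale, int force_baseline)
{
  long v = ((long) basic * scale + 50L) / 100L;

  if (v <= 0)
    v = 1;
  if (force_baseline && v > 255)
    v = 255;
  return v;
}
```

For example, quality 75 gives a scale of 50, so the first luminance entry (16) becomes 8, matching the "divided by 2" remark in the table comment. Quality 100 gives a scale of 0, and every entry then collapses to 1.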
diff --git a/jcexpand.c b/jcexpand.c
index 1f20938..6b42b71 100644
--- a/jcexpand.c
+++ b/jcexpand.c
@@ -16,7 +16,7 @@
  * Expand an image so that it is a multiple of the MCU dimensions.
  * This is to be accomplished by duplicating the rightmost column
  * and/or bottommost row of pixels.  The image has not yet been
- * subsampled, so all components have the same dimensions.
+ * downsampled, so all components have the same dimensions.
  */
 
 METHODDEF void
diff --git a/jchuff.c b/jchuff.c
index 07bff64..3f657db 100644
--- a/jchuff.c
+++ b/jchuff.c
@@ -7,7 +7,7 @@
  *
  * This file contains Huffman entropy encoding routines.
  * These routines are invoked via the methods entropy_encode,
- * entropy_encoder_init/term, and entropy_optimize.
+ * entropy_encode_init/term, and entropy_optimize.
  */
 
 #include "jinclude.h"
@@ -66,7 +66,7 @@
   /* Set any codeless symbols to have code length 0;
    * this allows emit_bits to detect any attempt to emit such symbols.
    */
-  MEMZERO((void *) htbl->ehufsi, SIZEOF(htbl->ehufsi));
+  MEMZERO(htbl->ehufsi, SIZEOF(htbl->ehufsi));
 
   for (p = 0; p < lastp; p++) {
     htbl->ehufco[htbl->huffval[p]] = huffcode[p];
@@ -104,6 +104,7 @@
  * between calls, so 24 bits are sufficient.
  */
 
+INLINE
 LOCAL void
 emit_bits (UINT16 code, int size)
 {
@@ -379,8 +380,8 @@
 
   /* This algorithm is explained in section K.2 of the JPEG standard */
 
-  MEMZERO((void *) bits, SIZEOF(bits));
-  MEMZERO((void *) codesize, SIZEOF(codesize));
+  MEMZERO(bits, SIZEOF(bits));
+  MEMZERO(codesize, SIZEOF(codesize));
   for (i = 0; i < 257; i++)
     others[i] = -1;		/* init links to empty */
   
@@ -482,7 +483,7 @@
   bits[i]--;
   
   /* Return final symbol counts (only for lengths 0..16) */
-  memcpy((void *) htbl->bits, (void *) bits, SIZEOF(htbl->bits));
+  MEMCOPY(htbl->bits, bits, SIZEOF(htbl->bits));
   
   /* Return a list of the symbols sorted by code length */
   /* It's not real clear to me why we don't need to consider the codelength
@@ -623,14 +624,14 @@
     if (dc_count_ptrs[tbl] == NULL) {
       dc_count_ptrs[tbl] = (long *) (*cinfo->emethods->alloc_small)
 					(257 * SIZEOF(long));
-      MEMZERO((void *) dc_count_ptrs[tbl], 257 * SIZEOF(long));
+      MEMZERO(dc_count_ptrs[tbl], 257 * SIZEOF(long));
     }
     /* Create AC table */
     tbl = cinfo->cur_comp_info[i]->ac_tbl_no;
     if (ac_count_ptrs[tbl] == NULL) {
       ac_count_ptrs[tbl] = (long *) (*cinfo->emethods->alloc_small)
 					(257 * SIZEOF(long));
-      MEMZERO((void *) ac_count_ptrs[tbl], 257 * SIZEOF(long));
+      MEMZERO(ac_count_ptrs[tbl], 257 * SIZEOF(long));
     }
   }
 
@@ -683,9 +684,9 @@
 jselchuffman (compress_info_ptr cinfo)
 {
   if (! cinfo->arith_code) {
-    cinfo->methods->entropy_encoder_init = huff_init;
+    cinfo->methods->entropy_encode_init = huff_init;
     cinfo->methods->entropy_encode = huff_encode;
-    cinfo->methods->entropy_encoder_term = huff_term;
+    cinfo->methods->entropy_encode_term = huff_term;
 #ifdef ENTROPY_OPT_SUPPORTED
     cinfo->methods->entropy_optimize = huff_optimize;
     /* The standard Huffman tables are only valid for 8-bit data precision.
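Note on the MEMZERO/MEMCOPY call sites above: the explicit (void *) casts can be dropped only if the macros perform the casts themselves. Their definitions are not part of this section, so the versions below are assumed, written as thin wrappers over memset and memcpy. The helper copy_bits is hypothetical and only mirrors the shape of the patched calls.

```c
#include <string.h>

/* ASSUMED definitions: casts moved inside the macros so that call
 * sites can pass typed pointers and int- or long-valued sizes. */
#define MEMZERO(target, size) \
  memset((void *) (target), 0, (size_t) (size))
#define MEMCOPY(dest, src, size) \
  memcpy((void *) (dest), (const void *) (src), (size_t) (size))

/* Hypothetical helper: clear a 17-entry bits[] array, then copy into it. */
static void copy_bits(unsigned char dest[17], const unsigned char src[17])
{
  MEMZERO(dest, 17);
  MEMCOPY(dest, src, 17);
}
```

Keeping the casts in one place also gives a single spot to substitute bzero/bcopy-style routines on systems without memset and memcpy.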
diff --git a/jcmain.c b/jcmain.c
index 8263426..fd6d831 100644
--- a/jcmain.c
+++ b/jcmain.c
@@ -5,7 +5,7 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
- * This file contains a trivial test user interface for the JPEG compressor.
+ * This file contains a command-line user interface for the JPEG compressor.
  * It should work on any system with Unix- or MS-DOS-style command lines.
  *
  * Two different command line styles are permitted, depending on the
@@ -24,9 +24,13 @@
 #ifdef INCLUDES_ARE_ANSI
 #include <stdlib.h>		/* to declare exit() */
 #endif
+#include <ctype.h>		/* to declare isupper(), tolower() */
 #ifdef NEED_SIGNAL_CATCHER
 #include <signal.h>		/* to declare signal() */
 #endif
+#ifdef USE_SETMODE
+#include <fcntl.h>		/* to declare setmode() */
+#endif
 
 #ifdef THINK_C
 #include <console.h>		/* command-line reader for Macintosh */
@@ -56,13 +60,6 @@
 
 
 /*
- * PD version of getopt(3).
- */
-
-#include "egetopt.c"
-
-
-/*
  * This routine determines what format the input file is,
  * and selects the appropriate input-reading module.
  *
@@ -85,10 +82,10 @@
  * We presently apply this method for Targa files.  Most of the time Targa
  * files start with 0x00, so we recognize that case.  Potentially, however,
  * a Targa file could start with any byte value (byte 0 is the length of the
- * seldom-used ID field), so we accept a -T switch to force Targa input mode.
+ * seldom-used ID field), so we provide a switch to force Targa input mode.
  */
 
-static boolean is_targa;	/* records user -T switch */
+static boolean is_targa;	/* records user -targa switch */
 
 
 LOCAL void
@@ -131,7 +128,7 @@
 #endif
   default:
 #ifdef TARGA_SUPPORTED
-    ERREXIT(cinfo->emethods, "Unrecognized input file format --- did you forget -T ?");
+    ERREXIT(cinfo->emethods, "Unrecognized input file format --- perhaps you need -targa");
 #else
     ERREXIT(cinfo->emethods, "Unrecognized input file format");
 #endif
@@ -177,29 +174,424 @@
 GLOBAL void
 signal_catcher (int signum)
 {
-  emethods->trace_level = 0;	/* turn off trace output */
-  (*emethods->free_all) ();	/* clean up memory allocation & temp files */
+  if (emethods != NULL) {
+    emethods->trace_level = 0;	/* turn off trace output */
+    (*emethods->free_all) ();	/* clean up memory allocation & temp files */
+  }
   exit(EXIT_FAILURE);
 }
 
 #endif
 
 
+/*
+ * Optional routine to display a percent-done figure on stderr.
+ * See jcdeflts.c for explanation of the information used.
+ */
+
+#ifdef PROGRESS_REPORT
+
+METHODDEF void
+progress_monitor (compress_info_ptr cinfo, long loopcounter, long looplimit)
+{
+  if (cinfo->total_passes > 1) {
+    fprintf(stderr, "\rPass %d/%d: %3d%% ",
+	    cinfo->completed_passes+1, cinfo->total_passes,
+	    (int) (loopcounter*100L/looplimit));
+  } else {
+    fprintf(stderr, "\r %3d%% ",
+	    (int) (loopcounter*100L/looplimit));
+  }
+  fflush(stderr);
+}
+
+#endif
+
+
+/*
+ * Argument-parsing code.
+ * The switch parser is designed to be useful with DOS-style command line
+ * syntax, ie, intermixed switches and file names, where only the switches
+ * to the left of a given file name affect processing of that file.
+ * The main program in this file doesn't actually use this capability...
+ */
+
+
+static char * progname;		/* program name for error messages */
+
+
 LOCAL void
-usage (char * progname)
+usage (void)
 /* complain about bad command line */
 {
-  fprintf(stderr, "usage: %s ", progname);
-  fprintf(stderr, "[-Q quality 0..100] [-o] [-T] [-I] [-a] [-d] [-m mem]");
+  fprintf(stderr, "usage: %s [switches] ", progname);
 #ifdef TWO_FILE_COMMANDLINE
-  fprintf(stderr, " inputfile outputfile\n");
+  fprintf(stderr, "inputfile outputfile\n");
 #else
-  fprintf(stderr, " [inputfile]\n");
+  fprintf(stderr, "[inputfile]\n");
 #endif
+
+  fprintf(stderr, "Switches (names may be abbreviated):\n");
+  fprintf(stderr, "  -quality N     Compression quality (0..100; 5-95 is useful range)\n");
+  fprintf(stderr, "  -grayscale     Create monochrome JPEG file\n");
+#ifdef ENTROPY_OPT_SUPPORTED
+  fprintf(stderr, "  -optimize      Optimize Huffman table (smaller file, but slow compression)\n");
+#endif
+#ifdef TARGA_SUPPORTED
+  fprintf(stderr, "  -targa         Input file is Targa format (usually not needed)\n");
+#endif
+  fprintf(stderr, "Switches for advanced users:\n");
+  fprintf(stderr, "  -restart N     Set restart interval in rows, or in blocks with B\n");
+#ifdef INPUT_SMOOTHING_SUPPORTED
+  fprintf(stderr, "  -smooth N      Smooth dithered input (N=1..100 is strength)\n");
+#endif
+  fprintf(stderr, "  -maxmemory N   Maximum memory to use (in kbytes)\n");
+  fprintf(stderr, "  -verbose  or  -debug   Emit debug output\n");
+  fprintf(stderr, "Switches for wizards:\n");
+#ifdef C_ARITH_CODING_SUPPORTED
+  fprintf(stderr, "  -arithmetic    Use arithmetic coding\n");
+#endif
+#ifdef C_MULTISCAN_FILES_SUPPORTED
+  fprintf(stderr, "  -nointerleave  Create noninterleaved JPEG file\n");
+#endif
+  fprintf(stderr, "  -qtables file  Use quantization tables given in file\n");
+  fprintf(stderr, "  -sample HxV[,...]  Set JPEG sampling factors\n");
   exit(EXIT_FAILURE);
 }
 
 
+LOCAL boolean
+keymatch (char * arg, const char * keyword, int minchars)
+/* Case-insensitive matching of (possibly abbreviated) keyword switches. */
+/* keyword is the constant keyword (must be lower case already), */
+/* minchars is length of minimum legal abbreviation. */
+{
+  register int ca, ck;
+  register int nmatched = 0;
+
+  while ((ca = *arg++) != '\0') {
+    if ((ck = *keyword++) == '\0')
+      return FALSE;		/* arg longer than keyword, no good */
+    if (isupper(ca))		/* force arg to lcase (assume ck is already) */
+      ca = tolower(ca);
+    if (ca != ck)
+      return FALSE;		/* no good */
+    nmatched++;			/* count matched characters */
+  }
+  /* reached end of argument; fail if it's too short for unique abbrev */
+  if (nmatched < minchars)
+    return FALSE;
+  return TRUE;			/* A-OK */
+}
+
+
+LOCAL int
+qt_getc (FILE * file)
+/* Read next char, skipping over any comments (# to end of line) */
+/* A comment/newline sequence is returned as a newline */
+{
+  register int ch;
+  
+  ch = getc(file);
+  if (ch == '#') {
+    do {
+      ch = getc(file);
+    } while (ch != '\n' && ch != EOF);
+  }
+  return ch;
+}
+
+
+LOCAL long
+read_qt_integer (FILE * file)
+/* Read an unsigned decimal integer from a quantization-table file */
+/* Swallows one trailing character after the integer */
+{
+  register int ch;
+  register long val;
+  
+  /* Skip any leading whitespace, detect EOF */
+  do {
+    ch = qt_getc(file);
+    if (ch == EOF)
+      return EOF;
+  } while (isspace(ch));
+  
+  if (! isdigit(ch)) {
+    fprintf(stderr, "%s: bogus data in quantization file\n", progname);
+    exit(EXIT_FAILURE);
+  }
+
+  val = ch - '0';
+  while (ch = qt_getc(file), isdigit(ch)) {
+    val *= 10;
+    val += ch - '0';
+  }
+  return val;
+}
+
+
+LOCAL void
+read_quant_tables (compress_info_ptr cinfo, char * filename, int scale_factor)
+/* Read a set of quantization tables from the specified file.
+ * The file is plain ASCII text: decimal numbers with whitespace between.
+ * Comments preceded by '#' may be included in the file.
+ * There may be one to NUM_QUANT_TBLS tables in the file, each of 64 values.
+ * The tables are implicitly numbered 0,1,etc.
+ */
+{
+  /* ZIG[i] is the zigzag-order position of the i'th element of a DCT block */
+  /* read in natural order (left to right, top to bottom). */
+  static const short ZIG[DCTSIZE2] = {
+     0,  1,  5,  6, 14, 15, 27, 28,
+     2,  4,  7, 13, 16, 26, 29, 42,
+     3,  8, 12, 17, 25, 30, 41, 43,
+     9, 11, 18, 24, 31, 40, 44, 53,
+    10, 19, 23, 32, 39, 45, 52, 54,
+    20, 22, 33, 38, 46, 51, 55, 60,
+    21, 34, 37, 47, 50, 56, 59, 61,
+    35, 36, 48, 49, 57, 58, 62, 63
+    };
+  FILE * fp;
+  int tblno, i;
+  long val;
+  QUANT_TBL table;
+
+  if ((fp = fopen(filename, "r")) == NULL) {
+    fprintf(stderr, "%s: can't open %s\n", progname, filename);
+    exit(EXIT_FAILURE);
+  }
+  tblno = 0;
+
+  while ((val = read_qt_integer(fp)) != EOF) { /* read 1st element of table */
+    if (tblno >= NUM_QUANT_TBLS) {
+      fprintf(stderr, "%s: too many tables in file %s\n", progname, filename);
+      exit(EXIT_FAILURE);
+    }
+    table[0] = (QUANT_VAL) val;
+    for (i = 1; i < DCTSIZE2; i++) {
+      if ((val = read_qt_integer(fp)) == EOF) {
+	fprintf(stderr, "%s: incomplete table in file %s\n", progname, filename);
+	exit(EXIT_FAILURE);
+      }
+      table[ZIG[i]] = (QUANT_VAL) val;
+    }
+    j_add_quant_table(cinfo, tblno, table, scale_factor, FALSE);
+    tblno++;
+  }
+
+  fclose(fp);
+}
+
+
+LOCAL void
+set_sample_factors (compress_info_ptr cinfo, char *arg)
+/* Process a sample-factors parameter string, of the form */
+/*     HxV[,HxV,...]    */
+{
+#define MAX_COMPONENTS 4	/* # of comp_info slots made by jcdeflts.c */
+  int ci, val1, val2;
+  char ch1, ch2;
+
+  for (ci = 0; ci < MAX_COMPONENTS; ci++) {
+    if (*arg) {
+      ch2 = ',';		/* if not set by sscanf, will be ',' */
+      if (sscanf(arg, "%d%c%d%c", &val1, &ch1, &val2, &ch2) < 3)
+	usage();
+      if ((ch1 != 'x' && ch1 != 'X') || ch2 != ',')
+	usage();		/* syntax check */
+      if (val1 <= 0 || val1 > 4 || val2 <= 0 || val2 > 4) {
+	fprintf(stderr, "JPEG sampling factors must be 1..4\n");
+	exit(EXIT_FAILURE);
+      }
+      cinfo->comp_info[ci].h_samp_factor = val1;
+      cinfo->comp_info[ci].v_samp_factor = val2;
+      while (*arg && *arg++ != ',') /* advance to next segment of arg string */
+	;
+    } else {
+      /* reached end of parameter, set remaining components to 1x1 sampling */
+      cinfo->comp_info[ci].h_samp_factor = 1;
+      cinfo->comp_info[ci].v_samp_factor = 1;
+    }
+  }
+}
+
+
+LOCAL int
+parse_switches (compress_info_ptr cinfo, int last_file_arg_seen,
+		int argc, char **argv)
+/* Initialize cinfo with default switch settings, then parse option switches.
+ * Returns argv[] index of first file-name argument (== argc if none).
+ * Any file names with indexes <= last_file_arg_seen are ignored;
+ * they have presumably been processed in a previous iteration.
+ * (Pass 0 for last_file_arg_seen on the first or only iteration.)
+ */
+{
+  int argn;
+  char * arg;
+  char * qtablefile = NULL;	/* saves -qtables filename if any */
+  int q_scale_factor = 100;	/* default to no scaling for -qtables */
+
+  /* (Re-)initialize the system-dependent error and memory managers. */
+  jselerror(cinfo->emethods);	/* error/trace message routines */
+  jselmemmgr(cinfo->emethods);	/* memory allocation routines */
+  cinfo->methods->c_ui_method_selection = c_ui_method_selection;
+
+  /* Now OK to enable signal catcher. */
+#ifdef NEED_SIGNAL_CATCHER
+  emethods = cinfo->emethods;
+#endif
+
+  /* Set up default JPEG parameters. */
+  /* Note that default -quality level here need not, and does not,
+   * match the default scaling for an explicit -qtables argument.
+   */
+  j_c_defaults(cinfo, 75, FALSE); /* default quality level = 75 */
+  is_targa = FALSE;
+
+  /* Scan command line options, adjust parameters */
+
+  for (argn = 1; argn < argc; argn++) {
+    arg = argv[argn];
+    if (*arg != '-') {
+      /* Not a switch, must be a file name argument */
+      if (argn <= last_file_arg_seen)
+	continue;		/* ignore it if previously processed */
+      break;			/* else done parsing switches */
+    }
+    arg++;			/* advance past switch marker character */
+
+    if (keymatch(arg, "arithmetic", 1)) {
+      /* Use arithmetic coding. */
+#ifdef C_ARITH_CODING_SUPPORTED
+      cinfo->arith_code = TRUE;
+#else
+      fprintf(stderr, "%s: sorry, arithmetic coding not supported\n",
+	      progname);
+      exit(EXIT_FAILURE);
+#endif
+
+    } else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) {
+      /* Enable debug printouts. */
+      /* On first -debug or -verbose, print version identification */
+      if (last_file_arg_seen == 0 && cinfo->emethods->trace_level == 0)
+	fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n",
+		JVERSION, JCOPYRIGHT);
+      cinfo->emethods->trace_level++;
+
+    } else if (keymatch(arg, "grayscale", 2) || keymatch(arg, "greyscale",2)) {
+      /* Force a monochrome JPEG file to be generated. */
+      j_monochrome_default(cinfo);
+
+    } else if (keymatch(arg, "maxmemory", 1)) {
+      /* Maximum memory in Kb (or Mb with 'm'). */
+      long lval;
+      char ch = 'x';
+
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      if (sscanf(argv[argn], "%ld%c", &lval, &ch) < 1)
+	usage();
+      if (ch == 'm' || ch == 'M')
+	lval *= 1000L;
+      cinfo->emethods->max_memory_to_use = lval * 1000L;
+
+    } else if (keymatch(arg, "nointerleave", 3)) {
+      /* Create noninterleaved file. */
+#ifdef C_MULTISCAN_FILES_SUPPORTED
+      cinfo->interleave = FALSE;
+#else
+      fprintf(stderr, "%s: sorry, multiple-scan support was not compiled\n",
+	      progname);
+      exit(EXIT_FAILURE);
+#endif
+
+    } else if (keymatch(arg, "optimize", 1) || keymatch(arg, "optimise", 1)) {
+      /* Enable entropy parm optimization. */
+#ifdef ENTROPY_OPT_SUPPORTED
+      cinfo->optimize_coding = TRUE;
+#else
+      fprintf(stderr, "%s: sorry, entropy optimization was not compiled\n",
+	      progname);
+      exit(EXIT_FAILURE);
+#endif
+
+    } else if (keymatch(arg, "quality", 1)) {
+      /* Quality factor (quantization table scaling factor). */
+      int val;
+
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      if (sscanf(argv[argn], "%d", &val) != 1)
+	usage();
+      /* Set quantization tables (will be overridden if -qtables also given).
+       * Note: we make force_baseline FALSE.
+       * This means non-baseline JPEG files can be created with low Q values.
+       * To ensure only baseline files are generated, pass TRUE instead.
+       */
+      j_set_quality(cinfo, val, FALSE);
+      /* Change scale factor in case -qtables is present. */
+      q_scale_factor = j_quality_scaling(val);
+
+    } else if (keymatch(arg, "qtables", 2)) {
+      /* Quantization tables fetched from file. */
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      qtablefile = argv[argn];
+      /* we postpone actually reading the file in case -quality comes later */
+
+    } else if (keymatch(arg, "restart", 1)) {
+      /* Restart interval in MCU rows (or in MCUs with 'b'). */
+      long lval;
+      char ch = 'x';
+
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      if (sscanf(argv[argn], "%ld%c", &lval, &ch) < 1)
+	usage();
+      if (lval < 0 || lval > 65535L)
+	usage();
+      if (ch == 'b' || ch == 'B')
+	cinfo->restart_interval = (UINT16) lval;
+      else
+	cinfo->restart_in_rows = (int) lval;
+
+    } else if (keymatch(arg, "sample", 2)) {
+      /* Set sampling factors. */
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      set_sample_factors(cinfo, argv[argn]);
+
+    } else if (keymatch(arg, "smooth", 2)) {
+      /* Set input smoothing factor. */
+      int val;
+
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      if (sscanf(argv[argn], "%d", &val) != 1)
+	usage();
+      if (val < 0 || val > 100)
+	usage();
+      cinfo->smoothing_factor = val;
+
+    } else if (keymatch(arg, "targa", 1)) {
+      /* Input file is Targa format. */
+      is_targa = TRUE;
+
+    } else {
+      usage();			/* bogus switch */
+    }
+  }
+
+  /* Post-switch-scanning cleanup */
+
+  if (qtablefile != NULL)	/* process -qtables if it was present */
+    read_quant_tables(cinfo, qtablefile, q_scale_factor);
+
+  return argn;			/* return index of next arg (file name) */
+}
+
+
 /*
  * The main program.
  */
@@ -207,124 +599,47 @@
 GLOBAL int
 main (int argc, char **argv)
 {
-  struct compress_info_struct cinfo;
-  struct compress_methods_struct c_methods;
-  struct external_methods_struct e_methods;
-  int c;
+  struct Compress_info_struct cinfo;
+  struct Compress_methods_struct c_methods;
+  struct External_methods_struct e_methods;
+  int file_index;
 
   /* On Mac, fetch a command line. */
 #ifdef THINK_C
   argc = ccommand(&argv);
 #endif
 
-  /* Initialize the system-dependent method pointers. */
+  progname = argv[0];
+
+  /* Set up links to method structures. */
   cinfo.methods = &c_methods;
   cinfo.emethods = &e_methods;
-  jselerror(&e_methods);	/* error/trace message routines */
-  jselmemmgr(&e_methods);	/* memory allocation routines */
-  c_methods.c_ui_method_selection = c_ui_method_selection;
 
-  /* Now OK to enable signal catcher. */
+  /* Install, but don't yet enable signal catcher. */
 #ifdef NEED_SIGNAL_CATCHER
-  emethods = &e_methods;
+  emethods = NULL;
   signal(SIGINT, signal_catcher);
 #ifdef SIGTERM			/* not all systems have SIGTERM */
   signal(SIGTERM, signal_catcher);
 #endif
 #endif
 
-  /* Set up default JPEG parameters. */
-  j_c_defaults(&cinfo, 75, FALSE); /* default quality level = 75 */
-  is_targa = FALSE;
+  /* Scan command line: set up compression parameters, input & output files. */
 
-  /* Scan command line options, adjust parameters */
-  
-  while ((c = egetopt(argc, argv, "IQ:Taom:d")) != EOF)
-    switch (c) {
-    case 'I':			/* Create noninterleaved file. */
-#ifdef MULTISCAN_FILES_SUPPORTED
-      cinfo.interleave = FALSE;
-#else
-      fprintf(stderr, "%s: sorry, multiple-scan support was not compiled\n",
-	      argv[0]);
-      exit(EXIT_FAILURE);
-#endif
-      break;
-    case 'Q':			/* Quality factor. */
-      { int val;
-	if (optarg == NULL)
-	  usage(argv[0]);
-	if (sscanf(optarg, "%d", &val) != 1)
-	  usage(argv[0]);
-	/* Note: for now, we make force_baseline FALSE.
-	 * This means non-baseline JPEG files can be created with low Q values.
-	 * To ensure only baseline files are generated, pass TRUE instead.
-	 */
-	j_set_quality(&cinfo, val, FALSE);
-      }
-      break;
-    case 'T':			/* Input file is Targa format. */
-      is_targa = TRUE;
-      break;
-    case 'a':			/* Use arithmetic coding. */
-#ifdef ARITH_CODING_SUPPORTED
-      cinfo.arith_code = TRUE;
-#else
-      fprintf(stderr, "%s: sorry, arithmetic coding not supported\n",
-	      argv[0]);
-      exit(EXIT_FAILURE);
-#endif
-      break;
-    case 'o':			/* Enable entropy parm optimization. */
-#ifdef ENTROPY_OPT_SUPPORTED
-      cinfo.optimize_coding = TRUE;
-#else
-      fprintf(stderr, "%s: sorry, entropy optimization was not compiled\n",
-	      argv[0]);
-      exit(EXIT_FAILURE);
-#endif
-      break;
-    case 'm':			/* Maximum memory in Kb (or Mb with 'm'). */
-      { long lval;
-	char ch = 'x';
-
-	if (optarg == NULL)
-	  usage(argv[0]);
-	if (sscanf(optarg, "%ld%c", &lval, &ch) < 1)
-	  usage(argv[0]);
-	if (ch == 'm' || ch == 'M')
-	  lval *= 1000L;
-	e_methods.max_memory_to_use = lval * 1000L;
-      }
-      break;
-    case 'd':			/* Debugging. */
-      e_methods.trace_level++;
-      break;
-    case '?':
-    default:
-      usage(argv[0]);
-      break;
-    }
-
-  /* If -d appeared, print version identification */
-  if (e_methods.trace_level > 0)
-    fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n",
-	    JVERSION, JCOPYRIGHT);
-
-  /* Select the input and output files */
+  file_index = parse_switches(&cinfo, 0, argc, argv);
 
 #ifdef TWO_FILE_COMMANDLINE
 
-  if (optind != argc-2) {
-    fprintf(stderr, "%s: must name one input and one output file\n", argv[0]);
-    usage(argv[0]);
+  if (file_index != argc-2) {
+    fprintf(stderr, "%s: must name one input and one output file\n", progname);
+    usage();
   }
-  if ((cinfo.input_file = fopen(argv[optind], READ_BINARY)) == NULL) {
-    fprintf(stderr, "%s: can't open %s\n", argv[0], argv[optind]);
+  if ((cinfo.input_file = fopen(argv[file_index], READ_BINARY)) == NULL) {
+    fprintf(stderr, "%s: can't open %s\n", progname, argv[file_index]);
     exit(EXIT_FAILURE);
   }
-  if ((cinfo.output_file = fopen(argv[optind+1], WRITE_BINARY)) == NULL) {
-    fprintf(stderr, "%s: can't open %s\n", argv[0], argv[optind+1]);
+  if ((cinfo.output_file = fopen(argv[file_index+1], WRITE_BINARY)) == NULL) {
+    fprintf(stderr, "%s: can't open %s\n", progname, argv[file_index+1]);
     exit(EXIT_FAILURE);
   }
 
@@ -333,13 +648,18 @@
   cinfo.input_file = stdin;	/* default input file */
   cinfo.output_file = stdout;	/* always the output file */
 
-  if (optind < argc-1) {
-    fprintf(stderr, "%s: only one input file\n", argv[0]);
-    usage(argv[0]);
+#ifdef USE_SETMODE		/* need to hack file mode? */
+  setmode(fileno(stdin), O_BINARY);
+  setmode(fileno(stdout), O_BINARY);
+#endif
+
+  if (file_index < argc-1) {
+    fprintf(stderr, "%s: only one input file\n", progname);
+    usage();
   }
-  if (optind < argc) {
-    if ((cinfo.input_file = fopen(argv[optind], READ_BINARY)) == NULL) {
-      fprintf(stderr, "%s: can't open %s\n", argv[0], argv[optind]);
+  if (file_index < argc) {
+    if ((cinfo.input_file = fopen(argv[file_index], READ_BINARY)) == NULL) {
+      fprintf(stderr, "%s: can't open %s\n", progname, argv[file_index]);
       exit(EXIT_FAILURE);
     }
   }
@@ -349,9 +669,23 @@
   /* Figure out the input file format, and set up to read it. */
   select_file_type(&cinfo);
 
+#ifdef PROGRESS_REPORT
+  /* Start up progress display, unless trace output is on */
+  if (e_methods.trace_level == 0)
+    c_methods.progress_monitor = progress_monitor;
+#endif
+
   /* Do it to it! */
   jpeg_compress(&cinfo);
 
+#ifdef PROGRESS_REPORT
+  /* Clear away progress display */
+  if (e_methods.trace_level == 0) {
+    fprintf(stderr, "\r                \r");
+    fflush(stderr);
+  }
+#endif
+
   /* All done. */
   exit(EXIT_SUCCESS);
   return 0;			/* suppress no-return-value warnings */
diff --git a/jcmaster.c b/jcmaster.c
index ec5c96d..b192aae 100644
--- a/jcmaster.c
+++ b/jcmaster.c
@@ -19,8 +19,8 @@
 {
   /* Edge expansion */
   jselexpand(cinfo);
-  /* Subsampling of pixels */
-  jselsubsample(cinfo);
+  /* Downsampling of pixels */
+  jseldownsample(cinfo);
   /* MCU extraction */
   jselcmcu(cinfo);
 }
@@ -36,7 +36,7 @@
   /* Gamma and color space conversion */
   jselccolor(cinfo);
   /* Entropy encoding: either Huffman or arithmetic coding. */
-#ifdef ARITH_CODING_SUPPORTED
+#ifdef C_ARITH_CODING_SUPPORTED
   jselcarithmetic(cinfo);
 #else
   cinfo->arith_code = FALSE;	/* force Huffman mode */
@@ -71,7 +71,7 @@
 
   }
 
-  /* Compute logical subsampled dimensions of components */
+  /* Compute logical downsampled dimensions of components */
   for (ci = 0; ci < cinfo->num_components; ci++) {
     compptr = &cinfo->comp_info[ci];
     compptr->true_comp_width = (cinfo->image_width * compptr->h_samp_factor
diff --git a/jcmcu.c b/jcmcu.c
index b1b15a8..e921b49 100644
--- a/jcmcu.c
+++ b/jcmcu.c
@@ -59,31 +59,31 @@
   { register JSAMPROW elemptr;
     register DCTELEM *localblkptr = block;
 #if DCTSIZE != 8
-    register short elemc;
+    register int elemc;
 #endif
-    register short elemr;
+    register int elemr;
 
     for (elemr = DCTSIZE; elemr > 0; elemr--) {
       elemptr = input_data[start_row++] + start_col;
 #if DCTSIZE == 8		/* unroll the inner loop */
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-      *localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+      *localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
 #else
       for (elemc = DCTSIZE; elemc > 0; elemc--) {
-	*localblkptr++ = (DCTELEM) GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
+	*localblkptr++ = (DCTELEM) (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
       }
 #endif
     }
   }
 
 #ifdef DCT_ERR_STATS
-  memcpy((void *) svblock, (void *) block, SIZEOF(DCTBLOCK));
+  MEMCOPY(svblock, block, SIZEOF(DCTBLOCK));
 #endif
 
   j_fwd_dct(block);
diff --git a/jconfig.h b/jconfig.h
index 92b697e..aeb29fe 100644
--- a/jconfig.h
+++ b/jconfig.h
@@ -106,9 +106,11 @@
  */
 
 #ifdef MSDOS			/* two-file style is needed for PCs */
+#ifndef USE_SETMODE		/* unless you have setmode() */
 #define TWO_FILE_COMMANDLINE
 #endif
-#ifdef THINK_C			/* needed for Macintosh too */
+#endif
+#ifdef THINK_C			/* it's needed for Macintosh too */
 #define TWO_FILE_COMMANDLINE
 #endif
 
@@ -135,6 +137,17 @@
  */
 
 
+/* If your compiler supports inline functions, define INLINE
+ * as the inline keyword; otherwise define it as empty.
+ */
+
+#ifdef __GNUC__			/* for instance, GNU C knows about inline */
+#define INLINE __inline__
+#endif
+#ifndef INLINE			/* default is to define it as empty */
+#define INLINE
+#endif
+
 /* On a few systems, type boolean and/or macros FALSE, TRUE may appear
  * in standard header files.  Or you may have conflicts with application-
  * specific header files that you want to include together with these files.
@@ -166,10 +179,16 @@
  */
 
 /* Arithmetic coding is unsupported for legal reasons.  Complaints to IBM. */
-#undef  ARITH_CODING_SUPPORTED	/* Arithmetic coding back end? */
-#define MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */
-#define ENTROPY_OPT_SUPPORTED	/* Optimization of entropy coding parms? */
-#define BLOCK_SMOOTHING_SUPPORTED /* Block smoothing during decoding? */
+
+/* Encoder capability options: */
+#undef  C_ARITH_CODING_SUPPORTED    /* Arithmetic coding back end? */
+#undef  C_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files?  (NYI) */
+#define ENTROPY_OPT_SUPPORTED	    /* Optimization of entropy coding parms? */
+#define INPUT_SMOOTHING_SUPPORTED   /* Input image smoothing option? */
+/* Decoder capability options: */
+#undef  D_ARITH_CODING_SUPPORTED    /* Arithmetic coding back end? */
+#define D_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */
+#define BLOCK_SMOOTHING_SUPPORTED   /* Block smoothing during decoding? */
 #define QUANT_1PASS_SUPPORTED	/* 1-pass color quantization? */
 #define QUANT_2PASS_SUPPORTED	/* 2-pass color quantization? */
 /* these defines indicate which JPEG file formats are allowed */
diff --git a/jcpipe.c b/jcpipe.c
index eca34ac..6b64c4f 100644
--- a/jcpipe.c
+++ b/jcpipe.c
@@ -30,15 +30,15 @@
 /*
  * About the data structures:
  *
- * The processing chunk size for subsampling is referred to in this file as
+ * The processing chunk size for downsampling is referred to in this file as
  * a "row group": a row group is defined as Vk (v_samp_factor) sample rows of
- * any component after subsampling, or Vmax (max_v_samp_factor) unsubsampled
+ * any component after downsampling, or Vmax (max_v_samp_factor) full-size
  * rows.  In an interleaved scan each MCU row contains exactly DCTSIZE row
  * groups of each component in the scan.  In a noninterleaved scan an MCU row
  * is one row of blocks, which might not be an integral number of row groups;
  * for convenience we use a buffer of the same size as in interleaved scans,
- * and process Vk MCU rows in each burst of subsampling.
- * To provide context for the subsampling step, we have to retain the last
+ * and process Vk MCU rows in each burst of downsampling.
+ * To provide context for the downsampling step, we have to retain the last
  * two row groups of the previous MCU row while reading in the next MCU row
  * (or set of Vk MCU rows).  To do this without copying data about, we create
  * a rather strange data structure.  Exactly DCTSIZE+2 row groups of samples
@@ -82,12 +82,12 @@
     compptr->MCU_height = compptr->v_samp_factor;
     compptr->MCU_blocks = compptr->MCU_width * compptr->MCU_height;
     /* compute physical dimensions of component */
-    compptr->subsampled_width = jround_up(compptr->true_comp_width,
-					  (long) (compptr->MCU_width*DCTSIZE));
-    compptr->subsampled_height = jround_up(compptr->true_comp_height,
-					   (long) (compptr->MCU_height*DCTSIZE));
+    compptr->downsampled_width = jround_up(compptr->true_comp_width,
+					   (long) (compptr->MCU_width*DCTSIZE));
+    compptr->downsampled_height = jround_up(compptr->true_comp_height,
+					    (long) (compptr->MCU_height*DCTSIZE));
     /* Sanity check */
-    if (compptr->subsampled_width !=
+    if (compptr->downsampled_width !=
 	(cinfo->MCUs_per_row * (compptr->MCU_width*DCTSIZE)))
       ERREXIT(cinfo->emethods, "I'm confused about the image width");
     /* Prepare array describing MCU composition */
@@ -99,6 +99,13 @@
     }
   }
 
+  /* Convert restart specified in rows to actual MCU count. */
+  /* Note that count must fit in 16 bits, so we provide limiting. */
+  if (cinfo->restart_in_rows > 0) {
+    long nominal = cinfo->restart_in_rows * cinfo->MCUs_per_row;
+    cinfo->restart_interval = (UINT16) MIN(nominal, 65535L);
+  }
+
   (*cinfo->methods->c_per_scan_method_selection) (cinfo);
 }
 
@@ -115,18 +122,25 @@
   compptr->MCU_height = 1;
   compptr->MCU_blocks = 1;
   /* compute physical dimensions of component */
-  compptr->subsampled_width = jround_up(compptr->true_comp_width,
-					(long) DCTSIZE);
-  compptr->subsampled_height = jround_up(compptr->true_comp_height,
+  compptr->downsampled_width = jround_up(compptr->true_comp_width,
 					 (long) DCTSIZE);
+  compptr->downsampled_height = jround_up(compptr->true_comp_height,
+					  (long) DCTSIZE);
 
-  cinfo->MCUs_per_row = compptr->subsampled_width / DCTSIZE;
-  cinfo->MCU_rows_in_scan = compptr->subsampled_height / DCTSIZE;
+  cinfo->MCUs_per_row = compptr->downsampled_width / DCTSIZE;
+  cinfo->MCU_rows_in_scan = compptr->downsampled_height / DCTSIZE;
 
   /* Prepare array describing MCU composition */
   cinfo->blocks_in_MCU = 1;
   cinfo->MCU_membership[0] = 0;
 
+  /* Convert restart specified in rows to actual MCU count. */
+  /* Note that count must fit in 16 bits, so we provide limiting. */
+  if (cinfo->restart_in_rows > 0) {
+    long nominal = cinfo->restart_in_rows * cinfo->MCUs_per_row;
+    cinfo->restart_interval = (UINT16) MIN(nominal, 65535L);
+  }
+
   (*cinfo->methods->c_per_scan_method_selection) (cinfo);
 }
 
@@ -135,7 +149,7 @@
 LOCAL void
 alloc_sampling_buffer (compress_info_ptr cinfo, JSAMPIMAGE fullsize_data[2],
 		       long fullsize_width)
-/* Create a pre-subsampling data buffer having the desired structure */
+/* Create a pre-downsampling data buffer having the desired structure */
 /* (see comments at head of file) */
 {
   short ci, vs, i;
@@ -193,18 +207,18 @@
 
 
 LOCAL void
-subsample (compress_info_ptr cinfo,
-	   JSAMPIMAGE fullsize_data, JSAMPIMAGE subsampled_data,
-	   long fullsize_width,
-	   short above, short current, short below, short out)
-/* Do subsampling of a single row group (of each component). */
+downsample (compress_info_ptr cinfo,
+	    JSAMPIMAGE fullsize_data, JSAMPIMAGE sampled_data,
+	    long fullsize_width,
+	    short above, short current, short below, short out)
+/* Do downsampling of a single row group (of each component). */
 /* above, current, below are indexes of row groups in fullsize_data;      */
-/* out is the index of the target row group in subsampled_data.           */
+/* out is the index of the target row group in sampled_data.              */
 /* Special case: above, below can be -1 to indicate top, bottom of image. */
 {
   jpeg_component_info *compptr;
   JSAMPARRAY above_ptr, below_ptr;
-  JSAMPROW dummy[MAX_SAMP_FACTOR]; /* for subsample expansion at top/bottom */
+  JSAMPROW dummy[MAX_SAMP_FACTOR]; /* for downsample expansion at top/bottom */
   short ci, vs, i;
 
   vs = cinfo->max_v_samp_factor; /* row group height */
@@ -231,25 +245,27 @@
       below_ptr = (JSAMPARRAY) dummy; /* possible near->far pointer conv */
     }
 
-    (*cinfo->methods->subsample[ci])
+    (*cinfo->methods->downsample[ci])
 		(cinfo, (int) ci,
 		 fullsize_width, (int) vs,
-		 compptr->subsampled_width, (int) compptr->v_samp_factor,
+		 compptr->downsampled_width, (int) compptr->v_samp_factor,
 		 above_ptr,
 		 fullsize_data[ci] + current * vs,
 		 below_ptr,
-		 subsampled_data[ci] + out * compptr->v_samp_factor);
+		 sampled_data[ci] + out * compptr->v_samp_factor);
   }
 }
 
 
-/* These vars are initialized by the pipeline controller for use by
+/* These variables are initialized by the pipeline controller for use by
  * MCU_output_catcher.
  * To avoid a lot of row-pointer overhead, we cram as many MCUs into each
- * row of whole_scan_MCUs as we can get without exceeding 64KB per row.
+ * row of whole_scan_MCUs as we can get without exceeding 32Kbytes per row.
+ * NOTE: the "arbitrary" constant here must not exceed MAX_ALLOC_CHUNK
+ * defined in jmemsys.h, which is 64K-epsilon in most DOS implementations.
  */
 
-#define MAX_WHOLE_ROW_BLOCKS	((int) (65500 / SIZEOF(JBLOCK))) /* max blocks/row */
+#define MAX_WHOLE_ROW_BLOCKS	((int) (32768L / SIZEOF(JBLOCK))) /* max blocks/row */
 
 static big_barray_ptr whole_scan_MCUs; /* Big array for saving the MCUs */
 static int MCUs_in_big_row;	/* # of MCUs in each row of whole_scan_MCUs */
@@ -342,10 +358,10 @@
   long cur_pixel_row;		/* counts # of pixel rows processed */
   long mcu_rows_output;		/* # of MCU rows actually emitted */
   int mcu_rows_per_loop;	/* # of MCU rows processed per outer loop */
-  /* Work buffer for pre-subsampling data (see comments at head of file) */
+  /* Work buffer for pre-downsampling data (see comments at head of file) */
   JSAMPIMAGE fullsize_data[2];
-  /* Work buffer for subsampled data */
-  JSAMPIMAGE subsampled_data;
+  /* Work buffer for downsampled data */
+  JSAMPIMAGE sampled_data;
   int rows_this_time;
   short ci, whichss, i;
 
@@ -374,14 +390,14 @@
 			     (long) (cinfo->max_h_samp_factor * DCTSIZE));
 
   /* Allocate working memory: */
-  /* fullsize_data is sample data before subsampling */
+  /* fullsize_data is sample data before downsampling */
   alloc_sampling_buffer(cinfo, fullsize_data, fullsize_width);
-  /* subsampled_data is sample data after subsampling */
-  subsampled_data = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
+  /* sampled_data is sample data after downsampling */
+  sampled_data = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
 				(cinfo->num_components * SIZEOF(JSAMPARRAY));
   for (ci = 0; ci < cinfo->num_components; ci++) {
-    subsampled_data[ci] = (*cinfo->emethods->alloc_small_sarray)
-			(cinfo->comp_info[ci].subsampled_width,
+    sampled_data[ci] = (*cinfo->emethods->alloc_small_sarray)
+			(cinfo->comp_info[ci].downsampled_width,
 			 (long) (cinfo->comp_info[ci].v_samp_factor * DCTSIZE));
   }
 
@@ -398,8 +414,8 @@
 
   (*cinfo->methods->write_scan_header) (cinfo);
   cinfo->methods->entropy_output = cinfo->methods->write_jpeg_data;
-  (*cinfo->methods->entropy_encoder_init) (cinfo);
-  (*cinfo->methods->subsample_init) (cinfo);
+  (*cinfo->methods->entropy_encode_init) (cinfo);
+  (*cinfo->methods->downsample_init) (cinfo);
   (*cinfo->methods->extract_init) (cinfo);
 
   /* Loop over input image: rows_in_mem pixel rows are processed per loop */
@@ -415,7 +431,7 @@
     whichss ^= 1;		/* switch to other fullsize_data buffer */
     
     /* Obtain rows_this_time pixel rows and expand to rows_in_mem rows. */
-    /* Then we have exactly DCTSIZE row groups for subsampling. */   
+    /* Then we have exactly DCTSIZE row groups for downsampling. */   
     rows_this_time = (int) MIN((long) rows_in_mem,
 			       cinfo->image_height - cur_pixel_row);
  
@@ -426,51 +442,51 @@
 				    fullsize_width, rows_in_mem,
 				    fullsize_data[whichss]);
     
-    /* Subsample the data (all components) */
+    /* Downsample the data (all components) */
     /* First time through is a special case */
     
     if (cur_pixel_row) {
-      /* Subsample last row group of previous set */
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) DCTSIZE, (short) (DCTSIZE+1), (short) 0,
-		(short) (DCTSIZE-1));
-      /* and dump the previous set's subsampled data */
-      (*cinfo->methods->extract_MCUs) (cinfo, subsampled_data, 
+      /* Downsample last row group of previous set */
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) DCTSIZE, (short) (DCTSIZE+1), (short) 0,
+		 (short) (DCTSIZE-1));
+      /* and dump the previous set's downsampled data */
+      (*cinfo->methods->extract_MCUs) (cinfo, sampled_data, 
 				       mcu_rows_per_loop,
 				       cinfo->methods->entropy_encode);
       mcu_rows_output += mcu_rows_per_loop;
-      /* Subsample first row group of this set */
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) (DCTSIZE+1), (short) 0, (short) 1,
-		(short) 0);
+      /* Downsample first row group of this set */
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) (DCTSIZE+1), (short) 0, (short) 1,
+		 (short) 0);
     } else {
-      /* Subsample first row group with dummy above-context */
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) (-1), (short) 0, (short) 1,
-		(short) 0);
+      /* Downsample first row group with dummy above-context */
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) (-1), (short) 0, (short) 1,
+		 (short) 0);
     }
-    /* Subsample second through next-to-last row groups of this set */
+    /* Downsample second through next-to-last row groups of this set */
     for (i = 1; i <= DCTSIZE-2; i++) {
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) (i-1), (short) i, (short) (i+1),
-		(short) i);
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) (i-1), (short) i, (short) (i+1),
+		 (short) i);
     }
   } /* end of outer loop */
   
-  /* Subsample the last row group with dummy below-context */
+  /* Downsample the last row group with dummy below-context */
   /* Note whichss points to last buffer side used */
-  subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-	    (short) (DCTSIZE-2), (short) (DCTSIZE-1), (short) (-1),
-	    (short) (DCTSIZE-1));
+  downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+	     (short) (DCTSIZE-2), (short) (DCTSIZE-1), (short) (-1),
+	     (short) (DCTSIZE-1));
   /* Dump the remaining data (may be less than full height if uninterleaved) */
-  (*cinfo->methods->extract_MCUs) (cinfo, subsampled_data, 
+  (*cinfo->methods->extract_MCUs) (cinfo, sampled_data, 
 		(int) (cinfo->MCU_rows_in_scan - mcu_rows_output),
 		cinfo->methods->entropy_encode);
 
   /* Finish output file */
   (*cinfo->methods->extract_term) (cinfo);
-  (*cinfo->methods->subsample_term) (cinfo);
-  (*cinfo->methods->entropy_encoder_term) (cinfo);
+  (*cinfo->methods->downsample_term) (cinfo);
+  (*cinfo->methods->entropy_encode_term) (cinfo);
   (*cinfo->methods->write_scan_trailer) (cinfo);
   cinfo->completed_passes++;
 
@@ -494,10 +510,10 @@
   long cur_pixel_row;		/* counts # of pixel rows processed */
   long mcu_rows_output;		/* # of MCU rows actually emitted */
   int mcu_rows_per_loop;	/* # of MCU rows processed per outer loop */
-  /* Work buffer for pre-subsampling data (see comments at head of file) */
+  /* Work buffer for pre-downsampling data (see comments at head of file) */
   JSAMPIMAGE fullsize_data[2];
-  /* Work buffer for subsampled data */
-  JSAMPIMAGE subsampled_data;
+  /* Work buffer for downsampled data */
+  JSAMPIMAGE sampled_data;
   int rows_this_time;
   int blocks_in_big_row;
   short ci, whichss, i;
@@ -527,14 +543,14 @@
 			     (long) (cinfo->max_h_samp_factor * DCTSIZE));
 
   /* Allocate working memory: */
-  /* fullsize_data is sample data before subsampling */
+  /* fullsize_data is sample data before downsampling */
   alloc_sampling_buffer(cinfo, fullsize_data, fullsize_width);
-  /* subsampled_data is sample data after subsampling */
-  subsampled_data = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
+  /* sampled_data is sample data after downsampling */
+  sampled_data = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
 				(cinfo->num_components * SIZEOF(JSAMPARRAY));
   for (ci = 0; ci < cinfo->num_components; ci++) {
-    subsampled_data[ci] = (*cinfo->emethods->alloc_small_sarray)
-			(cinfo->comp_info[ci].subsampled_width,
+    sampled_data[ci] = (*cinfo->emethods->alloc_small_sarray)
+			(cinfo->comp_info[ci].downsampled_width,
 			 (long) (cinfo->comp_info[ci].v_samp_factor * DCTSIZE));
   }
 
@@ -560,7 +576,7 @@
 
   /* Do per-scan object init */
 
-  (*cinfo->methods->subsample_init) (cinfo);
+  (*cinfo->methods->downsample_init) (cinfo);
   (*cinfo->methods->extract_init) (cinfo);
 
   /* Loop over input image: rows_in_mem pixel rows are processed per loop */
@@ -577,7 +593,7 @@
     whichss ^= 1;		/* switch to other fullsize_data buffer */
     
     /* Obtain rows_this_time pixel rows and expand to rows_in_mem rows. */
-    /* Then we have exactly DCTSIZE row groups for subsampling. */   
+    /* Then we have exactly DCTSIZE row groups for downsampling. */   
     rows_this_time = (int) MIN((long) rows_in_mem,
 			       cinfo->image_height - cur_pixel_row);
  
@@ -588,51 +604,51 @@
 				    fullsize_width, rows_in_mem,
 				    fullsize_data[whichss]);
     
-    /* Subsample the data (all components) */
+    /* Downsample the data (all components) */
     /* First time through is a special case */
     
     if (cur_pixel_row) {
-      /* Subsample last row group of previous set */
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) DCTSIZE, (short) (DCTSIZE+1), (short) 0,
-		(short) (DCTSIZE-1));
-      /* and dump the previous set's subsampled data */
-      (*cinfo->methods->extract_MCUs) (cinfo, subsampled_data, 
+      /* Downsample last row group of previous set */
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) DCTSIZE, (short) (DCTSIZE+1), (short) 0,
+		 (short) (DCTSIZE-1));
+      /* and dump the previous set's downsampled data */
+      (*cinfo->methods->extract_MCUs) (cinfo, sampled_data, 
 				       mcu_rows_per_loop,
 				       MCU_output_catcher);
       mcu_rows_output += mcu_rows_per_loop;
-      /* Subsample first row group of this set */
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) (DCTSIZE+1), (short) 0, (short) 1,
-		(short) 0);
+      /* Downsample first row group of this set */
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) (DCTSIZE+1), (short) 0, (short) 1,
+		 (short) 0);
     } else {
-      /* Subsample first row group with dummy above-context */
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) (-1), (short) 0, (short) 1,
-		(short) 0);
+      /* Downsample first row group with dummy above-context */
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) (-1), (short) 0, (short) 1,
+		 (short) 0);
     }
-    /* Subsample second through next-to-last row groups of this set */
+    /* Downsample second through next-to-last row groups of this set */
     for (i = 1; i <= DCTSIZE-2; i++) {
-      subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-		(short) (i-1), (short) i, (short) (i+1),
-		(short) i);
+      downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+		 (short) (i-1), (short) i, (short) (i+1),
+		 (short) i);
     }
   } /* end of outer loop */
   
-  /* Subsample the last row group with dummy below-context */
+  /* Downsample the last row group with dummy below-context */
   /* Note whichss points to last buffer side used */
-  subsample(cinfo, fullsize_data[whichss], subsampled_data, fullsize_width,
-	    (short) (DCTSIZE-2), (short) (DCTSIZE-1), (short) (-1),
-	    (short) (DCTSIZE-1));
+  downsample(cinfo, fullsize_data[whichss], sampled_data, fullsize_width,
+	     (short) (DCTSIZE-2), (short) (DCTSIZE-1), (short) (-1),
+	     (short) (DCTSIZE-1));
   /* Dump the remaining data (may be less than full height if uninterleaved) */
-  (*cinfo->methods->extract_MCUs) (cinfo, subsampled_data, 
+  (*cinfo->methods->extract_MCUs) (cinfo, sampled_data, 
 		(int) (cinfo->MCU_rows_in_scan - mcu_rows_output),
 		MCU_output_catcher);
 
   /* Clean up after that stuff, then find the optimal entropy parameters */
 
   (*cinfo->methods->extract_term) (cinfo);
-  (*cinfo->methods->subsample_term) (cinfo);
+  (*cinfo->methods->downsample_term) (cinfo);
 
   cinfo->completed_passes++;
 
@@ -643,9 +659,9 @@
 
   (*cinfo->methods->write_scan_header) (cinfo);
   cinfo->methods->entropy_output = cinfo->methods->write_jpeg_data;
-  (*cinfo->methods->entropy_encoder_init) (cinfo);
+  (*cinfo->methods->entropy_encode_init) (cinfo);
   dump_scan_MCUs(cinfo, cinfo->methods->entropy_encode);
-  (*cinfo->methods->entropy_encoder_term) (cinfo);
+  (*cinfo->methods->entropy_encode_term) (cinfo);
   (*cinfo->methods->write_scan_trailer) (cinfo);
 
   /* Release working memory */
@@ -660,7 +676,7 @@
  * with no optimization of entropy parameters.
  */
 
-#ifdef MULTISCAN_FILES_SUPPORTED
+#ifdef C_MULTISCAN_FILES_SUPPORTED
 
 METHODDEF void
 multi_ccontroller (compress_info_ptr cinfo)
@@ -668,7 +684,7 @@
   ERREXIT(cinfo->emethods, "Not implemented yet");
 }
 
-#endif /* MULTISCAN_FILES_SUPPORTED */
+#endif /* C_MULTISCAN_FILES_SUPPORTED */
 
 
 /*
@@ -676,7 +692,7 @@
  * with optimization of entropy parameters.
  */
 
-#ifdef MULTISCAN_FILES_SUPPORTED
+#ifdef C_MULTISCAN_FILES_SUPPORTED
 #ifdef ENTROPY_OPT_SUPPORTED
 
 METHODDEF void
@@ -686,7 +702,7 @@
 }
 
 #endif /* ENTROPY_OPT_SUPPORTED */
-#endif /* MULTISCAN_FILES_SUPPORTED */
+#endif /* C_MULTISCAN_FILES_SUPPORTED */
 
 
 /*
@@ -706,7 +722,7 @@
       cinfo->methods->c_pipeline_controller = single_ccontroller;
   } else {
     /* multiple scans needed */
-#ifdef MULTISCAN_FILES_SUPPORTED
+#ifdef C_MULTISCAN_FILES_SUPPORTED
 #ifdef ENTROPY_OPT_SUPPORTED
     if (cinfo->optimize_coding)
       cinfo->methods->c_pipeline_controller = multi_eopt_ccontroller;
diff --git a/jcsample.c b/jcsample.c
index 9362ec4..31a636b 100644
--- a/jcsample.c
+++ b/jcsample.c
@@ -5,50 +5,72 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
- * This file contains subsampling routines.
- * These routines are invoked via the subsample and
- * subsample_init/term methods.
+ * This file contains downsampling routines.
+ * These routines are invoked via the downsample and
+ * downsample_init/term methods.
+ *
+ * An excellent reference for image resampling is
+ *   Digital Image Warping, George Wolberg, 1990.
+ *   Pub. by IEEE Computer Society Press, Los Alamitos, CA. ISBN 0-8186-8944-7.
+ *
+ * The downsampling algorithm used here is a simple average of the source
+ * pixels covered by the output pixel.  The hi-falutin sampling literature
+ * refers to this as a "box filter".  In general the characteristics of a box
+ * filter are not very good, but for the specific cases we normally use (1:1
+ * and 2:1 ratios) the box is equivalent to a "triangle filter" which is not
+ * nearly so bad.  If you intend to use other sampling ratios, you'd be well
+ * advised to improve this code.
+ *
+ * A simple input-smoothing capability is provided.  This is mainly intended
+ * for cleaning up color-dithered GIF input files (if you find it inadequate,
+ * we suggest using an external filtering program such as pnmconvol).  When
+ * enabled, each input pixel P is replaced by a weighted sum of itself and its
+ * eight neighbors.  P's weight is 1-8*SF and each neighbor's weight is SF,
+ * where SF = (smoothing_factor / 1024).
+ * Currently, smoothing is only supported for 2h2v sampling factors.
  */
 
 #include "jinclude.h"
 
 
 /*
- * Initialize for subsampling a scan.
+ * Initialize for downsampling a scan.
  */
 
 METHODDEF void
-subsample_init (compress_info_ptr cinfo)
+downsample_init (compress_info_ptr cinfo)
 {
   /* no work for now */
 }
 
 
 /*
- * Subsample pixel values of a single component.
- * This version only handles integral sampling ratios.
+ * Downsample pixel values of a single component.
+ * This version handles arbitrary integral sampling ratios, without smoothing.
+ * Note that this version is not actually used for customary sampling ratios.
  */
 
 METHODDEF void
-subsample (compress_info_ptr cinfo, int which_component,
-	   long input_cols, int input_rows,
-	   long output_cols, int output_rows,
-	   JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
-	   JSAMPARRAY output_data)
+int_downsample (compress_info_ptr cinfo, int which_component,
+		long input_cols, int input_rows,
+		long output_cols, int output_rows,
+		JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+		JSAMPARRAY output_data)
 {
   jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
   int inrow, outrow, h_expand, v_expand, numpix, numpix2, h, v;
-  long outcol;
+  long outcol, outcol_h;	/* outcol_h == outcol*h_expand */
   JSAMPROW inptr, outptr;
   INT32 outvalue;
 
-  /* TEMP FOR DEBUGGING PIPELINE CONTROLLER */
+#ifdef DEBUG			/* for debugging pipeline controller */
   if (output_rows != compptr->v_samp_factor ||
       input_rows != cinfo->max_v_samp_factor ||
       (output_cols % compptr->h_samp_factor) != 0 ||
       (input_cols % cinfo->max_h_samp_factor) != 0 ||
       input_cols*compptr->h_samp_factor != output_cols*cinfo->max_h_samp_factor)
-    ERREXIT(cinfo->emethods, "Bogus subsample parameters");
+    ERREXIT(cinfo->emethods, "Bogus downsample parameters");
+#endif
 
   h_expand = cinfo->max_h_samp_factor / compptr->h_samp_factor;
   v_expand = cinfo->max_v_samp_factor / compptr->v_samp_factor;
@@ -58,10 +80,11 @@
   inrow = 0;
   for (outrow = 0; outrow < output_rows; outrow++) {
     outptr = output_data[outrow];
-    for (outcol = 0; outcol < output_cols; outcol++) {
+    for (outcol = 0, outcol_h = 0; outcol < output_cols;
+	 outcol++, outcol_h += h_expand) {
       outvalue = 0;
       for (v = 0; v < v_expand; v++) {
-	inptr = input_data[inrow+v] + (outcol*h_expand);
+	inptr = input_data[inrow+v] + outcol_h;
 	for (h = 0; h < h_expand; h++) {
 	  outvalue += (INT32) GETJSAMPLE(*inptr++);
 	}
@@ -74,30 +97,307 @@
 
 
 /*
- * Subsample pixel values of a single component.
- * This version handles the special case of a full-size component.
+ * Downsample pixel values of a single component.
+ * This version handles the common case of 2:1 horizontal and 1:1 vertical,
+ * without smoothing.
  */
 
 METHODDEF void
-fullsize_subsample (compress_info_ptr cinfo, int which_component,
-		    long input_cols, int input_rows,
-		    long output_cols, int output_rows,
-		    JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
-		    JSAMPARRAY output_data)
+h2v1_downsample (compress_info_ptr cinfo, int which_component,
+		 long input_cols, int input_rows,
+		 long output_cols, int output_rows,
+		 JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+		 JSAMPARRAY output_data)
 {
-  if (input_cols != output_cols || input_rows != output_rows) /* DEBUG */
+  int outrow;
+  long outcol;
+  register JSAMPROW inptr, outptr;
+
+#ifdef DEBUG			/* for debugging pipeline controller */
+  jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
+  if (output_rows != compptr->v_samp_factor ||
+      input_rows != cinfo->max_v_samp_factor ||
+      (output_cols % compptr->h_samp_factor) != 0 ||
+      (input_cols % cinfo->max_h_samp_factor) != 0 ||
+      input_cols*compptr->h_samp_factor != output_cols*cinfo->max_h_samp_factor)
+    ERREXIT(cinfo->emethods, "Bogus downsample parameters");
+#endif
+
+  for (outrow = 0; outrow < output_rows; outrow++) {
+    outptr = output_data[outrow];
+    inptr = input_data[outrow];
+    for (outcol = 0; outcol < output_cols; outcol++) {
+      *outptr++ = (JSAMPLE) ((GETJSAMPLE(*inptr) + GETJSAMPLE(inptr[1])
+			      + 1) >> 1);
+      inptr += 2;
+    }
+  }
+}
+
+
+/*
+ * Downsample pixel values of a single component.
+ * This version handles the standard case of 2:1 horizontal and 2:1 vertical,
+ * without smoothing.
+ */
+
+METHODDEF void
+h2v2_downsample (compress_info_ptr cinfo, int which_component,
+		 long input_cols, int input_rows,
+		 long output_cols, int output_rows,
+		 JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+		 JSAMPARRAY output_data)
+{
+  int inrow, outrow;
+  long outcol;
+  register JSAMPROW inptr0, inptr1, outptr;
+
+#ifdef DEBUG			/* for debugging pipeline controller */
+  jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
+  if (output_rows != compptr->v_samp_factor ||
+      input_rows != cinfo->max_v_samp_factor ||
+      (output_cols % compptr->h_samp_factor) != 0 ||
+      (input_cols % cinfo->max_h_samp_factor) != 0 ||
+      input_cols*compptr->h_samp_factor != output_cols*cinfo->max_h_samp_factor)
+    ERREXIT(cinfo->emethods, "Bogus downsample parameters");
+#endif
+
+  inrow = 0;
+  for (outrow = 0; outrow < output_rows; outrow++) {
+    outptr = output_data[outrow];
+    inptr0 = input_data[inrow];
+    inptr1 = input_data[inrow+1];
+    for (outcol = 0; outcol < output_cols; outcol++) {
+      *outptr++ = (JSAMPLE) ((GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
+			      GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1])
+			      + 2) >> 2);
+      inptr0 += 2; inptr1 += 2;
+    }
+    inrow += 2;
+  }
+}
+
+
+/*
+ * Downsample pixel values of a single component.
+ * This version handles the special case of a full-size component,
+ * without smoothing.
+ */
+
+METHODDEF void
+fullsize_downsample (compress_info_ptr cinfo, int which_component,
+		     long input_cols, int input_rows,
+		     long output_cols, int output_rows,
+		     JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+		     JSAMPARRAY output_data)
+{
+#ifdef DEBUG			/* for debugging pipeline controller */
+  if (input_cols != output_cols || input_rows != output_rows)
     ERREXIT(cinfo->emethods, "Pipeline controller messed up");
+#endif
 
   jcopy_sample_rows(input_data, 0, output_data, 0, output_rows, output_cols);
 }
 
 
+#ifdef INPUT_SMOOTHING_SUPPORTED
+
+/*
+ * Downsample pixel values of a single component.
+ * This version handles the standard case of 2:1 horizontal and 2:1 vertical,
+ * with smoothing.
+ */
+
+METHODDEF void
+h2v2_smooth_downsample (compress_info_ptr cinfo, int which_component,
+			long input_cols, int input_rows,
+			long output_cols, int output_rows,
+			JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+			JSAMPARRAY output_data)
+{
+  int inrow, outrow;
+  long colctr;
+  register JSAMPROW inptr0, inptr1, above_ptr, below_ptr, outptr;
+  INT32 membersum, neighsum, memberscale, neighscale;
+
+#ifdef DEBUG			/* for debugging pipeline controller */
+  jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
+  if (output_rows != compptr->v_samp_factor ||
+      input_rows != cinfo->max_v_samp_factor ||
+      (output_cols % compptr->h_samp_factor) != 0 ||
+      (input_cols % cinfo->max_h_samp_factor) != 0 ||
+      input_cols*compptr->h_samp_factor != output_cols*cinfo->max_h_samp_factor)
+    ERREXIT(cinfo->emethods, "Bogus downsample parameters");
+#endif
+
+  /* We don't bother to form the individual "smoothed" input pixel values;
+   * we can directly compute the output which is the average of the four
+   * smoothed values.  Each of the four member pixels contributes a fraction
+   * (1-8*SF) to its own smoothed value and a fraction SF to each of the three
+   * other smoothed pixels, therefore a total fraction (1-5*SF)/4 to the final
+   * output.  The four corner-adjacent neighbor pixels contribute a fraction
+   * SF to just one smoothed pixel, or SF/4 to the final output; while the
+   * eight edge-adjacent neighbors contribute SF to each of two smoothed
+   * pixels, or SF/2 overall.  In order to use integer arithmetic, these
+   * factors are scaled by 2^16 = 65536.
+   * Also recall that SF = smoothing_factor / 1024.
+   */
+
+  memberscale = 16384 - cinfo->smoothing_factor * 80; /* scaled (1-5*SF)/4 */
+  neighscale = cinfo->smoothing_factor * 16; /* scaled SF/4 */
+
+  inrow = 0;
+  for (outrow = 0; outrow < output_rows; outrow++) {
+    outptr = output_data[outrow];
+    inptr0 = input_data[inrow];
+    inptr1 = input_data[inrow+1];
+    if (inrow == 0)
+      above_ptr = above[input_rows-1];
+    else
+      above_ptr = input_data[inrow-1];
+    if (inrow >= input_rows-2)
+      below_ptr = below[0];
+    else
+      below_ptr = input_data[inrow+2];
+
+    /* Special case for first column: pretend column -1 is same as column 0 */
+    membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
+		GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]);
+    neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) +
+	       GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
+	       GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[2]) +
+	       GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[2]);
+    neighsum += neighsum;
+    neighsum += GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[2]) +
+		GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[2]);
+    membersum = membersum * memberscale + neighsum * neighscale;
+    *outptr++ = (JSAMPLE) ((membersum + 32768L) >> 16);
+    inptr0 += 2; inptr1 += 2; above_ptr += 2; below_ptr += 2;
+
+    for (colctr = output_cols - 2; colctr > 0; colctr--) {
+      /* sum of pixels directly mapped to this output element */
+      membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
+		  GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]);
+      /* sum of edge-neighbor pixels */
+      neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) +
+		 GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
+		 GETJSAMPLE(inptr0[-1]) + GETJSAMPLE(inptr0[2]) +
+		 GETJSAMPLE(inptr1[-1]) + GETJSAMPLE(inptr1[2]);
+      /* The edge-neighbors count twice as much as corner-neighbors */
+      neighsum += neighsum;
+      /* Add in the corner-neighbors */
+      neighsum += GETJSAMPLE(above_ptr[-1]) + GETJSAMPLE(above_ptr[2]) +
+		  GETJSAMPLE(below_ptr[-1]) + GETJSAMPLE(below_ptr[2]);
+      /* form final output scaled up by 2^16 */
+      membersum = membersum * memberscale + neighsum * neighscale;
+      /* round, descale and output it */
+      *outptr++ = (JSAMPLE) ((membersum + 32768L) >> 16);
+      inptr0 += 2; inptr1 += 2; above_ptr += 2; below_ptr += 2;
+    }
+
+    /* Special case for last column: pretend next column is same as last one */
+    membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
+		GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]);
+    neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) +
+	       GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
+	       GETJSAMPLE(inptr0[-1]) + GETJSAMPLE(inptr0[1]) +
+	       GETJSAMPLE(inptr1[-1]) + GETJSAMPLE(inptr1[1]);
+    neighsum += neighsum;
+    neighsum += GETJSAMPLE(above_ptr[-1]) + GETJSAMPLE(above_ptr[1]) +
+		GETJSAMPLE(below_ptr[-1]) + GETJSAMPLE(below_ptr[1]);
+    membersum = membersum * memberscale + neighsum * neighscale;
+    *outptr = (JSAMPLE) ((membersum + 32768L) >> 16);
+
+    inrow += 2;
+  }
+}
+
+
+/*
+ * Downsample pixel values of a single component.
+ * This version handles the special case of a full-size component,
+ * with smoothing.
+ */
+
+METHODDEF void
+fullsize_smooth_downsample (compress_info_ptr cinfo, int which_component,
+			    long input_cols, int input_rows,
+			    long output_cols, int output_rows,
+			    JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+			    JSAMPARRAY output_data)
+{
+  int outrow;
+  long colctr;
+  register JSAMPROW inptr, above_ptr, below_ptr, outptr;
+  INT32 membersum, neighsum, memberscale, neighscale;
+  int colsum, lastcolsum, nextcolsum;
+
+#ifdef DEBUG			/* for debugging pipeline controller */
+  if (input_cols != output_cols || input_rows != output_rows)
+    ERREXIT(cinfo->emethods, "Pipeline controller messed up");
+#endif
+
+  /* Each of the eight neighbor pixels contributes a fraction SF to the
+   * smoothed pixel, while the main pixel contributes (1-8*SF).  In order
+   * to use integer arithmetic, these factors are multiplied by 2^16 = 65536.
+   * Also recall that SF = smoothing_factor / 1024.
+   */
+
+  memberscale = 65536L - cinfo->smoothing_factor * 512L; /* scaled 1-8*SF */
+  neighscale = cinfo->smoothing_factor * 64; /* scaled SF */
+
+  for (outrow = 0; outrow < output_rows; outrow++) {
+    outptr = output_data[outrow];
+    inptr = input_data[outrow];
+    if (outrow == 0)
+      above_ptr = above[input_rows-1];
+    else
+      above_ptr = input_data[outrow-1];
+    if (outrow >= input_rows-1)
+      below_ptr = below[0];
+    else
+      below_ptr = input_data[outrow+1];
+
+    /* Special case for first column: pretend column -1 is same as column 0 */
+    colsum = GETJSAMPLE(*above_ptr++) + GETJSAMPLE(*below_ptr++) +
+	     GETJSAMPLE(*inptr);
+    membersum = GETJSAMPLE(*inptr++);
+    nextcolsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(*below_ptr) +
+		 GETJSAMPLE(*inptr);
+    neighsum = colsum + (colsum - membersum) + nextcolsum;
+    membersum = membersum * memberscale + neighsum * neighscale;
+    *outptr++ = (JSAMPLE) ((membersum + 32768L) >> 16);
+    lastcolsum = colsum; colsum = nextcolsum;
+
+    for (colctr = output_cols - 2; colctr > 0; colctr--) {
+      membersum = GETJSAMPLE(*inptr++);
+      above_ptr++; below_ptr++;
+      nextcolsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(*below_ptr) +
+		   GETJSAMPLE(*inptr);
+      neighsum = lastcolsum + (colsum - membersum) + nextcolsum;
+      membersum = membersum * memberscale + neighsum * neighscale;
+      *outptr++ = (JSAMPLE) ((membersum + 32768L) >> 16);
+      lastcolsum = colsum; colsum = nextcolsum;
+    }
+
+    /* Special case for last column: pretend next column is same as last one */
+    membersum = GETJSAMPLE(*inptr);
+    neighsum = lastcolsum + (colsum - membersum) + colsum;
+    membersum = membersum * memberscale + neighsum * neighscale;
+    *outptr = (JSAMPLE) ((membersum + 32768L) >> 16);
+
+  }
+}
+
+#endif /* INPUT_SMOOTHING_SUPPORTED */
+
+
 /*
  * Clean up after a scan.
  */
 
 METHODDEF void
-subsample_term (compress_info_ptr cinfo)
+downsample_term (compress_info_ptr cinfo)
 {
   /* no work for now */
 }
@@ -105,31 +405,56 @@
 
 
 /*
- * The method selection routine for subsampling.
+ * The method selection routine for downsampling.
  * Note that we must select a routine for each component.
  */
 
 GLOBAL void
-jselsubsample (compress_info_ptr cinfo)
+jseldownsample (compress_info_ptr cinfo)
 {
   short ci;
   jpeg_component_info * compptr;
+  boolean smoothok = TRUE;
 
   if (cinfo->CCIR601_sampling)
-    ERREXIT(cinfo->emethods, "CCIR601 subsampling not implemented yet");
+    ERREXIT(cinfo->emethods, "CCIR601 downsampling not implemented yet");
 
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     compptr = cinfo->cur_comp_info[ci];
     if (compptr->h_samp_factor == cinfo->max_h_samp_factor &&
-	compptr->v_samp_factor == cinfo->max_v_samp_factor)
-      cinfo->methods->subsample[ci] = fullsize_subsample;
-    else if ((cinfo->max_h_samp_factor % compptr->h_samp_factor) == 0 &&
-	     (cinfo->max_v_samp_factor % compptr->v_samp_factor) == 0)
-      cinfo->methods->subsample[ci] = subsample;
-    else
-      ERREXIT(cinfo->emethods, "Fractional subsampling not implemented yet");
+	compptr->v_samp_factor == cinfo->max_v_samp_factor) {
+#ifdef INPUT_SMOOTHING_SUPPORTED
+      if (cinfo->smoothing_factor)
+	cinfo->methods->downsample[ci] = fullsize_smooth_downsample;
+      else
+#endif
+	cinfo->methods->downsample[ci] = fullsize_downsample;
+    } else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
+	     compptr->v_samp_factor == cinfo->max_v_samp_factor) {
+      smoothok = FALSE;
+      cinfo->methods->downsample[ci] = h2v1_downsample;
+    } else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
+	     compptr->v_samp_factor * 2 == cinfo->max_v_samp_factor) {
+#ifdef INPUT_SMOOTHING_SUPPORTED
+      if (cinfo->smoothing_factor)
+	cinfo->methods->downsample[ci] = h2v2_smooth_downsample;
+      else
+#endif
+	cinfo->methods->downsample[ci] = h2v2_downsample;
+    } else if ((cinfo->max_h_samp_factor % compptr->h_samp_factor) == 0 &&
+	     (cinfo->max_v_samp_factor % compptr->v_samp_factor) == 0) {
+      smoothok = FALSE;
+      cinfo->methods->downsample[ci] = int_downsample;
+    } else
+      ERREXIT(cinfo->emethods, "Fractional downsampling not implemented yet");
   }
 
-  cinfo->methods->subsample_init = subsample_init;
-  cinfo->methods->subsample_term = subsample_term;
+#ifdef INPUT_SMOOTHING_SUPPORTED
+  if (cinfo->smoothing_factor && !smoothok)
+    TRACEMS(cinfo->emethods, 0,
+	    "Smoothing not supported with nonstandard sampling ratios");
+#endif
+
+  cinfo->methods->downsample_init = downsample_init;
+  cinfo->methods->downsample_term = downsample_term;
 }
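
Note on the smoothing arithmetic in jcsample.c above: the following is a minimal sketch, not part of the patch, of how the 2h2v scale factors combine. The helper names (box_h2v2, member_scale_h2v2, neigh_scale_h2v2, smooth_h2v2) are hypothetical; only the constants 16384, 80, 16 and the 2^16 rounding step are taken from h2v2_smooth_downsample. It assumes a non-negative smoothing_factor and a long of at least 32 bits.

```c
#include <assert.h>

/* Hypothetical helpers; only the numeric constants come from the patch. */

/* Plain 2x2 box average with rounding, as in h2v2_downsample. */
static int box_h2v2(int a, int b, int c, int d)
{
  return (a + b + c + d + 2) >> 2;
}

/* Weight of each of the 4 member pixels, scaled by 2^16: (1-5*SF)/4. */
static long member_scale_h2v2(int smoothing_factor)
{
  assert(smoothing_factor >= 0);	/* assumption: caller validated it */
  return 16384L - (long) smoothing_factor * 80L;
}

/* Weight of each corner neighbor, scaled by 2^16: SF/4. */
static long neigh_scale_h2v2(int smoothing_factor)
{
  assert(smoothing_factor >= 0);
  return (long) smoothing_factor * 16L;
}

/* One smoothed output sample from the three partial sums:
 * membersum = 4 member pixels, edgesum = 8 edge-adjacent neighbors,
 * cornersum = 4 corner-adjacent neighbors.
 */
static int smooth_h2v2(long membersum, long edgesum, long cornersum,
		       int smoothing_factor)
{
  /* edge neighbors count twice as much as corner neighbors */
  long neighsum = 2L * edgesum + cornersum;
  long total = membersum * member_scale_h2v2(smoothing_factor)
	     + neighsum * neigh_scale_h2v2(smoothing_factor);
  return (int) ((total + 32768L) >> 16);	/* round and descale */
}
```

The weights always add up to 4*member + 20*neigh = 65536, so a flat region passes through unchanged, and with smoothing_factor 0 the result reduces to the plain box average.
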
diff --git a/jdarith.c b/jdarith.c
index 9af5483..7265f7e 100644
--- a/jdarith.c
+++ b/jdarith.c
@@ -7,12 +7,12 @@
  *
  * This file contains arithmetic entropy decoding routines.
  * These routines are invoked via the methods entropy_decode
- * and entropy_decoder_init/term.
+ * and entropy_decode_init/term.
  */
 
 #include "jinclude.h"
 
-#ifdef ARITH_CODING_SUPPORTED
+#ifdef D_ARITH_CODING_SUPPORTED
 
 
 /*
@@ -39,4 +39,4 @@
   }
 }
 
-#endif /* ARITH_CODING_SUPPORTED */
+#endif /* D_ARITH_CODING_SUPPORTED */
diff --git a/jdcolor.c b/jdcolor.c
index 35a6656..fad9c73 100644
--- a/jdcolor.c
+++ b/jdcolor.c
@@ -23,31 +23,35 @@
  *	G = Y - 0.34414 * Cb - 0.71414 * Cr
  *	B = Y + 1.77200 * Cb
  * where Cb and Cr represent the incoming values less MAXJSAMPLE/2.
- * (These numbers are derived from TIFF Appendix O, draft of 4/10/91.)
+ * (These numbers are derived from TIFF 6.0 section 21, dated 3-June-92.)
  *
  * To avoid floating-point arithmetic, we represent the fractional constants
- * as integers scaled up by 2^14 (about 4 digits precision); we have to divide
- * the products by 2^14, with appropriate rounding, to get the correct answer.
+ * as integers scaled up by 2^SCALEBITS (2^16 normally, about 4 digits); we
+ * divide the products by 2^SCALEBITS, with rounding, to get the correct answer.
  * Notice that Y, being an integral input, does not contribute any fraction
  * so it need not participate in the rounding.
  *
  * For even more speed, we avoid doing any multiplications in the inner loop
  * by precalculating the constants times Cb and Cr for all possible values.
- * For 8-bit JSAMPLEs this is very reasonable (only 256 table entries); for
- * 12-bit samples it is still acceptable.  It's not very reasonable for 16-bit
- * samples, but if you want lossless storage you shouldn't be changing
+ * For 8-bit JSAMPLEs this is very reasonable (only 256 entries per table);
+ * for 12-bit samples it is still acceptable.  It's not very reasonable for
+ * 16-bit samples, but if you want lossless storage you shouldn't be changing
  * colorspace anyway.
  * The Cr=>R and Cb=>B values can be rounded to integers in advance; the
  * values for the G calculation are left scaled up, since we must add them
  * together before rounding.
  */
 
-#define SCALEBITS	14
+#ifdef SIXTEEN_BIT_SAMPLES
+#define SCALEBITS	14	/* avoid overflow */
+#else
+#define SCALEBITS	16	/* speedier right-shift on some machines */
+#endif
 #define ONE_HALF	((INT32) 1 << (SCALEBITS-1))
 #define FIX(x)		((INT32) ((x) * (1L<<SCALEBITS) + 0.5))
 
-static INT16 * Cr_r_tab;	/* => table for Cr to R conversion */
-static INT16 * Cb_b_tab;	/* => table for Cb to B conversion */
+static int * Cr_r_tab;		/* => table for Cr to R conversion */
+static int * Cb_b_tab;		/* => table for Cb to B conversion */
 static INT32 * Cr_g_tab;	/* => table for Cr to G conversion */
 static INT32 * Cb_g_tab;	/* => table for Cb to G conversion */
 
@@ -59,17 +63,13 @@
 METHODDEF void
 ycc_rgb_init (decompress_info_ptr cinfo)
 {
-#ifdef SIXTEEN_BIT_SAMPLES
   INT32 i, x2;
-#else
-  int i, x2;			/* smart compiler may do 16x16=>32 multiply */
-#endif
   SHIFT_TEMPS
 
-  Cr_r_tab = (INT16 *) (*cinfo->emethods->alloc_small)
-				((MAXJSAMPLE+1) * SIZEOF(INT16));
-  Cb_b_tab = (INT16 *) (*cinfo->emethods->alloc_small)
-				((MAXJSAMPLE+1) * SIZEOF(INT16));
+  Cr_r_tab = (int *) (*cinfo->emethods->alloc_small)
+				((MAXJSAMPLE+1) * SIZEOF(int));
+  Cb_b_tab = (int *) (*cinfo->emethods->alloc_small)
+				((MAXJSAMPLE+1) * SIZEOF(int));
   Cr_g_tab = (INT32 *) (*cinfo->emethods->alloc_small)
 				((MAXJSAMPLE+1) * SIZEOF(INT32));
   Cb_g_tab = (INT32 *) (*cinfo->emethods->alloc_small)
@@ -80,10 +80,10 @@
     /* The Cb or Cr value we are thinking of is x = i - MAXJSAMPLE/2 */
     x2 = 2*i - MAXJSAMPLE;	/* twice x */
     /* Cr=>R value is nearest int to 1.40200 * x */
-    Cr_r_tab[i] = (INT16)
+    Cr_r_tab[i] = (int)
 		    RIGHT_SHIFT(FIX(1.40200/2) * x2 + ONE_HALF, SCALEBITS);
     /* Cb=>B value is nearest int to 1.77200 * x */
-    Cb_b_tab[i] = (INT16)
+    Cb_b_tab[i] = (int)
 		    RIGHT_SHIFT(FIX(1.77200/2) * x2 + ONE_HALF, SCALEBITS);
     /* Cr=>G value is scaled-up -0.71414 * x */
     Cr_g_tab[i] = (- FIX(0.71414/2)) * x2;
@@ -103,15 +103,20 @@
 		 JSAMPIMAGE input_data, JSAMPIMAGE output_data)
 {
 #ifdef SIXTEEN_BIT_SAMPLES
-  register UINT16 y, cb, cr;
-  register INT32 x;
+  register INT32 y;
+  register UINT16 cb, cr;
 #else
   register int y, cb, cr;
-  register int x;
 #endif
   register JSAMPROW inptr0, inptr1, inptr2;
   register JSAMPROW outptr0, outptr1, outptr2;
-  long col;
+  register long col;
+  /* copy these pointers into registers if possible */
+  register JSAMPLE * range_limit = cinfo->sample_range_limit;
+  register int * Crrtab = Cr_r_tab;
+  register int * Cbbtab = Cb_b_tab;
+  register INT32 * Crgtab = Cr_g_tab;
+  register INT32 * Cbgtab = Cb_g_tab;
   int row;
   SHIFT_TEMPS
   
@@ -122,26 +127,19 @@
     outptr0 = output_data[0][row];
     outptr1 = output_data[1][row];
     outptr2 = output_data[2][row];
-    for (col = num_cols; col > 0; col--) {
-      y  = GETJSAMPLE(*inptr0++);
-      cb = GETJSAMPLE(*inptr1++);
-      cr = GETJSAMPLE(*inptr2++);
+    for (col = 0; col < num_cols; col++) {
+      y  = GETJSAMPLE(inptr0[col]);
+      cb = GETJSAMPLE(inptr1[col]);
+      cr = GETJSAMPLE(inptr2[col]);
       /* Note: if the inputs were computed directly from RGB values,
        * range-limiting would be unnecessary here; but due to possible
        * noise in the DCT/IDCT phase, we do need to apply range limits.
        */
-      x = y + Cr_r_tab[cr];	/* red */
-      if (x < 0) x = 0;
-      else if (x > MAXJSAMPLE) x = MAXJSAMPLE;
-      *outptr0++ = (JSAMPLE) x;
-      x = y + ((int) RIGHT_SHIFT(Cb_g_tab[cb] + Cr_g_tab[cr], SCALEBITS));
-      if (x < 0) x = 0;
-      else if (x > MAXJSAMPLE) x = MAXJSAMPLE;
-      *outptr1++ = (JSAMPLE) x;
-      x = y + Cb_b_tab[cb];	/* blue */
-      if (x < 0) x = 0;
-      else if (x > MAXJSAMPLE) x = MAXJSAMPLE;
-      *outptr2++ = (JSAMPLE) x;
+      outptr0[col] = range_limit[y + Crrtab[cr]];	/* red */
+      outptr1[col] = range_limit[y +			/* green */
+				 ((int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
+						    SCALEBITS))];
+      outptr2[col] = range_limit[y + Cbbtab[cb]];	/* blue */
     }
   }
 }
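
Note on the jdcolor.c change above: the per-pixel clamping branches are replaced by a lookup in a range-limit table, and the chroma contributions come from precomputed fixed-point tables. The following is a minimal standalone sketch of that idea, not the library's code. It assumes 8-bit samples and a chroma zero point of 128, and the names descale, clamp_tab, build_ycc_tables and ycc_to_rgb are hypothetical. The rounding constant ONE_HALF is folded into the Cb=>G table here as an illustrative choice.

```c
#include <assert.h>
#include <stdint.h>

#define SCALEBITS 16
#define ONE_HALF  ((int32_t) 1 << (SCALEBITS - 1))
#define FIX(x)    ((int32_t) ((x) * (1L << SCALEBITS) + 0.5))
#define MAXSAMPLE 255
#define CENTER    128		/* assumed chroma zero point */
#define CLAMP_PAD 256		/* slack below 0 and above MAXSAMPLE */

static int     cr_r_tab[MAXSAMPLE + 1];	/* Cr => R, already rounded */
static int     cb_b_tab[MAXSAMPLE + 1];	/* Cb => B, already rounded */
static int32_t cr_g_tab[MAXSAMPLE + 1];	/* Cr => G, still scaled up */
static int32_t cb_g_tab[MAXSAMPLE + 1];	/* Cb => G, scaled, plus ONE_HALF */
static unsigned char clamp_tab[CLAMP_PAD + MAXSAMPLE + 1 + CLAMP_PAD];

/* floor(v / 2^n), written so a negative value is never right-shifted */
static int32_t descale(int32_t v, int n)
{
  return (v >= 0) ? (v >> n) : ~((~v) >> n);
}

static void build_ycc_tables(void)
{
  int i;

  for (i = 0; i <= MAXSAMPLE; i++) {
    int32_t x = i - CENTER;	/* signed chroma value */
    cr_r_tab[i] = (int) descale(FIX(1.40200) * x + ONE_HALF, SCALEBITS);
    cb_b_tab[i] = (int) descale(FIX(1.77200) * x + ONE_HALF, SCALEBITS);
    cr_g_tab[i] = -FIX(0.71414) * x;
    cb_g_tab[i] = -FIX(0.34414) * x + ONE_HALF;
  }
  /* clamp_tab[v + CLAMP_PAD] is v limited to 0..MAXSAMPLE */
  for (i = 0; i < (int) sizeof(clamp_tab); i++) {
    int v = i - CLAMP_PAD;
    clamp_tab[i] = (unsigned char) (v < 0 ? 0 : (v > MAXSAMPLE ? MAXSAMPLE : v));
  }
}

static void ycc_to_rgb(int y, int cb, int cr, unsigned char rgb[3])
{
  const unsigned char *limit = clamp_tab + CLAMP_PAD;

  assert(y >= 0 && y <= MAXSAMPLE);
  assert(cb >= 0 && cb <= MAXSAMPLE && cr >= 0 && cr <= MAXSAMPLE);
  rgb[0] = limit[y + cr_r_tab[cr]];				/* red */
  rgb[1] = limit[y + (int) descale(cb_g_tab[cb] + cr_g_tab[cr],
				   SCALEBITS)];			/* green */
  rgb[2] = limit[y + cb_b_tab[cb]];				/* blue */
}
```

Call build_ycc_tables() once, then ycc_to_rgb() per pixel. The padding of 256 entries on each side is enough for this sketch because the largest table offset in magnitude is 227 (the Cb=>B entry for Cb = 0).
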
diff --git a/jddeflts.c b/jddeflts.c
index 60d8427..fde74d1 100644
--- a/jddeflts.c
+++ b/jddeflts.c
@@ -44,7 +44,12 @@
 
 /*
  * Reload the input buffer after it's been emptied, and return the next byte.
- * See the JGETC macro for calling conditions.
+ * See the JGETC macro for calling conditions.  Note in particular that
+ * read_jpeg_data may NOT return EOF.  If no more data is available, it must
+ * exit via ERREXIT, or perhaps synthesize fake data (such as an RST marker).
+ * In the present implementation, we insert an EOI marker; this might not be
+ * appropriate for non-JFIF file formats, but it usually allows us to handle
+ * a truncated JFIF file.
  *
  * This routine can be overridden by the system-dependent user interface,
  * in case the data source is not a stdio stream or some other special
@@ -63,8 +68,12 @@
 					cinfo->next_input_byte,
 					JPEG_BUF_SIZE);
   
-  if (cinfo->bytes_in_buffer <= 0)
-    ERREXIT(cinfo->emethods, "Unexpected EOF in JPEG file");
+  if (cinfo->bytes_in_buffer <= 0) {
+    WARNMS(cinfo->emethods, "Premature EOF in JPEG file");
+    cinfo->next_input_byte[0] = (char) 0xFF;
+    cinfo->next_input_byte[1] = (char) 0xD9; /* EOI marker */
+    cinfo->bytes_in_buffer = 2;
+  }
 
   return JGETC(cinfo);
 }
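
Note on the jddeflts.c change above: instead of aborting when the input runs dry, the refill routine now warns and plants a fake EOI marker (0xFF 0xD9) in the buffer so the decoder can finish cleanly on a truncated file. Below is a minimal sketch of that pattern over an in-memory source. The struct byte_source and the functions source_init, read_chunk and next_byte are hypothetical stand-ins, not the library's interface, and a flag replaces the WARNMS call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8		/* deliberately tiny for illustration */

typedef struct {
  const unsigned char *src;	/* stand-in for the input file */
  size_t src_len, src_pos;
  unsigned char buf[BUF_SIZE];
  size_t next, avail;		/* read position and bytes left in buf */
  int warned;			/* set once a fake EOI has been inserted */
} byte_source;

static void source_init(byte_source *s, const unsigned char *data, size_t len)
{
  memset(s, 0, sizeof(*s));
  s->src = data;
  s->src_len = len;
}

/* Copy up to BUF_SIZE bytes from the source; returns 0 at end of data. */
static size_t read_chunk(byte_source *s)
{
  size_t n = s->src_len - s->src_pos;

  if (n > BUF_SIZE)
    n = BUF_SIZE;
  memcpy(s->buf, s->src + s->src_pos, n);
  s->src_pos += n;
  return n;
}

/* Return the next byte; never reports EOF to the caller. */
static int next_byte(byte_source *s)
{
  assert(s != NULL);
  if (s->avail == 0) {
    s->next = 0;
    s->avail = read_chunk(s);
    if (s->avail == 0) {
      s->warned = 1;		/* real code would emit a warning here */
      s->buf[0] = 0xFF;
      s->buf[1] = 0xD9;		/* EOI marker */
      s->avail = 2;
    }
  }
  s->avail--;
  return s->buf[s->next++];
}
```

A caller that keeps reading past the end sees an endless run of EOI markers rather than an error, which is what lets a marker-driven decoder terminate normally.
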
diff --git a/jdhuff.c b/jdhuff.c
index 8071fa2..cdc9bd7 100644
--- a/jdhuff.c
+++ b/jdhuff.c
@@ -7,7 +7,7 @@
  *
  * This file contains Huffman entropy decoding routines.
  * These routines are invoked via the methods entropy_decode
- * and entropy_decoder_init/term.
+ * and entropy_decode_init/term.
  */
 
 #include "jinclude.h"
@@ -19,7 +19,7 @@
 
 static INT32 get_buffer;	/* current bit-extraction buffer */
 static int bits_left;		/* # of unused bits in it */
-
+static boolean printed_eod;	/* flag to suppress multiple end-of-data msgs */
 
 LOCAL void
 fix_huff_tbl (HUFF_TBL * htbl)
@@ -75,116 +75,137 @@
 }
 
 
-/* Extract the next N bits from the input stream (N <= 15) */
+/*
+ * Code for extracting the next N bits from the input stream.
+ * (N never exceeds 15 for JPEG data.)
+ * This needs to go as fast as possible!
+ *
+ * We read source bytes into get_buffer and dole out bits as needed.
+ * If get_buffer already contains enough bits, they are fetched in-line
+ * by the macros get_bits() and get_bit().  When there aren't enough bits,
+ * fill_bit_buffer is called; it will attempt to fill get_buffer to the
+ * "high water mark", then extract the desired number of bits.  The idea,
+ * of course, is to minimize the function-call overhead cost of entering
+ * fill_bit_buffer.
+ * On most machines MIN_GET_BITS should be 25 to allow the full 32-bit width
+ * of get_buffer to be used.  (On machines with wider words, an even larger
+ * buffer could be used.)  However, on some machines 32-bit shifts are
+ * relatively slow and take time proportional to the number of places shifted.
+ * (This is true with most PC compilers, for instance.)  In this case it may
+ * be a win to set MIN_GET_BITS to the minimum value of 15.  This reduces the
+ * average shift distance at the cost of more calls to fill_bit_buffer.
+ */
+
+#ifdef SLOW_SHIFT_32
+#define MIN_GET_BITS  15	/* minimum allowable value */
+#else
+#define MIN_GET_BITS  25	/* max value for 32-bit get_buffer */
+#endif
+
+static const int bmask[16] =	/* bmask[n] is mask for n rightmost bits */
+  { 0, 0x0001, 0x0003, 0x0007, 0x000F, 0x001F, 0x003F, 0x007F, 0x00FF,
+    0x01FF, 0x03FF, 0x07FF, 0x0FFF, 0x1FFF, 0x3FFF, 0x7FFF };
+
 
 LOCAL int
-get_bits (int nbits)
+fill_bit_buffer (int nbits)
+/* Load up the bit buffer and do get_bits(nbits) */
 {
-  int result;
-  
-  while (nbits > bits_left) {
-    int c = JGETC(dcinfo);
+  /* Attempt to load at least MIN_GET_BITS bits into get_buffer. */
+  while (bits_left < MIN_GET_BITS) {
+    register int c = JGETC(dcinfo);
     
-    get_buffer <<= 8;
-    get_buffer |= c;
-    bits_left += 8;
     /* If it's 0xFF, check and discard stuffed zero byte */
-    if (c == 0xff) {
-      c = JGETC(dcinfo);  /* Byte stuffing */
-      if (c != 0)
-	ERREXIT1(dcinfo->emethods,
-		 "Unexpected marker 0x%02x in compressed data", c);
+    if (c == 0xFF) {
+      int c2 = JGETC(dcinfo);
+      if (c2 != 0) {
+	/* Oops, it's actually a marker indicating end of compressed data. */
+	/* Better put it back for use later */
+	JUNGETC(c2,dcinfo);
+	JUNGETC(c,dcinfo);
+	/* There should be enough bits still left in the data segment; */
+	/* if so, just break out of the while loop. */
+	if (bits_left >= nbits)
+	  break;
+	/* Uh-oh.  Report corrupted data to user and stuff zeroes into
+	 * the data stream, so we can produce some kind of image.
+	 * Note that this will be repeated for each byte demanded for the
+	 * rest of the segment; this is a bit slow but not unreasonably so.
+	 * The main thing is to avoid getting a zillion warnings, hence:
+	 */
+	if (! printed_eod) {
+	  WARNMS(dcinfo->emethods, "Corrupt JPEG data: premature end of data segment");
+	  printed_eod = TRUE;
+	}
+	c = 0;			/* insert a zero byte into bit buffer */
+      }
     }
+
+    /* OK, load c into get_buffer */
+    get_buffer = (get_buffer << 8) | c;
+    bits_left += 8;
   }
-  
+
+  /* Having filled get_buffer, extract desired bits (this simplifies macros) */
   bits_left -= nbits;
-  result = ((int) (get_buffer >> bits_left)) & ((1 << nbits) - 1);
-  return result;
+  return ((int) (get_buffer >> bits_left)) & bmask[nbits];
 }
 
-/* Macro to make things go at some speed! */
 
-#define get_bit()	(bits_left ? \
-			 ((int) (get_buffer >> (--bits_left))) & 1 : \
-			 get_bits(1))
+/* Macros to make things go at some speed! */
+/* NB: parameter to get_bits should be simple variable, not expression */
+
+#define get_bits(nbits) \
+	(bits_left >= (nbits) ? \
+	 ((int) (get_buffer >> (bits_left -= (nbits)))) & bmask[nbits] : \
+	 fill_bit_buffer(nbits))
+
+#define get_bit() \
+	(bits_left ? \
+	 ((int) (get_buffer >> (--bits_left))) & 1 : \
+	 fill_bit_buffer(1))
 
 
 /* Figure F.16: extract next coded symbol from input stream */
   
+INLINE
 LOCAL int
 huff_DECODE (HUFF_TBL * htbl)
 {
-  int l, p;
-  INT32 code;
+  register int l;
+  register INT32 code;
   
   code = get_bit();
   l = 1;
   while (code > htbl->maxcode[l]) {
-    code = (code << 1) + get_bit();
+    code = (code << 1) | get_bit();
     l++;
   }
 
   /* With garbage input we may reach the sentinel value l = 17. */
 
   if (l > 16) {
-    ERREXIT(dcinfo->emethods, "Corrupted data in JPEG file");
+    WARNMS(dcinfo->emethods, "Corrupt JPEG data: bad Huffman code");
+    return 0;			/* fake a zero as the safest result */
   }
 
-  p = (int) (htbl->valptr[l] + (code - htbl->mincode[l]));
-  
-  return (int) htbl->huffval[p];
+  return htbl->huffval[ htbl->valptr[l] + ((int) (code - htbl->mincode[l])) ];
 }
 
 
 /* Figure F.12: extend sign bit */
 
-/* NB: on some compilers this will only work for s > 0 */
+#define huff_EXTEND(x,s)  ((x) < extend_test[s] ? (x) + extend_offset[s] : (x))
 
-#define huff_EXTEND(x, s)	((x) < (1 << ((s)-1)) ? \
-				 (x) + (-1 << (s)) + 1 : \
-				 (x))
+static const int extend_test[16] =   /* entry n is 2**(n-1) */
+  { 0, 0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080,
+    0x0100, 0x0200, 0x0400, 0x0800, 0x1000, 0x2000, 0x4000 };
 
-
-/* Decode a single block's worth of coefficients */
-/* Note that only the difference is returned for the DC coefficient */
-
-LOCAL void
-decode_one_block (JBLOCK block, HUFF_TBL *dctbl, HUFF_TBL *actbl)
-{
-  int s, k, r, n;
-
-  /* zero out the coefficient block */
-
-  MEMZERO((void *) block, SIZEOF(JBLOCK));
-  
-  /* Section F.2.2.1: decode the DC coefficient difference */
-
-  s = huff_DECODE(dctbl);
-  if (s) {
-    r = get_bits(s);
-    s = huff_EXTEND(r, s);
-  }
-  block[0] = s;
-
-  /* Section F.2.2.2: decode the AC coefficients */
-  
-  for (k = 1; k < DCTSIZE2; k++) {
-    r = huff_DECODE(actbl);
-    
-    s = r & 15;
-    n = r >> 4;
-    
-    if (s) {
-      k += n;
-      r = get_bits(s);
-      block[k] = huff_EXTEND(r, s);
-    } else {
-      if (n != 15)
-	break;
-      k += 15;
-    }
-  }
-}
+static const int extend_offset[16] = /* entry n is (-1 << n) + 1 */
+  { 0, ((-1)<<1) + 1, ((-1)<<2) + 1, ((-1)<<3) + 1, ((-1)<<4) + 1,
+    ((-1)<<5) + 1, ((-1)<<6) + 1, ((-1)<<7) + 1, ((-1)<<8) + 1,
+    ((-1)<<9) + 1, ((-1)<<10) + 1, ((-1)<<11) + 1, ((-1)<<12) + 1,
+    ((-1)<<13) + 1, ((-1)<<14) + 1, ((-1)<<15) + 1 };
 
 
 /*
@@ -201,6 +222,7 @@
   /* Initialize static variables */
   dcinfo = cinfo;
   bits_left = 0;
+  printed_eod = FALSE;
 
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     compptr = cinfo->cur_comp_info[ci];
@@ -232,30 +254,33 @@
   int c, nbytes;
   short ci;
 
-  /* Throw away any partial unread byte */
+  /* Throw away any unused bits remaining in bit buffer */
+  nbytes = bits_left / 8;	/* count any full bytes loaded into buffer */
   bits_left = 0;
+  printed_eod = FALSE;		/* next segment can get another warning */
 
   /* Scan for next JPEG marker */
-  nbytes = 0;
   do {
     do {			/* skip any non-FF bytes */
       nbytes++;
       c = JGETC(cinfo);
     } while (c != 0xFF);
     do {			/* skip any duplicate FFs */
-      nbytes++;
+      /* we don't increment nbytes here since extra FFs are legal */
       c = JGETC(cinfo);
     } while (c == 0xFF);
   } while (c == 0);		/* repeat if it was a stuffed FF/00 */
 
-  if (c != (RST0 + cinfo->next_restart_num))
-    ERREXIT2(cinfo->emethods, "Found 0x%02x marker instead of RST%d",
-	     c, cinfo->next_restart_num);
+  if (nbytes != 1)
+    WARNMS2(cinfo->emethods,
+	    "Corrupt JPEG data: %d extraneous bytes before marker 0x%02x",
+	    nbytes-1, c);
 
-  if (nbytes != 2)
-    TRACEMS2(cinfo->emethods, 1, "Skipped %d bytes before RST%d",
-	     nbytes-2, cinfo->next_restart_num);
-  else
+  if (c != (RST0 + cinfo->next_restart_num)) {
+    /* Uh-oh, the restart markers have been messed up too. */
+    /* Let the file-format module try to figure out how to resync. */
+    (*cinfo->methods->resync_to_restart) (cinfo, c);
+  } else
     TRACEMS1(cinfo->emethods, 2, "RST%d", cinfo->next_restart_num);
 
   /* Re-initialize DC predictions to 0 */
@@ -264,19 +289,49 @@
 
   /* Update restart state */
   cinfo->restarts_to_go = cinfo->restart_interval;
-  cinfo->next_restart_num++;
-  cinfo->next_restart_num &= 7;
+  cinfo->next_restart_num = (cinfo->next_restart_num + 1) & 7;
 }
 
 
+/* ZAG[i] is the natural-order position of the i'th element of zigzag order.
+ * If the incoming data is corrupted, huff_decode_mcu could attempt to
+ * reference values beyond the end of the array.  To avoid a wild store,
+ * we put some extra zeroes after the real entries.
+ */
+
+static const short ZAG[DCTSIZE2+16] = {
+  0,  1,  8, 16,  9,  2,  3, 10,
+ 17, 24, 32, 25, 18, 11,  4,  5,
+ 12, 19, 26, 33, 40, 48, 41, 34,
+ 27, 20, 13,  6,  7, 14, 21, 28,
+ 35, 42, 49, 56, 57, 50, 43, 36,
+ 29, 22, 15, 23, 30, 37, 44, 51,
+ 58, 59, 52, 45, 38, 31, 39, 46,
+ 53, 60, 61, 54, 47, 55, 62, 63,
+  0,  0,  0,  0,  0,  0,  0,  0, /* extra entries in case k>63 below */
+  0,  0,  0,  0,  0,  0,  0,  0
+};
+
+
 /*
  * Decode and return one MCU's worth of Huffman-compressed coefficients.
+ * This routine also handles quantization descaling and zigzag reordering
+ * of coefficient values.
+ *
+ * The i'th block of the MCU is stored into the block pointed to by
+ * MCU_data[i].  WE ASSUME THIS AREA HAS BEEN ZEROED BY THE CALLER.
+ * (Wholesale zeroing is usually a little faster than retail...)
  */
 
 METHODDEF void
-huff_decode (decompress_info_ptr cinfo, JBLOCK *MCU_data)
+huff_decode_mcu (decompress_info_ptr cinfo, JBLOCKROW *MCU_data)
 {
+  register int s, k, r;
   short blkn, ci;
+  register JBLOCKROW block;
+  register QUANT_TBL_PTR quanttbl;
+  HUFF_TBL *dctbl;
+  HUFF_TBL *actbl;
   jpeg_component_info * compptr;
 
   /* Account for restart interval, process restart marker if needed */
@@ -286,15 +341,51 @@
     cinfo->restarts_to_go--;
   }
 
+  /* Outer loop handles each block in the MCU */
+
   for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
+    block = MCU_data[blkn];
     ci = cinfo->MCU_membership[blkn];
     compptr = cinfo->cur_comp_info[ci];
-    decode_one_block(MCU_data[blkn],
-		     cinfo->dc_huff_tbl_ptrs[compptr->dc_tbl_no],
-		     cinfo->ac_huff_tbl_ptrs[compptr->ac_tbl_no]);
+    quanttbl = cinfo->quant_tbl_ptrs[compptr->quant_tbl_no];
+    actbl = cinfo->ac_huff_tbl_ptrs[compptr->ac_tbl_no];
+    dctbl = cinfo->dc_huff_tbl_ptrs[compptr->dc_tbl_no];
+
+    /* Decode a single block's worth of coefficients */
+
+    /* Section F.2.2.1: decode the DC coefficient difference */
+    s = huff_DECODE(dctbl);
+    if (s) {
+      r = get_bits(s);
+      s = huff_EXTEND(r, s);
+    }
+
     /* Convert DC difference to actual value, update last_dc_val */
-    MCU_data[blkn][0] += cinfo->last_dc_val[ci];
-    cinfo->last_dc_val[ci] = MCU_data[blkn][0];
+    s += cinfo->last_dc_val[ci];
+    cinfo->last_dc_val[ci] = (JCOEF) s;
+    /* Descale and output the DC coefficient (assumes ZAG[0] = 0) */
+    (*block)[0] = (JCOEF) (((JCOEF) s) * quanttbl[0]);
+    
+    /* Section F.2.2.2: decode the AC coefficients */
+    /* Since zero values are skipped, output area must be zeroed beforehand */
+    for (k = 1; k < DCTSIZE2; k++) {
+      r = huff_DECODE(actbl);
+      
+      s = r & 15;
+      r = r >> 4;
+      
+      if (s) {
+	k += r;
+	r = get_bits(s);
+	s = huff_EXTEND(r, s);
+	/* Descale coefficient and output in natural (dezigzagged) order */
+	(*block)[ZAG[k]] = (JCOEF) (((JCOEF) s) * quanttbl[k]);
+      } else {
+	if (r != 15)
+	  break;
+	k += 15;
+      }
+    }
   }
 }
 
@@ -318,8 +409,8 @@
 jseldhuffman (decompress_info_ptr cinfo)
 {
   if (! cinfo->arith_code) {
-    cinfo->methods->entropy_decoder_init = huff_decoder_init;
-    cinfo->methods->entropy_decode = huff_decode;
-    cinfo->methods->entropy_decoder_term = huff_decoder_term;
+    cinfo->methods->entropy_decode_init = huff_decoder_init;
+    cinfo->methods->entropy_decode = huff_decode_mcu;
+    cinfo->methods->entropy_decode_term = huff_decoder_term;
   }
 }
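
Note on the jdhuff.c change above: the bit extractor now fills its buffer to a high-water mark and serves most requests without a refill, and the sign-extension step of Figure F.12 becomes two table lookups. The sketch below shows both ideas in standalone form. It is an illustration under assumptions: it reads from a memory buffer, pads with zero bytes past the end, and omits the FF/00 byte-stuffing and marker handling that the real routine performs. The names bit_reader, refill_and_get, read_bits and extend_value are hypothetical. An unsigned buffer and literal negative offsets are used so that the sketch never shifts a negative value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_GET_BITS 25		/* high-water mark for a 32-bit buffer */

typedef struct {
  const unsigned char *data;
  size_t len, pos;
  uint32_t buf;			/* bit-extraction buffer */
  int bits_left;		/* number of unused bits in buf */
} bit_reader;

static const int bmask[16] =	/* bmask[n] is mask for n rightmost bits */
  { 0, 0x0001, 0x0003, 0x0007, 0x000F, 0x001F, 0x003F, 0x007F, 0x00FF,
    0x01FF, 0x03FF, 0x07FF, 0x0FFF, 0x1FFF, 0x3FFF, 0x7FFF };

static const int extend_test[16] =	/* entry n is 2**(n-1) */
  { 0, 0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080,
    0x0100, 0x0200, 0x0400, 0x0800, 0x1000, 0x2000, 0x4000 };

static const int extend_offset[16] =	/* entry n is 1 - 2**n */
  { 0, -1, -3, -7, -15, -31, -63, -127, -255, -511, -1023, -2047,
    -4095, -8191, -16383, -32767 };

/* Slow path: top up the buffer, then extract nbits. */
static int refill_and_get(bit_reader *br, int nbits)
{
  while (br->bits_left < MIN_GET_BITS) {
    /* zero padding past the end is an assumption of this sketch */
    uint32_t c = (br->pos < br->len) ? br->data[br->pos++] : 0;
    br->buf = (br->buf << 8) | c;
    br->bits_left += 8;
  }
  br->bits_left -= nbits;
  return (int) (br->buf >> br->bits_left) & bmask[nbits];
}

/* Fast path: serve from the buffer when it already holds enough bits. */
static int read_bits(bit_reader *br, int nbits)
{
  assert(nbits >= 1 && nbits <= 15);
  if (br->bits_left >= nbits) {
    br->bits_left -= nbits;
    return (int) (br->buf >> br->bits_left) & bmask[nbits];
  }
  return refill_and_get(br, nbits);
}

/* Map an s-bit magnitude code to its signed coefficient value. */
static int extend_value(int x, int s)
{
  assert(s >= 1 && s <= 15);
  return (x < extend_test[s]) ? x + extend_offset[s] : x;
}
```

With MIN_GET_BITS at 25, a refill leaves between 25 and 32 bits buffered, so a run of short codes is served entirely by the fast path.
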
diff --git a/jdmain.c b/jdmain.c
index 380a96c..4991653 100644
--- a/jdmain.c
+++ b/jdmain.c
@@ -5,7 +5,7 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
- * This file contains a trivial test user interface for the JPEG decompressor.
+ * This file contains a command-line user interface for the JPEG decompressor.
  * It should work on any system with Unix- or MS-DOS-style command lines.
  *
  * Two different command line styles are permitted, depending on the
@@ -24,9 +24,13 @@
 #ifdef INCLUDES_ARE_ANSI
 #include <stdlib.h>		/* to declare exit() */
 #endif
+#include <ctype.h>		/* to declare isupper(), tolower() */
 #ifdef NEED_SIGNAL_CATCHER
 #include <signal.h>		/* to declare signal() */
 #endif
+#ifdef USE_SETMODE
+#include <fcntl.h>		/* to declare setmode() */
+#endif
 
 #ifdef THINK_C
 #include <console.h>		/* command-line reader for Macintosh */
@@ -56,13 +60,6 @@
 
 
 /*
- * PD version of getopt(3).
- */
-
-#include "egetopt.c"
-
-
-/*
  * This list defines the known output image formats
  * (not all of which need be supported by a given version).
  * You can change the default output format by defining DEFAULT_FMT;
@@ -145,29 +142,231 @@
 GLOBAL void
 signal_catcher (int signum)
 {
-  emethods->trace_level = 0;	/* turn off trace output */
-  (*emethods->free_all) ();	/* clean up memory allocation & temp files */
+  if (emethods != NULL) {
+    emethods->trace_level = 0;	/* turn off trace output */
+    (*emethods->free_all) ();	/* clean up memory allocation & temp files */
+  }
   exit(EXIT_FAILURE);
 }
 
 #endif
 
 
+/*
+ * Optional routine to display a percent-done figure on stderr.
+ * See jddeflts.c for explanation of the information used.
+ */
+
+#ifdef PROGRESS_REPORT
+
+METHODDEF void
+progress_monitor (decompress_info_ptr cinfo, long loopcounter, long looplimit)
+{
+  if (cinfo->total_passes > 1) {
+    fprintf(stderr, "\rPass %d/%d: %3d%% ",
+	    cinfo->completed_passes+1, cinfo->total_passes,
+	    (int) (loopcounter*100L/looplimit));
+  } else {
+    fprintf(stderr, "\r %3d%% ",
+	    (int) (loopcounter*100L/looplimit));
+  }
+  fflush(stderr);
+}
+
+#endif
+
+
+/*
+ * Argument-parsing code.
+ * The switch parser is designed to be useful with DOS-style command line
+ * syntax, ie, intermixed switches and file names, where only the switches
+ * to the left of a given file name affect processing of that file.
+ * The main program in this file doesn't actually use this capability...
+ */
+
+
+static char * progname;		/* program name for error messages */
+
+
 LOCAL void
-usage (char * progname)
+usage (void)
 /* complain about bad command line */
 {
-  fprintf(stderr, "usage: %s ", progname);
-  fprintf(stderr, "[-G] [-P] [-R] [-T] [-b] [-g] [-q colors] [-1] [-D] [-d] [-m mem]");
+  fprintf(stderr, "usage: %s [switches] ", progname);
 #ifdef TWO_FILE_COMMANDLINE
-  fprintf(stderr, " inputfile outputfile\n");
+  fprintf(stderr, "inputfile outputfile\n");
 #else
-  fprintf(stderr, " [inputfile]\n");
+  fprintf(stderr, "[inputfile]\n");
 #endif
+
+  fprintf(stderr, "Switches (names may be abbreviated):\n");
+  fprintf(stderr, "  -colors N      Reduce image to no more than N colors\n");
+#ifdef GIF_SUPPORTED
+  fprintf(stderr, "  -gif           Select GIF output format\n");
+#endif
+#ifdef PPM_SUPPORTED
+  fprintf(stderr, "  -pnm           Select PBMPLUS (PPM/PGM) output format (default)\n");
+#endif
+  fprintf(stderr, "  -quantize N    Same as -colors N\n");
+#ifdef RLE_SUPPORTED
+  fprintf(stderr, "  -rle           Select Utah RLE output format\n");
+#endif
+#ifdef TARGA_SUPPORTED
+  fprintf(stderr, "  -targa         Select Targa output format\n");
+#endif
+  fprintf(stderr, "Switches for advanced users:\n");
+#ifdef BLOCK_SMOOTHING_SUPPORTED
+  fprintf(stderr, "  -blocksmooth   Apply cross-block smoothing\n");
+#endif
+  fprintf(stderr, "  -grayscale     Force grayscale output\n");
+  fprintf(stderr, "  -nodither      Don't use dithering in quantization\n");
+#ifdef QUANT_1PASS_SUPPORTED
+  fprintf(stderr, "  -onepass       Use 1-pass quantization (fast, low quality)\n");
+#endif
+  fprintf(stderr, "  -maxmemory N   Maximum memory to use (in kbytes)\n");
+  fprintf(stderr, "  -verbose  or  -debug   Emit debug output\n");
   exit(EXIT_FAILURE);
 }
 
 
+LOCAL boolean
+keymatch (char * arg, const char * keyword, int minchars)
+/* Case-insensitive matching of (possibly abbreviated) keyword switches. */
+/* keyword is the constant keyword (must be lower case already), */
+/* minchars is length of minimum legal abbreviation. */
+{
+  register int ca, ck;
+  register int nmatched = 0;
+
+  while ((ca = *arg++) != '\0') {
+    if ((ck = *keyword++) == '\0')
+      return FALSE;		/* arg longer than keyword, no good */
+    if (isupper(ca))		/* force arg to lcase (assume ck is already) */
+      ca = tolower(ca);
+    if (ca != ck)
+      return FALSE;		/* no good */
+    nmatched++;			/* count matched characters */
+  }
+  /* reached end of argument; fail if it's too short for unique abbrev */
+  if (nmatched < minchars)
+    return FALSE;
+  return TRUE;			/* A-OK */
+}
+
+
+LOCAL int
+parse_switches (decompress_info_ptr cinfo, int last_file_arg_seen,
+		int argc, char **argv)
+/* Initialize cinfo with default switch settings, then parse option switches.
+ * Returns argv[] index of first file-name argument (== argc if none).
+ * Any file names with indexes <= last_file_arg_seen are ignored;
+ * they have presumably been processed in a previous iteration.
+ * (Pass 0 for last_file_arg_seen on the first or only iteration.)
+ */
+{
+  int argn;
+  char * arg;
+
+  /* (Re-)initialize the system-dependent error and memory managers. */
+  jselerror(cinfo->emethods);	/* error/trace message routines */
+  jselmemmgr(cinfo->emethods);	/* memory allocation routines */
+  cinfo->methods->d_ui_method_selection = d_ui_method_selection;
+
+  /* Now OK to enable signal catcher. */
+#ifdef NEED_SIGNAL_CATCHER
+  emethods = cinfo->emethods;
+#endif
+
+  /* Set up default JPEG parameters. */
+  j_d_defaults(cinfo, TRUE);
+  requested_fmt = DEFAULT_FMT;	/* set default output file format */
+
+  /* Scan command line options, adjust parameters */
+
+  for (argn = 1; argn < argc; argn++) {
+    arg = argv[argn];
+    if (*arg != '-') {
+      /* Not a switch, must be a file name argument */
+      if (argn <= last_file_arg_seen)
+	continue;		/* ignore it if previously processed */
+      break;			/* else done parsing switches */
+    }
+    arg++;			/* advance past switch marker character */
+
+    if (keymatch(arg, "blocksmooth", 1)) {
+      /* Enable cross-block smoothing. */
+      cinfo->do_block_smoothing = TRUE;
+
+    } else if (keymatch(arg, "colors", 1) || keymatch(arg, "colours", 1) ||
+	       keymatch(arg, "quantize", 1) || keymatch(arg, "quantise", 1)) {
+      /* Do color quantization. */
+      int val;
+
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      if (sscanf(argv[argn], "%d", &val) != 1)
+	usage();
+      cinfo->desired_number_of_colors = val;
+      cinfo->quantize_colors = TRUE;
+
+    } else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) {
+      /* Enable debug printouts. */
+      /* On first -debug (or -verbose), print version identification */
+      if (last_file_arg_seen == 0 && cinfo->emethods->trace_level == 0)
+	fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n",
+		JVERSION, JCOPYRIGHT);
+      cinfo->emethods->trace_level++;
+
+    } else if (keymatch(arg, "gif", 1)) {
+      /* GIF output format. */
+      requested_fmt = FMT_GIF;
+
+    } else if (keymatch(arg, "grayscale", 2) || keymatch(arg, "greyscale",2)) {
+      /* Force monochrome output. */
+      cinfo->out_color_space = CS_GRAYSCALE;
+
+    } else if (keymatch(arg, "maxmemory", 1)) {
+      /* Maximum memory in Kb (or Mb with 'm'). */
+      long lval;
+      char ch = 'x';
+
+      if (++argn >= argc)	/* advance to next argument */
+	usage();
+      if (sscanf(argv[argn], "%ld%c", &lval, &ch) < 1)
+	usage();
+      if (ch == 'm' || ch == 'M')
+	lval *= 1000L;
+      cinfo->emethods->max_memory_to_use = lval * 1000L;
+
+    } else if (keymatch(arg, "nodither", 3)) {
+      /* Suppress dithering in color quantization. */
+      cinfo->use_dithering = FALSE;
+
+    } else if (keymatch(arg, "onepass", 1)) {
+      /* Use fast one-pass quantization. */
+      cinfo->two_pass_quantize = FALSE;
+
+    } else if (keymatch(arg, "pnm", 1)) {
+      /* PPM/PGM output format. */
+      requested_fmt = FMT_PPM;
+
+    } else if (keymatch(arg, "rle", 1)) {
+      /* RLE output format. */
+      requested_fmt = FMT_RLE;
+
+    } else if (keymatch(arg, "targa", 1)) {
+      /* Targa output format. */
+      requested_fmt = FMT_TARGA;
+
+    } else {
+      usage();			/* bogus switch */
+    }
+  }
+
+  return argn;			/* return index of next arg (file name) */
+}
+
+
 /*
  * The main program.
  */
@@ -175,115 +374,47 @@
 GLOBAL int
 main (int argc, char **argv)
 {
-  struct decompress_info_struct cinfo;
-  struct decompress_methods_struct dc_methods;
-  struct external_methods_struct e_methods;
-  int c;
+  struct Decompress_info_struct cinfo;
+  struct Decompress_methods_struct dc_methods;
+  struct External_methods_struct e_methods;
+  int file_index;
 
   /* On Mac, fetch a command line. */
 #ifdef THINK_C
   argc = ccommand(&argv);
 #endif
 
-  /* Initialize the system-dependent method pointers. */
+  progname = argv[0];
+
+  /* Set up links to method structures. */
   cinfo.methods = &dc_methods;
   cinfo.emethods = &e_methods;
-  jselerror(&e_methods);	/* error/trace message routines */
-  jselmemmgr(&e_methods);	/* memory allocation routines */
-  dc_methods.d_ui_method_selection = d_ui_method_selection;
 
-  /* Now OK to enable signal catcher. */
+  /* Install, but don't yet enable signal catcher. */
 #ifdef NEED_SIGNAL_CATCHER
-  emethods = &e_methods;
+  emethods = NULL;
   signal(SIGINT, signal_catcher);
 #ifdef SIGTERM			/* not all systems have SIGTERM */
   signal(SIGTERM, signal_catcher);
 #endif
 #endif
 
-  /* Set up default JPEG parameters. */
-  j_d_defaults(&cinfo, TRUE);
-  requested_fmt = DEFAULT_FMT;	/* set default output file format */
+  /* Scan command line: set up decompression parameters, input & output files. */
 
-  /* Scan command line options, adjust parameters */
-  
-  while ((c = egetopt(argc, argv, "GPRTbgq:1Dm:d")) != EOF)
-    switch (c) {
-    case 'G':			/* GIF output format. */
-      requested_fmt = FMT_GIF;
-      break;
-    case 'P':			/* PPM output format. */
-      requested_fmt = FMT_PPM;
-      break;
-    case 'R':			/* RLE output format. */
-      requested_fmt = FMT_RLE;
-      break;
-    case 'T':			/* Targa output format. */
-      requested_fmt = FMT_TARGA;
-      break;
-    case 'b':			/* Enable cross-block smoothing. */
-      cinfo.do_block_smoothing = TRUE;
-      break;
-    case 'g':			/* Force grayscale output. */
-      cinfo.out_color_space = CS_GRAYSCALE;
-      break;
-    case 'q':			/* Do color quantization. */
-      { int val;
-	if (optarg == NULL)
-	  usage(argv[0]);
-	if (sscanf(optarg, "%d", &val) != 1)
-	  usage(argv[0]);
-	cinfo.desired_number_of_colors = val;
-      }
-      cinfo.quantize_colors = TRUE;
-      break;
-    case '1':			/* Use fast one-pass quantization. */
-      cinfo.two_pass_quantize = FALSE;
-      break;
-    case 'D':			/* Suppress dithering in color quantization. */
-      cinfo.use_dithering = FALSE;
-      break;
-    case 'm':			/* Maximum memory in Kb (or Mb with 'm'). */
-      { long lval;
-	char ch = 'x';
-
-	if (optarg == NULL)
-	  usage(argv[0]);
-	if (sscanf(optarg, "%ld%c", &lval, &ch) < 1)
-	  usage(argv[0]);
-	if (ch == 'm' || ch == 'M')
-	  lval *= 1000L;
-	e_methods.max_memory_to_use = lval * 1000L;
-      }
-      break;
-    case 'd':			/* Debugging. */
-      e_methods.trace_level++;
-      break;
-    case '?':
-    default:
-      usage(argv[0]);
-      break;
-    }
-
-  /* If -d appeared, print version identification */
-  if (e_methods.trace_level > 0)
-    fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n",
-	    JVERSION, JCOPYRIGHT);
-
-  /* Select the input and output files */
+  file_index = parse_switches(&cinfo, 0, argc, argv);
 
 #ifdef TWO_FILE_COMMANDLINE
 
-  if (optind != argc-2) {
-    fprintf(stderr, "%s: must name one input and one output file\n", argv[0]);
-    usage(argv[0]);
+  if (file_index != argc-2) {
+    fprintf(stderr, "%s: must name one input and one output file\n", progname);
+    usage();
   }
-  if ((cinfo.input_file = fopen(argv[optind], READ_BINARY)) == NULL) {
-    fprintf(stderr, "%s: can't open %s\n", argv[0], argv[optind]);
+  if ((cinfo.input_file = fopen(argv[file_index], READ_BINARY)) == NULL) {
+    fprintf(stderr, "%s: can't open %s\n", progname, argv[file_index]);
     exit(EXIT_FAILURE);
   }
-  if ((cinfo.output_file = fopen(argv[optind+1], WRITE_BINARY)) == NULL) {
-    fprintf(stderr, "%s: can't open %s\n", argv[0], argv[optind+1]);
+  if ((cinfo.output_file = fopen(argv[file_index+1], WRITE_BINARY)) == NULL) {
+    fprintf(stderr, "%s: can't open %s\n", progname, argv[file_index+1]);
     exit(EXIT_FAILURE);
   }
 
@@ -292,13 +423,18 @@
   cinfo.input_file = stdin;	/* default input file */
   cinfo.output_file = stdout;	/* always the output file */
 
-  if (optind < argc-1) {
-    fprintf(stderr, "%s: only one input file\n", argv[0]);
-    usage(argv[0]);
+#ifdef USE_SETMODE		/* need to hack file mode? */
+  setmode(fileno(stdin), O_BINARY);
+  setmode(fileno(stdout), O_BINARY);
+#endif
+
+  if (file_index < argc-1) {
+    fprintf(stderr, "%s: only one input file\n", progname);
+    usage();
   }
-  if (optind < argc) {
-    if ((cinfo.input_file = fopen(argv[optind], READ_BINARY)) == NULL) {
-      fprintf(stderr, "%s: can't open %s\n", argv[0], argv[optind]);
+  if (file_index < argc) {
+    if ((cinfo.input_file = fopen(argv[file_index], READ_BINARY)) == NULL) {
+      fprintf(stderr, "%s: can't open %s\n", progname, argv[file_index]);
       exit(EXIT_FAILURE);
     }
   }
@@ -314,9 +450,23 @@
   You shoulda defined JFIF_SUPPORTED.   /* deliberate syntax error */
 #endif
 
+#ifdef PROGRESS_REPORT
+  /* Start up progress display, unless trace output is on */
+  if (e_methods.trace_level == 0)
+    dc_methods.progress_monitor = progress_monitor;
+#endif
+
   /* Do it to it! */
   jpeg_decompress(&cinfo);
 
+#ifdef PROGRESS_REPORT
+  /* Clear away progress display */
+  if (e_methods.trace_level == 0) {
+    fprintf(stderr, "\r                \r");
+    fflush(stderr);
+  }
+#endif
+
   /* All done. */
   exit(EXIT_SUCCESS);
   return 0;			/* suppress no-return-value warnings */
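
Note on the jdmain.c change above: the new switch parser accepts any abbreviation of a keyword down to a per-keyword minimum length, which is how "-g" now selects GIF while grayscale needs at least "-gr". The standalone sketch below restates the matching rule so it can be tried in isolation. The function name match_switch is hypothetical, and the casts to unsigned char before the ctype calls are an addition of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive match of a possibly abbreviated switch name.
 * keyword must already be lower case; minchars is the shortest
 * abbreviation accepted.  Returns 1 on match, 0 otherwise.
 */
static int match_switch(const char *arg, const char *keyword, int minchars)
{
  int nmatched = 0;

  assert(arg != NULL && keyword != NULL);
  while (*arg != '\0') {
    int ca = (unsigned char) *arg++;
    int ck = (unsigned char) *keyword++;

    if (ck == '\0')
      return 0;			/* arg is longer than keyword */
    if (isupper(ca))
      ca = tolower(ca);
    if (ca != ck)
      return 0;			/* mismatch */
    nmatched++;
  }
  /* end of arg reached: accept only if it is long enough */
  return nmatched >= minchars;
}
```

Because the parser tests keywords in a fixed order and stops at the first hit, an abbreviation shared by two keywords goes to whichever is tested first; the minimum length on the later keyword documents that it cannot be reached with fewer characters.
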
diff --git a/jdmaster.c b/jdmaster.c
index 17513a4..7a4a147 100644
--- a/jdmaster.c
+++ b/jdmaster.c
@@ -19,8 +19,8 @@
 {
   /* MCU disassembly */
   jseldmcu(cinfo);
-  /* Un-subsampling of pixels */
-  jselunsubsample(cinfo);
+  /* Upsampling of pixels */
+  jselupsample(cinfo);
 }
 
 
@@ -32,7 +32,7 @@
   /* So is output file format selection (both are done by user interface). */
 
   /* Entropy decoding: either Huffman or arithmetic coding. */
-#ifdef ARITH_CODING_SUPPORTED
+#ifdef D_ARITH_CODING_SUPPORTED
   jseldarithmetic(cinfo);
 #else
   if (cinfo->arith_code) {
@@ -106,7 +106,7 @@
 
   }
 
-  /* Compute logical subsampled dimensions of components */
+  /* Compute logical downsampled dimensions of components */
   for (ci = 0; ci < cinfo->num_components; ci++) {
     compptr = &cinfo->comp_info[ci];
     compptr->true_comp_width = (cinfo->image_width * compptr->h_samp_factor
diff --git a/jdmcu.c b/jdmcu.c
index 4045f3d..3927055 100644
--- a/jdmcu.c
+++ b/jdmcu.c
@@ -5,7 +5,7 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
- * This file contains MCU disassembly routines and quantization descaling.
+ * This file contains MCU disassembly and IDCT control routines.
  * These routines are invoked via the disassemble_MCU, reverse_DCT, and
  * disassemble_init/term methods.
  */
@@ -14,38 +14,6 @@
 
 
 /*
- * Quantization descaling and zigzag reordering
- */
-
-
-/* ZAG[i] is the natural-order position of the i'th element of zigzag order. */
-
-static const short ZAG[DCTSIZE2] = {
-  0,  1,  8, 16,  9,  2,  3, 10,
- 17, 24, 32, 25, 18, 11,  4,  5,
- 12, 19, 26, 33, 40, 48, 41, 34,
- 27, 20, 13,  6,  7, 14, 21, 28,
- 35, 42, 49, 56, 57, 50, 43, 36,
- 29, 22, 15, 23, 30, 37, 44, 51,
- 58, 59, 52, 45, 38, 31, 39, 46,
- 53, 60, 61, 54, 47, 55, 62, 63
-};
-
-
-LOCAL void
-qdescale_zig (JBLOCK input, JBLOCKROW outputptr, QUANT_TBL_PTR quanttbl)
-{
-  const short * zagptr = ZAG;
-  short i;
-
-  for (i = DCTSIZE2-1; i >= 0; i--) {
-    (*outputptr)[*zagptr++] = (*input++) * (*quanttbl++);
-  }
-}
-
-
-
-/*
  * Fetch one MCU row from entropy_decode, build coefficient array.
  * This version is used for noninterleaved (single-component) scans.
  */
@@ -54,19 +22,21 @@
 disassemble_noninterleaved_MCU (decompress_info_ptr cinfo,
 				JBLOCKIMAGE image_data)
 {
-  JBLOCK MCU_data[1];
+  JBLOCKROW MCU_data[1];
   long mcuindex;
-  jpeg_component_info * compptr;
-  QUANT_TBL_PTR quant_ptr;
 
   /* this is pretty easy since there is one component and one block per MCU */
-  compptr = cinfo->cur_comp_info[0];
-  quant_ptr = cinfo->quant_tbl_ptrs[compptr->quant_tbl_no];
+
+  /* Pre-zero the target area to speed up entropy decoder */
+  /* (we assume wholesale zeroing is faster than retail) */
+  jzero_far((void FAR *) image_data[0][0],
+	    (size_t) (cinfo->MCUs_per_row * SIZEOF(JBLOCK)));
+
   for (mcuindex = 0; mcuindex < cinfo->MCUs_per_row; mcuindex++) {
+    /* Point to the proper spot in the image array for this MCU */
+    MCU_data[0] = image_data[0][0] + mcuindex;
     /* Fetch the coefficient data */
     (*cinfo->methods->entropy_decode) (cinfo, MCU_data);
-    /* Descale, reorder, and distribute it into the image array */
-    qdescale_zig(MCU_data[0], image_data[0][0] + mcuindex, quant_ptr);
   }
 }
 
@@ -80,30 +50,38 @@
 disassemble_interleaved_MCU (decompress_info_ptr cinfo,
 			     JBLOCKIMAGE image_data)
 {
-  JBLOCK MCU_data[MAX_BLOCKS_IN_MCU];
+  JBLOCKROW MCU_data[MAX_BLOCKS_IN_MCU];
   long mcuindex;
   short blkn, ci, xpos, ypos;
   jpeg_component_info * compptr;
-  QUANT_TBL_PTR quant_ptr;
   JBLOCKROW image_ptr;
 
+  /* Pre-zero the target area to speed up entropy decoder */
+  /* (we assume wholesale zeroing is faster than retail) */
+  for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
+    compptr = cinfo->cur_comp_info[ci];
+    for (ypos = 0; ypos < compptr->MCU_height; ypos++) {
+      jzero_far((void FAR *) image_data[ci][ypos],
+		(size_t) (cinfo->MCUs_per_row * compptr->MCU_width * SIZEOF(JBLOCK)));
+    }
+  }
+
   for (mcuindex = 0; mcuindex < cinfo->MCUs_per_row; mcuindex++) {
-    /* Fetch the coefficient data */
-    (*cinfo->methods->entropy_decode) (cinfo, MCU_data);
-    /* Descale, reorder, and distribute it into the image array */
+    /* Point to the proper spots in the image array for this MCU */
     blkn = 0;
     for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
       compptr = cinfo->cur_comp_info[ci];
-      quant_ptr = cinfo->quant_tbl_ptrs[compptr->quant_tbl_no];
       for (ypos = 0; ypos < compptr->MCU_height; ypos++) {
 	image_ptr = image_data[ci][ypos] + (mcuindex * compptr->MCU_width);
 	for (xpos = 0; xpos < compptr->MCU_width; xpos++) {
-	  qdescale_zig(MCU_data[blkn], image_ptr, quant_ptr);
+	  MCU_data[blkn] = image_ptr;
 	  image_ptr++;
 	  blkn++;
 	}
       }
     }
+    /* Fetch the coefficient data */
+    (*cinfo->methods->entropy_decode) (cinfo, MCU_data);
   }
 }
 
@@ -130,7 +108,7 @@
 
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     /* calculate size of an MCU row in this component */
-    blocksperrow = cinfo->cur_comp_info[ci]->subsampled_width / DCTSIZE;
+    blocksperrow = cinfo->cur_comp_info[ci]->downsampled_width / DCTSIZE;
     numrows = cinfo->cur_comp_info[ci]->MCU_height;
     /* iterate through all blocks in MCU row */
     for (ri = 0; ri < numrows; ri++) {
@@ -143,7 +121,7 @@
 	 */
 	{ register JCOEFPTR elemptr = browptr[bi];
 	  register DCTELEM *localblkptr = block;
-	  register short elem = DCTSIZE2;
+	  register int elem = DCTSIZE2;
 
 	  while (--elem >= 0)
 	    *localblkptr++ = (DCTELEM) *elemptr++;
@@ -151,26 +129,39 @@
 
 	j_rev_dct(block);	/* perform inverse DCT */
 
-	/* output the data into the sample array.
+	/* Output the data into the sample array.
 	 * Note change from signed to unsigned representation:
 	 * DCT calculation works with values +-CENTERJSAMPLE,
 	 * but sample arrays always hold 0..MAXJSAMPLE.
-	 * Have to do explicit range-limiting because of quantization errors
-	 * and so forth in the DCT/IDCT phase.
+	 * We have to do range-limiting because of quantization errors in the
+	 * DCT/IDCT phase.  We use the sample_range_limit[] table to do this
+	 * quickly; the CENTERJSAMPLE offset is folded into table indexing.
 	 */
 	{ register JSAMPROW elemptr;
 	  register DCTELEM *localblkptr = block;
-	  register short elemr, elemc;
-	  register DCTELEM temp;
+	  register JSAMPLE *range_limit = cinfo->sample_range_limit +
+						CENTERJSAMPLE;
+#if DCTSIZE != 8
+	  register int elemc;
+#endif
+	  register int elemr;
 
 	  for (elemr = 0; elemr < DCTSIZE; elemr++) {
 	    elemptr = srowptr[elemr] + (bi * DCTSIZE);
-	    for (elemc = 0; elemc < DCTSIZE; elemc++) {
-	      temp = (*localblkptr++) + CENTERJSAMPLE;
-	      if (temp < 0) temp = 0;
-	      else if (temp > MAXJSAMPLE) temp = MAXJSAMPLE;
-	      *elemptr++ = (JSAMPLE) temp;
+#if DCTSIZE == 8		/* unroll the inner loop */
+	    *elemptr++ = range_limit[*localblkptr++];
+	    *elemptr++ = range_limit[*localblkptr++];
+	    *elemptr++ = range_limit[*localblkptr++];
+	    *elemptr++ = range_limit[*localblkptr++];
+	    *elemptr++ = range_limit[*localblkptr++];
+	    *elemptr++ = range_limit[*localblkptr++];
+	    *elemptr++ = range_limit[*localblkptr++];
+	    *elemptr++ = range_limit[*localblkptr++];
+#else
+	    for (elemc = DCTSIZE; elemc > 0; elemc--) {
+	      *elemptr++ = range_limit[*localblkptr++];
 	    }
+#endif
 	  }
 	}
       }
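
Note on the jdmcu.c change above: the removed qdescale_zig() multiplied each
coefficient by its quantization step and moved it from zigzag order to natural
order. This diff does not show where that step is now performed. For reference,
the standalone sketch below reproduces the computation on plain arrays. The name
descale_zigzag and the plain short/int parameter types are assumptions made for
illustration; the library itself uses JBLOCK, JBLOCKROW and QUANT_TBL_PTR.

```c
#define DCTSIZE2 64

/* ZAG[i] is the natural-order position of the i'th element of zigzag order
 * (same values as the table removed from jdmcu.c). */
static const short ZAG[DCTSIZE2] = {
  0,  1,  8, 16,  9,  2,  3, 10,
 17, 24, 32, 25, 18, 11,  4,  5,
 12, 19, 26, 33, 40, 48, 41, 34,
 27, 20, 13,  6,  7, 14, 21, 28,
 35, 42, 49, 56, 57, 50, 43, 36,
 29, 22, 15, 23, 30, 37, 44, 51,
 58, 59, 52, 45, 38, 31, 39, 46,
 53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical helper (not a library routine): dequantize one block.
 * in[] and quant[] are both in zigzag order; out[] is in natural order. */
static void
descale_zigzag (const short *in, const short *quant, int *out)
{
  int i;

  for (i = 0; i < DCTSIZE2; i++)
    out[ZAG[i]] = (int) in[i] * (int) quant[i];
}
```

A caller fills in[] with the 64 decoded coefficients of one block, passes the
matching quantization table, and reads the dequantized block from out[] in
row-major (natural) order.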
diff --git a/jdpipe.c b/jdpipe.c
index a6d7576..b87336f 100644
--- a/jdpipe.c
+++ b/jdpipe.c
@@ -24,7 +24,7 @@
 #include "jinclude.h"
 
 
-#ifdef MULTISCAN_FILES_SUPPORTED /* wish we could assume ANSI's defined() */
+#ifdef D_MULTISCAN_FILES_SUPPORTED /* wish we could assume ANSI's defined() */
 #define NEED_COMPLEX_CONTROLLER
 #else
 #ifdef QUANT_2PASS_SUPPORTED
@@ -36,15 +36,15 @@
 /*
  * About the data structures:
  *
- * The processing chunk size for unsubsampling is referred to in this file as
+ * The processing chunk size for upsampling is referred to in this file as
  * a "row group": a row group is defined as Vk (v_samp_factor) sample rows of
- * any component while subsampled, or Vmax (max_v_samp_factor) unsubsampled
+ * any component while downsampled, or Vmax (max_v_samp_factor) upsampled
  * rows.  In an interleaved scan each MCU row contains exactly DCTSIZE row
  * groups of each component in the scan.  In a noninterleaved scan an MCU row
  * is one row of blocks, which might not be an integral number of row groups;
  * therefore, we read in Vk MCU rows to obtain the same amount of data as we'd
  * have in an interleaved scan.
- * To provide context for the unsubsampling step, we have to retain the last
+ * To provide context for the upsampling step, we have to retain the last
  * two row groups of the previous MCU row while reading in the next MCU row
  * (or set of Vk MCU rows).  To do this without copying data about, we create
  * a rather strange data structure.  Exactly DCTSIZE+2 row groups of samples
@@ -76,7 +76,7 @@
 static JSAMPIMAGE output_workspace;
 
 #ifdef NEED_COMPLEX_CONTROLLER
-/* Full-size image array holding desubsampled, but not color-processed data. */
+/* Full-size image array holding upsampled, but not color-processed data. */
 static big_sarray_ptr *fullsize_image;
 static JSAMPIMAGE fullsize_ptrs; /* workspace for access_big_sarray() result */
 #endif
@@ -114,12 +114,12 @@
     compptr->MCU_height = compptr->v_samp_factor;
     compptr->MCU_blocks = compptr->MCU_width * compptr->MCU_height;
     /* compute physical dimensions of component */
-    compptr->subsampled_width = jround_up(compptr->true_comp_width,
-					  (long) (compptr->MCU_width*DCTSIZE));
-    compptr->subsampled_height = jround_up(compptr->true_comp_height,
-					   (long) (compptr->MCU_height*DCTSIZE));
+    compptr->downsampled_width = jround_up(compptr->true_comp_width,
+					   (long) (compptr->MCU_width*DCTSIZE));
+    compptr->downsampled_height = jround_up(compptr->true_comp_height,
+					    (long) (compptr->MCU_height*DCTSIZE));
     /* Sanity check */
-    if (compptr->subsampled_width !=
+    if (compptr->downsampled_width !=
 	(cinfo->MCUs_per_row * (compptr->MCU_width*DCTSIZE)))
       ERREXIT(cinfo->emethods, "I'm confused about the image width");
     /* Prepare array describing MCU composition */
@@ -147,13 +147,13 @@
   compptr->MCU_height = 1;
   compptr->MCU_blocks = 1;
   /* compute physical dimensions of component */
-  compptr->subsampled_width = jround_up(compptr->true_comp_width,
-					(long) DCTSIZE);
-  compptr->subsampled_height = jround_up(compptr->true_comp_height,
+  compptr->downsampled_width = jround_up(compptr->true_comp_width,
 					 (long) DCTSIZE);
+  compptr->downsampled_height = jround_up(compptr->true_comp_height,
+					  (long) DCTSIZE);
 
-  cinfo->MCUs_per_row = compptr->subsampled_width / DCTSIZE;
-  cinfo->MCU_rows_in_scan = compptr->subsampled_height / DCTSIZE;
+  cinfo->MCUs_per_row = compptr->downsampled_width / DCTSIZE;
+  cinfo->MCU_rows_in_scan = compptr->downsampled_height / DCTSIZE;
 
   /* Prepare array describing MCU composition */
   cinfo->blocks_in_MCU = 1;
@@ -209,7 +209,7 @@
 				(cinfo->comps_in_scan * SIZEOF(JBLOCKARRAY));
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     image[ci] = (*cinfo->emethods->alloc_small_barray)
-			(cinfo->cur_comp_info[ci]->subsampled_width / DCTSIZE,
+			(cinfo->cur_comp_info[ci]->downsampled_width / DCTSIZE,
 			 (long) cinfo->cur_comp_info[ci]->MCU_height);
   }
   return image;
@@ -234,35 +234,35 @@
 
 
 LOCAL void
-alloc_sampling_buffer (decompress_info_ptr cinfo, JSAMPIMAGE subsampled_data[2])
-/* Create a subsampled-data buffer having the desired structure */
+alloc_sampling_buffer (decompress_info_ptr cinfo, JSAMPIMAGE sampled_data[2])
+/* Create a downsampled-data buffer having the desired structure */
 /* (see comments at head of file) */
 {
   short ci, vs, i;
 
   /* Get top-level space for array pointers */
-  subsampled_data[0] = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
+  sampled_data[0] = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
 				(cinfo->comps_in_scan * SIZEOF(JSAMPARRAY));
-  subsampled_data[1] = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
+  sampled_data[1] = (JSAMPIMAGE) (*cinfo->emethods->alloc_small)
 				(cinfo->comps_in_scan * SIZEOF(JSAMPARRAY));
 
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     vs = cinfo->cur_comp_info[ci]->v_samp_factor; /* row group height */
     /* Allocate the real storage */
-    subsampled_data[0][ci] = (*cinfo->emethods->alloc_small_sarray)
-				(cinfo->cur_comp_info[ci]->subsampled_width,
+    sampled_data[0][ci] = (*cinfo->emethods->alloc_small_sarray)
+				(cinfo->cur_comp_info[ci]->downsampled_width,
 				(long) (vs * (DCTSIZE+2)));
     /* Create space for the scrambled-order pointers */
-    subsampled_data[1][ci] = (JSAMPARRAY) (*cinfo->emethods->alloc_small)
+    sampled_data[1][ci] = (JSAMPARRAY) (*cinfo->emethods->alloc_small)
 				(vs * (DCTSIZE+2) * SIZEOF(JSAMPROW));
     /* Duplicate the first DCTSIZE-2 row groups */
     for (i = 0; i < vs * (DCTSIZE-2); i++) {
-      subsampled_data[1][ci][i] = subsampled_data[0][ci][i];
+      sampled_data[1][ci][i] = sampled_data[0][ci][i];
     }
     /* Copy the last four row groups in swapped order */
     for (i = 0; i < vs * 2; i++) {
-      subsampled_data[1][ci][vs*DCTSIZE + i] = subsampled_data[0][ci][vs*(DCTSIZE-2) + i];
-      subsampled_data[1][ci][vs*(DCTSIZE-2) + i] = subsampled_data[0][ci][vs*DCTSIZE + i];
+      sampled_data[1][ci][vs*DCTSIZE + i] = sampled_data[0][ci][vs*(DCTSIZE-2) + i];
+      sampled_data[1][ci][vs*(DCTSIZE-2) + i] = sampled_data[0][ci][vs*DCTSIZE + i];
     }
   }
 }
@@ -271,26 +271,64 @@
 #ifdef NEED_COMPLEX_CONTROLLER	/* not used by simple controller */
 
 LOCAL void
-free_sampling_buffer (decompress_info_ptr cinfo, JSAMPIMAGE subsampled_data[2])
+free_sampling_buffer (decompress_info_ptr cinfo, JSAMPIMAGE sampled_data[2])
 /* Release a sampling buffer created by alloc_sampling_buffer */
 {
   short ci;
 
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     /* Free the real storage */
-    (*cinfo->emethods->free_small_sarray) (subsampled_data[0][ci]);
+    (*cinfo->emethods->free_small_sarray) (sampled_data[0][ci]);
     /* Free the scrambled-order pointers */
-    (*cinfo->emethods->free_small) ((void *) subsampled_data[1][ci]);
+    (*cinfo->emethods->free_small) ((void *) sampled_data[1][ci]);
   }
 
   /* Free the top-level space */
-  (*cinfo->emethods->free_small) ((void *) subsampled_data[0]);
-  (*cinfo->emethods->free_small) ((void *) subsampled_data[1]);
+  (*cinfo->emethods->free_small) ((void *) sampled_data[0]);
+  (*cinfo->emethods->free_small) ((void *) sampled_data[1]);
 }
 
 #endif
 
 
+/*
+ * Several decompression processes need to range-limit values to the range
+ * 0..MAXJSAMPLE; the input value may fall somewhat outside this range
+ * due to noise introduced by quantization, roundoff error, etc.  These
+ * processes are inner loops and need to be as fast as possible.  On most
+ * machines, particularly CPUs with pipelines or instruction prefetch,
+ * a (range-check-less) C table lookup
+ *		x = sample_range_limit[x];
+ * is faster than explicit tests
+ *		if (x < 0)  x = 0;
+ *		else if (x > MAXJSAMPLE)  x = MAXJSAMPLE;
+ * These processes all use a common table prepared by the routine below.
+ *
+ * The table will work correctly for x within MAXJSAMPLE+1 of the legal
+ * range.  This is a much wider range than is needed for most cases,
+ * but the wide range is handy for color quantization.
+ * Note that the table is allocated in near data space on PCs; it's small
+ * enough and used often enough to justify this.
+ */
+
+LOCAL void
+prepare_range_limit_table (decompress_info_ptr cinfo)
+/* Allocate and fill in the sample_range_limit table */
+{
+  JSAMPLE * table;
+  int i;
+
+  table = (JSAMPLE *) (*cinfo->emethods->alloc_small)
+			(3 * (MAXJSAMPLE+1) * SIZEOF(JSAMPLE));
+  cinfo->sample_range_limit = table + (MAXJSAMPLE+1);
+  for (i = 0; i <= MAXJSAMPLE; i++) {
+    table[i] = 0;			/* sample_range_limit[x] = 0 for x<0 */
+    table[i+(MAXJSAMPLE+1)] = (JSAMPLE) i;	/* sample_range_limit[x] = x */
+    table[i+(MAXJSAMPLE+1)*2] = MAXJSAMPLE;	/* x beyond MAXJSAMPLE */
+  }
+}
+
+
 LOCAL void
 duplicate_row (JSAMPARRAY image_data,
 	       long num_cols, int source_row, int num_rows)
@@ -309,17 +347,17 @@
 
 LOCAL void
 expand (decompress_info_ptr cinfo,
-	JSAMPIMAGE subsampled_data, JSAMPIMAGE fullsize_data,
+	JSAMPIMAGE sampled_data, JSAMPIMAGE fullsize_data,
 	long fullsize_width,
 	short above, short current, short below, short out)
-/* Do unsubsampling expansion of a single row group (of each component).  */
-/* above, current, below are indexes of row groups in subsampled_data;    */
+/* Do upsampling expansion of a single row group (of each component). */
+/* above, current, below are indexes of row groups in sampled_data;       */
 /* out is the index of the target row group in fullsize_data.             */
 /* Special case: above, below can be -1 to indicate top, bottom of image. */
 {
   jpeg_component_info *compptr;
   JSAMPARRAY above_ptr, below_ptr;
-  JSAMPROW dummy[MAX_SAMP_FACTOR]; /* for subsample expansion at top/bottom */
+  JSAMPROW dummy[MAX_SAMP_FACTOR]; /* for upsampling at top/bottom of image */
   short ci, vs, i;
 
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
@@ -327,30 +365,30 @@
     vs = compptr->v_samp_factor; /* row group height */
 
     if (above >= 0)
-      above_ptr = subsampled_data[ci] + above * vs;
+      above_ptr = sampled_data[ci] + above * vs;
     else {
       /* Top of image: make a dummy above-context with copies of 1st row */
       /* We assume current=0 in this case */
       for (i = 0; i < vs; i++)
-	dummy[i] = subsampled_data[ci][0];
+	dummy[i] = sampled_data[ci][0];
       above_ptr = (JSAMPARRAY) dummy; /* possible near->far pointer conv */
     }
 
     if (below >= 0)
-      below_ptr = subsampled_data[ci] + below * vs;
+      below_ptr = sampled_data[ci] + below * vs;
     else {
       /* Bot of image: make a dummy below-context with copies of last row */
       for (i = 0; i < vs; i++)
-	dummy[i] = subsampled_data[ci][(current+1)*vs-1];
+	dummy[i] = sampled_data[ci][(current+1)*vs-1];
       below_ptr = (JSAMPARRAY) dummy; /* possible near->far pointer conv */
     }
 
-    (*cinfo->methods->unsubsample[ci])
+    (*cinfo->methods->upsample[ci])
 		(cinfo, (int) ci,
-		 compptr->subsampled_width, (int) vs,
+		 compptr->downsampled_width, (int) vs,
 		 fullsize_width, (int) cinfo->max_v_samp_factor,
 		 above_ptr,
-		 subsampled_data[ci] + current * vs,
+		 sampled_data[ci] + current * vs,
 		 below_ptr,
 		 fullsize_data[ci] + out * cinfo->max_v_samp_factor);
   }
@@ -529,9 +567,9 @@
   JBLOCKIMAGE bsmooth[3];	/* this is optional */
   int whichb;
 #endif
-  /* Work buffer for subsampled image data (see comments at head of file) */
-  JSAMPIMAGE subsampled_data[2];
-  /* Work buffer for desubsampled data */
+  /* Work buffer for downsampled image data (see comments at head of file) */
+  JSAMPIMAGE sampled_data[2];
+  /* Work buffer for upsampled data */
   JSAMPIMAGE fullsize_data;
   int whichss, ri;
   short i;
@@ -565,14 +603,15 @@
     bsmooth[2] = alloc_MCU_row(cinfo);
   }
 #endif
-  /* subsampled_data is sample data before unsubsampling */
-  alloc_sampling_buffer(cinfo, subsampled_data);
-  /* fullsize_data is sample data after unsubsampling */
+  /* sampled_data is sample data before upsampling */
+  alloc_sampling_buffer(cinfo, sampled_data);
+  /* fullsize_data is sample data after upsampling */
   fullsize_data = alloc_sampimage(cinfo, (int) cinfo->num_components,
 				  (long) rows_in_mem, fullsize_width);
   /* output_workspace is the color-processed data */
   output_workspace = alloc_sampimage(cinfo, (int) cinfo->final_out_comps,
 				     (long) rows_in_mem, fullsize_width);
+  prepare_range_limit_table(cinfo);
 
   /* Tell the memory manager to instantiate big arrays.
    * We don't need any big arrays in this controller,
@@ -587,27 +626,27 @@
 
   /* Initialize to read scan data */
 
-  (*cinfo->methods->entropy_decoder_init) (cinfo);
-  (*cinfo->methods->unsubsample_init) (cinfo);
+  (*cinfo->methods->entropy_decode_init) (cinfo);
+  (*cinfo->methods->upsample_init) (cinfo);
   (*cinfo->methods->disassemble_init) (cinfo);
 
   /* Loop over scan's data: rows_in_mem pixel rows are processed per loop */
 
   pixel_rows_output = 0;
-  whichss = 1;			/* arrange to start with subsampled_data[0] */
+  whichss = 1;			/* arrange to start with sampled_data[0] */
 
   for (cur_mcu_row = 0; cur_mcu_row < cinfo->MCU_rows_in_scan;
        cur_mcu_row += mcu_rows_per_loop) {
     (*cinfo->methods->progress_monitor) (cinfo, cur_mcu_row,
 					 cinfo->MCU_rows_in_scan);
 
-    whichss ^= 1;		/* switch to other subsample buffer */
+    whichss ^= 1;		/* switch to other downsampled-data buffer */
 
     /* Obtain v_samp_factor block rows of each component in the scan. */
     /* This is a single MCU row if interleaved, multiple MCU rows if not. */
     /* In the noninterleaved case there might be fewer than v_samp_factor */
     /* block rows remaining; if so, pad with copies of the last pixel row */
-    /* so that unsubsampling doesn't have to treat it as a special case. */
+    /* so that upsampling doesn't have to treat it as a special case. */
 
     for (ri = 0; ri < mcu_rows_per_loop; ri++) {
       if (cur_mcu_row + ri < cinfo->MCU_rows_in_scan) {
@@ -621,41 +660,41 @@
 	  (*cinfo->methods->disassemble_MCU) (cinfo, coeff_data);
       
 	(*cinfo->methods->reverse_DCT) (cinfo, coeff_data,
-					subsampled_data[whichss],
+					sampled_data[whichss],
 					ri * DCTSIZE);
       } else {
-	/* Need to pad out with copies of the last subsampled row. */
+	/* Need to pad out with copies of the last downsampled row. */
 	/* This can only happen if there is just one component. */
-	duplicate_row(subsampled_data[whichss][0],
-		      cinfo->cur_comp_info[0]->subsampled_width,
+	duplicate_row(sampled_data[whichss][0],
+		      cinfo->cur_comp_info[0]->downsampled_width,
 		      ri * DCTSIZE - 1, DCTSIZE);
       }
     }
 
-    /* Unsubsample the data */
+    /* Upsample the data */
     /* First time through is a special case */
 
     if (cur_mcu_row) {
       /* Expand last row group of previous set */
-      expand(cinfo, subsampled_data[whichss], fullsize_data, fullsize_width,
+      expand(cinfo, sampled_data[whichss], fullsize_data, fullsize_width,
 	     (short) DCTSIZE, (short) (DCTSIZE+1), (short) 0,
 	     (short) (DCTSIZE-1));
       /* and dump the previous set's expanded data */
-      emit_1pass (cinfo, rows_in_mem, fullsize_data, NULL);
+      emit_1pass (cinfo, rows_in_mem, fullsize_data, (JSAMPARRAY) NULL);
       pixel_rows_output += rows_in_mem;
       /* Expand first row group of this set */
-      expand(cinfo, subsampled_data[whichss], fullsize_data, fullsize_width,
+      expand(cinfo, sampled_data[whichss], fullsize_data, fullsize_width,
 	     (short) (DCTSIZE+1), (short) 0, (short) 1,
 	     (short) 0);
     } else {
       /* Expand first row group with dummy above-context */
-      expand(cinfo, subsampled_data[whichss], fullsize_data, fullsize_width,
+      expand(cinfo, sampled_data[whichss], fullsize_data, fullsize_width,
 	     (short) (-1), (short) 0, (short) 1,
 	     (short) 0);
     }
     /* Expand second through next-to-last row groups of this set */
     for (i = 1; i <= DCTSIZE-2; i++) {
-      expand(cinfo, subsampled_data[whichss], fullsize_data, fullsize_width,
+      expand(cinfo, sampled_data[whichss], fullsize_data, fullsize_width,
 	     (short) (i-1), (short) i, (short) (i+1),
 	     (short) i);
     }
@@ -663,23 +702,23 @@
 
   /* Expand the last row group with dummy below-context */
   /* Note whichss points to last buffer side used */
-  expand(cinfo, subsampled_data[whichss], fullsize_data, fullsize_width,
+  expand(cinfo, sampled_data[whichss], fullsize_data, fullsize_width,
 	 (short) (DCTSIZE-2), (short) (DCTSIZE-1), (short) (-1),
 	 (short) (DCTSIZE-1));
   /* and dump the remaining data (may be less than full height) */
   emit_1pass (cinfo, (int) (cinfo->image_height - pixel_rows_output),
-	      fullsize_data, NULL);
+	      fullsize_data, (JSAMPARRAY) NULL);
 
   /* Clean up after the scan */
   (*cinfo->methods->disassemble_term) (cinfo);
-  (*cinfo->methods->unsubsample_term) (cinfo);
-  (*cinfo->methods->entropy_decoder_term) (cinfo);
+  (*cinfo->methods->upsample_term) (cinfo);
+  (*cinfo->methods->entropy_decode_term) (cinfo);
   (*cinfo->methods->read_scan_trailer) (cinfo);
   cinfo->completed_passes++;
 
   /* Verify that we've seen the whole input file */
   if ((*cinfo->methods->read_scan_header) (cinfo))
-    ERREXIT(cinfo->emethods, "Didn't expect more than one scan");
+    WARNMS(cinfo->emethods, "Didn't expect more than one scan");
 
   /* Release working memory */
   /* (no work -- we let free_all release what's needful) */
@@ -691,13 +730,13 @@
  * and/or 2-pass color quantization.
  *
  * The current implementation places the "big" buffer at the stage of
- * desubsampled, non-color-processed data.  This is the only place that
+ * upsampled, non-color-processed data.  This is the only place that
  * makes sense when doing 2-pass quantization.  For processing multiple-scan
  * files without 2-pass quantization, it would be possible to develop another
- * controller that buffers the subsampled data instead, thus reducing the size
+ * controller that buffers the downsampled data instead, thus reducing the size
  * of the temp files (by about a factor of 2 in typical cases).  However,
- * our present unsubsampling logic is dependent on the assumption that
- * unsubsampling occurs during a scan, so it's much easier to do the
+ * our present upsampling logic is dependent on the assumption that
+ * upsampling occurs during a scan, so it's much easier to do the
  * enlargement as the JPEG file is read.  This also simplifies life for the
  * memory manager, which would otherwise have to deal with overlapping
  * access_big_sarray() requests.
@@ -721,8 +760,8 @@
   JBLOCKIMAGE bsmooth[3];	/* this is optional */
   int whichb;
 #endif
-  /* Work buffer for subsampled image data (see comments at head of file) */
-  JSAMPIMAGE subsampled_data[2];
+  /* Work buffer for downsampled image data (see comments at head of file) */
+  JSAMPIMAGE sampled_data[2];
   int whichss, ri;
   short ci, i;
   boolean single_scan;
@@ -737,8 +776,9 @@
   /* output_workspace is the color-processed data */
   output_workspace = alloc_sampimage(cinfo, (int) cinfo->final_out_comps,
 				     (long) rows_in_mem, fullsize_width);
+  prepare_range_limit_table(cinfo);
 
-  /* Get a big image: fullsize_image is sample data after unsubsampling. */
+  /* Get a big image: fullsize_image is sample data after upsampling. */
   fullsize_image = (big_sarray_ptr *) (*cinfo->emethods->alloc_small)
 			(cinfo->num_components * SIZEOF(big_sarray_ptr));
   for (ci = 0; ci < cinfo->num_components; ci++) {
@@ -753,7 +793,7 @@
 
   /* Tell the memory manager to instantiate big arrays */
   (*cinfo->emethods->alloc_big_arrays)
-	 /* extra sarray space is for subsampled-data buffers: */
+	 /* extra sarray space is for downsampled-data buffers: */
 	((long) (fullsize_width			/* max width in samples */
 	 * cinfo->max_v_samp_factor*(DCTSIZE+2)	/* max height */
 	 * cinfo->num_components),		/* max components per scan */
@@ -818,8 +858,8 @@
       bsmooth[2] = alloc_MCU_row(cinfo);
     }
 #endif
-    /* subsampled_data is sample data before unsubsampling */
-    alloc_sampling_buffer(cinfo, subsampled_data);
+    /* sampled_data is sample data before upsampling */
+    alloc_sampling_buffer(cinfo, sampled_data);
 
     /* line up the big buffers for components in this scan */
     for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
@@ -830,27 +870,27 @@
     
     /* Initialize to read scan data */
     
-    (*cinfo->methods->entropy_decoder_init) (cinfo);
-    (*cinfo->methods->unsubsample_init) (cinfo);
+    (*cinfo->methods->entropy_decode_init) (cinfo);
+    (*cinfo->methods->upsample_init) (cinfo);
     (*cinfo->methods->disassemble_init) (cinfo);
     
     /* Loop over scan's data: rows_in_mem pixel rows are processed per loop */
     
     pixel_rows_output = 0;
-    whichss = 1;		/* arrange to start with subsampled_data[0] */
+    whichss = 1;		/* arrange to start with sampled_data[0] */
     
     for (cur_mcu_row = 0; cur_mcu_row < cinfo->MCU_rows_in_scan;
 	 cur_mcu_row += mcu_rows_per_loop) {
       (*cinfo->methods->progress_monitor) (cinfo, cur_mcu_row,
 					   cinfo->MCU_rows_in_scan);
 
-      whichss ^= 1;		/* switch to other subsample buffer */
+      whichss ^= 1;		/* switch to other downsampled-data buffer */
 
       /* Obtain v_samp_factor block rows of each component in the scan. */
       /* This is a single MCU row if interleaved, multiple MCU rows if not. */
       /* In the noninterleaved case there might be fewer than v_samp_factor */
       /* block rows remaining; if so, pad with copies of the last pixel row */
-      /* so that unsubsampling doesn't have to treat it as a special case. */
+      /* so that upsampling doesn't have to treat it as a special case. */
       
       for (ri = 0; ri < mcu_rows_per_loop; ri++) {
 	if (cur_mcu_row + ri < cinfo->MCU_rows_in_scan) {
@@ -864,23 +904,23 @@
 	    (*cinfo->methods->disassemble_MCU) (cinfo, coeff_data);
 	  
 	  (*cinfo->methods->reverse_DCT) (cinfo, coeff_data,
-					  subsampled_data[whichss],
+					  sampled_data[whichss],
 					  ri * DCTSIZE);
 	} else {
-	  /* Need to pad out with copies of the last subsampled row. */
+	  /* Need to pad out with copies of the last downsampled row. */
 	  /* This can only happen if there is just one component. */
-	  duplicate_row(subsampled_data[whichss][0],
-			cinfo->cur_comp_info[0]->subsampled_width,
+	  duplicate_row(sampled_data[whichss][0],
+			cinfo->cur_comp_info[0]->downsampled_width,
 			ri * DCTSIZE - 1, DCTSIZE);
 	}
       }
       
-      /* Unsubsample the data */
+      /* Upsample the data */
       /* First time through is a special case */
       
       if (cur_mcu_row) {
 	/* Expand last row group of previous set */
-	expand(cinfo, subsampled_data[whichss], fullsize_ptrs, fullsize_width,
+	expand(cinfo, sampled_data[whichss], fullsize_ptrs, fullsize_width,
 	       (short) DCTSIZE, (short) (DCTSIZE+1), (short) 0,
 	       (short) (DCTSIZE-1));
 	/* If single scan, can do color quantization prescan on-the-fly */
@@ -896,18 +936,18 @@
 	     pixel_rows_output, TRUE);
 	}
 	/* Expand first row group of this set */
-	expand(cinfo, subsampled_data[whichss], fullsize_ptrs, fullsize_width,
+	expand(cinfo, sampled_data[whichss], fullsize_ptrs, fullsize_width,
 	       (short) (DCTSIZE+1), (short) 0, (short) 1,
 	       (short) 0);
       } else {
 	/* Expand first row group with dummy above-context */
-	expand(cinfo, subsampled_data[whichss], fullsize_ptrs, fullsize_width,
+	expand(cinfo, sampled_data[whichss], fullsize_ptrs, fullsize_width,
 	       (short) (-1), (short) 0, (short) 1,
 	       (short) 0);
       }
       /* Expand second through next-to-last row groups of this set */
       for (i = 1; i <= DCTSIZE-2; i++) {
-	expand(cinfo, subsampled_data[whichss], fullsize_ptrs, fullsize_width,
+	expand(cinfo, sampled_data[whichss], fullsize_ptrs, fullsize_width,
 	       (short) (i-1), (short) i, (short) (i+1),
 	       (short) i);
       }
@@ -915,7 +955,7 @@
     
     /* Expand the last row group with dummy below-context */
     /* Note whichss points to last buffer side used */
-    expand(cinfo, subsampled_data[whichss], fullsize_ptrs, fullsize_width,
+    expand(cinfo, sampled_data[whichss], fullsize_ptrs, fullsize_width,
 	   (short) (DCTSIZE-2), (short) (DCTSIZE-1), (short) (-1),
 	   (short) (DCTSIZE-1));
     /* If single scan, finish on-the-fly color quantization prescan */
@@ -926,8 +966,8 @@
     
     /* Clean up after the scan */
     (*cinfo->methods->disassemble_term) (cinfo);
-    (*cinfo->methods->unsubsample_term) (cinfo);
-    (*cinfo->methods->entropy_decoder_term) (cinfo);
+    (*cinfo->methods->upsample_term) (cinfo);
+    (*cinfo->methods->entropy_decode_term) (cinfo);
     (*cinfo->methods->read_scan_trailer) (cinfo);
     if (single_scan)
       cinfo->completed_passes++;
@@ -943,7 +983,7 @@
       free_MCU_row(cinfo, bsmooth[2]);
     }
 #endif
-    free_sampling_buffer(cinfo, subsampled_data);
+    free_sampling_buffer(cinfo, sampled_data);
     
     /* Repeat if there is another scan */
   } while ((!single_scan) && (*cinfo->methods->read_scan_header) (cinfo));
@@ -951,7 +991,7 @@
   if (single_scan) {
     /* If we expected just one scan, make SURE there's just one */
     if ((*cinfo->methods->read_scan_header) (cinfo))
-      ERREXIT(cinfo->emethods, "Didn't expect more than one scan");
+      WARNMS(cinfo->emethods, "Didn't expect more than one scan");
     /* We did the CQ prescan on-the-fly, so we are all set. */
   } else {
     /* For multiple-scan file, do the CQ prescan as a separate pass. */
diff --git a/jdsample.c b/jdsample.c
index 71fb453..3f49e29 100644
--- a/jdsample.c
+++ b/jdsample.c
@@ -5,37 +5,48 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
- * This file contains un-subsampling routines.
- * These routines are invoked via the unsubsample and
- * unsubsample_init/term methods.
+ * This file contains upsampling routines.
+ * These routines are invoked via the upsample and
+ * upsample_init/term methods.
+ *
+ * An excellent reference for image resampling is
+ *   Digital Image Warping, George Wolberg, 1990.
+ *   Pub. by IEEE Computer Society Press, Los Alamitos, CA. ISBN 0-8186-8944-7.
  */
 
 #include "jinclude.h"
 
 
 /*
- * Initialize for un-subsampling a scan.
+ * Initialize for upsampling a scan.
  */
 
 METHODDEF void
-unsubsample_init (decompress_info_ptr cinfo)
+upsample_init (decompress_info_ptr cinfo)
 {
   /* no work for now */
 }
 
 
 /*
- * Un-subsample pixel values of a single component.
+ * Upsample pixel values of a single component.
  * This version handles any integral sampling ratios.
+ *
  * This is not used for typical JPEG files, so it need not be fast.
+ * Nor, for that matter, is it particularly accurate: the algorithm is
+ * simple replication of the input pixel onto the corresponding output
+ * pixels.  The hi-falutin sampling literature refers to this as a
+ * "box filter".  A box filter tends to introduce visible artifacts,
+ * so if you are actually going to use 3:1 or 4:1 sampling ratios
+ * you would be well advised to improve this code.
  */
 
 METHODDEF void
-int_unsubsample (decompress_info_ptr cinfo, int which_component,
-		 long input_cols, int input_rows,
-		 long output_cols, int output_rows,
-		 JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
-		 JSAMPARRAY output_data)
+int_upsample (decompress_info_ptr cinfo, int which_component,
+	      long input_cols, int input_rows,
+	      long output_cols, int output_rows,
+	      JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+	      JSAMPARRAY output_data)
 {
   jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
   register JSAMPROW inptr, outptr;
@@ -51,7 +62,7 @@
       (input_cols % compptr->h_samp_factor) != 0 ||
       (output_cols % cinfo->max_h_samp_factor) != 0 ||
       output_cols*compptr->h_samp_factor != input_cols*cinfo->max_h_samp_factor)
-    ERREXIT(cinfo->emethods, "Bogus unsubsample parameters");
+    ERREXIT(cinfo->emethods, "Bogus upsample parameters");
 #endif
 
   h_expand = cinfo->max_h_samp_factor / compptr->h_samp_factor;
@@ -74,95 +85,149 @@
 
 
 /*
- * Un-subsample pixel values of a single component.
- * This version handles the extremely common case of
- * horizontal expansion by 2 and any integral vertical expansion.
+ * Upsample pixel values of a single component.
+ * This version handles the common case of 2:1 horizontal and 1:1 vertical.
+ *
+ * The upsampling algorithm is linear interpolation between pixel centers,
+ * also known as a "triangle filter".  This is a good compromise between
+ * speed and visual quality.  The centers of the output pixels are 1/4 and 3/4
+ * of the way between input pixel centers.
  */
 
 METHODDEF void
-h2_unsubsample (decompress_info_ptr cinfo, int which_component,
-		long input_cols, int input_rows,
-		long output_cols, int output_rows,
-		JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
-		JSAMPARRAY output_data)
+h2v1_upsample (decompress_info_ptr cinfo, int which_component,
+	       long input_cols, int input_rows,
+	       long output_cols, int output_rows,
+	       JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+	       JSAMPARRAY output_data)
 {
-  jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
   register JSAMPROW inptr, outptr;
-  register JSAMPLE invalue;
-  short v_expand, v;
-  int inrow, outrow;
-  register long incol;
+  register int invalue;
+  int inrow;
+  register long colctr;
 
 #ifdef DEBUG			/* for debugging pipeline controller */
+  jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
   if (input_rows != compptr->v_samp_factor ||
       output_rows != cinfo->max_v_samp_factor ||
       (input_cols % compptr->h_samp_factor) != 0 ||
       (output_cols % cinfo->max_h_samp_factor) != 0 ||
       output_cols*compptr->h_samp_factor != input_cols*cinfo->max_h_samp_factor)
-    ERREXIT(cinfo->emethods, "Bogus unsubsample parameters");
+    ERREXIT(cinfo->emethods, "Bogus upsample parameters");
 #endif
 
-  v_expand = cinfo->max_v_samp_factor / compptr->v_samp_factor;
+  for (inrow = 0; inrow < input_rows; inrow++) {
+    inptr = input_data[inrow];
+    outptr = output_data[inrow];
+    /* Special case for first column */
+    invalue = GETJSAMPLE(*inptr++);
+    *outptr++ = (JSAMPLE) invalue;
+    *outptr++ = (JSAMPLE) ((invalue * 3 + GETJSAMPLE(*inptr) + 2) >> 2);
 
-/* The subsampled image width will always be a multiple of DCTSIZE,
- * so we can unroll the inner loop.
+    for (colctr = input_cols - 2; colctr > 0; colctr--) {
+      /* General case: 3/4 * nearer pixel + 1/4 * further pixel */
+      invalue = GETJSAMPLE(*inptr++) * 3;
+      *outptr++ = (JSAMPLE) ((invalue + GETJSAMPLE(inptr[-2]) + 2) >> 2);
+      *outptr++ = (JSAMPLE) ((invalue + GETJSAMPLE(*inptr) + 2) >> 2);
+    }
+
+    /* Special case for last column */
+    invalue = GETJSAMPLE(*inptr);
+    *outptr++ = (JSAMPLE) ((invalue * 3 + GETJSAMPLE(inptr[-1]) + 2) >> 2);
+    *outptr++ = (JSAMPLE) invalue;
+  }
+}
+
+
+/*
+ * Upsample pixel values of a single component.
+ * This version handles the common case of 2:1 horizontal and 2:1 vertical.
+ *
+ * The upsampling algorithm is linear interpolation between pixel centers,
+ * also known as a "triangle filter".  This is a good compromise between
+ * speed and visual quality.  The centers of the output pixels are 1/4 and 3/4
+ * of the way between input pixel centers.
  */
 
+METHODDEF void
+h2v2_upsample (decompress_info_ptr cinfo, int which_component,
+	       long input_cols, int input_rows,
+	       long output_cols, int output_rows,
+	       JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+	       JSAMPARRAY output_data)
+{
+  register JSAMPROW inptr0, inptr1, outptr;
+#ifdef EIGHT_BIT_SAMPLES
+  register int thiscolsum, lastcolsum, nextcolsum;
+#else
+  register INT32 thiscolsum, lastcolsum, nextcolsum;
+#endif
+  int inrow, outrow, v;
+  register long colctr;
+
+#ifdef DEBUG			/* for debugging pipeline controller */
+  jpeg_component_info * compptr = cinfo->cur_comp_info[which_component];
+  if (input_rows != compptr->v_samp_factor ||
+      output_rows != cinfo->max_v_samp_factor ||
+      (input_cols % compptr->h_samp_factor) != 0 ||
+      (output_cols % cinfo->max_h_samp_factor) != 0 ||
+      output_cols*compptr->h_samp_factor != input_cols*cinfo->max_h_samp_factor)
+    ERREXIT(cinfo->emethods, "Bogus upsample parameters");
+#endif
+
   outrow = 0;
   for (inrow = 0; inrow < input_rows; inrow++) {
-    for (v = 0; v < v_expand; v++) {
-      inptr = input_data[inrow];
+    for (v = 0; v < 2; v++) {
+      /* inptr0 points to nearest input row, inptr1 points to next nearest */
+      inptr0 = input_data[inrow];
+      if (v == 0) {		/* next nearest is row above */
+	if (inrow == 0)
+	  inptr1 = above[input_rows-1];
+	else
+	  inptr1 = input_data[inrow-1];
+      } else {			/* next nearest is row below */
+	if (inrow == input_rows-1)
+	  inptr1 = below[0];
+	else
+	  inptr1 = input_data[inrow+1];
+      }
       outptr = output_data[outrow++];
-#if DCTSIZE == 8
-      for (incol = 0; incol < input_cols; incol += DCTSIZE) {
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
+
+      /* Special case for first column */
+      thiscolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++);
+      nextcolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++);
+      *outptr++ = (JSAMPLE) ((thiscolsum * 4 + 8) >> 4);
+      *outptr++ = (JSAMPLE) ((thiscolsum * 3 + nextcolsum + 8) >> 4);
+      lastcolsum = thiscolsum; thiscolsum = nextcolsum;
+
+      for (colctr = input_cols - 2; colctr > 0; colctr--) {
+	/* General case: 3/4 * nearer pixel + 1/4 * further pixel in each */
+	/* dimension, thus 9/16, 3/16, 3/16, 1/16 overall */
+	nextcolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++);
+	*outptr++ = (JSAMPLE) ((thiscolsum * 3 + lastcolsum + 8) >> 4);
+	*outptr++ = (JSAMPLE) ((thiscolsum * 3 + nextcolsum + 8) >> 4);
+	lastcolsum = thiscolsum; thiscolsum = nextcolsum;
       }
-#else /* nonstandard DCTSIZE */
-      for (incol = 0; incol < input_cols; incol++) {
-	invalue = GETJSAMPLE(*inptr++);
-	*outptr++ = invalue;
-	*outptr++ = invalue;
-      }
-#endif
+
+      /* Special case for last column */
+      *outptr++ = (JSAMPLE) ((thiscolsum * 3 + lastcolsum + 8) >> 4);
+      *outptr++ = (JSAMPLE) ((thiscolsum * 4 + 8) >> 4);
     }
   }
 }
 
 
 /*
- * Un-subsample pixel values of a single component.
+ * Upsample pixel values of a single component.
  * This version handles the special case of a full-size component.
  */
 
 METHODDEF void
-fullsize_unsubsample (decompress_info_ptr cinfo, int which_component,
-		      long input_cols, int input_rows,
-		      long output_cols, int output_rows,
-		      JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
-		      JSAMPARRAY output_data)
+fullsize_upsample (decompress_info_ptr cinfo, int which_component,
+		   long input_cols, int input_rows,
+		   long output_cols, int output_rows,
+		   JSAMPARRAY above, JSAMPARRAY input_data, JSAMPARRAY below,
+		   JSAMPARRAY output_data)
 {
 #ifdef DEBUG			/* for debugging pipeline controller */
   if (input_cols != output_cols || input_rows != output_rows)
@@ -179,7 +244,7 @@
  */
 
 METHODDEF void
-unsubsample_term (decompress_info_ptr cinfo)
+upsample_term (decompress_info_ptr cinfo)
 {
   /* no work for now */
 }
@@ -187,34 +252,37 @@
 
 
 /*
- * The method selection routine for unsubsampling.
+ * The method selection routine for upsampling.
  * Note that we must select a routine for each component.
  */
 
 GLOBAL void
-jselunsubsample (decompress_info_ptr cinfo)
+jselupsample (decompress_info_ptr cinfo)
 {
   short ci;
   jpeg_component_info * compptr;
 
   if (cinfo->CCIR601_sampling)
-    ERREXIT(cinfo->emethods, "CCIR601 subsampling not implemented yet");
+    ERREXIT(cinfo->emethods, "CCIR601 upsampling not implemented yet");
 
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     compptr = cinfo->cur_comp_info[ci];
     if (compptr->h_samp_factor == cinfo->max_h_samp_factor &&
 	compptr->v_samp_factor == cinfo->max_v_samp_factor)
-      cinfo->methods->unsubsample[ci] = fullsize_unsubsample;
+      cinfo->methods->upsample[ci] = fullsize_upsample;
     else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
-	     (cinfo->max_v_samp_factor % compptr->v_samp_factor) == 0)
-      cinfo->methods->unsubsample[ci] = h2_unsubsample;
+	     compptr->v_samp_factor == cinfo->max_v_samp_factor)
+      cinfo->methods->upsample[ci] = h2v1_upsample;
+    else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
+	     compptr->v_samp_factor * 2 == cinfo->max_v_samp_factor)
+      cinfo->methods->upsample[ci] = h2v2_upsample;
     else if ((cinfo->max_h_samp_factor % compptr->h_samp_factor) == 0 &&
 	     (cinfo->max_v_samp_factor % compptr->v_samp_factor) == 0)
-      cinfo->methods->unsubsample[ci] = int_unsubsample;
+      cinfo->methods->upsample[ci] = int_upsample;
     else
-      ERREXIT(cinfo->emethods, "Fractional subsampling not implemented yet");
+      ERREXIT(cinfo->emethods, "Fractional upsampling not implemented yet");
   }
 
-  cinfo->methods->unsubsample_init = unsubsample_init;
-  cinfo->methods->unsubsample_term = unsubsample_term;
+  cinfo->methods->upsample_init = upsample_init;
+  cinfo->methods->upsample_term = upsample_term;
 }
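
Note on the jdsample.c change above: the new h2v1_upsample comment describes a "triangle filter" whose output pixel centers sit 1/4 and 3/4 of the way between input pixel centers. The following is a minimal standalone sketch of that arithmetic for one row of 8-bit samples. The function name upsample_row_h2v1 and the use of plain unsigned char buffers are assumptions made for illustration; the patch itself works on JSAMPROW pointers through GETJSAMPLE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the IJG sources.
 * Expands n input samples to 2*n output samples by linear interpolation:
 * each output is 3/4 of the nearer input plus 1/4 of the further input.
 * The "+ 2" rounds to nearest before the divide-by-4 shift.
 */
void upsample_row_h2v1(const unsigned char *in, size_t n, unsigned char *out)
{
  size_t i;

  assert(n >= 2);               /* the edge handling needs two columns */

  /* First column: there is no left neighbor, so the leftmost output
   * is a plain copy of the input. */
  out[0] = in[0];
  out[1] = (unsigned char) ((in[0] * 3 + in[1] + 2) >> 2);

  /* Interior columns: blend with the left and then the right neighbor. */
  for (i = 1; i + 1 < n; i++) {
    int v = in[i] * 3;
    out[2 * i]     = (unsigned char) ((v + in[i - 1] + 2) >> 2);
    out[2 * i + 1] = (unsigned char) ((v + in[i + 1] + 2) >> 2);
  }

  /* Last column: there is no right neighbor, so the rightmost output
   * is a plain copy of the input. */
  out[2 * n - 2] = (unsigned char) ((in[n - 1] * 3 + in[n - 2] + 2) >> 2);
  out[2 * n - 1] = in[n - 1];
}
```

The 2:1 by 2:1 case in the patch applies the same 3:1 weighting in both directions, which gives the 9/16, 3/16, 3/16, 1/16 weights mentioned in its comments and a final shift of 4 bits instead of 2.
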
diff --git a/jerror.c b/jerror.c
index 2302312..0256f56 100644
--- a/jerror.c
+++ b/jerror.c
@@ -48,7 +48,7 @@
 METHODDEF void
 error_exit (const char *msgtext)
 {
-  trace_message(msgtext);
+  (*methods->trace_message) (msgtext);
   (*methods->free_all) ();	/* clean up memory allocation */
   exit(EXIT_FAILURE);
 }
@@ -69,4 +69,13 @@
   emethods->trace_message = trace_message;
 
   emethods->trace_level = 0;	/* default = no tracing */
+
+  emethods->num_warnings = 0;	/* no warnings emitted yet */
+  /* By default, the first corrupt-data warning will be displayed,
+   * but additional ones will appear only if trace level is at least 3.
+   * A corrupt data file could generate many warnings, so it's a good idea
+   * to suppress additional messages except at high tracing levels.
+   */
+  emethods->first_warning_level = 0;
+  emethods->more_warning_level = 3;
 }
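
Note on the jerror.c change above: the new comment says the first corrupt-data warning is displayed by default, while later ones appear only at trace level 3 or higher. One plausible reading of how the three new fields interact is sketched below. The struct warn_state and the function should_show_warning are hypothetical stand-ins, and the field types are assumed; the macro that actually consumes these fields is not shown in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the fields named in the patch. */
struct warn_state {
  int trace_level;          /* current tracing level */
  long num_warnings;        /* warnings counted so far (type assumed) */
  int first_warning_level;  /* threshold for the first warning */
  int more_warning_level;   /* threshold for every later warning */
};

/* Count one warning and report whether it would be displayed.
 * Returns 1 if trace_level meets the applicable threshold, else 0.
 */
int should_show_warning(struct warn_state *ws)
{
  int threshold;

  assert(ws != NULL);
  threshold = (ws->num_warnings == 0) ? ws->first_warning_level
                                      : ws->more_warning_level;
  ws->num_warnings++;       /* counted whether or not it is displayed */
  return ws->trace_level >= threshold;
}
```

With the defaults set in the patch (trace_level 0, first_warning_level 0, more_warning_level 3), this sketch shows the first warning and suppresses the rest until the caller raises trace_level to 3.
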
diff --git a/jfwddct.c b/jfwddct.c
index 0ca0e78..3254bce 100644
--- a/jfwddct.c
+++ b/jfwddct.c
@@ -8,10 +8,15 @@
  * This file contains the basic DCT (Discrete Cosine Transform)
  * transformation subroutine.
  *
- * This implementation is based on Appendix A.2 of the book
- * "Discrete Cosine Transform---Algorithms, Advantages, Applications"
- * by K.R. Rao and P. Yip  (Academic Press, Inc, London, 1990).
- * It uses scaled fixed-point arithmetic instead of floating point.
+ * This implementation is based on an algorithm described in
+ *   C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT
+ *   Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics,
+ *   Speech, and Signal Processing 1989 (ICASSP '89), pp. 988-991.
+ * The primary algorithm described there uses 11 multiplies and 29 adds.
+ * We use their alternate method with 12 multiplies and 32 adds.
+ * The advantage of this method is that no data path contains more than one
+ * multiplication; this allows a very simple and accurate implementation in
+ * scaled fixed-point arithmetic, with a minimal number of shifts.
  */
 
 #include "jinclude.h"
@@ -25,184 +30,269 @@
 #endif
 
 
-/* The poop on this scaling stuff is as follows:
+/*
+ * A 2-D DCT can be done by 1-D DCT on each row followed by 1-D DCT
+ * on each column.  Direct algorithms are also available, but they are
+ * much more complex and seem not to be any faster when reduced to code.
+ *
+ * The poop on this scaling stuff is as follows:
+ *
+ * Each 1-D DCT step produces outputs which are a factor of sqrt(N)
+ * larger than the true DCT outputs.  The final outputs are therefore
+ * a factor of N larger than desired; since N=8 this can be cured by
+ * a simple right shift at the end of the algorithm.  The advantage of
+ * this arrangement is that we save two multiplications per 1-D DCT,
+ * because the y0 and y4 outputs need not be divided by sqrt(N).
  *
  * We have to do addition and subtraction of the integer inputs, which
  * is no problem, and multiplication by fractional constants, which is
  * a problem to do in integer arithmetic.  We multiply all the constants
- * by DCT_SCALE and convert them to integer constants (thus retaining
- * LG2_DCT_SCALE bits of precision in the constants).  After doing a
- * multiplication we have to divide the product by DCT_SCALE, with proper
- * rounding, to produce the correct output.  The division can be implemented
- * cheaply as a right shift of LG2_DCT_SCALE bits.  The DCT equations also
- * specify an additional division by 2 on the final outputs; this can be
- * folded into the right-shift by shifting one more bit (see UNFIXH).
+ * by CONST_SCALE and convert them to integer constants (thus retaining
+ * CONST_BITS bits of precision in the constants).  After doing a
+ * multiplication we have to divide the product by CONST_SCALE, with proper
+ * rounding, to produce the correct output.  This division can be done
+ * cheaply as a right shift of CONST_BITS bits.  We postpone shifting
+ * as long as possible so that partial sums can be added together with
+ * full fractional precision.
  *
- * If you are planning to recode this in assembler, you might want to set
- * LG2_DCT_SCALE to 15.  This loses a bit of precision, but then all the
- * multiplications are between 16-bit quantities (given 8-bit JSAMPLEs!)
- * so you could use a signed 16x16=>32 bit multiply instruction instead of
- * full 32x32 multiply.  Unfortunately there's no way to describe such a
- * multiply portably in C, so we've gone for the extra bit of accuracy here.
+ * The outputs of the first pass are scaled up by PASS1_BITS bits so that
+ * they are represented to better-than-integral precision.  These outputs
+ * require BITS_IN_JSAMPLE + PASS1_BITS + 3 bits; this fits in a 16-bit word
+ * with the recommended scaling.  (To scale up 12-bit sample data, an
+ * intermediate INT32 array would be needed.)
+ *
+ * To avoid overflow of the 32-bit intermediate results in pass 2, we must
+ * have BITS_IN_JSAMPLE + CONST_BITS + PASS1_BITS <= 25.  Error analysis
+ * shows that the values given below are the most effective.
  */
 
 #ifdef EIGHT_BIT_SAMPLES
-#define LG2_DCT_SCALE 16
+#define CONST_BITS  13
+#define PASS1_BITS  2
 #else
-#define LG2_DCT_SCALE 15	/* lose a little precision to avoid overflow */
+#define CONST_BITS  13
+#define PASS1_BITS  0		/* lose a little precision to avoid overflow */
 #endif
 
 #define ONE	((INT32) 1)
 
-#define DCT_SCALE (ONE << LG2_DCT_SCALE)
+#define CONST_SCALE (ONE << CONST_BITS)
 
-/* In some places we shift the inputs left by a couple more bits, */
-/* so that they can be added to fractional results without too much */
-/* loss of precision. */
-#define LG2_OVERSCALE 2
-#define OVERSCALE  (ONE << LG2_OVERSCALE)
-#define OVERSHIFT(x)  ((x) <<= LG2_OVERSCALE)
+/* Convert a positive real constant to an integer scaled by CONST_SCALE. */
 
-/* Scale a fractional constant by DCT_SCALE */
-#define FIX(x)	((INT32) ((x) * DCT_SCALE + 0.5))
+#define FIX(x)	((INT32) ((x) * CONST_SCALE + 0.5))
 
-/* Scale a fractional constant by DCT_SCALE/OVERSCALE */
-/* Such a constant can be multiplied with an overscaled input */
-/* to produce something that's scaled by DCT_SCALE */
-#define FIXO(x)  ((INT32) ((x) * DCT_SCALE / OVERSCALE + 0.5))
+/* Some C compilers fail to reduce "FIX(constant)" at compile time, thus
+ * causing a lot of useless floating-point operations at run time.
+ * To get around this we use the following pre-calculated constants.
+ * If you change CONST_BITS you may want to add appropriate values.
+ * (With a reasonable C compiler, you can just rely on the FIX() macro...)
+ */
 
-/* Descale and correctly round a value that's scaled by DCT_SCALE */
-#define UNFIX(x)   RIGHT_SHIFT((x) + (ONE << (LG2_DCT_SCALE-1)), LG2_DCT_SCALE)
+#if CONST_BITS == 13
+#define FIX_0_298631336  ((INT32)  2446)	/* FIX(0.298631336) */
+#define FIX_0_390180644  ((INT32)  3196)	/* FIX(0.390180644) */
+#define FIX_0_541196100  ((INT32)  4433)	/* FIX(0.541196100) */
+#define FIX_0_765366865  ((INT32)  6270)	/* FIX(0.765366865) */
+#define FIX_0_899976223  ((INT32)  7373)	/* FIX(0.899976223) */
+#define FIX_1_175875602  ((INT32)  9633)	/* FIX(1.175875602) */
+#define FIX_1_501321110  ((INT32)  12299)	/* FIX(1.501321110) */
+#define FIX_1_847759065  ((INT32)  15137)	/* FIX(1.847759065) */
+#define FIX_1_961570560  ((INT32)  16069)	/* FIX(1.961570560) */
+#define FIX_2_053119869  ((INT32)  16819)	/* FIX(2.053119869) */
+#define FIX_2_562915447  ((INT32)  20995)	/* FIX(2.562915447) */
+#define FIX_3_072711026  ((INT32)  25172)	/* FIX(3.072711026) */
+#else
+#define FIX_0_298631336  FIX(0.298631336)
+#define FIX_0_390180644  FIX(0.390180644)
+#define FIX_0_541196100  FIX(0.541196100)
+#define FIX_0_765366865  FIX(0.765366865)
+#define FIX_0_899976223  FIX(0.899976223)
+#define FIX_1_175875602  FIX(1.175875602)
+#define FIX_1_501321110  FIX(1.501321110)
+#define FIX_1_847759065  FIX(1.847759065)
+#define FIX_1_961570560  FIX(1.961570560)
+#define FIX_2_053119869  FIX(2.053119869)
+#define FIX_2_562915447  FIX(2.562915447)
+#define FIX_3_072711026  FIX(3.072711026)
+#endif
 
-/* Same with an additional division by 2, ie, correctly rounded UNFIX(x/2) */
-#define UNFIXH(x)  RIGHT_SHIFT((x) + (ONE << LG2_DCT_SCALE), LG2_DCT_SCALE+1)
 
-/* Take a value scaled by DCT_SCALE and round to integer scaled by OVERSCALE */
-#define UNFIXO(x)  RIGHT_SHIFT((x) + (ONE << (LG2_DCT_SCALE-1-LG2_OVERSCALE)),\
-			       LG2_DCT_SCALE-LG2_OVERSCALE)
+/* Descale and correctly round an INT32 value that's scaled by N bits.
+ * We assume RIGHT_SHIFT rounds towards minus infinity, so adding
+ * the fudge factor is correct for either sign of x.
+ */
 
-/* Here are the constants we need */
-/* SIN_i_j is sine of i*pi/j, scaled by DCT_SCALE */
-/* COS_i_j is cosine of i*pi/j, scaled by DCT_SCALE */
+#define DESCALE(x,n)  RIGHT_SHIFT((x) + (ONE << ((n)-1)), n)
 
-#define SIN_1_4 FIX(0.707106781)
-#define COS_1_4 SIN_1_4
+/* Multiply an INT32 variable by an INT32 constant to yield an INT32 result.
+ * For 8-bit samples with the recommended scaling, all the variable
+ * and constant values involved are no more than 16 bits wide, so a
+ * 16x16->32 bit multiply can be used instead of a full 32x32 multiply;
+ * this provides a useful speedup on many machines.
+ * There is no way to specify a 16x16->32 multiply in portable C, but
+ * some C compilers will do the right thing if you provide the correct
+ * combination of casts.
+ * NB: for 12-bit samples, a full 32-bit multiplication will be needed.
+ */
 
-#define SIN_1_8 FIX(0.382683432)
-#define COS_1_8 FIX(0.923879533)
-#define SIN_3_8 COS_1_8
-#define COS_3_8 SIN_1_8
+#ifdef EIGHT_BIT_SAMPLES
+#ifdef SHORTxSHORT_32		/* may work if 'int' is 32 bits */
+#define MULTIPLY(var,const)  (((INT16) (var)) * ((INT16) (const)))
+#endif
+#ifdef SHORTxLCONST_32		/* known to work with Microsoft C 6.0 */
+#define MULTIPLY(var,const)  (((INT16) (var)) * ((INT32) (const)))
+#endif
+#endif
 
-#define SIN_1_16 FIX(0.195090322)
-#define COS_1_16 FIX(0.980785280)
-#define SIN_7_16 COS_1_16
-#define COS_7_16 SIN_1_16
-
-#define SIN_3_16 FIX(0.555570233)
-#define COS_3_16 FIX(0.831469612)
-#define SIN_5_16 COS_3_16
-#define COS_5_16 SIN_3_16
-
-/* OSIN_i_j is sine of i*pi/j, scaled by DCT_SCALE/OVERSCALE */
-/* OCOS_i_j is cosine of i*pi/j, scaled by DCT_SCALE/OVERSCALE */
-
-#define OSIN_1_4 FIXO(0.707106781)
-#define OCOS_1_4 OSIN_1_4
-
-#define OSIN_1_8 FIXO(0.382683432)
-#define OCOS_1_8 FIXO(0.923879533)
-#define OSIN_3_8 OCOS_1_8
-#define OCOS_3_8 OSIN_1_8
-
-#define OSIN_1_16 FIXO(0.195090322)
-#define OCOS_1_16 FIXO(0.980785280)
-#define OSIN_7_16 OCOS_1_16
-#define OCOS_7_16 OSIN_1_16
-
-#define OSIN_3_16 FIXO(0.555570233)
-#define OCOS_3_16 FIXO(0.831469612)
-#define OSIN_5_16 OCOS_3_16
-#define OCOS_5_16 OSIN_3_16
+#ifndef MULTIPLY		/* default definition */
+#define MULTIPLY(var,const)  ((var) * (const))
+#endif
 
 
 /*
  * Perform the forward DCT on one block of samples.
- *
- * A 2-D DCT can be done by 1-D DCT on each row
- * followed by 1-D DCT on each column.
  */
 
 GLOBAL void
 j_fwd_dct (DCTBLOCK data)
 {
-  int pass, rowctr;
-  register DCTELEM *inptr, *outptr;
-  DCTBLOCK workspace;
+  INT32 tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7;
+  INT32 tmp10, tmp11, tmp12, tmp13;
+  INT32 z1, z2, z3, z4, z5;
+  register DCTELEM *dataptr;
+  int rowctr;
+  SHIFT_TEMPS
 
-  /* Each iteration of the inner loop performs one 8-point 1-D DCT.
-   * It reads from a *row* of the input matrix and stores into a *column*
-   * of the output matrix.  In the first pass, we read from the data[] array
-   * and store into the local workspace[].  In the second pass, we read from
-   * the workspace[] array and store into data[], thus performing the
-   * equivalent of a columnar DCT pass with no variable array indexing.
-   */
+  /* Pass 1: process rows. */
+  /* Note results are scaled up by sqrt(8) compared to a true DCT; */
+  /* furthermore, we scale the results by 2**PASS1_BITS. */
 
-  inptr = data;			/* initialize pointers for first pass */
-  outptr = workspace;
-  for (pass = 1; pass >= 0; pass--) {
-    for (rowctr = DCTSIZE-1; rowctr >= 0; rowctr--) {
-      /* many tmps have nonoverlapping lifetime -- flashy register colourers
-       * should be able to do this lot very well
-       */
-      INT32 tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7;
-      INT32 tmp10, tmp11, tmp12, tmp13;
-      INT32 tmp14, tmp15, tmp16, tmp17;
-      INT32 tmp25, tmp26;
-      SHIFT_TEMPS
+  dataptr = data;
+  for (rowctr = DCTSIZE-1; rowctr >= 0; rowctr--) {
+    tmp0 = dataptr[0] + dataptr[7];
+    tmp7 = dataptr[0] - dataptr[7];
+    tmp1 = dataptr[1] + dataptr[6];
+    tmp6 = dataptr[1] - dataptr[6];
+    tmp2 = dataptr[2] + dataptr[5];
+    tmp5 = dataptr[2] - dataptr[5];
+    tmp3 = dataptr[3] + dataptr[4];
+    tmp4 = dataptr[3] - dataptr[4];
+    
+    /* Even part per LL&M figure 1 --- note that published figure is faulty;
+     * rotator "sqrt(2)*c1" should be "sqrt(2)*c6".
+     */
+    
+    tmp10 = tmp0 + tmp3;
+    tmp13 = tmp0 - tmp3;
+    tmp11 = tmp1 + tmp2;
+    tmp12 = tmp1 - tmp2;
+    
+    dataptr[0] = (DCTELEM) ((tmp10 + tmp11) << PASS1_BITS);
+    dataptr[4] = (DCTELEM) ((tmp10 - tmp11) << PASS1_BITS);
+    
+    z1 = MULTIPLY(tmp12 + tmp13, FIX_0_541196100);
+    dataptr[2] = (DCTELEM) DESCALE(z1 + MULTIPLY(tmp13, FIX_0_765366865),
+				   CONST_BITS-PASS1_BITS);
+    dataptr[6] = (DCTELEM) DESCALE(z1 + MULTIPLY(tmp12, - FIX_1_847759065),
+				   CONST_BITS-PASS1_BITS);
+    
+    /* Odd part per figure 8 --- note paper omits factor of sqrt(2).
+     * cK represents cos(K*pi/16).
+     * i0..i3 in the paper are tmp4..tmp7 here.
+     */
+    
+    z1 = tmp4 + tmp7;
+    z2 = tmp5 + tmp6;
+    z3 = tmp4 + tmp6;
+    z4 = tmp5 + tmp7;
+    z5 = MULTIPLY(z3 + z4, FIX_1_175875602); /* sqrt(2) * c3 */
+    
+    tmp4 = MULTIPLY(tmp4, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
+    tmp5 = MULTIPLY(tmp5, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
+    tmp6 = MULTIPLY(tmp6, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
+    tmp7 = MULTIPLY(tmp7, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
+    z1 = MULTIPLY(z1, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
+    z2 = MULTIPLY(z2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
+    z3 = MULTIPLY(z3, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
+    z4 = MULTIPLY(z4, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
+    
+    z3 += z5;
+    z4 += z5;
+    
+    dataptr[7] = (DCTELEM) DESCALE(tmp4 + z1 + z3, CONST_BITS-PASS1_BITS);
+    dataptr[5] = (DCTELEM) DESCALE(tmp5 + z2 + z4, CONST_BITS-PASS1_BITS);
+    dataptr[3] = (DCTELEM) DESCALE(tmp6 + z2 + z3, CONST_BITS-PASS1_BITS);
+    dataptr[1] = (DCTELEM) DESCALE(tmp7 + z1 + z4, CONST_BITS-PASS1_BITS);
+    
+    dataptr += DCTSIZE;		/* advance pointer to next row */
+  }
 
-      tmp0 = inptr[7] + inptr[0];
-      tmp1 = inptr[6] + inptr[1];
-      tmp2 = inptr[5] + inptr[2];
-      tmp3 = inptr[4] + inptr[3];
-      tmp4 = inptr[3] - inptr[4];
-      tmp5 = inptr[2] - inptr[5];
-      tmp6 = inptr[1] - inptr[6];
-      tmp7 = inptr[0] - inptr[7];
-      
-      tmp10 = tmp3 + tmp0;
-      tmp11 = tmp2 + tmp1;
-      tmp12 = tmp1 - tmp2;
-      tmp13 = tmp0 - tmp3;
-      
-      outptr[        0] = (DCTELEM) UNFIXH((tmp10 + tmp11) * SIN_1_4);
-      outptr[DCTSIZE*4] = (DCTELEM) UNFIXH((tmp10 - tmp11) * COS_1_4);
-      
-      outptr[DCTSIZE*2] = (DCTELEM) UNFIXH(tmp13*COS_1_8 + tmp12*SIN_1_8);
-      outptr[DCTSIZE*6] = (DCTELEM) UNFIXH(tmp13*SIN_1_8 - tmp12*COS_1_8);
-      
-      tmp16 = UNFIXO((tmp6 + tmp5) * SIN_1_4);
-      tmp15 = UNFIXO((tmp6 - tmp5) * COS_1_4);
-      
-      OVERSHIFT(tmp4);
-      OVERSHIFT(tmp7);
-      
-      /* tmp4, tmp7, tmp15, tmp16 are overscaled by OVERSCALE */
-      
-      tmp14 = tmp4 + tmp15;
-      tmp25 = tmp4 - tmp15;
-      tmp26 = tmp7 - tmp16;
-      tmp17 = tmp7 + tmp16;
-      
-      outptr[DCTSIZE  ] = (DCTELEM) UNFIXH(tmp17*OCOS_1_16 + tmp14*OSIN_1_16);
-      outptr[DCTSIZE*7] = (DCTELEM) UNFIXH(tmp17*OCOS_7_16 - tmp14*OSIN_7_16);
-      outptr[DCTSIZE*5] = (DCTELEM) UNFIXH(tmp26*OCOS_5_16 + tmp25*OSIN_5_16);
-      outptr[DCTSIZE*3] = (DCTELEM) UNFIXH(tmp26*OCOS_3_16 - tmp25*OSIN_3_16);
+  /* Pass 2: process columns. */
+  /* Note that we must descale the results by a factor of 8 == 2**3, */
+  /* and also undo the PASS1_BITS scaling. */
 
-      inptr += DCTSIZE;		/* advance inptr to next row */
-      outptr++;			/* advance outptr to next column */
-    }
-    /* end of pass; in case it was pass 1, set up for pass 2 */
-    inptr = workspace;
-    outptr = data;
+  dataptr = data;
+  for (rowctr = DCTSIZE-1; rowctr >= 0; rowctr--) {
+    tmp0 = dataptr[DCTSIZE*0] + dataptr[DCTSIZE*7];
+    tmp7 = dataptr[DCTSIZE*0] - dataptr[DCTSIZE*7];
+    tmp1 = dataptr[DCTSIZE*1] + dataptr[DCTSIZE*6];
+    tmp6 = dataptr[DCTSIZE*1] - dataptr[DCTSIZE*6];
+    tmp2 = dataptr[DCTSIZE*2] + dataptr[DCTSIZE*5];
+    tmp5 = dataptr[DCTSIZE*2] - dataptr[DCTSIZE*5];
+    tmp3 = dataptr[DCTSIZE*3] + dataptr[DCTSIZE*4];
+    tmp4 = dataptr[DCTSIZE*3] - dataptr[DCTSIZE*4];
+    
+    /* Even part per LL&M figure 1 --- note that published figure is faulty;
+     * rotator "sqrt(2)*c1" should be "sqrt(2)*c6".
+     */
+    
+    tmp10 = tmp0 + tmp3;
+    tmp13 = tmp0 - tmp3;
+    tmp11 = tmp1 + tmp2;
+    tmp12 = tmp1 - tmp2;
+    
+    dataptr[DCTSIZE*0] = (DCTELEM) DESCALE(tmp10 + tmp11, PASS1_BITS+3);
+    dataptr[DCTSIZE*4] = (DCTELEM) DESCALE(tmp10 - tmp11, PASS1_BITS+3);
+    
+    z1 = MULTIPLY(tmp12 + tmp13, FIX_0_541196100);
+    dataptr[DCTSIZE*2] = (DCTELEM) DESCALE(z1 + MULTIPLY(tmp13, FIX_0_765366865),
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*6] = (DCTELEM) DESCALE(z1 + MULTIPLY(tmp12, - FIX_1_847759065),
+					   CONST_BITS+PASS1_BITS+3);
+    
+    /* Odd part per figure 8 --- note paper omits factor of sqrt(2).
+     * cK represents cos(K*pi/16).
+     * i0..i3 in the paper are tmp4..tmp7 here.
+     */
+    
+    z1 = tmp4 + tmp7;
+    z2 = tmp5 + tmp6;
+    z3 = tmp4 + tmp6;
+    z4 = tmp5 + tmp7;
+    z5 = MULTIPLY(z3 + z4, FIX_1_175875602); /* sqrt(2) * c3 */
+    
+    tmp4 = MULTIPLY(tmp4, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
+    tmp5 = MULTIPLY(tmp5, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
+    tmp6 = MULTIPLY(tmp6, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
+    tmp7 = MULTIPLY(tmp7, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
+    z1 = MULTIPLY(z1, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
+    z2 = MULTIPLY(z2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
+    z3 = MULTIPLY(z3, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
+    z4 = MULTIPLY(z4, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
+    
+    z3 += z5;
+    z4 += z5;
+    
+    dataptr[DCTSIZE*7] = (DCTELEM) DESCALE(tmp4 + z1 + z3,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*5] = (DCTELEM) DESCALE(tmp5 + z2 + z4,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*3] = (DCTELEM) DESCALE(tmp6 + z2 + z3,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*1] = (DCTELEM) DESCALE(tmp7 + z1 + z4,
+					   CONST_BITS+PASS1_BITS+3);
+    
+    dataptr++;			/* advance pointer to next column */
   }
 }
diff --git a/jinclude.h b/jinclude.h
index 2786670..c8a529f 100644
--- a/jinclude.h
+++ b/jinclude.h
@@ -76,21 +76,25 @@
  * We need the memcpy() and strcmp() functions, plus memory zeroing.
  * ANSI and System V implementations declare these in <string.h>.
  * BSD doesn't have the mem() functions, but it does have bcopy()/bzero().
+ * Some systems may declare memset and memcpy in <memory.h>.
+ *
  * NOTE: we assume the size parameters to these functions are of type size_t.
- * Insert casts in these macros if not!
+ * Change the casts in these macros if not!
  */
 
 #ifdef INCLUDES_ARE_ANSI
 #include <string.h>
-#define MEMZERO(voidptr,size)	memset((voidptr), 0, (size))
+#define MEMZERO(target,size)	memset((void *)(target), 0, (size_t)(size))
+#define MEMCOPY(dest,src,size)	memcpy((void *)(dest), (const void *)(src), (size_t)(size))
 #else /* not ANSI */
 #ifdef BSD
 #include <strings.h>
-#define MEMZERO(voidptr,size)	bzero((voidptr), (size))
-#define memcpy(dest,src,size)	bcopy((src), (dest), (size))
+#define MEMZERO(target,size)	bzero((void *)(target), (size_t)(size))
+#define MEMCOPY(dest,src,size)	bcopy((const void *)(src), (void *)(dest), (size_t)(size))
 #else /* not BSD, assume Sys V or compatible */
 #include <string.h>
-#define MEMZERO(voidptr,size)	memset((voidptr), 0, (size))
+#define MEMZERO(target,size)	memset((void *)(target), 0, (size_t)(size))
+#define MEMCOPY(dest,src,size)	memcpy((void *)(dest), (const void *)(src), (size_t)(size))
 #endif /* BSD */
 #endif /* ANSI */
 
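Note on the jinclude.h hunk above: the new casts let callers pass any object pointer and any integer size expression to MEMZERO and MEMCOPY. The following is a minimal sketch of that usage, assuming the ANSI branch (INCLUDES_ARE_ANSI) is the one selected. The type demo_header and the function copy_header are hypothetical names used only for illustration; they are not part of the JPEG sources.

```c
#include <stddef.h>
#include <string.h>

/* ANSI-branch definitions, copied from the jinclude.h hunk above */
#define MEMZERO(target,size) \
  memset((void *)(target), 0, (size_t)(size))
#define MEMCOPY(dest,src,size) \
  memcpy((void *)(dest), (const void *)(src), (size_t)(size))

/* Hypothetical record type, for illustration only */
typedef struct {
  long numrows;
  long rowsperchunk;
} demo_header;

/* Zero *dest, then copy *src into it; returns dest->numrows afterwards. */
long copy_header (demo_header *dest, const demo_header *src)
{
  MEMZERO(dest, sizeof(demo_header));
  /* A size computed as long is cast back to size_t inside the macro. */
  MEMCOPY(dest, src, (long) sizeof(demo_header));
  return dest->numrows;
}
```

Because the casts live inside the macros, a caller that computes sizes as long or int does not need to repeat them at each call site.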
diff --git a/jmemdos.c b/jmemdos.c
index f6839c9..8190273 100644
--- a/jmemdos.c
+++ b/jmemdos.c
@@ -41,10 +41,11 @@
 #include "jmemsys.h"
 
 #ifdef INCLUDES_ARE_ANSI
-#include <stdlib.h>		/* to declare malloc(), free() */
+#include <stdlib.h>		/* to declare malloc(), free(), getenv() */
 #else
 extern void * malloc PP((size_t size));
 extern void free PP((void *ptr));
+extern char * getenv PP((const char * name));
 #endif
 
 #ifdef NEED_FAR_POINTERS
@@ -285,8 +286,12 @@
   char tracemsg[TEMP_NAME_LENGTH+40];
 
   select_file_name(info->temp_name);
-  if (jdos_open((short far *) & handle, (char far *) info->temp_name))
+  if (jdos_open((short far *) & handle, (char far *) info->temp_name)) {
+    /* hack to get around ERREXIT's inability to handle string parameters */
+    sprintf(tracemsg, "Failed to create temporary file %s", info->temp_name);
+    ERREXIT(methods, tracemsg);	/* jopen_backing_store will fail anyway */
     return FALSE;
+  }
   info->handle.file_handle = handle;
   info->read_backing_store = read_file_store;
   info->write_backing_store = write_file_store;
diff --git a/jmemdos.h b/jmemdos.h
index f124928..215dea6 100644
--- a/jmemdos.h
+++ b/jmemdos.h
@@ -49,7 +49,7 @@
  * On machines with flat address spaces, any large constant may be used here.
  */
 
-#define MAX_ALLOC_CHUNK		65400L
+#define MAX_ALLOC_CHUNK		65440L	/* leave room for malloc overhead */
 
 /*
  * This routine computes the total space available for allocation by
diff --git a/jmemmgr.c b/jmemmgr.c
index 614755f..77b0253 100644
--- a/jmemmgr.c
+++ b/jmemmgr.c
@@ -35,6 +35,14 @@
 #include "jinclude.h"
 #include "jmemsys.h"		/* import the system-dependent declarations */
 
+#ifndef NO_GETENV
+#ifdef INCLUDES_ARE_ANSI
+#include <stdlib.h>		/* to declare getenv() */
+#else
+extern char * getenv PP((const char * name));
+#endif
+#endif
+
 
 /*
  * On many systems it is not necessary to distinguish alloc_small from
@@ -51,6 +59,24 @@
 
 
 /*
+ * Many machines require storage alignment: longs must start on 4-byte
+ * boundaries, doubles on 8-byte boundaries, etc.  On such machines, malloc()
+ * always returns pointers that are multiples of the worst-case alignment
+ * requirement, and we had better do so too.  This means the headers that
+ * we tack onto allocated structures had better have length a multiple of
+ * the alignment requirement.
+ * There isn't any really portable way to determine the worst-case alignment
+ * requirement.  In this code we assume that any address that is a multiple
+ * of sizeof(align_type) is adequately aligned.  We define align_type as
+ * double; with this definition, the code will run on all machines known to me.
+ * If your machine has lesser alignment needs, you can save a few bytes
+ * by making align_type smaller.
+ */
+
+typedef double align_type;
+
+
+/*
  * Some important notes:
  *   The allocation routines provided here must never return NULL.
  *   They should exit to error_exit if unsuccessful.
@@ -152,10 +178,11 @@
  * These are all-in-memory, and are in near-heap space on an 80x86.
  */
 
-typedef struct small_struct * small_ptr;
+typedef union small_struct * small_ptr;
 
-typedef struct small_struct {
+typedef union small_struct {
 	small_ptr next;		/* next in list of allocated objects */
+	align_type dummy;	/* ensures alignment of following storage */
       } small_hdr;
 
 static small_ptr small_list;	/* head of list */
@@ -222,10 +249,11 @@
 
 #ifdef NEED_ALLOC_MEDIUM
 
-typedef struct medium_struct FAR * medium_ptr;
+typedef union medium_struct FAR * medium_ptr;
 
-typedef struct medium_struct {
+typedef union medium_struct {
 	medium_ptr next;	/* next in list of allocated objects */
+	align_type dummy;	/* ensures alignment of following storage */
       } medium_hdr;
 
 static medium_ptr medium_list;	/* head of list */
@@ -304,6 +332,7 @@
 	small_sarray_ptr next;	/* next in list of allocated sarrays */
 	long numrows;		/* # of rows in this array */
 	long rowsperchunk;	/* max # of rows per allocation chunk */
+	JSAMPROW dummy;		/* ensures alignment of following storage */
       } small_sarray_hdr;
 
 static small_sarray_ptr small_sarray_list; /* head of list */
@@ -411,6 +440,7 @@
 	small_barray_ptr next;	/* next in list of allocated barrays */
 	long numrows;		/* # of rows in this array */
 	long rowsperchunk;	/* max # of rows per allocation chunk */
+	JBLOCKROW dummy;	/* ensures alignment of following storage */
       } small_barray_hdr;
 
 static small_barray_ptr small_barray_list; /* head of list */
@@ -551,7 +581,7 @@
  * that is, successive access start_row numbers always differ by exactly the
  * unitheight.  This allows fairly simple buffer dump/reload logic if the
  * in-memory buffer is made a multiple of the unitheight.  It would be
- * possible to keep subsampled rather than fullsize data in the "big" arrays,
+ * possible to keep downsampled rather than fullsize data in the "big" arrays,
  * thus reducing temp file size, if we supported overlapping strip access
  * (access requests differing by less than the unitheight).  At the moment
  * I don't believe this is worth the extra complexity.
@@ -712,8 +742,8 @@
 	/* It doesn't fit in memory, create backing store. */
 	sptr->rows_in_mem = max_unitheights * sptr->unitheight;
 	jopen_backing_store(& sptr->b_s_info,
-			    sptr->rows_in_array
-			    * sptr->samplesperrow * SIZEOF(JSAMPLE));
+			    (long) (sptr->rows_in_array *
+				    sptr->samplesperrow * SIZEOF(JSAMPLE)));
 	sptr->b_s_open = TRUE;
       }
       sptr->mem_buffer = alloc_small_sarray(sptr->samplesperrow,
@@ -739,8 +769,8 @@
 	/* It doesn't fit in memory, create backing store. */
 	bptr->rows_in_mem = max_unitheights * bptr->unitheight;
 	jopen_backing_store(& bptr->b_s_info,
-			    bptr->rows_in_array
-			    * bptr->blocksperrow * SIZEOF(JBLOCK));
+			    (long) (bptr->rows_in_array *
+				    bptr->blocksperrow * SIZEOF(JBLOCK)));
 	bptr->b_s_open = TRUE;
       }
       bptr->mem_buffer = alloc_small_barray(bptr->blocksperrow,
@@ -1046,4 +1076,27 @@
   big_barray_list = NULL;
 
   jmem_init(emethods);		/* system-dependent initialization */
+
+  /* Check for an environment variable JPEGMEM; if found, override the
+   * default max_memory setting from jmem_init.  Note that a command line
+   * -m argument may again override this value.
+   * If your system doesn't support getenv(), define NO_GETENV to disable
+   * this feature.
+   */
+#ifndef NO_GETENV
+  { char * memenv;
+
+    if ((memenv = getenv("JPEGMEM")) != NULL) {
+      long lval;
+      char ch = 'x';
+
+      if (sscanf(memenv, "%ld%c", &lval, &ch) > 0) {
+	if (ch == 'm' || ch == 'M')
+	  lval *= 1000L;
+	emethods->max_memory_to_use = lval * 1000L;
+      }
+    }
+  }
+#endif
+
 }
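Note on the JPEGMEM handling added at the end of the jmemmgr.c diff above: the parsing can be tried in isolation with a sketch like the one below. The helper name parse_jpegmem and its -1 return convention are assumptions made for this illustration; the patch itself stores the result directly into emethods->max_memory_to_use and leaves the default untouched when nothing can be parsed.

```c
#include <stdio.h>

/* Hypothetical helper: parse a JPEGMEM-style string such as "500" or "4m".
 * Returns the limit in bytes, or -1 if no number could be read.
 */
long parse_jpegmem (const char *memenv)
{
  long lval;
  char ch = 'x';		/* stays 'x' when no suffix character follows */

  if (sscanf(memenv, "%ld%c", &lval, &ch) > 0) {
    if (ch == 'm' || ch == 'M')
      lval *= 1000L;		/* m/M suffix: value is in millions of bytes */
    return lval * 1000L;	/* plain value is in thousands of bytes */
  }
  return -1L;			/* nothing parsed */
}
```

With this sketch, "500" yields 500000 bytes and "4m" yields 4000000 bytes. Presetting ch matters: when the string ends right after the digits, sscanf converts only the number and leaves ch unchanged.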
diff --git a/jmemname.c b/jmemname.c
index 18e6711..170f968 100644
--- a/jmemname.c
+++ b/jmemname.c
@@ -215,8 +215,11 @@
   char tracemsg[TEMP_NAME_LENGTH+40];
 
   select_file_name(info->temp_name);
-  if ((info->temp_file = fopen(info->temp_name, RW_BINARY)) == NULL)
-    ERREXIT(methods, "Failed to create temporary file");
+  if ((info->temp_file = fopen(info->temp_name, RW_BINARY)) == NULL) {
+    /* hack to get around ERREXIT's inability to handle string parameters */
+    sprintf(tracemsg, "Failed to create temporary file %s", info->temp_name);
+    ERREXIT(methods, tracemsg);
+  }
   info->read_backing_store = read_backing_store;
   info->write_backing_store = write_backing_store;
   info->close_backing_store = close_backing_store;
diff --git a/jpegdata.h b/jpegdata.h
index 6a0fa0a..a8ce18d 100644
--- a/jpegdata.h
+++ b/jpegdata.h
@@ -67,7 +67,7 @@
 
 /*
  * NOTE: if you have an ancient, strict-K&R C compiler, it may choke on the
- * similarly-named fields in compress_info_struct and decompress_info_struct.
+ * similarly-named fields in Compress_info_struct and Decompress_info_struct.
  * If this happens, you can get around it by rearranging the two structs so
  * that the similarly-named fields appear first and in the same order in
  * each struct.  Since such compilers are now pretty rare, we haven't done
@@ -87,9 +87,9 @@
 #endif
 
 /* Forward references to lists of method pointers */
-typedef struct external_methods_struct * external_methods_ptr;
-typedef struct compress_methods_struct * compress_methods_ptr;
-typedef struct decompress_methods_struct * decompress_methods_ptr;
+typedef struct External_methods_struct * external_methods_ptr;
+typedef struct Compress_methods_struct * compress_methods_ptr;
+typedef struct Decompress_methods_struct * decompress_methods_ptr;
 
 
 /* Data structures for images containing either samples or coefficients. */
@@ -154,13 +154,13 @@
   /* These values are computed during compression or decompression startup */
 	long true_comp_width;	/* component's image width in samples */
 	long true_comp_height;	/* component's image height in samples */
-	/* the above are the logical dimensions of the subsampled image */
+	/* the above are the logical dimensions of the downsampled image */
   /* These values are computed before starting a scan of the component */
 	short MCU_width;	/* number of blocks per MCU, horizontally */
 	short MCU_height;	/* number of blocks per MCU, vertically */
 	short MCU_blocks;	/* MCU_width * MCU_height */
-	long subsampled_width;	/* image width in samples, after expansion */
-	long subsampled_height;	/* image height in samples, after expansion */
+	long downsampled_width;	/* image width in samples, after expansion */
+	long downsampled_height; /* image height in samples, after expansion */
 	/* the above are the true_comp_xxx values rounded up to multiples of */
 	/* the MCU dimensions; these are the working dimensions of the array */
 	/* as it is passed through the DCT or IDCT step.  NOTE: these values */
@@ -218,7 +218,7 @@
 
 /* Working data for compression */
 
-struct compress_info_struct {
+struct Compress_info_struct {
 /*
  * All of these fields shall be established by the user interface before
  * calling jpeg_compress, or by the input_init or c_ui_method_selection
@@ -273,8 +273,15 @@
 	boolean interleave;	/* TRUE=interleaved output, FALSE=not */
 	boolean optimize_coding; /* TRUE=optimize entropy encoding parms */
 	boolean CCIR601_sampling; /* TRUE=first samples are cosited */
+	int smoothing_factor;	/* 1..100, or 0 for no input smoothing */
 
-	UINT16 restart_interval;/* MDUs per restart interval, or 0 for no restart */
+	/* The restart interval can be specified in absolute MCUs by setting
+	 * restart_interval, or in MCU rows by setting restart_in_rows
+	 * (in which case the correct restart_interval will be figured
+	 * for each scan).
+	 */
+	UINT16 restart_interval;/* MCUs per restart interval, or 0 for no restart */
+	int restart_in_rows;	/* if > 0, MCU rows per restart interval */
 
 /*
  * These fields are computed during jpeg_compress startup
@@ -307,16 +314,16 @@
 	/* these fields are private data for the entropy encoder */
 	JCOEF last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each comp */
 	JCOEF last_dc_diff[MAX_COMPS_IN_SCAN]; /* last DC diff for each comp */
-	UINT16 restarts_to_go;	/* MDUs left in this restart interval */
+	UINT16 restarts_to_go;	/* MCUs left in this restart interval */
 	short next_restart_num;	/* # of next RSTn marker (0..7) */
 };
 
-typedef struct compress_info_struct * compress_info_ptr;
+typedef struct Compress_info_struct * compress_info_ptr;
 
 
 /* Working data for decompression */
 
-struct decompress_info_struct {
+struct Decompress_info_struct {
 /*
  * These fields shall be established by the user interface before
  * calling jpeg_decompress.
@@ -344,7 +351,7 @@
 	int desired_number_of_colors;	/* max number of colors to use */
 
 	boolean do_block_smoothing; /* T = apply cross-block smoothing */
-	boolean do_pixel_smoothing; /* T = apply post-subsampling smoothing */
+	boolean do_pixel_smoothing; /* T = apply post-upsampling smoothing */
 
 /*
  * These fields are used for efficient buffering of data between read_jpeg_data
@@ -395,7 +402,7 @@
 	boolean arith_code;	/* TRUE=arithmetic coding, FALSE=Huffman */
 	boolean CCIR601_sampling; /* TRUE=first samples are cosited */
 
-	UINT16 restart_interval;/* MDUs per restart interval, or 0 for no restart */
+	UINT16 restart_interval;/* MCUs per restart interval, or 0 for no restart */
 
 /*
  * These fields are computed during jpeg_decompress startup
@@ -408,6 +415,8 @@
 	short final_out_comps;	/* # of color components sent to put_pixel_rows */
 	/* (1 when quantizing colors, else same as color_out_comps) */
 
+	JSAMPLE * sample_range_limit; /* table for fast range-limiting */
+
 /*
  * When quantizing colors, the color quantizer leaves a pointer to the output
  * colormap in these fields.  The colormap is valid from the time put_color_map
@@ -443,11 +452,11 @@
 	/* these fields are private data for the entropy encoder */
 	JCOEF last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each comp */
 	JCOEF last_dc_diff[MAX_COMPS_IN_SCAN]; /* last DC diff for each comp */
-	UINT16 restarts_to_go;	/* MDUs left in this restart interval */
+	UINT16 restarts_to_go;	/* MCUs left in this restart interval */
 	short next_restart_num;	/* # of next RSTn marker (0..7) */
 };
 
-typedef struct decompress_info_struct * decompress_info_ptr;
+typedef struct Decompress_info_struct * decompress_info_ptr;
 
 
 /* Macros for reading data from the decompression input buffer */
@@ -463,7 +472,7 @@
 #endif
 
 #define JUNGETC(ch,cinfo)  ((cinfo)->bytes_in_buffer++, \
-			    *(--((cinfo)->next_input_byte)) = (ch))
+			    *(--((cinfo)->next_input_byte)) = (char) (ch))
 
 #define MIN_UNGET	4	/* may always do at least 4 JUNGETCs */
 
@@ -500,22 +509,22 @@
 					     JBLOCK *MCU_data));
 typedef METHOD(void, MCU_output_caller_ptr, (compress_info_ptr cinfo,
 					     MCU_output_method_ptr output_method));
-typedef METHOD(void, subsample_ptr, (compress_info_ptr cinfo,
-				     int which_component,
-				     long input_cols, int input_rows,
-				     long output_cols, int output_rows,
-				     JSAMPARRAY above,
-				     JSAMPARRAY input_data,
-				     JSAMPARRAY below,
-				     JSAMPARRAY output_data));
-typedef METHOD(void, unsubsample_ptr, (decompress_info_ptr cinfo,
-				       int which_component,
-				       long input_cols, int input_rows,
-				       long output_cols, int output_rows,
-				       JSAMPARRAY above,
-				       JSAMPARRAY input_data,
-				       JSAMPARRAY below,
-				       JSAMPARRAY output_data));
+typedef METHOD(void, downsample_ptr, (compress_info_ptr cinfo,
+				      int which_component,
+				      long input_cols, int input_rows,
+				      long output_cols, int output_rows,
+				      JSAMPARRAY above,
+				      JSAMPARRAY input_data,
+				      JSAMPARRAY below,
+				      JSAMPARRAY output_data));
+typedef METHOD(void, upsample_ptr, (decompress_info_ptr cinfo,
+				    int which_component,
+				    long input_cols, int input_rows,
+				    long output_cols, int output_rows,
+				    JSAMPARRAY above,
+				    JSAMPARRAY input_data,
+				    JSAMPARRAY below,
+				    JSAMPARRAY output_data));
 typedef METHOD(void, quantize_method_ptr, (decompress_info_ptr cinfo,
 					   int num_rows,
 					   JSAMPIMAGE input_data,
@@ -531,21 +540,30 @@
  * decompression.
  */
 
-struct external_methods_struct {
+struct External_methods_struct {
 	/* User interface: error exit and trace message routines */
-	/* NOTE: the string msgtext parameters will eventually be replaced */
-	/* by an enumerated-type code so that non-English error messages */
-	/* can be substituted easily.  This will not be done until all the */
-	/* code is in place, so that we know what messages are needed. */
+	/* NOTE: the string msgtext parameters will eventually be replaced
+	 * by an enumerated-type code so that non-English error messages
+	 * can be substituted easily.  This will not be done until all the
+	 * code is in place, so that we know what messages are needed.
+	 */
 	METHOD(void, error_exit, (const char *msgtext));
 	METHOD(void, trace_message, (const char *msgtext));
 
 	/* Working data for error/trace facility */
 	/* See macros below for the usage of these variables */
 	int trace_level;	/* level of detail of tracing messages */
-	/* Use level 0 for unsuppressable messages (nonfatal errors) */
+	/* Use level 0 for important warning messages (nonfatal errors) */
 	/* Use levels 1, 2, 3 for successively more detailed trace options */
 
+	/* For recoverable corrupt-data errors, we emit a warning message and
+	 * keep going.  A surrounding application can check for bad data by
+	 * seeing if num_warnings is nonzero at the end of processing.
+	 */
+	long num_warnings;	/* number of corrupt-data warnings */
+	int first_warning_level; /* trace level for first warning */
+	int more_warning_level;	/* trace level for subsequent warnings */
+
 	int message_parm[8];	/* store numeric parms for messages here */
 
 	/* Memory management */
@@ -586,6 +604,7 @@
 /* Macros to simplify using the error and trace message stuff */
 /* The first parameter is generally cinfo->emethods */
 
+/* Fatal errors (print message and exit) */
 #define ERREXIT(emeth,msg)		((*(emeth)->error_exit) (msg))
 #define ERREXIT1(emeth,msg,p1)		((emeth)->message_parm[0] = (p1), \
 					 (*(emeth)->error_exit) (msg))
@@ -604,6 +623,25 @@
 
 #define MAKESTMT(stuff)		do { stuff } while (0)
 
+/* Nonfatal errors (we'll keep going, but the data is probably corrupt) */
+/* Note that warning count is incremented as a side-effect! */
+#define WARNMS(emeth,msg)    \
+  MAKESTMT( if ((emeth)->trace_level >= ((emeth)->num_warnings++ ? \
+		(emeth)->more_warning_level : (emeth)->first_warning_level)){ \
+		(*(emeth)->trace_message) (msg); } )
+#define WARNMS1(emeth,msg,p1)    \
+  MAKESTMT( if ((emeth)->trace_level >= ((emeth)->num_warnings++ ? \
+		(emeth)->more_warning_level : (emeth)->first_warning_level)){ \
+		(emeth)->message_parm[0] = (p1); \
+		(*(emeth)->trace_message) (msg); } )
+#define WARNMS2(emeth,msg,p1,p2)    \
+  MAKESTMT( if ((emeth)->trace_level >= ((emeth)->num_warnings++ ? \
+		(emeth)->more_warning_level : (emeth)->first_warning_level)){ \
+		(emeth)->message_parm[0] = (p1); \
+		(emeth)->message_parm[1] = (p2); \
+		(*(emeth)->trace_message) (msg); } )
+
+/* Informational/debugging messages */
 #define TRACEMS(emeth,lvl,msg)    \
   MAKESTMT( if ((emeth)->trace_level >= (lvl)) { \
 		(*(emeth)->trace_message) (msg); } )
@@ -636,7 +674,7 @@
 
 /* Methods used during JPEG compression. */
 
-struct compress_methods_struct {
+struct Compress_methods_struct {
 	/* Hook for user interface to get control after input_init */
 	METHOD(void, c_ui_method_selection, (compress_info_ptr cinfo));
 	/* Hook for user interface to do progress monitoring */
@@ -658,11 +696,11 @@
 				   long input_cols, int input_rows,
 				   long output_cols, int output_rows,
 				   JSAMPIMAGE image_data));
-	/* Subsample pixel values of a single component */
-	/* There can be a different subsample method for each component */
-	METHOD(void, subsample_init, (compress_info_ptr cinfo));
-	subsample_ptr subsample[MAX_COMPS_IN_SCAN];
-	METHOD(void, subsample_term, (compress_info_ptr cinfo));
+	/* Downsample pixel values of a single component */
+	/* There can be a different downsample method for each component */
+	METHOD(void, downsample_init, (compress_info_ptr cinfo));
+	downsample_ptr downsample[MAX_COMPS_IN_SCAN];
+	METHOD(void, downsample_term, (compress_info_ptr cinfo));
 	/* Extract samples in MCU order, process & hand off to output_method */
 	/* The input is always exactly N MCU rows worth of data */
 	METHOD(void, extract_init, (compress_info_ptr cinfo));
@@ -675,10 +713,10 @@
 	METHOD(void, entropy_optimize, (compress_info_ptr cinfo,
 					MCU_output_caller_ptr source_method));
 	/* Entropy encoding */
-	METHOD(void, entropy_encoder_init, (compress_info_ptr cinfo));
+	METHOD(void, entropy_encode_init, (compress_info_ptr cinfo));
 	METHOD(void, entropy_encode, (compress_info_ptr cinfo,
 				      JBLOCK *MCU_data));
-	METHOD(void, entropy_encoder_term, (compress_info_ptr cinfo));
+	METHOD(void, entropy_encode_term, (compress_info_ptr cinfo));
 	/* JPEG file header construction */
 	METHOD(void, write_file_header, (compress_info_ptr cinfo));
 	METHOD(void, write_scan_header, (compress_info_ptr cinfo));
@@ -698,7 +736,7 @@
 
 /* Methods used during JPEG decompression. */
 
-struct decompress_methods_struct {
+struct Decompress_methods_struct {
 	/* Hook for user interface to get control after reading file header */
 	METHOD(void, d_ui_method_selection, (decompress_info_ptr cinfo));
 	/* Hook for user interface to do progress monitoring */
@@ -708,13 +746,15 @@
 	METHOD(void, read_file_header, (decompress_info_ptr cinfo));
 	METHOD(boolean, read_scan_header, (decompress_info_ptr cinfo));
 	METHOD(int, read_jpeg_data, (decompress_info_ptr cinfo));
+	METHOD(void, resync_to_restart, (decompress_info_ptr cinfo,
+					 int marker));
 	METHOD(void, read_scan_trailer, (decompress_info_ptr cinfo));
 	METHOD(void, read_file_trailer, (decompress_info_ptr cinfo));
 	/* Entropy decoding */
-	METHOD(void, entropy_decoder_init, (decompress_info_ptr cinfo));
+	METHOD(void, entropy_decode_init, (decompress_info_ptr cinfo));
 	METHOD(void, entropy_decode, (decompress_info_ptr cinfo,
-				      JBLOCK *MCU_data));
-	METHOD(void, entropy_decoder_term, (decompress_info_ptr cinfo));
+				      JBLOCKROW *MCU_data));
+	METHOD(void, entropy_decode_term, (decompress_info_ptr cinfo));
 	/* MCU disassembly: fetch MCUs from entropy_decode, build coef array */
 	/* The reverse_DCT step is in the same module for symmetry reasons */
 	METHOD(void, disassemble_init, (decompress_info_ptr cinfo));
@@ -731,11 +771,11 @@
 					   JBLOCKROW currow,
 					   JBLOCKROW below,
 					   JBLOCKROW output));
-	/* Un-subsample pixel values of a single component */
-	/* There can be a different unsubsample method for each component */
-	METHOD(void, unsubsample_init, (decompress_info_ptr cinfo));
-	unsubsample_ptr unsubsample[MAX_COMPS_IN_SCAN];
-	METHOD(void, unsubsample_term, (decompress_info_ptr cinfo));
+	/* Upsample pixel values of a single component */
+	/* There can be a different upsample method for each component */
+	METHOD(void, upsample_init, (decompress_info_ptr cinfo));
+	upsample_ptr upsample[MAX_COMPS_IN_SCAN];
+	METHOD(void, upsample_term, (decompress_info_ptr cinfo));
 	/* Color space and gamma conversion */
 	METHOD(void, colorout_init, (decompress_info_ptr cinfo));
 	METHOD(void, color_convert, (decompress_info_ptr cinfo,
@@ -791,6 +831,11 @@
 EXTERN void j_monochrome_default PP((compress_info_ptr cinfo));
 EXTERN void j_set_quality PP((compress_info_ptr cinfo, int quality,
 			      boolean force_baseline));
+/* advanced compression parameter setup aids */
+EXTERN void j_add_quant_table PP((compress_info_ptr cinfo, int which_tbl,
+				  const QUANT_VAL *basic_table,
+				  int scale_factor, boolean force_baseline));
+EXTERN int j_quality_scaling PP((int quality));
 
 /* main entry for decompression */
 EXTERN void jpeg_decompress PP((decompress_info_ptr cinfo));
@@ -818,7 +863,7 @@
 EXTERN void jselchuffman PP((compress_info_ptr cinfo)); /* jchuff.c */
 EXTERN void jselcarithmetic PP((compress_info_ptr cinfo)); /* jcarith.c */
 EXTERN void jselexpand PP((compress_info_ptr cinfo)); /* jcexpand.c */
-EXTERN void jselsubsample PP((compress_info_ptr cinfo)); /* jcsample.c */
+EXTERN void jseldownsample PP((compress_info_ptr cinfo)); /* jcsample.c */
 EXTERN void jselcmcu PP((compress_info_ptr cinfo)); /* jcmcu.c */
 EXTERN void jselccolor PP((compress_info_ptr cinfo));	/* jccolor.c */
 /* The user interface should call one of these to select input format: */
@@ -835,7 +880,7 @@
 EXTERN void jseldarithmetic PP((decompress_info_ptr cinfo)); /* jdarith.c */
 EXTERN void jseldmcu PP((decompress_info_ptr cinfo)); /* jdmcu.c */
 EXTERN void jselbsmooth PP((decompress_info_ptr cinfo)); /* jbsmooth.c */
-EXTERN void jselunsubsample PP((decompress_info_ptr cinfo)); /* jdsample.c */
+EXTERN void jselupsample PP((decompress_info_ptr cinfo)); /* jdsample.c */
 EXTERN void jseldcolor PP((decompress_info_ptr cinfo));	/* jdcolor.c */
 EXTERN void jsel1quantize PP((decompress_info_ptr cinfo)); /* jquant1.c */
 EXTERN void jsel2quantize PP((decompress_info_ptr cinfo)); /* jquant2.c */
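Note on the WARNMS macros added in the jpegdata.h diff above: every call increments num_warnings, but the message is shown only if trace_level reaches first_warning_level (for the first warning) or more_warning_level (for later ones). The sketch below reproduces that logic with a cut-down struct. The names demo_methods, count_message and demo_warnings are hypothetical, and the level settings 0 and 3 are assumptions chosen for illustration rather than values taken from this diff.

```c
#include <stddef.h>

/* Hypothetical cut-down stand-in for External_methods_struct */
struct demo_methods {
  void (*trace_message) (const char *msgtext);
  int trace_level;
  long num_warnings;
  int first_warning_level;
  int more_warning_level;
};

#define MAKESTMT(stuff)  do { stuff } while (0)

/* Same shape as WARNMS in the jpegdata.h diff above */
#define WARNMS(emeth,msg)    \
  MAKESTMT( if ((emeth)->trace_level >= ((emeth)->num_warnings++ ? \
                (emeth)->more_warning_level : (emeth)->first_warning_level)){ \
                (*(emeth)->trace_message) (msg); } )

static int messages_shown = 0;

static void count_message (const char *msgtext)
{
  (void) msgtext;		/* a real handler would print the text */
  messages_shown++;
}

/* Issue n warnings at the given trace level.  Returns how many messages
 * were displayed and stores the total warning count in *num_warnings_out.
 */
int demo_warnings (int n, int trace_level, long *num_warnings_out)
{
  struct demo_methods m;
  int i;

  m.trace_message = count_message;
  m.trace_level = trace_level;
  m.num_warnings = 0;
  m.first_warning_level = 0;	/* assumed: first warning always shown */
  m.more_warning_level = 3;	/* assumed: later ones need level >= 3 */
  messages_shown = 0;
  for (i = 0; i < n; i++)
    WARNMS(&m, "Corrupt JPEG data");
  *num_warnings_out = m.num_warnings;
  return messages_shown;
}
```

Under these assumed levels, five warnings at trace level 0 display one message while num_warnings still ends at 5, so a surrounding application can detect bad data without flooding the user.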
diff --git a/jquant1.c b/jquant1.c
index 8fdb2a3..83399e5 100644
--- a/jquant1.c
+++ b/jquant1.c
@@ -247,7 +247,7 @@
    * (Forcing the upper and lower values to the limits ensures that
    * dithering can't produce a color outside the selected gamut.)
    */
-  return (j * MAXJSAMPLE + maxj/2) / maxj;
+  return (int) (((INT32) j * MAXJSAMPLE + maxj/2) / maxj);
 }
 
 
@@ -257,7 +257,7 @@
 /* Must have largest(j=0) >= 0, and largest(j=maxj) >= MAXJSAMPLE */
 {
   /* Breakpoints are halfway between values returned by output_value */
-  return ((2*j + 1) * MAXJSAMPLE + maxj) / (2*maxj);
+  return (int) (((INT32) (2*j + 1) * MAXJSAMPLE + maxj) / (2*maxj));
 }
 
 
@@ -454,6 +454,7 @@
   register FSERRPTR thisrowerr, nextrowerr;
   register JSAMPROW input_ptr;
   register JSAMPROW output_ptr;
+  JSAMPLE *range_limit = cinfo->sample_range_limit;
   JSAMPROW colorindex_ci;
   JSAMPROW colormap_ci;
   register int pixcode;
@@ -463,6 +464,7 @@
   int row;
   long col_counter;
   long width = cinfo->image_width;
+  SHIFT_TEMPS
 
   for (row = 0; row < num_rows; row++) {
     do_color_conversion(cinfo, input_data, row);
@@ -489,22 +491,22 @@
       colormap_ci = colormap[ci];
       *nextrowerr = 0;		/* need only initialize this one entry */
       for (col_counter = width; col_counter > 0; col_counter--) {
-	/* Compute pixel value + accumulated error for this component */
-	val = (((FSERROR) GETJSAMPLE(*input_ptr)) << 4) + *thisrowerr;
-	if (val < 0) val = 0;	/* must watch for range overflow! */
-	else {
-	  val += 8;		/* divide by 16 with proper rounding */
-	  val >>= 4;
-	  if (val > MAXJSAMPLE) val = MAXJSAMPLE;
-	}
+	/* Get accumulated error for this component, round to integer.
+	 * RIGHT_SHIFT rounds towards minus infinity, so adding 8 is correct
+	 * for either sign of the error value.
+	 */
+	val = RIGHT_SHIFT(*thisrowerr + 8, 4);
+	/* Compute pixel value + error compensation, range-limit to
+	 * 0..MAXJSAMPLE.  Note max error value is +- MAXJSAMPLE.
+	 */
+	val = GETJSAMPLE(range_limit[GETJSAMPLE(*input_ptr) + val]);
 	/* Select output value, accumulate into output code for this pixel */
-	pixcode = GETJSAMPLE(*output_ptr);
-	pixcode += GETJSAMPLE(colorindex_ci[val]);
+	pixcode = GETJSAMPLE(*output_ptr) + GETJSAMPLE(colorindex_ci[val]);
 	*output_ptr = (JSAMPLE) pixcode;
 	/* Compute actual representation error at this pixel */
 	/* Note: we can do this even though we don't yet have the final */
 	/* value of pixcode, because the colormap is orthogonal. */
-	val -= (FSERROR) GETJSAMPLE(colormap_ci[pixcode]);
+	val -= GETJSAMPLE(colormap_ci[pixcode]);
 	/* Propagate error to (same component of) adjacent pixels */
 	/* Remember that nextrowerr entries are in reverse order! */
 	two_val = val * 2;
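Note on the revised error handling in the jquant1.c hunk above: the accumulated error is kept in units of 1/16 of a sample, rounded with RIGHT_SHIFT(err + 8, 4), added to the input sample, and then clamped through the range_limit table. The sketch below shows the same arithmetic under stated assumptions: floor_shift is a portable stand-in for RIGHT_SHIFT, an explicit clamp replaces the table lookup, and MAXJSAMPLE is assumed to be 255 (8-bit samples). All function names here are hypothetical.

```c
#define MAXJSAMPLE 255		/* assumption: 8-bit samples */

/* Portable stand-in for RIGHT_SHIFT: floor(x / 2^shft), also for x < 0.
 * (Plain >> on a negative value is implementation-defined in C.)
 */
static long floor_shift (long x, int shft)
{
  if (x >= 0)
    return x >> shft;
  /* floor(x / 2^shft) == -ceil(-x / 2^shft) for negative x */
  return -((-x + (1L << shft) - 1) >> shft);
}

/* Round an error in 1/16 units to the nearest integer (halves round up). */
long round_error (long err)
{
  return floor_shift(err + 8, 4);
}

/* Sample plus rounded error, limited to 0..MAXJSAMPLE.  The patch gets the
 * same effect from a lookup in cinfo->sample_range_limit.
 */
int dither_value (int sample, long err)
{
  long val = sample + round_error(err);

  if (val < 0) val = 0;
  if (val > MAXJSAMPLE) val = MAXJSAMPLE;
  return (int) val;
}
```

Because the shift rounds toward minus infinity, adding 8 before shifting gives correct rounding for negative errors as well, which is why the old code's special case for negative values is no longer needed.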
diff --git a/jquant2.c b/jquant2.c
index cf3eab1..c5c4110 100644
--- a/jquant2.c
+++ b/jquant2.c
@@ -913,16 +913,26 @@
 	      JSAMPIMAGE image_data, JSAMPARRAY output_workspace)
 /* This version performs Floyd-Steinberg dithering */
 {
-  register FSERROR val;
-  register FSERRPTR thisrowerr, nextrowerr;
+#ifdef EIGHT_BIT_SAMPLES
+  register int c0, c1, c2;
+  int two_val;
+#else
   register FSERROR c0, c1, c2;
-  register int pixcode;
+  FSERROR two_val;
+#endif
+  register FSERRPTR thisrowerr, nextrowerr;
   JSAMPROW ptr0, ptr1, ptr2, outptr;
   histptr cachep;
+  register int pixcode;
   int dir;
-  long col;
   int row;
+  long col;
   long width = cinfo->image_width;
+  JSAMPLE *range_limit = cinfo->sample_range_limit;
+  JSAMPROW colormap0 = my_colormap[0];
+  JSAMPROW colormap1 = my_colormap[1];
+  JSAMPROW colormap2 = my_colormap[2];
+  SHIFT_TEMPS
 
   /* Convert data to colormap indexes, which we save in output_workspace */
   for (row = 0; row < num_rows; row++) {
@@ -950,33 +960,21 @@
     /* need only initialize this one entry in nextrowerr */
     nextrowerr[0] = nextrowerr[1] = nextrowerr[2] = 0;
     for (col = width; col > 0; col--) {
-      /* Get this pixel's value and add accumulated errors */
-      /* The errors are in units of 1/16th pixel value */
-      val = (GETJSAMPLE(*ptr0) << 4) + thisrowerr[0];
-      if (val <= 0) val = 0;	/* must watch for range overflow! */
-      else {
-	val += 8;		/* divide by 16 with proper rounding */
-	val >>= 4;
-	if (val > MAXJSAMPLE) val = MAXJSAMPLE;
-      }
-      c0 = val;
-      val = (GETJSAMPLE(*ptr1) << 4) + thisrowerr[1];
-      if (val <= 0) val = 0;	/* must watch for range overflow! */
-      else {
-	val += 8;		/* divide by 16 with proper rounding */
-	val >>= 4;
-	if (val > MAXJSAMPLE) val = MAXJSAMPLE;
-      }
-      c1 = val;
-      val = (GETJSAMPLE(*ptr2) << 4) + thisrowerr[2];
-      if (val <= 0) val = 0;	/* must watch for range overflow! */
-      else {
-	val += 8;		/* divide by 16 with proper rounding */
-	val >>= 4;
-	if (val > MAXJSAMPLE) val = MAXJSAMPLE;
-      }
-      c2 = val;
-      /* Index into the cache with adjusted value */
+      /* For each component, get accumulated error and round to integer;
+       * form pixel value + error, and range-limit to 0..MAXJSAMPLE.
+       * RIGHT_SHIFT rounds towards minus infinity, so adding 8 is correct
+       * for either sign of the error value.  Max error is +- MAXJSAMPLE.
+       */
+      c0 = RIGHT_SHIFT(thisrowerr[0] + 8, 4);
+      c1 = RIGHT_SHIFT(thisrowerr[1] + 8, 4);
+      c2 = RIGHT_SHIFT(thisrowerr[2] + 8, 4);
+      c0 += GETJSAMPLE(*ptr0);
+      c1 += GETJSAMPLE(*ptr1);
+      c2 += GETJSAMPLE(*ptr2);
+      c0 = GETJSAMPLE(range_limit[c0]);
+      c1 = GETJSAMPLE(range_limit[c1]);
+      c2 = GETJSAMPLE(range_limit[c2]);
+      /* Index into the cache with adjusted pixel value */
       cachep = & histogram[c0 >> Y_SHIFT][c1 >> C_SHIFT][c2 >> C_SHIFT];
       /* If we have not seen this color before, find nearest colormap */
       /* entry and update the cache */
@@ -986,34 +984,34 @@
       pixcode = *cachep - 1;
       *outptr = (JSAMPLE) pixcode;
       /* Compute representation error for this pixel */
-      c0 -= (FSERROR) GETJSAMPLE(my_colormap[0][pixcode]);
-      c1 -= (FSERROR) GETJSAMPLE(my_colormap[1][pixcode]);
-      c2 -= (FSERROR) GETJSAMPLE(my_colormap[2][pixcode]);
+      c0 -= GETJSAMPLE(colormap0[pixcode]);
+      c1 -= GETJSAMPLE(colormap1[pixcode]);
+      c2 -= GETJSAMPLE(colormap2[pixcode]);
       /* Propagate error to adjacent pixels */
       /* Remember that nextrowerr entries are in reverse order! */
-      val = c0 * 2;
+      two_val = c0 * 2;
       nextrowerr[0-3]  = c0;	/* not +=, since not initialized yet */
-      c0 += val;		/* form error * 3 */
+      c0 += two_val;		/* form error * 3 */
       nextrowerr[0+3] += c0;
-      c0 += val;		/* form error * 5 */
+      c0 += two_val;		/* form error * 5 */
       nextrowerr[0  ] += c0;
-      c0 += val;		/* form error * 7 */
+      c0 += two_val;		/* form error * 7 */
       thisrowerr[0+3] += c0;
-      val = c1 * 2;
+      two_val = c1 * 2;
       nextrowerr[1-3]  = c1;	/* not +=, since not initialized yet */
-      c1 += val;		/* form error * 3 */
+      c1 += two_val;		/* form error * 3 */
       nextrowerr[1+3] += c1;
-      c1 += val;		/* form error * 5 */
+      c1 += two_val;		/* form error * 5 */
       nextrowerr[1  ] += c1;
-      c1 += val;		/* form error * 7 */
+      c1 += two_val;		/* form error * 7 */
       thisrowerr[1+3] += c1;
-      val = c2 * 2;
+      two_val = c2 * 2;
       nextrowerr[2-3]  = c2;	/* not +=, since not initialized yet */
-      c2 += val;		/* form error * 3 */
+      c2 += two_val;		/* form error * 3 */
       nextrowerr[2+3] += c2;
-      c2 += val;		/* form error * 5 */
+      c2 += two_val;		/* form error * 5 */
       nextrowerr[2  ] += c2;
-      c2 += val;		/* form error * 7 */
+      c2 += two_val;		/* form error * 7 */
       thisrowerr[2+3] += c2;
       /* Advance to next column */
       ptr0 += dir;
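
The rounding, range-limiting and error-weighting steps in the dithering hunks above can be sketched in isolation. This is a minimal sketch, not the library's code: `descale_error`, `adjust_pixel` and `fs_weights` are hypothetical names, MAXJSAMPLE is assumed to be 255, and the rounding uses explicit floor division so that it does not depend on how the compiler shifts negative values (the patch itself relies on RIGHT_SHIFT and a range_limit table).

```c
#include <assert.h>

#define MAXJSAMPLE 255   /* assumption: 8-bit samples */

/* Round an error held in units of 1/16 pixel value to the nearest integer.
 * Adding 8 and then taking the floor of the quotient by 16 is correct for
 * either sign of the error.
 */
static int descale_error(int err16)
{
    int biased = err16 + 8;
    /* floor(biased / 16), written without right-shifting a negative value */
    return (biased >= 0) ? (biased / 16) : -((-biased + 15) / 16);
}

/* Add the rounded error to a sample and clamp to 0..MAXJSAMPLE. */
static int adjust_pixel(int sample, int err16)
{
    int v;
    assert(sample >= 0 && sample <= MAXJSAMPLE);
    v = sample + descale_error(err16);
    if (v < 0) v = 0;
    if (v > MAXJSAMPLE) v = MAXJSAMPLE;
    return v;
}

/* Form error*1, error*3, error*5, error*7 by repeated addition of 2*error,
 * mirroring the two_val trick in the hunk above.  The four weights sum to
 * 16 times the error, matching the 1/16 units of the accumulators.
 */
static void fs_weights(int err, int w[4])
{
    int two_val = err * 2;
    w[0] = err;                   /* error * 1 */
    err += two_val; w[1] = err;   /* error * 3 */
    err += two_val; w[2] = err;   /* error * 5 */
    err += two_val; w[3] = err;   /* error * 7 */
}
```

For example, an accumulated error of -9 (that is, -9/16) rounds to -1, while -8 rounds to 0, so the bias of 8 behaves the same way on both sides of zero.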
diff --git a/jrdgif.c b/jrdgif.c
index f484da8..4629bf1 100644
--- a/jrdgif.c
+++ b/jrdgif.c
@@ -196,7 +196,7 @@
   if ( (cur_bit+code_size) > last_bit) {
     /* Time to reload the buffer */
     if (out_of_blocks) {
-      TRACEMS(cinfo->emethods, 1, "Ran out of GIF bits");
+      WARNMS(cinfo->emethods, "Ran out of GIF bits");
       return end_code;		/* fake something useful */
     }
     /* preserve last two bytes of what we have -- assume code_size <= 16 */
@@ -205,7 +205,7 @@
     /* Load more bytes; set flag if we reach the terminator block */
     if ((count = GetDataBlock(cinfo, &code_buf[2])) == 0) {
       out_of_blocks = TRUE;
-      TRACEMS(cinfo->emethods, 1, "Ran out of GIF bits");
+      WARNMS(cinfo->emethods, "Ran out of GIF bits");
       return end_code;		/* fake something useful */
     }
     /* Reset counters */
@@ -249,43 +249,57 @@
   int incode;			/* saves actual input code */
 
   /* First time, just eat the expected Clear code(s) and return next code, */
-  /* which is assumed to be a raw byte. */
+  /* which is expected to be a raw byte. */
   if (first_time) {
     first_time = FALSE;
-    do {
-      code = GetCode(cinfo);
-    } while (code == clear_code);
-    firstcode = oldcode = code;	/* make firstcode, oldcode valid! */
-    return code;
+    code = clear_code;		/* enables sharing code with Clear case */
+  } else {
+
+    /* If any codes are stacked from a previously read symbol, return them */
+    if (sp > symbol_stack)
+      return (int) *(--sp);
+
+    /* Time to read a new symbol */
+    code = GetCode(cinfo);
+
   }
 
-  /* If any codes are stacked from a previously read symbol, return them */
-  if (sp > symbol_stack)
-    return (int) *(--sp);
-
-  code = GetCode(cinfo);
-
   if (code == clear_code) {
-    /* Reinit static state, swallow any extra Clear codes, and return */
+    /* Reinit static state, swallow any extra Clear codes, and */
+    /* return next code, which is expected to be a raw byte. */
     ReInitLZW();
     do {
       code = GetCode(cinfo);
     } while (code == clear_code);
-    firstcode = oldcode = code; /* gotta reinit these too */
+    if (code > clear_code) {	/* make sure it is a raw byte */
+      WARNMS(cinfo->emethods, "Corrupt data in GIF file");
+      code = 0;			/* use something valid */
+    }
+    firstcode = oldcode = code;	/* make firstcode, oldcode valid! */
     return code;
   }
 
   if (code == end_code) {
     /* Skip the rest of the image, unless GetCode already read terminator */
-    if (! out_of_blocks)
+    if (! out_of_blocks) {
       SkipDataBlocks(cinfo);
-    return -1;
+      out_of_blocks = TRUE;
+    }
+    /* Complain that there's not enough data */
+    WARNMS(cinfo->emethods, "Premature end of GIF image");
+    /* Pad data with 0's */
+    return 0;			/* fake something usable */
   }
 
-  /* Normal raw byte or LZW symbol */
+  /* Got normal raw byte or LZW symbol */
   incode = code;		/* save for a moment */
   
   if (code >= max_code) {	/* special case for not-yet-defined symbol */
+    /* code == max_code is OK; anything bigger is bad data */
+    if (code > max_code) {
+      WARNMS(cinfo->emethods, "Corrupt data in GIF file");
+      incode = 0;		/* prevent creation of loops in symbol table */
+    }
     *sp++ = (UINT8) firstcode;	/* it will be defined as oldcode/firstcode */
     code = oldcode;
   }
@@ -365,14 +379,14 @@
   /* Read and verify GIF Header */
   if (! ReadOK(cinfo->input_file, hdrbuf, 6))
     ERREXIT(cinfo->emethods, "Not a GIF file");
-  if (strncmp(hdrbuf, "GIF", 3) != 0)
+  if (hdrbuf[0] != 'G' || hdrbuf[1] != 'I' || hdrbuf[2] != 'F')
     ERREXIT(cinfo->emethods, "Not a GIF file");
   /* Check for expected version numbers.
    * If unknown version, give warning and try to process anyway;
    * this is per recommendation in GIF89a standard.
    */
-  if ((strncmp(hdrbuf+3, "87a", 3) != 0) &&
-      (strncmp(hdrbuf+3, "89a", 3) != 0))
+  if ((hdrbuf[3] != '8' || hdrbuf[4] != '7' || hdrbuf[5] != 'a') &&
+      (hdrbuf[3] != '8' || hdrbuf[4] != '9' || hdrbuf[5] != 'a'))
     TRACEMS3(cinfo->emethods, 1,
 	     "Warning: unexpected GIF version number '%c%c%c'",
 	     hdrbuf[3], hdrbuf[4], hdrbuf[5]);
@@ -418,13 +432,14 @@
     width = LM_to_uint(hdrbuf[4],hdrbuf[5]);
     height = LM_to_uint(hdrbuf[6],hdrbuf[7]);
     is_interlaced = BitSet(hdrbuf[8], INTERLACE);
-    colormaplen = 2 << (hdrbuf[8] & 0x07);
 
     /* Read local colormap if header indicates it is present */
     /* Note: if we wanted to support skipping images, */
     /* we'd need to skip rather than read colormap for ignored images */
-    if (BitSet(hdrbuf[8], COLORMAPFLAG))
+    if (BitSet(hdrbuf[8], COLORMAPFLAG)) {
+      colormaplen = 2 << (hdrbuf[8] & 0x07);
       ReadColorMap(cinfo, colormaplen, colormap);
+    }
 
     input_code_size = ReadByte(cinfo); /* get minimum-code-size byte */
     if (input_code_size < 2 || input_code_size >= MAX_LZW_BITS)
@@ -468,6 +483,9 @@
   cinfo->image_width = width;
   cinfo->image_height = height;
   cinfo->data_precision = 8;	/* always, even if 12-bit JSAMPLEs */
+
+  TRACEMS3(cinfo->emethods, 1, "%ux%ux%d GIF image",
+	   (unsigned int) width, (unsigned int) height, colormaplen);
 }
 
 
@@ -488,8 +506,7 @@
   ptr1 = pixel_row[1];
   ptr2 = pixel_row[2];
   for (col = cinfo->image_width; col > 0; col--) {
-    if ((c = LZWReadByte(cinfo)) < 0)
-      ERREXIT(cinfo->emethods, "Premature end of GIF image");
+    c = LZWReadByte(cinfo);
     *ptr0++ = colormap[CM_RED][c];
     *ptr1++ = colormap[CM_GREEN][c];
     *ptr2++ = colormap[CM_BLUE][c];
@@ -509,7 +526,6 @@
   JSAMPARRAY image_ptr;
   register JSAMPROW sptr;
   register long col;
-  register int c;
   long row;
 
   /* Read the interlaced image into the big array we've created. */
@@ -519,9 +535,7 @@
 			(interlaced_image, row, TRUE);
     sptr = image_ptr[0];
     for (col = cinfo->image_width; col > 0; col--) {
-      if ((c = LZWReadByte(cinfo)) < 0)
-	ERREXIT(cinfo->emethods, "Premature end of GIF image");
-      *sptr++ = (JSAMPLE) c;
+      *sptr++ = (JSAMPLE) LZWReadByte(cinfo);
     }
   }
   cinfo->completed_passes++;
diff --git a/jrdjfif.c b/jrdjfif.c
index dc4b646..9d63d4d 100644
--- a/jrdjfif.c
+++ b/jrdjfif.c
@@ -87,7 +87,10 @@
 /*
  * Reload the input buffer after it's been emptied, and return the next byte.
  * This is exported for direct use by the entropy decoder.
- * See the JGETC macro for calling conditions.
+ * See the JGETC macro for calling conditions.  Note in particular that
+ * read_jpeg_data may NOT return EOF.  If no more data is available, it must
+ * exit via ERREXIT, or perhaps synthesize fake data (such as an RST marker).
+ * For error recovery purposes, synthesizing an EOI marker is probably best.
  *
  * For this header control module, read_jpeg_data is supplied by the
  * user interface.  However, header formats that require random access
@@ -106,8 +109,12 @@
 					cinfo->next_input_byte,
 					JPEG_BUF_SIZE);
   
-  if (cinfo->bytes_in_buffer <= 0)
-    ERREXIT(cinfo->emethods, "Unexpected EOF in JPEG file");
+  if (cinfo->bytes_in_buffer <= 0) {
+    WARNMS(cinfo->emethods, "Premature EOF in JPEG file");
+    cinfo->next_input_byte[0] = (char) 0xFF;
+    cinfo->next_input_byte[1] = (char) M_EOI;
+    cinfo->bytes_in_buffer = 2;
+  }
 
   return JGETC(cinfo);
 }
@@ -199,10 +206,8 @@
     if (*htblptr == NULL)
       *htblptr = (HUFF_TBL *) (*cinfo->emethods->alloc_small) (SIZEOF(HUFF_TBL));
   
-    memcpy((void *) (*htblptr)->bits, (void *) bits,
-	   SIZEOF((*htblptr)->bits));
-    memcpy((void *) (*htblptr)->huffval, (void *) huffval,
-	   SIZEOF((*htblptr)->huffval));
+    MEMCOPY((*htblptr)->bits, bits, SIZEOF((*htblptr)->bits));
+    MEMCOPY((*htblptr)->huffval, huffval, SIZEOF((*htblptr)->huffval));
     }
 }
 
@@ -275,7 +280,7 @@
     }
 
     for (i = 0; i < DCTSIZE2; i += 8) {
-      TRACEMS8(cinfo->emethods, 2, "        %4d %4d %4d %4d %4d %4d %4d %4d",
+      TRACEMS8(cinfo->emethods, 2, "        %4u %4u %4u %4u %4u %4u %4u %4u",
 	       quant_ptr[i  ], quant_ptr[i+1], quant_ptr[i+2], quant_ptr[i+3],
 	       quant_ptr[i+4], quant_ptr[i+5], quant_ptr[i+6], quant_ptr[i+7]);
     }
@@ -296,7 +301,7 @@
   cinfo->restart_interval = (UINT16) get_2bytes(cinfo);
 
   TRACEMS1(cinfo->emethods, 1,
-	   "Define Restart Interval %d", cinfo->restart_interval);
+	   "Define Restart Interval %u", cinfo->restart_interval);
 }
 
 
@@ -318,15 +323,15 @@
       b[buffp] = (UINT8) JGETC(cinfo);
     length -= JFIF_LEN;
 
-    if (b[0]=='J' && b[1]=='F' && b[2]=='I' && b[3]=='F' && b[4]==0) {
+    if (b[0]==0x4A && b[1]==0x46 && b[2]==0x49 && b[3]==0x46 && b[4]==0) {
       /* Found JFIF APP0 marker: check version */
       /* Major version must be 1 */
       if (b[5] != 1)
 	ERREXIT2(cinfo->emethods, "Unsupported JFIF revision number %d.%02d",
 		 b[5], b[6]);
-      /* Minor version should be 0 or 1, but try to process anyway if newer */
-      if (b[6] != 0 && b[6] != 1)
-	TRACEMS2(cinfo->emethods, 0, "Warning: unknown JFIF revision number %d.%02d",
+      /* Minor version should be 0..2, but try to process anyway if newer */
+      if (b[6] > 2)
+	TRACEMS2(cinfo->emethods, 1, "Warning: unknown JFIF revision number %d.%02d",
 		 b[5], b[6]);
       /* Save info */
       cinfo->density_unit = b[7];
@@ -337,12 +342,19 @@
 	cinfo->jpeg_color_space = CS_YCbCr;
       TRACEMS3(cinfo->emethods, 1, "JFIF APP0 marker, density %dx%d  %d",
 	       cinfo->X_density, cinfo->Y_density, cinfo->density_unit);
+      if (b[12] | b[13])
+	TRACEMS2(cinfo->emethods, 1, "    with %d x %d thumbnail image",
+		 b[12], b[13]);
+      if (length != ((INT32) b[12] * (INT32) b[13] * (INT32) 3))
+	TRACEMS1(cinfo->emethods, 1,
+		 "Warning: thumbnail image size does not match data length %u",
+		 (int) length);
     } else {
-      TRACEMS(cinfo->emethods, 1, "Unknown APP0 marker (not JFIF)");
+      TRACEMS1(cinfo->emethods, 1, "Unknown APP0 marker (not JFIF), length %u",
+	       (int) length + JFIF_LEN);
     }
   } else {
-    TRACEMS1(cinfo->emethods, 1,
-	     "Short APP0 marker, length %d", (int) length);
+    TRACEMS1(cinfo->emethods, 1, "Short APP0 marker, length %u", (int) length);
   }
 
   while (length-- > 0)		/* skip any remaining data */
@@ -500,14 +512,15 @@
       c = JGETC(cinfo);
     } while (c != 0xFF);
     do {			/* skip any duplicate FFs */
-      nbytes++;
+      /* we don't increment nbytes here since extra FFs are legal */
       c = JGETC(cinfo);
     } while (c == 0xFF);
   } while (c == 0);		/* repeat if it was a stuffed FF/00 */
 
-  if (nbytes != 2)
-    TRACEMS2(cinfo->emethods, 1, "Skipped %d bytes before marker 0x%02x",
-	     nbytes-2, c);
+  if (nbytes != 1)
+    WARNMS2(cinfo->emethods,
+	    "Corrupt JPEG data: %d extraneous bytes before marker 0x%02x",
+	    nbytes-1, c);
 
   return c;
 }
@@ -644,7 +657,7 @@
       else if (cid0 == 1 && cid1 == 4 && cid2 == 5)
 	cinfo->jpeg_color_space = CS_YIQ; /* prototype's YIQ matrix */
       else {
-	TRACEMS3(cinfo->emethods, 0,
+	TRACEMS3(cinfo->emethods, 1,
 		 "Unrecognized component IDs %d %d %d, assuming YCbCr",
 		 cid0, cid1, cid2);
 	cinfo->jpeg_color_space = CS_YCbCr;
@@ -694,6 +707,94 @@
 
 
 /*
+ * The entropy decoder calls this routine if it finds a marker other than
+ * the restart marker it was expecting.  (This code is *not* used unless
+ * a nonzero restart interval has been declared.)  The passed parameter is
+ * the marker code actually found (might be anything, except 0 or FF).
+ * The desired restart marker is that indicated by cinfo->next_restart_num.
+ * This routine is supposed to apply whatever error recovery strategy seems
+ * appropriate in order to position the input stream to the next data segment.
+ * For some file formats (eg, TIFF) extra information such as tile boundary
+ * pointers may be available to help in this decision.
+ *
+ * This implementation is substantially constrained by wanting to treat the
+ * input as a data stream; this means we can't back up.  (For instance, we
+ * generally can't fseek() if the input is a Unix pipe.)  Therefore, we have
+ * only the following actions to work with:
+ *   1. Do nothing, let the entropy decoder resume at next byte of file.
+ *   2. Read forward until we find another marker, discarding intervening
+ *      data.  (In theory we could look ahead within the current bufferload,
+ *      without having to discard data if we don't find the desired marker.
+ *      This idea is not implemented here, in part because it makes behavior
+ *      dependent on buffer size and chance buffer-boundary positions.)
+ *   3. Push back the passed marker (with JUNGETC).  This will cause the
+ *      entropy decoder to process an empty data segment, inserting dummy
+ *      zeroes, and then re-read the marker we pushed back.
+ * #2 is appropriate if we think the desired marker lies ahead, while #3 is
+ * appropriate if the found marker is a future restart marker (indicating
+ * that we have missed the desired restart marker, probably because it got
+ * corrupted).
+ *
+ * We apply #2 or #3 if the found marker is a restart marker no more than
+ * two counts behind or ahead of the expected one.  We also apply #2 if the
+ * found marker is not a legal JPEG marker code (it's certainly bogus data).
+ * If the found marker is a restart marker more than 2 counts away, we do #1
+ * (too much risk that the marker is erroneous; with luck we will be able to
+ * resync at some future point).
+ * For any valid non-restart JPEG marker, we apply #3.  This keeps us from
+ * overrunning the end of a scan.  An implementation limited to single-scan
+ * files might find it better to apply #2 for markers other than EOI, since
+ * any other marker would have to be bogus data in that case.
+ */
+
+METHODDEF void
+resync_to_restart (decompress_info_ptr cinfo, int marker)
+{
+  int desired = cinfo->next_restart_num;
+  int action = 1;
+
+  /* Always put up a warning. */
+  WARNMS2(cinfo->emethods,
+	  "Corrupt JPEG data: found 0x%02x marker instead of RST%d",
+	  marker, desired);
+  /* Outer loop handles repeated decision after scanning forward. */
+  for (;;) {
+    if (marker < M_SOF0)
+      action = 2;		/* invalid marker */
+    else if (marker < M_RST0 || marker > M_RST7)
+      action = 3;		/* valid non-restart marker */
+    else {
+      if (marker == (M_RST0 + ((desired+1) & 7)) ||
+	  marker == (M_RST0 + ((desired+2) & 7)))
+	action = 3;		/* one of the next two expected restarts */
+      else if (marker == (M_RST0 + ((desired-1) & 7)) ||
+	       marker == (M_RST0 + ((desired-2) & 7)))
+	action = 2;		/* a prior restart, so advance */
+      else
+	action = 1;		/* desired restart or too far away */
+    }
+    TRACEMS2(cinfo->emethods, 4,
+	     "At marker 0x%02x, recovery action %d", marker, action);
+    switch (action) {
+    case 1:
+      /* Let entropy decoder resume processing. */
+      return;
+    case 2:
+      /* Scan to the next marker, and repeat the decision loop. */
+      marker = next_marker(cinfo);
+      break;
+    case 3:
+      /* Put back this marker & return. */
+      /* Entropy decoder will be forced to process an empty segment. */
+      JUNGETC(marker, cinfo);
+      JUNGETC(0xFF, cinfo);
+      return;
+    }
+  }
+}
+
+
+/*
  * Finish up after a compressed scan (series of read_jpeg_data calls);
  * prepare for another read_scan_header call.
  */
@@ -733,6 +834,7 @@
 #if 0
   cinfo->methods->read_jpeg_data = read_jpeg_data;
 #endif
+  cinfo->methods->resync_to_restart = resync_to_restart;
   cinfo->methods->read_scan_trailer = read_scan_trailer;
   cinfo->methods->read_file_trailer = read_file_trailer;
 }
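
The decision table inside resync_to_restart above can be sketched as a pure function, which makes the three recovery actions easier to check by hand. This is a hedged sketch rather than the library routine: `choose_recovery_action` and the enum names are hypothetical, the marker values are the standard JPEG codes (SOF0 = 0xC0, RST0..RST7 = 0xD0..0xD7), and `(desired + 7) & 7` is used in place of `(desired - 1) & 7` to avoid masking a negative number.

```c
#include <assert.h>

#define M_SOF0 0xC0
#define M_RST0 0xD0
#define M_RST7 0xD7

enum recovery_action {
    RESUME = 1,        /* let the entropy decoder continue at the next byte */
    SCAN_FORWARD = 2,  /* discard data up to the next marker, then re-decide */
    PUSH_BACK = 3      /* un-read the marker; decoder sees an empty segment */
};

/* Pick a recovery action for the marker actually found, given the number
 * (0..7) of the restart marker that was expected.
 */
static enum recovery_action choose_recovery_action(int marker, int desired)
{
    assert(desired >= 0 && desired <= 7);

    if (marker < M_SOF0)
        return SCAN_FORWARD;              /* not a legal marker code */
    if (marker < M_RST0 || marker > M_RST7)
        return PUSH_BACK;                 /* valid non-restart marker */
    if (marker == M_RST0 + ((desired + 1) & 7) ||
        marker == M_RST0 + ((desired + 2) & 7))
        return PUSH_BACK;                 /* one of the next two restarts */
    if (marker == M_RST0 + ((desired + 7) & 7) ||
        marker == M_RST0 + ((desired + 6) & 7))
        return SCAN_FORWARD;              /* one of the prior two restarts */
    return RESUME;                        /* desired restart or too far away */
}
```

With RST3 expected, finding RST4 yields PUSH_BACK (the expected marker was probably lost), finding RST2 yields SCAN_FORWARD, and finding RST7 yields RESUME because it is four counts away.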
diff --git a/jrdppm.c b/jrdppm.c
index 7f38048..ad637af 100644
--- a/jrdppm.c
+++ b/jrdppm.c
@@ -25,9 +25,6 @@
 #ifdef PPM_SUPPORTED
 
 
-static JSAMPLE * rescale;	/* => maxval-remapping array, or NULL */
-
-
 /* Portions of this code are based on the PBMPLUS library, which is:
 **
 ** Copyright (C) 1988 by Jef Poskanzer.
@@ -41,6 +38,41 @@
 */
 
 
+/* Macros to deal with unsigned chars as efficiently as compiler allows */
+
+#ifdef HAVE_UNSIGNED_CHAR
+typedef unsigned char U_CHAR;
+#define UCH(x)	((int) (x))
+#else /* !HAVE_UNSIGNED_CHAR */
+#ifdef CHAR_IS_UNSIGNED
+typedef char U_CHAR;
+#define UCH(x)	((int) (x))
+#else
+typedef char U_CHAR;
+#define UCH(x)	((int) (x) & 0xFF)
+#endif
+#endif /* HAVE_UNSIGNED_CHAR */
+
+
+#define	ReadOK(file,buffer,len)	(JFREAD(file,buffer,len) == ((size_t) (len)))
+
+
+/*
+ * On most systems, reading individual bytes with getc() is drastically less
+ * efficient than buffering a row at a time with fread().  But we must
+ * allocate the row buffer in near data space on PCs, because we are assuming
+ * small-data memory model, wherein fread() can't reach far memory.  If you
+ * need to process very wide images on a PC, you may have to use the getc()
+ * approach.  In that case, define USE_GETC_INPUT.
+ */
+
+#ifndef USE_GETC_INPUT
+static U_CHAR * row_buffer;	/* holds 1 pixel row's worth of raw input */
+#endif
+
+static JSAMPLE * rescale;	/* => maxval-remapping array, or NULL */
+
+
 LOCAL int
 pbm_getc (FILE * file)
 /* Read next char, skipping over any comments */
@@ -145,6 +177,9 @@
 }
 
 
+#ifdef USE_GETC_INPUT
+
+
 METHODDEF void
 get_scaled_gray_row (compress_info_ptr cinfo, JSAMPARRAY pixel_row)
 /* This version is for reading raw-format PGM files with any maxval */
@@ -213,6 +248,92 @@
 }
 
 
+#else /* use row buffering */
+
+
+METHODDEF void
+get_scaled_gray_row (compress_info_ptr cinfo, JSAMPARRAY pixel_row)
+/* This version is for reading raw-format PGM files with any maxval */
+{
+  register JSAMPROW ptr0;
+  register U_CHAR * row_bufferptr;
+  register long col;
+  
+  if (! ReadOK(cinfo->input_file, row_buffer, cinfo->image_width))
+    ERREXIT(cinfo->emethods, "Premature EOF in PPM file");
+  ptr0 = pixel_row[0];
+  row_bufferptr = row_buffer;
+  for (col = cinfo->image_width; col > 0; col--) {
+    *ptr0++ = rescale[UCH(*row_bufferptr++)];
+  }
+}
+
+
+METHODDEF void
+get_scaled_rgb_row (compress_info_ptr cinfo, JSAMPARRAY pixel_row)
+/* This version is for reading raw-format PPM files with any maxval */
+{
+  register JSAMPROW ptr0, ptr1, ptr2;
+  register U_CHAR * row_bufferptr;
+  register long col;
+  
+  if (! ReadOK(cinfo->input_file, row_buffer, 3 * cinfo->image_width))
+    ERREXIT(cinfo->emethods, "Premature EOF in PPM file");
+  ptr0 = pixel_row[0];
+  ptr1 = pixel_row[1];
+  ptr2 = pixel_row[2];
+  row_bufferptr = row_buffer;
+  for (col = cinfo->image_width; col > 0; col--) {
+    *ptr0++ = rescale[UCH(*row_bufferptr++)];
+    *ptr1++ = rescale[UCH(*row_bufferptr++)];
+    *ptr2++ = rescale[UCH(*row_bufferptr++)];
+  }
+}
+
+
+METHODDEF void
+get_raw_gray_row (compress_info_ptr cinfo, JSAMPARRAY pixel_row)
+/* This version is for reading raw-format PGM files with maxval = MAXJSAMPLE */
+{
+  register JSAMPROW ptr0;
+  register U_CHAR * row_bufferptr;
+  register long col;
+  
+  if (! ReadOK(cinfo->input_file, row_buffer, cinfo->image_width))
+    ERREXIT(cinfo->emethods, "Premature EOF in PPM file");
+  ptr0 = pixel_row[0];
+  row_bufferptr = row_buffer;
+  for (col = cinfo->image_width; col > 0; col--) {
+    *ptr0++ = (JSAMPLE) UCH(*row_bufferptr++);
+  }
+}
+
+
+METHODDEF void
+get_raw_rgb_row (compress_info_ptr cinfo, JSAMPARRAY pixel_row)
+/* This version is for reading raw-format PPM files with maxval = MAXJSAMPLE */
+{
+  register JSAMPROW ptr0, ptr1, ptr2;
+  register U_CHAR * row_bufferptr;
+  register long col;
+  
+  if (! ReadOK(cinfo->input_file, row_buffer, 3 * cinfo->image_width))
+    ERREXIT(cinfo->emethods, "Premature EOF in PPM file");
+  ptr0 = pixel_row[0];
+  ptr1 = pixel_row[1];
+  ptr2 = pixel_row[2];
+  row_bufferptr = row_buffer;
+  for (col = cinfo->image_width; col > 0; col--) {
+    *ptr0++ = (JSAMPLE) UCH(*row_bufferptr++);
+    *ptr1++ = (JSAMPLE) UCH(*row_bufferptr++);
+    *ptr2++ = (JSAMPLE) UCH(*row_bufferptr++);
+  }
+}
+
+
+#endif /* USE_GETC_INPUT */
+
+
 /*
  * Read the file header; return image size and component count.
  */
@@ -232,17 +353,22 @@
   h = read_pbm_integer(cinfo);
   maxval = read_pbm_integer(cinfo);
 
+  if (w <= 0 || h <= 0 || maxval <= 0) /* error check */
+    ERREXIT(cinfo->emethods, "Not a PPM file");
+
   switch (c) {
   case '2':			/* it's a text-format PGM file */
     cinfo->methods->get_input_row = get_text_gray_row;
     cinfo->input_components = 1;
     cinfo->in_color_space = CS_GRAYSCALE;
+    TRACEMS2(cinfo->emethods, 1, "%ux%u text PGM image", w, h);
     break;
 
   case '3':			/* it's a text-format PPM file */
     cinfo->methods->get_input_row = get_text_rgb_row;
     cinfo->input_components = 3;
     cinfo->in_color_space = CS_RGB;
+    TRACEMS2(cinfo->emethods, 1, "%ux%u text PPM image", w, h);
     break;
 
   case '5':			/* it's a raw-format PGM file */
@@ -252,6 +378,12 @@
       cinfo->methods->get_input_row = get_scaled_gray_row;
     cinfo->input_components = 1;
     cinfo->in_color_space = CS_GRAYSCALE;
+#ifndef USE_GETC_INPUT
+    /* allocate space for row buffer: 1 byte/pixel */
+    row_buffer = (U_CHAR *) (*cinfo->emethods->alloc_small)
+			((size_t) (SIZEOF(U_CHAR) * (long) w));
+#endif
+    TRACEMS2(cinfo->emethods, 1, "%ux%u PGM image", w, h);
     break;
 
   case '6':			/* it's a raw-format PPM file */
@@ -261,6 +393,12 @@
       cinfo->methods->get_input_row = get_scaled_rgb_row;
     cinfo->input_components = 3;
     cinfo->in_color_space = CS_RGB;
+#ifndef USE_GETC_INPUT
+    /* allocate space for row buffer: 3 bytes/pixel */
+    row_buffer = (U_CHAR *) (*cinfo->emethods->alloc_small)
+			((size_t) (3 * SIZEOF(U_CHAR) * (long) w));
+#endif
+    TRACEMS2(cinfo->emethods, 1, "%ux%u PPM image", w, h);
     break;
 
   default:
@@ -268,9 +406,6 @@
     break;
   }
 
-  if (w <= 0 || h <= 0 || maxval <= 0) /* error check */
-    ERREXIT(cinfo->emethods, "Not a PPM file");
-
   /* Compute the rescaling array if necessary */
   /* This saves per-pixel calculation */
   if (maxval == MAXJSAMPLE)
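
The row-buffered readers added to jrdppm.c above all follow one pattern: fetch a whole row of raw bytes, then walk the buffer and widen each byte to a non-negative value. The sketch below shows only the second half of that pattern for an RGB row. It is an illustration under stated assumptions: `split_rgb_row` is a hypothetical name, the buffer is taken to be plain `char` (the worst case, where char may be signed), and the masking form of UCH is used unconditionally.

```c
#include <stddef.h>

/* Widen a possibly-signed char to an int in 0..255.  This is the fallback
 * form of UCH from the patch, used when char may be signed.
 */
#define UCH(x)  ((int) (x) & 0xFF)

/* Split one interleaved RGB row (3 bytes per pixel) into three component
 * rows.  The caller supplies output arrays of at least `width` entries.
 */
static void split_rgb_row(const char *row_buffer, size_t width,
                          unsigned char *red,
                          unsigned char *green,
                          unsigned char *blue)
{
    const char *p = row_buffer;
    size_t col;

    for (col = 0; col < width; col++) {
        red[col]   = (unsigned char) UCH(*p++);
        green[col] = (unsigned char) UCH(*p++);
        blue[col]  = (unsigned char) UCH(*p++);
    }
}
```

Without the mask, a byte such as 0xFF stored in a signed char would widen to -1 and, in the scaled readers, index outside the rescale table; with it, the value is 255 regardless of the platform's char signedness.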
diff --git a/jrdtarga.c b/jrdtarga.c
index 3e35223..aefc6ab 100644
--- a/jrdtarga.c
+++ b/jrdtarga.c
@@ -369,6 +369,8 @@
       get_pixel_row = get_8bit_row;
     else
       ERREXIT(cinfo->emethods, "Invalid or unsupported Targa file");
+    TRACEMS2(cinfo->emethods, 1, "%ux%u colormapped Targa image",
+	     width, height);
     break;
   case 2:			/* RGB image */
     switch (pixel_size) {
@@ -385,6 +387,8 @@
       ERREXIT(cinfo->emethods, "Invalid or unsupported Targa file");
       break;
     }
+    TRACEMS2(cinfo->emethods, 1, "%ux%u RGB Targa image",
+	     width, height);
     break;
   case 3:			/* Grayscale image */
     components = 1;
@@ -393,6 +397,8 @@
       get_pixel_row = get_8bit_gray_row;
     else
       ERREXIT(cinfo->emethods, "Invalid or unsupported Targa file");
+    TRACEMS2(cinfo->emethods, 1, "%ux%u grayscale Targa image",
+	     width, height);
     break;
   default:
     ERREXIT(cinfo->emethods, "Invalid or unsupported Targa file");
diff --git a/jrevdct.c b/jrevdct.c
index 949973b..3203b7e 100644
--- a/jrevdct.c
+++ b/jrevdct.c
@@ -7,10 +7,15 @@
  *
  * This file contains the basic inverse-DCT transformation subroutine.
  *
- * This implementation is based on Appendix A.2 of the book
- * "Discrete Cosine Transform---Algorithms, Advantages, Applications"
- * by K.R. Rao and P. Yip  (Academic Press, Inc, London, 1990).
- * It uses scaled fixed-point arithmetic instead of floating point.
+ * This implementation is based on an algorithm described in
+ *   C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT
+ *   Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics,
+ *   Speech, and Signal Processing 1989 (ICASSP '89), pp. 988-991.
+ * The primary algorithm described there uses 11 multiplies and 29 adds.
+ * We use their alternate method with 12 multiplies and 32 adds.
+ * The advantage of this method is that no data path contains more than one
+ * multiplication; this allows a very simple and accurate implementation in
+ * scaled fixed-point arithmetic, with a minimal number of shifts.
  */
 
 #include "jinclude.h"
@@ -24,198 +29,341 @@
 #endif
 
 
-/* The poop on this scaling stuff is as follows:
+/*
+ * A 2-D IDCT can be done by 1-D IDCT on each row followed by 1-D IDCT
+ * on each column.  Direct algorithms are also available, but they are
+ * much more complex and seem not to be any faster when reduced to code.
+ *
+ * The poop on this scaling stuff is as follows:
+ *
+ * Each 1-D IDCT step produces outputs which are a factor of sqrt(N)
+ * larger than the true IDCT outputs.  The final outputs are therefore
+ * a factor of N larger than desired; since N=8 this can be cured by
+ * a simple right shift at the end of the algorithm.  The advantage of
+ * this arrangement is that we save two multiplications per 1-D IDCT,
+ * because the y0 and y4 inputs need not be divided by sqrt(N).
  *
  * We have to do addition and subtraction of the integer inputs, which
  * is no problem, and multiplication by fractional constants, which is
  * a problem to do in integer arithmetic.  We multiply all the constants
- * by DCT_SCALE and convert them to integer constants (thus retaining
- * LG2_DCT_SCALE bits of precision in the constants).  After doing a
- * multiplication we have to divide the product by DCT_SCALE, with proper
- * rounding, to produce the correct output.  The division can be implemented
- * cheaply as a right shift of LG2_DCT_SCALE bits.  The DCT equations also
- * specify an additional division by 2 on the final outputs; this can be
- * folded into the right-shift by shifting one more bit (see UNFIXH).
+ * by CONST_SCALE and convert them to integer constants (thus retaining
+ * CONST_BITS bits of precision in the constants).  After doing a
+ * multiplication we have to divide the product by CONST_SCALE, with proper
+ * rounding, to produce the correct output.  This division can be done
+ * cheaply as a right shift of CONST_BITS bits.  We postpone shifting
+ * as long as possible so that partial sums can be added together with
+ * full fractional precision.
  *
- * If you are planning to recode this in assembler, you might want to set
- * LG2_DCT_SCALE to 15.  This loses a bit of precision, but then all the
- * multiplications are between 16-bit quantities (given 8-bit JSAMPLEs!)
- * so you could use a signed 16x16=>32 bit multiply instruction instead of
- * full 32x32 multiply.  Unfortunately there's no way to describe such a
- * multiply portably in C, so we've gone for the extra bit of accuracy here.
+ * The outputs of the first pass are scaled up by PASS1_BITS bits so that
+ * they are represented to better-than-integral precision.  These outputs
+ * require BITS_IN_JSAMPLE + PASS1_BITS + 3 bits; this fits in a 16-bit word
+ * with the recommended scaling.  (To scale up 12-bit sample data further, an
+ * intermediate INT32 array would be needed.)
+ *
+ * To avoid overflow of the 32-bit intermediate results in pass 2, we must
+ * have BITS_IN_JSAMPLE + CONST_BITS + PASS1_BITS <= 26.  Error analysis
+ * shows that the values given below are the most effective.
  */
 
 #ifdef EIGHT_BIT_SAMPLES
-#define LG2_DCT_SCALE 16
+#define CONST_BITS  13
+#define PASS1_BITS  2
 #else
-#define LG2_DCT_SCALE 15	/* lose a little precision to avoid overflow */
+#define CONST_BITS  13
+#define PASS1_BITS  1		/* lose a little precision to avoid overflow */
 #endif
 
 #define ONE	((INT32) 1)
 
-#define DCT_SCALE (ONE << LG2_DCT_SCALE)
+#define CONST_SCALE (ONE << CONST_BITS)
 
-/* In some places we shift the inputs left by a couple more bits, */
-/* so that they can be added to fractional results without too much */
-/* loss of precision. */
-#define LG2_OVERSCALE 2
-#define OVERSCALE  (ONE << LG2_OVERSCALE)
-#define OVERSHIFT(x)  ((x) <<= LG2_OVERSCALE)
+/* Convert a positive real constant to an integer scaled by CONST_SCALE. */
 
-/* Scale a fractional constant by DCT_SCALE */
-#define FIX(x)	((INT32) ((x) * DCT_SCALE + 0.5))
+#define FIX(x)	((INT32) ((x) * CONST_SCALE + 0.5))
 
-/* Scale a fractional constant by DCT_SCALE/OVERSCALE */
-/* Such a constant can be multiplied with an overscaled input */
-/* to produce something that's scaled by DCT_SCALE */
-#define FIXO(x)  ((INT32) ((x) * DCT_SCALE / OVERSCALE + 0.5))
+/* Some C compilers fail to reduce "FIX(constant)" at compile time, thus
+ * causing a lot of useless floating-point operations at run time.
+ * To get around this we use the following pre-calculated constants.
+ * If you change CONST_BITS you may want to add appropriate values.
+ * (With a reasonable C compiler, you can just rely on the FIX() macro...)
+ */
 
-/* Descale and correctly round a value that's scaled by DCT_SCALE */
-#define UNFIX(x)   RIGHT_SHIFT((x) + (ONE << (LG2_DCT_SCALE-1)), LG2_DCT_SCALE)
+#if CONST_BITS == 13
+#define FIX_0_298631336  ((INT32)  2446)	/* FIX(0.298631336) */
+#define FIX_0_390180644  ((INT32)  3196)	/* FIX(0.390180644) */
+#define FIX_0_541196100  ((INT32)  4433)	/* FIX(0.541196100) */
+#define FIX_0_765366865  ((INT32)  6270)	/* FIX(0.765366865) */
+#define FIX_0_899976223  ((INT32)  7373)	/* FIX(0.899976223) */
+#define FIX_1_175875602  ((INT32)  9633)	/* FIX(1.175875602) */
+#define FIX_1_501321110  ((INT32)  12299)	/* FIX(1.501321110) */
+#define FIX_1_847759065  ((INT32)  15137)	/* FIX(1.847759065) */
+#define FIX_1_961570560  ((INT32)  16069)	/* FIX(1.961570560) */
+#define FIX_2_053119869  ((INT32)  16819)	/* FIX(2.053119869) */
+#define FIX_2_562915447  ((INT32)  20995)	/* FIX(2.562915447) */
+#define FIX_3_072711026  ((INT32)  25172)	/* FIX(3.072711026) */
+#else
+#define FIX_0_298631336  FIX(0.298631336)
+#define FIX_0_390180644  FIX(0.390180644)
+#define FIX_0_541196100  FIX(0.541196100)
+#define FIX_0_765366865  FIX(0.765366865)
+#define FIX_0_899976223  FIX(0.899976223)
+#define FIX_1_175875602  FIX(1.175875602)
+#define FIX_1_501321110  FIX(1.501321110)
+#define FIX_1_847759065  FIX(1.847759065)
+#define FIX_1_961570560  FIX(1.961570560)
+#define FIX_2_053119869  FIX(2.053119869)
+#define FIX_2_562915447  FIX(2.562915447)
+#define FIX_3_072711026  FIX(3.072711026)
+#endif
 
-/* Same with an additional division by 2, ie, correctly rounded UNFIX(x/2) */
-#define UNFIXH(x)  RIGHT_SHIFT((x) + (ONE << LG2_DCT_SCALE), LG2_DCT_SCALE+1)
 
-/* Take a value scaled by DCT_SCALE and round to integer scaled by OVERSCALE */
-#define UNFIXO(x)  RIGHT_SHIFT((x) + (ONE << (LG2_DCT_SCALE-1-LG2_OVERSCALE)),\
-			       LG2_DCT_SCALE-LG2_OVERSCALE)
+/* Descale and correctly round an INT32 value that's scaled by N bits.
+ * We assume RIGHT_SHIFT rounds towards minus infinity, so adding
+ * the fudge factor is correct for either sign of X.
+ */
 
-/* Here are the constants we need */
-/* SIN_i_j is sine of i*pi/j, scaled by DCT_SCALE */
-/* COS_i_j is cosine of i*pi/j, scaled by DCT_SCALE */
+#define DESCALE(x,n)  RIGHT_SHIFT((x) + (ONE << ((n)-1)), n)
 
-#define SIN_1_4 FIX(0.707106781)
-#define COS_1_4 SIN_1_4
+/* Multiply an INT32 variable by an INT32 constant to yield an INT32 result.
+ * For 8-bit samples with the recommended scaling, all the variable
+ * and constant values involved are no more than 16 bits wide, so a
+ * 16x16->32 bit multiply can be used instead of a full 32x32 multiply;
+ * this provides a useful speedup on many machines.
+ * There is no way to specify a 16x16->32 multiply in portable C, but
+ * some C compilers will do the right thing if you provide the correct
+ * combination of casts.
+ * NB: for 12-bit samples, a full 32-bit multiplication will be needed.
+ */
 
-#define SIN_1_8 FIX(0.382683432)
-#define COS_1_8 FIX(0.923879533)
-#define SIN_3_8 COS_1_8
-#define COS_3_8 SIN_1_8
+#ifdef EIGHT_BIT_SAMPLES
+#ifdef SHORTxSHORT_32		/* may work if 'int' is 32 bits */
+#define MULTIPLY(var,const)  (((INT16) (var)) * ((INT16) (const)))
+#endif
+#ifdef SHORTxLCONST_32		/* known to work with Microsoft C 6.0 */
+#define MULTIPLY(var,const)  (((INT16) (var)) * ((INT32) (const)))
+#endif
+#endif
 
-#define SIN_1_16 FIX(0.195090322)
-#define COS_1_16 FIX(0.980785280)
-#define SIN_7_16 COS_1_16
-#define COS_7_16 SIN_1_16
-
-#define SIN_3_16 FIX(0.555570233)
-#define COS_3_16 FIX(0.831469612)
-#define SIN_5_16 COS_3_16
-#define COS_5_16 SIN_3_16
-
-/* OSIN_i_j is sine of i*pi/j, scaled by DCT_SCALE/OVERSCALE */
-/* OCOS_i_j is cosine of i*pi/j, scaled by DCT_SCALE/OVERSCALE */
-
-#define OSIN_1_4 FIXO(0.707106781)
-#define OCOS_1_4 OSIN_1_4
-
-#define OSIN_1_8 FIXO(0.382683432)
-#define OCOS_1_8 FIXO(0.923879533)
-#define OSIN_3_8 OCOS_1_8
-#define OCOS_3_8 OSIN_1_8
-
-#define OSIN_1_16 FIXO(0.195090322)
-#define OCOS_1_16 FIXO(0.980785280)
-#define OSIN_7_16 OCOS_1_16
-#define OCOS_7_16 OSIN_1_16
-
-#define OSIN_3_16 FIXO(0.555570233)
-#define OCOS_3_16 FIXO(0.831469612)
-#define OSIN_5_16 OCOS_3_16
-#define OCOS_5_16 OSIN_3_16
+#ifndef MULTIPLY		/* default definition */
+#define MULTIPLY(var,const)  ((var) * (const))
+#endif
 
 
 /*
  * Perform the inverse DCT on one block of coefficients.
- *
- * A 2-D IDCT can be done by 1-D IDCT on each row
- * followed by 1-D IDCT on each column.
  */
 
 GLOBAL void
 j_rev_dct (DCTBLOCK data)
 {
-  int pass, rowctr;
-  register DCTELEM *inptr, *outptr;
-  DCTBLOCK workspace;
+  INT32 tmp0, tmp1, tmp2, tmp3;
+  INT32 tmp10, tmp11, tmp12, tmp13;
+  INT32 z1, z2, z3, z4, z5;
+  register DCTELEM *dataptr;
+  int rowctr;
+  SHIFT_TEMPS
 
-  /* Each iteration of the inner loop performs one 8-point 1-D IDCT.
-   * It reads from a *row* of the input matrix and stores into a *column*
-   * of the output matrix.  In the first pass, we read from the data[] array
-   * and store into the local workspace[].  In the second pass, we read from
-   * the workspace[] array and store into data[], thus performing the
-   * equivalent of a columnar IDCT pass with no variable array indexing.
-   */
+  /* Pass 1: process rows. */
+  /* Note results are scaled up by sqrt(8) compared to a true IDCT; */
+  /* furthermore, we scale the results by 2**PASS1_BITS. */
 
-  inptr = data;			/* initialize pointers for first pass */
-  outptr = workspace;
-  for (pass = 1; pass >= 0; pass--) {
-    for (rowctr = DCTSIZE-1; rowctr >= 0; rowctr--) {
-      /* many tmps have nonoverlapping lifetime -- flashy register colourers
-       * should be able to do this lot very well
-       */
-      INT32 in0, in1, in2, in3, in4, in5, in6, in7;
-      INT32 tmp10, tmp11, tmp12, tmp13;
-      INT32 tmp20, tmp21, tmp22, tmp23;
-      INT32 tmp30, tmp31;
-      INT32 tmp40, tmp41, tmp42, tmp43;
-      INT32 tmp50, tmp51, tmp52, tmp53;
-      SHIFT_TEMPS
-	
-      in0 = inptr[0];
-      in1 = inptr[1];
-      in2 = inptr[2];
-      in3 = inptr[3];
-      in4 = inptr[4];
-      in5 = inptr[5];
-      in6 = inptr[6];
-      in7 = inptr[7];
+  dataptr = data;
+  for (rowctr = DCTSIZE-1; rowctr >= 0; rowctr--) {
+    /* Due to quantization, we will usually find that many of the input
+     * coefficients are zero, especially the AC terms.  We can exploit this
+     * by short-circuiting the IDCT calculation for any row in which all
+     * the AC terms are zero.  In that case each output is equal to the
+     * DC coefficient (with scale factor as needed).
+     * With typical images and quantization tables, half or more of the
+     * row DCT calculations can be simplified this way.
+     */
+
+    if ((dataptr[1] | dataptr[2] | dataptr[3] | dataptr[4] |
+	 dataptr[5] | dataptr[6] | dataptr[7]) == 0) {
+      /* AC terms all zero */
+      DCTELEM dcval = (DCTELEM) (dataptr[0] << PASS1_BITS);
       
-      /* These values are scaled by DCT_SCALE */
+      dataptr[0] = dcval;
+      dataptr[1] = dcval;
+      dataptr[2] = dcval;
+      dataptr[3] = dcval;
+      dataptr[4] = dcval;
+      dataptr[5] = dcval;
+      dataptr[6] = dcval;
+      dataptr[7] = dcval;
       
-      tmp10 = (in0 + in4) * COS_1_4;
-      tmp11 = (in0 - in4) * COS_1_4;
-      tmp12 = in2 * SIN_1_8 - in6 * COS_1_8;
-      tmp13 = in6 * SIN_1_8 + in2 * COS_1_8;
-      
-      tmp20 = tmp10 + tmp13;
-      tmp21 = tmp11 + tmp12;
-      tmp22 = tmp11 - tmp12;
-      tmp23 = tmp10 - tmp13;
-      
-      /* These values are scaled by OVERSCALE */
-      
-      tmp30 = UNFIXO((in3 + in5) * COS_1_4);
-      tmp31 = UNFIXO((in3 - in5) * COS_1_4);
-      
-      OVERSHIFT(in1);
-      OVERSHIFT(in7);
-      
-      tmp40 = in1 + tmp30;
-      tmp41 = in7 + tmp31;
-      tmp42 = in1 - tmp30;
-      tmp43 = in7 - tmp31;
-      
-      /* And these are scaled by DCT_SCALE */
-      
-      tmp50 = tmp40 * OCOS_1_16 + tmp41 * OSIN_1_16;
-      tmp51 = tmp40 * OSIN_1_16 - tmp41 * OCOS_1_16;
-      tmp52 = tmp42 * OCOS_5_16 + tmp43 * OSIN_5_16;
-      tmp53 = tmp42 * OSIN_5_16 - tmp43 * OCOS_5_16;
-      
-      outptr[        0] = (DCTELEM) UNFIXH(tmp20 + tmp50);
-      outptr[DCTSIZE  ] = (DCTELEM) UNFIXH(tmp21 + tmp53);
-      outptr[DCTSIZE*2] = (DCTELEM) UNFIXH(tmp22 + tmp52);
-      outptr[DCTSIZE*3] = (DCTELEM) UNFIXH(tmp23 + tmp51);
-      outptr[DCTSIZE*4] = (DCTELEM) UNFIXH(tmp23 - tmp51);
-      outptr[DCTSIZE*5] = (DCTELEM) UNFIXH(tmp22 - tmp52);
-      outptr[DCTSIZE*6] = (DCTELEM) UNFIXH(tmp21 - tmp53);
-      outptr[DCTSIZE*7] = (DCTELEM) UNFIXH(tmp20 - tmp50);
-      
-      inptr += DCTSIZE;		/* advance inptr to next row */
-      outptr++;			/* advance outptr to next column */
+      dataptr += DCTSIZE;	/* advance pointer to next row */
+      continue;
     }
-    /* end of pass; in case it was pass 1, set up for pass 2 */
-    inptr = workspace;
-    outptr = data;
+
+    /* Even part: reverse the even part of the forward DCT. */
+    /* The rotator is sqrt(2)*c(-6). */
+
+    z2 = (INT32) dataptr[2];
+    z3 = (INT32) dataptr[6];
+
+    z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
+    tmp2 = z1 + MULTIPLY(z3, - FIX_1_847759065);
+    tmp3 = z1 + MULTIPLY(z2, FIX_0_765366865);
+
+    tmp0 = ((INT32) dataptr[0] + (INT32) dataptr[4]) << CONST_BITS;
+    tmp1 = ((INT32) dataptr[0] - (INT32) dataptr[4]) << CONST_BITS;
+
+    tmp10 = tmp0 + tmp3;
+    tmp13 = tmp0 - tmp3;
+    tmp11 = tmp1 + tmp2;
+    tmp12 = tmp1 - tmp2;
+    
+    /* Odd part per figure 8; the matrix is unitary and hence its
+     * transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
+     */
+
+    tmp0 = (INT32) dataptr[7];
+    tmp1 = (INT32) dataptr[5];
+    tmp2 = (INT32) dataptr[3];
+    tmp3 = (INT32) dataptr[1];
+
+    z1 = tmp0 + tmp3;
+    z2 = tmp1 + tmp2;
+    z3 = tmp0 + tmp2;
+    z4 = tmp1 + tmp3;
+    z5 = MULTIPLY(z3 + z4, FIX_1_175875602); /* sqrt(2) * c3 */
+    
+    tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
+    tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
+    tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
+    tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
+    z1 = MULTIPLY(z1, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
+    z2 = MULTIPLY(z2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
+    z3 = MULTIPLY(z3, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
+    z4 = MULTIPLY(z4, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
+    
+    z3 += z5;
+    z4 += z5;
+    
+    tmp0 += z1 + z3;
+    tmp1 += z2 + z4;
+    tmp2 += z2 + z3;
+    tmp3 += z1 + z4;
+
+    /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */
+
+    dataptr[0] = (DCTELEM) DESCALE(tmp10 + tmp3, CONST_BITS-PASS1_BITS);
+    dataptr[7] = (DCTELEM) DESCALE(tmp10 - tmp3, CONST_BITS-PASS1_BITS);
+    dataptr[1] = (DCTELEM) DESCALE(tmp11 + tmp2, CONST_BITS-PASS1_BITS);
+    dataptr[6] = (DCTELEM) DESCALE(tmp11 - tmp2, CONST_BITS-PASS1_BITS);
+    dataptr[2] = (DCTELEM) DESCALE(tmp12 + tmp1, CONST_BITS-PASS1_BITS);
+    dataptr[5] = (DCTELEM) DESCALE(tmp12 - tmp1, CONST_BITS-PASS1_BITS);
+    dataptr[3] = (DCTELEM) DESCALE(tmp13 + tmp0, CONST_BITS-PASS1_BITS);
+    dataptr[4] = (DCTELEM) DESCALE(tmp13 - tmp0, CONST_BITS-PASS1_BITS);
+
+    dataptr += DCTSIZE;		/* advance pointer to next row */
+  }
+
+  /* Pass 2: process columns. */
+  /* Note that we must descale the results by a factor of 8 == 2**3, */
+  /* and also undo the PASS1_BITS scaling. */
+
+  dataptr = data;
+  for (rowctr = DCTSIZE-1; rowctr >= 0; rowctr--) {
+    /* Columns of zeroes can be exploited in the same way as we did with rows.
+     * However, the row calculation has created many nonzero AC terms, so the
+     * simplification applies less often (typically 5% to 10% of the time).
+     * On machines with very fast multiplication, it's possible that the
+     * test takes more time than it's worth.  In that case this section
+     * may be commented out.
+     */
+
+#ifndef NO_ZERO_COLUMN_TEST
+    if ((dataptr[DCTSIZE*1] | dataptr[DCTSIZE*2] | dataptr[DCTSIZE*3] |
+	 dataptr[DCTSIZE*4] | dataptr[DCTSIZE*5] | dataptr[DCTSIZE*6] |
+	 dataptr[DCTSIZE*7]) == 0) {
+      /* AC terms all zero */
+      DCTELEM dcval = (DCTELEM) DESCALE((INT32) dataptr[0], PASS1_BITS+3);
+      
+      dataptr[DCTSIZE*0] = dcval;
+      dataptr[DCTSIZE*1] = dcval;
+      dataptr[DCTSIZE*2] = dcval;
+      dataptr[DCTSIZE*3] = dcval;
+      dataptr[DCTSIZE*4] = dcval;
+      dataptr[DCTSIZE*5] = dcval;
+      dataptr[DCTSIZE*6] = dcval;
+      dataptr[DCTSIZE*7] = dcval;
+      
+      dataptr++;		/* advance pointer to next column */
+      continue;
+    }
+#endif
+
+    /* Even part: reverse the even part of the forward DCT. */
+    /* The rotator is sqrt(2)*c(-6). */
+
+    z2 = (INT32) dataptr[DCTSIZE*2];
+    z3 = (INT32) dataptr[DCTSIZE*6];
+
+    z1 = MULTIPLY(z2 + z3, FIX_0_541196100);
+    tmp2 = z1 + MULTIPLY(z3, - FIX_1_847759065);
+    tmp3 = z1 + MULTIPLY(z2, FIX_0_765366865);
+
+    tmp0 = ((INT32) dataptr[DCTSIZE*0] + (INT32) dataptr[DCTSIZE*4]) << CONST_BITS;
+    tmp1 = ((INT32) dataptr[DCTSIZE*0] - (INT32) dataptr[DCTSIZE*4]) << CONST_BITS;
+
+    tmp10 = tmp0 + tmp3;
+    tmp13 = tmp0 - tmp3;
+    tmp11 = tmp1 + tmp2;
+    tmp12 = tmp1 - tmp2;
+    
+    /* Odd part per figure 8; the matrix is unitary and hence its
+     * transpose is its inverse.  i0..i3 are y7,y5,y3,y1 respectively.
+     */
+
+    tmp0 = (INT32) dataptr[DCTSIZE*7];
+    tmp1 = (INT32) dataptr[DCTSIZE*5];
+    tmp2 = (INT32) dataptr[DCTSIZE*3];
+    tmp3 = (INT32) dataptr[DCTSIZE*1];
+
+    z1 = tmp0 + tmp3;
+    z2 = tmp1 + tmp2;
+    z3 = tmp0 + tmp2;
+    z4 = tmp1 + tmp3;
+    z5 = MULTIPLY(z3 + z4, FIX_1_175875602); /* sqrt(2) * c3 */
+    
+    tmp0 = MULTIPLY(tmp0, FIX_0_298631336); /* sqrt(2) * (-c1+c3+c5-c7) */
+    tmp1 = MULTIPLY(tmp1, FIX_2_053119869); /* sqrt(2) * ( c1+c3-c5+c7) */
+    tmp2 = MULTIPLY(tmp2, FIX_3_072711026); /* sqrt(2) * ( c1+c3+c5-c7) */
+    tmp3 = MULTIPLY(tmp3, FIX_1_501321110); /* sqrt(2) * ( c1+c3-c5-c7) */
+    z1 = MULTIPLY(z1, - FIX_0_899976223); /* sqrt(2) * (c7-c3) */
+    z2 = MULTIPLY(z2, - FIX_2_562915447); /* sqrt(2) * (-c1-c3) */
+    z3 = MULTIPLY(z3, - FIX_1_961570560); /* sqrt(2) * (-c3-c5) */
+    z4 = MULTIPLY(z4, - FIX_0_390180644); /* sqrt(2) * (c5-c3) */
+    
+    z3 += z5;
+    z4 += z5;
+    
+    tmp0 += z1 + z3;
+    tmp1 += z2 + z4;
+    tmp2 += z2 + z3;
+    tmp3 += z1 + z4;
+
+    /* Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 */
+
+    dataptr[DCTSIZE*0] = (DCTELEM) DESCALE(tmp10 + tmp3,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*7] = (DCTELEM) DESCALE(tmp10 - tmp3,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*1] = (DCTELEM) DESCALE(tmp11 + tmp2,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*6] = (DCTELEM) DESCALE(tmp11 - tmp2,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*2] = (DCTELEM) DESCALE(tmp12 + tmp1,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*5] = (DCTELEM) DESCALE(tmp12 - tmp1,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*3] = (DCTELEM) DESCALE(tmp13 + tmp0,
+					   CONST_BITS+PASS1_BITS+3);
+    dataptr[DCTSIZE*4] = (DCTELEM) DESCALE(tmp13 - tmp0,
+					   CONST_BITS+PASS1_BITS+3);
+    
+    dataptr++;			/* advance pointer to next column */
   }
 }
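
Note on the jrevdct.c changes above: the new IDCT works in fixed point, with fractional constants scaled by 2**CONST_BITS and products brought back to integers by a rounding right shift. The following is a minimal sketch of that arithmetic, assuming CONST_BITS is 13 as in the precomputed table. The helper names descale() and scale_by_0_541() are hypothetical and not part of the patch, and descale() uses an explicit floor division instead of RIGHT_SHIFT so that it does not depend on how a compiler shifts negative values.

```c
#include <stdint.h>

#define CONST_BITS   13
#define CONST_SCALE  ((int32_t) 1 << CONST_BITS)

/* Convert a positive fractional constant to fixed point, rounding to nearest. */
#define FIX(x)  ((int32_t) ((x) * CONST_SCALE + 0.5))

/* Divide x by 2**n with rounding (halves round upward).
 * This mirrors DESCALE(x,n): add half of the divisor, then floor.
 * (hypothetical helper, for illustration only)
 */
int32_t descale(int32_t x, int n)
{
  int32_t biased = x + ((int32_t) 1 << (n - 1));
  int32_t divisor = (int32_t) 1 << n;
  int32_t q = biased / divisor;

  if ((biased % divisor) < 0)
    q--;			/* C division truncates toward zero; step down to floor */
  return q;
}

/* Multiply v by 0.541196100 in fixed point and round back to an integer.
 * (hypothetical helper, for illustration only)
 */
int32_t scale_by_0_541(int32_t v)
{
  return descale(v * FIX(0.541196100), CONST_BITS);
}
```

With these definitions FIX(0.541196100) evaluates to 4433, the same value the patch hard-codes for CONST_BITS == 13, which is the point of the precomputed table: compilers that cannot fold the floating-point expression at compile time still get integer constants.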
diff --git a/jutils.c b/jutils.c
index 74ac6b7..7433e27 100644
--- a/jutils.c
+++ b/jutils.c
@@ -23,6 +23,26 @@
 }
 
 
+/* On normal machines we can apply MEMCOPY() and MEMZERO() to sample arrays
+ * and coefficient-block arrays.  This won't work on 80x86 because the arrays
+ * are FAR and we're assuming a small-pointer memory model.  However, some
+ * DOS compilers provide far-pointer versions of memcpy() and memset() even
+ * in the small-model libraries.  These will be used if USE_FMEM is defined.
+ * Otherwise, the routines below do it the hard way.  (The performance cost
+ * is not all that great, because these routines aren't very heavily used.)
+ */
+
+#ifndef NEED_FAR_POINTERS	/* normal case, same as regular macros */
+#define FMEMCOPY(dest,src,size)	MEMCOPY(dest,src,size)
+#define FMEMZERO(target,size)	MEMZERO(target,size)
+#else				/* 80x86 case, define if we can */
+#ifdef USE_FMEM
+#define FMEMCOPY(dest,src,size)	_fmemcpy((void FAR *)(dest), (const void FAR *)(src), (size_t)(size))
+#define FMEMZERO(target,size)	_fmemset((void FAR *)(target), 0, (size_t)(size))
+#endif
+#endif
+
+
 GLOBAL void
 jcopy_sample_rows (JSAMPARRAY input_array, int source_row,
 		   JSAMPARRAY output_array, int dest_row,
@@ -33,14 +53,11 @@
  * The source and destination arrays must be at least as wide as num_cols.
  */
 {
-  /* On normal machines we can use memcpy().  This won't work on 80x86 because
-   * the sample arrays are FAR and we're assuming a small-pointer memory model.
-   */
   register JSAMPROW inptr, outptr;
-#ifdef NEED_FAR_POINTERS
-  register long count;
-#else
+#ifdef FMEMCOPY
   register size_t count = (size_t) (num_cols * SIZEOF(JSAMPLE));
+#else
+  register long count;
 #endif
   register int row;
 
@@ -50,11 +67,11 @@
   for (row = num_rows; row > 0; row--) {
     inptr = *input_array++;
     outptr = *output_array++;
-#ifdef NEED_FAR_POINTERS
+#ifdef FMEMCOPY
+    FMEMCOPY(outptr, inptr, count);
+#else
     for (count = num_cols; count > 0; count--)
       *outptr++ = *inptr++;	/* needn't bother with GETJSAMPLE() here */
-#else
-    memcpy((void *) outptr, (void *) inptr, count);
 #endif
   }
 }
@@ -64,10 +81,9 @@
 jcopy_block_row (JBLOCKROW input_row, JBLOCKROW output_row, long num_blocks)
 /* Copy a row of coefficient blocks from one place to another. */
 {
-  /* On normal machines we can use memcpy().  This won't work on 80x86 because
-   * the block arrays are FAR and we're assuming a small-pointer memory model.
-   */
-#ifdef NEED_FAR_POINTERS
+#ifdef FMEMCOPY
+  FMEMCOPY(output_row, input_row, num_blocks * (DCTSIZE2 * SIZEOF(JCOEF)));
+#else
   register JCOEFPTR inptr, outptr;
   register long count;
 
@@ -76,9 +92,6 @@
   for (count = num_blocks * DCTSIZE2; count > 0; count--) {
     *outptr++ = *inptr++;
   }
-#else
-    memcpy((void *) output_row, (void *) input_row,
-	   (size_t) (num_blocks * (DCTSIZE2 * SIZEOF(JCOEF))));
 #endif
 }
 
@@ -88,17 +101,14 @@
 /* Zero out a chunk of FAR memory. */
 /* This might be sample-array data, block-array data, or alloc_medium data. */
 {
-  /* On normal machines we can use MEMZERO().  This won't work on 80x86
-   * because we're assuming a small-pointer memory model.
-   */
-#ifdef NEED_FAR_POINTERS
+#ifdef FMEMZERO
+  FMEMZERO(target, bytestozero);
+#else
   register char FAR * ptr = (char FAR *) target;
   register size_t count;
 
   for (count = bytestozero; count > 0; count--) {
     *ptr++ = 0;
   }
-#else
-  MEMZERO((void *) target, bytestozero);
 #endif
 }
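
Note on the jutils.c changes above: the routines now test whether a copy macro is defined, not whether far pointers are needed, and fall back to an element-by-element loop only when no macro is available. A minimal sketch of that pattern in portable C follows. FAST_COPY, NO_FAST_COPY and copy_rows() are hypothetical names chosen for illustration; they stand in for FMEMCOPY, the far-pointer configuration and jcopy_sample_rows() and are not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Define the fast path unless the build says it is unavailable.
 * NO_FAST_COPY is a hypothetical configuration switch.
 */
#ifndef NO_FAST_COPY
#define FAST_COPY(dest,src,size)  memcpy((dest), (src), (size))
#endif

/* Copy num_rows rows of num_cols bytes each from in[] to out[]. */
void copy_rows(unsigned char **in, unsigned char **out,
	       int num_rows, long num_cols)
{
  int row;
#ifndef FAST_COPY
  long count;			/* only the loop version needs a counter */
#endif

  for (row = 0; row < num_rows; row++) {
    unsigned char *inptr = in[row];
    unsigned char *outptr = out[row];
#ifdef FAST_COPY
    FAST_COPY(outptr, inptr, (size_t) num_cols);
#else
    for (count = num_cols; count > 0; count--)
      *outptr++ = *inptr++;
#endif
  }
}
```

Keying the code on the macro instead of the memory model means a third configuration (far pointers plus a library that supplies a far memcpy) needs only a macro definition and no change to the routines themselves.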
diff --git a/jversion.h b/jversion.h
index 91688ff..1f0171e 100644
--- a/jversion.h
+++ b/jversion.h
@@ -9,6 +9,6 @@
  */
 
 
-#define JVERSION	"3  17-Mar-92"
+#define JVERSION	"4  10-Dec-92"
 
 #define JCOPYRIGHT	"Copyright (C) 1992, Thomas G. Lane"
diff --git a/jwrgif.c b/jwrgif.c
index 9250701..612a22c 100644
--- a/jwrgif.c
+++ b/jwrgif.c
@@ -318,7 +318,12 @@
    * Write the GIF header.
    * Note that we generate a plain GIF87 header for maximum compatibility.
    */
-  (void) JFWRITE(dcinfo->output_file, "GIF87a", 6);
+  putc('G', dcinfo->output_file);
+  putc('I', dcinfo->output_file);
+  putc('F', dcinfo->output_file);
+  putc('8', dcinfo->output_file);
+  putc('7', dcinfo->output_file);
+  putc('a', dcinfo->output_file);
   /* Write the Logical Screen Descriptor */
   put_word((UINT16) dcinfo->image_width);
   put_word((UINT16) dcinfo->image_height);
diff --git a/jwrjfif.c b/jwrjfif.c
index 08a9a9d..5bd7afb 100644
--- a/jwrjfif.c
+++ b/jwrjfif.c
@@ -313,10 +313,10 @@
   
   emit_2bytes(cinfo, 2 + 4 + 1 + 2 + 1 + 2 + 2 + 1 + 1); /* length */
 
-  emit_byte(cinfo, 'J');	/* Identifier */
-  emit_byte(cinfo, 'F');
-  emit_byte(cinfo, 'I');
-  emit_byte(cinfo, 'F');
+  emit_byte(cinfo, 0x4A);	/* Identifier: ASCII "JFIF" */
+  emit_byte(cinfo, 0x46);
+  emit_byte(cinfo, 0x49);
+  emit_byte(cinfo, 0x46);
   emit_byte(cinfo, 0);
   emit_byte(cinfo, 1);		/* Major version */
   emit_byte(cinfo, 1);		/* Minor version */
@@ -363,9 +363,6 @@
   }
   /* now prec is nonzero iff there are any 16-bit quant tables. */
 
-  if (cinfo->restart_interval)
-    emit_dri(cinfo);
-
   /* Check for a non-baseline specification. */
   /* Note we assume that Huffman table numbers won't be changed later. */
   is_baseline = TRUE;
@@ -418,6 +415,13 @@
     }
   }
 
+  /* Emit DRI if required --- note that DRI value could change for each scan.
+   * If it doesn't, a tiny amount of space is wasted in multiple-scan files.
+   * We assume DRI will never be nonzero for one scan and zero for a later one.
+   */
+  if (cinfo->restart_interval)
+    emit_dri(cinfo);
+
   emit_sos(cinfo);
 }
 
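
Note on the jwrjfif.c changes above: DRI emission moves from the frame header to just before each SOS, so the restart interval can differ from scan to scan. As a reminder of what gets written each time, here is a minimal sketch of the six-byte DRI segment as laid out in the JPEG standard. write_dri() is a hypothetical helper that fills a caller-supplied buffer; it is not the emit_dri() routine of the patch, whose body is not shown here.

```c
#include <stddef.h>

/* Write a DRI (Define Restart Interval) segment into buf, which must have
 * room for 6 bytes.  Returns the number of bytes written: 0 when the
 * interval is zero (restart markers not in use), otherwise 6.
 * (hypothetical helper, for illustration only)
 */
size_t write_dri(unsigned char *buf, unsigned int interval)
{
  if (interval == 0)
    return 0;			/* nothing to emit, as in the patch */

  buf[0] = 0xFF;		/* marker prefix */
  buf[1] = 0xDD;		/* DRI marker code */
  buf[2] = 0x00;		/* segment length, high byte */
  buf[3] = 0x04;		/* segment length, low byte: 2 length + 2 data */
  buf[4] = (unsigned char) ((interval >> 8) & 0xFF);	/* interval, high byte */
  buf[5] = (unsigned char) (interval & 0xFF);		/* interval, low byte */
  return 6;
}
```

Six bytes per scan is the "tiny amount of space" the new comment refers to when the interval does not actually change between scans.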
diff --git a/jwrppm.c b/jwrppm.c
index 5c9a47d..5d9f60b 100644
--- a/jwrppm.c
+++ b/jwrppm.c
@@ -5,8 +5,7 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
- * This file contains routines to write output images in PPM format.
- * The PBMPLUS library is required (well, it will be in the real version).
+ * This file contains routines to write output images in PPM/PGM format.
  *
  * These routines may need modification for non-Unix environments or
  * specialized applications.  As they stand, they assume output to
@@ -32,6 +31,22 @@
 
 
 /*
+ * On most systems, writing individual bytes with putc() is drastically less
+ * efficient than buffering a row at a time for fwrite().  But we must
+ * allocate the row buffer in near data space on PCs, because we are assuming
+ * small-data memory model, wherein fwrite() can't reach far memory.  If you
+ * need to process very wide images on a PC, you may have to use the putc()
+ * approach.  Also, there are still a few systems around wherein fwrite() is
+ * actually implemented as a putc() loop, in which case this buffer is a waste
+ * of space.  So the putc() method can be used by defining USE_PUTC_OUTPUT.
+ */
+
+#ifndef USE_PUTC_OUTPUT
+static char * row_buffer;	/* holds 1 pixel row's worth of output */
+#endif
+
+
+/*
  * Write the file header.
  */
 
@@ -42,10 +57,20 @@
     /* emit header for raw PGM format */
     fprintf(cinfo->output_file, "P5\n%ld %ld\n%d\n",
 	    cinfo->image_width, cinfo->image_height, 255);
+#ifndef USE_PUTC_OUTPUT
+    /* allocate space for row buffer: 1 byte/pixel */
+    row_buffer = (char *) (*cinfo->emethods->alloc_small)
+			((size_t) (SIZEOF(char) * cinfo->image_width));
+#endif
   } else if (cinfo->out_color_space == CS_RGB) {
     /* emit header for raw PPM format */
     fprintf(cinfo->output_file, "P6\n%ld %ld\n%d\n",
 	    cinfo->image_width, cinfo->image_height, 255);
+#ifndef USE_PUTC_OUTPUT
+    /* allocate space for row buffer: 3 bytes/pixel */
+    row_buffer = (char *) (*cinfo->emethods->alloc_small)
+			((size_t) (3 * SIZEOF(char) * cinfo->image_width));
+#endif
   } else {
     ERREXIT(cinfo->emethods, "PPM output must be grayscale or RGB");
   }
@@ -56,6 +81,8 @@
  * Write some pixel data.
  */
 
+#ifdef USE_PUTC_OUTPUT
+
 METHODDEF void
 put_pixel_rows (decompress_info_ptr cinfo, int num_rows,
 		JSAMPIMAGE pixel_data)
@@ -63,83 +90,215 @@
   register FILE * outfile = cinfo->output_file;
   register JSAMPROW ptr0, ptr1, ptr2;
   register long col;
-  register long width = cinfo->image_width;
-  register int row;
+  long width = cinfo->image_width;
+  int row;
   
-  if (cinfo->out_color_space == CS_GRAYSCALE) {
-    for (row = 0; row < num_rows; row++) {
-      ptr0 = pixel_data[0][row];
-      for (col = width; col > 0; col--) {
-	putc(GETJSAMPLE(*ptr0), outfile);
-	ptr0++;
-      }
-    }
-  } else {
-    for (row = 0; row < num_rows; row++) {
-      ptr0 = pixel_data[0][row];
-      ptr1 = pixel_data[1][row];
-      ptr2 = pixel_data[2][row];
-      for (col = width; col > 0; col--) {
-	putc(GETJSAMPLE(*ptr0), outfile);
-	ptr0++;
-	putc(GETJSAMPLE(*ptr1), outfile);
-	ptr1++;
-	putc(GETJSAMPLE(*ptr2), outfile);
-	ptr2++;
-      }
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    ptr1 = pixel_data[1][row];
+    ptr2 = pixel_data[2][row];
+    for (col = width; col > 0; col--) {
+      putc(GETJSAMPLE(*ptr0), outfile);
+      ptr0++;
+      putc(GETJSAMPLE(*ptr1), outfile);
+      ptr1++;
+      putc(GETJSAMPLE(*ptr2), outfile);
+      ptr2++;
     }
   }
 }
 
+METHODDEF void
+put_gray_rows (decompress_info_ptr cinfo, int num_rows,
+	       JSAMPIMAGE pixel_data)
+{
+  register FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr0;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    for (col = width; col > 0; col--) {
+      putc(GETJSAMPLE(*ptr0), outfile);
+      ptr0++;
+    }
+  }
+}
+
+#else /* use row buffering */
+
+METHODDEF void
+put_pixel_rows (decompress_info_ptr cinfo, int num_rows,
+		JSAMPIMAGE pixel_data)
+{
+  FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr0, ptr1, ptr2;
+  register char * row_bufferptr;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    ptr1 = pixel_data[1][row];
+    ptr2 = pixel_data[2][row];
+    row_bufferptr = row_buffer;
+    for (col = width; col > 0; col--) {
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr0++);
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr1++);
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr2++);
+    }
+    (void) JFWRITE(outfile, row_buffer, 3*width);
+  }
+}
+
+METHODDEF void
+put_gray_rows (decompress_info_ptr cinfo, int num_rows,
+	       JSAMPIMAGE pixel_data)
+{
+  FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr0;
+  register char * row_bufferptr;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    row_bufferptr = row_buffer;
+    for (col = width; col > 0; col--) {
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr0++);
+    }
+    (void) JFWRITE(outfile, row_buffer, width);
+  }
+}
+
+#endif /* USE_PUTC_OUTPUT */
+
 
 /*
  * Write some pixel data when color quantization is in effect.
  */
 
+#ifdef USE_PUTC_OUTPUT
+
 METHODDEF void
-put_demapped_rows (decompress_info_ptr cinfo, int num_rows,
-		   JSAMPIMAGE pixel_data)
+put_demapped_rgb (decompress_info_ptr cinfo, int num_rows,
+		  JSAMPIMAGE pixel_data)
 {
   register FILE * outfile = cinfo->output_file;
-  register JSAMPARRAY color_map = cinfo->colormap;
   register JSAMPROW ptr;
+  register JSAMPROW color_map0 = cinfo->colormap[0];
+  register JSAMPROW color_map1 = cinfo->colormap[1];
+  register JSAMPROW color_map2 = cinfo->colormap[2];
+  register int pixval;
   register long col;
   long width = cinfo->image_width;
   int row;
   
-  if (cinfo->out_color_space == CS_GRAYSCALE) {
-    for (row = 0; row < num_rows; row++) {
-      ptr = pixel_data[0][row];
-      for (col = width; col > 0; col--) {
-	putc(GETJSAMPLE(color_map[0][GETJSAMPLE(*ptr)]), outfile);
-	ptr++;
-      }
-    }
-  } else {
-    for (row = 0; row < num_rows; row++) {
-      ptr = pixel_data[0][row];
-      for (col = width; col > 0; col--) {
-	register int pixval = GETJSAMPLE(*ptr);
-
-	putc(GETJSAMPLE(color_map[0][pixval]), outfile);
-	putc(GETJSAMPLE(color_map[1][pixval]), outfile);
-	putc(GETJSAMPLE(color_map[2][pixval]), outfile);
-	ptr++;
-      }
+  for (row = 0; row < num_rows; row++) {
+    ptr = pixel_data[0][row];
+    for (col = width; col > 0; col--) {
+      pixval = GETJSAMPLE(*ptr++);
+      putc(GETJSAMPLE(color_map0[pixval]), outfile);
+      putc(GETJSAMPLE(color_map1[pixval]), outfile);
+      putc(GETJSAMPLE(color_map2[pixval]), outfile);
     }
   }
 }
 
+METHODDEF void
+put_demapped_gray (decompress_info_ptr cinfo, int num_rows,
+		   JSAMPIMAGE pixel_data)
+{
+  register FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr;
+  register JSAMPROW color_map0 = cinfo->colormap[0];
+  register int pixval;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr = pixel_data[0][row];
+    for (col = width; col > 0; col--) {
+      pixval = GETJSAMPLE(*ptr++);
+      putc(GETJSAMPLE(color_map0[pixval]), outfile);
+    }
+  }
+}
+
+#else /* use row buffering */
+
+METHODDEF void
+put_demapped_rgb (decompress_info_ptr cinfo, int num_rows,
+		  JSAMPIMAGE pixel_data)
+{
+  FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr;
+  register char * row_bufferptr;
+  register JSAMPROW color_map0 = cinfo->colormap[0];
+  register JSAMPROW color_map1 = cinfo->colormap[1];
+  register JSAMPROW color_map2 = cinfo->colormap[2];
+  register int pixval;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr = pixel_data[0][row];
+    row_bufferptr = row_buffer;
+    for (col = width; col > 0; col--) {
+      pixval = GETJSAMPLE(*ptr++);
+      *row_bufferptr++ = (char) GETJSAMPLE(color_map0[pixval]);
+      *row_bufferptr++ = (char) GETJSAMPLE(color_map1[pixval]);
+      *row_bufferptr++ = (char) GETJSAMPLE(color_map2[pixval]);
+    }
+    (void) JFWRITE(outfile, row_buffer, 3*width);
+  }
+}
+
+METHODDEF void
+put_demapped_gray (decompress_info_ptr cinfo, int num_rows,
+		   JSAMPIMAGE pixel_data)
+{
+  FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr;
+  register char * row_bufferptr;
+  register JSAMPROW color_map0 = cinfo->colormap[0];
+  register int pixval;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr = pixel_data[0][row];
+    row_bufferptr = row_buffer;
+    for (col = width; col > 0; col--) {
+      pixval = GETJSAMPLE(*ptr++);
+      *row_bufferptr++ = (char) GETJSAMPLE(color_map0[pixval]);
+    }
+    (void) JFWRITE(outfile, row_buffer, width);
+  }
+}
+
+#endif /* USE_PUTC_OUTPUT */
+
 
 /*
  * Write the color map.
- * For PPM output, we just demap the output data!
+ * For PPM output, we just remember to demap the output data!
  */
 
 METHODDEF void
 put_color_map (decompress_info_ptr cinfo, int num_colors, JSAMPARRAY colormap)
 {
-  cinfo->methods->put_pixel_rows = put_demapped_rows;
+  if (cinfo->out_color_space == CS_RGB)
+    cinfo->methods->put_pixel_rows = put_demapped_rgb;
+  else
+    cinfo->methods->put_pixel_rows = put_demapped_gray;
 }
 
 
@@ -150,7 +309,8 @@
 METHODDEF void
 output_term (decompress_info_ptr cinfo)
 {
-  /* No work except to make sure we wrote the output file OK */
+  /* No work except to make sure we wrote the output file OK; */
+  /* we let free_all release any workspace */
   fflush(cinfo->output_file);
   if (ferror(cinfo->output_file))
     ERREXIT(cinfo->emethods, "Output file write error");
@@ -167,7 +327,10 @@
 {
   cinfo->methods->output_init = output_init;
   cinfo->methods->put_color_map = put_color_map;
-  cinfo->methods->put_pixel_rows = put_pixel_rows;
+  if (cinfo->out_color_space == CS_RGB)
+    cinfo->methods->put_pixel_rows = put_pixel_rows;
+  else
+    cinfo->methods->put_pixel_rows = put_gray_rows;
   cinfo->methods->output_term = output_term;
 }
 
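
Note on the jwrppm.c changes above: the default output path now interleaves one pixel row into a near-memory buffer and hands it to a single write call, instead of calling putc() three times per pixel. A minimal sketch of that idea follows, assuming 8-bit samples held in plain unsigned char arrays. pack_rgb_row() and write_rgb_row() are hypothetical names, and plain fwrite() stands in for the JFWRITE macro.

```c
#include <stdio.h>
#include <stddef.h>

/* Interleave three component rows into row_buffer as R,G,B triples.
 * row_buffer must hold at least 3*width bytes.
 * Returns the number of bytes stored.
 * (hypothetical helper, for illustration only)
 */
size_t pack_rgb_row(const unsigned char *r, const unsigned char *g,
		    const unsigned char *b, long width, char *row_buffer)
{
  char *p = row_buffer;
  long col;

  for (col = width; col > 0; col--) {
    *p++ = (char) *r++;
    *p++ = (char) *g++;
    *p++ = (char) *b++;
  }
  return (size_t) (p - row_buffer);
}

/* Pack one row and write it with a single fwrite() call.
 * Returns 1 on success, 0 if fewer bytes were written than requested.
 * (hypothetical helper, for illustration only)
 */
int write_rgb_row(FILE *out, const unsigned char *r, const unsigned char *g,
		  const unsigned char *b, long width, char *row_buffer)
{
  size_t n = pack_rgb_row(r, g, b, width, row_buffer);

  return fwrite(row_buffer, 1, n, out) == n;
}
```

Keeping the packing step separate from the write makes the buffer size requirement explicit: 3 bytes per pixel for RGB and 1 byte per pixel for grayscale, which matches the two alloc_small() calls added to output_init.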
diff --git a/jwrrle.c b/jwrrle.c
index 49afbcf..4fec56e 100644
--- a/jwrrle.c
+++ b/jwrrle.c
@@ -138,7 +138,7 @@
   /* Allocate storage for RLE-style cmap, zero any extra entries */
   cmapsize = cinfo->color_out_comps * CMAPLENGTH * SIZEOF(rle_map);
   output_colormap = (rle_map *) (*cinfo->emethods->alloc_small) (cmapsize);
-  MEMZERO((void *) output_colormap, cmapsize);
+  MEMZERO(output_colormap, cmapsize);
 
   /* Save away data in RLE format --- note 8-bit left shift! */
   /* Shifting would need adjustment for JSAMPLEs wider than 8 bits. */
@@ -167,7 +167,7 @@
   long row;
 
   /* Initialize the header info */
-  MEMZERO((void *) &header, SIZEOF(rle_hdr)); /* make sure all bits are 0 */
+  MEMZERO(&header, SIZEOF(rle_hdr)); /* make sure all bits are 0 */
   header.rle_file = cinfo->output_file;
   header.xmin     = 0;
   header.xmax     = cinfo->image_width  - 1;
diff --git a/jwrtarga.c b/jwrtarga.c
index ba263a6..009d0bb 100644
--- a/jwrtarga.c
+++ b/jwrtarga.c
@@ -32,6 +32,22 @@
 #endif
 
 
+/*
+ * On most systems, writing individual bytes with putc() is drastically less
+ * efficient than buffering a row at a time for fwrite().  But we must
+ * allocate the row buffer in near data space on PCs, because we are assuming
+ * small-data memory model, wherein fwrite() can't reach far memory.  If you
+ * need to process very wide images on a PC, you may have to use the putc()
+ * approach.  Also, there are still a few systems around wherein fwrite() is
+ * actually implemented as a putc() loop, in which case this buffer is a waste
+ * of space.  So the putc() method can be used by defining USE_PUTC_OUTPUT.
+ */
+
+#ifndef USE_PUTC_OUTPUT
+static char * row_buffer;	/* holds 1 pixel row's worth of output */
+#endif
+
+
 LOCAL void
 write_header (decompress_info_ptr cinfo, int num_colors)
 /* Create and write a Targa header */
@@ -39,7 +55,7 @@
   char targaheader[18];
 
   /* Set unused fields of header to 0 */
-  MEMZERO((void *) targaheader, SIZEOF(targaheader));
+  MEMZERO(targaheader, SIZEOF(targaheader));
 
   if (num_colors > 0) {
     targaheader[1] = 1;		/* color map type 1 */
@@ -83,10 +99,20 @@
     /* Targa doesn't have a mapped grayscale format, so we will */
     /* demap quantized gray output.  Never emit a colormap. */
     write_header(cinfo, 0);
+#ifndef USE_PUTC_OUTPUT
+    /* allocate space for row buffer: 1 byte/pixel */
+    row_buffer = (char *) (*cinfo->emethods->alloc_small)
+			((size_t) (SIZEOF(char) * cinfo->image_width));
+#endif
   } else if (cinfo->out_color_space == CS_RGB) {
     /* For quantized output, defer writing header until put_color_map time. */
     if (! cinfo->quantize_colors)
       write_header(cinfo, 0);
+#ifndef USE_PUTC_OUTPUT
+    /* allocate space for row buffer: 3 bytes/pixel */
+    row_buffer = (char *) (*cinfo->emethods->alloc_small)
+			((size_t) (3 * SIZEOF(char) * cinfo->image_width));
+#endif
   } else {
     ERREXIT(cinfo->emethods, "Targa output must be grayscale or RGB");
   }
@@ -97,56 +123,122 @@
  * Write some pixel data.
  */
 
+#ifdef USE_PUTC_OUTPUT
+
 METHODDEF void
 put_pixel_rows (decompress_info_ptr cinfo, int num_rows,
 		JSAMPIMAGE pixel_data)
+/* used for unquantized full-color output */
 {
   register FILE * outfile = cinfo->output_file;
   register JSAMPROW ptr0, ptr1, ptr2;
   register long col;
-  register long width = cinfo->image_width;
-  register int row;
+  long width = cinfo->image_width;
+  int row;
   
-  if (cinfo->final_out_comps == 1) {
-    /* here for grayscale or quantized color output */
-    for (row = 0; row < num_rows; row++) {
-      ptr0 = pixel_data[0][row];
-      for (col = width; col > 0; col--) {
-	putc(GETJSAMPLE(*ptr0), outfile);
-	ptr0++;
-      }
-    }
-  } else {
-    /* here for unquantized color output */
-    for (row = 0; row < num_rows; row++) {
-      ptr0 = pixel_data[0][row];
-      ptr1 = pixel_data[1][row];
-      ptr2 = pixel_data[2][row];
-      for (col = width; col > 0; col--) {
-	putc(GETJSAMPLE(*ptr2), outfile); /* write in BGR order */
-	ptr2++;
-	putc(GETJSAMPLE(*ptr1), outfile);
-	ptr1++;
-	putc(GETJSAMPLE(*ptr0), outfile);
-	ptr0++;
-      }
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    ptr1 = pixel_data[1][row];
+    ptr2 = pixel_data[2][row];
+    for (col = width; col > 0; col--) {
+      putc(GETJSAMPLE(*ptr2), outfile); /* write in BGR order */
+      ptr2++;
+      putc(GETJSAMPLE(*ptr1), outfile);
+      ptr1++;
+      putc(GETJSAMPLE(*ptr0), outfile);
+      ptr0++;
     }
   }
 }
 
+METHODDEF void
+put_gray_rows (decompress_info_ptr cinfo, int num_rows,
+	       JSAMPIMAGE pixel_data)
+/* used for grayscale OR quantized color output */
+{
+  register FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr0;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    for (col = width; col > 0; col--) {
+      putc(GETJSAMPLE(*ptr0), outfile);
+      ptr0++;
+    }
+  }
+}
+
+#else /* use row buffering */
+
+METHODDEF void
+put_pixel_rows (decompress_info_ptr cinfo, int num_rows,
+		JSAMPIMAGE pixel_data)
+/* used for unquantized full-color output */
+{
+  FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr0, ptr1, ptr2;
+  register char * row_bufferptr;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    ptr1 = pixel_data[1][row];
+    ptr2 = pixel_data[2][row];
+    row_bufferptr = row_buffer;
+    for (col = width; col > 0; col--) {
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr2++); /* BGR order */
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr1++);
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr0++);
+    }
+    (void) JFWRITE(outfile, row_buffer, 3*width);
+  }
+}
+
+METHODDEF void
+put_gray_rows (decompress_info_ptr cinfo, int num_rows,
+	       JSAMPIMAGE pixel_data)
+/* used for grayscale OR quantized color output */
+{
+  FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr0;
+  register char * row_bufferptr;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr0 = pixel_data[0][row];
+    row_bufferptr = row_buffer;
+    for (col = width; col > 0; col--) {
+      *row_bufferptr++ = (char) GETJSAMPLE(*ptr0++);
+    }
+    (void) JFWRITE(outfile, row_buffer, width);
+  }
+}
+
+#endif /* USE_PUTC_OUTPUT */
+
 
 /*
  * Write some demapped pixel data when color quantization is in effect.
  * For Targa, this is only applied to grayscale data.
  */
 
+#ifdef USE_PUTC_OUTPUT
+
 METHODDEF void
-put_demapped_rows (decompress_info_ptr cinfo, int num_rows,
+put_demapped_gray (decompress_info_ptr cinfo, int num_rows,
 		   JSAMPIMAGE pixel_data)
 {
   register FILE * outfile = cinfo->output_file;
-  register JSAMPARRAY color_map = cinfo->colormap;
   register JSAMPROW ptr;
+  register JSAMPROW color_map0 = cinfo->colormap[0];
+  register int pixval;
   register long col;
   long width = cinfo->image_width;
   int row;
@@ -154,12 +246,40 @@
   for (row = 0; row < num_rows; row++) {
     ptr = pixel_data[0][row];
     for (col = width; col > 0; col--) {
-      putc(GETJSAMPLE(color_map[0][GETJSAMPLE(*ptr)]), outfile);
-      ptr++;
+      pixval = GETJSAMPLE(*ptr++);
+      putc(GETJSAMPLE(color_map0[pixval]), outfile);
     }
   }
 }
 
+#else /* use row buffering */
+
+METHODDEF void
+put_demapped_gray (decompress_info_ptr cinfo, int num_rows,
+		   JSAMPIMAGE pixel_data)
+{
+  FILE * outfile = cinfo->output_file;
+  register JSAMPROW ptr;
+  register char * row_bufferptr;
+  register JSAMPROW color_map0 = cinfo->colormap[0];
+  register int pixval;
+  register long col;
+  long width = cinfo->image_width;
+  int row;
+  
+  for (row = 0; row < num_rows; row++) {
+    ptr = pixel_data[0][row];
+    row_bufferptr = row_buffer;
+    for (col = width; col > 0; col--) {
+      pixval = GETJSAMPLE(*ptr++);
+      *row_bufferptr++ = (char) GETJSAMPLE(color_map0[pixval]);
+    }
+    (void) JFWRITE(outfile, row_buffer, width);
+  }
+}
+
+#endif /* USE_PUTC_OUTPUT */
+
 
 /*
  * Write the color map.
@@ -184,7 +304,7 @@
       putc(GETJSAMPLE(colormap[0][i]), outfile);
     }
   } else {
-    cinfo->methods->put_pixel_rows = put_demapped_rows;
+    cinfo->methods->put_pixel_rows = put_demapped_gray;
   }
 }
 
@@ -213,7 +333,10 @@
 {
   cinfo->methods->output_init = output_init;
   cinfo->methods->put_color_map = put_color_map;
-  cinfo->methods->put_pixel_rows = put_pixel_rows;
+  if (cinfo->out_color_space == CS_GRAYSCALE || cinfo->quantize_colors)
+    cinfo->methods->put_pixel_rows = put_gray_rows;
+  else
+    cinfo->methods->put_pixel_rows = put_pixel_rows;
   cinfo->methods->output_term = output_term;
 }
 
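Note on the jwrtarga.c change above: the comment added at the top of the
file explains why a whole row is assembled in memory and handed to fwrite()
instead of issuing one putc() per byte. The following is a minimal,
self-contained sketch of that idea, not the patch's actual code. The helper
name write_bgr_row and its caller-supplied buffer are assumptions made for
illustration; the real code uses JSAMPROW pointers, GETJSAMPLE(), JFWRITE(),
and a static row_buffer obtained from alloc_small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the IJG sources: interleave three
 * separate 8-bit component rows (R, G, B) into one row buffer and emit
 * the row with a single fwrite() call.
 * row_buf must have room for 3 * width bytes.
 * Returns the number of bytes written (3 * width on success). */
static size_t
write_bgr_row (FILE *outfile, const unsigned char *r,
               const unsigned char *g, const unsigned char *b,
               unsigned char *row_buf, size_t width)
{
  unsigned char *p = row_buf;
  size_t col;

  assert(row_buf != NULL && width > 0);
  for (col = 0; col < width; col++) {
    *p++ = b[col];		/* Targa stores pixels in BGR order */
    *p++ = g[col];
    *p++ = r[col];
  }
  /* one library call per row instead of three putc() calls per pixel */
  return fwrite(row_buf, 1, 3 * width, outfile);
}
```

The trade-off is the one the patch comment names: the buffer costs
3 * width bytes, which matters in a small-data memory model, so the patch
keeps the putc() path available behind USE_PUTC_OUTPUT.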
diff --git a/makcjpeg.cf b/makcjpeg.cf
deleted file mode 100644
index 58f6daa..0000000
--- a/makcjpeg.cf
+++ /dev/null
@@ -1,6 +0,0 @@
-L jcmain.mix jcmaster.mix jcdeflts.mix jcarith.mix jccolor.mix jcexpand.mix
-L jchuff.mix jcmcu.mix jcpipe.mix jcsample.mix jfwddct.mix jwrjfif.mix
-L jrdgif.mix jrdppm.mix jrdrle.mix jrdtarga.mix jutils.mix jerror.mix
-L jmemmgr.mix jmemsys.mix jmemdosa.mix
-fa;
-b cjpeg,8K,48K,
diff --git a/makcjpeg.lnk b/makcjpeg.lnk
deleted file mode 100644
index 9951bd8..0000000
--- a/makcjpeg.lnk
+++ /dev/null
@@ -1,25 +0,0 @@
-jcmain.obj +
-jcmaster.obj +
-jcdeflts.obj +
-jcarith.obj +
-jccolor.obj +
-jcexpand.obj +
-jchuff.obj +
-jcmcu.obj +
-jcpipe.obj +
-jcsample.obj +
-jfwddct.obj +
-jwrjfif.obj +
-jrdgif.obj +
-jrdppm.obj +
-jrdrle.obj +
-jrdtarga.obj +
-jutils.obj +
-jerror.obj +
-jmemmgr.obj +
-jmemsys.obj +
-jmemdosa.obj
-cjpeg.exe /NOI
-nul.map
-
-nul.def
diff --git a/makcjpeg.lst b/makcjpeg.lst
deleted file mode 100644
index 19fd9d3..0000000
--- a/makcjpeg.lst
+++ /dev/null
@@ -1,4 +0,0 @@
-jcmain.obj jcmaster.obj jcdeflts.obj jcarith.obj jccolor.obj jcexpand.obj
-jchuff.obj jcmcu.obj jcpipe.obj jcsample.obj jfwddct.obj jwrjfif.obj
-jrdgif.obj jrdppm.obj jrdrle.obj jrdtarga.obj jutils.obj jerror.obj
-jmemmgr.obj jmemsys.obj jmemdosa.obj
diff --git a/makcjpeg.st b/makcjpeg.st
new file mode 100644
index 0000000..4819efb
--- /dev/null
+++ b/makcjpeg.st
@@ -0,0 +1,37 @@
+; Project file for Independent JPEG Group's software
+;
+; This project file is for Atari ST/STE/TT systems using Pure C or Turbo C.
+; Thanks to Frank Moehle (Frank.Moehle@arbi.informatik.uni-oldenburg.de)
+; and to Dr. B. Setzepfandt (bernd@gina.uni-muenster.de).
+;
+; To use this file, rename it to CJPEG.PRJ.
+; If you are using Turbo C, change filenames beginning with "PC..." to "TC..."
+; Read SETUP instructions before trying to make the program!
+;
+;
+;      * * * Output file * * *
+cjpeg.ttp
+;
+;      * * * DEFINES * * * 
+.C[-DHAVE_STDC]
+.C[-DINCLUDES_ARE_ANSI]
+.C[-DNEED_SIGNAL_CATCHER]  ; It's safer to have a signal catcher.
+.C[-DTWO_FILE_COMMANDLINE] ; If you prefer UNIX-style I/O redirection,
+                           ; comment this line out.
+.C[-DPROGRESS_REPORT]      ; This enables the % progress indicator
+;
+; * * * COMPILER OPTIONS * * *  
+.C[-P]        ; absolute calls
+.C[-M]        ; and no string merging, folks
+.C[-w-par]    ; no "parameter xxxx unused"
+.C[-DINCOMPLETE_TYPES_BROKEN] ; suppress warnings about undefined structures
+.C[-wsig]     ; warn if significant digits may be lost
+.C[-w-cln]    ; no "constant is long" warnings
+=
+; * * * * List of modules * * * * 
+PCSTART.O
+jcmain.c		(jinclude.h,jconfig.h,jpegdata.h,jversion.h)
+JPEG.LIB           ; built by LIB_JPEG.PRJ
+;PCFLTLIB.LIB      ; floating point library -- you shouldn't need this
+PCSTDLIB.LIB       ; standard library
+PCEXTLIB.LIB       ; extended library
diff --git a/makdjpeg.cf b/makdjpeg.cf
deleted file mode 100644
index 40c1830..0000000
--- a/makdjpeg.cf
+++ /dev/null
@@ -1,6 +0,0 @@
-L jdmain.mix jdmaster.mix jddeflts.mix jbsmooth.mix jdarith.mix jdcolor.mix
-L jdhuff.mix jdmcu.mix jdpipe.mix jdsample.mix jquant1.mix jquant2.mix
-L jrevdct.mix jrdjfif.mix jwrgif.mix jwrppm.mix jwrrle.mix jwrtarga.mix
-L jutils.mix jerror.mix jmemmgr.mix jmemsys.mix jmemdosa.mix
-fa;
-b djpeg,8K,48K,
diff --git a/makdjpeg.lnk b/makdjpeg.lnk
deleted file mode 100644
index 44207d5..0000000
--- a/makdjpeg.lnk
+++ /dev/null
@@ -1,27 +0,0 @@
-jdmain.obj +
-jdmaster.obj +
-jddeflts.obj +
-jbsmooth.obj +
-jdarith.obj +
-jdcolor.obj +
-jdhuff.obj +
-jdmcu.obj +
-jdpipe.obj +
-jdsample.obj +
-jquant1.obj +
-jquant2.obj +
-jrevdct.obj +
-jrdjfif.obj +
-jwrgif.obj +
-jwrppm.obj +
-jwrrle.obj +
-jwrtarga.obj +
-jutils.obj +
-jerror.obj +
-jmemmgr.obj +
-jmemsys.obj +
-jmemdosa.obj
-djpeg.exe /NOI
-nul.map
-
-nul.def
diff --git a/makdjpeg.lst b/makdjpeg.lst
deleted file mode 100644
index 4f17e6d..0000000
--- a/makdjpeg.lst
+++ /dev/null
@@ -1,4 +0,0 @@
-jdmain.obj jdmaster.obj jddeflts.obj jbsmooth.obj jdarith.obj jdcolor.obj
-jdhuff.obj jdmcu.obj jdpipe.obj jdsample.obj jquant1.obj jquant2.obj
-jrevdct.obj jrdjfif.obj jwrgif.obj jwrppm.obj jwrrle.obj jwrtarga.obj
-jutils.obj jerror.obj jmemmgr.obj jmemsys.obj jmemdosa.obj
diff --git a/makdjpeg.st b/makdjpeg.st
new file mode 100644
index 0000000..7c407e6
--- /dev/null
+++ b/makdjpeg.st
@@ -0,0 +1,37 @@
+; Project file for Independent JPEG Group's software
+;
+; This project file is for Atari ST/STE/TT systems using Pure C or Turbo C.
+; Thanks to Frank Moehle (Frank.Moehle@arbi.informatik.uni-oldenburg.de)
+; and to Dr. B. Setzepfandt (bernd@gina.uni-muenster.de).
+;
+; To use this file, rename it to DJPEG.PRJ.
+; If you are using Turbo C, change filenames beginning with "PC..." to "TC..."
+; Read SETUP instructions before trying to make the program!
+;
+;
+;      * * * Output file * * *
+djpeg.ttp
+;
+;      * * * DEFINES * * * 
+.C[-DHAVE_STDC]
+.C[-DINCLUDES_ARE_ANSI]
+.C[-DNEED_SIGNAL_CATCHER]  ; It's safer to have a signal catcher.
+.C[-DTWO_FILE_COMMANDLINE] ; If you prefer UNIX-style I/O redirection,
+                           ; comment this line out.
+.C[-DPROGRESS_REPORT]      ; This enables the % progress indicator
+;
+; * * * COMPILER OPTIONS * * *  
+.C[-P]        ; absolute calls
+.C[-M]        ; and no string merging, folks
+.C[-w-par]    ; no "parameter xxxx unused"
+.C[-DINCOMPLETE_TYPES_BROKEN] ; suppress warnings about undefined structures
+.C[-wsig]     ; warn if significant digits may be lost
+.C[-w-cln]    ; no "constant is long" warnings
+=
+; * * * * List of modules * * * * 
+PCSTART.O
+jdmain.c		(jinclude.h,jconfig.h,jpegdata.h,jversion.h)
+JPEG.LIB           ; built by LIB_JPEG.PRJ
+;PCFLTLIB.LIB      ; floating point library -- you shouldn't need this
+PCSTDLIB.LIB       ; standard library
+PCEXTLIB.LIB       ; extended library
diff --git a/makefile.ansi b/makefile.ansi
index 2159093..59bba3b 100644
--- a/makefile.ansi
+++ b/makefile.ansi
@@ -43,19 +43,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -94,6 +93,11 @@
 	$(AR) libjpeg.a  $(LIBOBJECTS)
 	$(AR2) libjpeg.a
 
+jmemsys.c:
+	echo You must select a system-dependent jmemsys.c file.
+	echo Please read the SETUP directions.
+	exit 1
+
 clean:
 	$(RM) *.o cjpeg djpeg libjpeg.a core testout.*
 
@@ -105,7 +109,7 @@
 test: cjpeg djpeg
 	$(RM) testout.ppm testout.gif testout.jpg
 	./djpeg testorig.jpg >testout.ppm
-	./djpeg -G testorig.jpg >testout.gif
+	./djpeg -gif testorig.jpg >testout.gif
 	./cjpeg testimg.ppm >testout.jpg
 	cmp testimg.ppm testout.ppm
 	cmp testimg.gif testout.gif
@@ -118,7 +122,7 @@
 jcdeflts.o : jcdeflts.c jinclude.h jconfig.h jpegdata.h 
 jcexpand.o : jcexpand.c jinclude.h jconfig.h jpegdata.h 
 jchuff.o : jchuff.c jinclude.h jconfig.h jpegdata.h 
-jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jcmaster.o : jcmaster.c jinclude.h jconfig.h jpegdata.h 
 jcmcu.o : jcmcu.c jinclude.h jconfig.h jpegdata.h 
 jcpipe.o : jcpipe.c jinclude.h jconfig.h jpegdata.h 
@@ -127,7 +131,7 @@
 jdcolor.o : jdcolor.c jinclude.h jconfig.h jpegdata.h 
 jddeflts.o : jddeflts.c jinclude.h jconfig.h jpegdata.h 
 jdhuff.o : jdhuff.c jinclude.h jconfig.h jpegdata.h 
-jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jdmaster.o : jdmaster.c jinclude.h jconfig.h jpegdata.h 
 jdmcu.o : jdmcu.c jinclude.h jconfig.h jpegdata.h 
 jdpipe.o : jdpipe.c jinclude.h jconfig.h jpegdata.h 
diff --git a/makefile.bcc b/makefile.bcc
index 00b2d7d..8540afc 100644
--- a/makefile.bcc
+++ b/makefile.bcc
@@ -15,10 +15,11 @@
 
 # You may need to adjust these cc options:
 CFLAGS= -DHAVE_STDC -DINCLUDES_ARE_ANSI \
-	-ms -DMSDOS -DINCOMPLETE_TYPES_BROKEN -w-par -O2
+	-ms -DMSDOS -DUSE_FMEM -DINCOMPLETE_TYPES_BROKEN -w-par -O2
 # -DHAVE_STDC -DINCLUDES_ARE_ANSI enable ANSI-C features (we DON'T want -A)
 # -ms selects small memory model for most efficient code
 # -DMSDOS enables DOS-specific code
+# -DUSE_FMEM indicates we have _fmemcpy() and _fmemset()
 # -DINCOMPLETE_TYPES_BROKEN suppresses bogus warning about undefined structures
 # -w-par suppresses warnings about unused function parameters
 # -O2 enables full code optimization (for pre-3.0 Borland C++, use -O -G -Z)
@@ -27,6 +28,9 @@
 LDFLAGS= -ms
 # memory model option here must match CFLAGS!
 
+# need librarian (tlib) response file because file list > 128 chars
+RFILE= libjpeg.ans
+
 
 # source files (independently compilable files)
 SOURCES= jbsmooth.c jcarith.c jccolor.c jcdeflts.c jcexpand.c jchuff.c \
@@ -35,19 +39,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -72,17 +75,44 @@
 all: cjpeg.exe djpeg.exe
 
 
-cjpeg.exe: $(COBJECTS)
-	$(CC) $(LDFLAGS) -ecjpeg.exe @makcjpeg.lst
+# libjpeg.lib is useful if you are including the JPEG software in a larger
+# program; you'd include it in your link, rather than the individual modules.
+libjpeg.lib: $(LIBOBJECTS) $(RFILE)
+    del libjpeg.lib
+    tlib @$(RFILE)
 
-djpeg.exe: $(DOBJECTS)
-	$(CC) $(LDFLAGS) -edjpeg.exe @makdjpeg.lst
+# librarian (tlib) response file for same
+$(RFILE) : Makefile
+    del $(RFILE)
+    echo libjpeg.lib & >$(RFILE)
+    echo +jcmaster.obj +jcdeflts.obj +jcarith.obj +jccolor.obj & >>$(RFILE)
+    echo +jcexpand.obj +jchuff.obj +jcmcu.obj +jcpipe.obj & >>$(RFILE)
+    echo +jcsample.obj +jfwddct.obj +jwrjfif.obj +jrdgif.obj & >>$(RFILE)
+    echo +jrdppm.obj +jrdrle.obj +jrdtarga.obj +jdmaster.obj & >>$(RFILE)
+    echo +jddeflts.obj +jbsmooth.obj +jdarith.obj +jdcolor.obj & >>$(RFILE)
+    echo +jdhuff.obj +jdmcu.obj +jdpipe.obj +jdsample.obj & >>$(RFILE)
+    echo +jquant1.obj +jquant2.obj +jrevdct.obj +jrdjfif.obj & >>$(RFILE)
+    echo +jwrgif.obj +jwrppm.obj +jwrrle.obj +jwrtarga.obj & >>$(RFILE)
+    echo +jutils.obj +jerror.obj +jmemmgr.obj +jmemsys.obj & >>$(RFILE)
+    echo +jmemdosa.obj >>$(RFILE)
+
+cjpeg.exe: jcmain.obj libjpeg.lib
+	$(CC) $(LDFLAGS) -ecjpeg.exe jcmain.obj libjpeg.lib
+
+djpeg.exe: jdmain.obj libjpeg.lib
+	$(CC) $(LDFLAGS) -edjpeg.exe jdmain.obj libjpeg.lib
 
 .c.obj:
 	$(CC) $(CFLAGS) -c $<
 
+jmemsys.c:
+	echo You must select a system-dependent jmemsys.c file.
+	echo Please read the SETUP directions.
+	exit 1
+
 clean:
 	del *.obj
+	del libjpeg.lib
 	del cjpeg.exe
 	del djpeg.exe
 	del testout.*
@@ -90,7 +120,7 @@
 test:
 	del testout.*
 	djpeg testorig.jpg testout.ppm
-	djpeg -G testorig.jpg testout.gif
+	djpeg -gif testorig.jpg testout.gif
 	cjpeg testimg.ppm testout.jpg
 	fc testimg.ppm testout.ppm
 	fc testimg.gif testout.gif
@@ -103,7 +133,7 @@
 jcdeflts.obj : jcdeflts.c jinclude.h jconfig.h jpegdata.h
 jcexpand.obj : jcexpand.c jinclude.h jconfig.h jpegdata.h
 jchuff.obj : jchuff.c jinclude.h jconfig.h jpegdata.h
-jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 jcmaster.obj : jcmaster.c jinclude.h jconfig.h jpegdata.h
 jcmcu.obj : jcmcu.c jinclude.h jconfig.h jpegdata.h
 jcpipe.obj : jcpipe.c jinclude.h jconfig.h jpegdata.h
@@ -112,7 +142,7 @@
 jdcolor.obj : jdcolor.c jinclude.h jconfig.h jpegdata.h
 jddeflts.obj : jddeflts.c jinclude.h jconfig.h jpegdata.h
 jdhuff.obj : jdhuff.c jinclude.h jconfig.h jpegdata.h
-jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 jdmaster.obj : jdmaster.c jinclude.h jconfig.h jpegdata.h
 jdmcu.obj : jdmcu.c jinclude.h jconfig.h jpegdata.h
 jdpipe.obj : jdpipe.c jinclude.h jconfig.h jpegdata.h
diff --git a/makefile.manx b/makefile.manx
index 428f3dc..88e98a9 100644
--- a/makefile.manx
+++ b/makefile.manx
@@ -10,11 +10,17 @@
 CC= cc
 
 # You may need to adjust these cc options:
-CFLAGS= -MC -MD -sf -sn -sp -DAMIGA -DTWO_FILE_COMMANDLINE \
-	-DNEED_SIGNAL_CATCHER -Dsignal_catcher=_abort
+# uncomment for generic 68000 code (will work on any Amiga)
+ARCHFLAGS= -sn
+
+# uncomment for 68020/68030 code (faster, but won't run on 68000 CPU)
+#ARCHFLAGS= -c2
+
+CFLAGS= -MC -MD -DAMIGA -DTWO_FILE_COMMANDLINE -DNEED_SIGNAL_CATCHER \
+	-Dsignal_catcher=_abort -DSHORTxSHORT_32 $(ARCHFLAGS) -spfam -r4
 
 # Link-time cc options:
-LDFLAGS= 
+LDFLAGS= -g
 
 # To link any special libraries, add the necessary -l commands here.
 LDLIBS= -lml -lcl
@@ -35,19 +41,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -85,6 +90,11 @@
 	-$(RM) libjpeg.lib
 	$(AR) libjpeg.lib  $(LIBOBJECTS)
 
+jmemsys.c:
+	echo You must select a system-dependent jmemsys.c file.
+	echo Please read the SETUP directions.
+	exit 1
+
 clean:
 	-$(RM) *.o cjpeg djpeg libjpeg.lib core testout.*
 
@@ -96,7 +106,7 @@
 test: cjpeg djpeg
 	-$(RM) testout.ppm testout.gif testout.jpg
 	djpeg testorig.jpg testout.ppm
-	djpeg -G testorig.jpg testout.gif
+	djpeg -gif testorig.jpg testout.gif
 	cjpeg testimg.ppm testout.jpg
 	cmp testimg.ppm testout.ppm
 	cmp testimg.gif testout.gif
@@ -109,7 +119,7 @@
 jcdeflts.o : jcdeflts.c jinclude.h jconfig.h jpegdata.h 
 jcexpand.o : jcexpand.c jinclude.h jconfig.h jpegdata.h 
 jchuff.o : jchuff.c jinclude.h jconfig.h jpegdata.h 
-jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jcmaster.o : jcmaster.c jinclude.h jconfig.h jpegdata.h 
 jcmcu.o : jcmcu.c jinclude.h jconfig.h jpegdata.h 
 jcpipe.o : jcpipe.c jinclude.h jconfig.h jpegdata.h 
@@ -118,7 +128,7 @@
 jdcolor.o : jdcolor.c jinclude.h jconfig.h jpegdata.h 
 jddeflts.o : jddeflts.c jinclude.h jconfig.h jpegdata.h 
 jdhuff.o : jdhuff.c jinclude.h jconfig.h jpegdata.h 
-jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jdmaster.o : jdmaster.c jinclude.h jconfig.h jpegdata.h 
 jdmcu.o : jdmcu.c jinclude.h jconfig.h jpegdata.h 
 jdpipe.o : jdpipe.c jinclude.h jconfig.h jpegdata.h 
diff --git a/makefile.mc5 b/makefile.mc5
index 62f95b8..db42f33 100644
--- a/makefile.mc5
+++ b/makefile.mc5
@@ -28,19 +28,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -86,7 +85,7 @@
 
 jchuff.obj : jchuff.c jinclude.h jconfig.h jpegdata.h
 
-jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 	cl $(CFLAGS) /c $*.c
 
 jcmaster.obj : jcmaster.c jinclude.h jconfig.h jpegdata.h
@@ -105,7 +104,7 @@
 
 jdhuff.obj : jdhuff.c jinclude.h jconfig.h jpegdata.h
 
-jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 	cl $(CFLAGS) /c $*.c
 
 jdmaster.obj : jdmaster.c jinclude.h jconfig.h jpegdata.h
diff --git a/makefile.mc6 b/makefile.mc6
index 2373e44..64373ab 100644
--- a/makefile.mc6
+++ b/makefile.mc6
@@ -1,23 +1,29 @@
 # Makefile for Independent JPEG Group's software
 
-# This makefile is for Microsoft C for MS-DOS, version 6.x (use NMAKE).
+# This makefile is for Microsoft C for MS-DOS, version 6.00A and up.
+# Use NMAKE, not Microsoft's brain-damaged MAKE.
 # Thanks to Alan Wright and Chris Turner of Olivetti Research Ltd.
 
-# Read SETUP instructions before saying "make" !!
+# Read SETUP instructions before saying "nmake" !!
 
 # compiler flags. -D gives a #define to the sources:
-#       -O              default optimisation
+#       -AS             small memory model (or use -AM for medium model)
+#       -Ox             maximum safe optimisation
 #       -W3             warning level 3
 #       -Za             ANSI conformance, defines __STDC__ but undefines far
 #                       and near, so we DON'T use it.
 #       -DHAVE_STDC     indicate we do have all the ANSI language features
-#       -DINCLUDES_ARE_ANSI	and all the ANSI include files.
+#       -DINCLUDES_ARE_ANSI    and all the ANSI include files.
 #       -DMSDOS         we are on an MSDOS machine
+#       -DUSE_FMEM      we have _fmemcpy() and _fmemset()
+#       -DSHORTxLCONST_32      enables compiler-specific multiply optimization
 #       -DMEM_STATS     enable memory usage statistics (optional)
-#       -c              compile, don't link (implicit in inference rules)
 # You might also want to add -G2 if you have an 80286, etc.
 
-CFLAGS = -c -O -W3 -DHAVE_STDC -DINCLUDES_ARE_ANSI -DMSDOS
+CFLAGS = -AS -Ox -W3 -DHAVE_STDC -DINCLUDES_ARE_ANSI -DMSDOS -DUSE_FMEM -DSHORTxLCONST_32
+
+# need librarian (lib) response file because file list > 128 chars
+RFILE = libjpeg.ans
 
 
 # source files (independently compilable files)
@@ -27,19 +33,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -63,7 +68,57 @@
 
 all: cjpeg.exe djpeg.exe
 
-# default rules in nmake will use cflags and compile the list below
+
+# libjpeg.lib is useful if you are including the JPEG software in a larger
+# program; you'd include it in your link, rather than the individual modules.
+libjpeg.lib: $(LIBOBJECTS) $(RFILE)
+    del libjpeg.lib
+    lib @$(RFILE) ;
+
+# librarian (lib) response file for same
+$(RFILE) : Makefile
+    del $(RFILE)
+    echo libjpeg.lib >$(RFILE)
+# answer "y" to lib's silly want-to-create-it prompt:
+    echo y >>$(RFILE)
+    echo +jcmaster.obj +jcdeflts.obj +jcarith.obj +jccolor.obj & >>$(RFILE)
+    echo +jcexpand.obj +jchuff.obj +jcmcu.obj +jcpipe.obj & >>$(RFILE)
+    echo +jcsample.obj +jfwddct.obj +jwrjfif.obj +jrdgif.obj & >>$(RFILE)
+    echo +jrdppm.obj +jrdrle.obj +jrdtarga.obj +jdmaster.obj & >>$(RFILE)
+    echo +jddeflts.obj +jbsmooth.obj +jdarith.obj +jdcolor.obj & >>$(RFILE)
+    echo +jdhuff.obj +jdmcu.obj +jdpipe.obj +jdsample.obj & >>$(RFILE)
+    echo +jquant1.obj +jquant2.obj +jrevdct.obj +jrdjfif.obj & >>$(RFILE)
+    echo +jwrgif.obj +jwrppm.obj +jwrrle.obj +jwrtarga.obj & >>$(RFILE)
+    echo +jutils.obj +jerror.obj +jmemmgr.obj +jmemsys.obj & >>$(RFILE)
+    echo +jmemdosa.obj >>$(RFILE)
+
+cjpeg.exe: jcmain.obj libjpeg.lib
+    link /STACK:4096 /EXEPACK jcmain.obj, cjpeg.exe, , libjpeg.lib, ;
+
+djpeg.exe: jdmain.obj libjpeg.lib
+    link /STACK:4096 /EXEPACK jdmain.obj, djpeg.exe, , libjpeg.lib, ;
+
+jmemsys.c:
+	echo You must select a system-dependent jmemsys.c file.
+	echo Please read the SETUP directions.
+	exit 1
+
+clean:
+	del *.obj
+	del libjpeg.lib
+	del cjpeg.exe
+	del djpeg.exe
+	del testout.*
+
+test:
+        del testout.*
+        djpeg testorig.jpg testout.ppm
+        djpeg -gif testorig.jpg testout.gif
+        cjpeg testimg.ppm testout.jpg
+        fc testimg.ppm testout.ppm
+        fc testimg.gif testout.gif
+        fc testimg.jpg testout.jpg
+
 
 jbsmooth.obj : jbsmooth.c jinclude.h jconfig.h jpegdata.h
 jcarith.obj : jcarith.c jinclude.h jconfig.h jpegdata.h
@@ -71,7 +126,7 @@
 jcdeflts.obj : jcdeflts.c jinclude.h jconfig.h jpegdata.h
 jcexpand.obj : jcexpand.c jinclude.h jconfig.h jpegdata.h
 jchuff.obj : jchuff.c jinclude.h jconfig.h jpegdata.h
-jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 jcmaster.obj : jcmaster.c jinclude.h jconfig.h jpegdata.h
 jcmcu.obj : jcmcu.c jinclude.h jconfig.h jpegdata.h
 jcpipe.obj : jcpipe.c jinclude.h jconfig.h jpegdata.h
@@ -80,7 +135,7 @@
 jdcolor.obj : jdcolor.c jinclude.h jconfig.h jpegdata.h
 jddeflts.obj : jddeflts.c jinclude.h jconfig.h jpegdata.h
 jdhuff.obj : jdhuff.c jinclude.h jconfig.h jpegdata.h
-jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 jdmaster.obj : jdmaster.c jinclude.h jconfig.h jpegdata.h
 jdmcu.obj : jdmcu.c jinclude.h jconfig.h jpegdata.h
 jdpipe.obj : jdpipe.c jinclude.h jconfig.h jpegdata.h
@@ -103,24 +158,5 @@
 jwrrle.obj : jwrrle.c jinclude.h jconfig.h jpegdata.h
 jwrtarga.obj : jwrtarga.c jinclude.h jconfig.h jpegdata.h
 jmemsys.obj : jmemsys.c jinclude.h jconfig.h jpegdata.h jmemsys.h
-
 jmemdosa.obj : jmemdosa.asm
-	masm /mx $*;
-
-
-# use linker response files because file list > 128 chars
-
-cjpeg.exe: $(COBJECTS)
-        link /STACK:8192 @makcjpeg.lnk
-
-djpeg.exe: $(DOBJECTS)
-        link /STACK:8192 @makdjpeg.lnk
-
-test:
-        del testout.*
-        djpeg testorig.jpg testout.ppm
-        djpeg -G testorig.jpg testout.gif
-        cjpeg testimg.ppm testout.jpg
-        fc testimg.ppm testout.ppm
-        fc testimg.gif testout.gif
-        fc testimg.jpg testout.jpg
+    masm /mx $*;
diff --git a/makefile.mms b/makefile.mms
index 75c8432..690e96d 100644
--- a/makefile.mms
+++ b/makefile.mms
@@ -1,7 +1,8 @@
 # Makefile for Independent JPEG Group's software
 
 # This makefile is for use with MMS on VAX/VMS systems.
-# Thanks to Rick Dyson (dyson@iowasp.physics.uiowa.edu) for his help.
+# Thanks to Rick Dyson (dyson@iowasp.physics.uiowa.edu)
+# and Tim Bell (tbell@netcom.com) for their help.
 
 # Read SETUP instructions before saying "MMS" !!
 
@@ -16,19 +17,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -93,6 +93,14 @@
 	- Purge /NoLog /NoConfirm *.*
 	- Delete /NoLog /NoConfirm *.OBJ;
 
+test : cjpeg.exe djpeg.exe
+	mcr sys$disk:[]djpeg      testorig.jpg testout.ppm
+	mcr sys$disk:[]djpeg -gif testorig.jpg testout.gif
+	mcr sys$disk:[]cjpeg      testimg.ppm testout.jpg
+	- Backup /Compare/Log	  testimg.ppm testout.ppm
+	- Backup /Compare/Log	  testimg.gif testout.gif
+	- Backup /Compare/Log	  testimg.jpg testout.jpg
+
 
 jbsmooth.obj : jbsmooth.c jinclude.h jconfig.h jpegdata.h
 jcarith.obj : jcarith.c jinclude.h jconfig.h jpegdata.h
@@ -100,7 +108,7 @@
 jcdeflts.obj : jcdeflts.c jinclude.h jconfig.h jpegdata.h
 jcexpand.obj : jcexpand.c jinclude.h jconfig.h jpegdata.h
 jchuff.obj : jchuff.c jinclude.h jconfig.h jpegdata.h
-jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jcmain.obj : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 jcmaster.obj : jcmaster.c jinclude.h jconfig.h jpegdata.h
 jcmcu.obj : jcmcu.c jinclude.h jconfig.h jpegdata.h
 jcpipe.obj : jcpipe.c jinclude.h jconfig.h jpegdata.h
@@ -109,7 +117,7 @@
 jdcolor.obj : jdcolor.c jinclude.h jconfig.h jpegdata.h
 jddeflts.obj : jddeflts.c jinclude.h jconfig.h jpegdata.h
 jdhuff.obj : jdhuff.c jinclude.h jconfig.h jpegdata.h
-jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
+jdmain.obj : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h
 jdmaster.obj : jdmaster.c jinclude.h jconfig.h jpegdata.h
 jdmcu.obj : jdmcu.c jinclude.h jconfig.h jpegdata.h
 jdpipe.obj : jdpipe.c jinclude.h jconfig.h jpegdata.h
diff --git a/makefile.pwc b/makefile.pwc
deleted file mode 100644
index 8d36310..0000000
--- a/makefile.pwc
+++ /dev/null
@@ -1,130 +0,0 @@
-# Makefile for Independent JPEG Group's software
-
-# This makefile is for Mix Software's Power C, v2.1.1
-# and Dan Grayson's pd make 2.14 under MS-DOS.
-# This file assumes that you have Microsoft's MASM or a compatible assembler
-# to handle the jmemdosa.asm file.  If not, you will need to use jmemname.c
-# and go to a large-data memory model.
-# Thanks to Bob Hardy for this version.
-
-# Read SETUP instructions before saying "make" !!
-
-# The name of your C compiler:
-CC=pc
-
-# You may need to adjust these cc options:
-MODEL=m
-CFLAGS= -dMSDOS -m$(MODEL)
-# In particular:
-#   Add -dMEM_STATS to enable gathering of memory usage statistics.
-
-# Link-time cc options:
-LDFLAGS=
-LDLIBS=
-
-# miscellaneous OS-dependent stuff
-# linker
-LN=pcl
-# file deletion command
-RM=del
-# library (.mix) file creation command
-AR=merge
-
-
-# source files (independently compilable files)
-SOURCES= jbsmooth.c jcarith.c jccolor.c jcdeflts.c jcexpand.c jchuff.c jcmain.c jcmaster.c jcmcu.c jcpipe.c jcsample.c jdarith.c jdcolor.c jddeflts.c jdhuff.c jdmain.c jdmaster.c jdmcu.c jdpipe.c jdsample.c jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
-VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
-SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h jmemdosa.asm
-# files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
-# documentation, test, and support files
-DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
-MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf makljpeg.cf makefile.mms makefile.vms makvms.opt
-OTHERFILES= ansi2knr.c ckconfig.c example.c
-TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
-DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) $(OTHERFILES) $(TESTFILES)
-# objectfiles common to cjpeg and djpeg
-COMOBJECTS= jutils.mix jerror.mix jmemmgr.mix jmemsys.mix jmemdosa.mix
-# compression objectfiles
-CLIBOBJECTS= jcmaster.mix jcdeflts.mix jcarith.mix jccolor.mix jcexpand.mix jchuff.mix jcmcu.mix jcpipe.mix jcsample.mix jfwddct.mix jwrjfif.mix jrdgif.mix jrdppm.mix jrdrle.mix jrdtarga.mix
-COBJECTS= jcmain.mix $(CLIBOBJECTS) $(COMOBJECTS)
-# decompression objectfiles
-DLIBOBJECTS= jdmaster.mix jddeflts.mix jbsmooth.mix jdarith.mix jdcolor.mix jdhuff.mix jdmcu.mix jdpipe.mix jdsample.mix jquant1.mix jquant2.mix jrevdct.mix jrdjfif.mix jwrgif.mix jwrppm.mix jwrrle.mix jwrtarga.mix
-DOBJECTS= jdmain.mix $(DLIBOBJECTS) $(COMOBJECTS)
-# These objectfiles are included in libjpeg.mix
-LIBOBJECTS= $(CLIBOBJECTS) $(DLIBOBJECTS) $(COMOBJECTS)
-
-
-all: cjpeg.exe djpeg.exe
-# By default, libjpeg.mix is not built unless you explicitly request it.
-# You can add libjpeg.mix to the line above if you want it built by default.
-
-
-cjpeg.exe: $(COBJECTS)
-	$(LN) $(LDFLAGS) @makcjpeg.cf
-
-djpeg.exe: $(DOBJECTS)
-	$(LN) $(LDFLAGS) @makdjpeg.cf
-
-# libjpeg.mix is useful if you are including the JPEG software in a larger
-# program; you'd include it in your link, rather than the individual modules.
-libjpeg.mix: $(LIBOBJECTS)
-	@$(RM) libjpeg.mix
-	$(AR) libjpeg.mix @makljpeg.cf
-
-clean:
-	$(RM) *.mix cjpeg.exe djpeg.exe testout.*
-
-test:
-	@$(RM) testout.*
-	+djpeg testorig.jpg testout.ppm
-	+djpeg -G testorig.jpg testout.gif
-	+cjpeg testimg.ppm testout.jpg
-	fc testimg.ppm testout.ppm
-	fc testimg.gif testout.gif
-	fc testimg.jpg testout.jpg
-
-
-jbsmooth.mix : jbsmooth.c jinclude.h jconfig.h jpegdata.h
-jcarith.mix : jcarith.c jinclude.h jconfig.h jpegdata.h
-jccolor.mix : jccolor.c jinclude.h jconfig.h jpegdata.h
-jcdeflts.mix : jcdeflts.c jinclude.h jconfig.h jpegdata.h
-jcexpand.mix : jcexpand.c jinclude.h jconfig.h jpegdata.h
-jchuff.mix : jchuff.c jinclude.h jconfig.h jpegdata.h
-jcmain.mix : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
-jcmaster.mix : jcmaster.c jinclude.h jconfig.h jpegdata.h
-jcmcu.mix : jcmcu.c jinclude.h jconfig.h jpegdata.h
-jcpipe.mix : jcpipe.c jinclude.h jconfig.h jpegdata.h
-jcsample.mix : jcsample.c jinclude.h jconfig.h jpegdata.h
-jdarith.mix : jdarith.c jinclude.h jconfig.h jpegdata.h
-jdcolor.mix : jdcolor.c jinclude.h jconfig.h jpegdata.h
-jddeflts.mix : jddeflts.c jinclude.h jconfig.h jpegdata.h
-jdhuff.mix : jdhuff.c jinclude.h jconfig.h jpegdata.h
-jdmain.mix : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c
-jdmaster.mix : jdmaster.c jinclude.h jconfig.h jpegdata.h
-jdmcu.mix : jdmcu.c jinclude.h jconfig.h jpegdata.h
-jdpipe.mix : jdpipe.c jinclude.h jconfig.h jpegdata.h
-jdsample.mix : jdsample.c jinclude.h jconfig.h jpegdata.h
-jerror.mix : jerror.c jinclude.h jconfig.h jpegdata.h
-jquant1.mix : jquant1.c jinclude.h jconfig.h jpegdata.h
-jquant2.mix : jquant2.c jinclude.h jconfig.h jpegdata.h
-jfwddct.mix : jfwddct.c jinclude.h jconfig.h jpegdata.h
-jrevdct.mix : jrevdct.c jinclude.h jconfig.h jpegdata.h
-jutils.mix : jutils.c jinclude.h jconfig.h jpegdata.h
-jmemmgr.mix : jmemmgr.c jinclude.h jconfig.h jpegdata.h jmemsys.h
-jrdjfif.mix : jrdjfif.c jinclude.h jconfig.h jpegdata.h
-jrdgif.mix : jrdgif.c jinclude.h jconfig.h jpegdata.h
-jrdppm.mix : jrdppm.c jinclude.h jconfig.h jpegdata.h
-jrdrle.mix : jrdrle.c jinclude.h jconfig.h jpegdata.h
-jrdtarga.mix : jrdtarga.c jinclude.h jconfig.h jpegdata.h
-jwrjfif.mix : jwrjfif.c jinclude.h jconfig.h jpegdata.h
-jwrgif.mix : jwrgif.c jinclude.h jconfig.h jpegdata.h
-jwrppm.mix : jwrppm.c jinclude.h jconfig.h jpegdata.h
-jwrrle.mix : jwrrle.c jinclude.h jconfig.h jpegdata.h
-jwrtarga.mix : jwrtarga.c jinclude.h jconfig.h jpegdata.h
-jmemsys.mix : jmemsys.c jinclude.h jconfig.h jpegdata.h jmemsys.h
-jmemdosa.mix : jmemdosa.asm
-	masm /mx jmemdosa;
-	mix jmemdosa.obj
diff --git a/makefile.sas b/makefile.sas
index 7b0dd98..458251a 100644
--- a/makefile.sas
+++ b/makefile.sas
@@ -20,7 +20,7 @@
 # You may need to adjust these cc options:
 CFLAGS= -v -b -rr -O -j104 $(ARCHFLAGS) -DHAVE_STDC -DINCLUDES_ARE_ANSI \
 	-DAMIGA -DTWO_FILE_COMMANDLINE -DINCOMPLETE_TYPES_BROKEN \
-	-DNO_MKTEMP -DNEED_SIGNAL_CATCHER
+	-DNO_MKTEMP -DNEED_SIGNAL_CATCHER -DSHORTxSHORT_32
 # -j104 disables warnings for mismatched const qualifiers
 
 # Link-time cc options:
@@ -45,19 +45,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -105,6 +104,11 @@
 	-$(RM) libjpeg.lib
 	$(AR) libjpeg.lib r $(LIBOBJECTS)
 
+jmemsys.c:
+	echo You must select a system-dependent jmemsys.c file.
+	echo Please read the SETUP directions.
+	exit 1
+
 clean:
 	-$(RM) *.o cjpeg djpeg cjpeg.030 djpeg.030 libjpeg.lib core testout.*
 
@@ -116,7 +120,7 @@
 test: cjpeg djpeg
 	-$(RM) testout.ppm testout.gif testout.jpg
 	djpeg testorig.jpg testout.ppm
-	djpeg -G testorig.jpg testout.gif
+	djpeg -gif testorig.jpg testout.gif
 	cjpeg testimg.ppm testout.jpg
 	cmp testimg.ppm testout.ppm
 	cmp testimg.gif testout.gif
@@ -129,7 +133,7 @@
 jcdeflts.o : jcdeflts.c jinclude.h jconfig.h jpegdata.h 
 jcexpand.o : jcexpand.c jinclude.h jconfig.h jpegdata.h 
 jchuff.o : jchuff.c jinclude.h jconfig.h jpegdata.h 
-jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jcmaster.o : jcmaster.c jinclude.h jconfig.h jpegdata.h 
 jcmcu.o : jcmcu.c jinclude.h jconfig.h jpegdata.h 
 jcpipe.o : jcpipe.c jinclude.h jconfig.h jpegdata.h 
@@ -138,7 +142,7 @@
 jdcolor.o : jdcolor.c jinclude.h jconfig.h jpegdata.h 
 jddeflts.o : jddeflts.c jinclude.h jconfig.h jpegdata.h 
 jdhuff.o : jdhuff.c jinclude.h jconfig.h jpegdata.h 
-jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jdmaster.o : jdmaster.c jinclude.h jconfig.h jpegdata.h 
 jdmcu.o : jdmcu.c jinclude.h jconfig.h jpegdata.h 
 jdpipe.o : jdpipe.c jinclude.h jconfig.h jpegdata.h 
diff --git a/makefile.unix b/makefile.unix
index 701fc4f..0681c92 100644
--- a/makefile.unix
+++ b/makefile.unix
@@ -45,19 +45,18 @@
         jerror.c jquant1.c jquant2.c jfwddct.c jrevdct.c jutils.c jmemmgr.c \
         jrdjfif.c jrdgif.c jrdppm.c jrdrle.c jrdtarga.c jwrjfif.c jwrgif.c \
         jwrppm.c jwrrle.c jwrtarga.c
-# virtual source files (not present in distribution file)
+# virtual source files (not present in distribution file, see SETUP)
 VIRTSOURCES= jmemsys.c
-# system-dependent implementations of source files
+# system-dependent implementations of virtual source files
 SYSDEPFILES= jmemansi.c jmemname.c jmemnobs.c jmemdos.c jmemdos.h \
         jmemdosa.asm
 # files included by source files
-INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h egetopt.c
+INCLUDES= jinclude.h jconfig.h jpegdata.h jversion.h jmemsys.h
 # documentation, test, and support files
 DOCS= README SETUP USAGE CHANGELOG cjpeg.1 djpeg.1 architecture codingrules
 MAKEFILES= makefile.ansi makefile.unix makefile.manx makefile.sas \
-        makefile.mc5 makefile.mc6 makcjpeg.lnk makdjpeg.lnk makefile.bcc \
-        makcjpeg.lst makdjpeg.lst makefile.pwc makcjpeg.cf makdjpeg.cf \
-        makljpeg.cf makefile.mms makefile.vms makvms.opt
+        makcjpeg.st makdjpeg.st makljpeg.st makefile.mc5 makefile.mc6 \
+        makefile.bcc makefile.mms makefile.vms makvms.opt
 OTHERFILES= ansi2knr.c ckconfig.c example.c
 TESTFILES= testorig.jpg testimg.ppm testimg.gif testimg.jpg
 DISTFILES= $(DOCS) $(MAKEFILES) $(SOURCES) $(SYSDEPFILES) $(INCLUDES) \
@@ -109,6 +108,11 @@
 	$(AR) libjpeg.a  $(LIBOBJECTS)
 	$(AR2) libjpeg.a
 
+jmemsys.c:
+	echo You must select a system-dependent jmemsys.c file.
+	echo Please read the SETUP directions.
+	exit 1
+
 clean:
 	$(RM) *.o cjpeg djpeg libjpeg.a ansi2knr core tmpansi.* testout.*
 
@@ -120,7 +124,7 @@
 test: cjpeg djpeg
 	$(RM) testout.ppm testout.gif testout.jpg
 	./djpeg testorig.jpg >testout.ppm
-	./djpeg -G testorig.jpg >testout.gif
+	./djpeg -gif testorig.jpg >testout.gif
 	./cjpeg testimg.ppm >testout.jpg
 	cmp testimg.ppm testout.ppm
 	cmp testimg.gif testout.gif
@@ -133,7 +137,7 @@
 jcdeflts.o : jcdeflts.c jinclude.h jconfig.h jpegdata.h 
 jcexpand.o : jcexpand.c jinclude.h jconfig.h jpegdata.h 
 jchuff.o : jchuff.c jinclude.h jconfig.h jpegdata.h 
-jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jcmain.o : jcmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jcmaster.o : jcmaster.c jinclude.h jconfig.h jpegdata.h 
 jcmcu.o : jcmcu.c jinclude.h jconfig.h jpegdata.h 
 jcpipe.o : jcpipe.c jinclude.h jconfig.h jpegdata.h 
@@ -142,7 +146,7 @@
 jdcolor.o : jdcolor.c jinclude.h jconfig.h jpegdata.h 
 jddeflts.o : jddeflts.c jinclude.h jconfig.h jpegdata.h 
 jdhuff.o : jdhuff.c jinclude.h jconfig.h jpegdata.h 
-jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h egetopt.c 
+jdmain.o : jdmain.c jinclude.h jconfig.h jpegdata.h jversion.h 
 jdmaster.o : jdmaster.c jinclude.h jconfig.h jpegdata.h 
 jdmcu.o : jdmcu.c jinclude.h jconfig.h jpegdata.h 
 jdpipe.o : jdpipe.c jinclude.h jconfig.h jpegdata.h 
diff --git a/makefile.vms b/makefile.vms
index 460363f..243c6bb 100644
--- a/makefile.vms
+++ b/makefile.vms
@@ -2,8 +2,9 @@
 $!
 $! This is a command procedure for use on VAX/VMS systems that do not have MMS.
 $! It builds the JPEG software by brute force, recompiling everything whether
-$! or not it is necessary.
-$! Thanks to Rick Dyson (dyson@iowasp.physics.uiowa.edu) for his help.
+$! or not it is necessary.  It then runs the basic self-test.
+$! Thanks to Rick Dyson (dyson@iowasp.physics.uiowa.edu)
+$! and Tim Bell (tbell@netcom.com) for their help.
 $!
 $! Read SETUP instructions before running this!!
 $!
@@ -61,4 +62,13 @@
 $!
 $ Link /Executable = djpeg.exe  jdmain.obj,libjpeg.olb/Library,Sys$Disk:[]MAKVMS.OPT/Option
 $!
+$! Run the self-test
+$!
+$ mcr sys$disk:[]djpeg      testorig.jpg testout.ppm
+$ mcr sys$disk:[]djpeg -gif testorig.jpg testout.gif
+$ mcr sys$disk:[]cjpeg      testimg.ppm testout.jpg
+$ Backup /Compare/Log testimg.ppm testout.ppm
+$ Backup /Compare/Log testimg.gif testout.gif
+$ Backup /Compare/Log testimg.jpg testout.jpg
+$!
 $ Exit
diff --git a/makljpeg.cf b/makljpeg.cf
deleted file mode 100644
index 23c24fa..0000000
--- a/makljpeg.cf
+++ /dev/null
@@ -1,7 +0,0 @@
-jcmaster.mix,jcdeflts.mix,jcarith.mix,jccolor.mix,jcexpand.mix,jchuff.mix
-jcmcu.mix,jcpipe.mix,jcsample.mix,jfwddct.mix,jwrjfif.mix,jrdgif.mix
-jrdppm.mix,jrdrle.mix,jrdtarga.mix,jdmaster.mix,jddeflts.mix,jbsmooth.mix
-jdarith.mix,jdcolor.mix,jdhuff.mix,jdmcu.mix,jdpipe.mix,jdsample.mix
-jquant1.mix,jquant2.mix,jrevdct.mix,jrdjfif.mix,jwrgif.mix,jwrppm.mix
-jwrrle.mix,jwrtarga.mix,jutils.mix,jerror.mix,jmemmgr.mix,jmemsys.mix
-jmemdosa.mix
diff --git a/makljpeg.st b/makljpeg.st
new file mode 100644
index 0000000..cec75ce
--- /dev/null
+++ b/makljpeg.st
@@ -0,0 +1,63 @@
+; Project file for Independent JPEG Group's software
+;
+; This project file is for Atari ST/STE/TT systems using Pure C or Turbo C.
+; Thanks to Frank Moehle (Frank.Moehle@arbi.informatik.uni-oldenburg.de)
+; and to Dr. B. Setzepfandt (bernd@gina.uni-muenster.de).
+;
+; To use this file, rename it to LIB_JPEG.PRJ.
+; Read SETUP instructions before trying to make the library!
+;
+;
+;      * * * Output file * * *
+jpeg.lib
+;
+;      * * * DEFINES * * * 
+.C[-DHAVE_STDC]
+.C[-DINCLUDES_ARE_ANSI]
+;
+; * * * COMPILER OPTIONS * * *  
+.C[-P]        ; absolute calls
+.C[-M]        ; and no string merging, folks
+.C[-w-par]    ; no "parameter xxxx unused"
+.C[-DINCOMPLETE_TYPES_BROKEN] ; suppress warnings about undefined structures
+.C[-wsig]     ; warn if significant digits may be lost
+.C[-w-cln]    ; no "constant is long" warnings
+.L[-J]        ; link new Obj-format (so we get a library)
+=
+; * * * * List of modules * * * * 
+jbsmooth.c		(jinclude.h,jconfig.h,jpegdata.h)
+jcarith.c		(jinclude.h,jconfig.h,jpegdata.h)
+jccolor.c		(jinclude.h,jconfig.h,jpegdata.h)
+jcdeflts.c		(jinclude.h,jconfig.h,jpegdata.h)
+jcexpand.c		(jinclude.h,jconfig.h,jpegdata.h)
+jchuff.c		(jinclude.h,jconfig.h,jpegdata.h)
+jcmaster.c		(jinclude.h,jconfig.h,jpegdata.h)
+jcmcu.c		(jinclude.h,jconfig.h,jpegdata.h)
+jcpipe.c		(jinclude.h,jconfig.h,jpegdata.h)
+jcsample.c		(jinclude.h,jconfig.h,jpegdata.h)
+jdarith.c		(jinclude.h,jconfig.h,jpegdata.h)
+jdcolor.c		(jinclude.h,jconfig.h,jpegdata.h)
+jddeflts.c		(jinclude.h,jconfig.h,jpegdata.h)
+jdhuff.c		(jinclude.h,jconfig.h,jpegdata.h)
+jdmaster.c		(jinclude.h,jconfig.h,jpegdata.h)
+jdmcu.c		(jinclude.h,jconfig.h,jpegdata.h)
+jdpipe.c		(jinclude.h,jconfig.h,jpegdata.h)
+jdsample.c		(jinclude.h,jconfig.h,jpegdata.h)
+jerror.c		(jinclude.h,jconfig.h,jpegdata.h)
+jquant1.c		(jinclude.h,jconfig.h,jpegdata.h)
+jquant2.c		(jinclude.h,jconfig.h,jpegdata.h)
+jfwddct.c		(jinclude.h,jconfig.h,jpegdata.h)
+jrevdct.c		(jinclude.h,jconfig.h,jpegdata.h)
+jutils.c		(jinclude.h,jconfig.h,jpegdata.h)
+jmemmgr.c		(jinclude.h,jconfig.h,jpegdata.h,jmemsys.h)
+jrdjfif.c		(jinclude.h,jconfig.h,jpegdata.h)
+jrdgif.c		(jinclude.h,jconfig.h,jpegdata.h)
+jrdppm.c		(jinclude.h,jconfig.h,jpegdata.h)
+jrdrle.c		(jinclude.h,jconfig.h,jpegdata.h)
+jrdtarga.c		(jinclude.h,jconfig.h,jpegdata.h)
+jwrjfif.c		(jinclude.h,jconfig.h,jpegdata.h)
+jwrgif.c		(jinclude.h,jconfig.h,jpegdata.h)
+jwrppm.c		(jinclude.h,jconfig.h,jpegdata.h)
+jwrrle.c		(jinclude.h,jconfig.h,jpegdata.h)
+jwrtarga.c		(jinclude.h,jconfig.h,jpegdata.h)
+jmemsys.c		(jinclude.h,jconfig.h,jpegdata.h,jmemsys.h)
diff --git a/testimg.gif b/testimg.gif
index 9726814..1744b40 100644
--- a/testimg.gif
+++ b/testimg.gif
Binary files differ
diff --git a/testimg.jpg b/testimg.jpg
index 644d07e..e88cb2e 100644
--- a/testimg.jpg
+++ b/testimg.jpg
Binary files differ
diff --git a/testimg.ppm b/testimg.ppm
index 61fd707..1a936c4 100644
--- a/testimg.ppm
+++ b/testimg.ppm
Binary files differ
diff --git a/testorig.jpg b/testorig.jpg
index dabab10..db9a62f 100644
--- a/testorig.jpg
+++ b/testorig.jpg
Binary files differ