The Independent JPEG Group's JPEG software v5b
diff --git a/README b/README
index daa7961..6f2a8ed 100644
--- a/README
+++ b/README
@@ -1,8 +1,8 @@
 The Independent JPEG Group's JPEG software
 ==========================================
 
-README for release 5a of 7-Dec-94
-=================================
+README for release 5b of 15-Mar-95
+==================================
 
 This distribution contains the fifth public release of the Independent JPEG
 Group's free JPEG software.  You are welcome to redistribute this software and
@@ -82,7 +82,7 @@
 reasons, we are not distributing code for the arithmetic-coding process; see
 LEGAL ISSUES.  At present we have made no provision for supporting the
 progressive, hierarchical, or lossless processes defined in the standard.
-(Support for progressive mode may be offered in a future release.)
+(Support for progressive mode will be offered in a future release.)
 
 In order to support file conversion and viewing software, we have included
 considerable functionality beyond the bare JPEG coding/decoding capability;
@@ -124,7 +124,7 @@
 fitness for a particular purpose.  This software is provided "AS IS", and you,
 its user, assume the entire risk as to its quality and accuracy.
 
-This software is copyright (C) 1991, 1992, 1993, 1994, Thomas G. Lane.
+This software is copyright (C) 1991, 1992, 1993, 1994, 1995, Thomas G. Lane.
 All Rights Reserved except as specified below.
 
 Permission is hereby granted to use, copy, modify, and distribute this
@@ -176,6 +176,14 @@
 So far as we are aware, there are no patent restrictions on the remaining
 code.
 
+WARNING: Unisys has begun to enforce their patent on LZW compression against
+GIF encoders and decoders.  You will need a license from Unisys to use the
+included rdgif.c or wrgif.c files in a commercial or shareware application.
+At this time, Unisys is not enforcing their patent against freeware, so
+distribution of this package remains legal.  However, we intend to remove
+GIF support from the IJG package as soon as a suitable replacement format
+becomes reasonably popular.
+
 We are required to state that
     "The Graphics Interchange Format(c) is the Copyright property of
     CompuServe Incorporated.  GIF(sm) is a Service Mark property of
@@ -263,7 +271,7 @@
 The "official" archive site for this software is ftp.uu.net (Internet
 address 192.48.96.9).  The most recent released version can always be found
 there in directory graphics/jpeg.  This particular version will be archived
-as graphics/jpeg/jpegsrc.v5a.tar.gz.  If you are on the Internet, you
+as graphics/jpeg/jpegsrc.v5b.tar.gz.  If you are on the Internet, you
 can retrieve files from ftp.uu.net by standard anonymous FTP.  If you don't
 have FTP access, UUNET's archives are also available via UUCP; contact
 help@uunet.uu.net for information on retrieving files that way.
@@ -273,10 +281,9 @@
 submissions.  However, only ftp.uu.net is guaranteed to have the latest
 official version.
 
-You can also obtain this software from CompuServe, in the GRAPHSUPPORT
-forum (GO GRAPHSUP), probably in library 15 (there are rumors of a pending
-reorganization there).  Again, CompuServe is not guaranteed to have the
-very latest version.
+You can also obtain this software from CompuServe, in the GRAPHSUPPORT forum
+(GO GRAPHSUP), library 12 "JPEG Tools".  Again, CompuServe is not guaranteed
+to have the very latest version.
 
 The JPEG FAQ (Frequently Asked Questions) article is a useful source of
 general information about JPEG.  It is updated constantly and therefore is
diff --git a/change.log b/change.log
index 8a7b8db..6197001 100644
--- a/change.log
+++ b/change.log
@@ -1,6 +1,23 @@
 CHANGE LOG for Independent JPEG Group's JPEG software
 
 
+Version 5b  15-Mar-95
+---------------------
+
+Correct bugs with grayscale images having v_samp_factor > 1.
+
+jpeg_write_raw_data() now supports output suspension.
+
+Correct bugs in "configure" script for case of compiling in
+a directory other than the one containing the source files.
+
+Repair bug in jquant1.c: sometimes didn't use as many colors as it could.
+
+Borland C makefile and jconfig file work under either MS-DOS or OS/2.
+
+Miscellaneous improvements to documentation.
+
+
 Version 5a  7-Dec-94
 --------------------
 
diff --git a/cjpeg.1 b/cjpeg.1
index 80c2110..77c60cc 100644
--- a/cjpeg.1
+++ b/cjpeg.1
@@ -1,4 +1,4 @@
-.TH CJPEG 1 "30 August 1994"
+.TH CJPEG 1 "12 December 1994"
 .SH NAME
 cjpeg \- compress an image file to a JPEG file
 .SH SYNOPSIS
@@ -115,11 +115,11 @@
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The floating-point method is the most accurate, but will be the slowest unless
-your machine has very fast floating-point hardware.  Also note that results of
-the floating-point method may vary slightly across machines, while the integer
-methods should give the same results everywhere.  The fast integer method is
-much less accurate than the other two.
+The float method is very slightly more accurate than the int method, but is
+much slower unless your machine has very fast floating-point hardware.  Also
+note that results of the floating-point method may vary slightly across
+machines, while the integer methods should give the same results everywhere.
+The fast integer method is much less accurate than the other two.
 .TP
 .BI \-restart " N"
 Emit a JPEG restart marker every N MCU rows, or every N MCU blocks if "B" is
diff --git a/configure b/configure
index 4c25b3d..ae3f222 100755
--- a/configure
+++ b/configure
@@ -1346,6 +1346,12 @@
 else
   ANSI2KNRFLAGS=""
 fi
+# Set up -I directives
+if test "x$srcdir" = x.; then
+  INCLUDEFLAGS='-I$(srcdir)'
+else
+  INCLUDEFLAGS='-I. -I$(srcdir)'
+fi
 trap '' 1 2 15
 
 trap 'rm -fr conftest* confdefs* core $ac_clean_files; exit 1' 1 2 15
@@ -1426,6 +1432,7 @@
 s%@ANSI2KNR@%$ANSI2KNR%g
 s%@ISANSICOM@%$ISANSICOM%g
 s%@ANSI2KNRFLAGS@%$ANSI2KNRFLAGS%g
+s%@INCLUDEFLAGS@%$INCLUDEFLAGS%g
 
 CEOF
 EOF
diff --git a/djpeg.1 b/djpeg.1
index 08f80c5..e9ac63e 100644
--- a/djpeg.1
+++ b/djpeg.1
@@ -1,4 +1,4 @@
-.TH DJPEG 1 "28 August 1994"
+.TH DJPEG 1 "12 December 1994"
 .SH NAME
 djpeg \- decompress a JPEG file to an image file
 .SH SYNOPSIS
@@ -117,11 +117,11 @@
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The floating-point method is the most accurate, but will be the slowest unless
-your machine has very fast floating-point hardware.  Also note that results of
-the floating-point method may vary slightly across machines, while the integer
-methods should give the same results everywhere.  The fast integer method is
-much less accurate than the other two.
+The float method is very slightly more accurate than the int method, but is
+much slower unless your machine has very fast floating-point hardware.  Also
+note that results of the floating-point method may vary slightly across
+machines, while the integer methods should give the same results everywhere.
+The fast integer method is much less accurate than the other two.
 .TP
 .B \-dither fs
 Use Floyd-Steinberg dithering in color quantization.
@@ -214,8 +214,10 @@
 one-pass mode.
 .PP
 If you are fortunate enough to have very fast floating point hardware,
-.B \-dct float
-may be even faster than \fB\-dct fast\fR.
+\fB\-dct float\fR may be even faster than \fB\-dct fast\fR.  But on most
+machines \fB\-dct float\fR is slower than \fB\-dct int\fR; in this case it is
+not worth using, because its theoretical accuracy advantage is too small to be
+significant in practice.
 .SH ENVIRONMENT
 .TP
 .B JPEGMEM
diff --git a/example.c b/example.c
index cb8c64a..577f061 100644
--- a/example.c
+++ b/example.c
@@ -83,6 +83,8 @@
    * (see the second half of this file for an example).  But here we just
    * take the easy way out and use the standard error handler, which will
    * print a message on stderr and call exit() if compression fails.
+   * Note that this struct must live as long as the main JPEG parameter
+   * struct, to avoid dangling-pointer problems.
    */
   struct jpeg_error_mgr jerr;
   /* More stuff */
@@ -281,7 +283,10 @@
    * working space (which is allocated as needed by the JPEG library).
    */
   struct jpeg_decompress_struct cinfo;
-  /* We use our private extension JPEG error handler. */
+  /* We use our private extension JPEG error handler.
+   * Note that this struct must live as long as the main JPEG parameter
+   * struct, to avoid dangling-pointer problems.
+   */
   struct my_error_mgr jerr;
   /* More stuff */
   FILE * infile;		/* source file */
diff --git a/install.doc b/install.doc
index 827b9d3..3235c6c 100644
--- a/install.doc
+++ b/install.doc
@@ -1,6 +1,6 @@
 INSTALLATION INSTRUCTIONS for the Independent JPEG Group's JPEG software
 
-Copyright (C) 1991-1994, Thomas G. Lane.
+Copyright (C) 1991-1995, Thomas G. Lane.
 This file is part of the Independent JPEG Group's software.
 For conditions of distribution and use, see the accompanying README file.
 
@@ -132,7 +132,7 @@
 makefile.manx	jconfig.manx	Amiga, Manx Aztec C
 makefile.sas	jconfig.sas	Amiga, SAS C
 mak*jpeg.st	jconfig.st	Atari ST/STE/TT, Pure C or Turbo C
-makefile.bcc	jconfig.bcc	MS-DOS, Borland C (Turbo C)
+makefile.bcc	jconfig.bcc	MS-DOS or OS/2, Borland C
 makefile.dj	jconfig.dj	MS-DOS, DJGPP (Delorie's port of GNU C)
 makefile.mc6	jconfig.mc6	MS-DOS, Microsoft C version 6.x and up
 makefile.mms	jconfig.vms	Digital VMS, with MMS software
@@ -393,6 +393,55 @@
 	    the directory containing the URT "librle.a" file (typically the
 	    "lib" subdirectory of the URT distribution).
 
+Support for 12-bit-deep pixel data:
+
+The JPEG standard allows either 8-bit or 12-bit data precision.  (For color,
+this means 8 or 12 bits per channel, of course.)  If you need to work with
+deeper than 8-bit data, you can compile the IJG code for 12-bit operation.
+To do so:
+  1. In jmorecfg.h, define BITS_IN_JSAMPLE as 12 rather than 8.
+  2. In jconfig.h, undefine BMP_SUPPORTED, RLE_SUPPORTED, and TARGA_SUPPORTED,
+     because the code for those formats doesn't handle 12-bit data and won't
+     even compile.  (The PPM code does work, as explained below.  The GIF
+     code works too; it scales 8-bit GIF data to and from 12-bit depth
+     automatically.)
+  3. Compile.  Don't expect "make test" to pass, since the supplied test
+     files are for 8-bit data.
+
+Currently, 12-bit support does not work on 16-bit-int machines.
+
+Note that a 12-bit version will not read 8-bit JPEG files, nor vice versa;
+so you'll want to keep around a regular 8-bit compilation as well.
+(Run-time selection of data depth, to allow a single copy that does both,
+is possible but would probably slow things down considerably; it's very low
+on our to-do list.)
+
+The PPM reader (rdppm.c) can read 12-bit data from either text-format or
+binary-format PPM and PGM files.  Binary-format PPM/PGM files which have a
+maxval greater than 255 are assumed to use 2 bytes per sample, LSB first
+(little-endian order).  As of early 1995, 2-byte binary format is not
+officially supported by the PBMPLUS library, but it is expected that the
+next release of PBMPLUS will support it.  Note that the PPM reader will
+read files of any maxval regardless of the BITS_IN_JSAMPLE setting; incoming
+data is automatically rescaled to either maxval=255 or maxval=4095 as
+appropriate for the cjpeg bit depth.
+
+The PPM writer (wrppm.c) will normally write 2-byte binary PPM or PGM
+format, maxval 4095, when compiled with BITS_IN_JSAMPLE=12.  Since this
+format is not yet widely supported, you can disable it by compiling wrppm.c
+with PPM_NORAWWORD defined; then the data is scaled down to 8 bits to make a
+standard 1-byte/sample PPM or PGM file.  (Yes, this means still another copy
+of djpeg to keep around.  But hopefully you won't need it for very long.
+Poskanzer's supposed to get that new PBMPLUS release out Real Soon Now.)
+
+Of course, if you are working with 12-bit data, you probably have it stored
+in some other, nonstandard format.  In that case you'll probably want to
+write your own I/O modules to read and write your format.
+
+Note that a 12-bit version of cjpeg always runs in "-optimize" mode, in
+order to generate valid Huffman tables.  This is necessary because our
+default Huffman tables only cover 8-bit data.
+
 Removing code:
 
 If you need to make a smaller version of the JPEG software, some optional
@@ -738,6 +787,12 @@
 using configure, you should say
 	./configure RANLIB='ar -ts'
 
+On the MIPS R4000 architecture (Indy, etc.), the compiler option "-mips2"
+reportedly speeds up the float DCT method substantially, enough to make it
+faster than the default int method (but still slower than the fast int
+method).  If you use -mips2, you may want to alter the default DCT method to
+be float.  To do this, put "#define JDCT_DEFAULT JDCT_FLOAT" in jconfig.h.
+
 
 VMS:
 
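The 12-bit notes added to install.doc above describe two mechanical steps: reading 2-byte binary PPM/PGM samples stored LSB first, and rescaling incoming data from the file's maxval to 255 or 4095. The following is a minimal sketch of those two steps, not the actual rdppm.c code. The helper names read_sample_lsb_first and rescale_sample are hypothetical, and round-to-nearest is an assumption about how the rescaling might be done.

```c
#include <assert.h>

/* Hypothetical helper: combine two bytes stored LSB first (little-endian)
 * into a single sample value, as the install.doc text describes for
 * binary PPM/PGM files whose maxval exceeds 255.
 */
static unsigned int
read_sample_lsb_first (const unsigned char *p)
{
  return (unsigned int) p[0] | ((unsigned int) p[1] << 8);
}

/* Hypothetical helper: map a value in 0..in_max onto 0..out_max.
 * Rounding to nearest is an assumption made for this sketch.
 * With 16-bit inputs and out_max <= 4095 the product fits in 32 bits.
 */
static unsigned int
rescale_sample (unsigned int value, unsigned int in_max, unsigned int out_max)
{
  assert(in_max > 0 && value <= in_max);
  return (unsigned int)
    (((unsigned long) value * out_max + in_max / 2) / in_max);
}
```

Under these assumptions, an 8-bit sample would be widened for a 12-bit build with rescale_sample(v, 255, 4095), and the PPM_NORAWWORD case would go the other way with rescale_sample(v, 4095, 255).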
diff --git a/jcapi.c b/jcapi.c
index d9ba37b..52dacd0 100644
--- a/jcapi.c
+++ b/jcapi.c
@@ -1,7 +1,7 @@
 /*
  * jcapi.c
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -199,14 +199,14 @@
 
 /*
  * Alternate entry point to write raw data.
- * Processes exactly one iMCU row per call.
+ * Processes exactly one iMCU row per call, unless suspended.
  */
 
 GLOBAL JDIMENSION
 jpeg_write_raw_data (j_compress_ptr cinfo, JSAMPIMAGE data,
 		     JDIMENSION num_lines)
 {
-  JDIMENSION mcu_ctr, lines_per_MCU_row;
+  JDIMENSION lines_per_iMCU_row;
 
   if (cinfo->global_state != CSTATE_RAW_OK)
     ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
@@ -231,22 +231,19 @@
     (*cinfo->master->pass_startup) (cinfo);
 
   /* Verify that at least one iMCU row has been passed. */
-  lines_per_MCU_row = cinfo->max_v_samp_factor * DCTSIZE;
-  if (num_lines < lines_per_MCU_row)
+  lines_per_iMCU_row = cinfo->max_v_samp_factor * DCTSIZE;
+  if (num_lines < lines_per_iMCU_row)
     ERREXIT(cinfo, JERR_BUFFER_SIZE);
 
   /* Directly compress the row. */
-  mcu_ctr = 0;
-  (*cinfo->coef->compress_data) (cinfo, data, &mcu_ctr);
-  /* If compressor did not consume the whole row, then we must need to
-   * suspend processing; this is not currently supported.
-   */
-  if (mcu_ctr != cinfo->MCUs_per_row)
-    ERREXIT(cinfo, JERR_CANT_SUSPEND);
+  if (! (*cinfo->coef->compress_data) (cinfo, data)) {
+    /* If compressor did not consume the whole row, suspend processing. */
+    return 0;
+  }
 
   /* OK, we processed one iMCU row. */
-  cinfo->next_scanline += lines_per_MCU_row;
-  return lines_per_MCU_row;
+  cinfo->next_scanline += lines_per_iMCU_row;
+  return lines_per_iMCU_row;
 }
 
 
@@ -260,7 +257,7 @@
 GLOBAL void
 jpeg_finish_compress (j_compress_ptr cinfo)
 {
-  JDIMENSION iMCU_row, mcu_ctr;
+  JDIMENSION iMCU_row;
 
   if (cinfo->global_state != CSTATE_SCANNING && 
       cinfo->global_state != CSTATE_RAW_OK)
@@ -281,9 +278,7 @@
       /* We bypass the main controller and invoke coef controller directly;
        * all work is being done from the coefficient buffer.
        */
-      mcu_ctr = 0;
-      (*cinfo->coef->compress_data) (cinfo, (JSAMPIMAGE) NULL, &mcu_ctr);
-      if (mcu_ctr != cinfo->MCUs_per_row)
+      if (! (*cinfo->coef->compress_data) (cinfo, (JSAMPIMAGE) NULL))
 	ERREXIT(cinfo, JERR_CANT_SUSPEND);
     }
     (*cinfo->master->finish_pass) (cinfo);
diff --git a/jccoefct.c b/jccoefct.c
index 2ca1f37..0de9fcf 100644
--- a/jccoefct.c
+++ b/jccoefct.c
@@ -1,7 +1,7 @@
 /*
  * jccoefct.c
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -34,7 +34,10 @@
 typedef struct {
   struct jpeg_c_coef_controller pub; /* public fields */
 
-  JDIMENSION MCU_row_num;	/* keep track of MCU row # within image */
+  JDIMENSION iMCU_row_num;	/* iMCU row # within image */
+  JDIMENSION mcu_ctr;		/* counts MCUs processed in current row */
+  int MCU_vert_offset;		/* counts MCU rows within iMCU row */
+  int MCU_rows_per_iMCU_row;	/* number of such rows needed */
 
   /* For single-pass compression, it's sufficient to buffer just one MCU
    * (although this may prove a bit slow in practice).  We allocate a
@@ -55,16 +58,40 @@
 
 
 /* Forward declarations */
-METHODDEF void compress_data
-    JPP((j_compress_ptr cinfo, JSAMPIMAGE input_buf, JDIMENSION *in_mcu_ctr));
+METHODDEF boolean compress_data
+    JPP((j_compress_ptr cinfo, JSAMPIMAGE input_buf));
 #ifdef FULL_COEF_BUFFER_SUPPORTED
-METHODDEF void compress_first_pass
-    JPP((j_compress_ptr cinfo, JSAMPIMAGE input_buf, JDIMENSION *in_mcu_ctr));
-METHODDEF void compress_output
-    JPP((j_compress_ptr cinfo, JSAMPIMAGE input_buf, JDIMENSION *in_mcu_ctr));
+METHODDEF boolean compress_first_pass
+    JPP((j_compress_ptr cinfo, JSAMPIMAGE input_buf));
+METHODDEF boolean compress_output
+    JPP((j_compress_ptr cinfo, JSAMPIMAGE input_buf));
 #endif
 
 
+LOCAL void
+start_iMCU_row (j_compress_ptr cinfo)
+/* Reset within-iMCU-row counters for a new row */
+{
+  my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
+
+  /* In an interleaved scan, an MCU row is the same as an iMCU row.
+   * In a noninterleaved scan, an iMCU row has v_samp_factor MCU rows.
+   * But at the bottom of the image, process only what's left.
+   */
+  if (cinfo->comps_in_scan > 1) {
+    coef->MCU_rows_per_iMCU_row = 1;
+  } else {
+    if (coef->iMCU_row_num < (cinfo->total_iMCU_rows-1))
+      coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->v_samp_factor;
+    else
+      coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->last_row_height;
+  }
+
+  coef->mcu_ctr = 0;
+  coef->MCU_vert_offset = 0;
+}
+
+
 /*
  * Initialize for a processing pass.
  */
@@ -74,7 +101,8 @@
 {
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
 
-  coef->MCU_row_num = 0;
+  coef->iMCU_row_num = 0;
+  start_iMCU_row(cinfo);
 
   switch (pass_mode) {
   case JBUF_PASS_THRU:
@@ -103,78 +131,89 @@
 
 /*
  * Process some data in the single-pass case.
- * Up to one MCU row is processed (less if suspension is forced).
+ * We process the equivalent of one fully interleaved MCU row ("iMCU" row)
+ * per call, ie, v_samp_factor block rows for each component in the image.
+ * Returns TRUE if the iMCU row is completed, FALSE if suspended.
  *
  * NB: input_buf contains a plane for each component in image.
  * For single pass, this is the same as the components in the scan.
  */
 
-METHODDEF void
-compress_data (j_compress_ptr cinfo,
-	       JSAMPIMAGE input_buf, JDIMENSION *in_mcu_ctr)
+METHODDEF boolean
+compress_data (j_compress_ptr cinfo, JSAMPIMAGE input_buf)
 {
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
   JDIMENSION MCU_col_num;	/* index of current MCU within row */
   JDIMENSION last_MCU_col = cinfo->MCUs_per_row - 1;
-  JDIMENSION last_MCU_row = cinfo->MCU_rows_in_scan - 1;
-  int blkn, bi, ci, yindex, blockcnt;
+  JDIMENSION last_iMCU_row = cinfo->total_iMCU_rows - 1;
+  int blkn, bi, ci, yindex, yoffset, blockcnt;
   JDIMENSION ypos, xpos;
   jpeg_component_info *compptr;
 
-  /* Loop to write as much as one whole MCU row */
-
-  for (MCU_col_num = *in_mcu_ctr; MCU_col_num <= last_MCU_col; MCU_col_num++) {
-    /* Determine where data comes from in input_buf and do the DCT thing.
-     * Each call on forward_DCT processes a horizontal row of DCT blocks
-     * as wide as an MCU; we rely on having allocated the MCU_buffer[] blocks
-     * sequentially.  Dummy blocks at the right or bottom edge are filled in
-     * specially.  The data in them does not matter for image reconstruction,
-     * so we fill them with values that will encode to the smallest amount of
-     * data, viz: all zeroes in the AC entries, DC entries equal to previous
-     * block's DC value.  (Thanks to Thomas Kinsman for this idea.)
-     */
-    blkn = 0;
-    for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
-      compptr = cinfo->cur_comp_info[ci];
-      blockcnt = (MCU_col_num < last_MCU_col) ? compptr->MCU_width
-					      : compptr->last_col_width;
-      xpos = MCU_col_num * compptr->MCU_sample_width;
-      ypos = 0;
-      for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
-	if (coef->MCU_row_num < last_MCU_row ||
-	    yindex < compptr->last_row_height) {
-	  (*cinfo->fdct->forward_DCT) (cinfo, compptr,
-				       input_buf[ci], coef->MCU_buffer[blkn],
-				       ypos, xpos, (JDIMENSION) blockcnt);
-	  if (blockcnt < compptr->MCU_width) {
-	    /* Create some dummy blocks at the right edge of the image. */
-	    jzero_far((void FAR *) coef->MCU_buffer[blkn + blockcnt],
-		      (compptr->MCU_width - blockcnt) * SIZEOF(JBLOCK));
-	    for (bi = blockcnt; bi < compptr->MCU_width; bi++) {
-	      coef->MCU_buffer[blkn+bi][0][0] = coef->MCU_buffer[blkn+bi-1][0][0];
+  /* Loop to write as much as one whole iMCU row */
+  for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
+       yoffset++) {
+    for (MCU_col_num = coef->mcu_ctr; MCU_col_num <= last_MCU_col;
+	 MCU_col_num++) {
+      /* Determine where data comes from in input_buf and do the DCT thing.
+       * Each call on forward_DCT processes a horizontal row of DCT blocks
+       * as wide as an MCU; we rely on having allocated the MCU_buffer[] blocks
+       * sequentially.  Dummy blocks at the right or bottom edge are filled in
+       * specially.  The data in them does not matter for image reconstruction,
+       * so we fill them with values that will encode to the smallest amount of
+       * data, viz: all zeroes in the AC entries, DC entries equal to previous
+       * block's DC value.  (Thanks to Thomas Kinsman for this idea.)
+       */
+      blkn = 0;
+      for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
+	compptr = cinfo->cur_comp_info[ci];
+	blockcnt = (MCU_col_num < last_MCU_col) ? compptr->MCU_width
+						: compptr->last_col_width;
+	xpos = MCU_col_num * compptr->MCU_sample_width;
+	ypos = yoffset * DCTSIZE; /* ypos == (yoffset+yindex) * DCTSIZE */
+	for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
+	  if (coef->iMCU_row_num < last_iMCU_row ||
+	      yoffset+yindex < compptr->last_row_height) {
+	    (*cinfo->fdct->forward_DCT) (cinfo, compptr,
+					 input_buf[ci], coef->MCU_buffer[blkn],
+					 ypos, xpos, (JDIMENSION) blockcnt);
+	    if (blockcnt < compptr->MCU_width) {
+	      /* Create some dummy blocks at the right edge of the image. */
+	      jzero_far((void FAR *) coef->MCU_buffer[blkn + blockcnt],
+			(compptr->MCU_width - blockcnt) * SIZEOF(JBLOCK));
+	      for (bi = blockcnt; bi < compptr->MCU_width; bi++) {
+		coef->MCU_buffer[blkn+bi][0][0] = coef->MCU_buffer[blkn+bi-1][0][0];
+	      }
+	    }
+	  } else {
+	    /* Create a row of dummy blocks at the bottom of the image. */
+	    jzero_far((void FAR *) coef->MCU_buffer[blkn],
+		      compptr->MCU_width * SIZEOF(JBLOCK));
+	    for (bi = 0; bi < compptr->MCU_width; bi++) {
+	      coef->MCU_buffer[blkn+bi][0][0] = coef->MCU_buffer[blkn-1][0][0];
 	    }
 	  }
-	} else {
-	  /* Create a whole row of dummy blocks at the bottom of the image. */
-	  jzero_far((void FAR *) coef->MCU_buffer[blkn],
-		    compptr->MCU_width * SIZEOF(JBLOCK));
-	  for (bi = 0; bi < compptr->MCU_width; bi++) {
-	    coef->MCU_buffer[blkn+bi][0][0] = coef->MCU_buffer[blkn-1][0][0];
-	  }
+	  blkn += compptr->MCU_width;
+	  ypos += DCTSIZE;
 	}
-	blkn += compptr->MCU_width;
-	ypos += DCTSIZE;
+      }
+      /* Try to write the MCU.  In event of a suspension failure, we will
+       * re-DCT the MCU on restart (a bit inefficient, could be fixed...)
+       */
+      if (! (*cinfo->entropy->encode_mcu) (cinfo, coef->MCU_buffer)) {
+	/* Suspension forced; update state counters and exit */
+	coef->MCU_vert_offset = yoffset;
+	coef->mcu_ctr = MCU_col_num;
+	return FALSE;
       }
     }
-    /* Try to write the MCU.  In event of a suspension failure, we will
-     * re-DCT the MCU on restart (a bit inefficient, could be fixed...)
-     */
-    if (! (*cinfo->entropy->encode_mcu) (cinfo, coef->MCU_buffer))
-      break;			/* suspension forced; exit loop */
+    /* Completed an MCU row, but perhaps not an iMCU row */
+    coef->mcu_ctr = 0;
   }
-  if (MCU_col_num > last_MCU_col)
-    coef->MCU_row_num++;	/* advance if we finished the row */
-  *in_mcu_ctr = MCU_col_num;
+  /* Completed the iMCU row, advance counters for next one */
+  coef->iMCU_row_num++;
+  start_iMCU_row(cinfo);
+  return TRUE;
 }
 
 
@@ -201,12 +240,11 @@
  * at the scan-dependent variables (MCU dimensions, etc).
  */
 
-METHODDEF void
-compress_first_pass (j_compress_ptr cinfo,
-		     JSAMPIMAGE input_buf, JDIMENSION *in_mcu_ctr)
+METHODDEF boolean
+compress_first_pass (j_compress_ptr cinfo, JSAMPIMAGE input_buf)
 {
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
-  JDIMENSION last_MCU_row = cinfo->total_iMCU_rows - 1;
+  JDIMENSION last_iMCU_row = cinfo->total_iMCU_rows - 1;
   JDIMENSION blocks_across, MCUs_across, MCUindex;
   int bi, ci, h_samp_factor, block_row, block_rows, ndummy;
   JCOEF lastDC;
@@ -219,11 +257,12 @@
     /* Align the virtual buffer for this component. */
     buffer = (*cinfo->mem->access_virt_barray)
       ((j_common_ptr) cinfo, coef->whole_image[ci],
-       coef->MCU_row_num * compptr->v_samp_factor, TRUE);
+       coef->iMCU_row_num * compptr->v_samp_factor, TRUE);
     /* Count non-dummy DCT block rows in this iMCU row. */
-    if (coef->MCU_row_num < last_MCU_row)
+    if (coef->iMCU_row_num < last_iMCU_row)
       block_rows = compptr->v_samp_factor;
     else {
+      /* NB: can't use last_row_height here, since may not be set! */
       block_rows = (int) (compptr->height_in_blocks % compptr->v_samp_factor);
       if (block_rows == 0) block_rows = compptr->v_samp_factor;
     }
@@ -257,7 +296,7 @@
      * of the dummy blocks to match the last real block's DC value.
      * This squeezes a few more bytes out of the resulting file...
      */
-    if (coef->MCU_row_num == last_MCU_row) {
+    if (coef->iMCU_row_num == last_iMCU_row) {
       blocks_across += ndummy;	/* include lower right corner */
       MCUs_across = blocks_across / h_samp_factor;
       for (block_row = block_rows; block_row < compptr->v_samp_factor;
@@ -277,10 +316,12 @@
       }
     }
   }
-  /* NB: compress_output will increment MCU_row_num */
+  /* NB: compress_output will increment iMCU_row_num if successful.
+   * A suspension return will result in redoing all the work above next time.
+   */
 
   /* Emit data to the entropy encoder, sharing code with subsequent passes */
-  compress_output(cinfo, input_buf, in_mcu_ctr);
+  return compress_output(cinfo, input_buf);
 }
 
 
@@ -289,22 +330,18 @@
  * We process the equivalent of one fully interleaved MCU row ("iMCU" row)
  * per call, ie, v_samp_factor block rows for each component in the scan.
  * The data is obtained from the virtual arrays and fed to the entropy coder.
- *
- * Note that output suspension is not supported during multi-pass operation,
- * so the complete MCU row will always be emitted to the entropy encoder
- * before returning.
+ * Returns TRUE if the iMCU row is completed, FALSE if suspended.
  *
  * NB: input_buf is ignored; it is likely to be a NULL pointer.
  */
 
-METHODDEF void
-compress_output (j_compress_ptr cinfo,
-		 JSAMPIMAGE input_buf, JDIMENSION *in_mcu_ctr)
+METHODDEF boolean
+compress_output (j_compress_ptr cinfo, JSAMPIMAGE input_buf)
 {
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
   JDIMENSION MCU_col_num;	/* index of current MCU within row */
-  int blkn, ci, xindex, yindex, yoffset, num_MCU_rows;
-  JDIMENSION remaining_rows, start_col;
+  int blkn, ci, xindex, yindex, yoffset;
+  JDIMENSION start_col;
   JBLOCKARRAY buffer[MAX_COMPS_IN_SCAN];
   JBLOCKROW buffer_ptr;
   jpeg_component_info *compptr;
@@ -317,28 +354,14 @@
     compptr = cinfo->cur_comp_info[ci];
     buffer[ci] = (*cinfo->mem->access_virt_barray)
       ((j_common_ptr) cinfo, coef->whole_image[compptr->component_index],
-       coef->MCU_row_num * compptr->v_samp_factor, FALSE);
-  }
-
-  /* In an interleaved scan, we process exactly one MCU row.
-   * In a noninterleaved scan, we need to process v_samp_factor MCU rows,
-   * each of which contains a single block row.
-   */
-  if (cinfo->comps_in_scan == 1) {
-    compptr = cinfo->cur_comp_info[0];
-    num_MCU_rows = compptr->v_samp_factor;
-    /* but watch out for the bottom of the image */
-    remaining_rows = cinfo->MCU_rows_in_scan -
-		     coef->MCU_row_num * compptr->v_samp_factor;
-    if (remaining_rows < (JDIMENSION) num_MCU_rows)
-      num_MCU_rows = (int) remaining_rows;
-  } else {
-    num_MCU_rows = 1;
+       coef->iMCU_row_num * compptr->v_samp_factor, FALSE);
   }
 
   /* Loop to process one whole iMCU row */
-  for (yoffset = 0; yoffset < num_MCU_rows; yoffset++) {
-    for (MCU_col_num = 0; MCU_col_num < cinfo->MCUs_per_row; MCU_col_num++) {
+  for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
+       yoffset++) {
+    for (MCU_col_num = coef->mcu_ctr; MCU_col_num < cinfo->MCUs_per_row;
+	 MCU_col_num++) {
       /* Construct list of pointers to DCT blocks belonging to this MCU */
       blkn = 0;			/* index of current DCT block within MCU */
       for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
@@ -353,13 +376,19 @@
       }
       /* Try to write the MCU. */
       if (! (*cinfo->entropy->encode_mcu) (cinfo, coef->MCU_buffer)) {
-	ERREXIT(cinfo, JERR_CANT_SUSPEND); /* not supported */
+	/* Suspension forced; update state counters and exit */
+	coef->MCU_vert_offset = yoffset;
+	coef->mcu_ctr = MCU_col_num;
+	return FALSE;
       }
     }
+    /* Completed an MCU row, but perhaps not an iMCU row */
+    coef->mcu_ctr = 0;
   }
-
-  coef->MCU_row_num++;		/* advance to next iMCU row */
-  *in_mcu_ctr = cinfo->MCUs_per_row;
+  /* Completed the iMCU row, advance counters for next one */
+  coef->iMCU_row_num++;
+  start_iMCU_row(cinfo);
+  return TRUE;
 }
 
 #endif /* FULL_COEF_BUFFER_SUPPORTED */
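The jccoefct.c changes above replace the caller-supplied MCU counter with state kept inside the coefficient controller, so that a nested row/column loop can stop when the entropy encoder refuses an MCU and resume at the same position on the next call. The following is a stripped-down sketch of that pattern with no JPEG library dependencies. The types resumable_state and sink and the functions emit_unit and process_group are hypothetical stand-ins, not IJG code; the field comments name the jccoefct.c members they loosely correspond to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the coefficient controller's private state. */
typedef struct {
  int rows_per_group;	/* like MCU_rows_per_iMCU_row */
  int cols_per_row;	/* like cinfo->MCUs_per_row */
  int vert_offset;	/* resume row within group, like MCU_vert_offset */
  int col_ctr;		/* resume column within row, like mcu_ctr */
  int group_num;	/* groups completed so far, like iMCU_row_num */
} resumable_state;

/* Hypothetical output sink that accepts units only while budget remains;
 * it plays the role of an entropy encoder that may force suspension.
 */
typedef struct {
  int budget;
  int emitted;
} sink;

static bool
emit_unit (sink *s)
{
  if (s->budget <= 0)
    return false;		/* no room: caller must suspend */
  s->budget--;
  s->emitted++;
  return true;
}

/* Process one group; returns true if completed, false if suspended. */
static bool
process_group (resumable_state *st, sink *s)
{
  int row, col;

  for (row = st->vert_offset; row < st->rows_per_group; row++) {
    for (col = st->col_ctr; col < st->cols_per_row; col++) {
      if (! emit_unit(s)) {
	/* Suspension forced; record where to resume and exit */
	st->vert_offset = row;
	st->col_ctr = col;
	return false;
      }
    }
    st->col_ctr = 0;		/* finished a row, but perhaps not the group */
  }
  /* Finished the group; reset counters for the next one */
  st->group_num++;
  st->vert_offset = 0;
  st->col_ctr = 0;
  return true;
}
```

A caller in this sketch simply calls process_group again after making room in the sink. The unit that failed to emit is retried, which parallels the "re-DCT the MCU on restart" remark in compress_data.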
diff --git a/jcmainct.c b/jcmainct.c
index 5359268..c7315c9 100644
--- a/jcmainct.c
+++ b/jcmainct.c
@@ -1,7 +1,7 @@
 /*
  * jcmainct.c
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -28,9 +28,8 @@
 typedef struct {
   struct jpeg_c_main_controller pub; /* public fields */
 
-  JDIMENSION cur_mcu_row;	/* number of current iMCU row */
+  JDIMENSION cur_iMCU_row;	/* number of current iMCU row */
   JDIMENSION rowgroup_ctr;	/* counts row groups received in iMCU row */
-  JDIMENSION mcu_ctr;		/* counts MCUs output from current row */
   boolean suspended;		/* remember if we suspended output */
   J_BUF_MODE pass_mode;		/* current operating mode */
 
@@ -75,9 +74,8 @@
   if (cinfo->raw_data_in)
     return;
 
-  main->cur_mcu_row = 0;	/* initialize counters */
+  main->cur_iMCU_row = 0;	/* initialize counters */
   main->rowgroup_ctr = 0;
-  main->mcu_ctr = 0;
   main->suspended = FALSE;
   main->pass_mode = pass_mode;	/* save mode for use by process_data */
 
@@ -118,7 +116,7 @@
 {
   my_main_ptr main = (my_main_ptr) cinfo->main;
 
-  while (main->cur_mcu_row < cinfo->total_iMCU_rows) {
+  while (main->cur_iMCU_row < cinfo->total_iMCU_rows) {
     /* Read input data if we haven't filled the main buffer yet */
     if (main->rowgroup_ctr < DCTSIZE)
       (*cinfo->prep->pre_process_data) (cinfo,
@@ -134,15 +132,13 @@
       return;
 
     /* Send the completed row to the compressor */
-    (*cinfo->coef->compress_data) (cinfo, main->buffer, &main->mcu_ctr);
-
-    /* If compressor did not consume the whole row, then we must need to
-     * suspend processing and return to the application.  In this situation
-     * we pretend we didn't yet consume the last input row; otherwise, if
-     * it happened to be the last row of the image, the application would
-     * think we were done.
-     */
-    if (main->mcu_ctr < cinfo->MCUs_per_row) {
+    if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) {
+      /* If compressor did not consume the whole row, then we must
+       * suspend processing and return to the application.  In this situation
+       * we pretend we didn't yet consume the last input row; otherwise, if
+       * it happened to be the last row of the image, the application would
+       * think we were done.
+       */
       if (! main->suspended) {
 	(*in_row_ctr)--;
 	main->suspended = TRUE;
@@ -156,9 +152,8 @@
       (*in_row_ctr)++;
       main->suspended = FALSE;
     }
-    main->mcu_ctr = 0;
     main->rowgroup_ctr = 0;
-    main->cur_mcu_row++;
+    main->cur_iMCU_row++;
   }
 }
 
@@ -180,14 +175,14 @@
   jpeg_component_info *compptr;
   boolean writing = (main->pass_mode != JBUF_CRANK_DEST);
 
-  while (main->cur_mcu_row < cinfo->total_iMCU_rows) {
+  while (main->cur_iMCU_row < cinfo->total_iMCU_rows) {
     /* Realign the virtual buffers if at the start of an iMCU row. */
     if (main->rowgroup_ctr == 0) {
       for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
 	   ci++, compptr++) {
 	main->buffer[ci] = (*cinfo->mem->access_virt_sarray)
 	  ((j_common_ptr) cinfo, main->whole_image[ci],
-	   main->cur_mcu_row * (compptr->v_samp_factor * DCTSIZE), writing);
+	   main->cur_iMCU_row * (compptr->v_samp_factor * DCTSIZE), writing);
       }
       /* In a read pass, pretend we just read some source data. */
       if (! writing) {
@@ -210,14 +205,13 @@
 
     /* Emit data, unless this is a sink-only pass. */
     if (main->pass_mode != JBUF_SAVE_SOURCE) {
-      (*cinfo->coef->compress_data) (cinfo, main->buffer, &main->mcu_ctr);
-      /* If compressor did not consume the whole row, then we must need to
-       * suspend processing and return to the application.  In this situation
-       * we pretend we didn't yet consume the last input row; otherwise, if
-       * it happened to be the last row of the image, the application would
-       * think we were done.
-       */
-      if (main->mcu_ctr < cinfo->MCUs_per_row) {
+      if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) {
+	/* If compressor did not consume the whole row, then we must
+	 * suspend processing and return to the application.  In this situation
+	 * we pretend we didn't yet consume the last input row; otherwise, if
+	 * it happened to be the last row of the image, the application would
+	 * think we were done.
+	 */
 	if (! main->suspended) {
 	  (*in_row_ctr)--;
 	  main->suspended = TRUE;
@@ -234,9 +228,8 @@
     }
 
     /* If get here, we are done with this iMCU row.  Mark buffer empty. */
-    main->mcu_ctr = 0;
     main->rowgroup_ctr = 0;
-    main->cur_mcu_row++;
+    main->cur_iMCU_row++;
   }
 }
 
diff --git a/jcmaster.c b/jcmaster.c
index d9b5c03..6e6a11a 100644
--- a/jcmaster.c
+++ b/jcmaster.c
@@ -1,7 +1,7 @@
 /*
  * jcmaster.c
  *
- * Copyright (C) 1991-1994, Thomas G. Lane.
+ * Copyright (C) 1991-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -134,7 +134,12 @@
     compptr->MCU_blocks = 1;
     compptr->MCU_sample_width = DCTSIZE;
     compptr->last_col_width = 1;
-    compptr->last_row_height = 1;
+    /* For noninterleaved scans, it is convenient to define last_row_height
+     * as the number of block rows present in the last iMCU row.
+     */
+    tmp = (int) (compptr->height_in_blocks % compptr->v_samp_factor);
+    if (tmp == 0) tmp = compptr->v_samp_factor;
+    compptr->last_row_height = tmp;
     
     /* Prepare array describing MCU composition */
     cinfo->blocks_in_MCU = 1;
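Reviewer note (commentary between file diffs, not part of the patch): the
noninterleaved last_row_height rule introduced above can be checked in
isolation.  What follows is a minimal sketch, assuming a hypothetical helper
name (demo_last_row_height) and plain integer arguments; the library itself
computes the value inline from the jpeg_component_info fields
height_in_blocks and v_samp_factor.

```c
/* Hypothetical stand-alone helper; not part of the IJG library.
 * Returns the number of real (non-dummy) block rows in the last iMCU row
 * of a noninterleaved scan.  Assumes v_samp_factor >= 1.
 */
int
demo_last_row_height (unsigned int height_in_blocks, int v_samp_factor)
{
  int tmp = (int) (height_in_blocks % (unsigned int) v_samp_factor);

  if (tmp == 0)
    tmp = v_samp_factor;	/* the last iMCU row is completely full */
  return tmp;
}
```

For example, a component 5 blocks high with v_samp_factor 2 has one real
block row in its last iMCU row, while one 6 blocks high has two.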
diff --git a/jconfig.bcc b/jconfig.bcc
index 3271ee6..c6c53ff 100644
--- a/jconfig.bcc
+++ b/jconfig.bcc
@@ -1,4 +1,4 @@
-/* jconfig.bcc --- jconfig.h for Borland C (Turbo C) on MS-DOS. */
+/* jconfig.bcc --- jconfig.h for Borland C (Turbo C) on MS-DOS or OS/2. */
 /* see jconfig.doc for explanations */
 
 #define HAVE_PROTOTYPES
@@ -11,19 +11,21 @@
 #define HAVE_STDLIB_H
 #undef NEED_BSD_STRINGS
 #undef NEED_SYS_TYPES_H
+#ifdef __MSDOS__
 #define NEED_FAR_POINTERS	/* for small or medium memory model */
+#endif
 #undef NEED_SHORT_EXTERNAL_NAMES
-#define INCOMPLETE_TYPES_BROKEN	/* suppress undefined-structure warnings */
+#undef INCOMPLETE_TYPES_BROKEN	/* this assumes you have -w-stu in CFLAGS */
 
 #ifdef JPEG_INTERNALS
 
 #undef RIGHT_SHIFT_IS_UNSIGNED
 
+#ifdef __MSDOS__
 #define USE_MSDOS_MEMMGR	/* Define this if you use jmemdos.c */
-
 #define MAX_ALLOC_CHUNK 65520L	/* Maximum request to malloc() */
-
 #define USE_FMEM		/* Borland has _fmemcpy() and _fmemset() */
+#endif
 
 #endif /* JPEG_INTERNALS */
 
@@ -37,7 +39,9 @@
 
 #define TWO_FILE_COMMANDLINE
 #define USE_SETMODE		/* Borland has setmode() */
+#ifdef __MSDOS__
 #define NEED_SIGNAL_CATCHER	/* Define this if you use jmemdos.c */
+#endif
 #undef DONT_USE_B_MODE
 #undef PROGRESS_REPORT		/* optional */
 
diff --git a/jdapi.c b/jdapi.c
index b34b701..df399dd 100644
--- a/jdapi.c
+++ b/jdapi.c
@@ -1,7 +1,7 @@
 /*
  * jdapi.c
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -345,14 +345,14 @@
 
 /*
  * Alternate entry point to read raw data.
- * Processes exactly one MCU row per call.
+ * Processes exactly one iMCU row per call, unless suspended.
  */
 
 GLOBAL JDIMENSION
 jpeg_read_raw_data (j_decompress_ptr cinfo, JSAMPIMAGE data,
 		    JDIMENSION max_lines)
 {
-  JDIMENSION lines_per_MCU_row;
+  JDIMENSION lines_per_iMCU_row;
 
   if (cinfo->global_state != DSTATE_RAW_OK)
     ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
@@ -368,18 +368,18 @@
     (*cinfo->progress->progress_monitor) ((j_common_ptr) cinfo);
   }
 
-  /* Verify that at least one MCU row can be returned. */
-  lines_per_MCU_row = cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size;
-  if (max_lines < lines_per_MCU_row)
+  /* Verify that at least one iMCU row can be returned. */
+  lines_per_iMCU_row = cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size;
+  if (max_lines < lines_per_iMCU_row)
     ERREXIT(cinfo, JERR_BUFFER_SIZE);
 
   /* Decompress directly into user's buffer. */
   if (! (*cinfo->coef->decompress_data) (cinfo, data))
     return 0;			/* suspension forced, can do nothing more */
 
-  /* OK, we processed one MCU row. */
-  cinfo->output_scanline += lines_per_MCU_row;
-  return lines_per_MCU_row;
+  /* OK, we processed one iMCU row. */
+  cinfo->output_scanline += lines_per_iMCU_row;
+  return lines_per_iMCU_row;
 }
 
 
diff --git a/jdcoefct.c b/jdcoefct.c
index 19790c5..03f6690 100644
--- a/jdcoefct.c
+++ b/jdcoefct.c
@@ -1,7 +1,7 @@
 /*
  * jdcoefct.c
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -20,8 +20,10 @@
 typedef struct {
   struct jpeg_d_coef_controller pub; /* public fields */
 
-  JDIMENSION MCU_col_num;	/* saves next MCU column to process */
-  JDIMENSION MCU_row_num;	/* keep track of MCU row # within image */
+  JDIMENSION iMCU_row_num;	/* iMCU row # within image */
+  JDIMENSION mcu_ctr;		/* counts MCUs processed in current row */
+  int MCU_vert_offset;		/* counts MCU rows within iMCU row */
+  int MCU_rows_per_iMCU_row;	/* number of such rows needed */
 
   /* In single-pass modes without block smoothing, it's sufficient to buffer
    * just one MCU (although this may prove a bit slow in practice).
@@ -53,6 +55,30 @@
 #endif
 
 
+LOCAL void
+start_iMCU_row (j_decompress_ptr cinfo)
+/* Reset within-iMCU-row counters for a new row */
+{
+  my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
+
+  /* In an interleaved scan, an MCU row is the same as an iMCU row.
+   * In a noninterleaved scan, an iMCU row has v_samp_factor MCU rows.
+   * But at the bottom of the image, process only what's left.
+   */
+  if (cinfo->comps_in_scan > 1) {
+    coef->MCU_rows_per_iMCU_row = 1;
+  } else {
+    if (coef->iMCU_row_num < (cinfo->total_iMCU_rows-1))
+      coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->v_samp_factor;
+    else
+      coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->last_row_height;
+  }
+
+  coef->mcu_ctr = 0;
+  coef->MCU_vert_offset = 0;
+}
+
+
 /*
  * Initialize for a processing pass.
  */
@@ -62,8 +88,8 @@
 {
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
 
-  coef->MCU_col_num = 0;
-  coef->MCU_row_num = 0;
+  coef->iMCU_row_num = 0;
+  start_iMCU_row(cinfo);
 
   switch (pass_mode) {
   case JBUF_PASS_THRU:
@@ -105,65 +131,67 @@
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
   JDIMENSION MCU_col_num;	/* index of current MCU within row */
   JDIMENSION last_MCU_col = cinfo->MCUs_per_row - 1;
-  JDIMENSION last_MCU_row = cinfo->MCU_rows_in_scan - 1;
-  int blkn, ci, xindex, yindex, useful_width;
+  JDIMENSION last_iMCU_row = cinfo->total_iMCU_rows - 1;
+  int blkn, ci, xindex, yindex, yoffset, useful_width;
   JSAMPARRAY output_ptr;
   JDIMENSION start_col, output_col;
   jpeg_component_info *compptr;
   inverse_DCT_method_ptr inverse_DCT;
 
-  /* Loop to process as much as one whole MCU row */
-
-  for (MCU_col_num = coef->MCU_col_num; MCU_col_num <= last_MCU_col;
-       MCU_col_num++) {
-
-    /* Try to fetch an MCU.  Entropy decoder expects buffer to be zeroed. */
-    jzero_far((void FAR *) coef->MCU_buffer[0],
-	      (size_t) (cinfo->blocks_in_MCU * SIZEOF(JBLOCK)));
-    if (! (*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) {
-      /* Suspension forced; return with row unfinished */
-      coef->MCU_col_num = MCU_col_num; /* update my state */
-      return FALSE;
-    }
-
-    /* Determine where data should go in output_buf and do the IDCT thing.
-     * We skip dummy blocks at the right and bottom edges (but blkn gets
-     * incremented past them!).  Note the inner loop relies on having
-     * allocated the MCU_buffer[] blocks sequentially.
-     */
-    blkn = 0;			/* index of current DCT block within MCU */
-    for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
-      compptr = cinfo->cur_comp_info[ci];
-      /* Don't bother to IDCT an uninteresting component. */
-      if (! compptr->component_needed) {
-	blkn += compptr->MCU_blocks;
-	continue;
+  /* Loop to process as much as one whole iMCU row */
+  for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
+       yoffset++) {
+    for (MCU_col_num = coef->mcu_ctr; MCU_col_num <= last_MCU_col;
+	 MCU_col_num++) {
+      /* Try to fetch an MCU.  Entropy decoder expects buffer to be zeroed. */
+      jzero_far((void FAR *) coef->MCU_buffer[0],
+		(size_t) (cinfo->blocks_in_MCU * SIZEOF(JBLOCK)));
+      if (! (*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) {
+	/* Suspension forced; update state counters and exit */
+	coef->MCU_vert_offset = yoffset;
+	coef->mcu_ctr = MCU_col_num;
+	return FALSE;
       }
-      inverse_DCT = cinfo->idct->inverse_DCT[compptr->component_index];
-      useful_width = (MCU_col_num < last_MCU_col) ? compptr->MCU_width
-						  : compptr->last_col_width;
-      output_ptr = output_buf[ci];
-      start_col = MCU_col_num * compptr->MCU_sample_width;
-      for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
-	if (coef->MCU_row_num < last_MCU_row ||
-	    yindex < compptr->last_row_height) {
-	  output_col = start_col;
-	  for (xindex = 0; xindex < useful_width; xindex++) {
-	    (*inverse_DCT) (cinfo, compptr,
-			    (JCOEFPTR) coef->MCU_buffer[blkn+xindex],
-			    output_ptr, output_col);
-	    output_col += compptr->DCT_scaled_size;
-	  }
+      /* Determine where data should go in output_buf and do the IDCT thing.
+       * We skip dummy blocks at the right and bottom edges (but blkn gets
+       * incremented past them!).  Note the inner loop relies on having
+       * allocated the MCU_buffer[] blocks sequentially.
+       */
+      blkn = 0;			/* index of current DCT block within MCU */
+      for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
+	compptr = cinfo->cur_comp_info[ci];
+	/* Don't bother to IDCT an uninteresting component. */
+	if (! compptr->component_needed) {
+	  blkn += compptr->MCU_blocks;
+	  continue;
 	}
-	blkn += compptr->MCU_width;
-	output_ptr += compptr->DCT_scaled_size;
+	inverse_DCT = cinfo->idct->inverse_DCT[compptr->component_index];
+	useful_width = (MCU_col_num < last_MCU_col) ? compptr->MCU_width
+						    : compptr->last_col_width;
+	output_ptr = output_buf[ci] + yoffset * compptr->DCT_scaled_size;
+	start_col = MCU_col_num * compptr->MCU_sample_width;
+	for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
+	  if (coef->iMCU_row_num < last_iMCU_row ||
+	      yoffset+yindex < compptr->last_row_height) {
+	    output_col = start_col;
+	    for (xindex = 0; xindex < useful_width; xindex++) {
+	      (*inverse_DCT) (cinfo, compptr,
+			      (JCOEFPTR) coef->MCU_buffer[blkn+xindex],
+			      output_ptr, output_col);
+	      output_col += compptr->DCT_scaled_size;
+	    }
+	  }
+	  blkn += compptr->MCU_width;
+	  output_ptr += compptr->DCT_scaled_size;
+	}
       }
     }
+    /* Completed an MCU row, but perhaps not an iMCU row */
+    coef->mcu_ctr = 0;
   }
-
-  /* We finished the row successfully */
-  coef->MCU_col_num = 0;	/* prepare for next row */
-  coef->MCU_row_num++;
+  /* Completed the iMCU row, advance counters for next one */
+  coef->iMCU_row_num++;
+  start_iMCU_row(cinfo);
   return TRUE;
 }
 
@@ -175,9 +203,7 @@
  * We read the equivalent of one fully interleaved MCU row ("iMCU" row)
  * per call, ie, v_samp_factor block rows for each component in the scan.
  * No data is returned; we just stash it in the virtual arrays.
- *
  * Returns TRUE if it completed a row, FALSE if not (suspension).
- * Currently, the suspension case is not supported.
  */
 
 METHODDEF boolean
@@ -185,8 +211,8 @@
 {
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
   JDIMENSION MCU_col_num;	/* index of current MCU within row */
-  int blkn, ci, xindex, yindex, yoffset, num_MCU_rows;
-  JDIMENSION total_width, remaining_rows, start_col;
+  int blkn, ci, xindex, yindex, yoffset;
+  JDIMENSION total_width, start_col;
   JBLOCKARRAY buffer[MAX_COMPS_IN_SCAN];
   JBLOCKROW buffer_ptr;
   jpeg_component_info *compptr;
@@ -196,7 +222,7 @@
     compptr = cinfo->cur_comp_info[ci];
     buffer[ci] = (*cinfo->mem->access_virt_barray)
       ((j_common_ptr) cinfo, coef->whole_image[compptr->component_index],
-       coef->MCU_row_num * compptr->v_samp_factor, TRUE);
+       coef->iMCU_row_num * compptr->v_samp_factor, TRUE);
     /* Entropy decoder expects buffer to be zeroed. */
     total_width = (JDIMENSION) jround_up((long) compptr->width_in_blocks,
 					 (long) compptr->h_samp_factor);
@@ -206,25 +232,11 @@
     }
   }
 
-  /* In an interleaved scan, we process exactly one MCU row.
-   * In a noninterleaved scan, we need to process v_samp_factor MCU rows,
-   * each of which contains a single block row.
-   */
-  if (cinfo->comps_in_scan == 1) {
-    compptr = cinfo->cur_comp_info[0];
-    num_MCU_rows = compptr->v_samp_factor;
-    /* but watch out for the bottom of the image */
-    remaining_rows = cinfo->MCU_rows_in_scan -
-		     coef->MCU_row_num * compptr->v_samp_factor;
-    if (remaining_rows < (JDIMENSION) num_MCU_rows)
-      num_MCU_rows = (int) remaining_rows;
-  } else {
-    num_MCU_rows = 1;
-  }
-
   /* Loop to process one whole iMCU row */
-  for (yoffset = 0; yoffset < num_MCU_rows; yoffset++) {
-    for (MCU_col_num = 0; MCU_col_num < cinfo->MCUs_per_row; MCU_col_num++) {
+  for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
+       yoffset++) {
+    for (MCU_col_num = coef->mcu_ctr; MCU_col_num < cinfo->MCUs_per_row;
+	 MCU_col_num++) {
       /* Construct list of pointers to DCT blocks belonging to this MCU */
       blkn = 0;			/* index of current DCT block within MCU */
       for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
@@ -239,12 +251,18 @@
       }
       /* Try to fetch the MCU. */
       if (! (*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) {
-	ERREXIT(cinfo, JERR_CANT_SUSPEND); /* not supported */
+	/* Suspension forced; update state counters and exit */
+	coef->MCU_vert_offset = yoffset;
+	coef->mcu_ctr = MCU_col_num;
+	return FALSE;
       }
     }
+    /* Completed an MCU row, but perhaps not an iMCU row */
+    coef->mcu_ctr = 0;
   }
-
-  coef->MCU_row_num++;
+  /* Completed the iMCU row, advance counters for next one */
+  coef->iMCU_row_num++;
+  start_iMCU_row(cinfo);
   return TRUE;
 }
 
@@ -261,7 +279,7 @@
 decompress_output (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
 {
   my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
-  JDIMENSION last_MCU_row = cinfo->total_iMCU_rows - 1;
+  JDIMENSION last_iMCU_row = cinfo->total_iMCU_rows - 1;
   JDIMENSION block_num;
   int ci, block_row, block_rows;
   JBLOCKARRAY buffer;
@@ -279,11 +297,12 @@
     /* Align the virtual buffer for this component. */
     buffer = (*cinfo->mem->access_virt_barray)
       ((j_common_ptr) cinfo, coef->whole_image[ci],
-       coef->MCU_row_num * compptr->v_samp_factor, FALSE);
+       coef->iMCU_row_num * compptr->v_samp_factor, FALSE);
     /* Count non-dummy DCT block rows in this iMCU row. */
-    if (coef->MCU_row_num < last_MCU_row)
+    if (coef->iMCU_row_num < last_iMCU_row)
       block_rows = compptr->v_samp_factor;
     else {
+      /* NB: can't use last_row_height here, since it may not be set! */
       block_rows = (int) (compptr->height_in_blocks % compptr->v_samp_factor);
       if (block_rows == 0) block_rows = compptr->v_samp_factor;
     }
@@ -303,7 +322,7 @@
     }
   }
 
-  coef->MCU_row_num++;
+  coef->iMCU_row_num++;
   return TRUE;
 }
 
diff --git a/jdhuff.c b/jdhuff.c
index e92ad9a..939bbad 100644
--- a/jdhuff.c
+++ b/jdhuff.c
@@ -142,6 +142,7 @@
 
   /* Initialize private state variables */
   entropy->saved.bits_left = 0;
+  entropy->saved.get_buffer = 0; /* unnecessary, but keeps Purify quiet */
   entropy->printed_eod = FALSE;
 
   /* Initialize restart counter */
diff --git a/jdmaster.c b/jdmaster.c
index ca579da..1a49619 100644
--- a/jdmaster.c
+++ b/jdmaster.c
@@ -1,7 +1,7 @@
 /*
  * jdmaster.c
  *
- * Copyright (C) 1991-1994, Thomas G. Lane.
+ * Copyright (C) 1991-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -256,7 +256,12 @@
     compptr->MCU_blocks = 1;
     compptr->MCU_sample_width = compptr->DCT_scaled_size;
     compptr->last_col_width = 1;
-    compptr->last_row_height = 1;
+    /* For noninterleaved scans, it is convenient to define last_row_height
+     * as the number of block rows present in the last iMCU row.
+     */
+    tmp = (int) (compptr->height_in_blocks % compptr->v_samp_factor);
+    if (tmp == 0) tmp = compptr->v_samp_factor;
+    compptr->last_row_height = tmp;
     
     /* Prepare array describing MCU composition */
     cinfo->blocks_in_MCU = 1;
diff --git a/jpegint.h b/jpegint.h
index 5ae3a9f..b4eec11 100644
--- a/jpegint.h
+++ b/jpegint.h
@@ -1,7 +1,7 @@
 /*
  * jpegint.h
  *
- * Copyright (C) 1991-1994, Thomas G. Lane.
+ * Copyright (C) 1991-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -70,9 +70,8 @@
 /* Coefficient buffer control */
 struct jpeg_c_coef_controller {
   JMETHOD(void, start_pass, (j_compress_ptr cinfo, J_BUF_MODE pass_mode));
-  JMETHOD(void, compress_data, (j_compress_ptr cinfo,
-				JSAMPIMAGE input_buf,
-				JDIMENSION *in_mcu_ctr));
+  JMETHOD(boolean, compress_data, (j_compress_ptr cinfo,
+				   JSAMPIMAGE input_buf));
 };
 
 /* Colorspace conversion */
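Reviewer note (commentary between file diffs, not part of the patch): with
compress_data now returning boolean, both coefficient controllers follow the
same resumable-loop pattern: save the MCU row offset and MCU column on
suspension, then restart from those counters on the next call.  The sketch
below illustrates only that pattern.  Every name beginning with demo_ is
hypothetical, and the step callback stands in for the entropy coder's
per-MCU method; none of this is the library's actual code.

```c
/* Hypothetical stand-ins for the controller's saved state. */
typedef struct {
  int mcu_ctr;			/* next MCU column to process */
  int MCU_vert_offset;		/* next MCU row within the iMCU row */
  int MCU_rows_per_iMCU_row;	/* MCU rows making up one iMCU row */
  int MCUs_per_row;		/* MCU columns per MCU row */
} demo_coef_state;

/* Per-MCU step: returns 0 to request suspension, nonzero on success. */
typedef int (*demo_mcu_fn) (void *ctx, int yoffset, int col);

/* Returns 1 if the whole iMCU row was processed, 0 if suspended. */
int
demo_process_imcu_row (demo_coef_state *st, demo_mcu_fn step, void *ctx)
{
  int yoffset, col;

  for (yoffset = st->MCU_vert_offset; yoffset < st->MCU_rows_per_iMCU_row;
       yoffset++) {
    for (col = st->mcu_ctr; col < st->MCUs_per_row; col++) {
      if (! (*step) (ctx, yoffset, col)) {
	/* Suspension forced; remember where to resume */
	st->MCU_vert_offset = yoffset;
	st->mcu_ctr = col;
	return 0;
      }
    }
    st->mcu_ctr = 0;		/* finished an MCU row */
  }
  st->MCU_vert_offset = 0;	/* finished the iMCU row */
  st->mcu_ctr = 0;
  return 1;
}

/* Example step for experimentation: suspends once, at call number fail_at. */
typedef struct { int calls; int fail_at; } demo_step_ctx;

int
demo_step (void *ctx, int yoffset, int col)
{
  demo_step_ctx *c = (demo_step_ctx *) ctx;

  (void) yoffset;
  (void) col;
  if (c->calls == c->fail_at) {
    c->fail_at = -1;		/* succeed when retried */
    return 0;
  }
  c->calls++;
  return 1;
}
```

The caller simply repeats the call with the same input until it returns
nonzero; no MCU is processed twice, because the failed MCU is the first one
retried.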
diff --git a/jquant1.c b/jquant1.c
index 9a9e63a..924b409 100644
--- a/jquant1.c
+++ b/jquant1.c
@@ -1,7 +1,7 @@
 /*
  * jquant1.c
  *
- * Copyright (C) 1991-1994, Thomas G. Lane.
+ * Copyright (C) 1991-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -77,26 +77,26 @@
 typedef int (*ODITHER_MATRIX_PTR)[ODITHER_SIZE];
 
 static const UINT8 base_dither_matrix[ODITHER_SIZE][ODITHER_SIZE] = {
-    /* Bayer's order-4 dither array.  Generated by the code given in
-     * Stephen Hawley's article "Ordered Dithering" in Graphics Gems I.
-     * The values in this array must range from 0 to ODITHER_CELLS-1.
-     */
-      0,192, 48,240, 12,204, 60,252,  3,195, 51,243, 15,207, 63,255,
-    128, 64,176,112,140, 76,188,124,131, 67,179,115,143, 79,191,127,
-     32,224, 16,208, 44,236, 28,220, 35,227, 19,211, 47,239, 31,223,
-    160, 96,144, 80,172,108,156, 92,163, 99,147, 83,175,111,159, 95,
-      8,200, 56,248,  4,196, 52,244, 11,203, 59,251,  7,199, 55,247,
-    136, 72,184,120,132, 68,180,116,139, 75,187,123,135, 71,183,119,
-     40,232, 24,216, 36,228, 20,212, 43,235, 27,219, 39,231, 23,215,
-    168,104,152, 88,164,100,148, 84,171,107,155, 91,167,103,151, 87,
-      2,194, 50,242, 14,206, 62,254,  1,193, 49,241, 13,205, 61,253,
-    130, 66,178,114,142, 78,190,126,129, 65,177,113,141, 77,189,125,
-     34,226, 18,210, 46,238, 30,222, 33,225, 17,209, 45,237, 29,221,
-    162, 98,146, 82,174,110,158, 94,161, 97,145, 81,173,109,157, 93,
-     10,202, 58,250,  6,198, 54,246,  9,201, 57,249,  5,197, 53,245,
-    138, 74,186,122,134, 70,182,118,137, 73,185,121,133, 69,181,117,
-     42,234, 26,218, 38,230, 22,214, 41,233, 25,217, 37,229, 21,213,
-    170,106,154, 90,166,102,150, 86,169,105,153, 89,165,101,149, 85
+  /* Bayer's order-4 dither array.  Generated by the code given in
+   * Stephen Hawley's article "Ordered Dithering" in Graphics Gems I.
+   * The values in this array must range from 0 to ODITHER_CELLS-1.
+   */
+  {   0,192, 48,240, 12,204, 60,252,  3,195, 51,243, 15,207, 63,255 },
+  { 128, 64,176,112,140, 76,188,124,131, 67,179,115,143, 79,191,127 },
+  {  32,224, 16,208, 44,236, 28,220, 35,227, 19,211, 47,239, 31,223 },
+  { 160, 96,144, 80,172,108,156, 92,163, 99,147, 83,175,111,159, 95 },
+  {   8,200, 56,248,  4,196, 52,244, 11,203, 59,251,  7,199, 55,247 },
+  { 136, 72,184,120,132, 68,180,116,139, 75,187,123,135, 71,183,119 },
+  {  40,232, 24,216, 36,228, 20,212, 43,235, 27,219, 39,231, 23,215 },
+  { 168,104,152, 88,164,100,148, 84,171,107,155, 91,167,103,151, 87 },
+  {   2,194, 50,242, 14,206, 62,254,  1,193, 49,241, 13,205, 61,253 },
+  { 130, 66,178,114,142, 78,190,126,129, 65,177,113,141, 77,189,125 },
+  {  34,226, 18,210, 46,238, 30,222, 33,225, 17,209, 45,237, 29,221 },
+  { 162, 98,146, 82,174,110,158, 94,161, 97,145, 81,173,109,157, 93 },
+  {  10,202, 58,250,  6,198, 54,246,  9,201, 57,249,  5,197, 53,245 },
+  { 138, 74,186,122,134, 70,182,118,137, 73,185,121,133, 69,181,117 },
+  {  42,234, 26,218, 38,230, 22,214, 41,233, 25,217, 37,229, 21,213 },
+  { 170,106,154, 90,166,102,150, 86,169,105,153, 89,165,101,149, 85 }
 };
 
 
@@ -184,6 +184,7 @@
   int nc = cinfo->out_color_components; /* number of color components */
   int max_colors = cinfo->desired_number_of_colors;
   int total_colors, iroot, i, j;
+  boolean changed;
   long temp;
   static const int RGB_order[3] = { RGB_GREEN, RGB_RED, RGB_BLUE };
 
@@ -210,18 +211,24 @@
   }
   /* We may be able to increment the count for one or more components without
    * exceeding max_colors, though we know not all can be incremented.
+   * Sometimes, the first component can be incremented more than once!
+   * (Example: for 16 colors, we start at 2*2*2, go to 3*2*2, then 4*2*2.)
    * In RGB colorspace, try to increment G first, then R, then B.
    */
-  for (i = 0; i < nc; i++) {
-    j = (cinfo->out_color_space == JCS_RGB ? RGB_order[i] : i);
-    /* calculate new total_colors if Ncolors[j] is incremented */
-    temp = total_colors / Ncolors[j];
-    temp *= Ncolors[j]+1;	/* done in long arith to avoid oflo */
-    if (temp > (long) max_colors)
-      break;			/* won't fit, done */
-    Ncolors[j]++;		/* OK, apply the increment */
-    total_colors = (int) temp;
-  }
+  do {
+    changed = FALSE;
+    for (i = 0; i < nc; i++) {
+      j = (cinfo->out_color_space == JCS_RGB ? RGB_order[i] : i);
+      /* calculate new total_colors if Ncolors[j] is incremented */
+      temp = total_colors / Ncolors[j];
+      temp *= Ncolors[j]+1;	/* done in long arith to avoid oflo */
+      if (temp > (long) max_colors)
+	break;			/* won't fit, done with this pass */
+      Ncolors[j]++;		/* OK, apply the increment */
+      total_colors = (int) temp;
+      changed = TRUE;
+    }
+  } while (changed);
 
   return total_colors;
 }
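Reviewer note (commentary between file diffs, not part of the patch): the
effect of the new do/while loop is easiest to see on the 16-color case named
in the comment.  The sketch below is a simplified stand-alone version under
stated assumptions: the function name demo_select_ncolors is hypothetical,
components are incremented in index order (the library uses G, R, B order
for RGB output), and it returns 0 where the library would raise an error.

```c
/* Hypothetical stand-alone version of the color-count selection.
 * Fills Ncolors[0..nc-1] and returns the total number of colors,
 * or 0 if max_colors is too small for 2 levels per component.
 */
int
demo_select_ncolors (int Ncolors[], int nc, int max_colors)
{
  int total_colors, iroot, i;
  int changed;
  long temp;

  /* Find the largest iroot such that iroot**nc <= max_colors. */
  iroot = 1;
  do {
    iroot++;
    temp = iroot;
    for (i = 1; i < nc; i++)
      temp *= iroot;
  } while (temp <= (long) max_colors);
  iroot--;
  if (iroot < 2)
    return 0;			/* simplification: library errors out here */

  total_colors = 1;
  for (i = 0; i < nc; i++) {
    Ncolors[i] = iroot;
    total_colors *= iroot;
  }

  /* Keep making passes until a pass applies no increment at all. */
  do {
    changed = 0;
    for (i = 0; i < nc; i++) {
      temp = total_colors / Ncolors[i];
      temp *= Ncolors[i] + 1;	/* long arithmetic to avoid overflow */
      if (temp > (long) max_colors)
	break;			/* won't fit, done with this pass */
      Ncolors[i]++;
      total_colors = (int) temp;
      changed = 1;
    }
  } while (changed);

  return total_colors;
}
```

With nc = 3 and max_colors = 16 this starts at 2*2*2, reaches 3*2*2 on the
first pass and 4*2*2 on the second, using all 16 colors.  The single-pass
loop removed by this patch would have stopped at 12.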
diff --git a/jversion.h b/jversion.h
index 18b03b6..ad65928 100644
--- a/jversion.h
+++ b/jversion.h
@@ -1,7 +1,7 @@
 /*
  * jversion.h
  *
- * Copyright (C) 1991-1994, Thomas G. Lane.
+ * Copyright (C) 1991-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -9,6 +9,6 @@
  */
 
 
-#define JVERSION	"5a  7-Dec-94"
+#define JVERSION	"5b  15-Mar-95"
 
-#define JCOPYRIGHT	"Copyright (C) 1994, Thomas G. Lane"
+#define JCOPYRIGHT	"Copyright (C) 1995, Thomas G. Lane"
diff --git a/libjpeg.doc b/libjpeg.doc
index 9028053..8c0673e 100644
--- a/libjpeg.doc
+++ b/libjpeg.doc
@@ -1,6 +1,6 @@
 USING THE IJG JPEG LIBRARY
 
-Copyright (C) 1994, Thomas G. Lane.
+Copyright (C) 1994-1995, Thomas G. Lane.
 This file is part of the Independent JPEG Group's software.
 For conditions of distribution and use, see the accompanying README file.
 
@@ -189,7 +189,10 @@
 pixel having the same number of "component" values (color channels).  You
 must specify how many components there are and the colorspace interpretation
 of the components.  Most applications will use RGB data (three components
-per pixel) or grayscale data (one component per pixel).
+per pixel) or grayscale data (one component per pixel).  PLEASE NOTE THAT
+RGB DATA IS THREE SAMPLES PER PIXEL, GRAYSCALE ONLY ONE.  A remarkable
+number of people manage to miss this, only to find that their programs don't
+work with grayscale JPEG files.
 
 Note that there is no provision for colormapped input.  You can feed in a
 colormapped image by expanding it to full-color format.  However JPEG often
@@ -644,7 +647,8 @@
 to be read (ie, the height of your working buffer); jpeg_read_scanlines()
 will return up to that many lines.  The return value is the number of lines
 actually read.  The format of the returned data is discussed under "Data
-formats", above.
+formats", above.  Don't forget that grayscale and color JPEGs will return
+different data formats!
 
 Image data is returned in top-to-bottom scanline order.  If you must write
 out the image in bottom-to-top order, you can use the JPEG library's virtual
@@ -855,10 +859,10 @@
 		JDCT_FLOAT: floating-point method
 		JDCT_DEFAULT: default method (normally JDCT_ISLOW)
 		JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-	The floating-point method is the most accurate, but may give slightly
-	different results on different machines due to varying roundoff
-	behavior.  The integer methods should give the same results on all
-	machines.  On machines with sufficiently fast FP hardware, the
+	The FLOAT method is very slightly more accurate than the ISLOW method,
+	but may give different results on different machines due to varying
+	roundoff behavior.  The integer methods should give the same results
+	on all machines.  On machines with sufficiently fast FP hardware, the
 	floating-point method may also be the fastest.  The IFAST method is
 	considerably less accurate than the other two; its use is not
 	recommended if high quality is a concern.  JDCT_DEFAULT and
@@ -1050,6 +1054,9 @@
 JSAMPARRAY colormap
 	The color map, represented as a 2-D pixel array of out_color_components
 	rows and actual_number_of_colors columns.  Ignored if not quantizing.
+	CAUTION: if the JPEG library creates its own colormap, the storage
+	pointed to by this field is released by jpeg_finish_decompress().
+	Copy the colormap somewhere else first, if you want to save it.
 
 int actual_number_of_colors
 	The number of colors in the color map.
@@ -1889,8 +1896,9 @@
 arrays, but the scanlines count must total up to 112 so that all of the Cb
 and Cr data gets passed.
 
-Currently, output suspension is not supported with raw data output: an error
-will result if the data destination module tries to suspend.
+Output suspension is supported with raw-data compression: if the data
+destination module suspends, jpeg_write_raw_data() will return 0.
+In this case the same data rows must be passed again on the next call.
 
 
 Decompression with raw data output implies bypassing all postprocessing:
@@ -2048,13 +2056,18 @@
 larger than a char, so it affects the surrounding application's image data.
 The sample applications cjpeg and djpeg can support 12-bit mode only for PPM
 and GIF file formats; you must disable the other file formats to compile a
-12-bit cjpeg or djpeg.  At present, a 12-bit library can handle *only*
-12-bit images, not both precisions.  (If you need to include both 8- and
-12-bit libraries in a single application, you could probably do it by
-defining NEED_SHORT_EXTERNAL_NAMES for just one of the copies.  You'd have
-to access the 8-bit and 12-bit copies from separate application source
-files.  This is untested ... if you try it, we'd like to hear whether it
-works!)
+12-bit cjpeg or djpeg.  (install.doc has more information about that.)
+At present, a 12-bit library can handle *only* 12-bit images, not both
+precisions.  (If you need to include both 8- and 12-bit libraries in a single
+application, you could probably do it by defining NEED_SHORT_EXTERNAL_NAMES
+for just one of the copies.  You'd have to access the 8-bit and 12-bit copies
+from separate application source files.  This is untested ... if you try it,
+we'd like to hear whether it works!)
+
+Note that a 12-bit library always compresses in Huffman optimization mode,
+in order to generate valid Huffman tables.  This is necessary because our
+default Huffman tables only cover 8-bit data.  If you need to output 12-bit
+files in one pass, you'll have to supply suitable default Huffman tables.
 
 The maximum number of components (color channels) in the image is determined
 by MAX_COMPONENTS.  The JPEG standard allows up to 255 components, but we
diff --git a/makefile.bcc b/makefile.bcc
index d45bb7d..6f88b7d 100644
--- a/makefile.bcc
+++ b/makefile.bcc
@@ -1,35 +1,63 @@
 # Makefile for Independent JPEG Group's software
 
-# This makefile is suitable for Borland C (Turbo C) on MS-DOS.
-# It is set up for Borland C++, revision 3.0 or later.
-# For older versions (pre-3.0), replace "-O2" with "-O -G -Z" in CFLAGS.
-# If you have an even older version of Turbo C, you may be able to make it
-# work by saying "CC= tcc" below.  (Very early versions of Turbo C++,
-# like 1.01, are so buggy that you may as well forget it.)
-# Thanks to Tom Wright and Ge' Weijers for this file.
+# This makefile is suitable for Borland C on MS-DOS or OS/2.
+# It works with Borland C++ for DOS, revision 3.0 or later,
+# and has been tested with Borland C++ for OS/2, revision 2.0.
+# Thanks to Tom Wright and Ge' Weijers (original DOS) and
+# Ken Porter (OS/2) for this file.
 
 # Read installation instructions before saying "make" !!
 
+# Are we under DOS or OS/2?
+!if !$d(DOS) && !$d(OS2)
+!if $d(__MSDOS__)
+DOS=1
+!else
+OS2=1
+!endif
+!endif
+
 # The name of your C compiler:
 CC= bcc
 
 # You may need to adjust these cc options:
-CFLAGS= -mm -w-par -O2
-# -mm selects medium memory model (near data, far code pointers)
+!if $d(DOS)
+CFLAGS= -O2 -mm -w-par -w-stu -w-ccc -w-rch
+!else
+CFLAGS= -O1 -w-par -w-stu -w-ccc -w-rch
+!endif
+# -O2 enables full code optimization (for pre-3.0 Borland C++, use -O -G -Z).
+# -O2 is buggy in Borland OS/2 C++ revision 2.0, so use -O1 for now.
+# -mm selects medium memory model (near data, far code pointers; DOS only!)
 # -w-par suppresses warnings about unused function parameters
-# -O2 enables full code optimization (for pre-3.0 Borland C++, use -O -G -Z)
+# -w-stu suppresses warnings about incomplete structures
+# -w-ccc suppresses warnings about compile-time-constant conditions
+# -w-rch suppresses warnings about unreachable code
 # Generally, we recommend defining any configuration symbols in jconfig.h,
 # NOT via -D switches here.
 
 # Link-time cc options:
+!if $d(DOS)
 LDFLAGS= -mm
 # memory model option here must match CFLAGS!
+!else
+LDFLAGS=
+# -lai full-screen app
+# -lc case-significant link
+!endif
 
 # Put here the object file name for the correct system-dependent memory
-# manager file.  For DOS, we recommend jmemdos.c and jmemdosa.asm.
-# If you change this, you'll need to modify the linker response file
-# name list, below, by hand!
+# manager file.
+# For DOS, we recommend jmemdos.c and jmemdosa.asm.
+# For OS/2, we recommend jmemnobs.c (flat memory!)
+# SYSDEPMEMLIB must list the same files with "+" signs for the librarian.
+!if $d(DOS)
 SYSDEPMEM= jmemdos.obj jmemdosa.obj
+SYSDEPMEMLIB= +jmemdos.obj +jmemdosa.obj
+!else
+SYSDEPMEM= jmemnobs.obj
+SYSDEPMEMLIB= +jmemnobs.obj
+!endif
 
 # End of configurable options.
 
@@ -81,31 +109,22 @@
 DOBJECTS= djpeg.obj wrppm.obj wrgif.obj wrtarga.obj wrrle.obj wrbmp.obj \
         rdcolmap.obj
 
-# need linker response file because file list > 128 chars
-RFILE= libjpeg.ans
-
 
 all: libjpeg.lib cjpeg.exe djpeg.exe rdjpgcom.exe wrjpgcom.exe
 
-libjpeg.lib: $(LIBOBJECTS) $(RFILE)
-	del libjpeg.lib
-	tlib @$(RFILE)
-
-# linker response file for building libjpeg.lib
-$(RFILE) : Makefile
-	del $(RFILE)
-	echo libjpeg.lib & >$(RFILE)
-	echo +jcapi.obj +jcparam.obj +jdatadst.obj +jcmaster.obj & >>$(RFILE)
-	echo +jcmarker.obj +jcmainct.obj +jcprepct.obj & >>$(RFILE)
-	echo +jccoefct.obj +jccolor.obj +jcsample.obj +jchuff.obj & >>$(RFILE)
-	echo +jcdctmgr.obj +jfdctfst.obj +jfdctflt.obj & >>$(RFILE)
-	echo +jfdctint.obj +jdapi.obj +jdatasrc.obj +jdmaster.obj & >>$(RFILE)
-	echo +jdmarker.obj +jdmainct.obj +jdcoefct.obj & >>$(RFILE)
-	echo +jdpostct.obj +jddctmgr.obj +jidctfst.obj & >>$(RFILE)
-	echo +jidctflt.obj +jidctint.obj +jidctred.obj +jdhuff.obj & >>$(RFILE)
-	echo +jdsample.obj +jdcolor.obj +jquant1.obj +jquant2.obj & >>$(RFILE)
-	echo +jdmerge.obj +jcomapi.obj +jutils.obj +jerror.obj & >>$(RFILE)
-	echo +jmemmgr.obj +jmemdos.obj +jmemdosa.obj >>$(RFILE)
+libjpeg.lib: $(LIBOBJECTS)
+	- del libjpeg.lib
+	tlib libjpeg.lib /E /C @&&|
++jcapi.obj +jcparam.obj +jdatadst.obj +jcmaster.obj +jcmarker.obj &
++jcmainct.obj +jcprepct.obj +jccoefct.obj +jccolor.obj +jcsample.obj &
++jchuff.obj +jcdctmgr.obj +jfdctfst.obj +jfdctflt.obj +jfdctint.obj &
++jdapi.obj +jdatasrc.obj +jdmaster.obj +jdmarker.obj +jdmainct.obj &
++jdcoefct.obj +jdpostct.obj +jddctmgr.obj +jidctfst.obj +jidctflt.obj &
++jidctint.obj +jidctred.obj +jdhuff.obj +jdsample.obj +jdcolor.obj &
++jquant1.obj +jquant2.obj +jdmerge.obj +jcomapi.obj +jutils.obj &
++jerror.obj +jmemmgr.obj &
+$(SYSDEPMEMLIB)
+|
 
 cjpeg.exe: $(COBJECTS) libjpeg.lib
 	$(CC) $(LDFLAGS) -ecjpeg.exe $(COBJECTS) libjpeg.lib
@@ -114,14 +133,24 @@
 	$(CC) $(LDFLAGS) -edjpeg.exe $(DOBJECTS) libjpeg.lib
 
 rdjpgcom.exe: rdjpgcom.c
+!if $d(DOS)
 	$(CC) -ms -O rdjpgcom.c
+!else
+	$(CC) $(CFLAGS) rdjpgcom.c
+!endif
 
-# wrjpgcom needs large model so it can malloc a 64K chunk
+# On DOS, wrjpgcom needs large model so it can malloc a 64K chunk
 wrjpgcom.exe: wrjpgcom.c
+!if $d(DOS)
 	$(CC) -ml -O wrjpgcom.c
+!else
+	$(CC) $(CFLAGS) wrjpgcom.c
+!endif
 
+# This "{}" syntax allows Borland Make to "batch" source files.
+# In this way, each run of the compiler can build many modules.
 .c.obj:
-	$(CC) $(CFLAGS) -c $<
+	$(CC) $(CFLAGS) -c{ $<}
 
 jconfig.h: jconfig.doc
 	echo You must prepare a system-dependent jconfig.h file.
@@ -129,22 +158,30 @@
 	exit 1
 
 clean:
-	del *.obj
-	del libjpeg.lib
-	del cjpeg.exe
-	del djpeg.exe
-	del rdjpgcom.exe
-	del wrjpgcom.exe
-	del testout.*
+	- del *.obj
+	- del libjpeg.lib
+	- del cjpeg.exe
+	- del djpeg.exe
+	- del rdjpgcom.exe
+	- del wrjpgcom.exe
+	- del testout.*
 
 test: cjpeg.exe djpeg.exe
-	del testout.*
+	- del testout.*
 	djpeg -dct int -ppm -outfile testout.ppm  testorig.jpg
 	djpeg -dct int -gif -outfile testout.gif  testorig.jpg
 	cjpeg -dct int -outfile testout.jpg  testimg.ppm
+!if $d(DOS)
 	fc /b testimg.ppm testout.ppm
 	fc /b testimg.gif testout.gif
 	fc /b testimg.jpg testout.jpg
+!else
+	echo n > n.tmp
+	comp testimg.ppm testout.ppm < n.tmp
+	comp testimg.gif testout.gif < n.tmp
+	comp testimg.jpg testout.jpg < n.tmp
+	del n.tmp
+!endif
 
 
 jcapi.obj : jcapi.c jinclude.h jconfig.h jpeglib.h jmorecfg.h jpegint.h jerror.h
diff --git a/makefile.cfg b/makefile.cfg
index 1ae48f0..b7f84d4 100644
--- a/makefile.cfg
+++ b/makefile.cfg
@@ -23,7 +23,7 @@
 CC= @CC@
 
 # You may need to adjust these cc options:
-CFLAGS= @CFLAGS@ @CPPFLAGS@ -I$(srcdir)
+CFLAGS= @CFLAGS@ @CPPFLAGS@ @INCLUDEFLAGS@
 # Generally, we recommend defining any configuration symbols in jconfig.h,
 # NOT via -D switches here.
 # However, any special defines for ansi2knr.c may be included here:
@@ -150,8 +150,10 @@
 	$(INSTALL_DATA) $(srcdir)/rdjpgcom.1 $(mandir)/$(manprefix)rdjpgcom.$(manext)
 	$(INSTALL_DATA) $(srcdir)/wrjpgcom.1 $(mandir)/$(manprefix)wrjpgcom.$(manext)
 
-install-lib: libjpeg.a jconfig.h
+install-lib: libjpeg.a install-headers
 	$(INSTALL_DATA) libjpeg.a $(libdir)/$(binprefix)libjpeg.a
+
+install-headers: jconfig.h
 	$(INSTALL_DATA) jconfig.h $(includedir)/jconfig.h
 	$(INSTALL_DATA) $(srcdir)/jpeglib.h $(includedir)/jpeglib.h
 	$(INSTALL_DATA) $(srcdir)/jmorecfg.h $(includedir)/jmorecfg.h
@@ -168,17 +170,17 @@
 
 test: cjpeg djpeg
 	$(RM) testout.ppm testout.gif testout.jpg
-	./djpeg -dct int -ppm -outfile testout.ppm  testorig.jpg
-	./djpeg -dct int -gif -outfile testout.gif  testorig.jpg
-	./cjpeg -dct int -outfile testout.jpg  testimg.ppm
-	cmp testimg.ppm testout.ppm
-	cmp testimg.gif testout.gif
-	cmp testimg.jpg testout.jpg
+	./djpeg -dct int -ppm -outfile testout.ppm  $(srcdir)/testorig.jpg
+	./djpeg -dct int -gif -outfile testout.gif  $(srcdir)/testorig.jpg
+	./cjpeg -dct int -outfile testout.jpg  $(srcdir)/testimg.ppm
+	cmp $(srcdir)/testimg.ppm testout.ppm
+	cmp $(srcdir)/testimg.gif testout.gif
+	cmp $(srcdir)/testimg.jpg testout.jpg
 
 check: test
 
 # GNU Make likes to know which target names are not really files to be made:
-.PHONY: all install install-lib clean distribute test check
+.PHONY: all install install-lib install-headers clean distribute test check
 
 
 jcapi.o : jcapi.c jinclude.h jconfig.h jpeglib.h jmorecfg.h jpegint.h jerror.h
diff --git a/rdgif.c b/rdgif.c
index c4bc9ef..f16fb47 100644
--- a/rdgif.c
+++ b/rdgif.c
@@ -5,6 +5,11 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
+ **************************************************************************
+ * WARNING: You will need an LZW patent license from Unisys in order to   *
+ * use this file legally in any commercial or shareware application.      *
+ **************************************************************************
+ *
  * This file contains routines to read input images in GIF format.
  *
  * These routines may need modification for non-Unix environments or
diff --git a/rdjpgcom.c b/rdjpgcom.c
index d121dc9..503ccba 100644
--- a/rdjpgcom.c
+++ b/rdjpgcom.c
@@ -1,7 +1,7 @@
 /*
  * rdjpgcom.c
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -102,7 +102,7 @@
 #define M_SOF1  0xC1		/* N indicates which compression process */
 #define M_SOF2  0xC2		/* Only SOF0 and SOF1 are now in common use */
 #define M_SOF3  0xC3
-#define M_SOF5  0xC5
+#define M_SOF5  0xC5		/* NB: codes C4 and CC are NOT SOF markers */
 #define M_SOF6  0xC6
 #define M_SOF7  0xC7
 #define M_SOF9  0xC9
@@ -262,7 +262,7 @@
   unsigned int image_height, image_width;
   int data_precision, num_components;
   const char * process;
-  int ci, c1, c2, c3;
+  int ci;
 
   length = read_2_bytes();	/* usual parameter length count */
 
@@ -296,9 +296,9 @@
     ERREXIT("Bogus SOF marker length");
 
   for (ci = 0; ci < num_components; ci++) {
-    c1 = read_1_byte();		/* Component ID code */
-    c2 = read_1_byte();		/* H, V sampling factors */
-    c3 = read_1_byte();		/* Quant table number */
+    (void) read_1_byte();	/* Component ID code */
+    (void) read_1_byte();	/* H, V sampling factors */
+    (void) read_1_byte();	/* Quantization table number */
   }
 }
 
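Note on the rdjpgcom.c hunks above: the new comment on M_SOF5 says that codes C4 and CC are not SOF markers. A minimal sketch of a check built on that rule follows. The helper name is_sof_marker is hypothetical and does not appear in rdjpgcom.c. The sketch also assumes the usual marker assignments (0xC4 = DHT, 0xC8 = JPG reserved, 0xCC = DAC), which this diff does not list in full.

```c
/* Hypothetical helper, not part of rdjpgcom.c.
 * Returns 1 if the marker code is an SOFn frame header, else 0.
 * Assumption: every code in 0xC0..0xCF is an SOF marker except
 * 0xC4 (DHT), 0xC8 (JPG, reserved) and 0xCC (DAC).
 */
int is_sof_marker(int code)
{
  if (code < 0xC0 || code > 0xCF)
    return 0;			/* outside the SOF code range */
  if (code == 0xC4 || code == 0xC8 || code == 0xCC)
    return 0;			/* in range, but not a frame header */
  return 1;
}
```

A marker scanner could call this instead of spelling out each M_SOFn case, although the explicit case list in the original makes the excluded codes more visible.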
diff --git a/structure.doc b/structure.doc
index 30d5828..c04e1e3 100644
--- a/structure.doc
+++ b/structure.doc
@@ -1,6 +1,6 @@
 IJG JPEG LIBRARY:  SYSTEM ARCHITECTURE
 
-Copyright (C) 1991-1994, Thomas G. Lane.
+Copyright (C) 1991-1995, Thomas G. Lane.
 This file is part of the Independent JPEG Group's software.
 For conditions of distribution and use, see the accompanying README file.
 
@@ -362,7 +362,9 @@
   This controller handles MCU assembly, including insertion of dummy DCT
   blocks when needed at the right or bottom edge.  When performing
   Huffman-code optimization or emitting a multiscan JPEG file, this
-  controller is responsible for buffering the full image.
+  controller is responsible for buffering the full image.  The equivalent of
+  one fully interleaved MCU row of subsampled data is processed per call,
+  even when the JPEG file is noninterleaved.
 
 * Forward DCT and quantization: Perform DCT, quantize, and emit coefficients
   in zigzag block order.  Works on one or more DCT blocks at a time.
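Note on the structure.doc hunk above: the added sentence says the coefficient controller processes one fully interleaved MCU row of subsampled data per call. As a rough sketch of what that means per component, the function below computes how many sample rows one such MCU row spans. The function name and the local constant are illustrative assumptions, not library API. It assumes 8x8 DCT blocks and that a component's vertical sampling factor gives the number of block rows it contributes to an MCU.

```c
/* Assumption: DCT blocks are 8x8 samples (baseline JPEG). */
#define SKETCH_DCT_BLOCK_SIZE 8

/* Hypothetical helper: number of subsampled sample rows that one component
 * contributes to a single fully interleaved MCU row, given that component's
 * vertical sampling factor.
 */
int sample_rows_per_imcu_row(int v_samp_factor)
{
  return v_samp_factor * SKETCH_DCT_BLOCK_SIZE;
}
```

For example, with 2x2 luminance sampling and 1x1 chrominance sampling, a luminance component would span 16 sample rows per call and each chrominance component 8.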
diff --git a/usage.doc b/usage.doc
index 3fc061f..ceb8528 100644
--- a/usage.doc
+++ b/usage.doc
@@ -131,13 +131,13 @@
 	-dct int	Use integer DCT method (default).
 	-dct fast	Use fast integer DCT (less accurate).
 	-dct float	Use floating-point DCT method.
-			The floating-point method is the most accurate, but
-			will be the slowest unless your machine has very fast
-			floating-point hardware.  Also note that results of
-			the floating-point method may vary slightly across
-			machines, while the integer methods should give the
-			same results everywhere.  The fast integer method is
-			much less accurate than the other two.
+			The float method is very slightly more accurate than
+			the int method, but is much slower unless your machine
+			has very fast floating-point hardware.  Also note that
+			results of the floating-point method may vary slightly
+			across machines, while the integer methods should give
+			the same results everywhere.  The fast integer method
+			is much less accurate than the other two.
 
 	-restart N	Emit a JPEG restart marker every N MCU rows, or every
 			N MCU blocks if "B" is attached to the number.
@@ -266,13 +266,13 @@
 	-dct int	Use integer DCT method (default).
 	-dct fast	Use fast integer DCT (less accurate).
 	-dct float	Use floating-point DCT method.
-			The floating-point method is the most accurate, but
-			will be the slowest unless your machine has very fast
-			floating-point hardware.  Also note that results of
-			the floating-point method may vary slightly across
-			machines, while the integer methods should give the
-			same results everywhere.  The fast integer method is
-			much less accurate than the other two.
+			The float method is very slightly more accurate than
+			the int method, but is much slower unless your machine
+			has very fast floating-point hardware.  Also note that
+			results of the floating-point method may vary slightly
+			across machines, while the integer methods should give
+			the same results everywhere.  The fast integer method
+			is much less accurate than the other two.
 
 	-dither fs	Use Floyd-Steinberg dithering in color quantization.
 	-dither ordered	Use ordered dithering in color quantization.
@@ -346,7 +346,10 @@
 acceptable results in two-pass mode, but is seldom tolerable in one-pass mode.
 
 If you are fortunate enough to have very fast floating point hardware,
-"-dct float" may be even faster than "-dct fast".
+"-dct float" may be even faster than "-dct fast".  But on most machines
+"-dct float" is slower than "-dct int"; in this case it is not worth using,
+because its theoretical accuracy advantage is too small to be significant
+in practice.
 
 Two-pass color quantization requires a good deal of memory; on MS-DOS machines
 it may run out of memory even with -maxmemory 0.  In that case you can still
diff --git a/wrgif.c b/wrgif.c
index d61e24d..854903e 100644
--- a/wrgif.c
+++ b/wrgif.c
@@ -5,6 +5,11 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
+ **************************************************************************
+ * WARNING: You will need an LZW patent license from Unisys in order to   *
+ * use this file legally in any commercial or shareware application.      *
+ **************************************************************************
+ *
  * This file contains routines to write output images in GIF format.
  *
  * These routines may need modification for non-Unix environments or
diff --git a/wrjpgcom.c b/wrjpgcom.c
index 0b25d69..8a0d964 100644
--- a/wrjpgcom.c
+++ b/wrjpgcom.c
@@ -1,7 +1,7 @@
 /*
  * wrjpgcom.c
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1995, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -152,7 +152,7 @@
 #define M_SOF1  0xC1		/* N indicates which compression process */
 #define M_SOF2  0xC2		/* Only SOF0 and SOF1 are now in common use */
 #define M_SOF3  0xC3
-#define M_SOF5  0xC5
+#define M_SOF5  0xC5		/* NB: codes C4 and CC are NOT SOF markers */
 #define M_SOF6  0xC6
 #define M_SOF7  0xC7
 #define M_SOF9  0xC9