Wednesday, July 31, 2013

Speed up Tiff file processing with Microsoft Windows Imaging Components (WIC)


This post is about speeding up the processing of TIFF files in your imaging solutions. For years developers have relied on the System.Drawing namespace (GDI+) to process TIFF files, and for years we have suffered with the dreaded “GDI+ generic error” message. Sometimes the error would only occur in certain environments. Urban legend says that GDI+ changed between Windows Server 2008 and Windows Server 2008 R2: developers could reproduce the errors on R2 but not on non-R2 servers. Microsoft has also stated, in the System.Drawing namespace documentation, that GDI+ should not be used in a server environment:

“Caution:

Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions. For a supported alternative, see Windows Imaging Components.”

Well, that is a telltale sign to move to the alternative, Windows Imaging Components (WIC). In .NET, WIC is exposed through the PresentationCore and WindowsBase assemblies, which provide access to the classes via the System.Windows.Media and System.Windows.Media.Imaging namespaces. These classes were added in .NET 3.0. The main benefit of WIC is that it enables application developers to perform image processing operations on any image format through a single, consistent set of common interfaces, without requiring prior knowledge of specific image formats. Basically, it removes all the encoding lookups and stream/bitmap management you have to deal with in GDI+. It also provides an extensible "plug and play" architecture for image codecs, pixel formats, and metadata, with automatic run-time discovery of new formats.
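To make the "single, consistent set of interfaces" point concrete, here is a minimal sketch of a format converter. The class and method names (WicConvert, CreateEncoder, Convert) are my own and purely illustrative; the sketch assumes a project that references PresentationCore and WindowsBase, and it only copies the first frame.

```csharp
using System;
using System.IO;
using System.Windows.Media.Imaging;

public static class WicConvert
{
    // Hypothetical helper: picks a built-in encoder from the output file extension.
    public static BitmapEncoder CreateEncoder(string path)
    {
        switch (Path.GetExtension(path).ToLowerInvariant())
        {
            case ".png": return new PngBitmapEncoder();
            case ".jpg":
            case ".jpeg": return new JpegBitmapEncoder();
            case ".bmp": return new BmpBitmapEncoder();
            case ".gif": return new GifBitmapEncoder();
            case ".tif":
            case ".tiff": return new TiffBitmapEncoder();
            default: throw new NotSupportedException("No built-in encoder for " + path);
        }
    }

    // Copies the first frame of any WIC-readable image into the format implied by outputPath.
    public static void Convert(string inputPath, string outputPath)
    {
        using (Stream input = File.OpenRead(inputPath))
        {
            // BitmapDecoder.Create inspects the stream and returns the matching decoder,
            // so the caller never needs to know the source format.
            BitmapDecoder decoder = BitmapDecoder.Create(input,
                BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.OnLoad);

            BitmapEncoder encoder = CreateEncoder(outputPath);
            encoder.Frames.Add(decoder.Frames[0]);

            using (Stream output = File.Create(outputPath))
            {
                encoder.Save(output);
            }
        }
    }
}
```

Everything after CreateEncoder works against the BitmapDecoder and BitmapEncoder base classes, which is the part GDI+ makes you build yourself.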

What is so special about WIC?

In my experience, GDI+ produced TIFF processing errors when working with files that contained either mixed resolutions or mixed color (black/white and color) pages. These errors would occur on Windows 2008 R2 but not on Windows 2008. With WIC this problem has disappeared; it handles these types of TIFFs without trouble. Coding with GDI+ is also needlessly complicated, especially in scanning solutions where multiple single-page TIFFs need to be combined into one. Having to keep all the single-page bitmaps and their underlying streams open while writing to the destination multipage TIFF stream requires extra code to manage memory and cleanup. WIC manages the memory and cleanup for you.

WIC’s biggest benefit has come in terms of performance. Thanks to its built-in caching and memory management, WIC took approximately 69% less time than GDI+ (roughly three times as fast) when splitting and joining TIFF files. Below are the numbers for the two frameworks:

Framework    Split 500 pages    Join 500 pages
GDI+         32 seconds         31 seconds
WIC          10 seconds         9 seconds

 

Real world example

So on average GDI+ processes a TIFF page every 0.064 seconds and WIC processes one every 0.020 seconds. This does not sound like much, but it adds up, especially for TIFF files with many pages. Let's say you scan approximately 15,000 two-page documents a day: how much faster would this be per day? The more pages you have, the more it adds up. At 500 pages per document, the difference is measured in days.

Number of documents    Pages per document    Seconds per page    Total time
15000                  2                     0.064               32 minutes
15000                  2                     0.020               10 minutes
15000                  500                   0.064               133 hours
15000                  500                   0.020               42 hours
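The totals in the table are just documents × pages × seconds per page. Here is a small sketch of that arithmetic, in case you want to plug in your own volumes. The class and method names are made up for illustration; the input figures are the split timings from the benchmark above.

```csharp
using System;

public static class ScanTimeEstimate
{
    // Seconds per page = measured seconds / pages processed in the benchmark.
    public static double SecondsPerPage(double totalSeconds, int pages)
    {
        return totalSeconds / pages;
    }

    // Total processing time for a day's worth of documents.
    public static TimeSpan TotalTime(int documents, int pagesPerDocument, double secondsPerPage)
    {
        return TimeSpan.FromSeconds(documents * (double)pagesPerDocument * secondsPerPage);
    }

    // Prints the same scenarios as the table, using the 500-page split timings.
    public static void PrintScenarios()
    {
        double gdi = SecondsPerPage(32, 500);
        double wic = SecondsPerPage(10, 500);

        Console.WriteLine("Fraction of time saved per page: {0:F2}", (gdi - wic) / gdi);
        Console.WriteLine("GDI+, 2 pages:   {0:F0} minutes", TotalTime(15000, 2, gdi).TotalMinutes);
        Console.WriteLine("WIC,  2 pages:   {0:F0} minutes", TotalTime(15000, 2, wic).TotalMinutes);
        Console.WriteLine("GDI+, 500 pages: {0:F0} hours", TotalTime(15000, 500, gdi).TotalHours);
        Console.WriteLine("WIC,  500 pages: {0:F0} hours", TotalTime(15000, 500, wic).TotalHours);
    }
}
```

Calling ScanTimeEstimate.PrintScenarios() should reproduce the table to within rounding.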

 

Easy coding with WIC

Below are the code examples used in the performance testing for splitting and joining TIFF files with the two frameworks. What you should notice is how much cleaner the code is using WIC. GDI+ requires you to iterate the pages using the SelectActiveFrame method, whereas WIC is much more intuitive: you iterate through the Frames collection of the decoder. WIC comes with built-in encoders and decoders for PNG, GIF, JPEG, BMP, and TIFF. The encoders and decoders all inherit from base classes (BitmapEncoder and BitmapDecoder), which enables you to write polymorphic code to handle them. GDI+ lacks this, requiring developers to come up with their own methods of abstracting away the different encoding parameters needed to create different formats.

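using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Media.Imaging;          // WIC (PresentationCore, WindowsBase)
using GDI = System.Drawing;                  // GDI+
using GDIImaging = System.Drawing.Imaging;

// Combines the first page of every TIFF in c:\tiffs into one multipage TIFF.
public static void JoinTiffWIC()
{
    TiffBitmapEncoder newFileEncoder = new TiffBitmapEncoder();
    DirectoryInfo di = new DirectoryInfo("c:\\tiffs");

    foreach (FileInfo fi in di.GetFiles())
    {
        using (Stream documentStream = fi.OpenRead())
        {
            // OnLoad caches the frame in memory, so the stream can be closed before Save.
            TiffBitmapDecoder originalFileDecoder = new TiffBitmapDecoder(documentStream,
                BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.OnLoad);
            newFileEncoder.Frames.Add(originalFileDecoder.Frames[0]);
        }
    }

    using (FileStream stream = File.Create("c:\\tiffsOut\\" + Guid.NewGuid().ToString() + ".tiff"))
    {
        newFileEncoder.Save(stream);
    }
}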

// GDI+ version of the join. Illustration only: the bitmaps and their streams are never disposed.
public static void JoinTiffGDI()
{
    DirectoryInfo di = new DirectoryInfo("c:\\tiffs");
    Stream documentStream = null;
    GDI.Bitmap bmp = null;
    GDI.Bitmap newFile = null;
    List<GDI.Bitmap> pages = new List<GDI.Bitmap>();

    // One parameter set per phase: start the multipage file, append a page, flush.
    GDIImaging.EncoderParameters multiParms = new GDIImaging.EncoderParameters(1);
    GDIImaging.EncoderParameters singleParms = new GDIImaging.EncoderParameters(1);
    GDIImaging.EncoderParameters saveParms = new GDIImaging.EncoderParameters(1);

    multiParms.Param[0] = new GDIImaging.EncoderParameter(GDIImaging.Encoder.SaveFlag, (long)GDIImaging.EncoderValue.MultiFrame);
    singleParms.Param[0] = new GDIImaging.EncoderParameter(GDIImaging.Encoder.SaveFlag, (long)GDIImaging.EncoderValue.FrameDimensionPage);
    saveParms.Param[0] = new GDIImaging.EncoderParameter(GDIImaging.Encoder.SaveFlag, (long)GDIImaging.EncoderValue.Flush);

    // GDI+ makes you look up the TIFF codec yourself.
    GDIImaging.ImageCodecInfo tiffCodecInfo = (from c in GDIImaging.ImageCodecInfo.GetImageEncoders()
                                               where c.FormatID.Equals(GDIImaging.ImageFormat.Tiff.Guid)
                                               select c).FirstOrDefault();

    // Every bitmap and its stream must stay open until the final SaveAdd.
    foreach (FileInfo fi in di.GetFiles())
    {
        documentStream = fi.OpenRead();
        bmp = new GDI.Bitmap(documentStream);
        bmp.SelectActiveFrame(GDIImaging.FrameDimension.Page, 0);
        pages.Add(bmp);
    }

    using (MemoryStream fs = new MemoryStream())
    {
        for (int i = 0; i < pages.Count; i++)
        {
            if (i == 0)
            {
                // The first page creates the multipage file.
                newFile = pages[i];
                newFile.Save(fs, tiffCodecInfo, multiParms);
            }
            else
            {
                // Each later page is appended to the first bitmap.
                newFile.SaveAdd(pages[i], singleParms);
            }
        }

        // Flush closes the multipage file.
        newFile.SaveAdd(saveParms);
    }
}

// Writes every page of a multipage TIFF to its own single-page file.
public static void SplitTiffWIC(string fileName)
{
    FileInfo fi = new FileInfo(fileName);
    using (Stream documentStream = fi.OpenRead())
    {
        // BitmapCacheOption.None is safe here because the source stream stays open during each Save.
        TiffBitmapDecoder originalFileDecoder = new TiffBitmapDecoder(documentStream,
            BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.None);

        foreach (BitmapFrame frame in originalFileDecoder.Frames)
        {
            TiffBitmapEncoder newFileEncoder = new TiffBitmapEncoder();
            newFileEncoder.Frames.Add(frame);
            using (FileStream stream = File.Create("c:\\tiffs\\" + Guid.NewGuid().ToString() + ".tiff"))
            {
                newFileEncoder.Save(stream);
            }
        }
    }
}
// GDI+ version of the split: select each page in turn and save it to a new file.
public static void SplitTiffGDI(string fileName)
{
    FileInfo fi = new FileInfo(fileName);
    GDIImaging.EncoderParameters multiParms = new GDIImaging.EncoderParameters(1);
    multiParms.Param[0] = new GDIImaging.EncoderParameter(GDIImaging.Encoder.SaveFlag, (long)GDIImaging.EncoderValue.MultiFrame);

    GDIImaging.ImageCodecInfo tiffCodecInfo = (from c in GDIImaging.ImageCodecInfo.GetImageEncoders()
                                               where c.FormatID.Equals(GDIImaging.ImageFormat.Tiff.Guid)
                                               select c).FirstOrDefault();

    using (Stream documentStream = fi.OpenRead())
    {
        using (GDI.Bitmap bp = new GDI.Bitmap(documentStream))
        {
            // The page count comes from the bitmap's first frame dimension.
            GDIImaging.FrameDimension fd = new GDIImaging.FrameDimension(bp.FrameDimensionsList[0]);
            int pageCount = bp.GetFrameCount(fd);

            for (int i = 0; i < pageCount; i++)
            {
                bp.SelectActiveFrame(fd, i);
                bp.Save("c:\\tiffs\\" + Guid.NewGuid().ToString() + ".tiff", tiffCodecInfo, multiParms);
            }
        }
    }
}

This code is for illustration purposes only. The GDI+ code that joins TIFFs does not dispose of all the bitmaps it creates, so it leaks memory. It does point out that with GDI+ you must handle the disposal of many bitmaps and streams, whereas with WIC this is not required. The slowness of GDI+ appears to come from calling the bitmap’s SaveAdd method multiple times. I originally thought it might come from writing to disk, so I wrote to a stream instead, and this made no difference. It appears GDI+ has a lot of encoding overhead on each SaveAdd. WIC is optimized for navigating frames (pages) in memory and has a few options for caching the metadata needed for encoding. Memory consumption can be optimized by using BitmapCacheOption.None. However, if you do this and close the decoder’s stream before calling the encoder’s Save method, you will get an error. You must use either BitmapCacheOption.OnLoad or BitmapCacheOption.OnDemand to encode with the source decoder’s stream closed. Even with caching enabled, memory consumption was minimal.
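For comparison, here is a rough sketch of what the GDI+ join looks like once the cleanup is added. This is not the code used in the timings above. The class name, method name, and parameters are my own, and it writes to a file path rather than a MemoryStream. Treat it as one possible arrangement of the try/finally scaffolding.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

public static class GdiJoinWithCleanup
{
    // Joins the first page of every file in inputFolder into one multipage TIFF at outputPath.
    public static void JoinTiffs(string inputFolder, string outputPath)
    {
        ImageCodecInfo tiffCodec = ImageCodecInfo.GetImageEncoders()
            .First(c => c.FormatID == ImageFormat.Tiff.Guid);

        var streams = new List<Stream>();
        var pages = new List<Bitmap>();
        try
        {
            // Every stream and bitmap has to stay alive until the final flush.
            foreach (string file in Directory.GetFiles(inputFolder))
            {
                Stream s = File.OpenRead(file);
                streams.Add(s);
                pages.Add(new Bitmap(s));
            }
            if (pages.Count == 0) return;

            using (var multi = new EncoderParameters(1))
            using (var page = new EncoderParameters(1))
            using (var flush = new EncoderParameters(1))
            {
                multi.Param[0] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.MultiFrame);
                page.Param[0] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.FrameDimensionPage);
                flush.Param[0] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.Flush);

                Bitmap first = pages[0];
                first.Save(outputPath, tiffCodec, multi);      // creates the multipage file
                for (int i = 1; i < pages.Count; i++)
                {
                    first.SaveAdd(pages[i], page);             // appends one page
                }
                first.SaveAdd(flush);                          // closes the file
            }
        }
        finally
        {
            // The scaffolding WIC spares you: dispose bitmaps first, then their streams.
            foreach (Bitmap b in pages) b.Dispose();
            foreach (Stream s in streams) s.Dispose();
        }
    }
}
```

Roughly a third of this method exists only to track and release resources, which is the point of the comparison.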



Clear, Efficient and Fast


Compared to GDI+, WIC enables you to write clearer, more efficient, and faster code for processing TIFFs. The code is clearer because you don’t need to build the scaffolding for cleaning up bitmaps. It is more efficient because WIC does the cleanup and has built-in caching. It is faster because of optimized encoding. Experiment with WIC and you will discover that it makes many other things easier, such as reading and writing annotations and creating custom metadata. It can also create thumbnails and manipulate images.
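As one example of the image manipulation mentioned above, here is a minimal thumbnail sketch. The names WicThumbnail and SaveThumbnail and the choice of PNG output are my own assumptions for illustration; the sketch scales the first frame with TransformedBitmap and never enlarges an image.

```csharp
using System;
using System.IO;
using System.Windows.Media;
using System.Windows.Media.Imaging;

public static class WicThumbnail
{
    // Writes a PNG thumbnail of the first frame, at most maxWidth pixels wide.
    public static void SaveThumbnail(string inputPath, string outputPath, int maxWidth)
    {
        using (Stream input = File.OpenRead(inputPath))
        {
            BitmapDecoder decoder = BitmapDecoder.Create(input,
                BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.OnLoad);
            BitmapFrame source = decoder.Frames[0];

            // Scale uniformly; images already narrower than maxWidth are left at full size.
            double scale = Math.Min(1.0, (double)maxWidth / source.PixelWidth);
            var thumbnail = new TransformedBitmap(source, new ScaleTransform(scale, scale));

            var encoder = new PngBitmapEncoder();
            encoder.Frames.Add(BitmapFrame.Create(thumbnail));
            using (Stream output = File.Create(outputPath))
            {
                encoder.Save(output);
            }
        }
    }
}
```

The same pattern works for a single page of a multipage TIFF by picking a different index from decoder.Frames.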

Tuesday, July 23, 2013

SharePoint Server MVP 2013


Once again I am very grateful to be honored with my 5th Microsoft SharePoint MVP award. It is great to be included with other incredible SharePoint professionals in the community. I look forward to another technically challenging year. SharePoint 2013 is consuming most of the community’s time and the need to help others has never been greater. I also want to thank KnowledgeLake for being an awesome place to work and for providing the technical environment to pursue my passion for SharePoint.