Tuesday, November 18, 2014

Animated navigable ultra high resolution images with Direct2D and the Nokia Image SDK

This post is a mix of rationale, history, and a sprinkling of technical details. Here is the GitHub link.

The original problem that led me to write this library was that I needed to display large GIF images on Windows Phone 8.0 and Windows Store 8.0. Neither platform has native support for GIFs, but both are supported by ImageTools. ImageTools was created to support Silverlight and it's written entirely in C#. A few problems with ImageTools became apparent rather quickly. First, it was only able to load about 90% of the GIFs that we saw on /r/gifs. Next, it is a CPU hog: it pegs the CPU any time a GIF is visible. Third, when you load a GIF using ImageTools you must wait until it's fully loaded from the network before you can start displaying anything. Finally, ImageTools has a voracious appetite for memory because it fully loads everything prior to display and keeps all frames fully in memory.

The first version of GifRenderer for Windows Store solved two of those problems: the CPU hog and compatibility. The first decision was easy: it was going to need to be implemented in C++, and fortunately both Windows Phone 8 and Windows Store 8 had excellent interop between C++ and .NET. I decided to use Windows Imaging Component (WIC) because it has easily the best support for GIFs available anywhere. The difficulties in creating the first version were mostly centered around learning Direct3D/XAML interop and frame timing. It turns out that the delay specified in many GIFs (in units of 1/100th of a second) cannot actually be followed. GifRenderer defaults to a delay of 10 (that is, 100 ms) if the extension block is not present, is corrupted, or specifies a delay of less than 2. The reason for this default is simple: lots of bad GIF encoders have existed over the years, and this particular method of defaulting goes all the way back to Netscape Navigator, so a surprising number of GIFs still rely on this behavior. XAML interop was achieved using SurfaceImageSource.
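
To make the delay rule concrete, here is a minimal sketch of it in C++. This is not the code from GifRenderer; the function and constant names are made up for illustration, and I am assuming the caller already knows whether a usable Graphic Control Extension was found for the frame.

```cpp
#include <cstdint>

// GIF delays are stored in units of 1/100th of a second (centiseconds).
// Names below are hypothetical and only illustrate the rule described above.
const std::uint16_t kDefaultDelayCs = 10;   // fallback delay: 100 ms
const std::uint16_t kMinUsableDelayCs = 2;  // anything below this is treated as bogus

// hasValidExtension is false when the extension block is missing or corrupted.
inline std::uint16_t effectiveDelayCs(bool hasValidExtension, std::uint16_t rawDelayCs) {
    if (!hasValidExtension || rawDelayCs < kMinUsableDelayCs) {
        return kDefaultDelayCs;
    }
    return rawDelayCs;
}

// Converts a delay in centiseconds to milliseconds for use with a frame timer.
inline std::uint32_t delayToMilliseconds(std::uint16_t delayCs) {
    return static_cast<std::uint32_t>(delayCs) * 10u;
}
```

With this rule a frame that claims a delay of 0 or 1 plays at 100 ms per frame, while a frame that asks for 2 or more gets exactly what it asked for.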

Once we started to work on the Windows Phone 8 version of Baconography, I realized that two components GifRenderer relied on were not available: WIC was unsupported and so was Direct2D. This meant everything would need to be rewritten to support the platform. Looking around for an alternative to WIC, I found GIFLIB. Because of its extensive use in other open source software for both encoding and decoding, it has extremely high levels of compatibility with GIFs found in the wild. GIFLIB also compiled fine on Windows Store, so I knew that if things were ever unified between Windows Phone and Store, it would be easy to bring the Windows Phone implementation in to support both platforms. XAML interop was achieved using DrawingSurface. It was considerably easier to implement than SurfaceImageSource because it handled syncing with the XAML render rate.

The need for significant changes arose when an incredible set of GIFs showed up on reddit, titled NYC etiquette. Most GIFs do not actually take advantage of the compression mechanisms present in the format, so you usually see full image frames that take up quite a bit of space for any reasonably long GIF. The NYC etiquette GIFs were completely different: they took advantage of every last trick in the GIF playbook, from sliding update windows to very limited but dynamic color palette selection. This led to a 700k GIF that, when fully decoded, occupied several hundred megabytes of memory. The correct path forward was readily apparent: frames would need to be deinterlaced/decoded but not actually constructed into full frames until the moment they were to be displayed. This dropped memory consumption for the NYC etiquette GIF set below 15 megabytes, and aside from some difficulty in implementing frame disposal methods efficiently, this version would stand for quite a while.
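
The idea of keeping frames in their compact form and only painting them at display time can be sketched roughly like this. It is an illustration rather than the GifRenderer code: the CompactFrame struct and composeFrame function are hypothetical names, and the sketch only covers the simplest case where the previous canvas contents are left in place (the other disposal methods are omitted).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compact frame: only the update window is stored, as palette
// indices (1 byte per pixel), together with the palette used by this frame.
struct CompactFrame {
    int left, top, width, height;        // update window inside the canvas
    std::vector<std::uint8_t> indices;   // width * height palette indices
    std::vector<std::uint32_t> palette;  // packed RGBA colors
    int transparentIndex;                // -1 when the frame has no transparent color
};

// Paints one compact frame onto the single shared RGBA canvas at display time.
// Pixels that use the transparent index leave the canvas untouched.
inline void composeFrame(const CompactFrame& frame,
                         std::vector<std::uint32_t>& canvas,
                         int canvasWidth, int canvasHeight) {
    assert(canvas.size() == static_cast<std::size_t>(canvasWidth) * canvasHeight);
    assert(frame.left >= 0 && frame.top >= 0);
    assert(frame.left + frame.width <= canvasWidth);
    assert(frame.top + frame.height <= canvasHeight);
    assert(frame.indices.size() == static_cast<std::size_t>(frame.width) * frame.height);

    for (int y = 0; y < frame.height; ++y) {
        for (int x = 0; x < frame.width; ++x) {
            const std::uint8_t index =
                frame.indices[static_cast<std::size_t>(y) * frame.width + x];
            // Skip transparent pixels and indices outside the palette.
            if (index == frame.transparentIndex || index >= frame.palette.size()) continue;
            const std::size_t target =
                static_cast<std::size_t>(frame.top + y) * canvasWidth + (frame.left + x);
            canvas[target] = frame.palette[index];
        }
    }
}
```

Stored this way, a frame costs one byte per pixel of its update window plus its palette, and only one full RGBA canvas exists at any time, instead of one full RGBA image per frame.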

Once we started implementing SnooStream as a Windows Universal App, I needed to implement GifRenderer as a Universal App compatible library. VirtualSurfaceImageSource was the XAML interop method of choice because it offers the flexibility of SurfaceImageSource, but as long as you invalidate your frame area, it handles syncing rendering to the XAML render rate. While working on GifRenderer I decided it would be nice to add support for incremental GIF loading, displaying frames as they load the same way a web browser would. GIFLIB isn't really suited to incremental loading, but I tried anyway. Essentially, checkpoints were put into the loader loop so that if an error was encountered, it could roll back the state of the loader and try again when more data was available. This method worked OK, but there were memory leaks somewhere that I was never able to track down. The biggest problem was that GIFLIB was originally written a long time ago, in a very stateful C style, so memory allocation and operations on the GIF frames collection were very intertwined. So I started to re-implement it in mostly modern C++ style: the actual frame load operation state was separated from the rest of the GIF, and frames could be added or removed without impacting anything other than the collection of loaded frames itself. People often think that re-implementing something in modern C++ style is going to give an inferior memory profile compared to something written in C back when a megabyte was a lot of memory. However, in this case, where GifRenderer was loading all of the frames anyway, the biggest difference was memory locality. More of the underlying structures were implemented to not live on the heap, and the allocation performance of the STL containers is actually quite a bit better than that of the prior allocation mechanism, realloc.
So at this point we have a fully BSD licensed, ultra high performance, low memory, incrementally loading animated GIF renderer that is easy to plug into anything that takes an ImageSource in Windows Universal XAML Apps.
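
For the curious, the checkpoint-and-retry idea behind incremental loading looks roughly like the sketch below. It is a simplified illustration, not the actual loader: IncrementalReader is a hypothetical class, and it only handles one piece of the format, a run of GIF data sub-blocks (each one a length byte followed by that many bytes, ended by a zero length byte).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reader over a buffer that grows as network data arrives.
class IncrementalReader {
public:
    // Called whenever another chunk of the file has been downloaded.
    void append(const std::uint8_t* data, std::size_t size) {
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Tries to read one complete run of data sub-blocks into `out`.
    // Returns true on success. If the buffer ends early, the read position
    // and `out` are rolled back to the checkpoint so the same call can simply
    // be retried after more data has been appended.
    bool tryReadSubBlocks(std::vector<std::uint8_t>& out) {
        const std::size_t checkpoint = pos_;
        const std::size_t outCheckpoint = out.size();
        for (;;) {
            if (pos_ >= buffer_.size()) return rollback(checkpoint, out, outCheckpoint);
            const std::size_t length = buffer_[pos_++];
            if (length == 0) return true;  // block terminator reached
            if (buffer_.size() - pos_ < length) return rollback(checkpoint, out, outCheckpoint);
            out.insert(out.end(), buffer_.data() + pos_, buffer_.data() + pos_ + length);
            pos_ += length;
        }
    }

    std::size_t position() const { return pos_; }

private:
    bool rollback(std::size_t checkpoint, std::vector<std::uint8_t>& out,
                  std::size_t outCheckpoint) {
        pos_ = checkpoint;
        out.resize(outCheckpoint);
        return false;
    }

    std::vector<std::uint8_t> buffer_;
    std::size_t pos_ = 0;
};
```

Because the reader owns nothing except its buffer and a position, a failed attempt has nothing to clean up, which is the property that was hard to get when the rollback had to be bolted onto an existing stateful loader.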

Now that GIFs were out of the way, we still had a problem with static images that are super large. This was a little less of a problem on Windows Phone 8 because the max texture size is 2048x2048, so there was at least an absolute cap on the texture memory being used. The actual JPEG decoding process, however, was not bounded, so quite a few of the crashes we experienced on Windows Phone in Baconography were caused by JPEG related OOM. So, moving to Windows Universal, where the maximum texture size is something like 16k x 16k, and remembering that we're already doing most of the hard work to display images ourselves, it wasn't much of a leap to say: let's make our own ImageControl that loads just the right amount of a JPEG for the current zoom level, and why not take advantage of the RAJPEG technology Nokia has given us all access to through the Nokia Image SDK. That last part is really what makes all of this possible. Normally, if you ask WIC to load a JPEG for you, or even load a BitmapImageSource, you are going to have to pay the memory budget for loading the entire image even if you only want 1/100 of the pixels in the image. RAJPEG makes it possible to crop and resize with a tiny fraction of the memory otherwise required. Creating a zoomable image control is by no means a new idea. The winner of the Nokia Imaging Wiki Competition 2013Q3 did exactly that, but it was for Windows Phone 8, it had quite a few rough edges in terms of API usage, and its memory profile, while much better than the default one, was still less than ideal.
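
To show what "just the right amount of a JPEG for the current zoom level" means in numbers, here is a small sketch of the bookkeeping involved. It deliberately does not call the Nokia SDK; planDecode, DecodePlan and the other names are hypothetical, and the assumption is that whatever decoder you use can be asked for a crop rectangle and an output size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rect { double x, y, width, height; };
struct PixelSize { int width, height; };
struct DecodePlan {
    Rect crop;         // region of the source image, in source pixels
    PixelSize output;  // size to decode that region to, in display pixels
};

// viewport: the visible rectangle in zoomed (display) coordinates.
// zoom: display pixels per source pixel (0.125 means zoomed far out).
inline DecodePlan planDecode(PixelSize image, Rect viewport, double zoom) {
    assert(zoom > 0.0);
    const double maxX = image.width;
    const double maxY = image.height;
    // Map the viewport back into source pixels and clamp it to the image.
    const double left = std::min(std::max(viewport.x / zoom, 0.0), maxX);
    const double top = std::min(std::max(viewport.y / zoom, 0.0), maxY);
    const double right = std::min(std::max((viewport.x + viewport.width) / zoom, left), maxX);
    const double bottom = std::min(std::max((viewport.y + viewport.height) / zoom, top), maxY);
    const Rect crop{left, top, right - left, bottom - top};
    // Decode only as many pixels as will actually be shown.
    const PixelSize output{
        std::max(1, static_cast<int>(std::lround(crop.width * zoom))),
        std::max(1, static_cast<int>(std::lround(crop.height * zoom)))};
    return DecodePlan{crop, output};
}

// Bytes needed for an uncompressed 32-bit RGBA bitmap of the given size.
inline std::uint64_t rgbaBytes(PixelSize size) {
    return static_cast<std::uint64_t>(size.width) *
           static_cast<std::uint64_t>(size.height) * 4u;
}
```

For a 16384x16384 image viewed fully zoomed out in a 2048x2048 viewport (zoom 0.125), the plan asks for the whole image decoded down to 2048x2048, which is 16 MiB of RGBA instead of the 1 GiB a full size decode would need.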

So, at this point we know we need a new Nokia Image SDK powered zoomable image control, and we also know that Windows 8.1, and by extension Windows Universal Apps, added support for super large virtualized surfaces using VirtualSurfaceImageSource. But using this technique brings more to the table than just a convenient way to interface with a ScrollViewer. Because it's fundamentally implemented using the same DirectManipulation techniques that a ScrollViewer uses, you get amazing compositor performance and no nasty tearing from areas you've not gotten around to on the render thread. This article is already a bit too long, so the most important takeaway is that DirectManipulation is the amazing technology that makes touch interaction smooth on Windows Phone 8.1.

In my first try at this, I created the VirtualSurfaceImageSource at full size and then only created textures large enough to render the currently visible portion of the image (with scaling to a max image size for full zoom-out). This fell pretty flat on its face. After a lot of testing it became apparent that, despite what Windows.System.MemoryManager.AppMemoryUsage and the memory profiler were telling me, VirtualSurfaceImageSource does not take kindly to actually rendering ultra large surfaces; it really wants you to only render a small portion at a time. The symptoms were pretty unpleasant: rather than a normally reported OOM, the process would just get terminated without any message whatsoever. Figuring out that this was just a simple OOM on the compositor thread took several very long nights and a lot of cursing. So with my first attempt a flaming wreckage, I decided to read as much as I could about VirtualSurfaceImageSource. It turns out there really isn't that much. Microsoft has a sample for making a magazine reader, but that doesn't really deal with changing the detail of an image as the user zooms in and out. So I kept digging and found out that ScrollViewer was implemented using DirectManipulation, and then I even found a sample that uses DirectManipulation to make an image viewer. It didn't take long to realize that using DirectManipulation directly in your app was not allowed for Windows Universal Apps. But some of the technique still stuck, and I started looking around at the VirtualSurfaceImageSource documentation again. This time I found what I needed: a presentation at Build 2013 mentioned that VirtualSurfaceImageSource is very good at handling rapid changes in its surface size. So I had a path set out for me. If I could get the DirectManipulation events from the ScrollViewer, then I would be able to change the VirtualSurfaceImageSource size when the user does a zoom operation.
At that point it was pretty simple to re-implement the crop/resize logic I had from the first try to respond to system provided invalidation events. Now, all bundled up into one place, we have a memory friendly image control that works with super large images and GIFs at the same time. It's all licensed under a 3 clause BSD license, so you're free to use it in any app, open source or not.