Thursday, September 13, 2018

How to correctly scale the Kuwahara-Nagao filter

What is the Kuwahara-Nagao filter?

The Kuwahara-Nagao filter, or "KN filter", is awesome.  It is an edge-preserving filter that blurs and sharpens at the same time, like magic.  The core concept is to sample regions about a reference pixel, and to pick the average colour of the region with the least variance.  Or, in other words, avoid regions with 'a lot going on'.

Background

Sometimes an image is a bit 'grainy', and you want to smooth it.  See figure 1 a.  One way is to apply a blur.  But doing so blurs everything, including details, like edges, that were not intended to be blurred.  See figure 1 b.

Sometimes an image is overly blurry, and sharpening is needed.  But sharpening doesn't discriminate, and artefacts become sharpened just as much as useful details.  See figure 1 c.

The KN filter solves everything.  It smooths small artefacts while not just preserving but even enhancing edges (though it may produce a slight blockiness in the image but, that's a bonus, and doesn't count).  See figure 1 d.

Figure 1, a) an image, b) blurred, c) sharpened, and d) filtered with the Kuwahara-Nagao filter

What's the problem?

I've come across more than a few implementations of the KN filter that produce artefacts, and I suspect that the standard method used to increase the scale of the filter is wrong.

Kang et al. produced a paper called "Image and Video Abstraction by Anisotropic Kuwahara Filtering" that, while being an excellent paper otherwise, includes what I consider to be an incorrect implementation of the KN filter.

For example, in Kang's paper, (their) Figure 2 (b) shows a quad-shaped wavy/shell pattern where flat colours might be expected, per figure 2.

Figure 2, Kang et al.'s KN filtered image
The very good (for what it's meant to do) graphics program Paint.net allows user-created effects.  There are two such user-created KN filters (that I found) that produce the same artefacts as are visible in Kang et al.

However, another program, mtPaint, has a KN implementation that produces images without the artefacts, per figure 3.

Figure 3, how the image should be filtered with the KN filter
While testing my own initial implementation of the KN filter, I used the mtPaint output as the benchmark for what is 'correct'.  Why do I consider the Kang and (user-made) Paint.net implementations bad and the mtPaint implementation good?  Because the former look crap, and the latter doesn't.  QED.

The usual definition for the KN filter, and the implied scaling method

The original paper (from 1976) referenced by Kang et al. exists, as far as I can tell, only behind a paywall; if anyone knows of a handy public copy or link, please email me, I'd love to see the original paper.  Meanwhile, there are many public sources that describe the KN filter's concept and method (Wikipedia, various universities' lecture notes, etc.), but uncannily all of these only ever (so far as I've found) refer to a 3x3 box size, which suggests that the source probably does not define scaling up from 3x3.  And this is likely the cause of the artefacts: people have guessed at a scaling method, incorrectly.

The method for the KN filter is roughly as follows.  (This isn't meant as a tutorial on the KN filter itself; it's here for reference, and to be improved on.  A code sketch follows figure 4.)
  • Take a 3x3 box, divide it into four 2x2 boxes, or samples, overlapping about the centre pixel.  Per figure 4, below.
  • For each sample, get the variance and average colour.
  • Select the sample with the lowest variance, and make the average colour of that sample the colour for the centre pixel.
Figure 4, the standard 3x3 KN filter box
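
Here's a minimal sketch of that 3x3 method in C++, assuming a single-channel, row-major image of floats; the function name and layout are mine, purely for illustration:

#include <vector>
#include <limits>

// Minimal 3x3 KN filter sketch (illustrative only): single channel,
// row-major floats.  Border pixels are left untouched.
std::vector<float> kn3x3(const std::vector<float>& img, int w, int h)
{
    std::vector<float> out = img;
    // Top-left corners of the four 2x2 samples, relative to the centre pixel
    const int corners[4][2] = { {-1,-1}, {0,-1}, {-1,0}, {0,0} };
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x)
        {
            float bestVar  = std::numeric_limits<float>::max();
            float bestMean = img[y * w + x];
            for (const auto& c : corners)      // the four overlapping samples
            {
                float sum = 0.f, sumSq = 0.f;
                for (int dy = 0; dy < 2; ++dy)
                    for (int dx = 0; dx < 2; ++dx)
                    {
                        const float v = img[(y + c[1] + dy) * w + (x + c[0] + dx)];
                        sum   += v;
                        sumSq += v * v;
                    }
                const float mean = sum / 4.f;
                const float var  = sumSq / 4.f - mean * mean;
                if (var < bestVar) { bestVar = var; bestMean = mean; }
            }
            out[y * w + x] = bestMean;         // average colour of calmest sample
        }
    return out;
}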
When I first tried to implement my own KN filter and wanted to scale the filter size up, I did the 'obvious' thing, just making each of the four boxes bigger, per figure 5.

Figure 5, the standard (and wrong) sampling for a 5x5 filter, with only four samples
With this scaling method I successfully reproduced the crappy wavy artefacts!  Yeay!

Recognising the problem

It occurred to me that this scaling method marginalises the centre reference pixel more as the filter size increases, which is bad.

It occurred to me to fill the filter with as many sample regions (of the same size) as will distinctly fit, as per figure 6.

Figure 6, correct KN sampling for a 5x5 filter area
It occurred to me that the general definition for the KN filter may not be, and maybe never was, 'draw an nxn box and then sample four boxes overlapping about the centre pixel...', but rather, generally, 'fill the box with as many same-sized samples as can overlap with the centre pixel', which only happens to mean, for a 3x3 filter, four 2x2 boxes.  The fact that there are four samples for a 3x3 filter is just what happens for a 3x3 filter, and is, I reckon, not a rule to be applied to all filter sizes.

The correct definition for the KN filter
  • For an n x n box (where n is asserted to be odd), define span as the edge length of a sample, such that span = (n+1)/2.  For example, a 5x5 filter gives span = 3.
  • Fill the filter area with span x span samples, each with edge length span, every one overlapping the centre pixel.
  • Proceed as usual: get the variance and average colour per sample, pick the sample with the lowest variance, and use its average colour for the reference pixel.  (A sketch of this generalised version follows.)
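
Here's the earlier 3x3 sketch generalised to this definition, under the same assumptions (single channel, row-major floats, names mine):

#include <vector>
#include <limits>

// Illustrative sketch of the scaled KN filter defined above; n must be odd.
std::vector<float> knFilter(const std::vector<float>& img, int w, int h, int n)
{
    const int r    = n / 2;          // filter radius, e.g. 2 for a 5x5 box
    const int span = (n + 1) / 2;    // sample edge length, e.g. 3 for a 5x5 box
    std::vector<float> out = img;    // border pixels are left untouched
    for (int y = r; y < h - r; ++y)
        for (int x = r; x < w - r; ++x)
        {
            float bestVar  = std::numeric_limits<float>::max();
            float bestMean = img[y * w + x];
            // span x span candidate samples: every top-left corner from
            // (x-r, y-r) to (x, y) gives a span-edged box that stays inside
            // the n x n filter area and overlaps the centre pixel.
            for (int cy = y - r; cy <= y; ++cy)
                for (int cx = x - r; cx <= x; ++cx)
                {
                    float sum = 0.f, sumSq = 0.f;
                    for (int dy = 0; dy < span; ++dy)
                        for (int dx = 0; dx < span; ++dx)
                        {
                            const float v = img[(cy + dy) * w + (cx + dx)];
                            sum   += v;
                            sumSq += v * v;
                        }
                    const float count = float(span * span);
                    const float mean  = sum / count;
                    const float var   = sumSq / count - mean * mean;
                    if (var < bestVar) { bestVar = var; bestMean = mean; }
                }
            out[y * w + x] = bestMean;
        }
    return out;
}

For n = 3 this reduces exactly to the four 2x2 samples of the standard definition; for n = 5 it yields the nine 3x3 samples of figure 6.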







Thursday, September 6, 2018

Kowalski's Image Format (KIF)



I have developed a new lossless image format that is usually (though not always) better than the .png format.

The files referred to in this post are here:
kifany.exe   convert.exe 

The purpose originally was to have my own compressed-ish image format, so that I didn’t need to rely on others’ libraries, for use in the development of my game (working title: “Kow’s Balls”).

The result, in brief, is an average improvement of 6% over the same sample images saved as .png files.

This statistic comes from a varied sample of 24 images ranging from cartoonish and flat to sharp and detailed.  The range of improvement compared to the same files saved as .pngs was -13% to +20%.

Give it a go.

The compression work is done by my own library, which is image-format agnostic and uses no external libraries; it does, however, rely on a front end to shuttle image data between the world and itself (the compression lib).

I’ve created three executable variants, each with a different (internal) front end to handle this image I/O to a greater or lesser extent, some of which do use an existing image-handling library (again, only for the initial loading/final saving of images, and not for any part of the compression):
  • kif.exe                       – converts between kif files and targa files only
  • kifbmp.exe                – converts between kif files and bitmaps (in addition to targas) using the CImg library
  • kifany.exe (+ convert.exe) – converts between kif files and pretty much any other image format, in addition to targas and bmps, using the CImg library, which itself calls ImageMagick’s convert.exe (tested with png, jpg, tiff, and pcx, but probably works with a bunch of others)

To just test the compression, use kif.exe or kifbmp.exe, and compare the output .kif to a .png of the same original file.  For greatest flexibility, use kifany.exe (which itself calls convert.exe).

-Usage-
 
Where you see “kif.exe”, interpret that as either “kif.exe”, “kifbmp.exe” or “kifany.exe”, depending on the kinds of file you’re working with.
  • kif.exe [filename].[img ext]               - outputs [filename].kif
  • kif.exe [filename].kif                         - outputs [filename]_.bmp (default)
  • kif.exe [filename].kif -out [img ext]    - outputs [filename]_.[img ext]
  • kif.exe [...] -p                                    - prints some progress and metrics to console during execution

Notice that output files, other than kifs, get an extra underscore after the filename.

-Some examples-

From the console, (don’t type “<console>”):

<console> kif mycat.tga             (outputs mycat.kif)
<console> kif mycat.kif              (outputs mycat_.tga)
<console> kifbmp mycat.bmp    (outputs mycat.kif)
<console> kifbmp mycat.kif       (outputs mycat_.bmp)
<console> kifany mycat.jpg       (outputs mycat.kif)
<console> kifany mycat.kif -out png     (outputs mycat_.png)

The most direct way of comparing the compression between pngs and kifs is to process a png directly and see how big the output kif is.

<console> kifany mycat.png    (outputs mycat.kif)


As is, this is a ‘proof of concept’ free-standing program; nothing ‘installs’.  Just put the exe/s where you have some images and run (from console) from there.  Alternatively, put the executable/s somewhere and point your Path to that location, or put the files where Path already points, then run from a console anywhere you have images.  Got it?  Good.

-About targas-  

I’ve noticed that at least one program saves targas in a way that can produce strange results.  If you compress then uncompress a targa and the image is flipped with weird colouring, try saving the original targa with another program and try again, or use bmps instead.
 
-----------------------------------------------------------------------------------------------------------------
About the programming environment 

All done in C++(11) with gcc version 5.1.0 (tdm64-1), using the Code::Blocks IDE.

I make extensive use of the C++ Standard Template Library.

The kif library, including its (my) dependent libraries, has ~1550 lines of code.  The front ends each have about 100 to 200.  So pretty compact, though it does seem to compile to a relatively large executable...

I’ve built for Windows only for now. But there’s nothing system specific that I know of (endianness wrt bit-packing?) that would inhibit compiling for Linux or Apple – when the day comes.
 
-----------------------------------------------------------------------------------------------------------------
Method (short version)

Identify all distinct two-pixel aligned patterns and replace them with Huffman codes, then bit-pack everything.
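
As a taste of what that first step might look like (per the Huffman section below, the coded unit is a 16-bit chunk of two 8-bit colour values), here is a hypothetical sketch in C++; the function name and the byte pairing are my own guesses, not the kif source:

#include <cstdint>
#include <map>
#include <vector>

// Illustrative only: tally every distinct aligned 16-bit pattern in a raw
// byte stream.  These frequencies then feed the Huffman coder.
std::map<uint16_t, uint64_t> countPatterns(const std::vector<uint8_t>& raw)
{
    std::map<uint16_t, uint64_t> freqs;
    for (size_t i = 0; i + 1 < raw.size(); i += 2)   // aligned two-byte steps
    {
        const uint16_t pattern = uint16_t((raw[i] << 8) | raw[i + 1]);
        ++freqs[pattern];
    }
    return freqs;
}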

-Why only two pixels?-

I initially supposed that longer patterns would be better.  I tried 4, 8, and longer, and some in between.  I tried rectangular regions too.  The result of using longer patterns is that each distinct pattern repeats less often, which produces a corresponding increase in the number of distinct patterns and hence codes, which, in terms of the Huffman coding used, means more long codes, which finally means less compression.

-Why aligned?-
 
Because that ‘fixes’ where to perform processing, making everything that little bit easier.



-----------------------------------------------------------------------------------------------------------------
Special features and or known bugs


What I know or have otherwise noticed about the program:
  • Purpose-built for 24 and 32 bit images (for those formats that support 32 bit)  
  • Not yet tolerant of 8 bit indexed images. Program currently treats these as 24 bit  
  • Images must have even width and height.  Because I just happened to be build-testing with images that were even and so didn’t notice that I hadn’t accounted for odd sizes until running my statistics and I can’t be bothered attending to that now because the program does what I want it to do as is (mostly) and I’m happy to require myself to use even images until I care enough otherwise so there  
  • All serial processing; no multi-threading nor parallel/ GPU processing
I intend to address these in due course.
 
-----------------------------------------------------------------------------------------------------------------
Magic Source, two kinds

Very broadly, there are two main technologies involved; Huffman-coding and bit-packing.  Each is its own comp sci topic but here’s a brief note on each for now. 

-Huffman coding-
   
Huffman coding is the process of generating variable-length codes based on the probability of the items occurring, resulting in shorter codes for more common items.  That’s important.  Also, no code is a prefix of another code.  That is, if one code is “234”, then there can’t be another code such as “2345”.  This ensures that when you read a bit string that looks like a code, it is that code and not the start of another, longer code.  That’s the second point of Huffman coding.  In the kif case, I’m coding 16-bit chunks (2 x 8-bit colour values).  A common pattern is {0, 0} (no change).  In some cases (but not all) this pattern (of 16 bits) is represented by a Huffman code of 1 bit: ‘0’ or ‘1’.  The result is the replacement of 16 bits with 1 bit!  If there’s a lot of not much changing, then the compression can be significant.
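
For the curious, here is a rough sketch in C++ of how such codes can be built; this is illustrative only (names and layout are mine, not the kif library’s), using the classic merge-the-two-rarest construction over a table of pattern frequencies:

#include <cstdint>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Illustrative sketch only, not the kif source: build Huffman codes for
// 16-bit patterns from their occurrence counts.
struct Node {
    uint64_t freq;
    uint16_t pattern;            // meaningful only for leaves
    Node *left, *right;
    Node(uint64_t f, uint16_t p, Node* l = nullptr, Node* r = nullptr)
        : freq(f), pattern(p), left(l), right(r) {}
};

struct ByFreq {                  // min-heap ordering on frequency
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Walk the finished tree: '0' for left branches, '1' for right branches.
static void assignCodes(const Node* n, const std::string& prefix,
                        std::map<uint16_t, std::string>& out)
{
    if (!n->left) { out[n->pattern] = prefix.empty() ? "0" : prefix; return; }
    assignCodes(n->left,  prefix + "0", out);
    assignCodes(n->right, prefix + "1", out);
}

std::map<uint16_t, std::string>
buildHuffman(const std::map<uint16_t, uint64_t>& freqs)
{
    std::vector<std::unique_ptr<Node>> pool;   // owns every node
    std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
    for (const auto& kv : freqs) {
        pool.emplace_back(new Node(kv.second, kv.first));
        pq.push(pool.back().get());
    }
    while (pq.size() > 1) {      // repeatedly merge the two rarest subtrees
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pool.emplace_back(new Node(a->freq + b->freq, 0, a, b));
        pq.push(pool.back().get());
    }
    std::map<uint16_t, std::string> codes;
    if (!pq.empty()) assignCodes(pq.top(), "", codes);
    return codes;
}

Because symbols only ever sit at the leaves of the tree, no code can be a prefix of another, which is exactly the property described above, and the rarest patterns end up deepest, i.e. with the longest codes.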

Below is some data on an image that was reduced by 36% (compressed to 64% of original size).  It is marginally better than the .png (reduced by 35%), and is about halfway down my list of 24 images in terms of compression rate, so it’s a reasonable pick to investigate.

Here’s the length of the Huffman codes against how many codes there are of that length, and that count as a percentage of the total.  In other words, these are the stats for the 'alphabet' of codes.


Here are the codes again, but this time showing how many times each occurs in the (same) image.

  
  
What have I shown here?  The top table and chart show that most of the generated codes are long, but the bottom chart and table show that those longer codes don’t actually occur much; it’s the shorter codes that occur a lot.  In fact, one single code of 3 bits constitutes almost 10% of all code occurrences in the image, repeating more than 86,000 times out of over 870,000 total instances.  (So that’s 3 bits replacing 16 bits 10% of the time.)  The bottom data also shows that more than half (~56%) of the codes used are 8 or fewer bits long; that is, more than half the data has been replaced by codes half as long as the original (16-bit) raw data.

-Bit packing-
   
Suppose you have some numbers: 3, 10, 45, 154, 27, 79.  Below is a series of bytes with those values, in binary, as they are stored in memory.

 
Notice I put a box around the segment that contains the bits required to represent the value; the significant bits. 

Below are the significant bits packed into bytes.  The colours represent the distinct values, and the boxes are the byte boundaries.  You can see how some of the bit strings pack into one byte while others span byte boundaries.  And obviously the total number of bytes is smaller.  That’s because the bits have been packed.



To unpack, given that the 'natural' boundaries have been stripped, we must either know how many bits are to be expanded to make up a proper item, or have a way of checking against known patterns (like, say, by using Huffman codes!).  I use the former for the code table and the latter for the encoded image body.
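
To make the packing side concrete, here is a hypothetical little packer in C++ (my own illustrative code, not the kif library’s interface) that appends arbitrary-length bit strings to a byte buffer, most significant bit first:

#include <cstdint>
#include <vector>

// Illustrative only: append the low 'nbits' bits of each value to a byte
// buffer with no regard for byte boundaries.
class BitPacker {
public:
    void put(uint32_t value, int nbits)
    {
        for (int i = nbits - 1; i >= 0; --i) {
            if (bitPos_ == 0) bytes_.push_back(0);          // start a new byte
            if ((value >> i) & 1u)
                bytes_.back() |= uint8_t(1u << (7 - bitPos_));
            bitPos_ = (bitPos_ + 1) % 8;
        }
    }
    const std::vector<uint8_t>& bytes() const { return bytes_; }
private:
    std::vector<uint8_t> bytes_;
    int bitPos_ = 0;            // next free bit within the last byte
};

// Packing the example values by their significant bits only:
// 3 (2 bits), 10 (4), 45 (6), 154 (8), 27 (5), 79 (7) = 32 bits = 4 bytes,
// down from the 6 raw bytes.
void demo()
{
    BitPacker p;
    const uint32_t vals[] = { 3, 10, 45, 154, 27, 79 };
    const int      bits[] = { 2,  4,  6,   8,  5,  7 };
    for (int i = 0; i < 6; ++i) p.put(vals[i], bits[i]);
    // p.bytes().size() is now 4
}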

For the image scrutinised:
  • Raw bitmap size is 1710 kB.
  • 8 kB for the code table + 885 kB for the packed body + 199 kB for the raw un-encoded data (+ some for the file header; width, height, etc) = 1092 kB.
  • 1092 / 1710 = ~0.64. As mentioned up top, the compressed file is reduced to 64% of the original file size.
-----------------------------------------------------------------------------------------------------------------
That's it


I’m slightly proud of what I’ve done, and in less than a month!
  • created a library for Huffman coding from scratch
  • created a library for bit packing, both for known-length items and for variable-length items recognised against bit-string patterns, from scratch
  • created an image compression/decompression program using these tools that, from what I can see, is usually better than the .png format, which, with the zlib library it uses, has been around for decades
If you are also proud of me, you’re welcome to drop a coin (or a note) in the Donation box.  I live in Brisbane, Australia, and am a former (since made redundant) data analyst, looking after my daughter myself, and currently productively fulfilling the role of ‘unemployed dude’, hoping to make a living selling the Android-based future-soon smash “Kow’s Balls” (which may or may not be renamed to “Zaf’s Asteroids”).

Zaf.


Hello and Welcome

Hello and Welcome.

Zaf.