Saturday, June 29, 2019

Interpolation for image resizing

I wanted my own image resizing method (specifically for image enlargement) for use in some project, so here we are.

Early on I wanted to learn how the Lanczos filter works but I couldn’t find a readable explanation.  I still don’t know how the Lanczos filter works.  Given the (tight) similarity of the results, I may have just re-invented the Lanczos filter (after a fashion), but I don’t know.

Here is an image, originally 500 x 500 pixels.



Below is a part of the image after resizing by a factor of 3 using Lanczos.




Below is the same part of the image using the method being described.
 


As is evident, my method is hardly distinguishable from Lanczos.  Here my method appears slightly lighter, or lower in contrast, but that can be adjusted with a parameter passed in during resizing.

My method employs two steps, though both steps are functional on their own.  These are linear interpolation and a special kind of contrast enhancement.

I’ll only detail the interpolation part in this post.  



Linear interpolation – core method


Given two values, A and B, and a proportional position, prop, (between 0 and 1) between them, the interpolated value is found as follows.

    Apart = A * (1.0 - prop)
    Bpart = B * prop
    interp_val = Apart + Bpart


It can be seen that when prop is large, the position is closer to B, hence A’s value contributes proportionally less (hence the 1.0 - prop).

Or as a function (for later use):

float interp (float A, float B, float prop) {
    float Apart = A * (1.0 - prop);
    float Bpart = B * prop;
    return Apart + Bpart;
}


or just

float interp (float A, float B, float prop) {
    return (A * (1.0 - prop)) + (B * prop);
}
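
As a quick check, a quarter of the way from A = 10 to B = 20:

    float v = interp (10.0, 20.0, 0.25);    // (10 * 0.75) + (20 * 0.25) = 12.5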


Linear interpolation of an image


The general method (to be improved on further down) is:

determine the inverse scaling factor

for each dest pixel position
    get decimal position in source (src)
    get integer part of decimal value (srci)
    get fractional part of decimal value (srcf)
    interpolate for values at srci and srci+1 with srcf as prop

Or more specifically, (and without any bounds checking):

Where scale is a parameter passed in as a decimal multiple of the original image, inp is the input image, and outp is the output image, presumed already correctly sized to accommodate the new data.

float inv_scale = 1.0 / scale;
float src, srcf;
int srci;

for (int x=0; x<outp.width(); x++) {
    src = x * inv_scale;
    srci = floor (src);
    srcf = src - srci;
    outp (x) = interp (inp (srci), inp (srci+1), srcf);
}

Where the pixel’s value is


The previous method presumes that each pixel’s value sits on the pixel boundary.

It is however more accurate to interpret the discrete (pixel) data as a continuum, the value of a pixel then being in the abstract middle between the discrete pixel boundaries.


If each pixel’s value is in its middle then both the source and destination references must be to their respective middles.

So instead of
    src = x * inv_scale

this   
    src = (x + 0.5) * inv_scale
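
For example, with scale = 3 (so inv_scale = 1/3), dest pixel x = 4 maps to src = (4 + 0.5) / 3 = 1.5, which is exactly the middle of source pixel 1.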


Further, if srcf is less than 0.5 then the position is between the middle of that pixel and the middle of the previous pixel, not between it and the next pixel.  This is important.

And if that is the case then the value of the proportional position changes too.  If it was 0.2 (that is, 0.3 to the left of the middle of the pixel at srci) then the proportional position is 0.7 to the right of the middle of the previous pixel.

So
    if (srcf < 0.5) {
        srci = srci - 1;
        srcf = 0.5 + srcf;
    }

However if srcf is not less than 0.5 then the proportion needs to change too, but differently.  So we also have:

    else {
        srcf = srcf - 0.5;
    }



Edge cases


If the source position is to the left of the middle of the first pixel then there is nothing to interpolate with, so use that pixel’s value as the dest value.  Similarly, if the source position is to the right of the middle of the last pixel then use that pixel’s value as the dest value.
Left edge

    if (srci == 0 and srcf < 0.5)
        dst_val = inp (srci);


Right edge

    if (srci == width-1 and srcf > 0.5)
        dst_val = inp (srci);

Complete(r) code


void resize (float scale, image & inp, image & outp) {
    float inv_scale = 1.0 / scale;
    float src, srcf;
    int srci;

    for (int x=0; x<outp.width(); x++) {
        src = (x + 0.5) * inv_scale;
        srci = floor (src);
        srcf = src - srci;

        if (srci == 0 and srcf < 0.5) {
            outp (x) = inp (srci);
            continue;
        }
        if (srci == inp.width()-1 and srcf > 0.5) {
            outp (x) = inp (srci);
            continue;
        }

        if (srcf < 0.5) {
            srci = srci - 1;
            srcf = 0.5 + srcf;
        }
        else {
            srcf = srcf - 0.5;
        }

        outp (x) = interp (inp (srci), inp (srci+1), srcf);
    }
}

Blocking and cumulative scaling


Scaling by factors greater than two results in apparent (soft) blocks of colour.  This is because many dest pixels get their values from the same source pixels (generally).

This can be addressed by cumulatively resizing an image, limiting each run to a maximum scaling factor of 2, (and a minimum of 1).

If scaling by 750%, or 7.5, then

    7.5 / 2  = 3.75      // scale by 2
    3.75 / 2 = 1.875     // scale by 2
    1.875                // scale by 1.875

Sanity check: 2 * 2 * 1.875 = 7.5

Generally:
    while 1
        if scale >= 2.0
            interp by 2.0
            scale = scale / 2.0
        else
            interp by scale (if scale > 1.0)
            break

void cum_resize (float scale, image & inp, image & outp) {
    while (1) {
        if (scale >= 2.0) {
            resize (2.0, inp, outp);
            inp = outp;
            scale = scale / 2.0;
        }
        else {
            if (scale > 1.0)
                resize (scale, inp, outp);
            else
                outp = inp;    // scale landed exactly on 1.0
            break;
        }
    }
}

Resizing down


The method generally works for image reductions too - so long as the scale is > 0.5.  If the factor is <= 0.5 then dest pixels will be missing source pixels.

Note however that there is a better way to resize down.  (The Lanczos filter produces ‘nicer’ down-sized images than this method.)  And that better way may be the topic of another post.

In the meantime, cum_resize () needs to be extended to accommodate cumulative reductions for scaling factors <= 0.5:

    if (scale < 1) {
        while (1) {
            if (scale <= 0.5) {
                resize (0.5, inp, outp);
                inp = outp;
                scale = scale / 0.5;
            }
            else {
                if (scale < 1.0)
                    resize (scale, inp, outp);
                else
                    outp = inp;    // scale landed exactly on 1.0
                break;
            }
        }
    }
    else if (scale > 1) {
        while (1) {
            if (scale >= 2.0) {...}
            ...
        }
    }
    else {    // scale == 1, so nothing to do but pass outp as inp
        outp = inp;
    }

Images are two dimensional


Indeed they are.  The solution is to recognise that scaling is separable, meaning that scaling by x and y separately is fine (not just fine but correct).  My personal method is to cumulatively scale by xscale, rotate (transpose) the image, cumulatively scale by yscale, rotate the image back.

Generally:

    cum_resize (xscale)
    rotate left

    cum_resize (yscale)
    rotate right
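
Or as a sketch in code (rotate_left () and rotate_right () are assumed here to exist for the image class; the names are illustrative):

void resize_2d (float xscale, float yscale, image & inp, image & outp) {
    image tmp;

    cum_resize (xscale, inp, tmp);     // scale along x
    rotate_left (tmp);                 // transpose: y becomes x
    cum_resize (yscale, tmp, outp);    // scale along (what was) y
    rotate_right (outp);               // restore the orientation
}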



Conclusion


That's all.

Saturday, June 22, 2019

Static Ordering (A Better Ordered Container?)

[For the record, the development here described was done through October 2018.]

How to insert data into a container such that the values are maintained in order is a classic problem in computer science.

A standard method is to store values as nodes of a tree.  The tree is usually adjusted on an as-needed basis to maintain some structural rules.  C++ standard libraries typically use Red-Black trees for their set and map containers.

My thinking on this is that there could be a way to put values in order that relies on the implicit structure of each value.  Say, given value 245, could each of 2, 4 and 5 be used as nodes on a tree?

I have built such a contraption. 

The basic concept is to link numerals together.  So 23 and 24 would share a common 2 node that joins to respective 3 and 4 nodes.  Each node has as many potential child nodes as the numeric base of the tree (a parameter that can be set at tree creation, or hard-coded).
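
By way of illustration, a minimal sketch of the node structure (conceptual only; the details differ below):

const int BASE = 10;    // the numeric base parameter

struct node {
    node * child [BASE];    // one slot per numeral
    bool terminal;          // a stored value ends at this node
};

// inserting 23 then 24: both walk root -> child[2], then branch
// to child[3] and child[4] respectively - the '2' node is shared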

Insertion durations for the same random numbers are given below (compared to a C++ set).  Lower is better.




Why is this faster than RB Tree for insertion?


Because comparisons are not made between values (as they are for C++ sets), and no nodes are ever re-linked to maintain any structural rules (as they are for C++ sets).  In SO trees the initial placement is perpetually optimal.
 

Some history


I initially converted a value to a string then got elements by accessing each character and converting that to a number.  Easy but slow.

I then dropped the string conversion.  Instead, the initial value was converted to an alternate base, storing each numeral along the way.  Using base 2 was found to be competitive with, but not better than, C++ set.



Current Version


Converting bases requires some division, which is slow.  The current version uses the value’s binary chunks, raw.  The optimal span has been found to be 4 bits (base 16).
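
As a sketch of the current scheme (illustrative names, not my actual code), inserting a 32 bit value by walking its 4 bit chunks from most significant to least significant:

#include <cstdint>

struct so_node {
    so_node * child [16];    // one slot per 4 bit numeral
};

void so_insert (so_node * root, uint32_t value) {
    so_node * node = root;
    // eight 4 bit chunks, most significant first; no comparisons
    // between values, and no re-linking of existing nodes
    for (int shift = 28; shift >= 0; shift -= 4) {
        int digit = (value >> shift) & 0xF;
        if (!node->child [digit])
            node->child [digit] = new so_node ();
        node = node->child [digit];
    }
    // the node reached after 8 steps represents the value itself
}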
 

Iterating through container


RB trees are maintained such that adjacent nodes have the next smaller or next larger value.  In SO trees, adjacent nodes are not necessarily the next smaller or larger valued nodes.  With SO trees, the next value (smaller or larger) must be discovered by traversing the tree from the reference node.  This requires a method to manage the traversal, and also a method to manage the edges of the tree (so that traversal can continue onto the opposite side, but lower or higher, if need be).  I’ve tried describing how this is implemented but it ends up not much clearer than the source code for it – that could be described, at the moment, as some variation on the theme of grotesque.  So I’ve spared you.  You’re welcome.

I haven’t timed traversing SO trees.  It’s probably really really bad.  But I will likely do this at some point and post a follow-up.



Data types supported


Only implemented for 32 bit integers.  The thing can however be trivially extended for chars, shorts, long ints, etc, or anything that can be represented by ints like, say, pointers.

Floating point value storage has been thought about.  Just so you know.



Finally


Given my no-doubt amateurish implementation does insertions at nearly double the speed of the industry standard library, there’s probably some potential in this model. 

Thursday, September 13, 2018

How to correctly scale the Kuwahara-Nagao filter

What is the Kuwahara-Nagao filter?

The Kuwahara-Nagao filter, or "KN filter", is awesome.  It is an edge-preserving filter that both blurs and sharpens at the same time, like magic.  The core concept is to sample regions about a reference pixel, and to pick the average colour of the region with the least variance.  Or, in other words, avoid regions with 'a lot going on'.

Background

Sometimes an image is a bit 'grainy', and it's desired to smooth the image.  See figure 1 a.  One way is to apply a blur.  But doing so blurs everything including details, like edges, that were not intended to be blurred.  See figure 1 b.

Sometimes an image is overly blurry, and sharpening is needed.  But sharpening doesn't discriminate, and artefacts become sharpened just as much as useful details.  See figure 1 c.

The KN filter solves everything.  It smooths small artefacts while not just preserving but even enhancing edges (though it may produce a slight blockiness to the image but, that's a bonus, and doesn't count).  See figure 1 d.

Figure 1, a) an image, b) blurred, c) sharpened, and d) filtered with the Kuwahara-Nagao filter

What's the problem?

I've come across more than a few implementations of the KN filter that produce artefacts, and I suspect that the standard method used to increase the scale of the filter is wrong.

Kang et al produced a paper called "Image and Video Abstraction by Anisotropic Kuwahara Filtering" that, while being an excellent paper otherwise, includes what I consider to be an incorrect implementation of the KN filter.

For example in Kang's paper (their) Figure 2 (b) shows a quad-shaped wavy/shell pattern where flat colours might be expected, per figure 2.

Figure 2, Kang et al's KN filtered image

The very good (for what it's meant to do) graphics program Paint.net allows user created effects.  There are two such user created KN filters (that I found) that produce the same artefacts as are visible in Kang et al.

However, another program, mtPaint, has a KN implementation that produces images without the artefacts, per figure 3.

Figure 3, how the image should be filtered with the KN filter

While testing my own initial implementation of the KN filter, I used the mtPaint output as the benchmark for what is 'correct'.  Why do I consider the Kang and (user made) Paint.net implementations bad and the mtPaint implementation good?  Because the former look crap, and the latter doesn't.  QED.

The usual definition for the KN filter, and the implied scaling method

The original paper (from 1976) referenced by Kang et al exists as far as I can tell only behind a pay wall; if anyone knows of a handy public copy or link please email me, I'd love to see the original paper.  Meanwhile, there are many public sources that describe the KN filter's concept and method (wikipedia, various universities' lecture notes, etc)  – but uncannily all these only ever (so far as I've found) refer to a 3x3 box size – which suggests that the source probably does not define scaling up from 3x3.  And this is likely the cause of the artefacts; people have guessed at a scaling method, incorrectly.

The method for the KN filter is roughly as follows.  (This isn't meant as a tutorial on the KN filter itself; it's here for reference, and to be improved on.)
  • Take a 3x3 box, divide it into four 2x2 boxes or samples, overlapping about the centre pixel.  Per Figure 4, below.
  • For each sample get the variance and average colour. 
  • Select the sample with lowest variance, and take the average colour of that sample, and make it the colour for the centre pixel.
Figure 4, the standard 3x3 KN filter box

When I first tried to implement my own KN filter, and wanted to scale the filter size up, I did the 'obvious', by just making each of the four boxes bigger, per figure 5.

Figure 5, standard, and wrong, sampling for a 5x5 filter, with only four samples

With this scaling method I successfully reproduced the crappy wavy artefacts!  Yeay!

Recognising the problem

It occurred to me that this scaling method marginalises the centre reference pixel more as the filter size increases, which is bad.

It occurred to me to fill the filter with as many sample regions (of the same size) as distinctly fit, as per figure 6.

Figure 6, correct KN sampling for a 5x5 filter area

It occurred to me that the general definition for the KN filter may not be, and maybe never was, 'draw an nxn box and then sample four boxes overlapping about the centre pixel...', but, generally, 'fill the box with as many of the same sized samples as can overlap with the centre pixel' – which only happens to mean, for a 3x3 filter, four 2x2 boxes.  The fact that there are four samples for a 3x3 filter is just what happens for a 3x3 filter, and is, I reckon, not a rule to be applied for all filter sizes.

The correct definition for the KN filter
  • For an n x n box (where n is asserted to be odd), define span as the edge length of a sample such that span = ((n+1)/2). 
  • Fill the filter area with span x span samples, each with edge length span (see the sketch after this list).
  • Proceed with getting variance and average colour per sample as per usual, picking the sample with the lowest variance, and using its average colour for the reference pixel.
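
A sketch of the corrected sampling for a single channel image (the image type and accessor are assumptions, and bounds checking is omitted):

float kn_pixel (const image & img, int cx, int cy, int n) {
    int span = (n + 1) / 2;
    float best_avg = 0.0, best_var = -1.0;

    // span x span candidate samples, every one containing (cx, cy)
    for (int oy = 0; oy < span; oy++) {
        for (int ox = 0; ox < span; ox++) {
            int x0 = cx - (span - 1) + ox;    // sample top-left corner
            int y0 = cy - (span - 1) + oy;

            // mean and variance of this span x span sample
            float sum = 0.0, sum_sq = 0.0;
            for (int y = y0; y < y0 + span; y++)
                for (int x = x0; x < x0 + span; x++) {
                    float v = img (x, y);
                    sum += v;
                    sum_sq += v * v;
                }
            float count = span * span;
            float avg = sum / count;
            float var = sum_sq / count - avg * avg;

            if (best_var < 0.0 or var < best_var) {
                best_var = var;
                best_avg = avg;
            }
        }
    }
    return best_avg;    // average colour of the least varied sample
}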







Thursday, September 6, 2018

Kowalski's Image Format (KIF)



I have developed a new lossless image format that is generally (though not always) better than the .png format.

The files referred to in this post are here:
kifany.exe   convert.exe 

The purpose originally was to have my own compressed-ish image format so that I didn’t need to rely on others’ libraries, for use in the development of my game - with working title: “Kow’s Balls”.

The result, in brief, is an average improvement of 6% over the same sample images saved as .png files.

This statistic comes from a varied sample of 24 images that ranged from cartoonish and flat to sharp and detailed.  The range of improvement compared to the same files saved as .pngs was: -13% to (+)20%.

Give it a go.

The compression work is done by my own library, which is image format agnostic and does not use any external libraries.  It does however use a front end to shuttle image data between the world and itself (the compression lib).

I’ve created three variants of the executable, each with a different (internal) front-end to handle this image i/o to a greater or lesser extent, some of which do use an existing image handling library (again, only for the initial loading/ final saving of images, and not for any part of the compression):
  • kif.exe             – converts between kif files and targa files only
  • kifbmp.exe     – converts between kif files and bitmaps (in addition to targas) using the CImg library 
  • kifany.exe (+ convert.exe) – converts between kif files and pretty much any other image format, using the CImg library that itself calls ImageMagick’s convert.exe (tested with png, jpg, tiff, pcx, but probably works with a bunch of others), in addition to targas and bmps

To just test the compression, use kif.exe or kifbmp.exe, and compare the output .kif to a .png of the same original file.  For greatest flexibility, use kifany.exe (which itself calls convert.exe).

-Usage-
 
Where you see “kif.exe”, interpret that as either “kif.exe”, “kifbmp.exe” or “kifany.exe”, depending on the kinds of file you’re working with.
  • kif.exe [filename].[img ext]                   - outputs [filename].kif
  • kif.exe [filename].kif                             - outputs [filename]_.bmp (default) 
  • kif.exe [filename].kif - out [img ext]      - outputs [filename]_.[img ext]
  • kif.exe [...] -p                         - prints some progress and metrics to console during execution

Notice output files, other than kifs, get an extra underscore after the filename. 

-Some examples-

From the console (don’t type “<console>”):

<console> kif mycat.tga                  (outputs mycat.kif)
<console> kif mycat.kif                  (outputs mycat_.tga)
<console> kifbmp mycat.bmp               (outputs mycat.kif)
<console> kifbmp mycat.kif               (outputs mycat_.bmp)
<console> kifany mycat.jpg               (outputs mycat.kif)
<console> kifany mycat.kif -out png      (outputs mycat_.png)

The most direct way of comparing the compression between pngs and kifs is to directly process a png and see how big the kif is that gets output.

<console> kifany mycat.png    (outputs mycat.kif)


As is, this is a ‘proof of concept’ free-standing program; nothing ‘installs’.  Just put the exe/s where you have some images and run (from console) from there.  Alternatively, put the executable/s somewhere and point your Path to that location, or put the files where Path already points, then run from a console anywhere you have images.  Got it?  Good.

-About targas-  

I’ve noticed that at least one program saves targas in a way that can produce strange results.  If you compress then uncompress a targa and the image is flipped with weird colouring, try saving the original targa with another program and try again, or use bmps instead.
 
-----------------------------------------------------------------------------------------------------------------
About the programming environment 

All done in C++(11) with gcc version 5.1.0 (tdm64-1), using the Code::Blocks IDE.

I make extensive use of the C++ Standard Template Library.

The kif library, including its (my) dependent libraries, has ~1550 lines of code.  The front-ends each have about 100 to 200.  So pretty compact, though it does seem to compile to a relatively large executable...

I’ve built for Windows only for now. But there’s nothing system specific that I know of (endianness wrt bit-packing?) that would inhibit compiling for Linux or Apple – when the day comes.
 
-----------------------------------------------------------------------------------------------------------------
Method (short version)

Identify all distinct two pixel aligned patterns and replace them with Huffman codes, then bit-pack everything. 

-Why only two pixels?-

I initially supposed that longer patterns would be better.  I tried 4, 8, and longer, and some in-between.  I tried rectangular regions too.  The result of using longer patterns is that each pattern repeats less often, which produces a corresponding increase in the number of distinct patterns (and so codes), which in terms of the Huffman coding used means more longer codes, which finally means less compression.

-Why aligned?-
 
Because that ‘fixes’ where to perform processing, making everything that little bit easier.



-----------------------------------------------------------------------------------------------------------------
Special features and or known bugs


What I know or have otherwise noticed about the program:
  • Purpose built for 24 and 32 bit images; (for those formats that support 32 bit)  
  • Not yet tolerant of 8 bit indexed images. Program currently treats these as 24 bit  
  • Images must have even width and height.  Because I just happened to be build-testing with images that were even and so didn’t notice that I hadn’t accounted for odd sizes until running my statistics and I can’t be bothered attending to that now because the program does what I want it to do as is (mostly) and I’m happy to require myself to use even images until I care enough otherwise so there  
  • All serial processing; no multi-threading nor parallel/ GPU processing
I intend to address these in due course.
 
-----------------------------------------------------------------------------------------------------------------
Magic Source, two kinds

Very broadly, there are two main technologies involved; Huffman-coding and bit-packing.  Each is its own comp sci topic but here’s a brief note on each for now. 

-Huffman coding-
   
Huffman coding is the process of generating (variable length) codes based on the probability of the items occurring, which results in shorter codes for more common items.  That’s important.  Also, no code is a prefix of another code.  That is, if one code is “234”, then there can’t be another code such as “2345”.  This ensures that when you have a datum string that looks like a code, it is that code and not part of another longer code.  That’s the second point of Huffman coding.  In the kif case, I’m coding for 16 bit chunks (2 x 8 bit colour values).  A common pattern is {0, 0} (no change).  In some cases (but not all) this pattern (of 16 bits) is represented by a Huffman code of 1 bit; ‘0’ or ‘1’.  The result is the replacement of 16 bits with 1 bit!  If there’s a lot of not much changing, then the compression can be significant.
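
For the curious, a minimal sketch of Huffman code construction (generic, and not my actual library):

#include <cstdint>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct hnode {
    uint64_t count;
    int symbol;       // -1 for internal nodes
    hnode * left;
    hnode * right;
};

struct by_count {
    bool operator() (const hnode * a, const hnode * b) const {
        return a->count > b->count;    // min-heap on frequency
    }
};

// walk the tree, appending '0' for left and '1' for right
void collect (const hnode * n, std::string code,
              std::map<int, std::string> & codes) {
    if (n->symbol >= 0) { codes [n->symbol] = code; return; }
    collect (n->left, code + "0", codes);
    collect (n->right, code + "1", codes);
}

std::map<int, std::string> huffman (const std::map<int, uint64_t> & freq) {
    std::priority_queue<hnode*, std::vector<hnode*>, by_count> pq;
    for (auto & f : freq)
        pq.push (new hnode {f.second, f.first, nullptr, nullptr});

    // repeatedly merge the two least frequent nodes
    while (pq.size () > 1) {
        hnode * a = pq.top (); pq.pop ();
        hnode * b = pq.top (); pq.pop ();
        pq.push (new hnode {a->count + b->count, -1, a, b});
    }
    std::map<int, std::string> codes;
    collect (pq.top (), "", codes);    // (leaks the nodes; it's a sketch)
    return codes;
}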

Below is some data on an image that was reduced by 36% (compressed to 64% of original size).  It is marginally better than the .png (reduced by 35%), and is about half way down my list of 24 images in terms of compression rate, so it’s a reasonable pick to investigate.

Here’s the length of the H codes against how many codes there are of that length, and that count as a percentage of the total.  In other words these are the stats for the 'alphabet' of codes.


Here are the codes again, but this time showing how many times each occurs in the (same) image.

  
  
What have I shown here?  The top table and chart shows that most of the generated codes are long but the bottom chart and table shows that those longer codes don’t actually occur much, that it’s the smaller codes that occur a lot. In fact that one single code of 3 bits constitutes almost 10% of all code occurrences in the image, repeating more than 86,000 times from over 870,000 total instances.   (So that’s 3 bits replacing 16 bits 10% of the time.)  The bottom data also shows that more than half (~56%) of the codes used are 8 or fewer bits long, that is, more than half the data has been replaced by codes half as long as the original (16 bit) raw data.

-Bit packing-
   
Suppose you have some numbers: 3, 10, 45, 154, 27, 79.  Below is a series of bytes with the values, in binary, as they are stored in memory.

 
Notice I put a box around the segment that contains the bits required to represent the value; the significant bits. 

Below are the significant bits packed into bytes.  The colours represent the distinct values, and the boxes are the byte boundaries.  You can see how some of the bit strings pack into one byte while others span byte boundaries.  And obviously the total number of bytes is smaller.  That’s because the bits have been packed.



To unpack, given that the 'natural' boundaries have been stripped, we must either know how many bits are to be expanded to make up a proper item, or have a way of checking against known patterns (like say, by using Huffman codes!).  I use the former for the code table and the latter for the encoded image body. 
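
A minimal sketch of the packing side (again illustrative, not my library):

#include <cstdint>
#include <vector>

// Append the low 'nbits' bits of 'value' to 'out', most significant
// bit first, tracking the write position with 'bitpos' (0..7).
void pack_bits (std::vector<uint8_t> & out, int & bitpos,
                uint32_t value, int nbits) {
    for (int i = nbits - 1; i >= 0; i--) {
        if (bitpos == 0)
            out.push_back (0);               // start a fresh byte
        if ((value >> i) & 1)
            out.back () |= (0x80 >> bitpos);
        bitpos = (bitpos + 1) % 8;
    }
}

Packing the example values above with their significant bit counts (2, 4, 6, 8, 5 and 7 bits) fills exactly 32 bits: four bytes instead of six.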

For the image scrutinised:
  • Raw bitmap size is 1710 kB.
  • 8 kB for the code table + 885 kB for the packed body + 199 kB for the raw un-encoded data (+ some for the file header; width, height, etc) = 1092 kB.
  • 1092 / 1710 = ~0.64. As mentioned up top, the compressed file is reduced to 64% of the original file size.
-----------------------------------------------------------------------------------------------------------------
That's it


I’m slightly proud of what I’ve done, and in less than a month!
  • created a library for Huffman coding from scratch
  • created a library for bit packing – both for known length items and for variable length items to be recognised against bit string patterns, from scratch
  • created an image compression/ decompression program using these tools that, from what I can see, is generally better than the .png format, which (with the zlib library it uses) has been around for decades
If you are also proud of me, you’re welcome to drop a coin (or a note) in the Donation box. I live in Brisbane, Australia, and am a former, since made redundant, data analyst, looking after my daughter myself, and currently productively fulfilling the role of ‘unemployed dude’ hoping to make a living selling the Android-based future-soon smash “Kow’s Balls” (that may or may not be renamed to “Zaf’s Asteroids”).

Zaf.