Saturday, June 22, 2019

Static Ordering (A Better Ordered Container?)

[For the record, the development here described was done through October 2018.]

How to insert data into a container such that the values are maintained in order - is a classic problem in computer science.

A standard method is to store values as nodes of a tree.  The tree is usually adjusted on an as-needed basis to maintain some structural rules.  C++ uses Red-Black Trees to maintain its set and map containers.

My thinking on this is that there could be a way to put values in order that relies on the implicit structure of each value.  Say, given value 245, could each of 2, 4 and 5 be used as nodes on a tree?

I have built such a contraption. 

Basic concept is to link numerals together.  So 23 and 24 would share a common 2 node that joins to respective 3 and 4 nodes.  Each node has as many potential child nodes as is the numeric base of the tree (that is a parameter that can be set at tree creation or hard-coded).

Insertion durations for the same random numbers are given below (compared to a C++ set).  Lower is better.




Why is this faster than RB Tree for insertion?


Because comparisons are not made between values (per C++ sets), and no nodes are ever re-linked to maintain any structural rules (also per C++ sets).  In SO trees the initial placement is perpetually optimal.
 

Some history


I initially converted a value to a string then got elements by accessing each character and converting that to a number.  Easy but slow.

I then dropped the string conversion.  Instead, the initial value was converted to an alternate base, storing each numeral along the way.  Using base 2 was found competitive but not better than C++ set.



Current Version


Converting bases requires some division, which is slow.  Current version uses the value’s binary chunks, raw.  Optimal span has been found to be 4 bits (base 16).
 

Iterating through container


RB trees are maintained such that adjacent nodes have the next smaller or next larger value.  In SO trees, adjacent nodes are not necessarily the next smaller or larger valued nodes.  With SO trees, the next value (smaller or larger) must be discovered by traversing the tree from the reference node.  This requires a method to manage the traversal, and also a method to manage the edges of the tree (so that traversal can continue onto the opposite side, but lower or higher, if need be).  I’ve tried describing how this is implemented but it ends up not much clearer than the source code for it – that could be described, at the moment, as some variation on the theme of grotesque.  So I’ve spared you.  You’re welcome.

I haven’t timed traversing SO trees.  It’s probably really really bad.  But I will likely do this at some point and post a follow-up, post.



Data types supported


Only implemented for 32 bit integers.  The thing can however be trivially extended for chars, shorts, long ints, etc, or anything that can be represented by ints like, say, pointers.

Floating point value storage has been thought about.  Just so you know.



Finally


Given my no-doubt amateurish implementation does insertions at nearly double the speed of the industry standard library, there’s probably some potential in this model. 

No comments:

Post a Comment