Understanding Fractal Detection

Environment: Platform Independent, Reference Implementation: Java2

Editor’s Note: This article uses Java; however, I believe you’ll find it interesting even if you are using C++ or some other languages. Also note that views and opinions in this article are the author’s.

Of all fractals, God is the greatest, for while being All, He is in every smallest part, in all His completeness. While being fully present, He is unique in every part. His source is infinity, and infinite are His destinations. I thank God for making me His part, and Him a part of me, which is how I will know Him more and will never get tired of learning.

I am a programmer. I may be a great thinker, a funny writer, and a cunning investigator, but all that is due to the fact that I love computers. And I think computers love me, too. Many programmers felt at one time or another a proud realization that a computer begins to work not because they did something right, but because the programmer is right there. However, there is something that computers love more than pleasing programmers: it is setting bits on and off. This is what they did, do, and will keep doing till the kingdom come, and then maybe after.

Now, I assume you all saw fractals, experimented with them, and so on. They sure are pretty, and they happen to be useful and fast. Our objective is to realize their utility and speed in our mind and then in our applications so we keep our jobs by working better and faster.

Fractals are easy to produce; trust me on this. The question is how to detect them. The good news is that there are algorithms that perform that detection fairly well. The bad news (well, not bad, just news) is that those algorithms involve some neck-breaking formulas. On one hand we need to tame these wild fractals; on the other, there is some intricate and expansive knowledge of math required to weave a lasso.

Thus, to ensure proper nutrition and stable growth, we need something digestible with a lot of tasty vitamins in it. And I’ll tell you one secret: There is an easy way to “get” it.

Let’s begin. Let’s experiment with this: be gin, beg in. These two expressions don’t make much sense, but they do make some sense, and then entirely different sense from each other and the original word. A simple division exposed more meaning, or complexity. This is what complexity really is: more meaning. This division caused not only more meanings, but also each expression means more than the original word. It is apparent that “be gin” stores action and object with an implied subject, and “beg in” stores action and direction with an implied object, while “begin” is just action with a blurry implication. This is what fractals are: expressions that mean less than their fragmented copies, provided that every copy has a different meaning from other copies. Also, true fractals are composed of infinite copies. Based on this definition, “begin” is not a true fractal, because fragmenting “be gin” and “beg in” wouldn’t make any sense. “Begin” is a pseudo-fractal, or practal. Practal, besides being a pseudo-fractal, is also a practical fractal, due to its finite nature. Hopefully it won’t cause deadly confusions with a spy poison that has the same name.

How do we detect fractals, then? When you notice that fragmented expressions show unique and increasing complexity, it’s a fractal. For practical purposes, it doesn’t matter whether it’s a true or pseudo fractal, because we won’t allow a computer to iterate to infinity. However, if a practal is detected before the end of the loop, its signature can be recorded to avoid iterations in the future.

Good. Now where do we start searching for fractals? Remember I was saying that “begin” has some blurry implications? That “blur” is a definite indication of fractals. What is that blur? Before I can answer this question, we need to define expressions.

An expression is whatever you make of it. “It” here means the expression of input. So what do we get, then? An expression of expression that probably expresses another expression? I’m sure we’ll be able to stop expressing at some point and get to do something useful. We do stop expressing because we can’t imagine the universe expressing anything but itself. How far is it from the universe to your everyday, average computer? As far as a computer is concerned, it is a universe, a very small universe that poorly repeats a very small part of the big universe. This way we don’t have to worry about the universe itself, only how it is delivered to the computer.

We know how to deliver the universe to computers; we did it from the very first day the very first computer was brought to the universe: input is what we call it. And the only expression of input computers understand is a sequence of bits that set off and on. We have witnessed that these seemingly simple expressions could store a whole lot of meaning. Meaning? What is that “meaning” that is stored in expressions? Meaning is again whatever you make of it, expression that is. Turns out, meaning is also an expression, only derived not from the universe but from us. What do we have here then? A computer takes the universe and then shows it to us, as if we can’t see it on our own? Apparently computers see something we can’t; that’s why we made them in the first place.

Can anybody define expression? I suggest you don’t try, or you’ll express yourself dry, and then some. For us, programmers, it is enough to know that there are expressions of input which are sequences of bits set off and on, and expressions of meaning, which are something humans can understand and that are delivered through output that again can express them comprehensibly.

Now we can return to the blur. There is no blur in input; it is all pretty clear: It’s either 0 or 1. However, the meaning, which is whatever we make of it, can be as blurry as we want. How do we do it? We feed the computer so much input and so fast that we can’t keep up with it. We lose the meaning of numbers and have to resolve to our own interpretation of output. That doesn’t help much; this blur has to be nailed down. Well, let’s see, a new word has popped up: interpretation. Interpretation is where humans and computers meet: they both interpret something. Interpretation is in fact the process of expressing another expression. And the most wonderful thing about interpretation is that it works one way: from input to output. And because there is no blur in input but a lot of blur in output, we can say that interpretation is the cause of blur. If you think about it, you’ll see that fractals are mental entities without any physical foundation.

Nevertheless, however mental, they are used by humans to speed understanding and save memory, or storage space, and so will computers, with our help.

Because blur is where fractals live, we need to create it from input. Remember I was talking about speed and volume of input fed to computers? How do we define speed and volume? By frames. A frame is a collection of bits that has a regular structure. And speed is defined by number of frames per second. For humans it is enough to observe 24 Hz (frames per second) of video and a few KHz of audio input to experience blur. They differ in frequency because they differ in volume: a lesser volume of input requires larger speed. Hmm, there might even be a constant that defines the proportion; I think I heard of it… Feigenbaum’s Number is what it is: 4.6692 and so on. I can’t be perfectly sure, so I’ll just say that there is a blur constant: volume*speed=blur. This constant defines at which point blur begins to appear.

However, this constant has nothing to do with computers, only with humans and mathematics. Still, there is a way to make a computer experience blur: Let every bit be a blurry number with a value between 0 and 1. Calculate that number as the average between a constant number of frames that we’ll call stretch. Now, what will you notice from dynamic input? You’ll notice formations of numbers with similar values move within the frame. Those formations are shapes, and the lines along which they morph are paths. The good part is we don’t need to know much about shapes, because paths will tell us a lot. Paths are more important than shapes because they can be represented as bit values. Let’s do something very simple: If a blurry number is above 0.5, it’s a path. Let’s take our frames and put them on the table one after another like this |||||, draw the paths on every frame, and connect them. We’ll get paths together with masks of nonmoving objects/noise that happened to be bright/loud enough. We need to eliminate this noise and get clear paths. The easiest way to do this is to perform an XOR (1^0=0^1=1, 0^0=0, 1^1=0) operation on every frame. Why is XOR better and faster than calculating blurry values? We know why it is faster: because it’s a bit operation that computers love to do.

The reason it is better is because blur occurs when something moves, or changes, and, naturally, XOR easily detects the change in bit value, not only that, those nonmoving objects remain in the dark because their bit value doesn’t change. Another advantage of XOR is that computer doesn’t have to do anything input-related when nothing changes; it can simply take one still image and keep repeating it for the duration to pretend that it’s working. Let’s say something happened; for instance, a programmer woke up at the terminal and commenced stretching exercises. What will the computer do? Since, like it or not, computer XORs every previous frame with the next, it will detect a presence of bright values and begin building paths. What would a good computer do? A good computer would recognize a human by general features and determine that this human just woke up in the chair and is in the act of stretching. Then it would probably advise on the adverse effects of sleeping in a sitting position, especially with clothes on.

What does that computer have to work with? All it has is a set of XOR masks that express moving shapes and their paths. Also, if nonmoving shapes happen to cover moving shapes, they appear as lines, like a back of the chair. Moving shapes very often have marks and shading that are detected by XOR. One thing XOR won’t detect is solid tones; however, it will detect the areas where solid tones overlap due to the movement. XOR will also display the parts of the background that are uncovered and covered by moving shapes. While XOR is the best and fastest method of detection, it does cause overlap errors. Those overlap errors are inevitable, however they can be reduced by increasing the frequency of input. Move your hand in front of the monitor and you will see how XOR works.

What did we achieve as a result? We gave the computer some shapes along with their paths that are basically overlap errors.

Those overlap errors are where we will look for fractals and detect them. Let’s return to the definition of fractals: expressions that can be interpreted with other expressions that have a greater and unique meaning, repeat this rule for every interpretation to infinity. A computer doesn’t have infinity to work with—just a few milliseconds—and that is enough to detect a fractal.

What is really an overlap error in computer terms? It’s a record of one state overlapped onto the next state. What will happen if we perform XOR operation on our XOR masks? We’ll be left with overlaps alone. Now, how would we rid the original image of overlap errors? I suggest using the NAND operation: NOT MASK AND IMAGE (!ms & im). This will fragment the image not so nicely. How would we go about nicer fragmenting? We would NAND the second mask with the first mask and then with the result, this way, finer fragments will appear. Very well; if we continue this nanding of masks from bottom to top we’ll get finer fragments at every level, fragments that have more meaning… Fractals is what they are. Thus by simply XORing and NANDing, we find fractals.

Now, I’d like to demonstrate this method and conduct a few experiments for our pleasure. What is the most available and meaningful input available to a computer? File input. To start with, we’ll work with English texts.

Apparently, language and especially written language doesn’t conform to multi-byte frame input; we would need something to model that. Instead, we’ll say that the whole length of text is a frame, and that movement occurs when text is shifted right by one character, and to preserve the frame length, we’ll make it loop by making the end come out at the beginning. What? You may say, “That doesn’t make any sense.” It makes perfect sense, after one story there is always another, so why not make it the same one?

You can do these experiments with me by going to http://prdownloads.sourceforge.net/fractalkiss and getting the latest “aligncube” package that contains FrameCube.java and ShowCube.java (ShowCube requires Java 3D).

I used Java2 to speed prototyping thanks to its type safety, garbage collection, and powerful Collections, not to speak of cross-platform 3D engine. The shortcoming turned out to be the absence of unsigned byte in Java, which is why FrameCube hogs two times more memory than a similar C application would; I am perfectly aware of how to work around signed bytes, but the processing overhead of such acrobatics is prohibitive. Still, 39Mb is a maximum requirement for FrameCube and 56Mb for ShowCube. English text operations are 13Mb and 22Mb, respectively. Yes, FrameCube is able to work with any kind of byte-represented input, which covers almost everything.

Let’s begin. Coming back to fractals, how do they repeat? Turns out, they repeat in space and time. When you look at it, you’ll see similar shapes. After you watch it evolve, you’ll see those similar shapes again. If that’s the case, then if we find a way to record spatial and temporal periodicity and look at it later, then by simply seeing that the result is not just some noise, but a regular structure, we’ll know we caught us some fractals.

What is space to a computer? The next bit is as far as it goes, or next byte or word. A bit is too little to store anything meaningful; a word is probably too much for most applications, but a byte is right in the middle of things, a perfect medium.

By the same token, the next frame is the unit of computer time.

Another thing of notice is when you look around, you’ll see that everything is divided, and every division is divided into smaller parts, each with different meaning, and all with more meanings than the whole. A room is made up of walls, lights, and furniture. A chair is made of seat, back, and legs. A seat (if you’re lucky) is made of soft-top and hard foundation. And so on.

Everything is made of fragments. This is especially true for a computer. English text, for instance, is not only a collection of bytes, but also every character code can be seen as fragment of another. The letter “A,” for instance, in ASCII has a code of 64, or 0100000B; and the letter “B” is 65, or 01000001B. Nothing can prevent me from saying that A is a fragment of B, because all that prevents A from being B is that last 1 at the end. More precisely, A is a fragmented copy of B, and the second 1 is the fragment. And B is a fragmented copy of DEL character, which is 01111111B.

This is what computers see that we can’t: fragments not only in space and time, but also in the very matter that makes them up: presentation.

In view of these insights, fractal detection becomes rather trivial.

Suppose your input is made up of bytes. And suppose any byte can take a value from 0 to 255, or, in English text, from 0 to 127. Let’s call that range of values N. If you make a byte array of N length, then every position in that array will correspond to a unique value. The significance of this array is that you don’t have to put a byte value in it, because you know it from position. Elementary, isn’t it?

Just what would you put there instead? Nothing yet, because you need a way to track periodicity, which only happens when something follows something, or changes value. For if we create a 2-dimensional array plane[N][N], where we can say that B happened after A and then say when: plane[‘A’][‘B’]=Time. Because computer time is measured in frames, Time is the frame number. As we know, a byte holds only 256 values, and A->B can happen on any number of frames; there is no way to store all of the frame numbers in one byte. In fact, only 8 independent numbers can be stored in a byte: 1 bit each; this is why a computer needs to work with only 8 frames at a time.

What is the frame of English text? One letter follows another, thus we have a story, which leads us to thinking that one byte can make a frame.

No matter how many bytes are in a frame, a change is recorded when a byte at a certain position changes its value in the next frame. Apparently, we need to know not only what changes, but also how and when. Here is more news: Fractals don’t understand “where;” only what, how, and when. There is no notion of size, position, or scale in fractals, which is a powerful feature. It means, for instance, that by parsing English text, a computer will learn grammar, and when given a sample story, it will be able to alter it until it’s unrecognizable and still produce readable and meaningful output. Or, by listening to human voices, it will learn to speak just like us, without digital clanging and banging. Let your imagination run wild here, and you won’t go wrong.

Again, a byte holds only 8 independent values, and now we have the task of cramming the answers to what, how and when there. We took care of what and when now: plane[A][B]=Time, and ran out of space for how. I suggest we return to XOR and NAND operations and see how they may help there.

XOR answers the question of how A becomes B: by finding the difference. NAND answers the question of how different is A without different fragments of B, by negating A^B XOR mask and comparing it to A: ~(A^B)&A. In this particular case, A, being a fragment of B, remains the same. However, if A followed B, as in “BARRACUDA,” ~(B^A)&B gives a value of 01h. How safe is it to say that plane[01h][‘A’]=Time? You can easily deduce what 01h means by ORing it with ‘A,’ or by being strict and doing ~(01h^A)|A. This way, we can record what changed and how in the same byte, and not only that. If we abstract from English text, we also recorded a fragment of input that may behave identically somewhere and sometime else. Isn’t that what we always look for? Don’t we need to recognize objects that are different from others in their shape and behavior, independently of context? That is what fractal detection is about.

We’ll go further and divide our input in fragments and record their behavior. As we know, fractal fragments have more meaning, and their behavior is more meaningful as well. Once you divide a fractal, you should be able to divide the resulting fragments, and so on. This is what well do: we’ll fragment our frames seven times and get eight copies, with the presentation layer on top, and the most fragmented layer on the bottom. And for every given frame, we’ll record our bytes: plane[An][Bn-1]|=1<<n-1, where n is the fragmentation level, A is this frame, and B is the next frame. This way, we record not only how top layer behaves, but also how each fragment behaves with other fragments, and if needed, we’ll be able to get the original by a few bit operations, but we probably wont need it, because we are trying to make a computer recognize things, not re-create them.

Apparently, for every given frame, we start with the first fragmentation level and point to the original, which leaves us with one free bit. How often do you watch movies backward? How often do you walk on your hands? Not often, but you must confess that it happens once in a while. One way or another, things are prone to happen in reverse, and you have to deal with it. Does a programmer stop being one if he happens to walk on his hands? We’ll help the computer decide by putting [B0][A1]|=1<<7. This way the top bit records upside-down life events. And if later we find out that this byte has only the top bit set, we’ll know that such behavior only happens in reverse, and never forward. Thus we’ll know in what position a programmer fits the description best.

Two bytes is not a lot of room to play with, and the resulting square doesn’t produce pretty pictures. I suggest having a cube[N][N][N], where we can record 3-byte behavior by cube[An][Bn-1][Cn-2]|=1<<n-2, which leaves us with two bits for the reverse mode: [C0][B1][A2]|=1<<6 and [C1][B2][A3]|=1<<7. While it may seem that having only 6 bits left for forward mode decreases the ability to think coherently, in fact, adding another dimension increases the precision twice: 7+6 for forward mode and 2+1 for reverse. The resulting cube can be displayed and colored at will: I put first 3 bits as red, next 3 bits as green, and last 2 reverse bits as blue. As a result, you will see all kinds of dots. The striking part is that cubes are not only different depending on input: text vs. movie vs. explorer.exe, but also each cube displays periodicity in every part as compared to whole, and as whole compared to parts, if you have a coherent input, that is.

I am glad to say that the end of this story is near. For now, that is. The last part involves formulas, or algorithms, consisting mostly of bit operations, which is nothing to be afraid of.

The following algorithms will allow you to produce L fragmented copies from L frames. L stands for levels.

Here “b” is the frame number, for buffer. A frame buffer is a FIFO stack of L frames; when one frame leaves from the top, another comes in at the bottom. The buffer is fragmented for every frame. First, the buffer is filled with L frames, fragmentation is performed, 1st frame leaves from the top, L+1 frame comes in at the bottom, and process repeats every time a frame leaves.

“n” is the fragmentation level. “0” is the top frame itself, or presentation level.

FRAG(n,b)=INAND(n-1,b) NAND FRAME(b)
FRAG(0,b)=FRAME(b)

INAND(n,b)=IXOR(n,b) NAND IXOR(n-1,b) NAND ... NAND IXOR(0,b)
INAND(0,b)=IXOR(0,b)

IXOR(n,b)=IXOR(n-1,b) XOR IXOR(n-1,b+1)
IXOR(0,b)=FRAME(b) XOR FRAME(b+1)

XOR is per byte XOR of one frame with the other.

NAND is per-byte negation of XOR mask and comparison with the target. NAND is not commutative. INAND cannot be performed another way. Bottom XOR masks are negated and compared to higher masks, all the way to the top.

IXOR table is a triangle; the first frame has most masks and can be fragmented more than others. When the first frame leaves, its XOR masks can be removed, and the second set of XOR masks becomes the first, only requiring the last layer, which is built when the next frame comes in at the bottom.

The FRAG table is also a triangle, the same shape as IXOR. While you don’t need more than three fragmented sets, it doesn’t hurt to have them all at one time, and add to them a little with every new frame.

After you build your FRAG table, you can perform “alignment,” or recording the fragment behaviors in the cube. Here you have L=8 fragmented copies for frame 0. Every frame is composed of F bytes. Populate your cube this way:

FOR D=2 D<L D++
  FOR P=0 P<F P++
    X=FRAG(D  ,0)[P]
    Y=FRAG(D-1,1)[P]
    Z=FRAG(D-2,2)[P]
    CUBE[X][Y][Z]=CUBE[X][Y][Z]|1<<(D-2)
    IF(D<4)
      CUBE[Z][Y][X]=CUBE[Z][Y][X]|1<<(L-4+D)
    ENDIF
  END
END

That’s all there is to it. If you don’t like cubes, get on a plane; you will lose precision but will gain speed and economy.

So now you know that being a “one-bit” programmer is not a bad thing; it can get quite interesting if you play with it.

Where does this leave us with programmers who sleep in chairs with their clothes on? I suggest you take a deep breath and shout, “Wake up!” because if they keep on sleeping, they’ll never make a computer justify their rest.

FRAG(n,b)=INAND(n-1,b) NAND FRAME(b)
FRAG(0,b)=FRAME(b)
INAND(n,b)=IXOR(n,b) NAND IXOR(n-1,b) NAND … NAND IXOR(0,b)
INAND(0,b)=IXOR(0,b)
IXOR(n,b)=IXOR(n-1,b) XOR IXOR(n-1,b+1)
IXOR(0,b)=FRAME(b) XOR FRAME(b+1)

XOR is per byte XOR of one frame with the other.

NAND is per-byte negation of the XOR mask and comparison with the target. NAND is not commutative. INAND cannot be performed the other way. Bottom XOR masks are negated and compared to higher masks, all the way to the top.

The IXOR table is a triangle; the first frame has most masks and can be fragmented more than others. When the first frame leaves, its XOR masks can be removed, and the second set of XOR masks becomes the first, only requiring the last layer, which is built when the next frame comes in at the bottom.

The FRAG table is also a triangle, the same shape as IXOR. While you don’t need more than three fragmented sets, it doesn’t hurt to have all at one time, and add to them a little with every new frame.

FOR D=2 D<L D++
  FOR P=0 P<F P++
    X=FRAG(D  ,0)[P]
    Y=FRAG(D-1,1)[P]
    Z=FRAG(D-2,2)[P]
    CUBE[X][Y][Z]|=1<<(D-2)
    IF(D<4)
      CUBE[Z][Y][X]|=1<<(L-4+D)
    ENDIF
  END
END

Downloads

Download demo project and source – 59 Kb

Understanding Fractal Detection

Downloads

CodeGuru Staff

Company

Categories