Compression as Intelligence (Garbage Out, Brilliance In)

I am convinced that the secret to developing intelligence (in any substrate, including your brain) lies in the percentage of the data coming in that you are willing (or forced) to toss. Lossy compression is the key to intelligence. Of course there is a caveat… you can't just trash anything and everything.

The first line of the book I am writing about evolution: "What matters is what matters, knowing what matters and how to know it matters the most."

I am convinced that evolving systems can only work towards mechanisms that process salience if they are forced to maximize the amount of stuff they can trash.

If you are forced to get rid of 99.999 percent of everything that comes in, well, you will have to get good at knowing the difference between needles and hay, and you will have to get good at knowing the difference in a hurry. That said, the "needles and hay" metaphor doesn't map well to what I am getting at. If the system you are dealing with is so unstructured as to fit the haystack metaphor, you really aren't doing anything I would classify as intelligence. If there is no structure in the haystack you are storing, then your compression system should already have tossed the whole thing out.

Many techniques for the filtering of essence, for finding pattern, for storing pattern and for storing pattern of pattern have been developed. The most impressive reduce raw input streams and store pattern from the most general to the most specific as hierarchically stratified graphs.

Being forced to reduce data to storage formats that maximize lossiness minimizes necessary storage. But that is just a perk. What really gates intelligence is the amount of a complex system (or map thereof) that can be made proximal to immediate processing. Our brains might be big and mighty, but what really matters is how much of the right parts of what is stored can be brought together in one small space for semi-real-time simulation. Information, when organized optimally for maximal storage density, will also be information that is ideally organized for localized serialization and simultaneity of processing.

To think, a system has to be able to grab highly compressed pattern hierarchies and move them into superposition on top of each other for near instantaneous comparison. You can't do this with a whole brain's worth of data, no matter how well organized it is.

Let's say you have to store everything you know about every sport you have ever heard of, and you have to do it in a very limited space. You will be forced to build a hierarchy of grammars in which the general concepts shared by every sport (opponents, the goal to win, a set of rules and consequences, physical playing geometries, equipment, etc.) sit at the base, followed by layers of groupings that capture the similarities between some sports, and so on up to the specifics that are present only in each individual sport. Keep compressing this set. Always compress. Try all day (or all night) for even more compression. Compress until you can't even get to lots of the specifics any more. Keep compressing. Dump the sports you don't care about. Keep on throwing stuff out.
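A toy sketch of the sports exercise, in Python, may make the idea concrete. Everything in it is an assumption made for illustration: the feature names, the four sports, and the helper functions factor_shared, lossy_prune and stored_size are all hypothetical, and the hierarchy has only two layers (one shared layer plus per-sport leftovers) rather than the deep stratification described above.

```python
from collections import Counter

# Hypothetical toy data: each sport is just a set of feature labels.
SPORTS = {
    "soccer": {"opponents", "rules", "goal_to_win", "ball", "team", "field"},
    "basketball": {"opponents", "rules", "goal_to_win", "ball", "team", "court"},
    "tennis": {"opponents", "rules", "goal_to_win", "ball", "racket", "court"},
    "chess": {"opponents", "rules", "goal_to_win", "board"},
}


def factor_shared(items):
    """Store features common to every item once; keep per-item leftovers."""
    shared = set.intersection(*items.values())
    residual = {name: feats - shared for name, feats in items.items()}
    return shared, residual


def lossy_prune(residual, min_count):
    """The lossy step: toss any leftover feature seen in fewer than min_count items."""
    counts = Counter(f for feats in residual.values() for f in feats)
    return {
        name: {f for f in feats if counts[f] >= min_count}
        for name, feats in residual.items()
    }


def stored_size(shared, residual):
    """Count how many feature labels actually have to be stored."""
    return len(shared) + sum(len(feats) for feats in residual.values())


flat_size = sum(len(feats) for feats in SPORTS.values())
shared, residual = factor_shared(SPORTS)
pruned = lossy_prune(residual, min_count=2)

print("flat:", flat_size)
print("factored:", stored_size(shared, residual))
print("factored and pruned:", stored_size(shared, pruned))
```

The first step is lossless: nothing is forgotten, the shared concepts are simply stored once. The second step is where things get thrown away, and in this toy version the rarest specifics go first, which is only one of many possible rules for deciding what matters.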

Now let's say I have some sort of morbid sense of humor and I tell you that you are going to have to store everything you encounter and everything you think about, your entire life, in that same database that you have optimized for sports.

You will have to learn to look for the meta-patterns that will allow you to store your first romance in a structure that also allows you to store everything you know about kitchen utensils and geopolitics and the way the Beatles' White Album makes you feel when it is windy outside.

The necessity to toss, enforced by limited storage and an obsession with compression, will result in domain-blending salience hierarchies. It is why we can find deep similarities between music and geological topologies. It is why we can "think".

For years people have tried to come up with the algorithms of thought. What we need instead is to build into our artificial systems a very mean and ornery compression taskmaster that forces, over time, all of our disparate sensation streams into the same shared graph.

Once you have all of your memories stored within the same graph, by necessity sharing the same meta-pattern, the job of evolving processing algorithms is made that much easier.

An intelligent system will spend most if not all of its time compressing data. We have a tendency to bifurcate the behavior of a mind into storage on the one hand, and processing on the other. I am beginning to think that the thing we call "thinking" and "thought" is exclusively and only a side-effect of constant attempts at compression – that there really isn't anything separate that happens outside of compression. Is this possible?

Randall Reetz


Jack Christopher said...

Do you know of Jurgen Schmidhuber?

"I argue that science, art, music, comedy, and many other aspects of intelligent behavior are just by-products of our intrinsic desire to create or discover novel patterns, that is, data compressible in hitherto unknown ways. In other words: non-arbitrary, regular data that is surprising not in the traditional sense of Boltzmann and Shannon but in the sense that it allows for compression progress because its regularity was not yet known."

Randall Lee Reetz said...

I agree, Jack,

At least in premise. But what I always look for is the causal (least-energy) topology that forces behavior or morphology towards some loci and away from others. Compression is awkward (expensive and therefore short-lived) if it doesn't represent pattern that is tightly representative of the actual causal chain (physics) of the universe.

Pattern itself is cheap. Pattern that predicts least energy topology into the future is the N-hard problem that necessitates evolution.

Evolution works only when new pattern is in absolute agreement with the salience (physics) abstracted in that system's previously accumulated pattern… and at the same time builds a causal (least-energy) bridge (tunnel?) to future optimal states. That criterion represents a high bar for the kind of pattern that matters.

For an anti-example, think crystal. Crystals take a simple pattern and overrepresent it. Crystals are to evolution what cancer is to healthy growth and repair. Crystals overwhelm the energy throughput potential of the material resources of which they are built, preventing growth in the layering of hierarchical pattern novelty that is evolution's hill-climbing advantage.

Novelty itself isn't enough. Mechanisms that select for novelty are legion and cheap. Such randomization and combinatorial methods actually work against the entire selective/additive scheme that is evolution's only way to game the house.

Randall Reetz

