Let me take a stab at defending compression as equivalent to intelligence.
Standard string compression (LZW, etc.) works by understanding and then exploiting the sequencing rules that result in the redundancy built into most (all?) languages and communication protocols.
Compression is necessary in any storage/retrieval/manipulation system for the simple reason that all systems are finite. Any library, any hard drive, any computer memory… all finite. If working with primary in-situ environments were as efficient as working with maps or abstractions, we would never go through the trouble of making maps or of abstracting and filtering and representing.
It might seem sarcastic even to say it, but a universe is larger than a brain.
You have, however, stumbled upon an interesting insight: where exactly is the intelligence? In classic Shannon information theory, and in the communication metrics (signal-to-noise ratio) on which it is based, information is a duality in which data and cypher are interlocked. In this model you can reduce the size of your content, but only if you increase the size (or capacity) of the cypher. Want to reduce the complexity of the cypher? Then you are forced to accept that your content will grow in size or complexity. No free lunch!
In order to build a more robust cypher, one has to generalize in order to find salience (the difference that makes a difference) across a greater and greater chunk of the universe. It is one thing to build a data crawler for a single content protocol, quite another to build a domain- and protocol-independent data crawler. It is one thing to build hash trees based on word or token frequency, and quite another to build them based on causal semantics (not how the words are sequenced, but how the concepts they refer to are graphed).
I think the main trouble you are having with this compression = intelligence concept has to do with a limited mapping of the word "compression".
Let's say you are driving and need to know which way to turn as you approach a fork in the road. If you are equipped with some sort of mental abstraction of the territory ahead, or with a map, you can choose based on the information encoded into these representations. But what if you weren't? What if you could not build a map, either on paper or in your head? Then you would be forced to drive up each fork in turn. In fact, had you no abstraction device, you would have to do this continually, as you would not be able to remember the first road by the time you took the second.
What if you had to traverse every road in every city you came to just to decide which road you were meant to take in the first place? What if the universe itself were the best map you could ever build of the universe? Surely you can see that a map is a form of compression.
But let's say that your brain can never be big enough to hold a perfect map of every part of the universe important to you. Let's imagine that the map builder you build in order to create mental memories of roads and cities is ineffective at building maps of biological knowledge or physics or the names and faces of your friends. You will have to go about building unique map builders for each domain of knowledge important to you. Eventually, every cubic centimeter of your brain will be full of domain-specific map-making algorithms. No room for the maps!
What you need to build instead is a universal map builder: a map builder that works just as well for topological territory as it does for concepts and lists and complex n-dimensional pattern-scapes.
Do so and you will end up with the ultimate compression algorithm!
But your point about where the intelligence lies is important. I haven't read the rules for the contest you cite, but if I were to design such a contest, I would insist that the final byte count of each entrant's data also include the byte count of the code necessary to unpack it.
I realize that even this doesn't go far enough. You are correctly asserting that most of the intelligence is in the human minds that build these compression algorithms in the first place.
How would you go about designing a contest that more accurately measures the full complexity of both the cypher and the content it interprets?
But before you do, you should take the time to realize that a compression algorithm becomes a smaller and smaller component of the total complexity metric the more often it is used. How many trillions of trillions of bytes have been trimmed from the global data tree over the lifespan of MPEG and JPEG use on video and images? Even if you factor in a robust calculation of the quantum wave space inhabited by the human brains that created these protocols, it is plain to see that use continues to diminish the complexity contribution of the cypher, no matter how complex that cypher is.
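To make that amortization argument concrete, here is a back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder (decoder size, savings per file, number of uses), chosen only to show how the cypher's share of the total byte count shrinks with use.

```python
# All figures below are hypothetical placeholders, not measurements.
decoder_size = 1_000_000              # bytes of code needed to unpack (the "cypher")
bytes_saved_per_file = 5_000_000      # raw size minus compressed size, per image/video
files_compressed = 10**12             # uses over the codec's lifetime

total_saved = bytes_saved_per_file * files_compressed
cypher_share = decoder_size / total_saved
print(f"cypher contributes {cypher_share:.1e} of the total bytes saved")
```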
Now what do you think?
Randall Lee Reetz
Change increases entropy. The only variable is how fast the Universe falls toward chaos. What determines this rate is the complexity being carried. Complexity exists only to increase disorder. Evolution is the refinement of a fitness metric: the process of refining a criterion for measuring a system's capacity to maximize its future potential to hold complexity. This metric becomes ever more sophisticated and can never be predetermined. Evolution is the computation.
Building Pattern Matching Graphs
I talk a lot about the integral relationship between compression and intelligence. Here are some simple methods. We will talk of images but images are not special in any way (just easier to visualize). Recognizing pattern in an image is easier if you can't see very well.
What?
Blur your eyes and you vastly reduce the information that has to be processed. Garbage in, brilliance out!
Do this with every image you want to compare. Make copies and blur them heavily. Now shrink each copy down to a very small bitmap (say 10 by 10 pixels) using a pixel-averaging algorithm. Now convert each to grey scale. Now increase the contrast (by about 150 percent). Store them thus compressed. Now compare each image to all of the rest: subtract the target image from the compared image. The result is the delta between the two. Reduce this difference image to one pixel. It will have a value somewhere between 0 (the two images are identical) and 255 (they differ completely), representing the gross difference between the two images. Perform this comparison between your target image and every image in your database. Rank and group them from most similar to least.
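Here is a minimal sketch of that blur/shrink/compare pipeline, using Pillow and NumPy. The 10-by-10 bitmap and the roughly 150 percent contrast boost come from the description above; the function names, the blur radius, and the use of mean absolute difference as the single-pixel "delta" are my own illustrative assumptions.

```python
import numpy as np
from PIL import Image, ImageFilter, ImageEnhance

def fingerprint(path, size=10, blur_radius=8, contrast=1.5):
    """Blur, shrink to size x size by pixel averaging, grey-scale, boost contrast."""
    img = Image.open(path).convert("RGB")
    img = img.filter(ImageFilter.GaussianBlur(radius=blur_radius))   # blur heavily
    img = img.resize((size, size), Image.Resampling.BOX)             # pixel averaging
    img = img.convert("L")                                           # grey scale
    img = ImageEnhance.Contrast(img).enhance(contrast)               # ~150 percent
    return np.asarray(img, dtype=np.float32)

def gross_difference(fp_a, fp_b):
    """Collapse the per-pixel delta to one value: 0 (identical) .. 255 (opposite)."""
    return float(np.abs(fp_a - fp_b).mean())

def rank(target_path, database_paths):
    """Rank a database of images against a target, most similar first."""
    target = fingerprint(target_path)
    scored = [(gross_difference(target, fingerprint(p)), p) for p in database_paths]
    return sorted(scored)
```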
Now average together the images in the top 10 percent of matches. Build a graph that has all of the source images at the bottom; the next layer up is the set of image averages you just made. Now run the same comparison-and-averaging step over this new layer of averages; the result is your next layer. Repeat until your top layer contains two images.
Once you have a graph like this, you can quickly find matching images by moving down the graph and making simple binary choices for the next best match. Very fast. If you also take the trouble to optimize your whole salience graph each time you add a new image, your filter should get smarter and smarter.
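Below is a hedged sketch of that layered graph and the top-down search it enables, re-defining the gross_difference measure from the sketch above so the block stands alone. The greedy grouping (seed a group, pull in its nearest unused neighbors, average them) is one plausible reading of "average the top 10 percent matches", not a definitive implementation.

```python
import numpy as np

def gross_difference(a, b):                      # same measure as in the sketch above
    return float(np.abs(a - b).mean())

def build_graph(fps, group_size=10):
    """layers[0] holds the leaf fingerprints; each higher layer holds averages of
    groups of similar nodes from the layer below. edges[level][j] lists the
    children (indices into layers[level - 1]) of node j at that level."""
    layers, edges = [list(fps)], [None]
    while len(layers[-1]) > 2:
        below = layers[-1]
        unused = set(range(len(below)))
        parents, children = [], []
        while unused:
            seed = unused.pop()
            # greedily pull the nearest remaining nodes into this group
            near = sorted(unused, key=lambda i: gross_difference(below[seed], below[i]))
            group = [seed] + near[:group_size - 1]
            unused -= set(group[1:])
            parents.append(np.mean([below[i] for i in group], axis=0))
            children.append(group)
        layers.append(parents)
        edges.append(children)
    return layers, edges

def find_closest(target_fp, layers, edges):
    """Walk down the graph, keeping only the best-matching node at each level."""
    level = len(layers) - 1
    candidates = range(len(layers[level]))
    while level > 0:
        best = min(candidates, key=lambda j: gross_difference(target_fp, layers[level][j]))
        candidates = edges[level][best]          # descend to that node's children
        level -= 1
    return min(candidates, key=lambda i: gross_difference(target_fp, layers[0][i]))
```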
To increase the fidelity of your intelligence, simply compare the individual regions of your image that were most salient in the hierarchical filtering that cascaded down to cause the match. This process can back-propagate up the match hierarchy to help refine salience in the filter graph. The same process works for text or sound or video or topology of any kind. If you have information, this process will find pattern in it. There are lots of parameters to tweak. Work the parameters into your fitness or salience breeding algorithm and you have a living, breathing, learning intelligence. Do it right and you shouldn't have to know which category your information originated from (video, sound, text, numbers, binary, etc.). Your system should find those categories automatically.
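One plausible, heavily simplified reading of that region-level refinement in code: keep a per-cell weight grid over the 10-by-10 fingerprints, nudge weight toward the cells where confirmed matches agree, and let those weights decide which regions matter in later comparisons. The grid size, learning rate, and update rule are all illustrative assumptions, not the author's method.

```python
import numpy as np

def refine_salience(weights, fp_a, fp_b, rate=0.05):
    """fp_a and fp_b are fingerprints of two images confirmed to match.
    Cells where they agree gain weight; cells where they disagree lose it."""
    agreement = 1.0 - np.abs(fp_a - fp_b) / 255.0     # 1 = identical, 0 = opposite
    weights = (1 - rate) * weights + rate * agreement
    return weights / weights.sum()

def weighted_difference(weights, fp_a, fp_b):
    """Gross difference, but letting the learned weights decide what matters."""
    return float((weights * np.abs(fp_a - fp_b)).sum())

# start with uniform salience over a 10x10 fingerprint
weights = np.full((10, 10), 1.0 / 100)
```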
Remember that intelligence is a lossy compression problem. What to pay attention to, what to ignore. What to save, what to throw away. And finally, how to store your compressed patterns such that the resulting graph says something real about the meta-patterns that exist natively in your source set.
This whole approach has a history, of course. Over the history of human scientific and practical thought, many people have settled on the idea that fast filtering is most efficient when it is run over a highly compressed, highly structured pattern range. It is more efficient, for instance, to go right to the "J's" than to compare the word "joy" to every word in a dictionary or database. This efficiency is only available if your match set is highly structured (in this example, alphabetically ordered).

And one can do far better than a flat alphabetically ordered list. Let's say there are a million words in a dictionary. Set up a graph, an inverted pyramid, where the first level has two "folders", each named for the last word in its half of the dictionary. The first folder would reference all words from "A" up to something like "Monolith" (and is named "Monolith"); the second folder at that level contains all words alphabetically greater than "Monolith" (maybe starting with "Monolithic") and is named "Zyzer" (or whatever the last word in the dictionary happens to be). Now put two folders inside each of these folders to make up the second tier of your sorting graph, giving you four folders at the second level. Do this again at the third level and you will have eight folders, each named for the last word referenced beneath it. It will take only 20 levels to reference a million words, and 24 levels for 15 million words: roughly twenty comparisons instead of millions, a savings of five to six orders of magnitude over an unstructured scan.
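Here is a small sketch of that two-folders-per-level dictionary, built as a nested structure over a sorted word list. The word counts and the roughly 20-level depth for a million words follow the text; the exact data layout and function names are illustrative choices.

```python
def build_tree(sorted_words):
    """Each node is either a single word (a leaf) or a pair of sub-trees,
    each labelled with the last word it contains."""
    if len(sorted_words) == 1:
        return sorted_words[0]
    mid = len(sorted_words) // 2
    left, right = sorted_words[:mid], sorted_words[mid:]
    return [(left[-1], build_tree(left)), (right[-1], build_tree(right))]

def lookup(tree, word):
    """Descend the tree, making one two-way choice per level."""
    steps = 0
    while not isinstance(tree, str):
        (label, left), (_, right) = tree
        tree = left if word <= label else right
        steps += 1
    return tree == word, steps
```

With a million sorted words, the loop in lookup makes about 20 two-way choices before it lands on a leaf, which is where the 20-level figure above comes from.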
A clever administrative assistant working for Edwin Hubble (or was it Wilson? I can't find the reference) made punch cards of star positions from observational photo plates of the heavens and was able to perform fast searches for quickly moving stars by running knitting needles through the punch holes in a stack of cards.
[Figure: a stack of punch cards sorted with needles. Needles A and B pass through every card; needle C hits the second card.]
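The needle-through-the-card trick translates naturally into bit masks: each card is an integer whose bits mark which holes are punched, and pushing a needle through hole i keeps exactly the cards with that bit set. The star records below are made up purely for illustration.

```python
CARDS = {
    "star_A": 0b1011,   # bit 0 might mean "position shifted between plates", etc.
    "star_B": 0b0110,
    "star_C": 0b1101,
}

def needle(cards, hole):
    """Return the names of the cards the needle at `hole` passes through."""
    return [name for name, bits in cards.items() if bits & (1 << hole)]

print(needle(CARDS, 0))   # -> ['star_A', 'star_C']
```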
What matters, what is salient, is always that which is proximal in the correct context. What matters is what is near the object of focus at some specific point in time.
Let's go back to the image search I introduced earlier. As in the alphabetical word search just described, what should matter isn't the search method (that is just a perk), but rather the association graph that is produced over the course of many searches. This structured graph represents a meta-pattern inherent in the source data set. If the source data is structurally non-random, its structure will encode part of its semantic content. If this is the case, the data can be assumed to have been encoded according to a set of structural rules that themselves encode a grammar.
For each of these grammatical rule sets (chunking/combinatorial schemes), one should be able to represent content as a meta-pattern graph. One graph representing a set of words might simply be pointers into the full lexicon graph. A second graph of the same source text might represent the ordered proximity of each word to its neighbors (remember, the alphabetical meta-pattern graph simply represents neighbors at the character-chunk level).
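As a small illustration of that second kind of graph, here is a sketch that records how often each word appears within a short window of its neighbors. The window size and the example sentence are arbitrary choices of mine.

```python
from collections import defaultdict

def proximity_graph(text, window=2):
    """Count how often each word occurs within `window` positions of another."""
    words = text.lower().split()
    graph = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                graph[w][words[j]] += 1
    return graph

g = proximity_graph("the map must be smaller than the territory")
print(dict(g["map"]))   # -> {'the': 1, 'must': 1, 'be': 1}
```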
What gets interesting of course are the meta-graphs that can be produced when these structured graphs are cross compressed. In human cognition these meta-graphs are called associative memory (experience) and are why we can quickly reference a memory when we see a color or our nose picks up a scent.
At base, all of these storage and processing tricks depend on two things, storing data structures that allow fast matching, and getting rid of details that don't matter. In concert these two goals result in a self optimization towards maximal compression.
The map MUST be smaller than the territory or it isn't of any value.
It MUST hold ONLY those aspects of the territory that matter to the entity referencing them. Consider the difference between photos and text. A photo-sensor in a digital camera knows nothing about human salience. It sees all points of the visual plane as equal. The memory chips on which these color points are stored see all pixels as equal. So far, no compression and no salience. Salience only appears at the level at which digital photos originate (who took them, where, and when). Text, on the other hand, is usually highly compressed from the very beginning. What a person writes about, and how they write it, always represents a very very very small subset of…
Compression as Intelligence (Garbage Out, Brilliance In)
I am convinced that the secret to developing intelligence (in any substrate, including your brain) lies in the percentage of the data coming in that you are willing (or forced) to toss. Lossy compression is the key to intelligence. Of course there is a caveat… you can't just trash anything and everything.
The first line of the book I am writing about evolution: "What matters is what matters, knowing what matters and how to know it matters the most."
I am convinced that evolving systems can only work towards mechanisms that process salience if they are forced to maximize the amount of stuff they can trash.
If you are forced to get rid of 99.999 percent of everything that comes in, well, you will have to get good at knowing the difference between needles and hay, and you will have to get good at it in a hurry. Still, the "needles and hay" metaphor doesn't map well to what I am getting at. If the system you are dealing with is so unstructured as to fit the haystack metaphor, you really aren't doing anything I would classify as intelligence. If there is nothing of structure in the haystack you are storing, then your compression system should already have tossed the whole thing out.
Many techniques for the filtering of essence, for finding pattern, for storing pattern and for storing pattern of pattern have been developed. The most impressive reduce raw input streams and store pattern from the most general to the most specific as hierarchically stratified graphs.
Being forced to reduce data to storage formats that maximize lossiness minimizes necessary storage. But that is just a perk. What really gates intelligence is the amount of a complex system (or map thereof) that can be made proximal to immediate processing. Our brains might be big and mighty, but what really matters is how much of the right parts of what is stored can be brought together in one small space for semi-real-time simulation processing. Information organized optimally for maximal storage density will also be information that is ideally organized for localized serialization and simultaneity of processing.
To think, a system has to be able to grab highly compressed pattern hierarchies and move them into superposition on top of each other for near instantaneous comparison. You can't do this with a whole brain's worth of data, no matter how well organized it is.
Let's say you have to store everything you know about every sport you have ever heard of, and you have to do it in a very limited space. You will be forced to build a hierarchy of grammars in which the general concepts shared by every sport (opponents, the goal to win, a set of rules and consequences, physical playing geometries, equipment, etc.) sit at the top, with layers of groupings below that capture the similarities between some sports, and so on down to the specifics present only in each individual sport. Keep compressing this set. Always compress. Try all day (or all night) for even more compression. Compress until you can't even get to lots of the specifics any more. Keep compressing. Dump the sports you don't care about. Keep on throwing stuff out.
Now let's say I have some sort of morbid sense of humor and I tell you that you are going to have to store everything you encounter and everything you think about, your entire life, in that same database that you have optimized for sports.
You will have to learn to look for the meta-patterns that will allow you to store your first romance in a structure that also allows you to store everything you know about kitchen utensils and geo-politics and the way the Beatles White Album makes you feel when it is windy outside.
The necessity to toss, enforced by limited storage and an obsession to compress will result in domain-blending salience hierarchies. It is why we can find deep similarities between music and geological topologies. It is why we can "think".
For years people have tried to come up with the algorithms of thought. What we need instead is to build into our artificial systems a very mean and ornery compression taskmaster that forces, over time, all of our disparate sensation streams into the same shared graph.
Once you have all of your memories stored within the same graph, by necessity sharing the same meta-pattern, the job of evolving processing algorithms is made that much easier.
An intelligent system will spend most if not all of its time compressing data. We have a tendency to bifurcate the behavior of a mind into storage on the one hand, and processing on the other. I am beginning to think that the thing we call "thinking" and "thought" is exclusively and only a side-effect of constant attempts at compression – that there really isn't anything separate that happens outside of compression. Is this possible?
Randall Reetz