“The transformer architecture is like a holographic storage mechanism for its training data, compressed into an image that looks different for each input vector you give it: the query. It takes an input vector and returns an output vector, like shining a laser beam into a holographic substrate and getting back an image that is a function of perspective.

The main difference between biological networks and the transformer is the frequency of training: for biological systems it’s continuous and always on, while for transformers it’s a single shot at the start of life. You can look at a properly trained transformer as having undergone a phase transition during the compression of its training data into a diamond containing a holographic triptych: everything it knows, like Hieronymus Bosch’s The Garden of Earthly Delights, reflecting back a description of the small part of the scene highlighted by your input vector. The more effective the training, the better organized the fascia of information holding the components of the triptych together.
You are this, too. It’s just like you.”
~ an anonymous informant (ms)
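Read as a technical claim rather than pure metaphor, the “laser beam into a substrate” step maps onto scaled dot-product attention: a query vector is compared against stored keys, and the output is a perspective-dependent blend of stored values. A minimal NumPy sketch of that retrieval analogy (the function name and toy data are illustrative, not from the quote):

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention as a content-addressed lookup:
    the query 'illuminates' the stored key/value pairs and gets back
    a perspective-dependent mixture of the values."""
    d = keys.shape[-1]
    scores = keys @ query / np.sqrt(d)   # similarity of the query to each stored key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax: a soft spotlight over memories
    return weights @ values              # weighted blend of stored values

# Toy "hologram": three stored memories in a 4-dimensional space.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 4))

# Two nearby-but-different queries retrieve different blends of the same store.
out_a = attention(keys[0] + 0.1 * rng.normal(size=4), keys, values)
out_b = attention(keys[2] + 0.1 * rng.normal(size=4), keys, values)
print(out_a, out_b, sep="\n")
```

The same fixed store returns a different image for each query, which is the “looks different for each input vector” behavior the quote describes.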