I think it's the other way round: humans have effectively unbounded training data. We can count exactly how much text any given model saw during training. We know exactly how many images or video frames were used to train it, and so on. Can we count the amount of input humans receive?
I can look at my coffee mug from any angle I want, I can feel it in my hands, I can sniff it, lick it and fiddle with it as much as I want. What happens if I move it away from me? Can I turn it this way, can I lift it up? What does it feel like to drink from this cup? What does it feel like when someone else drinks from my cup? The LLM has no idea because it doesn't have access to sensory data and it can't manipulate real-life objects (yet).
But since we interact with other people mostly through language, and since the start of the internet a lot of those interactions happen in ways very similar to how we interact with AI, the difference is not so obvious. We keep falling back on the Turing test here, mostly because that test is more about language than about intelligence.
I think this is incorrect on two counts: yes, transformers and individual layers are parallel, but the entire network is not. At the first level, it's obviously sequential over generated tokens - but even the generation of a single token is sequential in the number of layers the information travels through.
Both of those constraints are comparable to the way humans think, I believe. (The human brain doesn't have neatly organized layers, but it does have "pathways" where certain brain regions project into other brain regions.)
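To make that concrete, here is a toy sketch (illustrative only, with made-up stand-ins rather than any real transformer code) of the two sequential loops: one over layers within a forward pass, one over generated tokens. Only the work inside a single layer is parallel across positions.

    def run_layer(weight, hidden_states):
        # Stand-in for attention + MLP: each position mixes with every earlier position.
        # The work here could be done in parallel across positions.
        return [weight * sum(hidden_states[: i + 1]) for i in range(len(hidden_states))]

    def forward(layer_weights, embeddings):
        hidden = embeddings
        for w in layer_weights:        # sequential in depth: layer i needs layer i-1's output
            hidden = run_layer(w, hidden)
        return hidden[-1]              # the last position's state picks the next token

    def generate(layer_weights, prompt, n_new):
        tokens = list(prompt)
        for _ in range(n_new):         # sequential over tokens: step t+1 needs step t's output
            logit = forward(layer_weights, [float(t) for t in tokens])
            tokens.append(int(logit) % 10)   # toy "sampling"
        return tokens

    print(generate(layer_weights=[0.5, 0.25], prompt=[1, 2, 3], n_new=4))

In this sketch, those two loops stay serial no matter how wide each layer is; parallel hardware only speeds up what happens inside a single layer.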
But as we have seen over recent months and years, AI output is becoming increasingly indistinguishable from human output.
Some people are obsessed with chatting with ghosts. It seems like a rational adult couldn't be seriously harmed by chatting with a ghost, but there are news reports showing that some people get possessed.
It's a better metaphor than parrots, anyway.
A car is not just an engine; it's also a drivetrain, a transmission, wheels, and steering, all of which affect the end product and its usability. LLMs are no different, and focusing on alignment without even addressing all the scaffolding that mediates the exchange between the user and the LLM in an assistant use case seems disingenuous.
And I am sorry to be negative, but there is so much bad cognitive science in this article that I couldn't take the product seriously.
> LLMs can be scaled almost arbitrarily in ways biological brains cannot: more parameters, more training compute, more depth.
- The capacity of raw compute is irrelevant without mentioning the complexity of the computational task at hand. LLMs can scale - not infinitely - but their core attention mechanism is O(n^2) in context length. It is also a mistake to equate human compute with a single human's head: language itself is both a tool and a protocol for distributed compute among humans. You borrow a lot of your symbolic preprocessing from culture! As said, this is exactly what LLMs piggyback on.
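A toy illustration of that O(n^2) scaling (a simplified single-head self-attention with no learned projections, purely to show where the quadratic term comes from):

    import numpy as np

    def self_attention(x):                    # x: (n, d) token embeddings
        n, d = x.shape
        scores = x @ x.T / np.sqrt(d)         # (n, n) pairwise comparisons -> O(n^2)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ x                    # each token becomes a weighted mix of all tokens

    print(self_attention(np.random.randn(8, 4)).shape)     # (8, 4)
    for n in (1_000, 2_000, 4_000):
        print(n, "tokens ->", n * n, "attention scores")   # doubling context ~4x the work

Doubling the context quadruples the score matrix and the work, which is the scaling wall the point above is gesturing at.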
> We are constantly hit with a large, continuous stream of sensory input, but we cannot process or store more than a very small part of it.
- This is called relevance, and we are so frigging good at it! The fact that the machine has to deal with a lot more unprioritized data in a relatively flat O(n^2) problem formulation is a shortcoming, not a feature. The visual cortex is such an opinionated accelerator for processing all that massive data that only the relevant bits need to make it to your consciousness. And this architecture was trained over hundreds of millions of years, across trillions of experimental arms - which were in parallel experimenting on everything else too.
> Humans often have to act quickly. Deliberation is slow, so many decisions rely on fast, heuristic processing. In many situations (danger, social interaction, physical movement), waiting for more evidence simply isn't an option.
- Again, a lot of this conflates conscious processing with the whole of cognition. Anyone who plays sports or music knows to respect the implicit, embodied cognition that goes into complex motor tasks. We have yet to see a household robot, without massively fast-forwarded footage, do a mundane kitchen-cleaning task and then go play table tennis with the same motor "cortex". Motor planning and articulation is a fantastically complex computation; just because it doesn't reach our consciousness or isn't expressed through language doesn't mean it isn't one.
> Human thinking works in a slow, step-by-step way. We pay attention to only a few things at a time, and our memory is limited.
- Thinking, Fast and Slow by Kahneman is a fantastic way of getting into how much more complex the mechanism is.
The key point here is how good humans are at relevance, as limited as their recall is - because relevance matters, because it is existential. Therefore, when you use a tool to extend your recall, it is important to see its limitations. Google search having indexed billions of pages is not a feature if it can't surface the right results. And if it gains the ability to convince me that whatever it brought up was relevant, that still doesn't mean the results actually are. This is exactly the degradation of relevance we are seeing in our culture.
I don't care whether the language terminal is a human or a machine; if the human was convinced by the machine's low-relevance crap, it is just a legitimacy-laundering scheme. Therefore this is not a tech problem, it is a problem of culture. We need to be cultivating epistemic humility, including quitting the Cartesian tyranny of worshipping explicit verbal cognition assumed to be locked up inside a brain, and we have to accept that we are also embodied and social beings who depend on a lot of distributed compute to solve for agency.