Data Efficiency Theory of Intelligence
Data efficiency hypothesis of intelligence
Intelligent beings come to understand a tremendous number of causes, consequences, and other associations from very limited experience. This “data efficiency” characteristic is so critical to our understanding of intelligence that we judge each other’s intelligence by it:
- A “gifted” student can learn faster, with fewer lessons and less practice than an average student.
- A “sharp” person is one who remembers information after being told only once, and can put it to use right away in at least some capacity.
- A “genius” is one who can infer seemingly new knowledge without ever having learned it at all, i.e. generalize to seemingly fundamentally different areas of knowledge.
By contrast, characterizations such as “experienced”, “expert”, or “master of the craft”, while certainly positive assessments of knowledge and effectiveness in a particular area, simply don’t have connotations of general intelligence, because they aren’t good predictors of effectiveness in other areas.
- Why do overfit models seem less trustworthy than underfit models?
    - Even when both achieve the same accuracy, as optimal function theory would suggest
- Underfit models are usually interpretable
        - Dr Xu: “I understand why it made this mistake, it’s very understandable”
- Overfit models are generally uninterpretable
- Blank stares from reviewers, no possible human interpretation
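The interpretability gap above can be seen in a toy fit. This is a minimal sketch with assumed synthetic data: a two-parameter line whose coefficients a human can read directly, versus a high-degree polynomial whose many coefficients defy interpretation even when its training error is lower.

```python
# Sketch: underfit models are interpretable, overfit models are not.
# The data-generating law (y = 2x + 1 plus noise) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=x.shape)

# Low-capacity model: a line. Its two coefficients have direct meaning
# (a slope near 2 and an intercept near 1).
underfit = np.polyfit(x, y, deg=1)

# High-capacity model: a degree-9 polynomial. It hugs the training noise,
# but its ten coefficients carry no human-readable story.
overfit = np.polyfit(x, y, deg=9)

print("line coefficients (slope, intercept):", np.round(underfit, 2))
print("opaque coefficient count:", overfit.size)
```

When the line mislabels a point, we can say why (“that point sat above the trend”); when the degree-9 fit errs, there is nothing to say, which matches the “blank stares from reviewers” reaction.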
- Dropping information
- Translation engines based on language modeling prefer “fluency” (internal coherence) over “accuracy” (picking up every important detail in the source text).
- Hallucination
    - Making up new information that coheres with the training data rather than with the actual input