Siddish's Public Notes

Data Engine

Notes from Andrej Karpathy talks

Last updated 11 months ago

Hypothesis:

  • Unknown unknowns: the dataset is always imperfect; not every scenario is well represented yet, and the data can always be more diverse.

  • Capable base model/architecture: given a capable base model/architecture, improving the dataset improves the AI/product guarantees.
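The first hypothesis can be made partly measurable. Below is a minimal sketch. It assumes each example carries a hypothetical scenario tag (for example, from metadata or a labeling pass) and that you pick a `min_count` threshold by hand. The function counts examples per scenario and flags the rare or missing ones. It only surfaces gaps in scenarios you already know how to name. True unknown unknowns will not appear in such a count.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def underrepresented_scenarios(
    tags: Iterable[str],             # one hypothetical scenario tag per example
    known_scenarios: Iterable[str],  # scenarios we expect the data to cover
    min_count: int,                  # assumed threshold, tune per project
) -> List[Tuple[str, int]]:
    """Return (scenario, count) pairs below min_count, rarest first.

    Known scenarios with zero examples are included: a scenario that is
    missing entirely is the most extreme form of under-representation.
    """
    counts = Counter(tags)
    for scenario in known_scenarios:
        counts.setdefault(scenario, 0)
    gaps = [(s, c) for s, c in counts.items() if c < min_count]
    # Sort by count, then by name, so the output is deterministic.
    return sorted(gaps, key=lambda sc: (sc[1], sc[0]))


tags = ["highway"] * 50 + ["rain"] * 3 + ["tunnel"]
print(underrepresented_scenarios(tags, ["highway", "rain", "tunnel", "snow"], 10))
# [('snow', 0), ('tunnel', 1), ('rain', 3)]
```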

Inspirations:

"The only sure certain way I have seen of making progress on any task is, you curate the dataset that is clean and varied and you grow it and you pay the labeling cost and I know that works."

"Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general"

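The data engine in the quote above works as a loop: deploy the model, mine the cases it handles badly, pay to label them, retrain, and repeat. Below is a toy sketch of that loop under several assumptions. It uses a 1-D threshold classifier. An `oracle` function stands in for human labelers. The mining trigger is "closest to the decision boundary", which is essentially uncertainty-based active learning. All of these names and choices are hypothetical stand-ins, not Tesla's actual pipeline.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (input, label)


def fit_threshold(dataset: List[Example]) -> float:
    """Toy 1-D classifier: put the decision threshold halfway between the
    largest input labeled 0 and the smallest input labeled 1."""
    neg = [x for x, y in dataset if y == 0]
    pos = [x for x, y in dataset if y == 1]
    lo = max(neg) if neg else 0.0
    hi = min(pos) if pos else 1.0
    return (lo + hi) / 2


def run_data_engine(
    dataset: List[Example],
    oracle: Callable[[float], int],  # hypothetical stand-in for human labelers
    rounds: int = 5,
    stream_size: int = 1000,
    label_budget: int = 10,
    seed: int = 0,
) -> Tuple[float, List[Example]]:
    rng = random.Random(seed)
    dataset = list(dataset)
    threshold = fit_threshold(dataset)
    for _ in range(rounds):
        # 1. "Deploy": collect fresh, unlabeled inputs the model would see.
        stream = [rng.random() for _ in range(stream_size)]
        # 2. Mine hard cases: the inputs closest to the current decision
        #    boundary, i.e. where the model is least certain.
        hard = sorted(stream, key=lambda x: abs(x - threshold))[:label_budget]
        # 3. Pay the labeling cost, but only for the mined cases.
        dataset.extend((x, oracle(x)) for x in hard)
        # 4. Retrain on the grown dataset.
        threshold = fit_threshold(dataset)
    return threshold, dataset


oracle = lambda x: int(x >= 0.7)  # the "true" boundary, unknown to the model
threshold, data = run_data_engine([(0.1, 0), (0.95, 1)], oracle)
print(threshold, len(data))
```

The run starts from two labeled points, so the initial threshold is 0.525. Each round pulls the threshold toward the true boundary of 0.7, and the whole run buys only 50 extra labels. Spending the labeling budget on mined hard cases, rather than on random samples, is what makes each turn of the loop count.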
QualEval: Qualitative Evaluation for Model Improvement
A Recipe for Training Neural Networks
  1. Become one with the data
  2. Set up the end-to-end training/evaluation skeleton + get dumb baselines
  3. Overfit
  4. Regularize
  5. Tune
  6. Squeeze out the juice
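Step 2 of the recipe asks for dumb baselines before anything clever. Below is a minimal sketch of two sanity checks in that spirit, assuming a classification task. The first is the accuracy of an input-independent majority-class predictor, which any real model must beat. The second is the loss a softmax classifier should report at initialization, -log(1/n_classes). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def majority_baseline_accuracy(
    train_labels: Sequence[Hashable], test_labels: Sequence[Hashable]
) -> float:
    """Accuracy of an input-independent model that always predicts the most
    common training label. Any real model should beat this number."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return sum(y == majority for y in test_labels) / len(test_labels)


def expected_initial_loss(n_classes: int) -> float:
    """Cross-entropy of a softmax that is uniform over n_classes at init,
    i.e. -log(1/n_classes). A much higher first loss hints at a bad init."""
    return -math.log(1.0 / n_classes)


print(majority_baseline_accuracy(["cat", "cat", "dog"], ["cat", "dog", "dog", "cat"]))  # 0.5
print(round(expected_initial_loss(10), 4))  # 2.3026
```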
Data Engine HLD (high-level design) used internally at metaforms.ai
https://medium.com/swlh/about-the-long-tail-113e98ce8717
From the 6th to the 15th minute