Siddish's Public Notes

Data Engine

Notes from Andrej Karpathy talks

Last updated 1 year ago

Hypothesis:

  • Unknown unknowns: the dataset is always imperfect. Not every scenario is well represented yet, so it can always be made more diverse.

  • Capable base model/architecture: given a capable base model, improving the dataset improves the AI/product guarantees.

Inspirations:

"The only sure certain way I have seen of making progress on any task is, you curate the dataset that is clean and varied and you grow it and you pay the labeling cost and I know that works."

"Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general"

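One way to picture a data engine is as a loop: run the current model on incoming data, find the cases it gets wrong, pay to label a few of them, add them to the dataset, and retrain. The sketch below is a toy illustration of that loop and not the setup from the talks. The 1-nearest-neighbour "model", the integer features, the `true_label` stand-in for human labeling, and the per-round `budget` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Example = Tuple[int, int]  # (feature, label); hypothetical toy format


def predict(dataset: List[Example], x: int) -> int:
    """1-nearest-neighbour: 'training' is simply storing the dataset."""
    nearest = min(dataset, key=lambda ex: abs(ex[0] - x))
    return nearest[1]


def accuracy(dataset: List[Example], stream: List[int],
             oracle: Callable[[int], int]) -> float:
    correct = sum(predict(dataset, x) == oracle(x) for x in stream)
    return correct / len(stream)


def data_engine(dataset: List[Example], stream: List[int],
                oracle: Callable[[int], int], budget: int,
                max_rounds: int = 100) -> Tuple[List[Example], List[float]]:
    """Repeatedly mine failures, label up to `budget` of them, and retrain."""
    dataset = list(dataset)
    history = [accuracy(dataset, stream, oracle)]
    for _ in range(max_rounds):
        # Assumption: the oracle stands in for whatever trigger surfaces
        # hard cases in production, plus the human labeling step.
        failures = [x for x in stream if predict(dataset, x) != oracle(x)]
        if not failures:
            break
        for x in failures[:budget]:  # pay the labeling cost
            dataset.append((x, oracle(x)))
        history.append(accuracy(dataset, stream, oracle))
    return dataset, history


def true_label(x: int) -> int:
    # Toy ground truth with a rare "long tail" pocket at 10..14.
    return 1 if (10 <= x < 15 or x >= 50) else 0


seed_dataset = [(25, 0), (75, 1)]
final_dataset, history = data_engine(seed_dataset, list(range(100)),
                                     true_label, budget=3)
print(f"rounds: {len(history) - 1}, dataset size: {len(final_dataset)}")
print(f"accuracy: {history[0]:.2f} -> {history[-1]:.2f}")
```

The seed dataset misses the rare pocket entirely, so the first rounds mostly add long-tail examples. Accuracy can dip between rounds before it recovers, which is one reason the speed of the loop matters.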
  • QualEval: Qualitative Evaluation for Model Improvement
  • A Recipe for Training Neural Networks
    1. Become one with the data
    2. Set up the end-to-end training/evaluation skeleton + get dumb baselines
    3. Overfit
    4. Regularize
    5. Tune
    6. Squeeze out the juice
  • from 6th to 15th minute
  • Data Engine HLD internally at metaforms.ai
  • https://medium.com/swlh/about-the-long-tail-113e98ce8717
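
As a rough illustration of two items in the recipe above ("get dumb baselines" and "Overfit"), the sketch below computes a majority-class baseline and then checks that a very small model can drive training error to zero on a tiny batch. The perceptron, the four-example AND batch, and all function names are assumptions chosen to keep the example dependency-free. They are not taken from the recipe itself.

```python
from collections import Counter
from typing import List, Tuple

Batch = List[Tuple[Tuple[int, int], int]]  # ((x1, x2), label)


def majority_baseline_accuracy(labels: List[int]) -> float:
    """Accuracy of the dumbest model: always predict the most common label."""
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


def predict(w: List[int], b: int, x: Tuple[int, int]) -> int:
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def overfit_tiny_batch(batch: Batch, max_epochs: int = 100):
    """Perceptron updates until the tiny batch is fit with zero mistakes."""
    w, b = [0, 0], 0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in batch:
            error = y - predict(w, b, x)
            if error != 0:
                w[0] += error * x[0]
                w[1] += error * x[1]
                b += error
                mistakes += 1
        if mistakes == 0:
            return w, b, epoch
    raise RuntimeError("could not fit the tiny batch; suspect a pipeline bug")


# Hypothetical tiny batch: the logical AND of two binary inputs.
tiny_batch: Batch = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

baseline = majority_baseline_accuracy([y for _, y in tiny_batch])
w, b, epochs = overfit_tiny_batch(tiny_batch)
train_accuracy = sum(predict(w, b, x) == y for x, y in tiny_batch) / len(tiny_batch)

print(f"majority baseline accuracy: {baseline:.2f}")
print(f"train accuracy after {epochs} epochs: {train_accuracy:.2f}")
```

If the model cannot beat the dumb baseline or cannot fit a handful of examples, the problem is more likely in the data or the training/evaluation skeleton than in the architecture.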