Datasets

Vision

  • Aidacalc: labeled pictures of math equations
  • crohme: handwritten mathematical equations
  • ImageNet
  • Flickr8k
  • landlord handwritten name recognition
  • Street View Text
    • Text with bounding boxes from real images. A dictionary is provided so that other words in the image can be parsed out
  • IAM handwriting
    • Pen motion during handwriting. Each sample point records the pen's position, timestamp, and pressure value
  • NEOCR: Natural Environment OCR Dataset
  • KAIST Scene Text
  • MSRA Text Detection with bounding boxes
  • Stanford OCR: clean subset of words and images, stored in a CSV file with the pixel values
  • Chars74k: each character is its own image. Masks for character location are also provided
  • COCO: images with masks of the objects to identify
  • EMNIST: handwritten letters (not just digits)
  • EgoBody: motion of interacting people, captured from head-mounted devices
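
The IAM handwriting entry above describes each sample point as a pen position, a timestamp, and a pressure value. A minimal sketch of how such a point might be held in memory is shown below. The class name `PenSample`, its field names, and the helper functions are hypothetical and are not the dataset's own file format; the example stroke is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenSample:
    """One recorded pen sample (hypothetical field names)."""
    x: float         # horizontal pen position
    y: float         # vertical pen position
    t: float         # timestamp in seconds
    pressure: float  # pen pressure value


def path_length(stroke: List[PenSample]) -> float:
    """Total distance the pen travels along one stroke."""
    total = 0.0
    for a, b in zip(stroke, stroke[1:]):
        total += ((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5
    return total


def duration(stroke: List[PenSample]) -> float:
    """Elapsed time between the first and last sample of a stroke."""
    return stroke[-1].t - stroke[0].t if stroke else 0.0


# Made-up example stroke, not taken from the dataset.
stroke = [
    PenSample(0.0, 0.0, 0.00, 0.5),
    PenSample(3.0, 4.0, 0.01, 0.7),
    PenSample(3.0, 10.0, 0.02, 0.6),
]

print(path_length(stroke), duration(stroke))
```

Keeping the timestamp on every point, rather than assuming a fixed sampling rate, lets the same structure describe recordings with irregular sampling.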

Audio

  • ami: audio recordings of meetings
  • cmudict arctic voice: recordings with sentence labels
  • commonvoice: speech transcriptions
  • Speech commands: individual words
  • timit: audio transcription with labels at the sentence, word, and phoneme level
  • CallHome talkbank: audio transcriptions of phone calls mid-conversation, with utterance-level labels and timing for the audio
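
Several of the audio datasets above attach labels to time spans at more than one level (sentence, word, phoneme, or utterance). The sketch below shows one way such aligned labels could be represented and how phoneme segments can be grouped under the word that contains them. The `Segment` class, the use of seconds as the time unit, and the alignment values are assumptions for illustration, not any dataset's actual annotation format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    label: str    # word or phoneme label


def children_of(parent: Segment, candidates: List[Segment]) -> List[Segment]:
    """Return the candidate segments that lie entirely inside parent."""
    return [c for c in candidates
            if c.start >= parent.start and c.end <= parent.end]


# Made-up alignment for illustration only.
words = [Segment(0.00, 0.30, "she"), Segment(0.30, 0.70, "had")]
phonemes = [
    Segment(0.00, 0.15, "sh"),
    Segment(0.15, 0.30, "iy"),
    Segment(0.30, 0.40, "hh"),
    Segment(0.40, 0.60, "ae"),
    Segment(0.60, 0.70, "d"),
]

for word in words:
    print(word.label, [p.label for p in children_of(word, phonemes)])
```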

Text

Reference

  1. WordNet - how do words relate to each other in terms of hierarchy
  2. ConceptNet - how do words relate to each other in terms of usage (e.g., a person can make coffee)
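
The difference between the two references above can be sketched with toy data: a hierarchy maps each word to a more general word, while usage knowledge is a set of (subject, relation, object) statements. Everything below is hand-written for illustration and is not extracted from WordNet or ConceptNet; the names `HYPERNYM`, `USAGE`, `hypernym_chain`, and `relations_from` are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy hierarchy: each word points to a more general word (hand-written).
HYPERNYM: Dict[str, str] = {
    "espresso": "coffee",
    "coffee": "beverage",
    "beverage": "food",
}

# Toy usage statements as (subject, relation, object) triples (hand-written).
USAGE: List[Tuple[str, str, str]] = [
    ("person", "CapableOf", "make coffee"),
    ("coffee", "UsedFor", "staying awake"),
]


def hypernym_chain(word: str) -> List[str]:
    """Walk from a word up to the most general term in the toy hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def relations_from(subject: str) -> List[Tuple[str, str]]:
    """List the (relation, object) pairs stated about a subject."""
    return [(rel, obj) for subj, rel, obj in USAGE if subj == subject]


print(hypernym_chain("espresso"))
print(relations_from("person"))
```

The hierarchy answers "what kind of thing is this?", while the triples answer "what is this used for or able to do?", which is the distinction the two entries draw.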

Models

Audio