AI Dataset Engineer - Dataset Generation

Reality Defender

Software Engineering, Data Science
Earth · Remote
Posted on Tuesday, May 21, 2024

About Reality Defender

Reality Defender is a groundbreaking security platform offering comprehensive deepfake detection. The company is a Y Combinator graduate and a Comcast NBCUniversal LIFT Labs alumnus, and is backed by DCVC. Its proactive deepfake and AI-generated content detection technology is developed by a leadership team with over 20 years of experience in applied research at the intersection of machine learning, data science, and cybersecurity.

With models defending against present and future fabrication techniques, Reality Defender is the best way to detect and deter fraudulent text, audio, and visual content. The company partners with government agencies and enterprise clients to enhance security and detect fraud.

Role and Responsibilities

  • Construct and maintain datasets to support the AI team's work, focusing in particular on computer vision (both image and video).

  • Execute data generation, data cleaning, annotation, and new dataset construction.

  • Help develop auto-annotation workflows for image and video data (e.g. identifying accessories such as hats and glasses in images of people).

  • Work with third parties to create new datasets (e.g. crowdsourcing images/videos) and annotate existing data.

About You

  • 3+ years of software/data science/ML industry experience

  • Preferred: Bachelor's degree in STEM (science, technology, engineering, math)

  • Proficient with Python

  • Experience working with large computer vision datasets (10M+ items)

  • Familiarity with ML and deep learning theory (can be self-taught)

  • Experience implementing ML and deep learning algorithms from the literature

  • Nice to have: Experience with perceptual hashing and image deduplication