About

Sam Brotherton is a machine learning engineer and fractional CTO with fifteen years of experience in a wide range of industries, from music to healthcare to digital advertising. He is currently available for limited consulting work; please reach out at sam@cairnlabs.com if you'd like to discuss a project.


Selected Roles and Projects

2025

Yahoo!

Director of Engineering leading several AI/ML teams within Yahoo! Search:

  • AI Overviews: AI-powered summaries of search results that trigger on some queries and display on the search results page
  • Yahoo Grounding Engine: platform for ETL pipelines, embedding models, indexing, search, and reranking that powers Retrieval-Augmented Generation (RAG) across Yahoo, including the new AI-first search experience Scout
  • Yahoo Knowledge: Yahoo’s internal knowledge graph/semantic web, combining structured data from public sources and Yahoo-owned news, finance, and sports content

2023

Reddit

Staff Machine Learning Engineer within Reddit Ads Engineering. Worked on several projects including Akka streaming pipelines for bidding and pacing, as well as an AI-powered ad campaign generation tool targeted at SMBs.

2023

Ascertain

Built a prototype agentic RAG system for Ascertain, a then-seed-stage health tech startup born of a collaboration between Northwell Health and Aegis Ventures that went on to raise a $10M Series A led by Deerfield Management.

2023

Tesla Government

Built a RAG system, streaming ingestion pipeline, and custom vLLM-based LLM inference service on EKS on AWS GovCloud for Tesla Government, a contracting firm that provides information management and data analytics services to the State Department and other government agencies.

2021

Google

Two stints as a software engineer within Google Display Ads:

Early in my career (2014), I worked on a small team using natural language processing and other machine learning techniques to improve advertisement quality. I led a 20% project on mining semantic information from web data, which was adopted by several teams across different product areas. I also built a named entity recognition system in C++ and a link detection algorithm that runs on very large graphs, and contributed to a topic model for clustering semantic entities.

More recently (2021), I boomeranged under Google’s COVID-era remote work policy, joining as a software engineer on an infrastructure team within Ads Privacy and Security using natural language processing and machine learning to prevent policy-violating ad impressions. Our systems handled 2M+ queries per second and combined modern deep learning techniques with multi-tier human evaluation pipelines. Additionally, I had a 20% role building remote sensing infrastructure and machine learning models for a stealth climate “startup” within Google.

2020

Pachama

Built a planet-scale satellite imagery ingestion and processing system for Pachama, a startup that ran a marketplace for nature-based carbon credits. We used continuous analysis of satellite imagery to track the growth of forests and other carbon sinks, and used machine learning to identify and measure the carbon stored in these assets. Pachama was acquired by Carbon Direct in 2025.

2020

Authentic Artists

Led a remote engineering team to build AI-powered audiovisual experiences, including a cloud deep learning server for symbolic (MIDI) song generation, a C++ constraint/transformation library for imposing additional musical style on its output, and an automated Max/Ableton session for audio generation. Coordinated with creative and executive teams to balance engineering and IP development with artistic and business goals. See the Spotify playlist for examples.

2019

Blue Cross Blue Shield of North Carolina

Designed and delivered a highly available, HIPAA-compliant machine learning inference server used to predict asthma risk in real time from claims records.

2019

Motion Insurance

Built a risk scoring server for car insurance that collected real-time driving data and assigned machine-learning-based driver risk scores at a rate of 3,000+ qps. Motion Insurance was later acquired by Novo.

2018

Hikma Health

Fractional CTO of a health tech 501(c)(3) nonprofit building a mobile-first Electronic Health Record (EHR) system for deployment in rural, underserved communities. We are currently deployed to refugee clinics in 18 countries and have served over 500,000 patients. The technical development process is described in our paper Development of an Offline, Open-Source, Electronic Health Record System for Refugee Care.

2018

Peerless

CTO of a pre-seed financial services startup building an execution platform for alternative asset trading. The platform included a high-performance C++ trading engine, an Elixir-based API and task management system, and a React frontend for managing orders and positions.

2018

Live Circle

Built a real-time audio streaming and analysis service for a health tech startup. We used Twilio's media streaming API to bidirectionally stream audio to and from a user's phone, running a series of machine learning models on the audio in real time. The work led to several granted patents and published papers, including Spoken words as biomarkers: using machine learning to gain insight into communication as a predictor of anxiety.

2017

Trebella

Led a team of engineers to build an AI piano teaching app that plugs into any MIDI keyboard and provides real-time feedback as students work through a curated library of songs and exercises. We took it to NAMM 2017 in Nashville and ran a Kickstarter campaign to fund development, but ultimately failed to reach profitability.

2016

i.am+

Built a deep-learning-based conversational UI framework to power will.i.am's wireless earphones and other applications. Backed by TensorFlow, Prolog, and other technologies, it supported multiple languages, extensible dialogue flows, and a custom knowledge base. Helped will.i.am raise over $100M in funding and launch the i.am+ platform, which was deployed both to the consumer market on the i.am+ buttons and to enterprise clients including Deutsche Telekom.

2013

Whisper

Sole data scientist at a rapidly expanding social media startup seeing upwards of three billion monthly pageviews. Designed and built an NLP service to extract topics and tags from posts, predict image search terms from unstructured text, and target content to users. Implemented a new geographic search system using PostGIS that decreased search time by 90%. Worked closely with the frontend and backend development teams, writing production code in Erlang and Python.

Education

2012

Harvard University

B.A., Mathematics and East Asian Studies

Received highest honors for senior thesis analyzing over 200,000 Chinese blog posts, algorithmically detecting mutations in the Chinese language in response to censorship. Completed coursework in abstract algebra, Galois theory, topology, real and complex analysis, probability theory, and linguistics.