Sam Brotherton is a machine learning engineer and fractional CTO with fifteen years of experience
in a wide range of industries, from music to healthcare to digital advertising. He is currently
available for limited consulting work; please reach out at sam@cairnlabs.com
if you'd like to discuss a project.
2025
Director of Engineering leading several AI/ML teams within Yahoo! Search:
- AI Overviews: AI-powered summaries of search results that trigger on some queries and display in the search results page
- Yahoo Grounding Engine: ETL pipelines, embedding models, indexing, search, and reranking platform powering Retrieval-Augmented Generation (RAG) across Yahoo, including the new AI-first search experience Scout
- Yahoo Knowledge: Yahoo’s internal knowledge graph/semantic web, combining structured data from public sources and Yahoo-owned news, finance, and sports content
2023
Staff Machine Learning Engineer within Reddit Ads Engineering. Worked on several projects including
Akka streaming pipelines for bidding and pacing, as well as
an AI-powered ad campaign generation tool
targeted at SMBs.
2023
Built a prototype agentic RAG system for Ascertain, a then seed-stage healthtech startup born
of a collaboration between Northwell Health and Aegis Ventures that went on to raise a $10M Series A led by Deerfield Management.
2023
Built a RAG system, streaming ingestion pipeline, and custom vLLM-based LLM inference service
on EKS on AWS GovCloud for Tesla Government,
a contracting firm that provides information management and data analytics services to the State
Department and other government agencies.
2021
Two stints as a software engineer within Google Display Ads:
Early in my career (2014), I worked on a small team using natural language
processing and other machine learning techniques to improve advertisement
quality. Led a 20% project related to mining semantic information from
web data, which was adopted by several teams across different product
areas. Built a named entity recognition system in C++ and a link detection
algorithm that runs on very large graphs; contributed to a topic model for
clustering semantic entities.
More recently (2021), I boomeranged under Google’s COVID-era remote work
policy, joining as a software engineer on an infrastructure team within
Ads Privacy and Security using natural language processing and machine
learning to prevent policy-violating ad impressions. Our systems handled
2M+ queries per second and combined modern deep learning techniques with
multi-tier human evaluation pipelines. Additionally, I had a 20% role
building remote sensing infrastructure and machine learning models for a
stealth climate “startup” within Google.
2020
Built a planet-scale satellite imagery ingestion and processing system for
Pachama, a startup that ran a marketplace for nature-based
carbon credits. We used continuous analysis of satellite imagery to track the growth
of forests and other carbon sinks, and used machine learning to identify and measure
the carbon stored in these assets. Pachama was acquired by Carbon Direct in 2025.
2020
Led a remote engineering team to build AI-powered audiovisual
experiences, including a cloud deep learning server for symbolic (MIDI) song generation, a
C++ constraint/transformation library for imposing additional musical style
on its output, and an automated Max/Ableton session for audio generation.
Coordinated with creative and executive teams to balance engineering and IP
development with artistic and business goals. See
Spotify Playlist for examples.
2019
Designed and delivered a highly available, HIPAA-compliant machine learning inference server
used to predict asthma risk in real time from claims records.
2019
Built a risk scoring server for car insurance,
collecting real-time driving data and assigning machine-learning-based
driver risk scores at a rate of 3000+ qps. Later acquired by Novo.
2018
Fractional CTO of a health tech 501(c)(3) nonprofit building a mobile-first Electronic Health Record (EHR) system for deployment
in rural, underserved communities. We are currently deployed to refugee clinics in 18 countries and have served over 500,000 patients. The technical development
process is described in our paper Development of an Offline, Open-Source, Electronic Health Record System for Refugee Care.
2018
CTO of a pre-seed financial services startup building an execution platform for alternative asset trading.
Included a high-performance C++ trading engine, an Elixir-based API and task management system, and a React
frontend for managing orders and positions.
2018
Built a real-time audio streaming and analysis service for a health tech startup. We used Twilio’s media streaming API
to bidirectionally stream audio to and from a user’s phone, running a series of machine learning models on the audio in
real time. Several patents were filed and granted, and several papers were published, including
Spoken words as biomarkers: using machine learning to gain insight into communication as a predictor of anxiety.
2017
Led a team of engineers to build an AI piano teaching app that plugs into any MIDI keyboard
and provides real-time feedback as students work through a curated library of songs and exercises.
We took it to NAMM 2017 in Nashville and ran a Kickstarter campaign to
fund development, but ultimately failed to reach profitability.
2016
Built a deep-learning-based conversational UI framework to power will.i.am’s wireless earphones and other applications.
Backed by TensorFlow, Prolog, and other technologies. Supported multiple languages, extensible dialogue flows, and
a custom knowledge base. Helped will.i.am raise over $100M in funding
and launch the i.am+ platform, which ended up deployed to both the consumer
market on the i.am+ buttons
and to enterprise clients including Deutsche Telekom.
2013
Sole data scientist at a rapidly expanding social media startup
seeing upwards of three billion monthly pageviews. Designed and built an NLP
service to extract topics and tags from posts, predict image search terms
from unstructured text, and target content to users. Implemented a new
geographic search system using PostGIS that decreased search time by
90%. Worked closely with the front and backend development teams, writing
production code in Erlang and Python.
2012
B.A., Mathematics and East Asian Studies
Received highest honors for senior thesis analyzing over 200,000 Chinese
blog posts, algorithmically detecting mutations in the Chinese language in
response to censorship. Completed coursework in abstract algebra,
Galois theory, topology, real and complex analysis, probability theory, and
linguistics.