WhyLabs brings more transparancy to ML ops

Published September 23, 2020

WhyLabs, a new machine learning startup that was spun out of the Allen Institute, is coming out of stealth today. Founded by a group of former Amazon machine learning engineers, Alessya Visnjic, Sam Gracie and Andy Dang, together with Madrona Venture Group principal Maria Karaivanova, WhyLabs’ focus is on ML operations after models have been trained — not on building those models from the ground up.

The team also today announced that it has raised a $4 million seed funding round from Madrona Venture Group, Bezos Expeditions, Defy Partners and Ascend VC.

Visnjic, the company’s CEO, used to work on Amazon’s demand forecasting model.

“The team was all research scientists, and I was the only engineer who had kind of tier-one operating experience,” she told me. “So I thought, “Okay, how bad could it be? I carried the pager for the retail website before. But it was one of the first AI deployments that we’d done at Amazon at scale. The pager duty was extra fun because there were no real tools. So when things would go wrong — like we’d order way too many black socks out of the blue — it was a lot of manual effort to figure out why issues were happening.”

Image Credits: WhyLabs

But while large companies like Amazon have built their own internal tools to help their data scientists and AI practitioners operate their AI systems, most enterprises continue to struggle with this — and a lot of AI projects simply fail and never make it into production. “We believe that one of the big reasons that happens is because of the operating process that remains super manual,” Visnjic said. “So at WhyLabs, we’re building the tools to address that — specifically to monitor and track data quality and alert — you can think of it as Datadog for AI applications.”

The team has brought ambitions, but to get started, it is focusing on observability. The team is building — and open-sourcing — a new tool for continuously logging what’s happening in the AI system, using a low-overhead agent. That platform-agnostic system, dubbed WhyLogs, is meant to help practitioners understand the data that moves through the AI/ML pipeline.

For a lot of businesses, Visnjic noted, the amount of data that flows through these systems is so large that it doesn’t make sense for them to keep “lots of big haystacks with possibly some needles in there for some investigation to come in the future.” So what they do instead is just discard all of this. With its data logging solution, WhyLabs aims to give these companies the tools to investigate their data and find issues right at the start of the pipeline.

Image Credits: WhyLabs

According to Karaivanova, the company doesn’t have paying customers yet, but it is working on a number of proofs of concepts. Among those users is Zulily, which is also a design partner for the company. The company is going after mid-size enterprises for the time being, but as Karaivanova noted, to hit the sweet spot for the company, a customer needs to have an established data science team with 10 to 15 ML practitioners. While the team is still figuring out its pricing model, it’ll likely be a volume-based approach, Karaivanova said.

“We love to invest in great founding teams who have built solutions at scale inside cutting-edge companies, who can then bring products to the broader market at the right time. The WhyLabs team are practitioners building for practitioners. They have intimate, first-hand knowledge of the challenges facing AI builders from their years at Amazon and are putting that experience and insight to work for their customers,” said Tim Porter, managing director at Madrona. “We couldn’t be more excited to invest in WhyLabs and partner with them to bring cross-platform model reliability and observability to this exploding category of MLOps.”