3 methodologies for automated video game highlight detection and capture

Published September 10, 2021

Nathan Babcock
Contributor

Nathan Babcock is a computer scientist and freelance writer in Chicago and a co-founder of automated highlight detection startup Clip It.

Benjamin Clingan
Contributor

Benjamin Clingan is a software developer specializing in Python back ends, finance, genetic neural networks and other machine learning strategies and a co-founder of automated highlight detection startup Clip It.

With the rise of livestreaming, gaming has evolved from a toy-like consumer product to a legitimate platform and medium in its own right for entertainment and competition.

Twitch’s viewer base alone has grown from 250,000 average concurrent viewers to over 3 million since its acquisition by Amazon in 2014. Competitors like Facebook Gaming and YouTube Live are following similar trajectories.

The boom in viewership has fueled an ecosystem of supporting products as today’s professional streamers push technology to its limit to increase the production value of their content and automate repetitive aspects of the video production cycle.

The online streaming game is a grind, with full-time creators putting in eight- if not 12-hour performances on a daily basis. In a bid to capture valuable viewer attention, 24-hour marathon streams are not uncommon either.

However, these hours in front of the camera and keyboard are only half of the streaming grind. Maintaining a constant presence on social media and YouTube fuels the growth of the stream channel and attracts more viewers to catch a stream live, where they may purchase monthly subscriptions, donate and watch ads.

Distilling the most impactful five to 10 minutes of content out of eight or more hours of raw video becomes a non-trivial time commitment. At the top of the food chain, the largest streamers can hire teams of video editors and social media managers to tackle this part of the job, but growing and part-time streamers struggle to find the time to do this themselves or come up with the money to outsource it. There aren’t enough minutes in the day to carefully review all the footage on top of other life and work priorities.

Computer vision analysis of game UI

An emerging solution is to use automated tools to identify key moments in a longer broadcast. Several startups are competing to dominate this niche, and what sets them apart is how each one approaches the problem. Many of these approaches follow a classic computer science hardware-versus-software dichotomy.

Athenascope was one of the first companies to execute on this concept at scale. Backed by $2.5 million of venture capital funding and an impressive team of Silicon Valley Big Tech alumni, Athenascope developed a computer vision system to identify highlight clips within longer recordings.

In principle, it’s not so different from how self-driving cars operate, but instead of using cameras to read nearby road signs and traffic lights, the tool captures the gamer’s screen and recognizes indicators in the game’s user interface that communicate important events happening in-game: kills and deaths, goals and saves, wins and losses.

These are the same visual cues that traditionally inform the game’s player what is happening in the game. In modern game UIs, this information is high-contrast, clear and unobscured, and typically located in predictable, fixed locations on the screen at all times. This predictability and clarity lend themselves extremely well to computer vision techniques such as optical character recognition (OCR), which reads text from an image.
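To make the fixed-location idea concrete, here is a minimal sketch in Python. It is not Athenascope’s implementation, and it stops short of OCR: it crops a hypothetical screen region and checks how much of it is bright, high-contrast pixels, which is the kind of pre-filter that could run before an OCR engine reads the text. The region coordinates, thresholds and tiny synthetic frames are all assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)

# Hypothetical fixed region where a game draws its kill notification:
# (top, left, height, width) in pixels. Real coordinates depend on the game.
KILL_FEED_ROI = (0, 4, 2, 4)


def crop(frame: Frame, roi: Tuple[int, int, int, int]) -> Frame:
    """Cut the fixed UI region out of a full frame."""
    top, left, height, width = roi
    return [row[left:left + width] for row in frame[top:top + height]]


def bright_fraction(region: Frame, threshold: int = 200) -> float:
    """Share of pixels in the region at or above the brightness threshold."""
    pixels = [p for row in region for p in row]
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p >= threshold) / len(pixels)


def detect_events(
    frames: List[Frame],
    roi: Tuple[int, int, int, int],
    min_fraction: float = 0.5,
) -> List[int]:
    """Return indices of frames where the indicator switches from off to on."""
    events = []
    was_on = False
    for i, frame in enumerate(frames):
        is_on = bright_fraction(crop(frame, roi)) >= min_fraction
        if is_on and not was_on:
            events.append(i)
        was_on = is_on
    return events


def make_frame(indicator_on: bool) -> Frame:
    """Build a tiny synthetic 4x8 frame; the indicator sits in the top-right."""
    frame = [[30] * 8 for _ in range(4)]
    if indicator_on:
        for r in range(2):
            for c in range(4, 8):
                frame[r][c] = 255
    return frame


frames = [make_frame(on) for on in (False, False, True, True, False, True)]
event_frames = detect_events(frames, KILL_FEED_ROI)
print(event_frames)  # [2, 5]
```

Because the indicator always appears in the same place, the detector never has to search the whole frame. A production system would presumably pass the cropped region to an OCR or template-matching step to tell a kill from a death or a goal from a save.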

The stakes here are lower than they are for self-driving cars, too, since a false positive from this system produces nothing more than a less-exciting-than-average video clip, not a car crash.
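Once events have been detected, turning them into clips is mostly bookkeeping, which is part of why a false positive is cheap: it simply yields one extra clip to skip. The sketch below is one plausible way to do it, not a description of any vendor’s pipeline. The function name, the 10-second pre-roll and 5-second post-roll, and the detection timestamps are all hypothetical.

```python
from typing import List, Tuple


def clip_windows(
    event_times: List[float],
    pre_roll: float = 10.0,
    post_roll: float = 5.0,
    duration: float = float("inf"),
) -> List[Tuple[float, float]]:
    """Pad each event into a (start, end) window and merge overlapping windows."""
    windows: List[Tuple[float, float]] = []
    for t in sorted(event_times):
        # Keep each window inside the bounds of the recording.
        start = max(0.0, t - pre_roll)
        end = min(duration, t + post_roll)
        if windows and start <= windows[-1][1]:
            # Overlaps the previous clip: extend it instead of starting a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


# Hypothetical detections, in seconds from the start of a one-hour recording.
detections = [4.0, 125.0, 131.0, 3598.0]
clips = clip_windows(detections, duration=3600.0)
print(clips)  # [(0.0, 9.0), (115.0, 136.0), (3588.0, 3600.0)]
```

The two detections six seconds apart collapse into a single clip, so a flurry of in-game events produces one continuous highlight rather than several overlapping ones.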