Visual One smartens up home security cameras with object and action recognition

Published March 11, 2020

“Smart” cameras can be found in millions of homes, but the truth is they’re not all that smart. Facial recognition and motion detection are their main tricks… but what if you want to know if the dog jumped on the couch, or if your toddler is playing with the stove? Visual One equips cameras with the intellect to understand a bit more of the world and give you more granular, and more important, information.

Founder Mohammad Rafiee said that the idea came to him after he got a puppy (Zula) and was dissatisfied with the options he had for monitoring her activities while he was away. Here she is doing what dogs do best:

There are no bad dogs, but chairs are for people

“There were specific things I wanted to know were happening, like I wanted to check if the dog got picked up by the dog walker. The cameras’ motion detection is useless — she’s always moving,” he lamented. “In fact, with a lot of these cameras, just a change in the lighting or wind or rain can trigger the motion alert, so it’s completely impractical.”

“My background is in machine learning. I was thinking about it, and realized we’re at a stage where this problem is starting to become solvable,” he continued.

Some tasks in computer vision, indeed, are as good as solved — detecting faces and common objects such as cars and bikes can be done quickly and efficiently. But that’s not always useful — what’s the point of knowing someone rode their bike past your house? In order for this to have value, the objects need to be understood as part of a greater context, and that’s what Rafiee and Visual One are undertaking.

Unfortunately, it’s far from easy — or else everyone would be doing it already. Identifying a cat is simple, and identifying a table is simple, but identifying a cat on a table is surprisingly hard.

“It’s a very difficult problem. So we’re breaking it down to things we can solve right now, then building on that,” Rafiee explained. “With deep learning techniques we can identify different objects, and we build models on top of those to specify different interactions, or specific objects being in specific locations. Like a car in the wrong spot, or a dog getting on a couch. We can recognize that with high accuracy right now — we have a list of supported objects and models that we’re expanding.”
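To make the layering Rafiee describes a little more concrete, here is a minimal sketch of a rule sitting on top of an object detector's output. It is not Visual One's implementation: the `Detection` type, the `is_on` helper, the pixel coordinates and the 0.5 overlap threshold are all hypothetical, and it assumes the detector returns labeled bounding boxes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One labeled bounding box from a hypothetical object detector (pixels)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def overlap_fraction(inner: Detection, outer: Detection) -> float:
    """Fraction of inner's box area that falls inside outer's box."""
    w = min(inner.x2, outer.x2) - max(inner.x1, outer.x1)
    h = min(inner.y2, outer.y2) - max(inner.y1, outer.y1)
    if w <= 0 or h <= 0:
        return 0.0  # the boxes do not intersect
    area = (inner.x2 - inner.x1) * (inner.y2 - inner.y1)
    return (w * h) / area if area > 0 else 0.0


def is_on(subject: str, surface: str, detections: List[Detection],
          threshold: float = 0.5) -> bool:
    """Crude interaction rule: some `subject` box mostly overlaps a `surface` box."""
    subjects = [d for d in detections if d.label == subject]
    surfaces = [d for d in detections if d.label == surface]
    return any(overlap_fraction(s, t) >= threshold
               for s in subjects for t in surfaces)


# Assumed detections for a single frame.
frame = [
    Detection("dog", 40, 30, 80, 60),
    Detection("couch", 20, 40, 120, 90),
]
dog_on_couch = is_on("dog", "couch", frame)
```

A box-overlap rule like this cannot tell a dog on the couch from a dog standing in front of it, which is one way to see why the cat-on-a-table problem is harder than it sounds.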

In case you’re not convinced that the capabilities are much more advanced than the usual “activity in the living room” or “Kendra is at the front door” notifications, here are a few situations that Visual One is set up to detect:

Kid playing with the stove
Toddler climbing furniture
Kid holding a knife
Baby left alone for too long
Raccoon getting into garbage
Elderly person taking her medications
Elderly person in bed for too long
Car parked in the wrong spot
Garage door left open
Dog chewing on a shoe
Cat scratching the furniture
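
A couple of the situations above ("left alone for too long," "in bed for too long") depend on how long a condition lasts, not on a single frame. One way that could be handled, sketched under assumptions of my own, is a small timer fed by per-frame results. The `DurationTrigger` class and the 10-minute limit are hypothetical and not drawn from Visual One's product.

```python
from typing import Optional


class DurationTrigger:
    """Fires once when a condition has held continuously for `limit_seconds`."""

    def __init__(self, limit_seconds: float) -> None:
        self.limit_seconds = limit_seconds
        self._since: Optional[float] = None  # when the condition started holding
        self._fired = False                  # avoids repeating the same alert

    def update(self, condition: bool, timestamp: float) -> bool:
        """Feed one observation; returns True only on the frame the alert fires."""
        if not condition:
            # Condition broken: reset so a later streak can fire again.
            self._since = None
            self._fired = False
            return False
        if self._since is None:
            self._since = timestamp
        if not self._fired and timestamp - self._since >= self.limit_seconds:
            self._fired = True
            return True
        return False


# Assumed usage: "baby alone" is True when a frame shows a baby and no adult.
alone_too_long = DurationTrigger(limit_seconds=600)
alerts = [
    alone_too_long.update(True, 0),     # streak starts
    alone_too_long.update(True, 300),   # 5 minutes in, no alert yet
    alone_too_long.update(True, 600),   # 10 minutes in, alert fires
    alone_too_long.update(True, 700),   # still alone, but already alerted
]
```

Resetting on the first frame where the condition fails keeps the sketch short; a real system would probably tolerate a few missed detections before restarting the clock.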

The process for creating these triggers is pretty straightforward

If one of those doesn’t make you think “actually… that would be really good to know,” then perhaps a basic security camera is enough for your purposes after all. Not everyone has a knife-curious toddler. But those of you who do are probably scrolling furiously past this paragraph looking for where to buy one of these things.

Unfortunately Visual One isn’t something you can just install on any old existing system — with the prominent exception of Nest, into which it can plug. Camera workflows are generally too locked down for security and privacy purposes to allow for third-party apps and services to be slipped in. But the company isn’t trying to bankrupt everyone with an ultra-luxury offering. It’s using off-the-shelf cameras from Wyze and loading them with its own software stack.

Rafiee said he pictures Visual One as a mid-tier option for people who want to have more than a basic camera setup but aren’t convinced by the more expensive plays. That way the company avoids going head-on with commodity hardware’s race to the bottom or the brand warfare taking place between Google’s Nest and Amazon’s Ring. Cameras cost $30-$40, and the service is $7 per month currently.

Ultimately the low-end companies may want to license from Visual One, while the high-end companies will be developing their own full stack at great cost, making it difficult for them to go downmarket. “Hardware is hard, and AI is specialized — unless you’re a giant company it’s hard to do both. I think we can fill the gap in the market for mid-market companies without those resources,” he said.

Of course privacy is paramount as well. Rafiee said that although the AI lives in the cloud, and therefore requires the cameras to be online (like most others), the way the system works means no important user data needs to be or will be stored on Visual One servers. “We do inference in the cloud so we can be hardware agnostic, but we don’t need to store any data. So we don’t add any risk,” he said.

Visual One is launching today (after a stint in YC’s latest cohort) with an initial set of objects and interactions, and will continue developing more as it observes which use cases prove popular and effective.