Catalin Voss of Heidelberg, Germany, was 13 when the first iPhone came out in 2007, and he felt that it created such an opportunity for software developers that he had to act immediately. He began to post podcasts sharing how-to tips on developing apps. (“On the Internet,” he notes, “no one knows how old you are.”)
Soon, he had attracted hundreds of thousands of followers—and an offer to work with a Silicon Valley payment-processing startup called PayNearMe. So Voss sought—and gained—emancipation from his parents (a necessary precondition, he notes, “if you are 14 years old and you want to do iPhone development and sign a venture contract”) and became a regular traveler between Germany and Silicon Valley. He eventually decided to move to America full time to attend Stanford University, which came as a shock to his parents. “I had, just before that, almost failed Latin,” he says. But Stanford forgave the Latin grades, and he is now beginning his junior year.
Like many software engineers, Voss is intrigued by unsolved puzzles, and he was especially drawn to the problem of how computers might interact with humans more perceptively. What might they see in our emotions? It was with this question in mind that he turned his attention a few years ago to facial tracking and mapping. By 2013, at age 18, Voss had gone so far with his explorations that he co-founded a company called Sension, which makes software that maps up to 100 points on a face. That data is then compared to an existing database of expressions.
One afternoon, Voss gives me an advanced tutorial on what his technology can do.
“You extract both shape features—the position of certain points on the face—and you extract texture features,” he says. Voss explains that if your technology can only detect the shape of facial features, then it might be able to identify that your mouth is smiling or frowning or open—each movement creating a different shape. But if your technology can also read facial texture, it might be able to explain why. For instance, it could “capture whether your forehead is wrinkling up because you are pulling your eyebrows up,” says Voss. “That might be an important characteristic in, say, detecting surprise.”
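To make the distinction concrete, here is a minimal, hypothetical sketch of how shape and texture features might be combined and matched against a database of labeled expressions, in the spirit of what Voss describes. The 100-landmark figure comes from the article; the feature definitions, function names, and nearest-neighbor matching are illustrative assumptions, not Sension's actual pipeline.

```python
import numpy as np

# Hypothetical sketch: combine "shape" features (landmark positions) with
# "texture" features (local pixel statistics around each landmark), then
# match against a small database of labeled expressions. Only the idea of
# mapping up to 100 points and comparing against a database comes from the
# article; everything else here is illustrative.

NUM_LANDMARKS = 100   # the article says up to 100 points per face
PATCH = 8             # half-width of the texture patch around each landmark

def shape_features(landmarks):
    """Normalize landmark coordinates so the feature ignores position and scale."""
    pts = np.asarray(landmarks, dtype=float)   # (100, 2) array of (x, y) points
    pts -= pts.mean(axis=0)                    # center on the face
    pts /= np.linalg.norm(pts) + 1e-8          # normalize overall scale
    return pts.ravel()

def texture_features(gray_image, landmarks):
    """Simple texture cue: local gradient energy around each landmark
    (a wrinkled forehead, for instance, produces strong local gradients)."""
    gy, gx = np.gradient(gray_image.astype(float))
    mag = np.hypot(gx, gy)
    feats = []
    for x, y in landmarks:
        x, y = int(x), int(y)
        patch = mag[max(y - PATCH, 0):y + PATCH, max(x - PATCH, 0):x + PATCH]
        feats.append(patch.mean() if patch.size else 0.0)
    return np.array(feats)

def classify(gray_image, landmarks, database):
    """Nearest-neighbor match against a database of (feature_vector, label) pairs."""
    f = np.concatenate([shape_features(landmarks),
                        texture_features(gray_image, landmarks)])
    return min(database, key=lambda entry: np.linalg.norm(entry[0] - f))[1]
```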
I’m speaking to Voss on the phone from across the country. I’m in Cambridge, and he’s in Palo Alto. But even without Voss’ software to tell me how he’s feeling, I can hear his enthusiasm as he describes what the latest generation of mapping technology can detect on a face: whether someone is bored or interested, looking away or struggling to understand. He hopes his work will transform workplace training and online university courses by making it easier to monitor the level of engagement among participants. His technology may also improve the lives of people with autism, giving them more clues into how to understand what other people are feeling.
While Voss is not alone in the field (the promise of facial recognition technology has spawned multiple startups), he seems to be holding his own. Sension has already licensed its technology to Mindflash, a company that earlier this year rolled out a program that uses the built-in camera of a computer to monitor the facial expressions of a user. Mindflash hopes that this face-reading software will be used by organizations that rely on some form of online learning, training, or workplace orientation to see when their audience is engaged or not. Maybe people watching a traffic safety video tune out during, say, the segment on seatbelts but become interested in the segment on road rage.
Recently, as I watched on a screen from Cambridge, Randhir Vieira, Sension’s vice president of product and marketing, demonstrated how the company’s software works. He played the role of a worker taking online training: one who looked puzzled by a difficult section and got distracted when a colleague came by to chat. These states of mind, he said, could be picked up by Sension’s software. Because the software generates formal scores only when at least five people are being monitored at once, Vieira’s solo demonstration produced no hard numbers. But when a group is being monitored, Sension’s software produces a second-by-second average “engagement score” that zigs and zags like a polygraph, indicating when people are especially engaged or bored.
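The aggregation Vieira describes is simple to picture. Below is a hypothetical sketch of that step: only the five-viewer minimum and the second-by-second averaging come from the article; the function name, the 0-to-1 score range, and the data layout are assumptions made for illustration.

```python
from statistics import mean

MIN_GROUP_SIZE = 5  # per the article, scores are produced only for groups of five or more

def engagement_curve(per_person_scores):
    """per_person_scores: one list per viewer, each holding a 0-1 engagement
    estimate for every second of the session. Returns the group's per-second
    average, or None when the group is too small to score."""
    if len(per_person_scores) < MIN_GROUP_SIZE:
        return None  # too few viewers: no formal score is produced
    return [mean(second) for second in zip(*per_person_scores)]
```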
Voss, for his part, is already busy with new, related efforts. He is working on a Google Glass application to help autistic people recognize emotions in other people. When wearing the device, the user would see words flash on the screen, words such as happy, sad, angry, neutral, surprise, contempt, agreeing, disagreeing, interest, or losing interest, providing insight into the steady stream of social cues that people are expected to pick up automatically. Voss hopes that such cues will help autistic users react better in different social situations.
And he is likely to find interested partners and investors in the project, since he is now a Silicon Valley veteran. “It’s ridiculous and sounds so stupid that some people would call me a veteran in the i-app industry, given that I am 19 years old,” he admits. “But in that industry, that’s totally normal, because it’s been around for, like, five years.”