CHANGE the color of the shirt, the voice on the video said. Like magic, the shirt of the woman in the photo took on a bluish hue and, with a swipe on a slider, turned orange.
The video is a demonstration of PixelTone, a prototype iPad app that allows users to edit images using voice commands and touch gestures.
The app was created by a team from the University of Michigan School of Information working with Adobe Research. That team is led by graduate student research assistant and master's student Gierad Laput, a Cebuano.
Laput is from Barangay Guizo, Mandaue City. He attended Colegio de la Inmaculada Concepcion – Mandaue for elementary school before moving to Cebu City National Science High School. He studied engineering at the University of San Carlos for a year before moving to the United States in 2004.
Laput said the training he got from the schools in Cebu “really prepared me for the academic work and rigor in the US.” He got his undergraduate degree at the University of Michigan, where he is working on a master's degree. He will be pursuing a PhD in computer science this September.
Although he still hasn’t decided where to attend, Laput said he got full offers from 10 schools, including Massachusetts Institute of Technology, University of Washington, University of California Berkeley, Carnegie Mellon University and Stanford University.
During his undergraduate studies, Laput and a colleague submitted CrowdConnect, a platform for internal crowdsourcing, to the Ford IT Innovation Contest. It was voted one of the top 10 entries out of more than 200 submissions.
“We received great feedback from top-level managers within the company, but unfortunately, I had to leave Ford to study for my masters, so I was not able to fully push through with [the] idea,” he said.
For a research project in the summer of 2012, Laput and six research scientists collaborated on PixelTone. He was an intern with Adobe Research in San Francisco and the only student on the team.
He said the idea was inspired by Siri, Instagram and Photoshop. In their paper “PixelTone: A Multimodal Interface for Image Editing,” the team said, “photo editing can be a challenging task, and it becomes even more difficult on the small, portable screens of mobile devices that are now frequently used to capture and edit images.”
“To address this problem we present PixelTone, a multimodal photo editing interface that combines speech and direct manipulation.”
Laput said, “the idea was also inspired by folks like my dad and my sister, who have less experience or are sometimes intimidated by monolithic applications such as Photoshop. They often turn to tools like Instagram or Microsoft Paint for their photo editing needs.”
“In essence, we tried to answer the question: ‘how can we fuse the richness of Photoshop and the simplicity of Instagram?’ This question was the main motivation behind PixelTone.”
Laput said the app can understand spoken commands and users do not need to memorize phrases.
“For example, you can say ‘make the image spicy’ and PixelTone will try to interpret what ‘spicy’ means. It uses grammar technology to find a command that is a synonym of the unknown word, in this case ‘spicy.’ In this particular example, it will increase the ‘warmth’ of the image since ‘spicy’ is related to ‘warm,’” he said.
He said that even if the user jumbles the words, the app will still try to understand the command. If PixelTone cannot understand the request, it will offer the user options and “has the potential to learn new commands that way.”
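The interpretation process Laput describes can be pictured with a minimal sketch. This is a hypothetical illustration, not PixelTone's actual grammar engine: the command table and synonym lexicon below are invented for the example, and the real app resolves synonyms with far richer language technology.

```python
# Hypothetical sketch of PixelTone-style command interpretation.
# KNOWN_COMMANDS and SYNONYMS are invented stand-ins for the app's
# real vocabulary and synonym-matching machinery.

KNOWN_COMMANDS = {"warmth": "increase_warmth", "brightness": "increase_brightness"}
SYNONYMS = {"spicy": "warmth", "hot": "warmth", "light": "brightness"}

def interpret(phrase):
    """Scan each spoken word for a known command, falling back to the
    synonym table; word order in the phrase does not matter."""
    for word in phrase.lower().split():
        if word in KNOWN_COMMANDS:
            return KNOWN_COMMANDS[word]
        if word in SYNONYMS:
            return KNOWN_COMMANDS[SYNONYMS[word]]
    # Unknown request: the app would instead offer the user options
    # and could learn a new command from the choice.
    return None

print(interpret("make the image spicy"))   # -> increase_warmth
print(interpret("spicy image the make"))   # -> increase_warmth (order-free)
```

Because matching is per word, a jumbled phrase still resolves to the same command, and a phrase with no match falls through to the "offer options" path, mirroring the behavior described above.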
Laput said they showed the app to people inside Adobe, including product managers of Photoshop.
“Since the idea behind PixelTone is particularly new and potentially game-changing, Adobe will have to invest time to make sure the technology is ready for a wider audience,” he said.
While the current prototype will not be able to understand “ipa-gwapa (make her beautiful),” Laput said they are working on another idea that will allow users to teach PixelTone to understand words and phrases like “ipa-gwapa.”
Laput said voice interfaces will, in the future, become the main driver for interacting with computers.
“But the best experience is the one that gives users multiple options (i.e., voice with gestures, but not voice or gestures alone), since this brings greater flexibility in helping them accomplish their goals,” he said.