Microsoft recently released "Seeing AI," an app designed to help the blind understand their surroundings. As Microsoft puts it, "the app narrates the world around you by turning the visual world into an audible experience."
We asked Matthew Chao, the brother of one of our founders, to test the app for us. Matthew has been totally blind since birth (retinopathy of prematurity). For mobility outside his home near Boston, he relies on his seeing eye dog and, when he is without the dog, a white cane. He is well integrated into sighted society: he was on the varsity track team at Brandeis University and has worked in vocational counseling throughout his adult life.
With that said, the prospect of technology materially improving his life is, of course, a radical hope shared by all. From early childhood, he learned to read braille, an analog miracle engineered by a 19th-century Frenchman. The advent of personal computers in the 1970s, and later the World Wide Web, gave Matthew access to many worlds that had previously been inaccessible. And now, four decades after the birth of the PC, the present-day boom in artificial intelligence promises to revolutionize his access to the sighted world once again.
The following interview was conducted by our co-founder and edited for length and clarity by me.
Gadget Hacks: What did you think?
Matthew Chao: Well, kudos to any company that develops a free AI app that helps the blind, even if today's version is 1.0. To be clear for all my blind friends: the app is called Seeing AI and was developed by Microsoft. It is available through the App Store. It works on iOS devices only. And it is free.
GH: Will the app change your life?
MC: For me, being entirely blind, the question is ... is this Seeing AI app a transformative technology, or a very cool toy? The answer is more the latter than the former.
GH: OK. Can you imagine any transformative technology on the horizon?
MC: Sure. A driverless car.
GH: Fair enough. Let's get to that later. Were any functions of Seeing AI radical to you?
MC: Seeing AI is designed to perform facial recognition, read short text and longer documents, identify barcode information, and offer a "Scene Beta" function that attempts to describe your surroundings. All results are spoken via VoiceOver, the screen reader used on Apple products. That is a shit ton of features in one app. But let's break it down. Facial recognition apps have become increasingly available in recent years, barcode reader apps are ubiquitous, and short text reading (OCR) has been around for a while too. Arguably, Scene Beta is the most ambitious and novel function, but realistically an iPhone cannot compete with cutting-edge dedicated hardware. In truth, the app is a handy Swiss army knife. It aggregates existing technology into one app, and that is convenient. If you were a blind person living alone, with no sighted friends, Seeing AI could be really great. For others, I am less sure.
GH: Sounds like a mixed review. Was there any big fail?
MC: Scene Beta is the most daring function, but it is also the most unreliable. Let me give you an example. I was just horsing around, and I happened to aim it at the sky.
Seeing AI told me the image above was "probably a man flying through the air on a cloudy day." Imagine that. Fantastic. Flying men in Newton, Mass., and over my house!
GH: That is kind of like the man in the Tesla who died. Tesla's system had a lot of trouble recognizing the white surface of a tractor trailer, confusing it with clouds and abstract objects.
MC: Yeah. I don't know what white is, and I don't know what a cloud looks like, and the concept of an abstract object is ... difficult for me. I cannot see anything, but I would want a computer to know the difference between a white cargo trailer and a cloud. I am told that AI wants to identify, parse, and categorize every pixel into an object; when there is no object, that confuses the heck out of it. I would imagine there is a more powerful Scene Beta out there. I would imagine DARPA has superior versions for soldiers, in sandstorms, in Mosul, at night. But seriously, I really believe that AI, neural networks, and machine learning will change the lives of blind people everywhere, and during my lifetime. A dedicated "seeing" navigation headset for the blind would be transformative, but Seeing AI is not that.
GH: I think I showed you the GM Cruise demo of point-to-point Level 4 autonomy. That is what you want?
MC: Yes. Or the much-publicized blind man in the Austin Waymo car ... now that would be transformative. An autonomous vehicle would bestow true independence on a blind person. If you had asked me ten years ago, I would have said, no way. As an aside, the fact that they sidelined the Firefly project shows you that they are still a few years away from executing the concept in the real world. The technology is here, but the infrastructure is not.
GH: So you would not put your life in Seeing AI's hands?
MC: No. And that is the point. When a blind person is willing to surrender his or her well-being to a machine, that will be transformative. Here is a very practical shortcoming of Seeing AI: it gives me no concept of distance. Is that precipitous cliff two feet away or 20 feet away? Is that flying man about to buzz my head, or is he hovering at 35,000 feet? I have no clue from Seeing AI; to the app, the object simply exists. Obviously, this is a problem driverless cars have solved with lidar, radar, and other sensors that cost more than free.
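Matthew's distance complaint has a simple geometric root: a single camera image collapses an object's real-world size and its range into one pixel measurement, so range cannot be recovered without extra information. A minimal sketch of that ambiguity using the pinhole-camera relation (the heights, distances, and focal length here are invented for illustration; this is not how Seeing AI or any shipping app works):

```python
def distance_m(real_height_m, focal_length_px, pixel_height_px):
    """Pinhole-camera relation: distance = real size * focal length / pixel size.

    Works only if you already KNOW the object's real-world size, which is
    exactly the information a scene-description app does not have.
    """
    return real_height_m * focal_length_px / pixel_height_px

# A 1.8 m adult and a 0.9 m child can both project a 180 px silhouette
# onto a camera with a (hypothetical) 1000 px focal length, at 10 m and
# 5 m respectively. Same pixels, different distances.
print(distance_m(1.8, 1000.0, 180.0))  # roughly 10 m
print(distance_m(0.9, 1000.0, 180.0))  # roughly 5 m
```

This is why "the object simply exists": without a known object size, a second camera, or an active sensor like lidar, the image alone cannot say how far away anything is.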
GH: So, how would you characterize these 24 hours?
MC: Well, I happen to sleep. So there is that. And this app is a battery hog. I was fascinated by the first hour and a half of experimentation, but then I checked my battery, and it was down to nothing from a full charge. I guess the camera usage, the connectivity to the cloud, and the processing just chewed up the battery. Since it takes me three hours to recharge, I could realistically test Seeing AI for four to six hours, max, in a 24-hour period.
GH: Did you learn anything surprising?
MC: Yes. I learned that my yellow lab is not yellow at all. So much for my command of colors. Here is the picture Seeing AI took of my dog:
Seeing AI says "a large brown dog standing on the floor." Immediately I was kind of pissed that the app got it wrong. I knew that Quill is a yellow Labrador. He has been my dog his entire life. So I knew positively that Quill is yellow. But today, Margot, my girlfriend, and the app both inform me that yellow Labradors, and Quill in particular, are more brown/beige/dirty blond than yellow. That shows you how much I know about color! Sorry, Microsoft, for cursing at your app.
GH: OK. Scene Beta amused you. How about the reading?
MC: This OCR functionality has been available to blind people for years. Prizmo costs $49 and KNFB Reader costs $99. Seeing AI is free. And that is significant. On the fly, this app is good for a quick read of envelopes and letters, along with menus. However, in a restaurant, it might be cumbersome to hold your iPhone over the menu. And accuracy is not its strong suit: I wouldn't recommend this app for reading fentanyl instructions. Also, a lot of adjustment (moving the phone around) is required to get a good read on the info underneath the lens. One improvement might be to let the volume-up button on the iPhone snap the image of the text to be read (as it does when shooting video in the iPhone's Camera app). It's hard for me to hold the phone steady over a page and also double-tap the screen to take a picture. Back to the drawing board, Microsoft.
GH: The barcode reader?
MC: Out of three kitchen items, Seeing AI correctly identified two: chicken bouillon (in a round jar) and Dole Pineapple Bits (in a sort of tall can). First time around, though, I had to do lots of moving, adjusting, and fiddling until the app found and processed the barcode. If you're hungry, you need lots of patience, at least in my experience. I probably could get faster at this with practice, but if you're in a hurry, that's small consolation. It took me about two minutes to line up the barcode so the phone could identify the Dole Pineapple Bits. Given the size of some items on which you might need to find the barcode, it helps to have lots of open space, so you don't feel cramped turning over bigger things (such as a big box of trash bags or a package from Amazon). For the third item, the app failed to identify an old package of hot chocolate. I wonder if the barcode simply wasn't crisp enough for the reader. But Margot tells me there was a barcode on the package.
GH: Facial recognition?
MC: The app misjudged Margot's age by a few years, on the younger side. Thanks, Seeing AI! I have yet to tag her name to her face. I am not sure how useful that is for me, as I have pretty good hearing and can recognize voices very quickly. So I am not sure whether randomly scanning a crowd with my iPhone is embarrassing or useful. Nice to have, but not essential for me.
If you'd like to try out Seeing AI for yourself, download it from the iOS App Store.