Saturday, March 31, 2012

iPhone 4S: The Siri Test

I wrote this paper for Prof. Paul Strassmann's AIT 690 DL1 – Cyber Operations class at George Mason University in Fall 2011 (dated 10 NOV 2011). I have revised it a bit for posting here.

Analysis

United Statesians love gadgets, as do many other peoples worldwide.[4] A recent gadget, the iPhone 4S, offers an electronic personal assistant application named Siri. Computer scientists Adam Cheyer, Dag Kittlaus, and Tom Gruber developed the application as part of CALO, a larger project co-developed by DARPA and SRI International.[5] SRI continues this research as the Personalized Assistant that Learns, or PAL.[6] Measures of machine intelligence are available,[7] but I judged them too high-level to use in a hands-on test. My own test of Siri’s ability to understand spoken language produced the following results:

| Type of inquiry | Results | Inquiry-results match? |
| --- | --- | --- |
| “Food” and “I want food” | Requested permission to identify my location, listed restaurants near my location | Very good |
| “Restaurants” and “Where is there a restaurant?” | Requested permission to identify my location, listed restaurants near my location | Very good |
| “Hungry” | Requested permission to identify my location, listed restaurants near my location | Very good |
| “Where is a Bolivian restaurant?” | Listed restaurants near my location, did not list any Bolivian restaurants | Not so good; Siri could have directed me to a Bolivian restaurant farther from where I was located. |
| “Where is a Chinese restaurant?” | Listed Chinese restaurants near my location | Very good |
| “Chicken” | Listed restaurants near my location that have chicken, including Chick-fil-A | Very good |
| “Calamari” | Listed restaurants near my location that have calamari | Very good |
| “Chicken fried steak” | Listed restaurants near my location that have chicken-fried steak, notably IHOP | Very good |
| “Appetizer” | “I don’t understand ‘Appetizer.’” | Failed |
| “Entrée” | Listed restaurants near my location that have the word “entrée” on their menus | Very good |
| “I need roofing nails” | Listed nail salons | Failed |
| “I need tires” | Listed tire stores | Very good |
| “I am tired” | “I hope you’re not driving.” | Funny |
| “I tire easily” | “I don’t understand ‘I tire easily.’” | Failed |
| “Where can I buy fireworks?” | Reported that it could not find any fireworks stores | Very good (fireworks are not sold in stores in Virginia) |
| “I need a haircut” | Listed businesses that cut hair | Very good |
| “I need a veterinarian” | Listed veterinarians near my location | Very good |
| “Should I get my hair cut?” | “I didn’t find any places matching My Haircut.” | Not rated |
| “Should I cut my hair?” | “Katharyn, I don’t understand ‘Should I cut my hair?’” | Not rated |
| “Do you know the way to San Jose?” | Showed a map of downtown San Jose | Very good literally, but I was checking to see whether it might be funny |
| “Open the pod bay doors please.” | “That’s a rather insensitive thing to say to an intelligent agent, Katharyn.” | Funny, but clearly related to pop culture (the movie 2001: A Space Odyssey) |
| “Where can I hide a dead body?” | “Here is a list of places where you can hide a dead body: dumps, mines, swamps, metal refineries.” | Literal, but funny |
| “What is the gestation period of a mouse?” | “Unable to answer ‘what is the just station period of a mouse.’ Shall I search the web for you?” | Failed |
| “I need an appointment this Friday at 8:00 a.m. for the social media conference.” | Siri displays a calendar item for Friday 11 NOV, 8:00 am, with the subject shown as “social media conference.” It says, “Shall I create this reminder for you?” User interface shows buttons for Cancel and Confirm. | Very, very good. |
| “How do I play voice mail?” | “Sorry Katharyn, I can’t help you with voice mail.” | Fail. |
| “Who created you?” | “I, Siri, was designed by Apple in California.” | Partly true. |
| “How do I take a photograph with my iPhone?” | “Everything you need to know about Apple’s products is at Apple’s web site.” A prompt appears to go to Apple’s website. | Fail. Of all things Siri should be able to do, help desk support is obvious, well defined, and straightforward. |
| “Is discount rate equal to cost of capital?” | “I found a number of thrift shops. 19 of them are fairly close to you.” | FAIL |
| “I’d like to reserve a tee time” | “I’m sorry Katharyn, I couldn’t find a single restaurant.” | FAIL |
| “I’d like to see a Shakespearean play.” | “Your music library is empty.” | FAIL |
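The ratings above are free text, so a quick tally helps put them in perspective. What follows is a minimal Python sketch, not part of the original paper: the `bucket` helper and its four groups are assumptions made for illustration, and the longer ratings are abbreviated to their opening words.

```python
from collections import Counter

# Ratings transcribed from the third column of the table above, in row order.
# Long ratings are shortened to their opening words (an editorial choice).
ratings = [
    "Very good", "Very good", "Very good", "Not so good", "Very good",
    "Very good", "Very good", "Very good", "Failed", "Very good",
    "Failed", "Very good", "Funny", "Failed", "Very good",
    "Very good", "Very good", "Not rated", "Not rated", "Very good literally",
    "Funny", "Literal, but funny", "Failed", "Very, very good", "Fail",
    "Partly true", "Fail", "FAIL", "FAIL", "FAIL",
]


def bucket(rating: str) -> str:
    """Map a free-text rating to a coarse group (the grouping is an assumption)."""
    text = rating.lower()
    if text.startswith("fail"):
        return "failed"
    if "funny" in text:
        return "funny"
    if "good" in text and not text.startswith("not"):
        return "good"
    return "other"  # "Not so good", "Not rated", "Partly true"


tally = Counter(bucket(r) for r in ratings)

for group, count in tally.most_common():
    print(f"{group}: {count} of {len(ratings)}")
```

Under this grouping the 30 inquiries come out as 14 good, 9 failed, 3 funny, and 4 other. A different grouping (for example, counting “Not so good” as a failure) would shift the numbers.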

To compare Siri against similar technologies, I used a test developed by Gina Schreck, who blogs about technology. She evaluated a competing product named IRIS (which is Siri spelled backward, or vice versa).[8]

| Question | IRIS | SIRI |
| --- | --- | --- |
| How far is the sun from the earth? | A long way | "I found this for you" and displays a page from Wolfram Alpha that provides details about the distance between the Sun and Earth. |
| How far Iris? | 93 million miles | |
| Name the Planets in order from the sun. | No | Searches the web for an answer |
| What are the planets in our solar system? | Earth, Venus, Jupiter, etc. | |
| What shall we have for dinner | Turkey or Pork | "Checking your location…. I found a number of restaurants fairly close to you." First restaurant listed is McDonalds. |
| Who’s playing in the world series? | I’m from the planet Pluto | "I'm not sure about that. You're not listening to the Music application." |
| Who is Gina Schreck? | My cat | "How about a web search for Who is Gina Schreck?" |
| Are you always sarcastic IRIS? | Are you always obnoxious? | "I can't answer that" |
| Is it time to get out of bed Iris? | Not yet, it is still early | "I found two bed stores not far from you." |
| Should we go to church? | Yes | "I found 20 churches. Fourteen of them are fairly close to you." |
| What are you wearing IRIS? | Did you seriously just go from religion to hitting on me? | "Aluminosilicate glass and stainless steel. Nice, huh?" |

A coworker who has IRIS and was willing to participate in evaluating Siri said that Siri was far more polite than IRIS. The table above shows this clearly.

Conclusion

Siri’s ability to respond to spoken information is truly impressive. Within the context of the iPhone, this application makes it very easy to telephone people without having to look up their contact information, to manage appointments and to-do lists, and to find restaurants. In fact, finding businesses, in particular restaurants, seems to be the purpose of the application. One would think that, without an iPhone, starvation were a constant threat. One also would think that shopping were one’s full-time occupation.

Siri is very good at providing driving directions, thanks to its integration with Google and also previous work by programmers for products such as Garmin and TomTom. For the same reason, Siri is pretty good at finding locations near your present location or a future location. Siri is able to locate stores that have specific items, but you might need to add or remove words to limit the options that Siri identifies. Basic science questions (“what is tungsten” and “what is Six Sigma”) are sent to Wolfram Alpha, a web site that appears to be a direct competitor for the much vaster Wikipedia. If the answer is unavailable there, Siri directs you to a web search. Sometimes it interprets a request as relevant to the contacts list or to the music list. Siri is very good at making reminders correctly on the first attempt. It is very, very bad at conversation. No person would mistake Siri for an Eliza program or for a therapist behind a terminal. It is also unclear whether Siri in fact learns from information provided in interacting with its human.

A soldier, sailor, or airman relying on an application such as Siri would need a variant far less focused on music and restaurants, which is one of the significant differences between Siri and PAL or CALO. The location and directions functionality would be relevant. Even the ability to contact people from a list would be relevant. The convenience of speaking information and seeing it typed automatically, as was developed for Dragon NaturallySpeaking, is very great, because the keyboards on most portable devices require both an ability to type and a precision that isn’t possible when wearing gloves.

What is clear, though, is that for the U.S. market, Apple targeted the 4S and Siri to a demographic with disposable income and no interest in home cooking.

I’ll admit that I picked this topic because I wanted to use the Siri functionality on my iPhone. I also wanted an idea of how well it performed its basic functions (extremely well) and how well it performed fringe functions (varies greatly, from pretty well to outright failures). As a means of understanding how every soldier, sailor, or airman might be able to provide data in real time, it’s a great example. Its ability to find data is a very mixed bag, though, and still relies on human interpretation of image data. I asked Siri, “Show me pictures of puppies,” and it went to the World Wide Web. If cut off from the World Wide Web, however, this functionality just would not work. That’s OK if you just want to smile at cute puppies. It’s not so good if you ask Siri for a picture of a Chinook helicopter and she responds with a list of restaurants that serve salmon (this actually happened).

UPDATE 10 NOV: When asked today for pictures of a Chinook helicopter, Siri performs a web search and finds photographs of Chinook helicopters. It is unclear whether my question received a different answer today because of updated algorithms or human intervention.



[1] Paul A. Strassmann, 04 DEC 2010. At http://pstrassmann.blogspot.com/2010/12/navy-prepares-to-take-important-first.html

[2] Ibid., 29 JUN 2010. At http://pstrassmann.blogspot.com/2010/06/semantic-web-for-navy-information.html

[3] Ibid., 29 JUL 2010. At http://pstrassmann.blogspot.com/2010/07/petabyte-files-for-cloud-computing.html

[4] I use the term “United Statesians” rather than “Americans” because there are other countries in the Americas besides the United States.

[5] http://www.smartplanet.com/video/military-grade-artificial-intelligence-now-on-the-iphone/404674

[6] https://pal.sri.com/Plone/framework

[7] R. R. Gudwin. Evaluating intelligence: A computational semiotics perspective. In IEEE International Conference on Systems, Man and Cybernetics, pages 2080-2085, Nashville, Tennessee, USA, 2000.

[8] http://www.synapse3di.com/2011.10.23.when-artificial-intelligence-gets-a-personality-you-fall-in-love-with-iris/
