The Zoolanders
Part 4 - Evaluation

Zoo Atlanta is well into construction of the barn and other facilities of the new Outback Station petting zoo. At the new Children's Zoo, children will get an opportunity to pet the animals in a contact area and learn about them. Our primary objective is to convey procedural knowledge about the proper way to interact with the animals so as not to injure or perturb them. Ideally, this should help to reduce ill treatment of the animals, both intentional and unintentional. The target user population is families with children from 3 to 7 years old. Of course, with such an age group it's critical to entertain in order to maintain attention long enough to teach. In this part of the design process, we wanted to evaluate how well our prototypes met our requirements and criteria through interviews, expert evaluations, and observations.

Evaluation Techniques, Tasks, Users, and Rationale

There were initially three main parts of our evaluation. We had intended to do an interview with parents, three or more cognitive walkthroughs with experts from the zoo and our class, and a Wizard of Oz interaction and observation with visiting families. In the end, we found that the cognitive walkthroughs were not the right format for the type of information we wanted. The forms we used for data collection for all of the evaluations are in Appendix A.

Cognitive Walkthrough and Heuristic Evaluation

Lori Arkin-Diem is the education director at Zoo Atlanta, and she volunteered to do our first cognitive walkthrough. Although we gleaned much useful information from the session, most of it fell outside the format we had established for the walkthrough. A typical example: Lori performed an action and answered each of our defined questions simply with a “yes.” Afterwards she might mention a possible problem that wasn't captured in the questions.
Combined with the feedback we received on our Part 3 report, this led us to realize that cognitive walkthroughs are excellent for exploring errors but not for critiquing a design based on principles like learnability and familiarity, which we had set out to do in our evaluation plan. In addition, we deliberately set out to design a system that had as few errors as possible: almost every interaction should have an appropriate system response. Our session with Lori seemed to show that we succeeded. We found that we should have been evaluating how well the design fit our requirements, treated as heuristics, rather than where and how errors were occurring.

We therefore recruited three more experts from our section of CS 6750 to evaluate our system based on a combination of our own design requirements and criteria and some pertinent standards. Each expert was given a brief overview of the system and a set of heuristics to be used for the evaluation. We chose the heuristics that we thought best suited our design. The expert then carried out the evaluation and judged our system based on the guidelines. A complete list of the heuristics is provided in the next section. The sample data sheet used for these evaluations is provided in Appendix A.

Interview

If the parents chose not to interact with the system, we had them sign consent forms and complete the interview part of our evaluation while their children played. Otherwise we did this at the end. This final part was an evaluation of our materials for sanitation and attractiveness. We included this because, during our interviews in Parts 2 and 3, parents had expressed some concerns about the cleanliness of a hairy mechanical goat and about whether their children would be willing to interact with it depending on its appearance.
We used our layered prototype for this evaluation, which we had also built to demonstrate (partly to ourselves) that the sensors, structure, and outer look and feel required to fit our goat form were feasible with existing materials. As you can see in Appendix B, the interview consisted of objective and subjective responses to questions ranging from the material's attractiveness to the parents' willingness to let their children play with it.

Observation

Our second and third evaluations went hand in hand: families visiting the zoo interacted with our prototypes, and we interviewed the parents during or afterwards. We wanted to get as natural an idea of the interactions as possible, and to get the experience as close to the real system as we could. The goal was to get useful data on the effectiveness of our script and voice in influencing the behaviors of the families, and we had several objective and subjective measures, including:

· Total number of behaviors
· Objective responses to negative feedback (the next behavior after a negative one)
· Did the families enjoy the interaction (time spent, comments)?
· Were the same negative actions repeated after system admonishment?
· Was there a reduction in negative behaviors in the second half compared to the first?
· Are children's responses easy and natural or forced?

These tell us how well our system encourages positive behaviors and curtails the natural urge to explore the bounds of the system's understanding. We wanted negative behavior to be changed and positive behavior to be reinforced, and we wanted to do it all in a way that did not discourage further interaction. Our chosen measures tell us how well we did. We also tracked how long each member of the family interacted with the system and how many interactions each performed.
This was not explicitly in our first evaluation plan, but we decided it would lend further insight into the attractiveness of the system, and tell us how long we should wait before the system ends the interaction due to positive behaviors or idle time.

For the observation of interactions, we found an area of the zoo near the planned petting zoo (now under construction). We selected a shaded location to simulate the light and glare levels in the barn, where the completed system would go. We were unable to simulate the smell that we assume would go along with an enclosed space with a few penned animals. One environmental factor that took us by surprise was noise level (see discussion below). We put the piñata prototype on a bench to make it more accessible to the appropriate ages and be closer to the height of the completed system. A laptop was placed next to it and the keyboard covered to avoid distraction from the display. Our sole “prop,” the brush, was put on the bench at the feet of the piñata.

The population was admittedly opportunistic rather than truly random. Unfortunately, we could not afford to apply a random selection criterion to visitors when we were pressed for time. There may also have been a volunteer bias: although we were fortunate that children were attracted to our system and began playing with it on their own, this may have precluded getting data from children who were more shy or would have needed more parental coaxing with the real system. Of course, it could also be that this is how it would work with the real system as well: children who were more attracted to it would spend more time with it at the expense of shyer children.

We ended up with 14 families and 25 total people interacting with the system. We felt that with this number, many of the problems we found would have begun to repeat themselves with new subjects, and the objective data we gathered would have some power and reliability.
Families were encouraged to play with the system in any way they liked, and we tried to stay removed from their interaction. Their goals were their own, and their only defined task was playing with the mechanical goat prototype.

Heuristic Evaluation

Does the system speak the user's language?
The experts said that the system did a good job of communicating messages in a way that could be understood by children. One expert suggested that we could use a more childish voice or a more “goaty voice.” Another expert commented that the system did a good job in terms of voice inflections and intonations, but suggested that the system could be improved by changing its speech based on the age of the interacting children. Our third expert said that although our system speaks the user's language, spoken language was not what a child would expect from an animal.

Is the system consistent in its behavior?
All three experts said that the system behaved very consistently. Each bad behavior was followed by a negative message and each good behavior by a positive one. One of the experts stated that our messages were limited and that we should have tried a variety of positive and negative messages.

Does the system provide feedback for user's actions?
Each expert remarked that the system did a good job of providing feedback. One of the experts appreciated the real-time feedback of our system. He also suggested that we could try to provide feedback through the mechanical goat.

Does the system respond with good error messages?
Two of the experts felt that they could not answer this because they did not commit any errors while interacting with the system. The other expert said that we had clear error messages. However, errors in our case are interactions that we have not accounted for and for which our system cannot provide any feedback. As a result, we did not get any valuable critique of the system from this heuristic.

Does the system recognize, diagnose and recover from errors?
The expert who came across the error in the system said that the system's behavior in case of an error was not clear to him. He gave an example scenario of two children interacting with the system simultaneously, with one of them doing something bad. The system in this case would “scold” both of them, which could confuse the child who was interacting with the goat correctly.

Flexibility and Efficiency?
The experts said that our system performed very well in terms of flexibility. One of them mentioned that we had covered a large array of interactions with the system. However, one of the experts expressed concerns regarding the efficiency of the system. He was doubtful about the system's performance when a child performs good and bad actions simultaneously.

Is the design aesthetic and minimalist?
All the experts said that we had done a fair job in this department and that the aesthetics of our system were good from a child's point of view.

Discussion

The heuristic evaluations provided us with some valuable insight into what changes could be made to make our system more effective. One of the first changes we could make would be to have the onscreen goat talk in a way that children would understand better. We feel that a child would have a more useful experience with the system if they could relate to it well. As such, we could change the system voice to a child's. If we had time, we could also test the words and sentences used for their clarity.

We have covered a wide variety of interactions with the users. However, we also found that our system was too limited in the variety of its output responses. This tended to make the interaction repetitive and boring. We also found that we didn't include responses for every possible action. In these cases the prototype may have provided a generic good or bad response. Obviously, the real system couldn't make such on-the-fly judgments about unanticipated behaviors.
Some interactions were unaccounted for and were not clearly either good or bad (such as touching the horns), so we provided no visible or audible feedback, which was confusing for one of the experts. To liven up the interaction we need several different sayings, backgrounds, goat images, etc. This would hold the children's attention better than the current system does.

Though we had done a fair job on looks, there was still plenty of room for us to improve the aesthetics of the design. Obviously, the real system would have a more natural and complete hair covering, but the more subtle form and look of the system could be improved. Finally, in terms of this evaluation, our definition of “an error” was unclear, which made this aspect difficult to evaluate.

Interview

We were able to interview 11 of the 14 families that interacted with our system; the other three were unavailable. During each interview we asked a set of questions to an elder family member. The parents were asked to answer the first four questions based on our look and feel prototype. The final two questions were designed to get feedback about the effectiveness of the overall system.

We asked the parents to rate the naturalness of the material on a scale of 1 to 7, with 7 representing the most natural feel. The average rating of the material was 4.28 out of 7. The best rating the material got was 6 and the worst was 2. A couple of parents said that the material was not rough enough. One mother expressed the same concern, saying that the material did not have the right texture and was softer than real goat fur.

When asked about the durability of the material, 55% of the parents felt that the material was durable while 45% felt that it was not. Many parents said that the material would not stand up to continuous brushing. One of the parents pointed out that the hair was falling out while brushing. Another parent felt that the material was not durable enough to withstand pulling.
Most of the parents interviewed said that they would let their child play with the material. Only one of the parents was not sure if she would allow her child to play with it. All the families found the material attractive. A mother mentioned that the material was especially attractive for girls.

Approximately 80% of parents felt that the children would find the interaction with the system educational. They also mentioned that the educational experience would improve if the parents helped the children. One parent said that older children would find the system more educational than a 3-year-old would. A couple of parents said that the interaction with the system was not educational. One of them remarked that the child was not paying any attention to the messages. Another parent said that it was difficult to hear the messages clearly, so the child would not be able to follow them.

Most parents felt that the children would learn about animal empathy by interacting with the goat. Some of them said that the children would be able to learn better with help from them. One parent thought that the children would learn about empathy at least in the short term.

Discussion

After reviewing the interview data, we determined that our system was successful in its attempt to educate families on the treatment of animals. However, based on our interview results we feel that our system doesn't meet the necessary durability requirement. Most parents had doubts about the durability of the fabric we used to simulate the goat's fur. These concerns stemmed from such things as the fur falling out when brushed and pulled by children. There were also questions raised about the naturalness of the system, and we received an average rating of 4.28 out of 7 for how natural the system appeared. We were told that the fur needed to be coarser and more resistant to abuse to make the system seem lifelike. Based on these results, we think it is necessary to use a more realistic substance for the fur.
This would increase the durability of the system, make the interaction more lifelike, and increase the naturalness ratings from the users.

Although our system had issues with its naturalness, the parents overwhelmingly found the system attractive and would let their kids interact with it. This suggests that we designed a system that looks the way it should in order to invite interaction. However, the interaction suffers because the system doesn't feel the way they anticipate it should. Many parents mentioned that they don't remember what a goat feels like. This made us think that it was probably not too important for the system to feel exactly like a goat. However, a more durable fabric would probably provide that characteristic anyway, since it would likely be a coarser material.

From the answers to the questions regarding the overall effectiveness of the system, we found that our system would be able to convey the message of animal empathy in a useful manner. We did find, however, that two important factors influenced the educational experience and the effectiveness of the message provided by the system. The first was that assistance from a parent would help children relate better and understand the output of the system more clearly. The second was the age group of the users. The parents felt that children around three years old would not understand the messages given by the system and, therefore, would not find the experience educational. They also felt that older children would understand the messages and would learn more from the system. However, we found that younger children did understand and respond to at least some of the system messages.

Wizard of Oz Results

The average total time of interaction, sometimes including short breaks and returning to play more, was 1m 38s. Most of the interactions were less than a minute.
For most children, the total interaction consisted of multiple short trips, where each trip included a sequence of many interactions with the system. Most of these trips were less than 1 minute.
One of our criteria was minimal repetition of the same bad action by the same person. We found that only 27% of all recorded bad actions were repeated bad actions by the same person. That is, 27% of the time when a bad action was performed, it had already been performed by the same person at some previous point in the interaction.
Another criterion was that the behavior immediately following a bad behavior (and admonishment) should be different from it. Indeed, we found that 83% of the behaviors after a bad behavior were different. Over half of them, 53%, were positive behaviors. We also thought that if the system were effectively teaching proper ways to treat real animals, behavior should improve over the course of a single session of play (the major goal of the system). Instead, we found that when everyone's total interactions were split in half based on order, there were more bad behaviors in the second half than in the first.
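The three observation measures reported above (repeated bad actions, the behavior following a bad action, and bad behaviors per half of an interaction) could be computed from an ordered log of observed behaviors. The following is only a minimal sketch under stated assumptions: the log format, the behavior names, and the sample data are invented for illustration and are not the data we collected, and splitting each person's own sequence in half is one possible reading of "split in half based on order."

```python
from collections import defaultdict

# Each event: (person_id, behavior, is_bad), listed in the order observed.
# Hypothetical sample data, invented purely for illustration.
SAMPLE_LOG = [
    ("child_a", "pet_back", False),
    ("child_a", "pull_ear", True),
    ("child_a", "brush_back", False),
    ("child_a", "pull_ear", True),
    ("child_b", "poke_eye", True),
    ("child_b", "poke_eye", True),
    ("child_b", "pet_back", False),
    ("child_b", "pull_tail", True),
]


def per_person(log):
    """Group events into one ordered sequence per person."""
    sequences = defaultdict(list)
    for person, behavior, is_bad in log:
        sequences[person].append((behavior, is_bad))
    return sequences


def repeated_bad_rate(log):
    """Fraction of bad actions the same person had already performed."""
    repeated = total = 0
    for seq in per_person(log).values():
        seen = set()
        for behavior, is_bad in seq:
            if not is_bad:
                continue
            total += 1
            if behavior in seen:
                repeated += 1
            seen.add(behavior)
    return repeated / total if total else 0.0


def after_bad_rates(log):
    """Of behaviors right after a bad one: (share different, share positive)."""
    followed = different = positive = 0
    for seq in per_person(log).values():
        for (behavior, is_bad), (nxt, nxt_bad) in zip(seq, seq[1:]):
            if not is_bad:
                continue
            followed += 1
            different += nxt != behavior
            positive += not nxt_bad
    if not followed:
        return 0.0, 0.0
    return different / followed, positive / followed


def bad_by_half(log):
    """Count bad behaviors in the first and second half of each person's
    sequence (with an odd length, the middle event falls in the second half)."""
    first = second = 0
    for seq in per_person(log).values():
        mid = len(seq) // 2
        first += sum(is_bad for _, is_bad in seq[:mid])
        second += sum(is_bad for _, is_bad in seq[mid:])
    return first, second
```

Calling `repeated_bad_rate(SAMPLE_LOG)` on a real log would give the counterpart of the 27% figure, and `after_bad_rates` the counterparts of the 83% and 53% figures.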
Discussion

The average interaction time was actually longer than we expected. This affects the sliding windows we had established for the number of good or bad interactions that must take place for the system to give an appropriate cumulative response. In several cases the prototype dismissed the children when they were not done playing. While we wanted to keep children moving through the installation in the real system, we have no clear guidelines for a maximum interaction time to avoid lines and crowding. As such, it is a higher priority to support the children through their entire desired interaction. Thus, we might require more good or bad behaviors before a cumulative response is given.

A 27% repetition rate for the same bad action is more than acceptable. Considering that in some cases children could not hear system responses, some children were younger than our target age, and one or two others were incorrigible in their behaviors, it seems that most children learned a lesson from the system response and did not come back to the same bad action. Of course, this could also be explained by exploration of the system: children simply moved on to more behaviors regardless of system response. However, most children repeated petting and grooming behaviors many times in their interactions without feeling the need to move on.

Exploration of the system did come into play when children selected their next behavior after a bad one. 83% of the time, their next action was different, and most of the time it was a good behavior. Again, it's possible that they would have moved on anyway, but there are other situations where they chose not to. It seems safe to conclude, especially when subjects were observed watching and listening to the onscreen goat, that it had an effect on subsequent behavior.

The fact that there were more negative behaviors in the second halves of interactions than in the first could be due to many factors.
Due to our criteria for ending the interaction based on cumulative good and bad behaviors, children may have been dismissed by the system one or more times before they left the prototype. Once dismissed, they may have had a more carefree attitude about how to interact with the goat, and we did not keep track of dismissals in the order of interactions. Another factor is, again, exploration of the system. It is barely possible to force the system through each of its 18 direct responses to inputs in the 1m 38s of an average interaction. It's easy to think that children are, to some extent, trying to find all of the different inputs and associated responses. It would be naïve to think their sole goal is to get positive system responses, as illustrated by the girl who renamed Mel “Pinkypie.”

We were most pleased that in many cases, the children approached and played with our prototype before we approached their family. Not only did this make it easier to get the families' involvement, it also encouraged us that the design was attractive to our target age group. Parents would see their children playing with something on a bench and scold them, and then we would step in and explain that the children were welcome to play with it in any way they liked, and could even participate in our study.

Our presence may have influenced the course of the interaction in an unnatural way, despite our efforts to remain out of the picture. Parents seemed unwilling to interact with the system while their child did, especially when there was only one adult per family. Instead, they would hang back and talk to the experimenters who had asked their permission. Also, some of the children took note of our presence and may have acted differently than they would with the real system. Certainly the Wizard of Oz method completely fooled all of the children and at least suspended the disbelief of parents.
However, the system's obviously incomplete state, its non-exhibit location, and our peripheral presence certainly led some to interact differently in ways we cannot easily factor out. It did not stop children from behaving badly at times, so we hope we obtained an accurate picture of the interaction process. However well our onscreen goat may have supported positive behaviors, we hope that with the real system, parents would play a more prominent part.

An unanticipated problem was the noisy environment. We were near a gift shop on the way to the popular Giant Panda exhibit so that we could have easy access to visitors and an electrical outlet. Unfortunately, we had only the laptop's built-in speakers. Most of the time this was sufficient for children to hear, but we were near the construction site of the new contact zoo, as well as the zoo's tour train tracks. Chainsaws and train whistles, in addition to the normal bustle of people, sometimes made it very difficult for children to hear feedback from the computer. We imagine that zoo maintenance, trains, and large numbers of people will all cause similar levels of noise from time to time in the proposed location of our system. At the very least, we would certainly need strong, directed sound for those interacting with the system. At most, it could require volume that adapted to ambient noise, sound shielding of the room the system was in, or even active noise cancellation, although this would probably be overkill. For our evaluation, the noise may have had a major effect on the last few families who interacted with the prototypes.

Except for environmental interference, most children heard and understood the basic message of the onscreen goat, Nicole. However, some of the younger children, mostly those even younger than our target group of 3-7, weren't old enough to understand the relationship between the two goats, and so would sometimes respond to the invitation to groom by brushing the screen of the laptop.
As mentioned above, we had covered the keyboard section and had planned to create an opening in cardboard through which the screen would show. We hope that placing the screen in a wall and slightly higher than Mel would reduce this confusion, but this would require further testing. Still, our target age group did not seem to have this trouble.

Another problem developed from the combination of the physical prototype and our script. Only parts of the prototype were covered in fur, and this may have been a cue for selective interaction. That is, children played more frequently with those areas, as we predicted, but this could be because there was fur on them. This was part of the problem when children would play with the ears, which invited play because they were covered in fur. The system was designed to respond negatively to pulls on the ear, and so would frequently respond: “Hey, our ears are sensitive! Try gently grooming Mel with the brush.” In retrospect, even we might find this instruction ambiguous. The children's response was to brush Mel's ears much more slowly and lightly: a perfectly reasonable interpretation. The script would need to be changed to direct children to the areas the goats would find acceptable. The same problem occurred when children brushed or petted Mel's tail.

A related problem is that covering only selected areas with fur makes the prototype resemble a doll. One girl renamed Mel “Pinkypie” after her My Little Pony. She went on to groom every area that had hair into her preferred patterns. She even moved Mel's glued-on smile. Of course, this produced many negative system responses, even making the onscreen goat run away, but when that happened she pronounced Pinkypie done, said “ta-da!” and skipped back to her mother. We believe this is specific to the prototype due to its size and selective covering. Seeing a full-size, more accurately covered and lifelike Mel might reduce such behaviors by making it look less like a doll.
That would help to keep such behaviors from coming to mind and potentially lend more authority to the voice of something they knew less about. Again, this would require further testing to be sure.

A similar problem was that once children used the brush, whether on their own initiative or at Nicole's prompt, they rarely went back to petting with their hands. The script would often suggest that they pet, but we never thought to suggest they stop using the brush. In the end, this may be an issue of preference and not a problem at all.

Improving the Prototype

From the observations performed at Zoo Atlanta and the feedback from the heuristic evaluation, we learned which aspects of our prototype could be improved.

First, we learned that there is a need for a greater variety of responses from the system. Currently there is only one response for every interaction. Providing more responses would keep them from becoming monotonous. It could also encourage different interactions and, we hope, hold a child's attention longer. In addition, most of the current messages encourage grooming, which is also the harshest activity that could be performed on the prototype. Some of the current responses, as well as any that are added, should encourage interactions other than grooming the goat with the brush. This could help extend the life of the material that covers the mechanical goat.

From our cognitive walkthrough with Lori and our observations, we learned that we had not accounted for all possible interactions that a child might perform. For example, Lori pulled the goat's horns. This is an interaction that we had not expected. We had created a generic message for our system to use in such cases. However, this message was not always appropriate and, as stated above, could become monotonous and cause a child to lose interest in the interaction.
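The two response problems described above (a single fixed response per interaction, and one generic message for every unanticipated action) could both be eased by a small response picker. The following is a minimal Python sketch under stated assumptions, not the prototype's actual code: the class name `ResponsePicker`, the interaction labels, and every phrase in it are hypothetical and invented for illustration.

```python
import random


class ResponsePicker:
    """Pick a spoken response for an interaction without repeating the last one."""

    def __init__(self, responses, fallbacks, seed=None):
        # responses: dict mapping an interaction label to a non-empty list of phrases
        # fallbacks: non-empty list of phrases for interactions the script did not anticipate
        self.responses = responses
        self.fallbacks = fallbacks
        self.last = {}  # last phrase spoken from each pool
        self.rng = random.Random(seed)

    def pick(self, interaction):
        # All unanticipated interactions share a single fallback pool.
        key = interaction if interaction in self.responses else "_fallback"
        pool = self.responses.get(interaction, self.fallbacks)
        # Exclude the phrase used last time, unless it is the only one available.
        candidates = [p for p in pool if p != self.last.get(key)] or list(pool)
        choice = self.rng.choice(candidates)
        self.last[key] = choice
        return choice


# Hypothetical script content, invented for illustration only.
RESPONSES = {
    "pet": ["That feels nice!", "I like being petted.", "Thank you for being gentle."],
    "pull_ear": ["Ouch, that hurts my ear!", "Please pet my back instead."],
}
FALLBACKS = ["Hmm, I am not sure about that.", "Can you try petting my back?"]

picker = ResponsePicker(RESPONSES, FALLBACKS, seed=0)
spoken = [picker.pick("pet") for _ in range(5)]
```

With a pool of at least two phrases per interaction, a child who repeats the same action never hears the same phrase twice in a row. An action missing from the script, such as pulling the horns, draws from the shared fallback pool, which could also hold phrases that steer children toward petting rather than brushing.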
Another valuable lesson that we learned about our prototype was the importance of a durable material. Mel left the evaluation with half the fur he arrived with because grooming was the most popular activity of the day. The prototype needs to be covered in a material that will not shed quite so easily and will stand up to harsh and constant grooming by eager children. We learned from our interviews that the parents did not find the material to be like goat fur at all. The material was much too soft, whereas goat fur has the coarseness of human hair. This is another characteristic that should be sought in a new material.

Throughout the day of the evaluation at Zoo Atlanta, the noise level rose steadily. As a result, many of the children could not hear the system responses, so they did not pay any attention to them. Better acoustics would have helped significantly. If we were to redesign this prototype, we would pay much more attention to sound. The system should have a volume control that can be adjusted according to the current level of noise.

The system responses were confusing to some children because of how rapidly they interacted with the system. A child would perform a negative action, such as poking the goat in the eye, and then immediately proceed to pet the goat. By the time the system responded to the negative action, the child had moved on to something else and the system response simply did not make sense to them. The prototype should be designed to launch system responses quickly. With a way to launch responses quickly, the system could provide feedback that matched the interaction a greater percentage of the time.

The piñata was intimidating to some children. While they understood that they could touch it and play with it, several did not seem to understand what it represented or that there was a relationship between the prototype and a real live goat.
If the prototype of the mechanical goat were to be created again, it should be closer to the size of an adult goat, with features that are to scale. This might encourage more children to play with the prototype because it could sit on the ground instead of on a bench. Children would also be more likely to make the connection between the prototype and a real animal.

Critique of the Evaluation Plan

If we had to do our evaluation over again, there are several material things we would change about our process. External speakers would have ensured that all of the children could easily hear the system responses to their behaviors. Choosing an area with less noise may also have helped, but it would be more realistic to attempt to compensate for the noise in the environment that the real system would also have to face. A larger computer screen would have better mimicked what we intended for the real system: a larger, television-like display that could be viewed from more angles and distances while people are interacting. Placing it higher may also have alleviated some of the confusion as to which item (computer or piñata) should be the object of interaction.

Using video was not permitted by the Zoo and would likely have caused problems with parents. However, it would have been invaluable as we plowed through opaque data sheets to decipher who had done what at what time. It would also have been a reference for the facial expressions and subjective comments made by both parents and children.

We had talked about a more concrete incentive for positive interactions and successful completion of system interaction, such as badges or stickers. Many parents had mentioned the junior park ranger badge as a pertinent example. We forgot to include this aspect in the prototype and remain uncertain how it would affect what ended up being very quick interactions with the system.
In retrospect, it seems infeasible to have such incentives without more zoo staff and more complex, longer interactions that our current design does not support.

Unfortunately, we did not time how long people stayed away from the system before coming back to interact more. This would have suggested how long to wait before the system goes idle. We would use the 95th percentile of the time spent away before returning as the idle timeout. The cost of prematurely ending the interaction is greater than the cost of families not hearing the system say “goodbye.”

Ideally, we would have had enough time to recruit more expert heuristic evaluators. Three is a fairly standard number of contracted experts in industry, but we found that our results from other graduate students would have been more complete with a few more perspectives. The people we knew with truly expert-level experience, the kind a corporation would be happy to contract, were, not surprisingly, very busy. We would like to have had their feedback as well.

Given time, we would likely revise our interview questions and/or methodology. All parents were rushed to some degree: keep kids entertained, move on to the next exhibit. Questions with answers in the format of “Yes/No, Explain” lent themselves to curt responses, and it was difficult to get good explanations.

This appendix contains samples of the data sheets that we used to collect data for the four different evaluations that we performed. The first data sheet is a copy of the questionnaire that we used to interview the parents at Zoo Atlanta. The second data sheet is the observation data sheet that we used to track a family's interaction with the prototype at Zoo Atlanta. The third data sheet is the form that we used to time the interaction. On this sheet we recorded the time that each person in the family spent interacting with the prototype.
The fourth data sheet is a copy of the heuristic evaluation that was developed as per feedback from part 3 of this project. The fifth data sheet is the cognitive walkthrough that we had initially planned to use to evaluate our prototype.

Data Sheet 1

Sex: ______ Male ______ Female
Parent: ______ Yes ______ No

1. How natural do the flesh and hair feel on a scale of 1 to 7, 1 being not natural at all and 7 being completely natural?
   1 2 3 4 5 6 7
2. Do you think the material will stand up to a child's interactions?
   Yes No Explanation:
3. Would you let a child play with this material?
   Yes No Explanation:
4. Do you think a child would find this material attractive?
   Yes No Explanation:
5. Do you think that a child would find this experience educational?
   Yes No Explanation:
6. Do you think that a child would learn about empathy for animals from this experience, such as not to pull the goat's tail?
   Yes No Explanation: (Explore WHAT they think the child would learn)

Data Sheet 2

Family No: Date: Sheet No: TimeLine
Data Sheet 3
Data Sheet 4

For each step of the interaction, the cognitive expert should answer:
Action: Pull the Goat's Tail
Action: Pet the Goat
Action: Groom the Goat with the Brush
Action: Touch the Goat's Face
Action: Poke the Goat in the Eye
Action: Give the user the opportunity to perform actions that they think a child would perform

Once the actions have been performed, the following questions should be answered:
Data Sheet 5

Heuristic Evaluation Guidelines:
Speak the user's language?
Be consistent?
Provide feedback?
Good error messages?
Prevent errors?
Match between the system and the real world?
Flexibility and efficiency of use?
Aesthetic and minimalist design?
Recognize, diagnose and recover from errors?

Appendix B Raw data