
Usability Guidelines

Definition

Usability testing measures the suitability of the software for its users. It is directed at measuring the effectiveness, efficiency and satisfaction with which specified users can achieve specified goals in particular environments or contexts of use.  Effectiveness is the capability of the software product to enable users to achieve specified goals with accuracy and completeness in a specified context of use.  Efficiency is the capability of the product to enable users to expend appropriate amounts of resources in relation to the effectiveness achieved in a specified context of use.  Satisfaction is the capability of the software product to satisfy users in a specified context of use (see Jakob Nielsen's work, including the web site www.useit.com).  Attributes that may be measured are:

§          understandability (attributes of the software that bear on the users’ effort for recognising the logical concept and its applicability)

§          learnability (attributes of software that bear on the users’ effort for learning the application)

§          operability (attributes of the software that bear on the users’ effort for operations and operation control)

§          attractiveness (the capability of the software to be liked by the user).

Note: Usability evaluation has two purposes: the first is to remove usability defects (sometimes referred to as formative evaluation) and the second is to test against usability requirements (sometimes referred to as summative evaluation). It is important to have high-level goals for effectiveness, efficiency and satisfaction, but not all means of achieving these goals can be precisely specified or measured.  It is important that usability evaluation has the objective of "getting inside the user's head": understanding why users have difficulty using the proposed design, using methods that help to uncover those problems.

More information on setting criteria for effectiveness, efficiency and satisfaction can be found in ISO 9241-11: Guidance on usability, and ISO/IEC 9126-4: Quality in use metrics ("quality in use" is defined in a similar way to "usability" in ISO 9241-11).

Overall approach

A three-step approach is suggested, overlaid on the V model:

1.        Establish and validate usability requirements (outside the scope of this standard)

2.        Inspect or review the specification and designs from a usability perspective (covered in the process section of this standard)

3.        Verify and validate the implementation (usability testing)

Usability test documentation: Usability testing may be documented following ISO 14598 (note also that a Common Industry Format is being developed by the Industry USability Reporting (IUSR) project and documented at www.nist.gov). The documentation may include the following items (an illustrative sketch of how they might be held together in one record follows the list):

§          Description of the purpose of the test

§          Identification of product types

§          Specification of quality model to be used

§          Identification of contexts of use (including users, goals and environment)

§          Identification of the context for the test showing how closely this meets the actual context of use

§          Selection of metrics, normally measuring at least one metric for each of effectiveness, efficiency and satisfaction and, where relevant, safety

§          Criteria for assessment

§          Interpretation of measures of the usability of the software.
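
As a rough illustration of how these items could be held together in one record, the sketch below uses a hypothetical Python structure; the field names are our own shorthand for the items listed above and are not prescribed by ISO 14598 or the Common Industry Format.

    from dataclasses import dataclass
    from typing import List

    # Illustrative only: field names are our own shorthand for the items listed
    # above, not terminology mandated by ISO 14598 or the Common Industry Format.
    @dataclass
    class ContextOfUse:
        users: str            # e.g. "experienced call-centre operators"
        goals: str            # e.g. "log a customer complaint"
        environment: str      # e.g. "open-plan office, frequent interruptions"

    @dataclass
    class UsabilityTestReport:
        purpose: str
        product_types: List[str]
        quality_model: str                    # e.g. "ISO/IEC 9126"
        contexts_of_use: List[ContextOfUse]
        test_context_fit: str                 # how closely the test matches real use
        metrics: List[str]                    # at least one each for effectiveness,
                                              # efficiency, satisfaction (and safety)
        assessment_criteria: str
        interpretation: str = ""              # filled in once measures are taken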

Selection of techniques: Many techniques are available for developing usability requirements and for measuring usability.  Each project or business will make its own decision about the selection of techniques, depending on cost and risk. For example, a review by the development team alone incurs lower preparation and meeting costs but does not involve the users, so it can only address in theory how a user might react to the system to be built. A review with users costs more in the short term, but the user involvement will be cost effective in finding problems early.  A usability lab costs a great deal to set up (video cameras, mock-up office, review panel, users, etc.) but enables the development staff to observe the effect of the actual system on real people. This option may be attractive where this form of testing is a high priority, but only for a relatively small number of applications. It is possible to make a simpler, cheaper environment; for example, Perlman's use of a mirror on the monitor with an over-the-shoulder video camera, so that he could record both the screen and the user's expression.

Test environment: Testing should be done under conditions as close as possible to those under which the system will be used. It may be necessary to build a specific test environment, but many of the usability tests may be part of other tests, for example during functional system test. Part of the test environment is the context, so thought should be given to different contexts, including environment, and to the selection of specified users.

Size of sample user group: Research (ref. http://www.usability.serco.com/trump) shows that if users are selected who are representative of each user group, then 3-5 users are sufficient to identify problems; 8 or more users of each type are required for reliable measures.  The Common Industry Format (http://www.nist.gov/iusr) also requires a minimum of 8 users. In contrast, Nielsen measured the number of usability problems found, not user performance (Nielsen, J. and Landauer, T. K. (1993). A mathematical model of the finding of usability problems. In: CHI '93 Conference Proceedings on Human Factors in Computing Systems, 206-213). In practice the number required will depend on the variance in the data, which determines whether the results are statistically significant.  Another paper states that "achievement of usability goals can only be validated by using a method such as the Performance Measurement Method to measure against the goals, and this requires 8 or more users to get reliable results" (Macleod, M., Bowden, R., Bevan, N. and Curson, I. (1997). The MUSiC Performance Measurement Method. Behaviour and Information Technology, 16, 279-293).
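
To make the problem-finding argument concrete, the sketch below evaluates the Nielsen and Landauer model cited above, in which the proportion of problems found by n users is 1 - (1 - L)^n; the per-user detection probability L of 0.31 used here is an often-quoted average, not a constant, and should be measured per project.

    # Sketch of the Nielsen & Landauer (1993) problem-discovery model:
    # proportion of problems found by n users = 1 - (1 - L)**n,
    # where L is the average probability that one user exposes a given problem.
    # L = 0.31 is an often-quoted average, not a constant; it varies by project.

    def proportion_found(n_users: int, detection_rate: float = 0.31) -> float:
        return 1 - (1 - detection_rate) ** n_users

    for n in (1, 3, 5, 8, 15):
        print(f"{n:2d} users -> {proportion_found(n):.0%} of problems found")

    # With L = 0.31 this suggests about 5 users expose roughly 85% of the
    # problems, which is why 3-5 users are often enough for defect-finding
    # (formative) evaluation, while reliable measurements (summative
    # evaluation) still need 8 or more users of each type.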

Test scenarios: The user tests require the production of test scenarios. These include user instructions, allowance of time for pre- and post-test interviews (for giving instructions and receiving feedback), logging of the session (so that designers and developers can observe it if they cannot be present), a suitable environment, observer training, and an agreed protocol for running the sessions. The protocol includes a description of how the test will be carried out (welcome, confirmation that this is the correct user, timings, note taking and session logging, interview and survey methods used).

Usability issues: These may be raised in any area that affects a user (novice or expert), including documentation, installation, misleading messages and return codes.  Usability issues are often perceived as low priority if the functionality is correct, even if the system is awkward to use. Severity also depends on where the issue appears: for instance, a spelling mistake or obvious GUI problem in a frequently used screen will be more serious than one in an obscure screen that is only occasionally seen.

Building the tests

In this standard there is not space to describe all the usability testing techniques in detail.  We have selected a few important ones, but other techniques are available; these are listed, but not defined, at the end of this document. Whatever techniques are used, you will need to decide the goals for the test, make a task analysis, select the context of use, and decide on appropriate satisfaction, effectiveness and efficiency measurements.  These could be based on published methods such as mental effort, measured by how hard someone has to work to solve a problem (referred to as cognitive workload), or by simply timing several users at a task, or by asking the users for their views.
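
As an illustration of the "simply timing several users at a task" option, the sketch below computes simple task-level measures in the spirit of ISO 9241-11; the session data are invented and the metric definitions (completion rate, mean time on task, mean questionnaire score) are our own simplifications, not the wording of the standard.

    from statistics import mean

    # Hypothetical session records: did the user complete the task, how long it
    # took (seconds), and a post-task satisfaction rating on a 1-5 questionnaire.
    sessions = [
        {"completed": True,  "seconds": 94,  "satisfaction": 4},
        {"completed": True,  "seconds": 151, "satisfaction": 3},
        {"completed": False, "seconds": 300, "satisfaction": 2},
        # ... 8 or more users per group for reliable summative measures
    ]

    effectiveness = mean(1.0 if s["completed"] else 0.0 for s in sessions)   # completion rate
    time_on_task = mean(s["seconds"] for s in sessions if s["completed"])    # successful attempts only
    satisfaction = mean(s["satisfaction"] for s in sessions)

    print(f"Effectiveness (completion rate): {effectiveness:.0%}")
    print(f"Mean time on task (successful users): {time_on_task:.0f} s")
    print(f"Mean satisfaction (1-5 scale): {satisfaction:.1f}")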

Early life cycle techniques: Some techniques can be used early in the lifecycle and so influence and test the design and build of the system, for example heuristic evaluation. Heuristic evaluation is a systematic inspection of a user interface design for usability. Its goal is to find the usability problems in the design so that they can be attended to as part of an iterative design process. It involves having a small set of evaluators examine the interface and judge its compliance with recognised usability principles (the "heuristics").

Late life cycle techniques: Some techniques are used after the software is built, for example survey and questionnaire techniques while the system is in use, and observation of user behaviour with the system in a usability test lab.

An example of an observation technique is the Thinking Aloud protocol. In this method the users describe what they are doing and why, and their reaction to the system - they think aloud.  This is recorded on video, on audio tape, or by an observer sitting with the user. A "test lab" may be set up, mimicking the normal office set-up: a video camera is positioned behind/above the user, and the observer sits either with the user or behind a two-way mirror.  The users talk to the observers during the work to say what they are doing and what they are thinking.  The purpose of the test is explained to the users - that it is a test of the system's usability, not a test of the users.  They are given instructions on how to run the test, and observation and reporting rules.  This type of test is explorative, using test scenarios which would first be run by the usability tester as use case tests, and then brought into the usability lab for thinking aloud with the user.  It is important to consider the effect on the user of being observed; the tests must take place in an atmosphere of trust and honesty.

You may also wish to consider survey techniques and attitude questionnaires, either "home grown" or, if you wish to measure against a benchmark, standardised and publicly available surveys such as SUMI and WAMMI, which are scored against a database of previous usability measurements. Developed as part of the ESPRIT MUSiC project, SUMI is maintained and administered by the Human Factors Research Group (HFRG) at University College Cork. SUMI is a brief questionnaire that is scored against a benchmark of responses from surveys of other systems.  WAMMI is an on-line survey administered as a page on the web site, and users are asked to complete it before they leave the page; this gives ongoing feedback so that use of the web site can be monitored continuously. Each organisation using the SUMI or WAMMI surveys sends its results back to the HFRG, who provide statistical results from the database built up from all SUMI/WAMMI users.

Test case derivation: The test cases are the functional test cases, but are measured for different outcomes, for example speed of learning rather than functional outcomes. Tests may be developed to test the syntax (structure or grammar) of the interface, for example what can be entered in a field, as well as the semantics (meaning), for example that each required input, system message and output is reasonable and meaningful to the user. These tests may be derived using black box or white box methods (for example those described in BS7925-2) and could be seen as functional tests ("we would expect this functionality in the system") which are also usability tests ("we expect that the user is protected from making this mistake"); a small sketch of such a combined test follows.  Techniques outside BS7925-2, for example use cases, may also be used.  Note: use cases, defined within UML (Unified Modelling Language), are often used in Object Oriented (OO) development; however, UML constructs such as use cases may be used successfully in non-OO projects.
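
The sketch below shows how one input can drive both a functional check (the syntax of a field) and a usability check (whether the resulting message is meaningful to the user). The field, error text and validation rule are hypothetical and stand in for whatever the system under test actually provides.

    import re

    # Hypothetical stand-in for the system under test; returns the message
    # shown to the user after they submit a date-of-birth field.
    def validate_date_of_birth(value: str) -> str:
        if re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
            return "OK"
        return "Please enter the date as DD/MM/YYYY, e.g. 07/03/1985"

    def test_rejects_malformed_date():
        message = validate_date_of_birth("1985-03-07")
        # Functional outcome: the invalid format is rejected (syntax of the field).
        assert message != "OK"
        # Usability outcome: the message tells the user how to recover,
        # rather than quoting an internal return code (semantics of the message).
        assert "DD/MM/YYYY" in message and "error code" not in message.lower()

    test_rejects_malformed_date()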

The context of use may be built into a checklist. This context checklist leads us to consider developing tests using equivalence partitioning and boundary value analysis (see BS7925-2 for the techniques).  These techniques make it easier to define tests across the range of contexts without repetition or missed contexts. The partitions and boundaries in which we are interested are those between the contexts of use, rather than partitions and boundaries between inputs and outputs.  We may wish to use the ideas of the techniques rather than necessarily following the standard to the letter.  Use risk analysis of the effect of usability problems to weight the partitions and so include the most important tests first (a sketch of this weighting follows the boundary list below).

Example: We may wish to test that a public emergency call button is available to everyone.  At what height should it be placed?  Partitions to consider:

 

1.        User over 2 m tall (user is extremely tall - may have to bend excessively)

2.        User 1.8 m to 2 m tall (tall)

3.        User 1.3 m to 1.79 m tall (average)

4.        User 1.0 m to 1.29 m tall (user is short - must be able to reach)

5.        User less than 1.0 m tall (a child, or in a wheelchair, for example - must be able to reach)

6.        User is prone (after an accident - must be able to reach)

 

Consideration of these partitions and boundaries leads us to question whether one screen and controls are sufficient or whether multiple controls are needed to meet all needs. 

Boundaries to consider include

§         the highest level at which the controls and screen can be placed so that someone in a wheelchair can use them comfortably - giving us boundary 1

§         the lowest level to which someone can bend - giving us boundary 2

§         the tallest that it is possible for someone to be

§         floor level (prone person)
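
A minimal sketch of how the partitions above could be weighted by risk to prioritise tests; the height ranges come from the example, while the reach envelopes, risk scores and the mounting height checked are our own illustrative assumptions.

    # Context-of-use partitions from the example above, each with an (assumed)
    # reach envelope in metres and an illustrative risk weight (1 = low, 5 = high).
    partitions = [
        ("over 2 m tall (must bend)",        (1.20, 2.00), 2),
        ("1.8 m to 2 m tall",                (1.00, 1.90), 1),
        ("1.3 m to 1.79 m tall (average)",   (0.90, 1.70), 1),
        ("1.0 m to 1.29 m tall",             (0.70, 1.30), 3),
        ("under 1.0 m (child / wheelchair)", (0.60, 1.20), 5),
        ("prone after an accident",          (0.00, 0.40), 5),
    ]

    BUTTON_HEIGHT_M = 1.0   # hypothetical mounting height under test

    # Run the highest-risk partitions first; each test checks whether the button
    # falls inside that partition's reach envelope.
    for name, (low, high), risk in sorted(partitions, key=lambda p: -p[2]):
        reachable = low <= BUTTON_HEIGHT_M <= high
        print(f"risk {risk}: {name:36s} -> {'reachable' if reachable else 'NOT reachable'}")

    # The prone partition fails at 1.0 m, which is the evidence behind the
    # question in the text: a single control may not be sufficient to meet all needs.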

 

Other techniques and sources of information

Techniques listed in the literature and on the web

We do not have space here to cover more of the techniques available.  Other techniques are:

Type of technique: Inquiry
Examples: Contextual Inquiry; Ethnographic Study / Field Observation; Interviews and Focus Groups; Surveys; Questionnaires; Journaled Sessions; Self-reporting Logs; Screen Snapshots
Comments: These are a selection of methods that gather information as a system is in use, either by observation of the user or by asking the user to comment.

Type of technique: Inspection
Examples: Heuristic Evaluation; Cognitive Walkthroughs; Formal Usability Inspections; Pluralistic Walkthroughs; Feature Inspection; Consistency Inspection; Standards Inspection; Guideline Checklists
Comments: These are all variations on review, walkthrough and inspection techniques, with specialised checklists or specialist reviewers.

Type of technique: Testing
Examples: Thinking Aloud Protocol; Co-discovery Method; Question-asking Protocol; Performance Measurement; Eye-tracking
Comments: These are techniques which help the user and the usability tester/analyst to discuss and discover how the user is using and thinking about the system.

Type of technique: Related techniques
Examples: Prototyping (low-fidelity / high-fidelity / horizontal / vertical); Affinity Diagrams; Archetype Research; Blind Voting; Card-Sorting; Education Evaluation
Comments: Although not primarily usability techniques, these are techniques that some writers on usability have recommended. They are all ways to elicit reactions from users.

 

Sources of information

Usability research continues, and the information in this draft is based on drafts of the standards ISO 9126 and ISO 14598, a workshop run by Improve QS bv, and material from the following web sites and documents:

 

ISO TR 16982 is somewhat academic, but should be referenced for more information on usability methods.

ISO TR 16982

To relate usability and usability testing to the life cycle, see for example ISO 13407 on the treatment of evaluation.

ISO 13407

For the distinction between evaluation of paper prototypes, machine prototypes and usability testing see the SERCO website.

http://www.usability.serco.com/trump/methods/recommended/

 

James Hom's web site, with a list of usability design and test methods, a very extensive book list and a list of other web sites

www.best.co/~jthom/home.html

Human Factors Research Group - SUMI, WAMMI and other information

www.ucc.ie/hfrg

Golden section explained (aesthetics)

www.mcs.surrey.ac.uk/personal/Rknott/fibonacci

Human Computer Interaction Resource Network

www.hcirn.com

MUSiC

http://www.newcastle.research.ec.org/esp-syn/text/5429.html

SUMI background reading

http://www.ucc.ie/hfrg/questionnaires/sumi/sumipapp.html

Publications list

http://www.lboro.ac.uk/research/husat/inuse/usabilitypapers.html

WAMMI

http://www.ucc.ie/hfrg/questionnaires/wammi/

SERCO usability web pages

http://www.usability.serco.com/trump

IN USE handbook

http://www.nectar.org/inuse/6.2/3-5.htm

SERCO Usability requirements web page

http://www.usability.serco.com/trump/ucdmethods/requirements.html

Accessibility

We have not in this example covered the design methods and the test methods for website accessibility.  Readers are strongly recommended to use the following resources for information on the reasons for designing with accessibility in mind, and for tools, checklists and hints for designing and testing for website accessibility:

Usability and accessibility standards and guidelines:

Ergoweb - Ergonomics Standards and Guidelines - ISO 9241

IBM-Ease of Use - ISO 9241

ESSI-SCOPE Quality Characteristics and their application - ISO 9126

Web Accessibility Initiative Guidelines

RNIB Hints for designing accessible websites

Usability: Web accessibility

Disability Discrimination Act

See It Right Accessible Website Scheme

Vision and Dyslexia

Lighthouse International - Effective Color Contrast - Designing for People with Partial Sight

Website that Simulates Colourblind Vision - contains tools

US Bureau of Census www.census.gov

WHO - www.who.int

Bobby tool - www.cast.org

Effective Color Contrast  - Designing for People with Partial Sight and Color Deficiencies - by Aries Arditi, Ph.D, Lighthouse - www.lighthouse.org

“Making Text Legible - Designing for People with Partial Sight” by Aries Arditi, Ph.D, Lighthouse www.lighthouse.org