Index Up Site Map Latest News Working Practices Discussion & Review Glossary Module Testing Non-Func. Testing Links Test Papers Feedback Administration

Conversion - Database

This example was reviewed at the February meeting, comments incorporated, and 'signed off' as a completed example at the meeting in April 2003.

0       Configuration

Matrix reference: Conversion test for a database system

Author: Isabel Evans     Version 0.3      April 2003

Replaces v0.2 February 2003, from comments received at meeting April 2003 and via web site

1       Introduction

Two financial service providers (FSP1 and FSP2) have merged and, as part of that merger, management and servicing of customer accounts for a particular investment type is to be moved from FSP1 to FSP2.   As a result, a data migration project is needed: the data held in the existing IT systems of FSP1 about those accounts must be extracted, and loaded onto the existing IT systems at FSP2.  Some data items are held in a different format in the two IT systems, and this data must be converted from one format to the other during the data migration.  This includes the data held on the customer database used to provide information to Marketing Planning.

1.1   Scope and limitations of this example:

This example only covers field to field data conversion. This is simple example with a database that only has 5 fields.  In a larger data conversion, the tester would need to make a risk assessment about whether to cover all or some of the fields and would concentrate testing in the high-risk areas.  The example is based on typical data conversion problems encountered in real projects, but have been much simplified.

Note for this version: further examples and the guidelines will illustrate other aspects of conversion testing not covered in this example.

During a data migration, many activities (IT and non-IT) are carried out, and both the IT and non-IT activities will need to be tested or rehearsed leading up to the live migration.  Types of testing that might be required include:

§          Performance testing of the speed of the extract, conversion and load processes

§          Procedure testing of the manual processes around the migration and their intersection with the IT processes

§          Security testing of the security processes around the migration, for example data protection, security of data from corruption during transfer, security of data from illegal access during transfer

§          Functional testing of control processes built to support the migration for example conversion control reporting

§          Functional testing of the extract and load processes

§          Additionally during a data conversion or data migration, the conversion testing itself has several aspects, which will not be covered in this example, but will be covered in other examples or in the guidelines.  As these have not yet been written, notes are added at the end of this example.

2        Requirements/Specification of System under Test

2.1   Customer Accounts Marketing Database (CAMD):

The FSP1 CAMD (the source system) holds customer information, which is to be transferred to the FSP2 CAMD (the target system).  The format of the two databases is similar but not identical. Some of the fields are outlined in the table below.

Information

to be transferred

FSP1 fields description before data migration

FSP2 fields description before data migration

Customer’s title

Field name: Title

Field description: optional, up to 20 characters

Field name: Title

Field description: mandatory, up to 5 characters

Customer’s Surname

Field name: Surname

Field description: mandatory, 30 characters

Field name: Surname

Field description:  mandatory, 30 characters

Customer’s Country

Field name: No field

Field name: Country

Field description: mandatory, 30 characters

Customer’s Age

Field name: Age

Field description: optional, numeric

Field name: Ageband

Field description: Mandatory, ages banded as A= 0 to 17, B= 18 to 64, C=65 and over

Customer’s Gender

Field name: Gender

Field description: Optional, 1 character, any alphabetic character, used to mark corporate customers (C) as well as M/F for individual customers

Field name: Gender

Field description: Mandatory, M or F or N (Male, Female, Not known)

The data migration specification states that:

§          During the data migration, if a field value cannot be converted, a line will be written to the Conversion Reject Report, and the field value in the converted data will be set to a standard value, to be updated post migration.

§          If a field value can be converted, it will be loaded as the converted value and a line written to the Conversion Load Report.

§          Under certain circumstances a whole record will be rejected and then the record will be written to the Rejection File, and a line written to the Conversion Reject Report.

§          The data migration specification states that if the age value is not within one of the bands, it will be set to D in the target system and a line written to the Conversion Reject Report. (add to spec at front – and for other bits of spec)

§          The data migration specification states that as FSP1 has only traded within the UK , all target records will have a Country value of “ UK ” added to them. There is no need to test other country values, as the source data has no country field. The data cannot be partitioned. Put in spec.

§          The data migration specification states that the definitions for the sources and target fields is the same for the Surname field, so the surnames should be transferred without any change. The data cannot be partitioned.

§          If the gender in the source system is set to M it will be converted and loaded as M and a line written to the Conversion Load Report

§          If the gender in the source system is set to F it will be converted and loaded as F and a line written to the Conversion Load Report

§          If the gender in the source system is set to C this is a corporate customer not an individual and the details must not be loaded to the target Marketing system (because it will be converted to another system which holds corporate customers). It will be loaded to the Rejection File and a line will be written to the Conversion Reject Report.

§          If the gender in the source system is set to any other value, including Null, it will be loaded as M, and a line written to the Conversion Reject Report.

3       Test Design

Definition of the test case design technique and how to apply it.

3.1   Data conversion testing stage 1: review of field formats and uses.

Software life cycle stage: part of the requirements and design stages for the data migration

Testing methods: review / inspection, static analysis (supported by tools) for example data audit tools that  check relationships between data.

Purpose of the test: check that the data conversion design correct, identify records in the source system for data cleaning (e.g. adding information mandatory in the target database)

Description of the tests:

1.        Use static test methods (e.g. code inspection, static analysis tools) for the following:

§          Check that the field descriptions in the specification match the actual definitions in the code

§          Identify which fields require conversion, and which transfer with no change in value or format – an example of a field where the value is transferred unchanged is Surname

§          Identify which of the source database fields require a data conversion during the migration – an example is the Age field, which in the target database is replaced by the Age-band field. Analysis of the two fields reveals that in the source system the field may be empty (null) or hold a value between 0 and 120 (an age) or have incorrect values in some cases. The target field is defined as a 1 character alphabetic field.

§          Identify where the target database may require changes to accommodate source data – an example is the Title field. Error guessing and knowledge of the source system / real life tells us that it is possible for titles to be longer that 5 characters. Analysis of the title field in the source database reveals a range of titles such as “Major General” which is 13 characters including the space.

2.        In the specification review, check that the data conversion design correctly describes these data transfers, conversions and changes.

3.2   Data conversion testing stage 2: design of dynamic tests

Software life cycle stage: part of the dynamic testing stages for the data migration

Testing methods: dynamic running of the conversion, using test data designed using techniques from BS7925-2:, equivalence partitioning, random testing, syntax testing , boundary value analysis, plus error guessing.

Purpose of the test: check that the data conversion design is correct and that it has been correctly implemented.

4        Actual Test Cases

4.1   Equivalence class partitioning and boundary value analysis

 

Information

to be transferred

FSP1 fields description before data migration

FSP2 fields description before data migration

Age

Field name: Age

Field description: optional, numeric – integer – add to earlier

Field name: Ageband

Field description: Mandatory, ages banded as A= 0 to 17, B= 18 to 64, C=65 and over

Customer’s Surname

Field name: Surname

Field description: mandatory, 30 characters

Field name: Surname

Field description:  mandatory, 30 characters

The data migration specification states that if the age value is not within one of the bands, it will be set to D in the target system and a line written to the Conversion Reject Report.

Tests are drawn up as per BS 7925-2, with the following partitions and boundaries identified:

 

For Age:

§          Input partitions: 0-17, 18-64, 65 – maximum

§          Output partitions: A, B, C, D

§          Boundaries: 0, 18, 65, maximum age (check value)

§          Invalid input partitions: non-numeric, null, less than 0, above maximum age (not specified – need to check)

§          Invalid output partitions: E, null, numeric (not specified, based on tester’s perceptions of what might go wrong, this list may vary, as different testers may choose different examples)

For Customer Surname:

§          Boundaries: 1 character surname, 30 character surname

 

TC no.

1

2

3

4

5

Partition type

Output / target Age

Partition tested

A

B

C

D

E

Source value

7

56

72

160

200

Target value

A

B

C

D

D

Outcome

Load report

Load report

Load report

Reject report

Reject report

 

Records will always be loaded to the target system

 

TC no.

6

7

8

9

10

11

12

Partition type

Input /Source - Age

Partition tested

0-17

18-65

65-maximum

Over maximum

under 0

Null

Non-numeric

Source value

12

43

84

e.g. 145

-12

null

S

Target value

A

B

C

D

D

D

D

Outcome

Load report

Load report

Load report

Reject report

Reject report

Reject report

Reject report

 

Records will always be loaded to the target system

 

 

 

 

 

 

 

 

 

 

 

 

 

 

TC no.

13

14

15

16

17

18

19

20

21

22

23

24

Boundary tested

0

0

0

18

18

18

65

65

65

maximum – e.g. 120

Maximum e.g. 120

Maximum e.g. 120

Source value

-1

0

1

17

18

19

64

65

66

119

120

121

Target value

D

A

A

A

B

B

B

B

C

C

C

D

Outcome

Reject report

Load report

Load report

Load report

Load report

Load report

Load report

Load report

Load report

Load report

Load report

Reject report

Records will always be loaded to the target system

 

TC no.

25

26

27

28

29

30

Partition tested

1 char surname

1 char surname

1 char surname

30 char surname

29 chars

30 char surname

30 chars

30 char surname

31 chars

Source value

null

Y

TE

Fosdyke-Hamilton-Smythe-Jones

Foosdyke-Hamilton-Smythe-Jones

Fosdyke-StHamilton-Smythe-Jones

Target value

 

 

 

Fosdyke-Hamilton-Smythe-Jones

Foosdyke-Hamilton-Smythe-Jones

Fosdyke-StHamilton-Smythe-Jone

Outcome

Reject report

Load report

Load report

Load report

Load report

Reject report note truncated surname

 

Records will always be loaded to the target system

Note: These test cases could be minimised by testing more partitions in one test case; for example, test case 1 and test case 6 could be combined.

4.2   Random testing

Information

To be transferred

FSP1 fields description before data migration

FSP2 fields description before data migration

Country

Field name: No field

Field name: Country

Field description: mandatory, 30 characters

The data migration specification states that as FSP1 has only traded within the UK , all target records will have a Country value of “ UK ” added to them. There is no need to test other country values, as the source data has no country field. The data cannot be partitioned.  For this field, all data records are going to be treated in the same way. To test that the loaded records on the target database are set correctly to Country = UK , test data will be derived in two ways:

1)        All tests derived by other techniques will have a check that the country field has been correctly updated to UK as part of the conversion and load.

2)       A database of test data will be generated which reflects the operational profile of the live source database. A random selection of test source records will be converted and loaded, and the Country field checked to see that the value has been correctly updated to “ UK ”. The selection will be made by a random number generator, selecting by record number.

4.3   Syntax testing plus Error Guessing

Information

to be transferred

FSP1 fields description before data migration

FSP2 fields description before data migration

Customer’s Gender

Field name: Gender

Field description: Optional, 1 character, any alphabetic character, used to mark corporate customers (C), as well as M/F for male/female

Field name: Gender

Field description: Mandatory, M or F or N

The data migration specification states that:

§          If the gender in the source system is set to M it will be converted and loaded as M and a line written to the Conversion Load Report

§          If the gender in the source system is set to F it will be converted and loaded as F and a line written to the Conversion Load Report

§          If the gender in the source system is set to C this is a corporate customer not an individual and the details must not be loaded to the target Marketing system (because it will be converted to another system which holds corporate customers). It will be loaded to the Rejection File and a line will be written to the Conversion Reject Report.

§          If the gender in the source system is set to any other value, including Null, it will be loaded as M, and a line written to the Conversion Reject Report

The syntax of the source field is analysed as per the description of syntax testing in BS 7925-2, and mutations are drawn up to provide the set of tests.

The syntax is analysed as in BS7925-2. The notation is set out in the standard:

“Terminals are shown in quotation marks; these are the most elementary parts of the syntax - the actual characters that make up the input to the component.  | separates alternatives. [] surrounds an optional item, that is, one for which nothing is an alternative.  {} surrounds an item which may be iterated one or more times.” 

The syntax is set out from the system specifications as: Syntax of the source field: In the Source database, the gender field is specified to hold M, F, C, or null:

Gender            =              {Character}

Character        =              “A” | “B” … | “Z” | null

We can draw up a set of test cases for this syntax:

Test case

Syntax tested

Source value

Target value

Outcome

1

source M

M

M

Record written to target system, Gender set to M, line written to Load Report

2

source F

F

F

Record written to target system, Gender set to F, line written to Load Report

3

character

C

None – no record

Record written to reject database, line written to Reject Report, gender set to N

4

Null

Null

M

Record written to target system, Gender set to M, line written to Reject Report

We now consider mutations, and use the list of possible mutations in BS7925-2 Guidelines:

m1.      introduce an invalid value for an element;

m2.      substitute an element with another defined element;

m3.      miss out a defined element;

m4.      add an extra element. 

Test case

Mutation tested

Source value

Target value

Outcome

5

M1 – introduce invalid element

!

M

Record written to target system, Gender set to M, line written to Reject Report

6

m2. - substitute an element with another defined element

Null

M

Record written to target system, Gender set to M, line written to Reject Report

7

m3. - miss out defined element;

Null

M

Record written to target system, Gender set to M, line written to Reject Report

8

m4. - add an extra element.

MM

M

Record written to target system, Gender set to M, line written to Reject Report

Error guessing and exploration of the data in the source system by searching for other values show us that M, F, C and Null are not the only values to be found in the Gender field in the target system.  The pattern we find in the data is that, although we could enter any keyboard character, the range of values in the sources data are clustered around the C, F and M keys.  Discussion with the systems users reveals that they are using the Gender field to hold other information – D for a “Death/Probate” case and J for a Joint holder account.  Also, because the field is not validated on input by the source system, some values are mis-types.  Based on error guessing, we devise the following additional tests:

Test case

Error guess / other test

Source value

Target value

Outcome

4

Usercode

D

M

Record written to target system, Gender set to M, line written to Reject Report, noting Death case

5

Usercode

J

M

Record written to target system, Gender set to M, line written to Reject Report, noting Joint account

6

Mis-key

,

M

Record written to target system, Gender set to M, line written to Reject Report

7

Mis-key

.

M

Record written to target system, Gender set to M, line written to Reject Report

8

Space

“ “

M

Record written to target system, Gender set to M, line written to Reject Report

9

Digit

8

M

Record written to target system, Gender set to M, line written to Reject Report

10

OtherAlpha

H

M

Record written to target system, Gender set to M, line written to Reject Report

The outcome of the test is to demonstrate that all the cases where the gender value has been changed during conversion are printed out in order that customer services can either clean data before the conversion or investigate and update the target system post conversion

5       Implementation of Test Cases

The test cases would need to be implemented in a database / set of databases that match the live source and target databases as closely as possible.

6       Evaluation

 Evaluation of data conversion results can be carried out manually, or with tools support (data mining, database interrogation, data comparison tools, data audit tools).

7       Scope for Automation

The following tools may be useful in planning or executing conversion testing, or in analysing results of conversion testing:

§          Data comparison tools

§          Data mining tools

§          Data audit tools

§          Test data generation tools

§          Database interrogation tools

8       Conclusion/Summary

Testing of the data conversion itself is central to the wide range of tests required in a data migration.  Standard techniques may be used to test the field by field conversion.   Other tests will be required to cover the risks associated with a data migration.

Note: Further examples and guidelines have been suggested in comments received to cover the following issues:

§          Audit processes would be needed during the conversion

§          Conversion process validation would include user acceptance testing of received

§          Typically with a database to database conversion project there is only one chance to do the conversion; it must be done on a particular date to meet organisational or legislative requirements. For this reason one or more dry runs, dress rehearsals or dummy conversions would be needed

§          Data cleaning is often required as part of preparation of the source data for conversion and as preparation of the target database.  This may take place as an entry criterion for conversion testing to start, or before dry run as an ongoing process, or as an outcome of dry run.  However one of the risks of converting a live database is that the data is changing.  This means that although previously cleaned, anomalies may reoccur, or new anomalies may occur.  A risk assessment is needed to decide whether a data clean is always done before the live run, or whether data conversion testing is widened to include data values not currently in the live data, or whether the conversion software is written to be robust when it meets anomalous data

§          Problems can occur with database record keys, for example if records are split or combined during the conversion.  Unique keys may no longer be unique once the source data has been transferred to the target database.  In this example, key field problems and duplicate records are not covered; this will be done in a later example or the guidelines.

§          Some conversion design decisions would affect the performance of the conversion. For example, if mandatory and validated fields were converted first and immediately rejected if wrong, the conversion process may terminate more quickly than if fields are converted in definition order, and mandatory fields are converted late on.  This could be dealt with in review of the overall conversion design, by referring to priorities and rules for the conversion.

§          Other attributes of the conversion software and surrounding processes would need to be considered, for example: ease of movement or repositioning between different databases or systems, level of multiple positioning, level of effort in the movement, ease of definition or structure and relationship between entities, ease with which data acquires new traits or characteristics, level of interference, especially during adaptation of data.