Data Integrity Specialization Report

CS5529 - Software Engineering


Brad Brueske
Ben Gagne
Ramesh Kizhappali
Matt Larson
Kristy VanHornweder



General Description
Influence on other values of software engineering
Project Decisions
Past Experiences
Summary



General Description

Data integrity is an important value to consider in order to produce a good software product. It refers to ensuring that a body of data in a software component satisfies the consistency conditions and constraints imposed by the particular software problem being solved. Simply put, data integrity is the validity of data. To achieve data integrity, one often restricts the ability of other components to modify a particular software component's data. Encapsulation, a feature of many object-oriented languages, provides a way to achieve this goal.
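As an illustration of how encapsulation can protect a constraint, here is a minimal sketch in Python. The class StockItem, its fields, and the rule that a quantity may never be negative are all hypothetical and chosen only for the example. Note that Python's leading underscore is a naming convention rather than an enforced access restriction, so this only approximates what a language with true private members provides.

```python
class StockItem:
    """Hypothetical component whose quantity must never be negative."""

    def __init__(self, name, quantity=0):
        if quantity < 0:
            raise ValueError("quantity must not be negative")
        self._name = name          # leading underscore: internal by convention
        self._quantity = quantity

    @property
    def quantity(self):
        # Read-only view: there is no setter, so assignment is rejected.
        return self._quantity

    def remove(self, amount):
        # The only way to lower the quantity, and it checks the constraint.
        if amount < 0 or amount > self._quantity:
            raise ValueError("invalid amount")
        self._quantity -= amount


item = StockItem("bolts", 5)
item.remove(3)
print(item.quantity)  # 2
```

Other components can read the quantity, but every change passes through a method that checks the constraint first.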

Data integrity is often analogous to "fitness for use," which means that data must be appropriate for a particular application. Data that is considered appropriate for one use may not be appropriate for another. Different users have different needs, so it is important that the data satisfies those needs. Since the value of data depends on its uses, it may be meaningless until it is placed into some context.

There are many ways for data to go wrong. The value of the data may be correct but easy to misinterpret. When users have no part in ensuring data integrity, their ability to judge the reasonableness of data is lost. Even though data in a system may be accurate, it may not be useful if it is not timely enough; data can easily become outdated. Data that is correct within separate divisions may still not be fit for use if those divisions are meant to be combined yet their data is incompatible. If the data is accurate and timely but is inconsistent or has missing portions, it could be of little use. For data that has multiple dimensions, the data may be satisfactory for most of the dimensions but insufficient for a few that are critical. Many computer systems also use soft data, which is data that is not inherently verifiable. Concerns like these explain the increasing awareness of the importance of data integrity.

Data integrity can be compromised in a number of ways:

There are several ways to protect the integrity of data:


Some applications in which data integrity is of importance are databases, operating systems, networks, and data structures. One issue that is currently of great concern is the Year 2000 problem. If software stores the year attached to a person's information in two digits (e.g., 96), problems will result when the information involves a date after 1999. The software will take '00' to mean 1900, there will be no information for that year, and the person will probably be regarded as never having been born! When the software was created, the developers did not take this into consideration and therefore never wrote code that maintained the integrity of its data. They were probably thinking only of the current time and not of what might happen in the future. Since computer science is a young discipline, all software has been developed in the 1900s, and software developers have never needed to consider the change of a century (or millennium). Also, when the software was developed, memory was expensive, so it was cheaper to store years in two digits rather than in four.
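The arithmetic behind the Year 2000 problem can be shown with a small sketch in Python. The function names are hypothetical and the age calculation is only one example of a computation that breaks; real systems fail in many other ways.

```python
def age_from_two_digit_years(birth_yy, current_yy):
    """Age computed the way software storing two-digit years would do it."""
    return current_yy - birth_yy


def age_from_four_digit_years(birth_year, current_year):
    """Same calculation with the century stored explicitly."""
    return current_year - birth_year


# Someone born in 1996, checked in 1999 and then in 2000.
print(age_from_two_digit_years(96, 99))       # 3
print(age_from_two_digit_years(96, 0))        # -96
print(age_from_four_digit_years(1996, 2000))  # 4
```

With two-digit years, the person appears to have an age of -96 in the year 2000, which is why such a record might be treated as belonging to someone not yet born.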


Influence on other values of software engineering

Ensuring data integrity adds to the complexity of a software product and affects other values of software engineering. It strengthens the robustness and correctness of a product, since preventing human errors and maintaining valid data are essential to software quality. Coupling becomes an issue when objects of one class need to deal with data in another class, because it makes the methods involved more difficult to write. Response time may be longer because of the additional methods needed to ensure data integrity. Cost may also rise because more resources need to be allocated to implement the extra methods.

More importance should be given to data integrity, and it should be considered early in the development process to prevent unnecessary problems later. This would make testing and debugging easier. If it is not addressed early, debugging becomes more difficult because problems do not arise until the implementation phase. Likewise, coding takes longer because the data integrity issues first appear at that phase. If more time were spent in planning and designing, it would eliminate some problems that otherwise would not appear until the coding and testing phases.

Data integrity also has an impact on maintenance. After the product is released, users who find something displeasing in the data might request that the software be upgraded. If the problem had been considered earlier, the user might never have seen it, since it would already have been taken care of. Giving more consideration to data integrity would prevent problems that surface after the product has been released. However, spending more time on data integrity in the early phases results in a delay in the delivery time of a product. There are many trade-offs involving the issue of data integrity, so an appropriate balance must be maintained among all the values of software engineering.


Project Decisions

Data integrity is an important consideration in the PERT chart project. This project involves graphically representing data in a sequential and sometimes parallel fashion. If a portion of the data is inaccurate, the entire representation would be incorrect. Many tasks in the PERT chart are dependent on one another. If one of the tasks is incorrect, or creates a loop, the entire PERT chart is compromised. A major concern, therefore, is to avoid cycles: if the PERT chart contained a cycle, the chart would not be correct and the duration times would be undefined.

During the planning and design phase, several issues of data integrity were considered:


Several methods require that certain preconditions be satisfied in order to maintain data integrity. The task creation and prerequisite methods have preconditions that avoid cycles. Setting duration times is only allowed when tasks are simple. Removing a task is only allowed when the task has no prerequisite tasks. The methods isSimple, hasPrereqs, isAncestor, and isPrerequisite are provided so that callers can check that the preconditions for the other methods are met. If a precondition is false, the modification is simply not allowed to occur. Preconditions are important in maintaining data integrity and keeping the PERT chart system accurate.
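The way such preconditions can prevent cycles is sketched below in Python. This is not the project's actual code: the class PertChart, its internal representation, and the method signatures are assumptions made for illustration, with names adapted from hasPrereqs, isAncestor, and isPrerequisite. The sketch also assumes that a rejected modification returns False rather than raising an error, and it leaves out isSimple and durations entirely.

```python
class PertChart:
    """Hypothetical task graph that refuses changes violating a precondition."""

    def __init__(self):
        self._prereqs = {}  # task name -> set of direct prerequisite names

    def add_task(self, name):
        if name in self._prereqs:
            return False  # duplicate task names are not allowed
        self._prereqs[name] = set()
        return True

    def has_prereqs(self, task):
        return bool(self._prereqs[task])

    def is_prerequisite(self, prereq, task):
        return prereq in self._prereqs[task]

    def is_ancestor(self, ancestor, task):
        """True if `ancestor` is reachable from `task` via prerequisite links."""
        stack = list(self._prereqs[task])
        seen = set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self._prereqs[current])
        return False

    def add_prerequisite(self, task, prereq):
        # Precondition: the new link must not close a cycle.
        if task == prereq or self.is_ancestor(task, prereq):
            return False
        self._prereqs[task].add(prereq)
        return True

    def remove_task(self, task):
        # Precondition from the report: the task has no prerequisite tasks.
        if self.has_prereqs(task):
            return False
        del self._prereqs[task]
        for deps in self._prereqs.values():
            deps.discard(task)  # assumption: also drop dangling references
        return True


chart = PertChart()
for name in ("design", "code", "test"):
    chart.add_task(name)
chart.add_prerequisite("code", "design")
chart.add_prerequisite("test", "code")
print(chart.add_prerequisite("design", "test"))  # False
```

The last call is refused because "design" is already an ancestor of "test", so adding the link would create a cycle.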


Past Experiences

1. I had taken an intro class in C++ where we had to design a dating game. The problem called for matching a male and a female depending on whether the number of their common interests exceeded a certain threshold. The data integrity issue we had to consider here was that we were not allowed to modify the persons' data, such as age, sex, and telephone number. We achieved this through encapsulation by providing getter methods for whenever we wanted that data. The data members were declared private, which prevented them from being modified by the client. They were initialized by the constructors of the class when the object was created.

2. I just lost all the data on my hard drive because it crashed. I did not do much in the way of protecting the integrity of my data. In the first place, I did not make any backups of the data. I did have virus protection software, but that didn't do any good when my hard drive stopped working. I should have backed up my data; then I would not need to retype all the work that was lost (including a 200-page book). I learned my lesson there.

3. Networks must have ways of protecting data integrity. I can give some examples of how the integrity of data is protected on the network I work on (and on networks in general). For one thing, the network has a server that backs up all data every day, so data can sometimes be retrieved if it is accidentally deleted. The network also has virus protection software. This is needed especially when people are transferring data between floppy disks and different computers.

4. On most of my past programming assignments I would need to incorporate data checking (robustness). This makes sure the user does not enter invalid data that would cause the program to crash. That was actually protecting data integrity, which I did not realize at the time.

5. In a 5000-level class, we implemented various data structures such as binary trees, red-black trees, and graphs. They had to satisfy certain constraints in order to keep the structures in balance. It is always a concern with binary trees that the structure could degenerate into a linked list, so that searching for an item would take longer. Red-black trees had to follow many constraints, including rotating a part of the structure when an element was added or deleted to keep it in balance. It was important to satisfy these constraints in order to keep the data balanced within its represented structure.
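The concern in the last item, that a binary search tree without balancing constraints can degenerate into a linked list, can be seen in a short Python sketch. The code below is not from the class assignment; it is a plain, unbalanced tree written only to compare heights, and the helper names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain binary search tree insert with no rebalancing."""
    new_node = Node(key)
    if root is None:
        return new_node
    current = root
    while True:
        if key < current.key:
            if current.left is None:
                current.left = new_node
                return root
            current = current.left
        else:
            if current.right is None:
                current.right = new_node
                return root
            current = current.right


def height(root):
    """Number of nodes on the longest path from the root to a leaf."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


print(height(build([1, 2, 3, 4, 5, 6, 7])))  # 7
print(height(build([4, 2, 6, 1, 3, 5, 7])))  # 3
```

The same seven keys give a height of 7 when inserted in sorted order, which is effectively a linked list, and a height of 3 when the insertion order happens to keep the tree balanced. Structures such as red-black trees enforce their constraints so that the result does not depend on insertion order in this way.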


Summary

Working with data is the core of a software engineering project. The data must satisfy certain constraints in order to produce a good software product. In the PERT chart project, it was essential to preserve data integrity so that the chart representation would contain no cycles and the data would be accurate. Data integrity also has an influence on other values of software engineering. It is important to keep a balance between all these values to ensure that the product is a success. Data integrity is important and should be one of the major considerations in a software project.

