What is Data Integrity and Why is it Important?
At its most basic level, data integrity is the reliability and trustworthiness of data throughout its lifecycle. This includes creation or acquisition, transfers, storage, backup, and archiving or destruction. To have data integrity, there must be validation that the data has not been corrupted or compromised, whether by human error or by malicious actions.
Data integrity describes both the state of data (i.e., valid or invalid) and the process for achieving the valid state with tactics such as error checking and anomaly detection. Because data is the linchpin of so many of the world’s activities (e.g., business, leisure, health, education, government), data integrity is vital: the ripple effect of inaccurate data can have significant consequences.
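As a minimal illustration of the error-checking side of data integrity, the sketch below records a SHA-256 digest of some data when it is created and recomputes it later to detect corruption during storage or transfer. It uses Python's standard `hashlib` module; the sample record and the helper names `sha256_digest` and `verify` are hypothetical. A plain digest stored next to the data catches accidental corruption, but anyone able to alter the data can usually recompute the digest too, so protection against deliberate tampering would need a keyed check such as an HMAC or a digital signature.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """Return True if the data still matches the digest recorded earlier."""
    return sha256_digest(data) == expected_digest


# Hypothetical record; the digest is computed once and stored or sent with it.
original = b"patient_id=1042,dose_mg=50"
recorded = sha256_digest(original)

intact = b"patient_id=1042,dose_mg=50"
corrupted = b"patient_id=1042,dose_mg=500"  # one extra character

print(verify(intact, recorded))     # True
print(verify(corrupted, recorded))  # False
```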
Types of Data Integrity
- Physical
Safeguards hard copies and digital files during storage and retrieval, especially in a disaster (e.g., flooding, fire, power outages)
- Entity
Focused on the characteristics of the tables used to store and connect data in relational databases, chiefly that every row is identified by a primary key that is unique and never null
- Domain
In a database, refers to the acceptable values that a column may contain (e.g., constraints on data type, format, or the amount of data entered)
- Referential
A set of rules, typically enforced with foreign keys, for how data is stored and used across related tables to ensure consistency and accuracy, prevent duplication, and prohibit the entry of data that does not apply (e.g., a record pointing to a row that does not exist)
- User-defined
Rules and restrictions created by a user to meet their specific requirements
- Logical
Protects data from corruption while it is in use in relational databases
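Several of these types map directly onto database constraints. The following sketch, assuming Python's built-in `sqlite3` module and two hypothetical tables (`customers` and `orders`), uses a primary key for entity integrity, a `CHECK` constraint for domain integrity, and a foreign key for referential integrity; a user-defined rule could be written as another `CHECK`. Physical and logical integrity depend on storage, backup, and access controls and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,                -- entity: unique row identity
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customers(customer_id),  -- referential: must exist
    quantity    INTEGER NOT NULL
                CHECK (quantity BETWEEN 1 AND 1000) -- domain: allowed range
);
""")


def try_insert(sql: str, params: tuple) -> bool:
    """Run one INSERT; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "new customer": try_insert("INSERT INTO customers VALUES (?, ?)", (1, "a@example.com")),
    "duplicate key": try_insert("INSERT INTO customers VALUES (?, ?)", (1, "b@example.com")),
    "valid order": try_insert("INSERT INTO orders VALUES (?, ?, ?)", (10, 1, 5)),
    "unknown customer": try_insert("INSERT INTO orders VALUES (?, ?, ?)", (11, 99, 5)),
    "zero quantity": try_insert("INSERT INTO orders VALUES (?, ?, ?)", (12, 1, 0)),
}
for case, accepted in results.items():
    print(f"{case}: {'accepted' if accepted else 'rejected'}")
```

Pushing these rules into the schema means every application writing to the database is held to the same checks, rather than each one re-implementing them.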
Threats to Data Integrity
- Human error
- Formatting inconsistencies
- Collection errors
- Data breaches
What is Data Integrity?
Data integrity is the process of maintaining and ensuring the accuracy, reliability, and consistency of data throughout the data lifecycle, using practices that cover cybersecurity, physical safety, and database management. The characteristics that determine the reliability of information in terms of its physical and logical validity are also part of data integrity.
As noted above, there are several types of data integrity; the two most pertinent here are physical and logical. Physical data integrity focuses on how data is stored to protect it from security breaches or disasters. Logical data integrity relates to how data is protected from users’ human error to prevent data corruption.
Six Attributes of Data Integrity
- 1. Accuracy
- The degree to which the data entry correctly describes the object
- May be identified as a single version of established truth
- A reference provides a means for identifying the deviation of a data item
- Includes all data items that accurately reflect the characteristics of real-world objects within allowed specifications
- 2. Completeness
- Comprehensiveness of available data, measured as a proportion of the entire data set
- Ability to address specific information requirements
- A percentage defined based on specific variables and business rules
Note: The percentage of completeness goes down in the absence of data
- 3. Consistency
- Represents the absence of differences between the data items representing the same objects
- Data may be compared for consistency within the same database or against other data sets of similar specifications
- Discrete measurement can be used as an assessment of data quality
- May be measured as a percentage of data that reflect the same information as intended for the entire data set
- 4. Timeliness
- Degree to which the data is up-to-date and available within an acceptable time frame, timeline, and duration
- The time of occurrence serves as the reference, and timeliness is assessed on a continuous basis
- Value and accuracy of data may decay over time
- 5. Uniqueness
- Discrete measure of duplication of identified data items within a data set
- Equals 100 percent when every data item in the data set is unique (i.e., there are no duplicates)
- 6. Validity
- The conformity to allowable type, range, format, and other preferred attributes for data
- Measured as the percentage of valid data items in the available data set
- Validity of data encompasses the relationships between data items that can be traced and connected to other data sources for validation purposes
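Several of these attributes are described as percentages. The sketch below computes completeness, uniqueness, validity, and consistency for a small, made-up set of customer records in plain Python. The record fields, the email pattern, the `billing_countries` reference data, and the helper names are hypothetical, and each formula is one reasonable reading of the definitions above rather than a standard measure; real rules would come from an organization's own business requirements.

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ana@example.com", "country": "PT"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "li@example", "country": "CN"},
    {"id": 3, "email": "li@example", "country": "CN"},  # duplicate of id 3
    {"id": 4, "email": "sam@example.com", "country": None},
]
FIELDS = ("id", "email", "country")

# Illustrative rule only, far looser than a full email specification.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE)

# Hypothetical second data set (e.g., a billing system) keyed by id.
billing_countries = {1: "PT", 2: "MX", 3: "CN"}


def is_valid_email(value):
    return bool(EMAIL_RE.match(value))


def completeness(rows, fields):
    """Percentage of expected field values that are present (rows must be non-empty)."""
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return 100 * filled / (len(rows) * len(fields))


def uniqueness(rows, key):
    """Percentage of rows with a distinct key; 100 means no duplicates."""
    return 100 * len({r[key] for r in rows}) / len(rows)


def validity(rows, field, rule):
    """Percentage of present values in `field` that satisfy `rule`."""
    present = [r[field] for r in rows if r[field] is not None]
    return 100 * sum(1 for v in present if rule(v)) / len(present)


def consistency(rows, reference, key, field):
    """Percentage of rows whose `field` matches the reference data for the same key."""
    comparable = [r for r in rows if r[key] in reference]
    agree = sum(1 for r in comparable if r[field] == reference[r[key]])
    return 100 * agree / len(comparable)


print(f"completeness: {completeness(records, FIELDS):.1f}%")  # 86.7%
print(f"uniqueness:   {uniqueness(records, 'id'):.1f}%")      # 80.0%
print(f"validity:     {validity(records, 'email', is_valid_email):.1f}%")  # 50.0%
print(f"consistency:  {consistency(records, billing_countries, 'id', 'country'):.1f}%")  # 75.0%
```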
Why is Data Integrity Important?
- Eliminates excess storage used for old, inaccurate, or redundant data
- Enables reliable analysis and resulting reporting to support informed decisions
- Ensures the quality of products and services
- Helps control data access
- Improves performance by minimizing or removing incomplete records and eliminating duplicate records
- Increases users’ trust and confidence in the data
- Protects users’ privacy
- Provides the framework that protects data throughout its lifecycle
- Supports positive customer experiences that are personalized with quality data
How to Enable Data Integrity
Understanding why data integrity matters is important, but the real benefit comes from enabling it. Here are five ways to do so:
- 1. Dedupe Data
Put processes in place to identify and remove duplicate data on a regular basis. This not only strengthens data integrity but also reduces storage costs and improves overall performance.
- 2. Start with Data Quality
Ensure that data is accurate, complete, and meets quality standards, starting at the design stage. Create processes that guide high-quality data collection and generation, rather than trying to undo data errors downstream. Evaluate processes regularly to identify any gaps or areas for improvement.
- 3. Take Time for Data Entry Training
It is often assumed that people understand the importance of data entry and know how to do it correctly. That assumption is risky, because data integrity starts at the point of entry.
Level-setting and training go a long way towards improving data quality. Data entry training can reduce the time spent on remediation, as well as the skewed or inaccurate analytics that poor-quality data produces.
- 4. Update the Data Regularly
Data integrity is enhanced with frequent updates. Whether updates are done in real time or in scheduled windows, they are crucial to keeping data fresh and relevant, thereby improving overall data integrity.
- 5. Validate Data
Diligently validating data input and checking for errors limits the spread of human error and its negative impact on data integrity. Putting validation checks in place helps maintain data integrity even for data from unknown sources, because errors or anomalies are detected early.
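Validation at the point of entry and deduplication can be combined in a single ingestion step. In the following sketch, each incoming contact record is normalized (extra whitespace collapsed, email lower-cased), checked against simple entry rules, and then compared with records already accepted so that formatting differences do not hide duplicates. The field names, the email pattern, the helper functions, and the choice of email as the deduplication key are all hypothetical assumptions for illustration.

```python
import re

# Hypothetical entry rule; real rules come from the organization's requirements.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE)


def normalize(record):
    """Collapse extra whitespace and lower-case the email so formatting
    differences do not hide duplicates."""
    return {
        "name": " ".join((record.get("name") or "").split()),
        "email": (record.get("email") or "").strip().lower(),
    }


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record["name"]:
        problems.append("name is required")
    if not EMAIL_RE.match(record["email"]):
        problems.append("email is malformed")
    return problems


def ingest(raw_records):
    """Validate each record on entry, then keep only the first record per email."""
    accepted, rejected, seen = [], [], set()
    for raw in raw_records:
        record = normalize(raw)
        problems = validate(record)
        if problems:
            rejected.append((raw, problems))
        elif record["email"] in seen:
            rejected.append((raw, ["duplicate email"]))
        else:
            seen.add(record["email"])
            accepted.append(record)
    return accepted, rejected


incoming = [
    {"name": "Ana Silva", "email": "ana@example.com"},
    {"name": "  Ana   Silva ", "email": " ANA@Example.com "},  # same person, messy formatting
    {"name": "", "email": "nobody@example.com"},
    {"name": "Li Wei", "email": "li.wei@example"},
]
accepted, rejected = ingest(incoming)
for record in accepted:
    print("accepted:", record)
for raw, problems in rejected:
    print("rejected:", raw, problems)
```

Keeping the first record per normalized key is the simplest deduplication policy; real data often calls for fuzzy matching or a rule that merges fields from several duplicates instead.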
Organization-wide Benefits from Data Integrity
Because data drives so much of what an organization does, data integrity is not just a nice-to-have. It is a critical part of any organization’s infrastructure and must be treated accordingly.
Done well, data integrity initiatives ensure that information is searchable, traceable, trustworthy, and usable to the highest standards. Maintaining data integrity across the organization results in better insights, lower expenses, and increased efficiency.
Egnyte has experts ready to answer your questions. For more than a decade, Egnyte has helped more than 16,000 customers with millions of users worldwide.
Last Updated: 26th July, 2021