Data archiving is a form of information storage that takes control of inactive data—identifying it and moving it to long-term storage. With data archiving, valuable data is accessible, but resides outside of active storage where costs and threats are higher. Archival storage is optimized for efficient data ingestion and minimal data access, focusing on capacity over performance.
Lower-cost storage, often with read-only settings, provides a safe, secure way to store inactive data that must be retained for regulatory or operational requirements. For security, data archives are typically configured so that files can be read, but not changed.
Although this data will rarely be referenced again, it needs to be accessible. Indexing is a crucial part of effective data archiving that simplifies and expedites data storage and retrieval. A well-indexed archive has an ordered list of headings that point to relevant information, including files, emails, and database records.
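As a rough illustration of the indexing idea, the sketch below builds a small in-memory index that maps title keywords to archived items. The record fields (`item_id`, `kind`, `title`) and the function names are hypothetical and chosen only for this example; real archiving products maintain much richer indexes.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercase title word to the set of item IDs that contain it."""
    index = defaultdict(set)
    for record in records:
        for word in record["title"].lower().split():
            index[word].add(record["item_id"])
    return index

def search(index, term):
    """Return the IDs of matching items in sorted order."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical archived items: a file, an email, and a database record.
records = [
    {"item_id": "f-001", "kind": "file", "title": "Q3 budget report"},
    {"item_id": "e-014", "kind": "email", "title": "Budget approval thread"},
    {"item_id": "d-220", "kind": "db_record", "title": "Customer invoice 2019"},
]

index = build_index(records)
matches = search(index, "budget")
```

Here `matches` holds the IDs of the file and the email, found without scanning the archived content itself.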
Data Archiving Benefits
The primary benefits of data archiving are reduced storage costs and optimization of storage resources. Other operational and security advantages of data archiving include:
- Optimal performance of storage systems—resources can focus on active data
- Faster time to back up—inactive files are no longer bogging down data storage systems
- Enhanced data protection—data archiving moves information to a more protected environment and reduces the data attack surface in production storage systems
- Improved regulatory and operational compliance—sensitive data can be stored with higher security and limited access
- Easier access to data—strategic data archiving includes a robust indexing system that reduces search time, increases search efficiency, and allows updates to be made more quickly
Data Archiving Tools
A data archiving tool combines files into a single archive file or a library of archive files. It simplifies the movement of files and helps organize the storage of what are often massive amounts of data.
File archive tools often include features that govern read and write functions, while using data compression to reduce the size of the archive.
Basic tools record rudimentary metadata, such as the names and sizes of the original files. More advanced data archiving tools capture and store additional metadata, including timestamps, file attributes, and access control rules.
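To make this concrete, here is a minimal sketch that uses Python's standard `tarfile` module to combine files into one compressed archive and then read back basic metadata. The helper names (`create_archive`, `list_metadata`) and the sample files are assumptions made for this example, not part of any particular archiving product.

```python
import tarfile
import tempfile
from pathlib import Path

def create_archive(source_dir, archive_path):
    """Bundle every file under source_dir into one gzip-compressed tar file."""
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so the archive is portable.
                tar.add(path, arcname=path.relative_to(source_dir).as_posix())

def list_metadata(archive_path):
    """Return (name, size in bytes, modification time) for each archived file."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [(m.name, m.size, m.mtime) for m in tar.getmembers() if m.isfile()]

# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "inactive"
    src.mkdir()
    (src / "report.txt").write_text("quarterly report")
    (src / "notes.txt").write_text("meeting notes")

    archive = Path(tmp) / "inactive.tar.gz"
    create_archive(src, archive)
    entries = list_metadata(archive)
```

The tar format also records permissions and ownership for each member, which corresponds to the additional metadata that more advanced tools capture.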
Data archiving tools should include these core capabilities:
- Granularity—defined as the level of detail within a data structure. Within a data archive, granularity facilitates and expedites searches by including information such as:
  - File author—i.e., the person or system that created the file
  - Data sources—e.g., business applications (ERP, HCM, CRM, BPM, etc.), document management applications, file servers
  - Data format—e.g., documents, email, PDFs, images, log files
- Flexibility—meaning the ability to support many platforms, especially with regard to writing to and extracting from media.
  - Data types—e.g., Microsoft® Office applications, Adobe® Creative Suite files, data from social media, system files
  - Storage media—e.g., DBMS, tape, disk
- Overall storage optimization—which focuses on reducing data in production storage systems while ensuring the accessibility of archived data.
  - Duplicates—automate the identification and removal of copies of the same file
  - Unused and unchanged data—use rules and processes to automatically move this information to a data archive rather than continuing to include it in costly production storage and backups
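The two storage optimization items above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: duplicates are detected by comparing SHA-256 content hashes, and "unused" is approximated by a last-modified time older than a threshold. The function names and the 365-day threshold are hypothetical.

```python
import hashlib
import os
import tempfile
import time
from collections import defaultdict
from pathlib import Path

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    """Group files under root by content hash; return groups of 2+ files."""
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path.relative_to(root).as_posix())
    return [names for names in groups.values() if len(names) > 1]

def find_inactive(root, max_age_days, now=None):
    """List files whose last modification is older than max_age_days."""
    root = Path(root)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )

# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("same content")
    (root / "b.txt").write_text("same content")
    (root / "c.txt").write_text("different content")
    # Backdate c.txt by 400 days to simulate a file nobody has touched.
    old = time.time() - 400 * 86400
    os.utime(root / "c.txt", (old, old))

    duplicates = find_duplicates(root)
    inactive = find_inactive(root, max_age_days=365)
```

In practice, a rule like `find_inactive` would feed an automated job that moves the listed files to the archive, and access time or application-level usage data might be a better signal than modification time.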
Data Archiving vs. Data Backup
Although the terms are often used interchangeably, data archiving and data backup serve distinct purposes.
| Data Archiving | Data Backup |
|---|---|
| Data archiving is used for long-term retention and optimization of production storage systems. Archives store data on systems that are not as fast or accessible as data backup systems, making them a less expensive storage option. A data archive is meant to provide a permanent record of important information or data that must be retained to meet regulatory requirements. | Data backup is used for resiliency to enable swift recovery from data corruption, hardware failure, or a ransomware attack. According to defined processes, data is backed up to readily available, often premium systems. Users are able to access the latest version of a file or an earlier version, because backups are layered in storage. |
Data Archiving Dos and Don’ts
The importance of data archives continues to grow with the increasing volume and velocity of data creation. Correctly implementing a data archive system and related processes should be a priority. The following are a few things to do, and not to do, when archiving data.
What to do for data archiving
- Create distinct systems for backup and data archiving.
- Establish data storage policies that define what data should be stored and where it is archived.
- Automate data lifecycle management to avoid the inefficiencies and errors of manual archiving.
What not to do for data archiving
- Do not construct a monolithic archive that can become a single point of failure and risk the loss of all archived data.
- Do not define or follow data retention policies that fail to consider data type, storage requirements, and the length of time data should be stored.
- Do not ease up on security for data archives, especially with cloud storage.
Creating a Data Archiving Strategy
Moving data from Tier 1 storage to an archive system has clear and substantial benefits, but must be done with care. The time spent developing a robust strategy is well worth the investment and will provide control of data.
Before embarking on a data archiving strategy, consider your objectives, which could include reducing costs, increasing security for data that requires retention for regulatory purposes, or optimizing production storage systems’ performance. These objectives will inform the structure and priorities of your plan.
Vital steps in developing a successful data archiving strategy include the following:
- Gain a complete understanding of all the data that needs to be archived by conducting an inventory exercise to determine the types and amount of data that should be included in the data archive.
- Identify and include all stakeholders—do not forget a representative from the legal team.
- Define a retention schedule for all archived data that takes into account data categories as well as timing for retention and destruction—especially for data subject to regulatory requirements.
- Determine how much data will be archived initially as well as the anticipated rate of growth. This informs the requirements for ingestion and bandwidth as well as the suitability of media (e.g., cloud, tape, disk).
- Consider how frequently the archived data will be accessed and for what reason. For instance, if the data could be used as part of e-discovery, certain reporting functionality would be required.
- Beware of vendor lock-in. Data should be accessible not only for reading, but also for migration.
- Understand the staffing requirements and costs to support the data archive. This is primarily determined by the level of automation.
- Develop a robust security protocol that protects the data at rest, with rules for who can access files, under what circumstances, and when and by whom files can be extracted.
- Ensure that the planning includes outreach and engagement with all stakeholders—management, technical, business, legal, and users—for a detailed needs assessment.
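Two of the steps above, defining a retention schedule and estimating growth, lend themselves to a short worked sketch. Everything specific here is an assumption for illustration: the category names, the retention periods, the 20% growth rate, and the helper names. Actual retention periods depend on the regulations and policies that apply to your organization.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class RetentionRule:
    category: str
    retain_years: int

# Hypothetical schedule: categories and periods are placeholders only.
SCHEDULE = {
    rule.category: rule
    for rule in [
        RetentionRule("financial_records", 7),
        RetentionRule("project_files", 3),
    ]
}

def destruction_date(category, archived_on):
    """Return the earliest date an item in this category may be destroyed."""
    target_year = archived_on.year + SCHEDULE[category].retain_years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        return date(target_year, 3, 1)

def projected_size_tb(initial_tb, annual_growth_rate, years):
    """Compound-growth estimate of archive size after the given years."""
    return initial_tb * (1 + annual_growth_rate) ** years

due = destruction_date("financial_records", date(2020, 6, 15))
size = projected_size_tb(10.0, 0.20, 3)
```

With these placeholder numbers, a financial record archived on 15 June 2020 becomes eligible for destruction on 15 June 2027, and a 10 TB archive growing 20% per year reaches roughly 17.3 TB after three years. Estimates like this help size ingestion bandwidth and compare storage media.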
Get Started with Data Archiving
A holistic approach to data archiving that includes all constituents and takes into account their different needs will ensure smooth deployment and broad adoption. Always remember that behind the strategy and technology are people who have to use the systems. Technology will do the work, processes will set the guardrails, but fitting into people’s workflows without disruption makes data archiving an effective and valuable part of an organization’s infrastructure.
Egnyte has experts ready to answer your questions. For more than a decade, Egnyte has helped more than 16,000 customers with millions of users worldwide.