Empa - DigitalScience - Storage, Backup and Security

Storage, Backup, and Security

1 Storage of Active Data

A key aspect of a data management plan is a storage strategy for active data and archival data. Data are easily lost, digital files are fragile, and file formats and storage media quickly become obsolete.

Active or working data are the research data that are collected and accessed during the course of a project. The datasets grow as new data continue to be collected, so the data must be accessed regularly for processing and analysis. An important part of data management planning is deciding where and how the active data will be stored so that they are readily accessible for processing but also secure. The following issues should be considered:

Anticipated size of datasets
Plan the required CPU, memory and disk space capacity over the lifetime of the project. This ensures that the systems used can scale with the expected requirements.
Computational requirements: large-scale analyses may require high-speed processors, high network bandwidth and a substantial amount of disk space
Backup
Security, together with risk management
- How reliable are the systems used?
- Skills required for handling the data and the systems
Data classification
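As a rough illustration of the disk space part of capacity planning, the following Python sketch estimates the storage a project might need by its end. It assumes linear data growth; the function name, the `copies` factor (for example for processed data or backup generations) and the 30 % headroom default are hypothetical choices for illustration, not Empa figures.

```python
def projected_storage_gb(initial_gb: float, monthly_growth_gb: float,
                         months: int, copies: int = 1,
                         headroom: float = 0.3) -> float:
    """Estimate disk space (GB) needed at the end of a project.

    Hypothetical model: data grow linearly by `monthly_growth_gb` per month,
    `copies` multiplies the volume (e.g. processed data or backup copies),
    and `headroom` adds a safety margin as a fraction of the total.
    """
    if initial_gb < 0 or monthly_growth_gb < 0 or months < 0:
        raise ValueError("sizes and durations must be non-negative")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Volume of a single copy at the end of the project
    final_single_copy = initial_gb + monthly_growth_gb * months
    # Multiply by the number of copies, then add the safety margin
    return final_single_copy * copies * (1 + headroom)


if __name__ == "__main__":
    # 200 GB at the start, +50 GB per month, 36 months, two copies
    print(projected_storage_gb(200, 50, 36, copies=2))
```

For example, a project starting with 200 GB and adding 50 GB per month over 36 months, with two copies and 30 % headroom, would need roughly 5200 GB. Real growth is often not linear, so such an estimate should be revisited as the project progresses.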

2 Backup

It must be evaluated which data are to be backed up and which minimum requirements the backup must meet:

Which data must be saved (raw data, processed data, results)?
Is a backup of the latest state sufficient (recovery), or are multiple generations needed (versioning, undoing changes)?
Can the data be retrieved with alternative procedures?
Is sufficient storage capacity available or additional storage needed?
Who is responsible for backup and restore?
How is the data restored in case of an incident?
How long will the data be retained and preserved?
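To illustrate the difference between backing up only the latest state and keeping generations, here is a minimal Python sketch of a retention rule in the style of a grandfather-father-son scheme. The policy itself (keep the 7 most recent backups plus the newest backup of each of the last 4 ISO calendar weeks) and the function name are hypothetical; the actual generation scheme for Empa infrastructure is defined by its documented backup procedures.

```python
from datetime import date, timedelta


def generations_to_keep(backup_dates, daily=7, weekly=4):
    """Return the set of backup dates to retain (hypothetical policy).

    Keeps the `daily` most recent backups plus the newest backup from each
    of the `weekly` most recent ISO weeks that contain a backup.
    """
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(ordered[:daily])  # most recent generations
    seen_weeks = set()
    for d in ordered:
        week = tuple(d.isocalendar()[:2])  # (ISO year, ISO week number)
        if week in seen_weeks:
            continue  # an older backup from a week already covered
        seen_weeks.add(week)
        if len(seen_weeks) > weekly:
            break  # enough weekly generations retained
        keep.add(d)  # newest backup of this week
    return keep


if __name__ == "__main__":
    # 60 consecutive daily backups ending on Sunday, 31 March 2024
    backups = [date(2024, 3, 31) - timedelta(days=i) for i in range(60)]
    for d in sorted(generations_to_keep(backups)):
        print(d.isoformat())
```

With 60 daily backups ending on 31 March 2024, this rule keeps 10 generations: 25 to 31 March plus 24, 17 and 10 March. Keeping more generations allows older changes to be undone, at the cost of additional storage capacity.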

Documented procedures (3.4.5 Backup and Restore) exist for central services such as network shared storage, the FTP server and private cloud-based solutions (PolyBox, SharePoint). A generation-based backup has been set up for Empa-owned infrastructure. No standardized solutions exist for backing up decentralized or external data; these cases must be assessed individually.

3 Security

The MHB-3.4.4 Regulations for the use of informatics at Empa (RUI) set out general guidelines for the proper and secure use of information technology systems. Their objective is to optimize the availability of information technology resources for teaching, research and service purposes and to ensure the integrity and confidentiality of all processed and stored information. In addition, the regulations provide directives concerning the misuse of information and the consequences of such misuse.

Responsibility for data security must be clarified for each project, and compliance with the requirements must be ensured. For central IT services, standardized procedures exist to manage access authorizations and protect data integrity. These cover hardware, software, networks and storage[1].

4 Data Classification

The directive MHB “2.3.21 Directive for the Classification of Data at Empa” determines how Empa data are to be classified and processed. Any classification assigned to data generated by third parties remains independent of the specifications of this directive.

There are other classification features to consider:

Relevance of the data
Primary data, secondary data, personal data (MHB 2.3.18, /group/mhb/2.3.18-umgang-mit-dokumenten-und-daten)
Form (non-electronic data, electronic data)
Confidentiality (public data, classified data, authorization area)
Storage (offline, online internal, external, e.g. CSCS)
Capacity (transport and storage, lifetime)

[1] If the data are encrypted, key management must be regulated: Who holds the key? How is the key kept safe? Is the key still available when data have to be restored from a backup?