Data Archiving Guide v1.0

Added on:  09/02/19     Updated on:  03/21/24

Introduction

The purpose of this guide is to familiarize administrative users with the process of archiving data and removing the archived data from the production database in order to keep the database at an optimal size, maintain server performance, and free hard disk space.

Intended Audience

This guide is intended for system administrators and DevOps engineers who need to perform the data archiving process or retrieve previously archived data.

Overview

Data Archiving Concept

As the application runs, transactional data accumulates in the database. Database growth may degrade server performance and require additional hard disk space (HDD space). To avoid these issues, we recommend using our data archiving solution. All transactional data included in the archiving process is divided into two groups:
  • data for short-term storage - processing and remittance related data that can be completely removed from the database within several months (usually 6 months);
  • data for long-term storage - data that must be maintained for a longer period (7-10 years).
Thus, the archiving process consists of two sub-processes: generation and export of the data for long-term storage in CSV format, and archiving and removal of the data for short-term storage. In both cases, the data does not have to remain in the database itself.
It should be noted that the archiving process doesn’t cover the following types of data that exist in UniPay:
  • data that doesn’t require storage in the database and is cleaned up automatically (data from the server logs and data related to system processes);
  • data that doesn’t have to be removed from the database (configuration data related to the entities created in the gateway: merchant, merchant account, processing profile, etc.).
Functional comparison of short-term and long-term data exporting processes

Definition
  • Long-term data export: used for long-term storage of the key data that may be required during an audit or for statistical analysis.
  • Short-term data archiving: used for short-term storage of the data (after its removal from the database).
Compatibility
  • Long-term data export: compatible across all releases.
  • Short-term data archiving: may be incompatible across releases.
Source (database tables)
  • Long-term data export: charge_transaction, chargeback_transaction.
  • Short-term data archiving: database table groups*:
    010_charge_transaction.jm
    020_token_transaction.jm
    030_chargeback_transaction.jm
    040_account_update_transaction.jm
    150_file_store.jm
    210_merchant_statement_reconciliation.jm
    220_merchant_statement_deposit.jm
    230_merchant_financial_statistics.jm
Format
  • Long-term data export: CSV.
  • Short-term data archiving: SQL script.
Usage
  • Long-term data export: audit, statistical reporting.
  • Short-term data archiving: data restitution back into the database.
Data removal
  • Long-term data export: not required to remove data from the database.
  • Short-term data archiving: required to delete all exported records from the database.

*The full list of database tables related to each group is available in Appendix 1.

The data archiving solution is implemented using the Jailer tool, which generates the scripts necessary to remove the data from the database, as well as to reinsert it back if needed. The generated delete scripts can then be applied to remove unnecessary data at any time. Additionally, we recommend using the Percona XtraBackup tool to create a backup of your MySQL database while the system is running.

Terminology

Short-term data - processing and remittance related data that can be archived and completely removed from the database within several months (usually 6 months).
The archiving mechanism involves pre-validation logic that evaluates the data selected for archiving and deletion. Data related to realtime and batch transaction cycles that have not been closed yet is ignored; in this case, it is necessary to complete processing and restart the archiving process.
Long-term data - transaction-related data that must be maintained in the database for 7-10 years.
Archiving node - an SQL server with a copy of the production database, used for data archiving and delete scripts generation. It can be either a separate node not connected to the UniPay cluster or a sandbox SQL server.
Data archiving iteration - a repeated cycle during which the data related to fixed regular intervals is archived: long-term data is partitioned by month; short-term data is partitioned by month, two weeks, week, or day, depending on the GENERATION_PARTITIONING_CYCLE parameter set in the configuration file.
Database table group - each database table containing data for short-term storage is associated with one of the following groups: charge transaction, token transaction, chargeback transaction, account update transaction, file store, merchant statement reconciliation, merchant statement deposit, merchant financial statistics.
Delete scripts - SQL scripts generated with the Jailer tool and placed into unitools/archiving/delete-scripts. Delete scripts are applied in the production environment to remove unnecessary data for short-term storage.
Replace scripts - SQL scripts generated for each short-term data archiving iteration and placed into "unitools/archiving/short-term". You can run replace scripts to reinsert data for a particular period back into the production database after removal.
Jailer - an open-source tool that allows removing obsolete data without violating data integrity. Jailer is used to generate delete and replace scripts based on the pre-defined data model.
Data model - database structure supported by Jailer.
Extraction Model - set of the database table groups involved in the data archiving process.


Preliminary Steps


Hardware and software requirements

To ensure data integrity in the production environment, the data archiving process and delete scripts generation are performed on a copy of the database. Thus, to execute the data archiving process, you must perform the following preliminary steps:
1) Configure the Archiving node and the production server appropriately.

Minimal hardware requirements

Random Access Memory (RAM)
  • Archiving node: 16 GB
  • Production DB node: 32-64 GB
Free disk space (HDD/SSD)
  • Archiving node: 200 GB
  • Production DB node: must exceed the volume of the largest database table that is going to be archived

Minimal software requirements

Archiving node
  • OS: CentOS 7
  • JVM 8
  • MySQL Server
  • Percona XtraBackup
  • git
  • zip/unzip
  • SMTP client**
Production DB node
  • git
  • zip/unzip
  • SMTP client**
  • Percona Toolkit (v3.0.3, released 2017-05-18)*


*It is required to install Percona Toolkit according to the manufacturer’s instructions if you are going to optimize tables with the script.
To run pt-online-schema-change, grant the unipaya user the PROCESS and REPLICATION SLAVE privileges using the following commands:
grant PROCESS on *.* to unipaya@'%';
grant REPLICATION SLAVE on *.* to unipaya@'%';
flush privileges;
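
For reference, table optimization with Percona Toolkit rebuilds a table in place. A hypothetical manual invocation might look like the following (the archiving script normally runs the tool for you; the database and table names are examples):

pt-online-schema-change --alter "ENGINE=InnoDB" D=unipay,t=charge_transaction --execute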

**To receive email notifications at all stages of the archiving process, you should configure the Mutt SMTP client. Follow these instructions to set up Mutt. Your configuration file settings should be similar to the following:

~/.muttrc file content:
set header_cache = ~/.mutt/cache/headers
set message_cachedir = ~/.mutt/cache/bodies
set smtp_url = "smtp://10.0.3.104:587/"
set smtp_pass = ""
set ssl_starttls = no

To make sure that you have configured Mutt correctly, run the following command:
echo "Test message" | mutt -s "Test subject" youremail@yourserver.com
2) Run a MySQL instance on the Archiving node and create a copy of the production database using Percona XtraBackup.
Please note that database replication must be disabled on this server.
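
A minimal XtraBackup flow might look like the following sketch (paths are placeholders; connection credentials are taken from .my.cnf):

# On the production DB node: take a backup and prepare it
xtrabackup --backup --target-dir=/data/backup
xtrabackup --prepare --target-dir=/data/backup
# Transfer /data/backup to the Archiving node, restore it into an empty
# MySQL data directory, and start MySQL with replication disabled:
xtrabackup --copy-back --target-dir=/data/backup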

3) Download the Unitools repository using the git clone command.
Use the private SSH key for the UniPay repository to access Unitools.
Please contact UT support to get the link to the repository.

Configuration file parameters

4) Set the data archiving parameters in the configuration file.
Copy the unitools/archiving/run/run_template file and save it as run.conf.
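For example (assuming run.conf is kept next to run.sh in the run directory):

cp unitools/archiving/run/run_template unitools/archiving/run/run.conf

Open the run.conf file and set the following parameters: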

General parameters

Required

  • TASK - selects the archiving mode depending on the process stage:
    • generate-script - archives data and generates delete scripts on the Archiving node;
    • delete-data - removes the archived data from the production database using the delete scripts;
    • optimize-table - optimizes tables that have been purged;
    • restore-data - reinserts the selected data back into the production database after its deletion;
    • delete-data|optimize-table - removes the archived data from the production database and then optimizes the purged tables.

  • DATABASE - UniPay database name. Default: unipay

    DATABASE="unipay"

  • EMAIL settings - the email addresses of the archiving notification recipients, separated with commas:

    EMAIL_TO="recipient@mail.com,recipient2@mail.com"
    EMAIL_FROM="sender@mail.com"

    Optional

  • BLOCK_EXECUTION - allows stopping the archiving process after the current iteration. Default value: 0.

    BLOCK_EXECUTION=0

  • ACTIVITY_TIME_RANGE - defines the time period during which the archiving process will run on the server every day until all of the iterations have completed. By default, this parameter is commented out in the run_template.conf file. If the activity time range is not defined, the archiving process will run from beginning to end without interruptions.
    The activity time range is set in hh:mm-hh:mm format (beginning and completion time separated with a dash).
    For example, if ACTIVITY_TIME_RANGE=03:00-05:00, the archiving process will be started automatically at 3 am and stopped at 5 am (server time) every day, until all of the iterations have completed.
    When the completion time is reached, the archiving process is not interrupted until the current iteration finishes.
    We recommend scheduling archiving for periods of minimal activity on your servers to optimize the server load.

Generate-script task parameters (required)

Note that there are no default values for these parameters.


  • GENERATION_RETENTION_PERIOD - the period of time for which the data must remain in the database, calculated from the date when the data archiving process is launched. Note that this parameter cannot be less than 12 months; otherwise, the archiving process will be interrupted during the validation phase.
    Possible values: an integer equal to or greater than 12, or an archiving beginning date in YYYY-MM-DD format (must be 12 or more months before the current date).

    For example, if you want to keep the transactional data for the last year in the database and delete the records that are older than 12 months:

    GENERATION_RETENTION_PERIOD=12
    or
    GENERATION_RETENTION_PERIOD=2016-09-25

  • GENERATION_PARTITIONING_CYCLE - defines how short-term data will be divided for iterative archiving and deletion: M - monthly, B - biweekly, W - weekly, D - daily.
    GENERATION_PARTITIONING_CYCLE should be chosen based on your monthly transaction volume estimates:
    • less than 1 million transactions per month: monthly
    • 1-2 million transactions per month: biweekly
    • 2-4 million transactions per month: weekly
    • more than 4 million transactions per month: daily
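
    Putting the required generate-script parameters together, a run.conf might look like this (values are illustrative):

    TASK="generate-script"
    DATABASE="unipay"
    EMAIL_TO="dba@example.com"
    EMAIL_FROM="archiving@example.com"
    GENERATION_RETENTION_PERIOD=12
    GENERATION_PARTITIONING_CYCLE=W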

Delete-script task parameters (required)
  • DELETION_TRANSACTION_ENABLED - defines how the deletion script is executed. Possible values: 1 - all SQL statements are aggregated and executed as a single transaction; 0 (default) - each SQL statement is executed as a separate transaction.

    DELETION_TRANSACTION_ENABLED=0

    Note that enabling deletion transaction aggregation increases the server load, because the information about each delete script execution is stored in cache until the whole process is completed, to make a rollback possible in case the deletion fails at some stage. Thus, if you set the parameter to "1", the deletion process will require more memory than step-by-step deletion. If you are using Percona Server for MySQL, it is recommended to set DELETION_TRANSACTION_ENABLED=1.
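
    Conceptually, the two modes differ as in the following sketch (table names and record IDs are hypothetical):

    -- DELETION_TRANSACTION_ENABLED=1: all statements run as one atomic transaction,
    -- so a failure at any point can be rolled back
    START TRANSACTION;
    DELETE FROM charge_transaction WHERE id IN (1001, 1002);
    DELETE FROM chargeback_transaction WHERE id IN (2001);
    COMMIT;

    -- DELETION_TRANSACTION_ENABLED=0: each statement commits on its own
    DELETE FROM charge_transaction WHERE id IN (1001, 1002);
    DELETE FROM chargeback_transaction WHERE id IN (2001);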

Optimize-table task parameters
  • OPTIMIZATION_SPACE_LIMIT - tables whose free space (in MB) exceeds this value will be optimized.
    Default value: 100 MB

    OPTIMIZATION_SPACE_LIMIT=100

  • OPTIMIZATION_TABLE_LIST - the list of tables that will be optimized, separated with commas.
    If the parameter is left as a comment (#OPTIMIZATION_TABLE_LIST=""), the list of tables will be calculated automatically using the OPTIMIZATION_SPACE_LIMIT parameter.

Restore-data task parameters
  • RESTORATION_START_DATE - sets the beginning date, after which the data will be restored in the database, in YYYY-MM format.

    RESTORATION_START_DATE=2016-03

    5) Create a .my.cnf file in the home folder of the OS user. The file must contain the following parameters:

    [client]
    user=[unipaya]
    password=[password]**
    host=[host]*
    port=[port]*

    *database connection settings
    **If the database user password is not specified in this file, you will have to enter it manually during each shell script execution after getting this message:

    No password for mysql is found in file /home/{foldername}/.my.cnf. Enter it manually:

    If you enter an incorrect password, you will get the following message:

    Authentication to MySQL failed. Try to re-enter password:

    6) Grant permission for run.sh script execution by running the following command in the Linux console on the Archiving node and the production server:

    chmod +x run/run.sh


    Data Archiving Steps


    The data archiving process is performed in three stages:
    1) Data extraction and archiving, and delete scripts generation with Jailer: this stage takes place on the Archiving node.

    2) Removing unnecessary data from the production database by running the scripts generated at the previous stage.

    3) Optimizing tables after data purging.

    Archiving Node: data archiving and delete scripts generation


    To initiate the archiving process, you should do the following:

    1) Set the TASK parameter in run.conf file:

    TASK="generate-script"

    2) Indicate other required parameters for this task execution;
    3) Run the following script in Linux console on the Archiving node:

    ./run.sh


    4) We recommend saving the output generated within this process on a separate hard drive or server for further storage, including:
    unitools/archiving/data
    unitools/archiving/delete-scripts
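
    For example, the output could be copied to a separate storage server like this (the host name and destination path are placeholders):

    rsync -av unitools/archiving/data archive-backup:/archive/unipay/
    rsync -av unitools/archiving/delete-scripts archive-backup:/archive/unipay/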
    The initial stage of the process involves:

    1) Extraction of the data that is going to be archived from the database copy.
    The data is extracted from the database tables for further archiving based on the extraction model applied.

    2) Short-term data archiving and long-term data export.
    Both of these sub-processes are iterative.

    Long-term data is split into one-month periods when exported. At the moment, this is a default setting that cannot be changed. The set of fields contained in the charge_transaction and chargeback_transaction tables that require long-term storage is extracted and exported to unitools/archiving/data/long-term. The exported data is stored in zipped CSV files grouped by year, source database table, and month. The zipped CSV files can then be moved to a separate hard drive for further storage.
    Short-term data archiving is split into iterations as indicated in the GENERATION_PARTITIONING_CYCLE parameter in the configuration file.
    Thus, data for short-term storage is iteratively extracted from the relevant database tables and placed in unitools/archiving/data/short-term in the form of replace scripts. Replace scripts are split by release (data may be incompatible across releases), year, month, and week/two weeks/day (if GENERATION_PARTITIONING_CYCLE is set to weekly, biweekly, or daily).
    3) Delete scripts generation with Jailer.
    Delete scripts are generated with the Jailer tool and saved in unitools/archiving/delete-scripts. These scripts can be applied to remove unnecessary data in the production environment at any time.
    It takes approximately half an hour to archive each million transactions. Note that one million transactions corresponds to approximately four million database records, since both parent and child tables are purged.
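    For example, under this estimate, archiving a backlog of 12 million transactions (roughly 48 million database records) would take about 6 hours.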

    Should any issues occur in this process, you will receive an email notification with the root cause and further instructions.

    If there is a need to terminate the archiving process before completion, you can use the BLOCK_EXECUTION parameter to stop the archiving process after the current iteration.

    Note that the data archiving process can also be executed on the live database from the very beginning, bypassing the data archiving and scripts generation performed on the database copy. In this case, the generate-script task will also delete the exported data from the database.

    Production environment: removing unnecessary data from the database

    The archived data is deleted from the production database in three steps:
    • Removing constraints from the database tables that are going to be purged.
    • Running delete-scripts.
    • Optimizing the tables that have been purged.

    To initiate the process you should do the following:
    1) Configure database connection.
    2) Check out the code from the unitools repository using the link provided by UT support during the preliminary steps.
    For example:

    git clone --progress --recursive --branch V7.2 -v "ssh://git@phabricator.unitedthinkers.com:2200/diffusion/41/unitools.git" unitools
    cd unitools
    git checkout V7.2
    git pull

    3) Copy the unitools/archiving/run/run_template.conf file and save it as run.conf.
    4) Set the TASK parameter in the run.conf file:

  • to delete unnecessary data:

    TASK="delete-data"

  • to optimize database tables after purging:

    TASK="optimize-table"

  • to perform both data deletion and table optimization in one step:

    TASK="delete-data|optimize-table"

    5) Set additional parameters (for optimizing tables only).

    6) Copy delete-scripts folder from the Archiving node and paste it into the archiving directory on the production server.
    7) Run the following script to initiate the deletion/optimization process:

    ./run.sh


    It takes approximately half an hour to remove each million transactions. Note that one million transactions corresponds to approximately four million database records, since both parent and child tables are purged.

    Taking into account that the overall volume of data that is going to be removed may be large, we recommend running the delete statements when the server load is low: for example, on weekends.

    Retrieving archived data or inserting it back into the database


    In certain cases, there is a need to retrieve the details related to one or several specific transactions. Depending on the ultimate objective, you can choose between two options:

    1) Retrieving the database record from a zipped CSV file with the data for long-term storage.

    For this purpose, you should locate the relevant file based on the transaction date in unitools/archiving/data/long-term.
    2) Inserting the relevant records back into the database using replace scripts.

    For this purpose, you should do the following:
    1) Set the TASK parameter in run.conf file:

    TASK="restore-data"

    2) Set the restoration beginning date.
    3) Run the following script to initiate the restoration process:

    ./run.sh
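
    Combined, the relevant run.conf lines for this task would look like this (the date is illustrative):

    TASK="restore-data"
    RESTORATION_START_DATE=2016-03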

    Please note that short-term data is incompatible across application releases.
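
    For option 1, a record can often be located without restoring anything by searching inside the relevant zipped CSV file. A sketch using the example file name from Appendix 1 and a hypothetical transaction ID (the exact path depends on how the files are grouped by year):

    unzip -p unitools/archiving/data/long-term/charge_transaction_201801_20190609.zip | grep "TX-1234567"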

    Data Archiving Implementation Details


    The data archiving process is automated; however, it involves the following series of actions:
    1) Email notification upon process initiation.
    2) System settings verification and validation.
    Validation script checks whether the system is configured correctly to perform the data archiving process:
    • server capacity check, server free disc space check;
    • verification of the installed software;
    • configuration file verification:
      • the GENERATION_RETENTION_PERIOD parameter cannot be less than 12 months;
      • GENERATION_PARTITIONING_CYCLE must be defined;
    • MySQL connection check;
    • email server connection check;
    • custom checks: UniPay database availability; replication verification (replication must be disabled on the Archiving node).
    3) Generation of the file for sensitive data storage (MySQL users and passwords).
    4) Generation of the necessary procedures and database tables.
    5) Data model verification: the data model generated on the database copy must match the data model stored in the Archiving/data-model subdirectory. This verification is essential because archiving with Jailer can be completed only on the database with the predefined structure. The archiving process will be terminated if data models do not match.
    6) Creation of the database copy on the backup node.
    7) Filling in missing key dates in database records with fix scripts (charge_transaction.TRANSACTION_DATE, batch.PROCESSED_DATE, retail_transaction_cycle.PROCESSED_DATE).
    8) Generation of the data for long-term storage.
    9) Generation and deletion of the data for short-term storage.
    9.1. Calculation of the database table groups and of the date until which the data will remain in the database.
    9.2. Archiving of the data within each database table group:
    • Calculation of the dates and cycles for archiving;
    • Creation of the necessary procedures for the current group archiving: 010_archive_data__***.sql, 020_generate_check__***.sql
    • Each short-term data archiving iteration includes the following actions:
      • Running the archive_data procedure to verify whether there is any data available for archiving. If there is no data for archiving, the process is interrupted.
      • Generation of check scripts for verification after the Jailer run: the generate_check procedure is run twice, for delete and keep checks.
      • Short-term data and delete scripts generation with Jailer.
      • Closing the iteration and transferring the resulting files into the delete-scripts directory.
      • Running delete_jailer_tmp_data.sql to purge Jailer temporary tables.
      • Switching to the next time period for archiving.
    Note: All steps making up the short-term data archiving iteration subprocess are repeated for each iteration.
    10) Email notification upon archiving process completion.

    Monitoring and Troubleshooting


    During the data archiving process, you will receive email notifications of three types:

    1) Notification upon the process initiation.

    This notification contains the server parameters, the process start time, and a list of the database table groups that will be archived.

    2) Error notifications if an issue occurred and the process has been interrupted.

    This notification contains the issue details: description, time, log data, etc.

    3) Notification of successful process completion.

    This notification contains the server parameters, the process execution time, and the volume of the data that has been archived.

    Note that if an error occurred during the archiving process, you won’t receive a notification of this type, since the process cannot be completed without additional steps aimed to resolve the issue.

    Below you can find the description of the most typical issues that may arise during the archiving process and troubleshooting steps:

    Validation phase

    Issue: Any of the required software products is missing on the server.
    Notification message: link
    Resolution: Install the missing software products and restart the process.

    Issue: A configuration file parameter is set to an unsupported value.
    Notification message (example): TASK's value is not supported. Possible values are [generate-script, delete-data, optimize-table, delete-data|optimize-table, restore-data]
    Resolution: Correct the values of the parameters in the configuration file and restart the process.

    Issue: The data model generated on the database copy doesn't match the data model stored in the unitools/archiving/data-model subdirectory.
    Notification message:
    Resolution: Contact support.

    Issue: MySQL connection error.
    Notification message:
    Resolution: Verify the MySQL connection settings and restart the process.

    Issue: The SQL user password indicated in the .my.cnf file is incorrect.
    Notification message: ERROR 1045 (28000): Access denied for user 'root'@'127.0.0.1' (using password: YES)
    Resolution: Verify and modify the database user password specified in the .my.cnf file and restart the process.

    Issue: The UniPay database is missing on the Archiving node.
    Notification message:
    Resolution: Make sure that you have created a copy of the production database on the Archiving node and restart the process.

    Data archiving and delete scripts generation phase

    Issue: The volume of the data included in one short-term data archiving iteration exceeds one million transactions.
    Notification message:
    Resolution: Decrease the time period indicated in the GENERATION_PARTITIONING_CYCLE parameter and restart the process.

    Issue: There is not enough disk space available on the Archiving node.
    Notification message:
    Resolution: Identify the file(s) that caused the problem by executing the df command in the Linux console. Increase free HDD space and restart the process. Note that we do not recommend deleting any files from the unitools/archiving directory.

    Issue: The archiving process has been interrupted.
    Notification message:
    Resolution: Contact support.

    Issue: Communication failure.
    Notification message:
    Resolution: Contact support.

    Issue: Check scripts failed.
    Notification message:
    Resolution: Contact support.

    Issue: Insufficient RAM for running the Jailer utility.
    Notification message:
    Resolution: Contact support.

    Issue: The name of the file with archived data coincides with the name of a previously generated file.
    Notification message:
    Resolution: Contact support. Note that the new file will be saved with a name starting with the prefix new_; however, the process will be interrupted.

    Removing data from the production database

    Issue: Server parameters contained in the delete script do not match the parameters of the server where the script has been started (iapp_settings table).
    Notification message: Script can't be applied, it was generated for a different server build. Verify Profile Name and Type
    Resolution: Make sure that you are applying the delete script on the relevant database server and restart the process.

    Issue: There is not enough disk space available on the production database node.
    Notification message:
    Resolution: Identify the file(s) that caused the problem by executing the df command in the Linux console. Increase free HDD space and restart the removal process.

    Issue: Communication failure.
    Notification message:
    Resolution: Restore the connection and restart the process.

    Issue: The data removal process has been interrupted.
    Notification message:
    Resolution: Restart the process.

    Note: more detailed instructions will be provided in this section.

    Appendix 1: Archiving Directory Structure


    The archiving directory is a root directory located in the unitools repository. It contains the installation and configuration files, scripts, and procedures necessary to complete the data archiving process. The content of the directory is updated with each application update or patch.

    The archiving directory involves static and dynamic components.

    The static content of the archiving directory includes two directories: script and run.

    To configure the system and launch the archiving process, refer to the latter.

    The run directory contains:
  • run_template - template of the configuration file used to set the parameters required for the data archiving process: GENERATION_RETENTION_PERIOD, GENERATION_PARTITIONING_CYCLE, email settings, TASK.

  • run.sh - the initial data archiving script.

    The script directory contains the scripts and procedures necessary to perform all phases of the archiving process, including SQL scripts, shell scripts, and scripts for running the Jailer application. It consists of the following subdirectories:

    1) SQL subdirectory comprising:
    • 1000_initialization folder: contains SQL scripts that create the following table and views:
      • 1001_create_table__data_archive.sql - creates the temporary table where the calculated IDs of the records that are going to be archived and removed will be saved.
      • Views for long-term data storage:
        • 1002_create_view__view_chargeback_transaction_data.sql
        • 1003_create_view__view_charge_transaction_data.sql
    • 2000_preprocessing folder: contains a number of fix scripts that verify and set key dates (if missing) in the database records that are going to be archived. There is a separate application and verification script for each table; the name of the database table that is going to be fixed is indicated in the name of the script.
    For example, 2101_apply_fix__iapp_merchant_statement.sql sets the missing dates for the records contained in the iapp_merchant_statement table, and 2100_verify_fix__iapp_merchant_statement.sql verifies that all key dates are set for the records contained in the iapp_merchant_statement table.
    • 3000_processing folder containing 5 subdirectories corresponding to 5 groups of database tables that are going to be archived. Each of them contains a set of three scripts:
      • [subdirectory ID]_archive_data__[short-term data group name].sql - SQL script creating a procedure that calculates the IDs of the database records that are going to be archived and their total count.
      • [subdirectory ID]_generate_check__[short-term data group name].sql - SQL script creating a verification procedure that checks whether the records with the IDs defined in the previous step have been deleted.
      • [subdirectory ID]_calculate_limit_date__[short-term data group name].sql - SQL script that selects the record with the earliest date out of all records in the database group. This date is used as the archiving process beginning date for this particular group.
    • utils - additional SQL procedures for archiving.
    2) jailer - Jailer Java archive, scripts related to starting and running Jailer.
    3) data-model - database model used for archiving.
    4) extraction-model - groups of short-term database tables that are going to be archived.
    5) core - shell scripts and related configurations that make up the mechanism of archiving:
    • lib - folder containing a set of system functions.
    • config - a folder containing current project configurations.
    For example,
    • messages_en.conf - configuration file that sets messages (string constants) that are used in logs and email notifications.
    • lib.conf, log.conf, project.conf - system configuration files.
    • series of shell scripts corresponding to the archiving process phases (or Modules)

    Part of the archiving directory content is generated dynamically during the data archiving process, including the following directories: data, log, work, and delete-scripts.

    The data directory contains:
    1) data for long-term storage in zipped CSV files.
    File names consist of the following parts:
    • long-term data group ID
    • long-term data group name
    • year and month of the records contained in the file (in YYYYMM format)
    • archiving start date (in YYYYMMDD format)

    For example, charge_transaction_201801_20190609.zip.
    2) replace SQL scripts that can be used to restore the deleted short-term data in the database.
    Replace scripts are divided by release, data group, year, and data partitioning cycle (monthly, biweekly, weekly, or daily).
    File names consist of the following parts:
    • short-term data group ID
    • short-term data group name
    • the time range of the records contained in the file (in YYYYMMDD_YYYYMMDD format) depending on the data partitioning cycle defined in the configuration file.

    For example, 010_transaction_20180107_20180101.zip.
    The log directory contains the run.log file with the validation process log data.

    The work directory contains temporary files generated during the archiving process. Once an iteration is completed, its files are moved to the data or delete-scripts directory. In addition, the work directory includes a log.txt file with information about the duration and status of each archiving phase, iteration, and action.

    The archiving process consists of the following phases (or MODULES):

    Module (process phase) - Description - Shell script applied during the phase
    • VALIDATION - system validation and verification - validate-project.sh
    • PREPARE_FOLDER - creating directories necessary to perform short-term data archiving - prepare_folders.sh
    • DATAMODEL - generating the data model on the database copy - validate_datamodel.sh
    • DB_DUMP - creating the database dump (as a result, the db-dump-structure.sql file in the work directory and the data/short-term subdirectory are generated) - create_db_dump_structure.sh
    • DB_INIT - initialization: creating the data_archive SQL table and the views for export of the data for long-term storage - do_initialization.sh
    • FIX_SCRIPTS - applying fix scripts - do_preprocessing.sh
    • SHORT_TERM - short-term data iterative archiving and delete scripts generation - generate_short_term_data.sh
    • LONG_TERM - long-term data iterative export - generate_short_term_data.sh
    • DELETE_DATA - deleting the archived data - delete-data.sh
    • OPTIMIZE_TABLE - database tables optimization - optimize-table.sh
    • RESTORE_DATA - restoring previously archived and deleted data - restore-data.sh

    Each phase may involve a number of iterations containing one or a series of actions of the following types:
    1) PING - MySQL host availability check;
    2) FILE_ACTION - running a script from a file;
    3) RUN_JAVA - running a Java program;
    4) RUN_SQL_FILE - running an SQL script from a file;
    5) RUN_SQL_QUERY - running an SQL query;
    6) ACTION - any other operation that cannot be attributed to any of the above-mentioned types;
    7) PROCESS_BEGIN/PROCESS_END - indicate the beginning and end of the process;
    8) PHASE_BEGIN/PHASE_END - indicate the beginning and end of each phase;
    9) ITERATION_BEGIN/ITERATION_END - indicate the beginning and end of each iteration;
    10) RUN_PT_QUERY - running database tables optimization;
    11) RUN_SQL_PROCEDURE - running a MySQL procedure;
    12) FILE_EXISTS - checking whether a file or directory exists;
    13) EMAIL - sending email notifications.


    Note: if an error occurred during an operation execution, ‘_ERROR’ is added to the action type name. For example, ACTION_ERROR.

    The delete-scripts directory contains a set of SQL removal scripts in zipped files generated during the archiving process.

    Appendix 2: Archiving Data Groups and Tables