Creating Delta Files for Import
This topic describes how to import only new or changed reference data records into Anthology Reach. Such records are identified by comparing the current source file with the file from which data was previously imported. The Sample - Import – Create Delta File flow then creates a temporary file that contains only the new or changed records, and this file is imported.
Prerequisites
- By default, the Sample - Import – Create Delta File flow is not enabled. When it is enabled, it is triggered at 04:00 hours UTC by default; administrators can change this time. Ensure that there is sufficient time between the execution time of this flow and that of Sample - FileImport-Reference Data, the flow responsible for importing reference data, which runs daily at 07:00 hours UTC.
- The framework described in this topic can be used only with source files that have the .csv extension. To work with .xlsx files, save them in .csv format and then perform the steps in this topic.
- This framework compares the current source file with its previous version. Hence, source files must be named in the <Filename>_<yyyymmdd>_<hhmmss>.csv format, and the <Filename> portion must be identical in all versions of the same file. For example:
- The source file imported yesterday is Import_SummerAdmissions_20201101_083000.csv.
- The file imported today is Import_SummerAdmissions_20201102_083000.csv.
- If the name of the source file includes a unique string and is in the <Filename>_<unique string>_<yyyymmdd>_<hhmmss>.csv format, the value of the unique string can differ in each version of the source file. Before importing data from such source files, add the doesFileContainUniqueString parameter to the Call File Compare Service step in the Sample – FileImport – Create Delta File flow and set its value to True.
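The naming rules above can be illustrated with a minimal Python sketch that parses both naming formats and picks the previous version of a dropped file. It assumes that the previous version is the most recent earlier file with the same <Filename> portion; how the flow actually selects that file is not described in this topic. parse_source_name, previous_version, and the contains_unique_string argument are hypothetical names; the argument only mirrors the intent of the doesFileContainUniqueString parameter.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# <Filename>_<yyyymmdd>_<hhmmss>.csv
PLAIN_NAME = re.compile(r"^(?P<base>.+)_(?P<date>\d{8})_(?P<time>\d{6})\.csv$")
# <Filename>_<unique string>_<yyyymmdd>_<hhmmss>.csv
# (assumption: the unique string itself contains no underscore)
UNIQUE_NAME = re.compile(
    r"^(?P<base>.+)_(?P<unique>[^_]+)_(?P<date>\d{8})_(?P<time>\d{6})\.csv$"
)


class SourceFile(NamedTuple):
    name: str        # full file name
    base: str        # the <Filename> portion that must match across versions
    stamp: datetime  # parsed from <yyyymmdd>_<hhmmss>


def parse_source_name(name: str, contains_unique_string: bool = False) -> Optional[SourceFile]:
    """Return the parsed name, or None if it does not follow the expected format."""
    pattern = UNIQUE_NAME if contains_unique_string else PLAIN_NAME
    match = pattern.match(name)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    except ValueError:  # digits present but not a real date/time, e.g. month 13
        return None
    return SourceFile(name, match["base"], stamp)


def previous_version(
    new_name: str, candidates: Iterable[str], contains_unique_string: bool = False
) -> Optional[str]:
    """Return the newest candidate with the same <Filename> that is older than new_name."""
    new = parse_source_name(new_name, contains_unique_string)
    if new is None:
        raise ValueError(f"unexpected file name: {new_name}")
    older = [
        parsed
        for parsed in (parse_source_name(c, contains_unique_string) for c in candidates)
        if parsed is not None and parsed.base == new.base and parsed.stamp < new.stamp
    ]
    return max(older, key=lambda f: f.stamp).name if older else None
```

For example, previous_version("Import_SummerAdmissions_20201102_083000.csv", ["Import_SummerAdmissions_20201101_083000.csv"]) returns the 20201101 file. Without contains_unique_string=True, a name such as Import_X_a1_20201101_083000.csv keeps a1 inside the <Filename> portion, so it never matches a later version with a different unique string, which illustrates why such files need separate handling.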
The following folders are used to process the file:
- referenceData-delta_drop
- referenceData-delta_processed
- referenceData-delta_archive
To Start the Import
- In Microsoft Azure, drop the source file in the referenceData-delta_drop folder.
- If a previous file with which it can be compared is not available, the file will be moved automatically to the following folders:
- referenceData-delta_processed
- referenceData-drop – From this point onward, the regular import procedure will be performed.
- If a previous file is available, it is compared with the new source file. After processing, the new source file, the previous source file, and the new temporary file are moved to the following folders:
- referenceData-delta_processed – The new source file will be moved here.
- referenceData-delta_archive – The previous source file will be moved here.
- referenceData-drop – The file that includes new or changed information will be moved here, with the suffix -delta added to its name, for example, <file name>-delta.csv.
- Default import functionality will trigger on the file in the referenceData-drop folder. For more information, see File Import Overview.
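The comparison and output steps above can be approximated with a short Python sketch. It treats a record as new or changed when its entire row does not appear in the previous file, and it requires both versions to share the same header; both rules are assumptions, since this topic does not describe how the Call File Compare Service matches records (for example, by a key column). compute_delta and write_delta_file are hypothetical names; only the <file name>-delta.csv naming comes from the steps above.

```python
import csv
import io
from pathlib import Path
from typing import List, Optional, Tuple


def _parse(text: str) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Split CSV text into its header row and its data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    return header, [tuple(row) for row in reader]


def compute_delta(previous_text: str, current_text: str) -> Optional[str]:
    """Return CSV text with the header plus every row of the current file that
    does not appear verbatim in the previous file, or None if there is none."""
    prev_header, prev_rows = _parse(previous_text)
    header, rows = _parse(current_text)
    if header != prev_header:
        raise ValueError("column headers differ between the two versions")
    previous = set(prev_rows)
    # A changed record no longer matches its old row, so it is kept with new rows.
    # Rows deleted from the current file are not reported.
    changed = [row for row in rows if row not in previous]
    if not changed:
        return None
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(changed)
    return out.getvalue()


def write_delta_file(previous_csv: Path, current_csv: Path, drop_folder: Path) -> Optional[Path]:
    """Write <name>-delta.csv into drop_folder and return its path, or None if no delta."""
    delta = compute_delta(
        previous_csv.read_text(encoding="utf-8"),
        current_csv.read_text(encoding="utf-8"),
    )
    if delta is None:
        return None
    target = drop_folder / f"{current_csv.stem}-delta.csv"
    target.write_text(delta, encoding="utf-8")
    return target
```

Returning None when nothing differs means an identical re-drop produces no delta file, while a file that adds row 3 and edits row 2 yields a delta containing only those two rows plus the header.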
The following table describes different scenarios of source file movement across folders:
| Scenario | Stage | referenceData-delta_drop | referenceData-delta_processed | referenceData-delta_archive | referenceData-drop |
|---|---|---|---|---|---|
| Scenario 1: Dropped <sourcefile>_1.csv into referenceData-delta_drop; a previous file is not available for comparison. | Before processing | <sourcefile>_1.csv | | | |
| | After processing | | <sourcefile>_1.csv | | <sourcefile>_1.csv |
| Scenario 2: Dropped <sourcefile>_2.csv into referenceData-delta_drop. | Before processing | <sourcefile>_2.csv | <sourcefile>_1.csv | | |
| | After processing | | <sourcefile>_2.csv | <sourcefile>_1.csv | <sourcefile>_2-delta.csv |
| Scenario 3: Dropped <sourcefile>_3.csv into referenceData-delta_drop. | Before processing | <sourcefile>_3.csv | <sourcefile>_2.csv | <sourcefile>_1.csv | |
| | After processing | <sourcefile>_3.csv | <sourcefile>_2.csv | <sourcefile>_1.csv | |
| Scenario 4: Dropped <sourcefile>_4.csv into referenceData-delta_drop. | Before processing | <sourcefile>_3.csv, <sourcefile>_4.csv | <sourcefile>_2.csv | <sourcefile>_1.csv | |
| | After processing | | <sourcefile>_4.csv | <sourcefile>_1.csv, <sourcefile>_2.csv, <sourcefile>_3.csv | <sourcefile>_3-delta.csv, <sourcefile>_4-delta.csv |
| Scenario 5: Dropped <sourcefile>_5.csv, which is identical to <sourcefile>_4.csv, into referenceData-delta_drop. | Before processing | <sourcefile>_5.csv | <sourcefile>_4.csv | <sourcefile>_1.csv, <sourcefile>_2.csv, <sourcefile>_3.csv | |
| | After processing | | <sourcefile>_5.csv | <sourcefile>_1.csv, <sourcefile>_2.csv, <sourcefile>_3.csv, <sourcefile>_4.csv | No delta file is created. |

Scenario details:
- Scenario 1: When the Sample - Import – Create Delta File flow runs, the file is moved to the referenceData-delta_processed and referenceData-drop folders. The suffix -delta is not added to the file in the referenceData-drop folder because a previous file with which it can be compared is not available.
- Scenario 2: When the Sample - Import – Create Delta File flow runs, <sourcefile>_2.csv is compared with <sourcefile>_1.csv. The new source file is moved to referenceData-delta_processed, the previous source file is moved to referenceData-delta_archive, and <sourcefile>_2-delta.csv, which contains the new or changed records, is placed in referenceData-drop.
- Scenario 3: Assumption: <sourcefile>_3.csv is not processed owing to a technical glitch, so all files remain in their original folders.
- Scenario 4: When the Sample - Import – Create Delta File flow runs, two comparisons happen: <sourcefile>_3.csv is compared with <sourcefile>_2.csv, producing <sourcefile>_3-delta.csv, and <sourcefile>_4.csv is compared with <sourcefile>_3.csv, producing <sourcefile>_4-delta.csv.
- Scenario 5: When the Sample - Import – Create Delta File flow runs, <sourcefile>_5.csv is compared with <sourcefile>_4.csv. Because the files are identical, no delta file is created. <sourcefile>_5.csv is moved to referenceData-delta_processed, and <sourcefile>_4.csv is moved to referenceData-delta_archive.
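The folder movements in these scenarios follow a simple pattern: pending files are handled in order, each one is compared with the most recent earlier version, that earlier version is archived, and a -delta file is produced only when the newer file contains records the earlier one lacks. The following Python sketch models that pattern for reasoning about what a run will do. It is not the flow itself: the behavior is inferred from the table, processing order is assumed to follow the timestamp in the file name, file contents are modeled as sets of row strings, the failure case in Scenario 3 is not modeled, and run_delta_flow is a hypothetical name.

```python
import os
from typing import Dict, List, Optional, Set

DELTA_DROP = "referenceData-delta_drop"
DELTA_PROCESSED = "referenceData-delta_processed"
DELTA_ARCHIVE = "referenceData-delta_archive"
DROP = "referenceData-drop"

Folders = Dict[str, List[str]]


def run_delta_flow(folders: Folders, contents: Dict[str, Set[str]]) -> Folders:
    """Model one run of the delta flow and return the new folder state."""
    result = {name: list(files) for name, files in folders.items()}  # copy; input untouched
    # Assumption: names sort chronologically (zero-padded yyyymmdd_hhmmss stamps).
    pending = sorted(result[DELTA_DROP])
    result[DELTA_DROP] = []
    previous: Optional[str] = result[DELTA_PROCESSED][0] if result[DELTA_PROCESSED] else None

    for current in pending:
        if previous is None:
            # Nothing to compare against: the file goes to the drop folder as-is.
            result[DROP].append(current)
        else:
            # The earlier version is archived once it has been compared.
            result[DELTA_ARCHIVE].append(previous)
            if contents[current] - contents[previous]:
                # At least one new or changed record: emit <name>-delta.csv.
                result[DROP].append(os.path.splitext(current)[0] + "-delta.csv")
        previous = current

    # The newest version stays in delta_processed for the next comparison.
    result[DELTA_PROCESSED] = [previous] if previous is not None else []
    return result
```

Called with the Scenario 4 "Before processing" state, where each newer file adds records to the one before it, the model returns the Scenario 4 "After processing" row; with identical contents, as in Scenario 5, it archives the older file without adding anything to referenceData-drop.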
Notes:
- Existing functionality of importing .csv and .xlsx reference data files by dropping source files directly into the referenceData-drop folder continues to be available.
- When the source file used for the delta compare process is saved as UTF-8 with a byte order mark (BOM), additional characters are added to the header of the first column in the file, and the import fails. To avoid this issue, save the source file as UTF-8 without a byte order mark, or use another encoding.
- For each imported file, an integration log record named Delta file comparison is created. Its Integration Status field can have one of the following values:
- Success - indicates a successful import operation
- Failed - indicates that the reference data import failed, for example because multiple files with the same name are present in the referenceData-delta_processed folder or because the file comparison failed due to a technical glitch.
- Success (with warning) - indicates that, when the next version of the source file was placed in the referenceData-delta_drop folder, the file in the referenceData-delta_processed folder was found to contain duplicate records.
Administrators can use information in these records to troubleshoot any import errors.
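The byte order mark issue described in the Notes above can be checked before a file is dropped. The following minimal Python sketch (has_utf8_bom, strip_utf8_bom, and resave_without_bom are hypothetical helper names) looks for the UTF-8 BOM, the three bytes EF BB BF available as codecs.BOM_UTF8. A plain UTF-8 decoder turns those bytes into the character U+FEFF at the start of the first column header, which is one way the extra characters described above can appear.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, leaving all other bytes unchanged."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def resave_without_bom(path: Path) -> bool:
    """Rewrite the file in place without a BOM; return True if one was removed."""
    data = path.read_bytes()
    if not has_utf8_bom(data):
        return False
    path.write_bytes(strip_utf8_bom(data))
    return True
```

Because only the first three bytes are touched, the rest of the file, including its line endings, is written back exactly as it was.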