Creating Delta Files for Import

This topic describes functionality that imports only new or changed reference data records into Anthology Reach. These records are identified by comparing the current source file with the file from which data was previously imported. The Sample - Import – Create Delta File flow performs this comparison and creates a temporary file that contains only the new or changed records. This file is then imported by the regular file import framework.
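The comparison described above can be sketched as a row-level set difference. The following Python is a minimal sketch, not the flow's actual implementation. It assumes that both files are CSV text with a header row, and that a record counts as new or changed when the previous file contains no row with identical values in every column. The function name compute_delta is hypothetical.

```python
import csv
import io
from typing import List


def compute_delta(previous_csv: str, current_csv: str) -> List[List[str]]:
    """Return the data rows of current_csv that are new or changed.

    Assumption: a row counts as new or changed when the previous file has no
    row with exactly the same values in every column. Both inputs are CSV text
    whose first row is a header.
    """
    previous_rows = list(csv.reader(io.StringIO(previous_csv)))
    current_rows = list(csv.reader(io.StringIO(current_csv)))
    # Store the previous data rows (header skipped) as tuples for fast lookups.
    seen = {tuple(row) for row in previous_rows[1:]}
    # Keep every current data row that has no identical counterpart.
    return [row for row in current_rows[1:] if tuple(row) not in seen]
```

An empty result means that the two files contain the same records, so no delta file is needed.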

Prerequisites

  • By default, the Sample - Import – Create Delta File flow is not enabled. When it is enabled, it is triggered at 04:00 hours UTC by default; administrators can change this time. Ensure that there is sufficient time between the execution of this flow and the execution of the Sample - FileImport-Reference Data flow, which is responsible for importing reference data and runs daily at 07:00 hours UTC.
  • The framework described in this topic works only with source files that have the .csv extension. To use .xlsx files, save them in .csv format and then perform the steps in this topic.
  • This framework compares the current source file with its previous version. Hence, source files must be named in the <Filename>_<yyyymmdd>_<hhmmss>.csv format, and the <Filename> portion must be identical in all versions of the same file. For example:
    • The source file imported yesterday is Import_SummerAdmissions_20201101_083000.csv.
    • The file imported today is Import_SummerAdmissions_20201102_083000.csv.
    The comparison occurs only if the <Filename> portion (Import_SummerAdmissions in this example) is identical.
  • If the source file name includes a unique string and is in the <Filename>_<unique string>_<yyyymmdd>_<hhmmss>.csv format, the value of the unique string can differ in each version of the source file. Before importing data from such source files, add the doesFileContainUniqueString parameter to the Call File Compare Service step in the Sample – FileImport – Create Delta File flow and set its value to True.
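To illustrate the naming rules above, the following Python sketch extracts the <Filename> portion and the timestamp from a source file name. The regular expressions, the function names, and the contains_unique_string flag (which plays the role of the doesFileContainUniqueString parameter) are hypothetical. The sketch also assumes that the unique string contains no underscore.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# <Filename>_<yyyymmdd>_<hhmmss>.csv
PLAIN_NAME = re.compile(r"^(?P<name>.+)_(?P<date>\d{8})_(?P<time>\d{6})\.csv$")
# <Filename>_<unique string>_<yyyymmdd>_<hhmmss>.csv
# Assumption: the unique string itself contains no underscore.
UNIQUE_NAME = re.compile(
    r"^(?P<name>.+)_(?P<unique>[^_]+)_(?P<date>\d{8})_(?P<time>\d{6})\.csv$"
)


class SourceName(NamedTuple):
    filename: str           # the part that must match across versions
    timestamp: datetime     # parsed from _<yyyymmdd>_<hhmmss>
    unique: Optional[str]   # set only for the unique-string format


def parse_source_name(name: str, contains_unique_string: bool = False) -> SourceName:
    """Split a source file name into <Filename>, timestamp, and unique string."""
    pattern = UNIQUE_NAME if contains_unique_string else PLAIN_NAME
    match = pattern.match(name)
    if match is None:
        raise ValueError(f"unexpected source file name: {name}")
    stamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    # groupdict().get returns None when the pattern has no "unique" group.
    return SourceName(match["name"], stamp, match.groupdict().get("unique"))


def same_source(a: str, b: str, contains_unique_string: bool = False) -> bool:
    """True if two file names are versions of the same source file."""
    return (parse_source_name(a, contains_unique_string).filename
            == parse_source_name(b, contains_unique_string).filename)
```

In this sketch, parsing a unique-string name without the flag leaves the unique string inside <Filename>, so two versions with different unique strings are not treated as the same source file.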

The following folders are used to process the file:

  • referenceData-delta_drop
  • referenceData-delta_processed
  • referenceData-delta_archive

To Start the Import

  1. In Microsoft Azure, drop the source file in the referenceData-delta_drop folder.
  2. If no previous file is available for comparison, the file will be moved automatically to the following folders:
    • referenceData-delta_processed
    • referenceData-drop – From this point onward, the regular import procedure will be performed.
  3. If a previous file is available, it will be compared with the new source file. After processing, the source files will be moved and the new temporary file will be created in the following folders:
    • referenceData-delta_processed – The new source file will be moved here.
    • referenceData-delta_archive – The previous source file will be moved here.
    • referenceData-drop – The temporary file that includes the new or changed records will be placed here. Its name has the -delta suffix, for example, <file name>-delta.csv.
  4. The default import functionality will then be triggered for the file in the referenceData-drop folder. For more information, see File Import Overview.
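The steps above can be approximated with local folders. The following Python sketch is hypothetical. It treats the four folders as directories under one root on a local disk rather than in Microsoft Azure, compares files line by line (so it assumes that no quoted field spans multiple lines), and ignores unique strings in file names. It is meant only to show the order of the file movements.

```python
import shutil
from pathlib import Path

# Folder names from this topic; placing them under one local root is an assumption.
DELTA_DROP = "referenceData-delta_drop"
DELTA_PROCESSED = "referenceData-delta_processed"
DELTA_ARCHIVE = "referenceData-delta_archive"
DROP = "referenceData-drop"


def base_name(path: Path) -> str:
    """Return the <Filename> part of <Filename>_<yyyymmdd>_<hhmmss>.csv."""
    return path.stem.rsplit("_", 2)[0]


def process_delta_drop(root: Path) -> None:
    """Hypothetical sketch of one run of the delta flow over local folders."""
    for folder in (DELTA_DROP, DELTA_PROCESSED, DELTA_ARCHIVE, DROP):
        (root / folder).mkdir(parents=True, exist_ok=True)

    # Oldest first: for one <Filename>, the timestamp suffix sorts chronologically.
    for new_file in sorted((root / DELTA_DROP).glob("*.csv")):
        previous = [p for p in (root / DELTA_PROCESSED).glob("*.csv")
                    if base_name(p) == base_name(new_file)]
        if not previous:
            # No earlier version: pass the whole file to the regular import.
            shutil.copy2(new_file, root / DROP / new_file.name)
        else:
            # Per this topic, several same-name files here lead to a Failed status.
            prev_file = previous[0]
            header, *rows = new_file.read_text(encoding="utf-8").splitlines()
            old_rows = set(prev_file.read_text(encoding="utf-8").splitlines()[1:])
            changed = [row for row in rows if row not in old_rows]
            if changed:
                # Only the header plus new or changed lines go to the delta file.
                delta = root / DROP / f"{new_file.stem}-delta.csv"
                delta.write_text("\n".join([header, *changed]) + "\n", encoding="utf-8")
            shutil.move(str(prev_file), str(root / DELTA_ARCHIVE / prev_file.name))
        shutil.move(str(new_file), str(root / DELTA_PROCESSED / new_file.name))
```

Calling process_delta_drop after each drop reproduces the folder movements described in scenarios 1, 2, 4, and 5 below.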

The following scenarios describe how source files move across folders. For each scenario, the contents of these folders are listed before and after processing: referenceData-delta_drop, referenceData-delta_processed, referenceData-delta_archive, and referenceData-drop.
Scenario 1

Dropped <sourcefile>_1.csv into referenceData-delta_drop; no previous file is available for comparison.

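Before processing:
  • referenceData-delta_drop: <sourcefile>_1.csv
  • referenceData-delta_processed: (empty)
  • referenceData-delta_archive: (empty)
  • referenceData-drop: (empty)
After processing:
  • referenceData-delta_drop: (empty)
  • referenceData-delta_processed: <sourcefile>_1.csv
  • referenceData-delta_archive: (empty)
  • referenceData-drop: <sourcefile>_1.csv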
When the Sample - Import – Create Delta File flow runs, the file will be moved to the following folders:
  • referenceData-delta_processed
  • referenceData-drop

The -delta suffix will not be added to the file in the referenceData-drop folder because no previous file is available for comparison.

Scenario 2

Dropped <sourcefile>_2.csv into referenceData-delta_drop

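Before processing:
  • referenceData-delta_drop: <sourcefile>_2.csv
  • referenceData-delta_processed: <sourcefile>_1.csv
  • referenceData-delta_archive: (empty)
  • referenceData-drop: (empty)
After processing:
  • referenceData-delta_drop: (empty)
  • referenceData-delta_processed: <sourcefile>_2.csv
  • referenceData-delta_archive: <sourcefile>_1.csv
  • referenceData-drop: <sourcefile>_2-delta.csv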
When the Sample - Import – Create Delta File flow runs:
  1. The <sourcefile>_2.csv file that's in the referenceData-delta_drop folder will be compared with <sourcefile>_1.csv that's in the referenceData-delta_processed folder.
  2. After processing:
    1. A temporary file with the -delta suffix (<sourcefile>_2-delta.csv), which will include only new or changed records, will be created in referenceData-drop and will be imported using the available import framework.
    2. <sourcefile>_1.csv will be moved from referenceData-delta_processed to referenceData-delta_archive.
    3. <sourcefile>_2.csv will be moved from referenceData-delta_drop to referenceData-delta_processed.
Scenario 3

Dropped <sourcefile>_3.csv into referenceData-delta_drop

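Before processing:
  • referenceData-delta_drop: <sourcefile>_3.csv
  • referenceData-delta_processed: <sourcefile>_2.csv
  • referenceData-delta_archive: <sourcefile>_1.csv
  • referenceData-drop: (empty)
After processing:
  • referenceData-delta_drop: <sourcefile>_3.csv
  • referenceData-delta_processed: <sourcefile>_2.csv
  • referenceData-delta_archive: <sourcefile>_1.csv
  • referenceData-drop: (empty)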
Assumption: <sourcefile>_3.csv is not processed owing to a technical glitch.
Scenario 4

Dropped <sourcefile>_4.csv into referenceData-delta_drop

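Before processing:
  • referenceData-delta_drop: <sourcefile>_3.csv, <sourcefile>_4.csv
  • referenceData-delta_processed: <sourcefile>_2.csv
  • referenceData-delta_archive: <sourcefile>_1.csv
  • referenceData-drop: (empty)
After processing:
  • referenceData-delta_drop: (empty)
  • referenceData-delta_processed: <sourcefile>_4.csv
  • referenceData-delta_archive: <sourcefile>_1.csv, <sourcefile>_2.csv, <sourcefile>_3.csv
  • referenceData-drop: <sourcefile>_3-delta.csv, <sourcefile>_4-delta.csv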

When the Sample - Import – Create Delta File flow runs, two comparisons will happen:
  1. The <sourcefile>_3.csv file that's in the referenceData-delta_drop folder will be compared with <sourcefile>_2.csv that's in the referenceData-delta_processed folder.
  2. The <sourcefile>_4.csv file that's in the referenceData-delta_drop folder will be compared with <sourcefile>_3.csv that's in the referenceData-delta_processed folder.
  3. After processing:
    1. Two temporary files with the -delta suffix (<sourcefile>_3-delta.csv and <sourcefile>_4-delta.csv), which will include only new or changed records, will be created in referenceData-drop and will be imported using the available import framework.
    2. The files from scenario 2 (<sourcefile>_2.csv in referenceData-delta_processed) and scenario 3 (<sourcefile>_3.csv in referenceData-delta_drop) will be moved to referenceData-delta_archive.
    3. <sourcefile>_4.csv will be moved from referenceData-delta_drop to referenceData-delta_processed.
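When several files are waiting in referenceData-delta_drop, as in scenario 4, each file is compared with the version that precedes it. The following Python is a minimal sketch of that pairing. comparison_chain is a hypothetical helper, and it assumes the <Filename>_<yyyymmdd>_<hhmmss>.csv naming, so that sorting by name yields chronological order for a single <Filename>.

```python
from typing import List, Optional, Tuple


def comparison_chain(
    processed: Optional[str], pending: List[str]
) -> List[Tuple[Optional[str], str]]:
    """Pair each pending file with the version it is compared against.

    processed: name of the file already in referenceData-delta_processed, or None.
    pending: names of the files waiting in referenceData-delta_drop.
    """
    pairs: List[Tuple[Optional[str], str]] = []
    baseline = processed
    # Assumes _yyyymmdd_hhmmss names, so lexical order is chronological order.
    for name in sorted(pending):
        pairs.append((baseline, name))
        baseline = name  # each processed file becomes the next baseline
    return pairs
```

For scenario 4, the pairs are (<sourcefile>_2.csv, <sourcefile>_3.csv) and (<sourcefile>_3.csv, <sourcefile>_4.csv). Every baseline ends up in referenceData-delta_archive, and the last pending file ends up in referenceData-delta_processed.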
Scenario 5

Dropped <sourcefile>_5.csv, which is identical to <sourcefile>_4.csv, into referenceData-delta_drop

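Before processing:
  • referenceData-delta_drop: <sourcefile>_5.csv
  • referenceData-delta_processed: <sourcefile>_4.csv
  • referenceData-delta_archive: <sourcefile>_1.csv, <sourcefile>_2.csv, <sourcefile>_3.csv
  • referenceData-drop: (empty)
After processing:
  • referenceData-delta_drop: (empty)
  • referenceData-delta_processed: <sourcefile>_5.csv
  • referenceData-delta_archive: <sourcefile>_1.csv, <sourcefile>_2.csv, <sourcefile>_3.csv, <sourcefile>_4.csv
  • referenceData-drop: (empty; the delta file will not be created)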

When the Sample - Import – Create Delta File flow runs:

  1. The <sourcefile>_5.csv file that's in the referenceData-delta_drop folder will be compared with the <sourcefile>_4.csv file that’s in the referenceData-delta_processed folder.
  2. After comparison:
    1. Because the files are identical, a delta file will not be created in the referenceData-drop folder.
    2. <sourcefile>_4.csv will be moved from referenceData-delta_processed to referenceData-delta_archive.
    3. <sourcefile>_5.csv will be moved from referenceData-delta_drop to referenceData-delta_processed.
  • The existing functionality for importing .csv and .xlsx reference data files by dropping source files directly into the referenceData-drop folder continues to be available.
  • If a source file used in the delta comparison process is saved as UTF-8 with a byte order mark (BOM), additional characters are added to the header of the first column, and the import fails. To avoid this issue, save the source file as UTF-8 without a BOM, or use another encoding.
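One way to remove a byte order mark before dropping a file is shown in the following Python sketch. In Python, decoding a UTF-8 file that starts with a BOM as plain UTF-8 leaves the character U+FEFF at the start of the first column header, which illustrates the kind of extra characters described above. The function strip_utf8_bom is hypothetical and rewrites the file in place.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Rewrite path as UTF-8 without a byte order mark.

    Returns True if a BOM was found and removed, False if the file was unchanged.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Drop the three BOM bytes and keep the rest of the file as-is.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```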

  • For each imported file, an integration log record named Delta file comparison will be created. Its Integration Status field can have one of the following values:
    • Success - indicates a successful import operation
    • Failed - indicates that the reference data import failed, for example, because multiple files with the same name are available in the referenceData-delta_processed folder or because the file comparison failed due to a technical glitch.
    • Success (with warning) - indicates that, when the next version of the source file was placed in the referenceData-delta_drop folder, the file in the referenceData-delta_processed folder had duplicate records.
  • Administrators can use information in these records to troubleshoot any import errors.
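When troubleshooting, administrators might check for the conditions that lead to the Failed and Success (with warning) statuses. The following Python sketch is hypothetical. It assumes that "duplicate records" means whole rows that appear more than once, and it models only the same-name condition for Failed, not technical failures during comparison.

```python
import csv
import io


def has_duplicate_records(csv_text: str) -> bool:
    """True if any data row (header excluded) appears more than once."""
    rows = [tuple(row) for row in csv.reader(io.StringIO(csv_text))][1:]
    return len(set(rows)) < len(rows)


def integration_status(same_name_files_in_processed: int, processed_csv: str) -> str:
    """Hypothetical mapping onto the Integration Status values listed above."""
    if same_name_files_in_processed > 1:
        # Several files with the same name in referenceData-delta_processed.
        return "Failed"
    if same_name_files_in_processed == 1 and has_duplicate_records(processed_csv):
        return "Success (with warning)"
    return "Success"
```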