File Import Overview

The topics in this section provide an overview of the File Import process, the data that can be imported, and troubleshooting information.

File Import Process - Summary

File imports use Azure Storage file share folders for managing file import source files. This section summarizes the use of these folders.

  • <prefix>-drop - Stores the source file from which data will be imported.

  • <prefix>-processing - The source file is stored in this folder during the import. An integration log record will be created when the file is moved to this folder.
    The following temporary folders will be automatically created:

    • <source-file>: Contains temporary JSON files. The number of JSON files depends on the number of records to be imported and the batch size.

      Example: The source file for the TOEFL test contains 900 records and the batch size is 100. The source file is therefore divided into 9 batches of 100 records each, all of which are imported simultaneously. If the batch size is greater than the number of records in the source file, all records are imported in a single batch. (A sketch of this arithmetic follows this list.)

    • <source file name>-Error: This folder will be created:

      • Only if the source file contains any errors.

      • Within the <source-file> folder. It contains the records that fail to import because of an error.

    These folders will be deleted when the source file is moved to the <prefix>-processed folder.

  • <prefix>-processed - The source file will be moved to this folder when the import is complete in all import scenarios, i.e., whether the import succeeds, partially succeeds, or fails. The integration log record that was created earlier will be updated when the import is complete.

  • <prefix>-error - Contains a file of records that fail to import (format: <source file name>-Error). The extension of the error file is identical to that of the source file.
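
The folder naming and the batch arithmetic described above can be illustrated with a minimal Python sketch. It is not part of the product; the function names and values are placeholders.

    import math

    def import_folders(prefix: str) -> dict:
        """Folder names used by the file import process for a given prefix,
        for example "toefl" or "referenceData"."""
        return {
            "drop": f"{prefix}-drop",
            "processing": f"{prefix}-processing",
            "processed": f"{prefix}-processed",
            "error": f"{prefix}-error",
        }

    def batch_count(record_count: int, batch_size: int) -> int:
        """Number of temporary JSON files created for a source file."""
        # A batch size larger than the record count yields a single batch.
        return max(1, math.ceil(record_count / batch_size))

    print(import_folders("toefl"))
    print(batch_count(900, 100))  # 9 batches of 100 records each
    print(batch_count(80, 100))   # 1 batch, because batch size > record count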

Reference Data

Records of the reference entities listed below can be imported, or updated if they already exist. The following details describe the infrastructure that is available by default to enable the import of reference data into Anthology Reach:

Entities

Records of the following reference entities can be imported:

  • Academic Period
  • Area of Interest
  • Area of Study
  • Course
  • Course Section
  • Program
  • Program Version
  • Program Version Detail

Microsoft Flow Names

  • Sample - FileImport - ReferenceData
  • Creation Flow - Reference Data

Integration Mapping Records in Which Batch Size is Set

  • ReferenceData Import Batch Size

Folders for Processing Import

  • referenceData-drop
  • referenceData-processing
  • referenceData-processed
  • referenceData-error
Note: It is not recommended to edit default logic in the flow Sample - FileImport - ReferenceData. Instead:
  1. Copy the flow and edit logic in the copied flow.
  2. Save and publish the copied flow in your institution’s implementation.
That way, the original flow with its default functionality remains available if the copied flow needs to be discarded. Additionally, because concurrency in the default flow is set to 1, it is recommended to retain this value in the copied flow; changing it may compromise the performance of the operation.

Before Importing Reference Data

  • Format the Source File Name and Set the Template Type in the Mapping Record

    For each entity, the name of the source file must be in the format <Value in the Template Type field in the Integration Mapping record>_<User-specified text>, with the .xlsx or .csv extension. For example, the name of the source file for the Academic Period entity will be Default_AcademicPeriod_<User-specified text>.xlsx or Default_AcademicPeriod_<User-specified text>.csv.

    If your institution is importing records of a custom reference entity, the name of the source file must follow the same format: <Value in the Template Type field in the Integration Mapping record>_<User-specified text>, with the .xlsx or .csv extension. (A sketch of this naming convention appears at the end of this topic.)

    The name of the vendor and the entity must be set in the Template Type field in the associated Integration Mapping records.

    Example

    The name of the vendor is Acme Corp and the name of the reference entity is School Name. These details must be specified as follows:

    Source file name: AcmeCorp_SchoolName_Import322.xlsx or AcmeCorp_SchoolName_Import322.csv

    Value in the Template Type field: AcmeCorp_SchoolName

    By default, the following integration mapping records will be available for each reference data entity:
    Record Name: ReferenceData-<Entityname>.EntityType
    Important Fields and Their Values:
    • Parameters – schema name of the entity
    • External Field Name and Internal Field Name – EntityType
    • Data Transformation Type – CONCATENATE

    Record Name: ReferenceData-<Entityname>.ExternalIdentifier
    Important Fields and Their Values:
    • Parameters – schema name of the field that stores the external ID of the entity
    • External Field Name and Internal Field Name – ExternalIdentifier

    Record Name: ReferenceData-<Entityname>.ExternalSourceSystem
    Important Fields and Their Values:
    • Parameters – schema name of the field that holds the external source system value of the specified entity
    • External Field Name and Internal Field Name – ExternalSourceSystem

    To import records of custom reference data entities specific to your institution, for each entity:

    • Create a copy of the above records with identical naming conventions.
    • Ensure that the indicated values are specified in the above fields.
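
    The naming convention above can be expressed as a small helper. This Python sketch is illustrative only; the template type values follow the examples in this topic, the user-specified text values are placeholders, and the function itself is not shipped functionality.

        def source_file_name(template_type: str, user_text: str, extension: str) -> str:
            """Builds a source file name in the format
            <Value in the Template Type field>_<User-specified text>.<xlsx or csv>."""
            if extension not in ("xlsx", "csv"):
                raise ValueError("Source files must use the .xlsx or .csv extension")
            return f"{template_type}_{user_text}.{extension}"

        # Default reference entity (Template Type value: Default_AcademicPeriod)
        print(source_file_name("Default_AcademicPeriod", "Spring2025", "csv"))
        # -> Default_AcademicPeriod_Spring2025.csv

        # Custom reference entity from the example (vendor Acme Corp, entity School Name)
        print(source_file_name("AcmeCorp_SchoolName", "Import322", "xlsx"))
        # -> AcmeCorp_SchoolName_Import322.xlsx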

Test Score Data

Records of the entities listed below can be imported, or updated if they already exist. The following details describe the infrastructure that is available by default to enable the import of test score records into Anthology Reach:

Entity

Test Score

Records of the following entities will also be created or updated if their details are available in the source file:

  • Address
  • Area of Interest
  • Contact
  • Extra Curricular Participant
  • Previous Education

Microsoft Flow Names

  • Sample - FileImport - ACT_2020_2021
  • Sample - FileImport - GMAT
  • Sample - FileImport - GRE
  • Sample - FileImport - IELTS
  • Sample - FileImport - SAT
  • Sample - FileImport - TOEFL

The above flows are supported by the following flows:

  • Import - Process Import File: Folder details will be passed into this flow from the Test Score flows listed above, and multiple JSON files will be created based on the batch size (see the sketch after this list). The split folder names generated in this flow will be passed to the flow Import - Process Split Files.
  • Import - Process Split Files: Data from each JSON file is moved to the Anthology Reach schema. An error file is also created in the <Test>-error folder, with an extension identical to that of the source file, and the integration log record for the import is also created.
  • Import - Process Split File Rows: For each row that's passed to the flow Import - Entity Data - Decider, this flow collates errors in a <Split file name>-error.json file that includes the Error Row Number, Error Details, and Flow URL. A separate JSON error file will be created for each JSON source file; the JSON error files will then be collated into an error file with the same extension as the source file.
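
The split into JSON batch files can be approximated with the following Python sketch. It only mimics the behavior described above; the actual work is done by the Power Automate flows, and the JSON layout and part-<n>.json file names used here are assumptions.

    import csv
    import json
    from pathlib import Path

    def split_into_batches(source_csv, output_dir, batch_size=100):
        """Reads a .csv source file and writes one JSON file per batch of rows."""
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        with open(source_csv, newline="", encoding="utf-8") as f:
            # Blank rows are skipped, mirroring the import behavior.
            rows = [r for r in csv.DictReader(f) if any((v or "").strip() for v in r.values())]
        files = []
        for i in range(0, len(rows), batch_size):
            part = out / f"part-{i // batch_size + 1}.json"  # hypothetical file name
            part.write_text(json.dumps(rows[i:i + batch_size], indent=2), encoding="utf-8")
            files.append(part)
        return files

    # 900 TOEFL records with a batch size of 100 produce 9 JSON files.
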
Integration Mapping Records in Which Batch Size is Set

For each test, the count of records that will be imported in each batch is set in the following integration mapping records:
  • ACT Import Batch Size
  • GMAT Import Batch Size
  • GRE Import Batch Size
  • IELTS Import Batch Size
  • SAT Import Batch Size
  • TOEFL Import Batch Size

By default, 100 test score records of each test can be imported in every import batch. The administrator can change this value in the Internal Option Value field in the above Integration Mapping records.

Example

The count of records in the source file for the TOEFL test is 900, and the batch size is set to 100. Hence the source file will be divided into 9 batches of 100 records each, which will all be imported simultaneously.

If the batch size is greater than the count of records in the source file, all records will be imported in a single batch.

Folders for Processing Import

ACT (the source file must be in .csv format)
  • act_2020_2021-drop
  • act_2020_2021-processing
  • act_2020_2021-processed
  • act_2020_2021-error
GMAT
  • gmat-drop
  • gmat-processing
  • gmat-processed
  • gmat-error
GRE
  • gre-drop
  • gre-processing
  • gre-processed
  • gre-error
IELTS
  • ielts-drop
  • ielts-processing
  • ielts-processed
  • ielts-error
SAT
  • sat-drop
  • sat-processing
  • sat-processed
  • sat-error
TOEFL
  • toefl-drop
  • toefl-processing
  • toefl-processed
  • toefl-error

For GMAT test scores imported from .xlsx or .csv source files, a framework is available to import records when multiple column headers in the source file are identical. In this scenario, in the associated integration mapping record, suffix the position of the column in the source file to the value in the field External Field Name. This helps differentiate between the identical column headers.

Example

In the source file, the column header City occurs twice, in columns 9 and 20. Before performing an import, in their integration mapping records, suffix their positions in the field External Field Name.

In the first record, the External Field Name will change from City to City-9. In the second record, it will change to City-20. By default, column numbers are suffixed to values in the field External Field Name. Administrators can modify this framework as required for their institution's implementation.

The IsGMATOrderingRequired integration mapping record governs this behavior. In this record, the value of the field Internal Option Value is set to True, indicating that the framework is enabled.
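
The header handling described above can be illustrated with a short Python sketch. It is not the shipped framework; it simply appends the 1-based column position to headers that occur more than once, matching the City-9 / City-20 example.

    def suffix_duplicate_headers(headers):
        """Appends the column position to headers that occur more than once."""
        counts = {}
        for h in headers:
            counts[h] = counts.get(h, 0) + 1
        return [
            f"{h}-{position}" if counts[h] > 1 else h
            for position, h in enumerate(headers, start=1)
        ]

    print(suffix_duplicate_headers(["Score", "City", "State", "City"]))
    # -> ['Score', 'City-2', 'State', 'City-4']

In the GMAT example above, the duplicate City columns sit at positions 9 and 20, so the resulting External Field Name values are City-9 and City-20.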

To Start the Import

  1. In Microsoft Azure, upload a source file to the <prefix>-drop folder.
    The extensions of files can be different in each folder. For example, the SAT source file can have the .xlsx extension, the GRE file can have the .csv extension, and multiple reference entity files placed in the folder referenceData-drop can have different extensions.
    Caution
    Ensure that you place the source file in the correct drop folder.
  2. The import operation will start on the next run of the associated flow. By default, the flows are set to run daily at 12:00 hours UTC. The administrator can change these settings and also run the flow manually.
    During the import, the source file will be moved to the <prefix>-processing folder.

When the import is complete, the source file will be moved to the <prefix>-processed folder in all import scenarios, i.e., whether the import succeeds, partially succeeds, or fails.
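
As a sketch of step 1 above, a source file can also be uploaded to the drop folder programmatically, for example with the azure-storage-file-share Python package. The connection string, share name, and file names below are placeholders; uploading through the Azure portal works equally well.

    from azure.storage.fileshare import ShareFileClient

    conn_str = "<storage account connection string>"    # placeholder
    local_file = "Default_AcademicPeriod_Spring2025.csv"

    file_client = ShareFileClient.from_connection_string(
        conn_str=conn_str,
        share_name="<file share name>",                  # placeholder
        file_path=f"referenceData-drop/{local_file}",    # the <prefix>-drop folder
    )

    with open(local_file, "rb") as data:
        file_client.upload_file(data)  # the import starts on the next flow run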

  • The import of blank rows in source files will be skipped and empty records will not be created.
  • If errors occur when content is imported:
    • From a .txt file, the errors will be displayed in the file in the <prefix>-error folder, appended at the end of each affected row in the following format:
      <Source file content>||<Row number> <Error details>|<Flow URL>
      Before importing content from the error file, for each error, delete all content after the || (double pipe) characters.
    • From an .xlsx or .csv file, in the <source file name>-Error.<extension> file, delete all content in the errorRowNumber, errorDetails, and flowURL columns, including the column headers. These columns will be displayed at the end of the file.
      To import content from the error file, save the file with a unique name after performing the above deletions and then place the file in the <prefix>-drop folder. The import operation will start on the next run of the appropriate flow. (A sketch of this clean-up follows this list.)
  • The Integration Status field in the associated Integration Log record will be set to one of the following values:
    • Success: Indicates that the import operation is successful. The text in the Details field will be Data from file <source file name>.<extension> is imported successfully.
    • Failed: Indicates that the import operation encountered an error. Text in the Details field will be Data from file <Source file name>.<extension> failed for <count> rows, created error file with the name <Source file name>-Error.<extension>.
  • For every source file placed in the <prefix>-drop folder, a unique Integration Log record will be created.
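
For .csv error files, the clean-up described above can be scripted. The following Python sketch removes the three error columns before the remaining rows are re-submitted; the file names are placeholders.

    import csv

    ERROR_COLUMNS = {"errorRowNumber", "errorDetails", "flowURL"}

    def strip_error_columns(error_file, cleaned_file):
        """Removes the error columns from a <source file name>-Error.csv file
        so the remaining rows can be placed back in the <prefix>-drop folder."""
        with open(error_file, newline="", encoding="utf-8") as src:
            rows = list(csv.reader(src))
        header = rows[0]
        keep = [i for i, name in enumerate(header) if name not in ERROR_COLUMNS]
        with open(cleaned_file, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            for row in rows:
                writer.writerow([row[i] if i < len(row) else "" for i in keep])

    strip_error_columns("Default_AcademicPeriod_Import1-Error.csv",
                        "Default_AcademicPeriod_Retry1.csv")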

Duplicate Checks for the Import Process

The topics in this section provide information on how duplicate records are handled during an import process.

Verification to Identify Duplicate Records

Import operations include three levels of verification to check whether records being imported are duplicates:

  • Level 1: In the case of vendors that share 2-way integration with Anthology Reach, the GUID originally sent from Anthology Reach will be returned when records are imported from the same source. In this scenario, changed records from the source system will be updated.
  • Level 2: If the incoming record does not have an Anthology Reach GUID but has its unique ID, and if a record from the same source with the same unique ID is available, the record will be updated. Otherwise, a new record will be created in Anthology Reach after an internal duplicate check is performed based on field matching criteria.
  • Level 3: Institutions can configure native duplicate check criteria that are built into Microsoft Dynamics 365. These will be triggered if the GUID of level 1 and the unique ID of level 2 are not passed into Anthology Reach during an import.

    In all scenarios:

    • A new record will be created if it was not previously available.
    • The record will be updated if it was previously available.
    • An error will be logged if the record being imported matches with multiple destination records.

    This 3-level verification process enables institutions to configure duplicate check criteria that are unique to their institution.
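
    The order of these three levels can be summarized in the following conceptual Python sketch. It is not the actual flow logic; the function and field names are placeholders.

        def resolve_incoming_record(incoming, find_by_guid, find_by_external_id, run_duplicate_rules):
            """Conceptual decision order for the three duplicate-check levels."""
            # Level 1: a GUID returned by a vendor with two-way integration.
            if incoming.get("guid"):
                return ("update", find_by_guid(incoming["guid"]))

            # Level 2: the record carries the unique ID assigned by the source system.
            if incoming.get("external_id"):
                existing = find_by_external_id(incoming["external_id"], incoming["source_system"])
                if existing:
                    return ("update", existing)

            # Level 3: native Dynamics 365 duplicate detection rules (field matching).
            matches = run_duplicate_rules(incoming)
            if len(matches) > 1:
                return ("error", "record matches multiple destination records")
            if len(matches) == 1:
                return ("update", matches[0])
            return ("create", None)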

Duplicate Detection Rules

During an import to update records, the following conditions are applied in addition to the native duplicate check functionality within Microsoft Dynamics 365:

Entity: Application
Duplicate Detection Rule: Application with same Application Registration, Application Period, Program and Program Version
Fields and Criteria:
  • Application Registration – Exact match
  • Program – Exact match
  • Program Version – Exact match
  • Application Period – Exact match

Entity: Application Registration
Duplicate Detection Rule: Application Registration with same Application Definition Version and Contact
Fields and Criteria:
  • Application Definition Version – Exact match
  • Contact – Exact match

Entity: Contact
Duplicate Detection Rule: Contact with same Last Name, First Name and BirthDate
Fields and Criteria:
  • First Name – Exact match
  • Last Name – Exact match
  • Date of Birth – Same Date

Entity: Contact
Duplicate Detection Rule: Contacts with same National Identifier
Fields and Criteria:
  • National Identifier – Exact match

Entity: Experience
Duplicate Detection Rule: Experience with the same Contact, Title, and Organization
Fields and Criteria:
  • Contact – Exact match
  • Title – Exact match
  • Organization ID – Exact match

Entity: Extra Curricular Participant
Duplicate Detection Rule: Extra Curricular Activity Participant with the same Contact and Extra Curricular Activity
Fields and Criteria:
  • Student – Exact match
  • Extra Curricular Activity – Exact match

Entity: Previous Education
Duplicate Detection Rule: Previous Education with the same Contact, School Name, and Education Level
Fields and Criteria:
  • Student – Exact match
  • School Name – Exact match
  • Education Level – Exact match

Entity: Test Score
Duplicate Detection Rule: Test Score with the same Contact, Test Type, and Test Source
Fields and Criteria:
  • Student ID – Exact match
  • Test Type – Exact match
  • Test Source – Exact match
  • Test Date – Same Date and Time

Import Failure Scenarios

Some scenarios in which an import can fail include:

  • When multiple records with the same information are already available. For example, two records of the contact Alex Hales are available in Anthology Reach, and the source file also contains a record with the same information.
  • If an Integration Mapping record does not exist for a specific field of the entity.

Viewing Import Failure Error Messages

Depending on the type of error, the error messages for import failure are logged in one of the following locations:

  • In the error log (CSV) file in the <source file name>-Error folder in the Azure portal.

  • In the Integration Log record (under Settings > Integration) in the Anthology Reach application.

Troubleshooting Tips

  • Ensure that integration mapping records are available for all source fields and their associated values in destination records.
  • Before importing content, ensure that information in the data source file is correct. This will prevent records from being omitted in the import operation.
  • Ensure that you place the source file in the appropriate <prefix>-drop folder.