1. Universal Import Service 1.0.0
The Universal Import Service is used to import data into arveo from third party systems. Currently, the service supports reading data from CSV files.
The service is based on Apache Camel and monitors a configurable directory for CSV files. It provides a plugin interface for CsvLineMappers, which are used to create one or more arveo entities from one line in the CSV file. A highly configurable default mapper implementation is provided, which should be sufficient for most import scenarios.
1.1. Configuration
The service offers some generic settings that apply to all mapping configurations. Because the service is based on Apache Camel, several configuration options for the Camel components apply.
The route used to import CSV files uses a URI-parameter to configure the start of the route. This makes it possible to select the Camel component for the start of the route by URI scheme. The following example shows how to configure the URI:
universal-import-service:
  csv:
    uri: "file://${project.build.testOutputDirectory}/csv?antInclude=**/test-Demo.csv&noop=false"
In the example, the file component of Apache Camel is activated by the file: scheme of the URI. The Camel documentation contains information about the available parameters for the file endpoint as well as the other available endpoints.
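As a further illustration, the following sketch points the same uri setting at a fixed directory and moves processed files out of the way. The directory /data/import and the selection of options are assumptions made for this example. antInclude, move and delay are standard options of Camel's file component, but their exact behaviour should be checked against the Camel documentation for the Camel version in use.

```yaml
universal-import-service:
  csv:
    # Hypothetical directory: poll it every 5 seconds, pick up all CSV files
    # and move each file to the .done subdirectory after it has been processed.
    uri: "file:///data/import?antInclude=**/*.csv&move=.done&delay=5000"
```

Moving processed files instead of leaving them in place (noop=true) is one way to avoid importing the same file twice after a restart.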
The way CSV files are processed is controlled by the CSV data format of Apache Camel. It offers various configuration properties, which are listed in the Camel documentation. The properties can be used in the configuration file for the Universal Import Service as shown below:
camel:
  dataformat:
    csv:
      delimiter: ";"
In the example, the delimiter for the CSV columns is set to a semicolon (;).
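Other properties of the Camel CSV data format can be set in the same place. The sketch below combines the delimiter with a few of them. quote, ignore-empty-lines and trim are options of the Camel CSV data format, but the values shown here are assumptions for illustration, and whether each option has the desired effect in the import route should be verified before relying on it.

```yaml
camel:
  dataformat:
    csv:
      delimiter: ";"
      # Assumed additions for this sketch:
      quote: "'"                # values may be enclosed in single quotes
      ignore-empty-lines: true  # skip blank lines in the CSV file
      trim: true                # strip leading and trailing whitespace from values
```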
2. Generic CSV mapper
The generic CSV mapper is the default implementation of the CsvLineMapper interface contained in the service.
The mapper maps each column in the CSV file to an attribute of an arveo entity. Content to import is read from a
configurable column, which can contain zero or more file names to import. The file names can either contain a fully
qualified path, or just the name of the file. In the latter case, the directory containing the files with the actual
content can be configured (see below).
The mapper offers three different modes:
- SIMPLE: This is the default. Each line in the CSV file is mapped to one arveo document, which might contain zero or more content elements. The mapping of files to content elements is fixed and can either map file names to content element names or positions in the list of files to content element names.
- COUNTING: Each line is mapped to one or more arveo documents. Each document contains either one content element or no content at all. A counter can be used as prefix or suffix for any imported attribute to distinguish the documents created for one line.
- REFERENCING: Each line is mapped as a simple record structure consisting of a container entity containing all attributes and zero or more document components referenced by a foreign key, each containing one content element.
Only one mode can be used at a time.
2.1. Attribute mapping
The mapping of CSV columns to arveo attributes works the same in each mode. An attribute mapping must be configured for each CSV column that is supposed to be imported. Attribute mappings are configured in a map, the keys being the names of the columns of the CSV file. An attribute mapping consists of the following parameters:
| Parameter | Explanation |
|---|---|
| attribute-name | The name of the arveo attribute (in snake-case). |
| type | The type of the attribute (for example STRING, INTEGER, BOOLEAN or DATE_TIME). |
| array | Whether the attribute is multivalued or not (the default is false). |
| delimiter | The delimiter of multivalued attributes. Ignored when array is false. |
| date-pattern | The pattern used to parse date attributes. |
| time-pattern | The pattern used to parse time attributes. |
| date-time-pattern | The pattern used to parse attributes of type DATE_TIME. |
| zone-id | The time zone ID used for date-time values read from the CSV column. |
| prefix | An optional prefix added to imported attributes of type STRING. |
| suffix | An optional suffix added to imported attributes of type STRING. |
| default-value | The default value used when the line does not contain a value for a configured attribute mapping. Must follow the same format rules as the other values in the column. |
Attributes are parsed using the default Java parsers, e.g. Integer.parseInt() for INTEGER, or using the supplied patterns for date, time or date-time values. Booleans can either be strings ('true', 'false') or integers (0, 1).
Attribute mappings are configured for each mapper mode. For example, when the counting mode is used, the attribute mappings would be configured in the setting universal-import-service.generic-csv-mapper.counting.attributes.
The example below shows a mapping configuration for the import of CSV columns called 'sysrowid', 'systimestamp' and 'ispdf'.
attributes:
  sysrowid:
    attribute-name: "sys_row_id"
    type: STRING
  systimestamp:
    attribute-name: "sys_time_stamp"
    type: DATE_TIME
    date-time-pattern: "u-M-d H:m:s"
    zone-id: "UTC"
  ispdf:
    attribute-name: "pdf"
    type: BOOLEAN
  archive:
    attribute-name: "archive"
    type: STRING
    default-value: "records"
Attributes using a default value do not have to be contained in the CSV file. This makes it possible to add new attributes that were not contained in the original data.
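The remaining mapping parameters can be combined in the same way. The following sketch shows a multivalued attribute, an integer attribute with a default value and a string attribute with a prefix. The column names, attribute names and values are hypothetical and only serve to illustrate the parameters from the table above.

```yaml
attributes:
  keywords:                     # hypothetical CSV column
    attribute-name: "keywords"
    type: STRING
    array: true                 # multivalued attribute
    delimiter: "|"              # a cell such as "invoice|2023|paid" holds three values
  pagecount:                    # hypothetical CSV column
    attribute-name: "page_count"
    type: INTEGER
    default-value: "0"          # used when the line contains no value
  source:                       # hypothetical CSV column
    attribute-name: "source"
    type: STRING
    prefix: "legacy-"           # "4711" would presumably be imported as "legacy-4711"
```

The delimiter for multivalued attributes should differ from the column delimiter of the CSV file, otherwise the values of one cell could not be told apart from separate columns.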
2.2. Simple mode
The simple mode is the default operating mode of the generic CSV mapper. In this mode, a 1:1 mapping between file names read from the CSV file and content element names must be configured. The mapping can either be from file name to content element name or from the position of the file name in the list to a content element name. Because the simple mode is the default, it does not have to be explicitly enabled in the configuration.
generic-csv-mapper:
  type-definition-name: "demo_document" # (1)
  simple:
    content:
      csv-field-name: "filename" # (2)
      content-path: "${project.build.testOutputDirectory}/content" # (3)
      position-mappings:
        0: "content" # (4)
(1) The name of the type definition that will contain the imported documents.
(2) The name of the field in the CSV file containing the file names.
(3) The path of the directory that contains the files. In this case, the CSV is expected to contain only the file names.
(4) Mapping by position. The first file will be stored in the content element named "content".
A complete configuration example can be found in the system-test module in the file src/test/resource-templates/config/universal-import-service.yaml.
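When a line lists more than one file, each position needs its own mapping. The sketch below extends the example to two files. It assumes that the delimiter setting shown for the other modes is also available in simple mode. The directory and the content element name "rendition" are hypothetical.

```yaml
generic-csv-mapper:
  type-definition-name: "demo_document"
  simple:
    content:
      csv-field-name: "filename"
      delimiter: ","                        # assumed to be supported in simple mode as well
      content-path: "/data/import/content"  # hypothetical directory
      position-mappings:
        0: "content"     # first file name in the list
        1: "rendition"   # second file name; hypothetical content element
# A value such as "letter.docx,letter.pdf" in the filename column would then be
# expected to store letter.docx as "content" and letter.pdf as "rendition".
```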
2.3. Counting mode
In the counting mode, CSV lines containing more than one file name are mapped to multiple independent document entities. Each document entity will contain one content element. If a line in the CSV file does not contain any file names, one document with no content elements will be created. A counter can be added to the suffix or prefix of any string attribute by using the placeholder ${contentElementNumber}.
generic-csv-mapper:
  type-definition-name: "document" # (1)
  mode: COUNTING # (2)
  counting:
    content:
      csv-field-name: "filename" # (3)
      delimiter: "," # (4)
      content-path: "${project.build.testOutputDirectory}/content" # (5)
    attributes:
      xhdoc:
        attribute-name: "xhdoc"
        type: STRING
        suffix: "_${contentElementNumber}" # (6)
(1) The name of the type definition that will contain the imported documents.
(2) The counting mode must be enabled explicitly.
(3) The name of the field in the CSV file containing the file names.
(4) The delimiter used to separate file names.
(5) The path of the directory that contains the files. In this case, the CSV is expected to contain only the file names.
(6) Adds a suffix with the counter (starting at 1) of the file.
A complete configuration example can be found in the system-test module in the file src/test/resource-templates/config/universal-import-service-counting.yaml.
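To make the effect of the counter more concrete, the following sketch uses it as a prefix instead of a suffix and traces one CSV line through the mapping. The directory, the sample line and the resulting values are assumptions derived from the description above, not output taken from the service.

```yaml
generic-csv-mapper:
  type-definition-name: "document"
  mode: COUNTING
  counting:
    content:
      csv-field-name: "filename"
      delimiter: ","
      content-path: "/data/import/content"  # hypothetical directory
    attributes:
      xhdoc:
        attribute-name: "xhdoc"
        type: STRING
        prefix: "${contentElementNumber}-"  # counter as prefix instead of suffix
# With ";" as the column delimiter, a CSV file such as
#   xhdoc;filename
#   A17;scan.pdf,notes.txt
# would be expected to yield two documents:
#   xhdoc = "1-A17" with the content of scan.pdf
#   xhdoc = "2-A17" with the content of notes.txt
```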
2.4. Referencing mode
In the referencing mode, a record container is created for each imported line of the CSV file. This record will contain all attributes, but no content. For each imported file, a document is created that contains only the data of the imported file. The documents are referenced by a foreign key containing the ID of the record container.
The imported documents do not contain any custom attributes, but arveo's inheritance feature can be used to automatically inherit attributes from the referenced record.
generic-csv-mapper:
  type-definition-name: "component" # (1)
  mode: REFERENCING # (2)
  referencing:
    container-type-definition-name: "container" # (3)
    reference-field-name: "container_id" # (4)
    content:
      csv-field-name: "filename" # (5)
      delimiter: "," # (6)
      content-path: "${project.build.testOutputDirectory}/content" # (7)
(1) The name of the type definition that will contain the imported documents.
(2) The referencing mode must be enabled explicitly.
(3) The name of the type definition containing the record containers.
(4) The name of the attribute in the documents containing the foreign key.
(5) The name of the field in the CSV file containing the file names.
(6) The delimiter used to separate file names.
(7) The path of the directory that contains the files. In this case, the CSV is expected to contain only the file names.
A complete configuration example can be found in the system-test module in the file src/test/resource-templates/config/universal-import-service-referencing.yaml.
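The sketch below adds an attribute mapping to the referencing configuration and outlines the structure that would result for one line. Placing the mappings under referencing.attributes is an assumption made by analogy with the counting.attributes setting mentioned in the attribute mapping section. The column name, attribute name, directory and sample data are hypothetical.

```yaml
generic-csv-mapper:
  type-definition-name: "component"
  mode: REFERENCING
  referencing:
    container-type-definition-name: "container"
    reference-field-name: "container_id"
    content:
      csv-field-name: "filename"
      delimiter: ","
      content-path: "/data/import/content"  # hypothetical directory
    attributes:                 # assumed location, by analogy with counting.attributes
      customer:                 # hypothetical CSV column
        attribute-name: "customer_name"
        type: STRING
# For a line with customer "ACME" and filename "a.pdf,b.pdf" the expected result is:
#   1 entity of type "container" with customer_name = "ACME" and no content
#   2 entities of type "component", one per file, each with container_id set to
#     the ID of the container entity
```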
3. Writing a custom line mapper
Custom line mappers have to implement the interface CsvLineMapper. Mappers can use the typed or the generic API of arveo. A mapper that uses the generic API has to return true in the isGeneric method implementation and has to implement the mapLineGeneric method. Typed mappers have to return false in the isGeneric method and have to implement the mapLine method.
The custom mapper implementation has to be registered as a Spring bean in a custom Spring Boot starter. To replace the provided default mapper, either the auto configuration of the custom starter has to run before the auto configuration class de.eitco.uis.generic.mapper.GenericCsvMapperAutoConfiguration, or the default mapper bean registrations have to be disabled by setting the property universal-import-service.generic-csv-mapper.mode to DISABLED.
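Expressed in the YAML configuration file of the service, disabling the default mapper would look roughly like this (a minimal sketch that only spells out the property path named above):

```yaml
universal-import-service:
  generic-csv-mapper:
    # Disables the bean registrations of the default mapper so that
    # a custom CsvLineMapper bean can take its place.
    mode: DISABLED
```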
Line mappers return a list of batch operations. The arveo entities created for one line can be created using the respective batch operation(s). The operations will be executed in the order in which they are contained in the list.
The custom mapper can be activated by adding the jar of the custom starter to the service’s libs directory.