Introduction
What is arveo?
arveo is a Headless Content Service Platform.
arveo expands your digital company platform and your public cloud or data center solutions with cloud-based enterprise content management (ECM).
arveo is a multi-client, 100% cloud-ready content services platform. With arveo you can manage the entire life cycle of your documents and files and process all your content in a legally secure (GoBD-certified) and GDPR/DSGVO-compliant way. arveo ensures data and legal security even when using cloud storage services and takes into account the requirements of the GDPR/DSGVO with regard to the secure deletion of data.
With arveo, enterprise-ready solutions can be created, from revision-proof content archives to complex file and transaction processing.
What is a Content Service Platform?

- a cloud-ready enterprise content management system
- a collection of microservices sharing the same data repositories
- provides REST interfaces
- typically includes ECM services, AI services, BPM, conversion, enterprise search, etc.
- provides access to all kinds of content such as documents, videos, images and audio
- serves all kinds of use cases within the organization
- content is stored once and edited and read by many applications
arveo's modern architecture, based on microservices and state-of-the-art technologies, was built natively for the cloud. Connect our lightweight arveo content services to your system landscape, other open systems and the services that suit you best, from the cloud or on-premises, through a single, lean API. With this best-of-breed approach, you can easily realize your company's dream of a "single source of truth" across all systems.
The arveo content services manage the entire life cycle of your content, such as:

- Documents
- Images
- Videos
- Audio
- Text
arveo allows the free configuration of content objects, including metadata, and the mapping of folder hierarchies and electronic files.
NoSQL technologies allow you to search across all metadata values and document content with high performance, regardless of the complexity of your search. The additional use of horizontally scalable NoSQL technologies provides decisive advantages in mass data processing and search performance: the Apache Solr 8.6 enterprise search engine, combined with key-value caches, can increase search speed by up to a factor of 1,000 compared to relational database systems.
Headless Content Services
The market for "headless systems" has been growing for some time. These systems offer backend functions without a user interface of their own; they are used entirely through their APIs. The concept is best known from content management systems (CMS) used in web development. With the increasing use of different end devices such as smartphones, tablets and wearables, the requirements for content management systems are also increasing, and users consume content on many different channels. A headless CMS dispenses with the front end and thus enables your content to be delivered to various channels through a single REST API.
So if products are to be fully and seamlessly integrated into a platform and a dependency on a user interface or client is no longer desired, one speaks of so-called "headless systems".
The wide availability of different cloud services and solutions enables the setup of a modern platform for your business processes. Instead of relying on a monolithic ECM as before, companies combine the most suitable cloud content services and, with this best-of-breed approach, create targeted added value for their digital company platforms.
Regardless of whether you want to add secure and legally compliant ECM functions to your own solution, an open cloud application or your company portal: you can access all of your documents and information directly via a single interface (REST API).
arveo is headless by design. All modules are hosted as pure backend cloud services by Eitco or, optionally, hybrid in your private cloud or on-premises in your data center. Naturally, they are also natively suitable for mobile applications.
API first
The stateless REST API is our product: it is used by all arveo components and user interfaces. The web services are stable over the long term and fully available to every customer.
It is important to us that our services have open interfaces and can be easily integrated into an enterprise service infrastructure. As a modern content services platform, arveo uses standards wherever possible in order to tap into the steadily growing number of cloud-enabled services inside or outside the company infrastructure. Whether operating system, database, text recognition, machine learning or object storage: arveo can access services from different manufacturers and combine them with its own services in order to create added value quickly.
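From an integrator's point of view, "API first" means that everything goes through one authenticated HTTPS REST call. The following sketch builds such a request with only the JDK's HTTP client; the base URL, document path and token value are illustrative assumptions, not the documented arveo API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ApiRequestSketch {
    // Builds an authenticated GET request against a hypothetical arveo endpoint.
    // The bearer-token header is the standard OAuth2 pattern the platform requires.
    static HttpRequest documentRequest(String baseUrl, String documentId, String accessToken) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/documents/" + documentId))
                .header("Authorization", "Bearer " + accessToken)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(...)` then returns the document's JSON representation, assuming the endpoint exists and the token is valid.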
Best-Of-Breed strategy
There are many ECM products and the market is constantly changing. A manufacturer-independent ECM standard comparable to SQL for relational databases has never fully established itself, despite several attempts from WebDAV to JSR 170 to CMIS. The market is dominated by monolithic packages that try to cover all ECM applications. A customer who implements a complex ECM application for their company often becomes highly dependent on one manufacturer and faces costs that are difficult to calculate when changing providers.
Due to the availability of platforms such as Amazon Web Services (AWS) or Microsoft Azure, which make a wide variety of services easily usable via web services, we are seeing a change in the behavior of companies: they want to buy fewer complete solutions and instead look for specialized services that can easily be combined and thus create targeted added value for the digital company platform. Companies choose the best features from different manufacturers and combine them into their own solutions, controlling the services used via their own API management or API gateways. This creates company platforms that access not just one, but often several repositories.
This strategy, often called best-of-breed, benefits from the fact that the services available in the marketplaces have become increasingly standardized in recent years.
arveo consistently relies on a microservice architecture. The individual services are loosely connected to one another via lightweight stateless web service interfaces (http, REST) and each service can run and scale independently. All arveo functions are available via a uniform REST API gateway, which also takes care of the intelligent load distribution and the detection of defective services.
Scalability
Modern cloud-ready platforms rely on horizontal scaling: the load is distributed over many nodes, which can consist of inexpensive commodity hardware. Such a structure can also save costs through automated scale-out and scale-down, switching nodes on or off as required. The arveo platform has a high tolerance for the failure of individual nodes. High availability and performance are also required, since end users nowadays show only limited patience with long response times and, in case of doubt, quickly switch to the competition.
All arveo services support containerized deployment and use stateless REST APIs, so they can be easily integrated into any cloud infrastructure. Through the use of containerized applications (Docker) and the service management of the open source Spring framework, which well-known providers such as Netflix use and continuously improve, the services can be installed automatically as often as required and thus scale out and down when you use the cloud orchestration framework Kubernetes. You can cluster Linux containers and build an auto-scaling, highly available platform with high fail safety. A blue-green deployment for the risk-free, downtime-free rollout of new software versions is also possible.
Future proof
Our services use standards as far as possible, so that services from different providers can be integrated without great effort and the customer can react quickly to changes in the market. Thanks to the secure web service interfaces, all services, including the database, can be obtained from the cloud at any time.
With arveo services, you can build a sustainable system architecture. By design, arveo allows you to separate your business logic from the arveo ECM standard services and all other available cloud services such as OCR, AI, document conversion (e.g. to PDF) and identity management. arveo solutions are designed to be manufacturer-independent, so that the underlying REST ECM and other services can be exchanged at easily calculable costs.
This approach makes it possible to exchange individual services, up to and including the arveo content services themselves, with little and easily calculable effort. Even arveo ECM services can be replaced by comparable services, and via the supplied open source S3 connector, third-party systems can access the content objects migration-free using the standard S3 API.
Hybrid operation
arveo is a native cloud platform and is based on Open Source libraries and services. Through the consistent microservice architecture and the use of open source cloud technology, you can keep arveo's operating costs low.
Advantages of arveo operation
- All services are separately horizontally scalable and can therefore also be operated on simple hardware. arveo runs on all Linux and Windows operating systems.
- No additional license costs thanks to the consistent use of open source technology such as Linux, PostgreSQL 12 and Apache Solr 8.6 (NoSQL).
- Container deployment: Simple integration into existing cloud platforms enables load-dependent, automated service provisioning up to blue-green deployment for seamless updates to new software versions.
- Hybrid architecture: Flexible use of cloud services or on-premises services.
- Low manufacturer dependency: By separating the user interface and business logic from the ECM/BPM services while using standards such as REST, S3 or BPMN2, there is less dependency on a single manufacturer.
- Web applications: We deliver templates for PWAs (Progressive Web Apps) based on the state-of-the-art Angular framework, which are completely open source; i.e. the user interfaces belong to you and can be used independently of arveo.
- Use of standards: Low training costs and high availability of know-how on the market through the use of standard frameworks (Angular), standard interfaces (REST, S3, SAP ArchiveLink) and SDKs for JavaScript, Java and C#.
Micro frontends
In addition, you can use our ready-made, modern, clear, responsive and functional micro frontends to make the arveo content services, and thus their content, easily available at the right time and in the right place in your business processes.
Mobile first: All user interface components and interfaces are designed for mobile use.
Architecture Overview
Content Services
arveo is a content service platform and provides a set of lightweight, operating system-independent content microservices.
All services and clients exclusively use the secure, stateless, state-of-the-art HTTPS REST API. For the highest possible security on the web and suitability for mobile access, arveo uses token security based on the state-of-the-art Spring Security framework.
Java, C# and JavaScript SDKs are available.
arveo has multi-tenant support and separates content and metadata per tenant.
As arveo is built for cloud operating systems like OpenStack, you can automatically deploy and scale the containerized arveo applications with the cloud orchestration framework Kubernetes. You can cluster Linux containers and build an auto-scaling, highly available platform with high fail safety. Containerized applications scale horizontally and can run on commodity hardware.
arveo is available as a containerized application or as a WAR/JAR file and allows hybrid deployment: on-premises or in the cloud.
| Service | Description |
|---|---|
| Document Service | Store, edit and version documents, records/folders and their metadata. Manage storage locations with retention periods (GoBD-certified and GDPR/DSGVO-compliant). Search metadata with the relational database PostgreSQL 12 and the NoSQL document database Apache Solr 8.6. |
| User Management Service | User management with users, groups and roles. |
| Registry Service | Service registry for all arveo content services, managing the availability of the services. |
| Config Service | Secure storage of configuration data in Git or a database. |
| Access Control Service | Object access control providing permissions to users/groups. |
| Audit Service | Creates and manages audit tables for all other entity types like document types, user management objects, etc. Provides an API to access the audit trail of any object by its entity ID. |
| SAP ArchiveLink Service (optional) | Web server that processes documents in accordance with the SAP ArchiveLink standard. |
| Document Conversion Service (optional) | Conversion of document formats like docx, xlsx, etc. to image formats or PDF/A. |
| Enterprise User Management Service (optional) | Extends arveo with organisation structure features like positions or substitutes. |
| Enterprise Integration Service (optional) | The arveo enterprise integration service supports over 300 data formats and interfaces like XML, REST, CSV and mail. |
| Federation Service (optional) | Multi-repository architecture: The open connector plugin interface allows access to data from other repositories (Saperion, Documentum, file system directories). |
3rd Party Services
To operate arveo successfully, the operator of the platform must provide and manage the following services.

| Service | Description |
|---|---|
| ActiveMQ | Message queue service to process JMS and AMQP messages. |
| PostgreSQL 12 | Relational database cluster for arveo system properties and customer metadata. |
| Apache Solr 8.6 | NoSQL document database to support high-performance full-text search of content and metadata. |
| Content Storage | Either an S3-API-capable object store service or a redundant file system server. |
| Authentication Service (optional) | Identity management implementing the OAuth2 workflow for secure login. |
| Monitoring (optional) | Supports logging/monitoring via ELK (Elasticsearch, Logstash and Kibana), the Spring Service Admin monitor, and Prometheus + Grafana monitoring frontends. |
Industry standards
arveo relies on industry standards wherever possible to make integrations as easy as possible.

- API: REST (JSON)
- Storage: S3 (cloud object storage API)
- Authentication: OAuth2, X.509 or Basic Auth
- Relational database: JDBC access for PostgreSQL, Oracle, SQL Server
- SAP: ArchiveLink service
- Containerized application deployment
Open source technology stack
The technology stack has been chosen to enable the creation of high-performance, cloud- and client-capable, scalable, state-of-the-art (micro)services with a modern web user interface. Our chosen tech stack enables the implementation of both small projects, which consist of only a single backend component, and large projects with various distributed components. The created components can be deployed both locally on the customer's hardware and in a cloud environment.
The stack consists of the following components:
- Spring Framework: The backend components are implemented in Java and Kotlin, with the Spring Framework as the basis. Spring is an open source (Apache License) framework that has existed since 2004 and has a large and very active developer community. The framework has a modular structure, which makes it suitable for both simple and complex applications. It provides dependency injection, externalized configuration, and assistance with database access, transactions, messaging, etc.
- Spring MVC, WebFlux: Spring MVC is a framework for creating web applications, especially REST services. It is based on the servlet stack, in which a request is processed in a dedicated thread. WebFlux is also a framework for web applications, but is based on the reactive stack, in which the processing of a request is not restricted to one thread.
- Spring Security: Spring Security is a component that provides authentication and authorization functionality. It can be used to secure web applications and also offers support for SSO technologies such as OAuth and SAML.
- Spring Cloud: Spring Cloud is a collection of additional Spring components that provide the functionality typically required in a distributed or cloud application. The individual components can be used independently of one another and consist partly of integrable dependencies and partly of independent applications. Which of the Spring Cloud components are used therefore depends entirely on the project requirements. Spring Cloud applications can be operated in managed cloud environments such as Cloud Foundry.
- Spring Cloud Config: Spring Cloud Config offers a central configuration service as well as a client library for components that consume the configuration. In a Spring Boot application, it is sufficient to add the corresponding dependency; from then on, Spring automatically reads from the configuration service if it is available. The configuration data can be stored in simple files, in a database, in a Git repository or in a protected repository such as Vault. In a distributed application with several components running on different machines, Spring Cloud Config can be used to implement central management of the configuration of all components.
- Spring Cloud Bus: Spring Cloud Bus provides a bus for communication between the components and for connecting external components. The communication is based on the AMQP protocol and requires a backend such as RabbitMQ or ActiveMQ. With the help of the bus, components can, for example, be notified when their configuration in the configuration service has changed.
- Eureka: Eureka is a Spring Cloud component provided by Netflix that provides a service registry, a central directory of all service instances. A service or client application therefore only needs to know the URL of the service registry in order to reach any of the other services. Eureka is an independently executable component and offers a client library for access to the registry.
- Hystrix: Hystrix is a Spring Cloud component provided by Netflix that can be thought of as a fuse in an electrical installation. If one component of a cloud environment fails, Hystrix can isolate it from the other components to prevent cascading failures. Another instance of the component can then provide the functionality.
- Zuul: Zuul is a Spring Cloud component provided by Netflix that provides an API gateway. An API gateway acts like a reverse proxy and hides the individual microservices from a client application. The client application only knows the API gateway and does not have to worry about the URLs of the various services.
- Ribbon: Ribbon is a Spring Cloud component provided by Netflix that provides a client-side load balancer.
- Archetypes: Maven archetypes are available for easily starting a new project based on our technology stack, with different archetypes for different types of applications. The generated projects contain a Jenkinsfile with a preconfigured CI environment including static code analysis with Sonar, OWASP dependency checks, load tests based on JMeter, a push-button release mechanism and an optional Teams hook. Also included are packaging modules with which the application can be packaged as a Linux daemon or as a Windows service, as well as IDE configuration files for IntelliJ and Eclipse.
- Logging: In order not to depend on a specific logging implementation, logging is implemented with the logging facade SLF4J, or to be exact, with its implementation Logback. In contrast to Log4j, Logback is actively maintained and less complicated to initialize, and it combines well with SLF4J. Logback is one of the standard Spring dependencies.
- Caching: Caching frameworks are available in many variants that cover very different use cases. The frameworks listed here are sorted by their primary use case.
- Local in-memory cache: Caffeine has proven itself as a fast local in-memory cache. It can be combined with Spring's caching abstraction layer.
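To illustrate what a bounded local in-memory cache does, here is a minimal LRU sketch built on the JDK's `LinkedHashMap`. This is only a teaching aid: Caffeine is the production choice, adding thread safety, time-based expiry and statistics that this sketch deliberately omits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bounded LRU cache illustrating the idea behind a local in-memory cache.
// Not thread-safe; a real service would use Caffeine instead.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // access-order = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry when full
    }
}
```

With capacity 2, inserting a third entry evicts whichever of the first two was accessed least recently.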
- JDBC connection pool: HikariCP has proven itself for JDBC connection pooling. This pool is also Spring's standard dependency.
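The borrow/return principle behind a JDBC pool can be sketched with a few lines of stdlib Java. This toy pool only shows the concept; HikariCP additionally handles connection validation, timeouts, leak detection and concurrency tuning, which is why it is used instead of hand-rolled pools.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Toy object pool illustrating the borrow/return principle of JDBC pooling.
public class SimplePool<T> {
    private final BlockingQueue<T> idle;

    public SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pre-create all pooled objects
        }
    }

    // Non-blocking variant; a real pool blocks with a configurable timeout.
    public T borrow() {
        T obj = idle.poll();
        if (obj == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return obj;
    }

    public void release(T obj) {
        idle.offer(obj); // return the object for reuse
    }

    public int available() {
        return idle.size();
    }
}
```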
Security
Application security
arveo is a content service platform you can trust. We are continuously working to ensure that our services can be operated securely in the cloud.
All arveo content services and clients communicate via state-of-the-art secure REST interfaces over the secure HTTPS (SSL/TLS) protocol. All services require the web standard OAuth2 with OpenID Connect authentication using tokens. A central authentication service (Keycloak, Active Directory or the arveo user management service) issues tokens with an expiry date. This ensures that only clients authenticated against the central service can use the content service APIs.
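The expiry date on issued tokens is what limits the damage of a leaked token. The following sketch shows how an expiry (`exp`) claim in a JWT-style token is read and checked; it is illustrative only, since in practice the signature must also be verified with a proper library, and the crude string parsing below stands in for a real JSON parser.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

// Sketch: checking the expiry claim of a JWT-style bearer token.
// Real deployments must also verify the token signature; this only
// illustrates why an expiry date bounds the lifetime of a stolen token.
public class TokenExpiry {
    static long expiryEpochSeconds(String jwt) {
        String payload = jwt.split("\\.")[1]; // header.payload.signature
        String json = new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
        // Crude extraction of the numeric "exp" claim (epoch seconds).
        int start = json.indexOf("\"exp\":") + 6;
        int end = start;
        while (end < json.length() && Character.isDigit(json.charAt(end))) {
            end++;
        }
        return Long.parseLong(json.substring(start, end));
    }

    static boolean isExpired(String jwt, Instant now) {
        return now.getEpochSecond() >= expiryEpochSeconds(jwt);
    }
}
```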
Data security
arveo can encrypt content with AES-256 and thus protect it against unauthorized access. The keys are stored in such a way that maximum security is guaranteed. In order not to have to re-encrypt all data if a key is compromised, separate data keys are generated; only these keys are encrypted with the customer key and stored separately (Encryption). See also Data Integrity.
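The scheme described above, where content is encrypted with data keys and only the data keys are wrapped with the customer key, is commonly called envelope encryption. The sketch below shows the principle with the JDK's `javax.crypto` API; the per-object data key, the AES-GCM mode and all class and method names are our illustrative assumptions, not arveo's actual implementation.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

// Envelope-encryption sketch: content is encrypted with a data key,
// and only the data key is wrapped with the customer master key.
public class EnvelopeEncryption {
    private static final SecureRandom RNG = new SecureRandom();

    static SecretKey newAesKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256); // AES-256 as stated in the text
        return kg.generateKey();
    }

    // AES-GCM encrypt: returns the 12-byte IV prepended to the ciphertext.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = new byte[12 + ct.length];
        System.arraycopy(iv, 0, out, 0, 12);
        System.arraycopy(ct, 0, out, 12, ct.length);
        return out;
    }

    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, ivAndCiphertext, 0, 12));
        return c.doFinal(ivAndCiphertext, 12, ivAndCiphertext.length - 12);
    }

    // Wrapping a data key = encrypting its raw bytes with the master key.
    static byte[] wrapDataKey(SecretKey masterKey, SecretKey dataKey) throws Exception {
        return encrypt(masterKey, dataKey.getEncoded());
    }

    static SecretKey unwrapDataKey(SecretKey masterKey, byte[] wrapped) throws Exception {
        return new SecretKeySpec(decrypt(masterKey, wrapped), "AES");
    }
}
```

If the master key is rotated, only the small wrapped keys need to be re-encrypted, not the content itself, which is exactly the advantage the text describes.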
arveo allows you to organize documents into folders and records. arveo can control access rights such as reading, writing or deleting for each document via attributes or access lists and thus grant or deny the corresponding access to groups or users.
ACL Permissions
- None: no authorization (object not visible)
- Browse: the user may see the metadata of the object, but not the content
- Read: the user may read metadata and content
- Relate: the user may add an annotation
- Version: the user may change the content, but may not overwrite it
- Write: the user may change metadata and content, with the possibility to overwrite
- Delete: the user may delete the object
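The ordering of the list above suggests cumulative permission levels, where each level includes all lower ones. Under that assumption (which is ours; arveo's actual ACL evaluation may differ), a permission check can be sketched as an ordered enum:

```java
// Sketch of a cumulative permission check based on the ACL levels listed above.
// The assumption that each level includes all lower ones is suggested by the
// ordering of the list; it is not confirmed by the arveo documentation.
public enum AclPermission {
    NONE, BROWSE, READ, RELATE, VERSION, WRITE, DELETE;

    // A granted level satisfies any required level at or below it.
    public boolean allows(AclPermission required) {
        return this.ordinal() >= required.ordinal();
    }
}
```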
Tenant security
The metadata and the content of the tenants are separated. Each tenant has its own storage container and database. It is ensured that all data of a tenant is protected from unauthorized access by another tenant.
The data of a tenant can be easily exported.
Security patches
For us it is important to continuously ensure that all known vulnerabilities are fixed and that we deliver security patches and hotfixes as early as possible to our customers.
To achieve this goal, we integrated state-of-the-art tools such as the OWASP dependency-check and automated static code analysis into our build process. We also perform penetration tests on a regular basis.
What is OWASP? The Open Web Application Security Project® (OWASP) is a nonprofit foundation that works to improve the security of software. Through community-led open-source software projects, hundreds of local chapters worldwide, tens of thousands of members, and leading educational and training conferences, the OWASP Foundation is the source for developers and technologists to secure the web. OWASP is dedicated to enabling organizations to conceive, develop, acquire, operate, and maintain applications that can be trusted. All of our projects, tools, documents, forums, and chapters are free and open to anyone interested in improving application security (https://owasp.org).
Application protection by design
What does Eitco do to develop, operate and maintain a secure content service platform?
- We only use open source software from secure and accepted projects like Apache or Spring.
- We implemented an open source review and monitoring process:
  - software architecture review by the Eitco software architects
  - security check using the OWASP dependency-check
  - legal license check to ensure that it is a real open source project in the long term
- We continuously check our open source dependencies with regard to architecture, security leaks and maintainability.
- To ensure that all known vulnerabilities of third-party open source projects are eliminated, we integrated the OWASP dependency-check tool into our nightly build. Dependency-check checks our dependencies against a database of all known vulnerabilities.
- In case a severe vulnerability is found, we take the appropriate countermeasures:
  - provide a security patch for our customers with a new version of the third-party library
  - change the implementation or configuration using the third-party component
  - inform our customers to update or reconfigure components like database, message queue, application server, etc.
  - replace the third-party component (this typically requires a major update)
- OWASP dependency-check tool: It is a software composition analysis tool that tries to find publicly disclosed vulnerabilities within the project dependencies. The tool checks whether an issue is tracked in the Common Platform Enumeration (CPE) for a dependency. If a vulnerability is found, it creates a report with a link to the CVE entry. It is a command-line tool that can easily be integrated into any nightly build process. For further information, consult the National Vulnerability Database (NVD) (https://nvd.nist.gov). The following source is worth a look: Jeff Williams and Arshan Dabirsiaghi, "The Unfortunate Reality of Insecure Libraries" (https://owasp.org/www-pdf-archive/ASDC12-The_Unfortunate_Reality_of_Insecure_Libraries.pdf).
Compliance recommendations (GoBD)
All companies using electronic data processing for legally or tax-relevant documents have to comply with the "Principles for the proper management and storage of books, records and documents in electronic form and for data access" (GoBD, BMF letter of November 28, 2019).
In addition to the proper use of the arveo and third-party services, we recommend implementing the following measures when using arveo as a compliant repository for legally compliant storage of records and documents.
Indexing and retrieval
To allow users and third-party applications to identify and find objects in arveo, you should define a unique and immutable identifier property (Data Modelling). The property must be @Unique to ensure that a user or business application can clearly identify the item. The unique identifier should follow the taxonomy of your business processes and contain all information needed to clearly recognize the document. Make the property @Readonly to ensure that the identifier is always set and immutable.
This minimizes the risk of incorrect indexing and of undetectable documents, because the index is immutable, duplicate identifiers are rejected, and the compliant taxonomy ensures that every user can find documents easily and quickly. We strongly recommend building a documented, simple but clear taxonomy.
Your business application or the user must set the value when the object is created (@Mandatory annotation), or you can let arveo create a unique value by adding counter annotations. Add the @Autoincrement annotation if a simple sequential Long ID meets your requirements.
If you need a more sophisticated unique identifier, you can use the @FormattedCounter annotation, which allows you to create, for example, String identifiers like <year>-<sequence> (Unique Identifier Example).
List data types allow you to store more than one String or Long value for a property. You can search for each value using the array search operation of the arveo query language (Data Types).
Enumeration data types allow you to set one or more values from a fixed set of values.
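To make the <year>-<sequence> pattern concrete, the following standalone sketch generates identifiers of the kind @FormattedCounter produces, e.g. "2024-000001". arveo generates such values server-side; the class below, its names and the exact padding format are only illustrative.

```java
import java.time.Year;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the identifier format a @FormattedCounter might produce.
// arveo does this server-side; this class only illustrates the shape.
public class FormattedCounter {
    private final AtomicLong sequence = new AtomicLong();

    public String next(int year) {
        // zero-padded sequence keeps the identifiers sortable as strings
        return String.format("%d-%06d", year, sequence.incrementAndGet());
    }

    public String next() {
        return next(Year.now().getValue());
    }
}
```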
Retention periods
Ensure that the statutory retention periods are assigned to the records, cases and document types (Retention Periods, Retention Rules) and that the storage containers are configured correctly (Retention Container).
Check whether the technically assigned retention periods correspond to the statutory retention periods. Monitor the audit logs to ensure that the retention period is set and correct. Monitoring can be automated or can be a random check by an employee.
The operating team must ensure that a storage container contains only documents with the same retention period. Do not use the same bucket in different storage profiles, and do not assign a storage profile containing content with retention to different document types.
Grant the deletion right for your storage containers to arveo. If arveo cannot delete the containers, your operating team is in charge of this task, and you must set the option "delete rows only".
Configuring storage containers in arveo-service.yaml and in your content storage is an ongoing task for your operating team. arveo will try to create the buckets or subdirectories on your storage system but can also use existing ones.
It must be ensured that the system time cannot be manipulated (e.g. by using an NTP server). Take suitable measures to ensure that a change in the system time is detected promptly.
Because data privacy legislation (GDPR/DSGVO) can make it necessary to erase data even before the expected retention period has expired, take care that the content storage has no default hardware retention activated.
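A retention check ultimately reduces to comparing a computed deadline against the current date. The sketch below derives a deletion-eligibility date with `java.time`; the 10-year period matches the German commercial/tax retention mentioned in this chapter, but the method names are illustrative and not part of the arveo API.

```java
import java.time.LocalDate;
import java.time.Period;

// Sketch: deriving a deletion-eligibility date from a statutory retention period.
public class RetentionCheck {
    static LocalDate retentionEnd(LocalDate storedOn, Period retention) {
        return storedOn.plus(retention);
    }

    // Deletion is allowed on or after the retention-end date.
    static boolean mayDelete(LocalDate storedOn, Period retention, LocalDate today) {
        return !today.isBefore(retentionEnd(storedOn, retention));
    }
}
```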
Audit log
Enable the audit option for all types containing legally compliant content (Audit Log). If the platform is operated securely (Platform Security), users and applications can write content and metadata exclusively via the arveo REST API. arveo logs all user or application update operations on content and metadata to the audit table.
All changes to content or metadata are persisted as a traceable and immutable version (Versioning) on your storage system, and an audit entry containing the author and the timestamp of the change is written to the audit log table (Audit Log). If a document is updated, the version number is incremented and saved. Although all versions are traceable and accessible via the API, we recommend making the version number system property visible in the application so that copies of the original can be identified easily.
Ensure that the @Overwrite option is not set for legally compliant document types. If overwrite is turned on, it is possible to manipulate the originally saved content and compromise the document without creating a versioned copy.
The audit logs are subject to the retention periods of commercial and tax law. Ensure that the audit logs are kept for the legal retention period (10 years). We recommend that the operator of the platform exports and clears the audit tables using database tools after 2 years. Save the dumps as arveo documents with a 10-year retention period. If you need access to older audit logs, you can easily download the dumps and upload them to the database.
The audit tables must be protected against unauthorized access by users. Do not allow write-access to the audit tables to anyone but the arveo services. Only data protection officers are allowed to have controlled read access to the audit data.
Check the audit logs regularly to find unauthorized user activities.
Download and migration
All documents in arveo that are subject to retention are available via the REST API and can be downloaded. The integrity and availability of the content are the responsibility of the provider and operator of the platform. The provider must ensure that failures of the storage systems for database and content are identified at an early stage, and must take appropriate countermeasures. See the chapter Fail Safety for technical and organizational measures for high availability of the arveo platform.
In the event that data has to be migrated, arveo offers an extensive export API that enables content and metadata to be exported. arveo saves in the database the hash value (https://en.wikipedia.org/wiki/Cryptographic_hash_function) that was determined when the content was first uploaded (Upload Data). This hash value can be used as a checksum to detect accidental or intentional corruption of data. If the hash value of the content after the migration is identical to the original hash, the migration report proves the correctness of the migration process. To prove the completeness of the migration process, the arveo API allows you to export a list of all records, cases and documents in a document type.
Legally compliant migration
- Prerequisites for the migration
  - Use the verify option and the strongest hash check available in your solution when uploading content to arveo.
- During the migration
  - Download content and metadata (including the original hash and retention period).
  - Upload metadata and content to the migrated platform and set the retention period to the exact same value.
  - Calculate the hash on the migrated platform by downloading the content.
- After the migration
  - Correctness: compare hash, metadata and retention period for each original and migrated record, case and document.
  - Completeness: check that each migrated document can be found using its unique identifier.
  - Traceability: create a report for each document type. Report the content hash evidence and the metadata for all migrated objects.
Upload the migration report to the migrated platform and set the retention period to the retention date of the document with the longest retention period within the report.
Depending on your retention policy, you can create separate reports per retention period range (e.g. by year).
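The correctness check described above can be sketched in a few lines. This is an illustrative example, not the arveo export API: the record structure (`hash`, `retention_period`, `metadata` keys) is an assumption made for the sketch; only the SHA-256 checksum principle comes from the source.

```python
import hashlib

def sha256_hex(content: bytes) -> str:
    """SHA-256 checksum, as stored by arveo when the content was first uploaded."""
    return hashlib.sha256(content).hexdigest()

def verify_migration(original: dict, migrated: dict, migrated_content: bytes) -> list:
    """Compare hash, metadata and retention period of one migrated document.
    Returns a list of discrepancies; an empty list means the check passed."""
    errors = []
    if sha256_hex(migrated_content) != original["hash"]:
        errors.append("content hash mismatch")
    if migrated["retention_period"] != original["retention_period"]:
        errors.append("retention period mismatch")
    if migrated["metadata"] != original["metadata"]:
        errors.append("metadata mismatch")
    return errors

# A document whose migrated copy is byte-identical passes with no findings.
doc = {"hash": sha256_hex(b"invoice content"),
       "retention_period": "2035-12-31",
       "metadata": {"type": "Invoice", "number": "4711"}}
assert verify_migration(doc, doc, b"invoice content") == []
```

Collecting the per-document results of such a check into one report per document type yields the migration report mentioned above.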
Data integrity
arveo guarantees high availability, reliability and high performance at all times. The system must be protected from manipulation attempts by proven, well-thought-out concepts. The data that is stored and managed in the system is protected via the API. Access and editing rights are managed via ACLs; user rights are based on the concepts for roles, groups and ACLs. More detailed information is provided in the relevant chapters of this manual.
Access to all data (documents, metadata) takes place exclusively via the API, with the corresponding protection mechanisms so that the security of the data is guaranteed at all times.
Content storage
The operator must take appropriate technical or organizational measures to ensure that the data is stored in the storage in such a way that it cannot be changed within the legally prescribed retention period.
Enable the verify option for all clients and integrations. The upload API can optionally verify the uploaded content: the content service downloads the just-uploaded stream from the content storage and compares its hash once again with the expected value (Upload content). arveo stores the hash value in a system property and persists it in the document type's metadata table.
In case of very sensitive data you can enable transparent encryption (Encryption) to follow the data protection rules and prevent your administrators from accessing document content.
Databases
For the supported database, PostgreSQL 12, you can select between different data replication strategies:
-
Asynchronous replication (backup or mirror): Enables an asynchronous disaster recovery. Your database is periodically mirrored.
-
Synchronous database cluster: Transactions are synchronously replicated on more than one master node. The provider of the PostgreSQL 12 cluster must guarantee that data is stored redundantly to reduce potential data loss.
The provider of the Apache Solr 8.6 cluster must guarantee that data is replicated between the nodes and that the backup strategy prevents data loss.
Fail safety
The system operator is responsible for data security and recovery. The operator must ensure that the backups of the data are checked regularly and that recovery is reliably possible in the event of a failure. The IT processes that ensure the secure, redundant and highly available storage of arveo data in databases and object or file system storage systems are particularly decisive for the proper operation of the platform. These are the responsibility of the operator of the platform, who must implement the availability and security of the systems in accordance with legal and organizational requirements.
We strongly recommend using a redundant file system or object storage system. If you do not at least back up your data periodically, data loss is likely. For high availability with almost zero data loss, your storage system should replicate the written content and data synchronously. The operating team of the platform must ensure that appropriate replication is set up and monitored.
Object storages with REST APIs are designed for the cloud. If you decide to use storage from the cloud (public or private), we recommend using object storage via the S3 API. Object storages provide a high level of redundancy (even geo-redundancy) and fail safety. The S3 REST API is very tolerant of network and infrastructure failures.
Ensure technically and organizationally that there is sufficient space for storing the data.
For best high availability, the provider of your storage system must protect the stored data against accidental, malicious, or disaster-induced loss. The better your data replication, the better your availability in case of a failure.
To achieve high availability for arveo, the provider must guarantee that all required content services run as a cluster.
Security
Operators
The provider of the arveo services should ensure that only authorized data protection officers and administrators have data write (INSERT, UPDATE, DELETE) permissions for the database and the content repository.
An administrator can only illegally manipulate content if he can access both the database and the content storage, because the control hash value of the content is stored in the database. Ensure that none of your administrators has exclusive and unattended access to both the content storage and the database.
Distributed management roles of the storage systems and the arveo transparent encryption feature make your system more forgery-proof!
The activities of administrators with extensive rights must be logged by the operator. The logs are subject to the retention periods of tax law and must be checked regularly.
Platform
To prevent unauthorized access to the arveo platform the provider must:
-
ensure that HTTPS communication is enabled for all clients, applications, 3rd party components and services (Services).
-
enable OAuth 2.0 or X.509 certificate authentication and authorization for all arveo services (OAuth2.0). All arveo services require authentication, ensuring that only arveo services or authenticated and authorized users can use the API. We recommend using a state-of-the-art authentication service like Keycloak and enabling SSO with at least 2-factor authentication.
-
take suitable technical or organizational actions against unauthorized changes to the data, such as firewalls, VPNs, or transparent encryption with arveo or at hardware level.
-
provide adequate protection of passwords by using a state-of-the-art IDP such as Keycloak or MS Active Directory and increasing the password complexity accordingly.
-
take actions against denial of service attacks.
arveo Content Services
The administrators of the arveo platform must:
-
make sure that only authorized persons receive an account that grants access to arveo documents;
-
ensure that objects are protected against unauthorized access using ACLs. We recommend defining a separation of functions and implementing this via ACLs. To achieve the best data security assign ACLs to all records, cases and documents. Make sure that for all used ACLs the assignment of access rights to users and groups is carried out regularly (e.g. Invoice document type, accounting: write, employees: read);
-
ensure that the activities of managers who can change ACLs are logged via arveo audit and checked at regular intervals;
-
organizationally ensure that the passwords of the arveo administration users are changed regularly.
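The ACL-based separation of functions recommended above boils down to mapping groups to rights per object. The following sketch is illustrative only — arveo's actual ACL model is richer than a plain group-to-rights dictionary — but it shows the core check behind the invoice example (accounting: write, employees: read):

```python
# Hypothetical ACL for an Invoice document type; names are taken from the
# example in the text, the data structure is an assumption for this sketch.
ACL_INVOICES = {"accounting": {"read", "write"}, "employees": {"read"}}

def has_right(acl: dict, user_groups: set, right: str) -> bool:
    """Grant the right if any of the user's groups carries it in the ACL."""
    return any(right in acl.get(group, set()) for group in user_groups)

assert has_right(ACL_INVOICES, {"accounting"}, "write")
assert has_right(ACL_INVOICES, {"employees"}, "read")
assert not has_right(ACL_INVOICES, {"employees"}, "write")
```

Reviewing which groups appear in such mappings, and who is a member of those groups, is exactly the regular assignment check the list above calls for.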
Data Store
Persistence architecture
arveo guarantees forgery-proof long term availability of your content and metadata.
All revisions of content or metadata are stored as traceable and immutable versions (Versioning) on the storage systems. The content service checks the integrity of uploaded content by computing SHA-256 hashes on client and server side. Additionally, an audit entry is written to the audit log table (Audit Log). arveo provides role-based access control at object level and allows you to prevent unauthorized access to content and metadata.
arveo protects content and metadata by software design. arveo only allows access to content and metadata via the arveo REST API. As only arveo and highly authorized administrators have data write rights for the database and the storage, content cannot be deleted or manipulated by unauthorized persons.
Together with arveo's capabilities to manage the retention periods of documents and records (Retention Periods) arveo guarantees a GDPR and/or DSGVO compliant data protection and data privacy.
arveo meets the requirements of a revision-proof long-term archive and is a cornerstone for the legal compliance of your IT systems.
Because the new legal data privacy and protection acts make it necessary to erase data even before the expected retention period has expired, arveo does not use hardware retention features.
If needed, you can add verifiable evidence records to the documents (signatures, timestamps) to prove the integrity and authenticity of content and author. The creation of the evidence is not a feature of arveo; it only stores the record together with the content.
In this chapter you will find all the information on how to set up a secure and legally compliant content service platform with arveo.
Data types
arveo distinguishes three kinds of data and stores each to the most suitable storage system.
- Content: arveo stores unstructured content like documents, audio, video and images on either a cloud object storage or a file system storage. Most cloud providers and storage vendors (AWS S3, NetApp ONTAP, EMC Elastic Cloud, etc.) provide file system or object storage systems. Object storages are organized in buckets and allow you to store an almost unlimited number of objects in a bucket. arveo accesses the content via the standard S3 REST API. For optimized and fast access to frequently used content objects, arveo can integrate a NoSQL key-value cache DB like Redis.
- Structured system properties: contain all primary keys and technical information about documents, containers and folders. The data has a fixed data model and requires the highest performance, consistency and transaction support. arveo saves this data in a relational database.
- Customer-specific metadata: The data model is different for each document, container or folder type. This metadata is semi-structured, and new properties might be added during the life cycle of the application.
  - Eventually consistent customer information: Sometimes the consistency of the data is not critical, but high performance and facet support must be guaranteed when filtering by any value, without the risk of a full table scan. arveo saves this customer metadata in the NoSQL document DB Apache Solr 8.6, which is highly efficient for inserting and searching and offers automatic completion and facets.
  - Consistent customer keys: These properties require the highest performance, consistency and transaction support. arveo saves this data in a relational database.
High availability
The high availability (HA) of arveo depends highly on the HA of the storage systems for all kinds of data. Each of the storage systems, and as a result the arveo services, is subject to the CAP (Consistency, Availability and Partition Tolerance) theorem, which relates the availability and fail safety of a distributed system to:
-
Consistency: All clients see the same content and metadata.
-
Availability: All clients can read and write.
-
Partition Tolerance: the system is fail safe when one or more nodes fail.
The CAP theorem, in a nutshell, states that a distributed system cannot provide all three properties at once, but only two of them.
As arveo is an ECM cloud platform, consistency and availability (read/write) of content and metadata are most important. arveo tolerates that a network or message failure of either the primary content storage or database node can cause exceptions in the client application. The arveo services do not store data within their containers and focus on scalability and partition tolerance.
The arveo microservices should be deployed as containers in your cloud environment (e.g. Kubernetes) and auto-scaling should be implemented.
Data integrity
arveo ensures the immutability and integrity of all your digital content and evidence records by an automated hash check each time content is up- or downloaded.
Upload
Hash check: When you use the upload content API, the client side and the content service compute a SHA-256 hash over the streamed data. The upload succeeds only if both values are identical. The upload API also allows you to pass an expected SHA-256 value; the API will only return OK if the server-side hash matches the expected hash.
Verify: The upload API can optionally verify the uploaded content. The content service downloads the just-uploaded stream from the content storage and compares its hash once again with the expected value (Upload Content). arveo stores the hash value in a system property and persists it in the document type's metadata table.
The verify option of the upload API may slow down your system when uploading large amounts of data.
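The client-side half of the hash check can be sketched as follows. This is not the arveo SDK — the chunked streaming function and the simulated "server side" are illustrative — but it shows the principle: both sides hash the same stream and the upload is accepted only if the values match.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # hash in chunks so large uploads are not loaded into RAM

def stream_hash(stream) -> str:
    """Compute the SHA-256 hash of an upload stream, chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()

# The server computes the same hash over the received stream; the upload is
# accepted only if both values (and an optional expected hash) are identical.
content = b"%PDF-1.7 example content"
client_hash = stream_hash(io.BytesIO(content))
server_hash = stream_hash(io.BytesIO(content))
assert client_hash == server_hash == hashlib.sha256(content).hexdigest()
```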
Transactions
The arveo REST API is stateless; there is no session. All REST API calls are atomic, and all database commands of one call are executed within a single transaction. arveo guarantees the atomicity of transactions: to avoid inconsistent states, aborted transactions are rolled back, and hanging transactions are removed and rolled back to avoid database locks.
The database provider should configure the transaction deadlock timeout on your database to avoid locks that can decrease the performance of your UPDATE and DELETE calls.
Download
When you use the download API (Download Content), the client SDK computes the SHA-256 hash of the downloaded stream and compares it to the hash value in the system property of the document type. If the hash does not match the upload hash value in the database, the download fails with a data integrity exception telling the caller that the data on the storage was most likely manipulated.
An administrator can only illegally manipulate content if he can access both the database and the content storage, because the control hash value of the content is stored in the database. Ensure that none of your administrators has exclusive and unattended access to both the content storage and the database.
Distributed management roles for the storage systems and the arveo transparent encryption feature can make your system forgery-proof!
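The download-side check can be sketched like this. The exception name and function are illustrative, not the arveo SDK API; the behavior — fail the download when the computed hash differs from the stored upload hash — is what the text describes.

```python
import hashlib

class DataIntegrityError(Exception):
    """Raised when downloaded content does not match the stored upload hash."""

def verify_download(content: bytes, stored_hash: str) -> bytes:
    """Compare the hash of the downloaded stream with the hash stored at upload."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != stored_hash:
        raise DataIntegrityError("content on storage was most likely manipulated")
    return content

stored = hashlib.sha256(b"original content").hexdigest()
assert verify_download(b"original content", stored) == b"original content"

try:
    verify_download(b"tampered content", stored)
except DataIntegrityError:
    pass  # mismatch detected, as expected
```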
Content storage
arveo supports evidence-preserving long-term storage of your content and metadata by storing the content in a legally secure manner on either an S3 object storage or a file system. The storage must be redundant. Object storage systems like AWS S3, NetApp or EMC Elastic Cloud Storage guarantee the long-term availability and integrity of your content.
All changes to content or metadata are persisted as traceable and immutable versions (Versioning) on your storage system, and an audit entry is written to the audit log table (Audit Log). Each time the metadata (including comments and annotations) or the content of a document is changed via the API, arveo creates a new version and a new entry in the version management table containing the author and the timestamp of the change. The Update API allows you to add a comment to each version. The Version Management API provides access to all version information as well as the metadata and content of previous versions.
To ensure that the content is immutable, only arveo should have write access to the storage system; apart from that, only authorized data protection officers and administrators should have write access. In case of very sensitive data you can enable encryption (Encryption) to follow the data protection rules and prevent your administrators from accessing document content.
For best high availability, the provider of your storage system must protect the stored data against accidental, malicious, or disaster-induced loss. The better your data replication, the better your availability in case of a failure.
Data replication (redundancy)
For both supported storages (S3, file system) you can select between different data replication strategies:
-
Backup or mirror: enables asynchronous disaster recovery. Your content data is periodically mirrored;
-
Synchronous replication;
-
Asynchronous replication.
Fail Safety (Consistency, Availability)
As arveo stores each version of the content as an immutable object, clients cannot receive outdated data. If the replication is asynchronous, the worst case is that clients get a read error.
If the storage is offline, arveo is not available and the system has an outage. If the storage allows only read access, arveo can download content but upload operations fail.
If the storage node has a long-term outage, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since then.
We strongly recommend using a redundant file system or object storage system. If you do not at least back up your data periodically, data loss is likely. For high availability with almost zero data loss, your storage system should replicate the written content and data synchronously. The operating team of the platform must ensure that appropriate replication is set up and monitored.
You can configure different storage locations (cloud storage or on-premises) for your content and document types (Storage Configuration). Reduce costs by storing data that is not compliance-relevant or legally required, such as PDF/A renditions of documents, on storage systems with lower availability and performance SLAs.
Object storages with REST APIs are designed for the cloud. If you decide to use storage from the cloud (public or private), we recommend using object storage via the S3 API. Object storages provide a high level of redundancy (even geo-redundancy) and fail safety. The S3 REST API is very tolerant of network and infrastructure failures.
Consistent metadata storage (relational database)
The relational database PostgreSQL 12 is responsible for the fully consistent processing of structured metadata and transactions.
Data replication (redundancy)
For the supported database, PostgreSQL 12, you can select between different data replication strategies:
-
Asynchronous replication (backup or mirror): Enables an asynchronous disaster recovery. Your database is periodically mirrored.
-
Synchronous database cluster: Transactions are synchronously replicated on more than one master node.
The provider of the PostgreSQL 12 cluster must guarantee that data is stored redundantly to reduce potential data loss.
Fail safety (consistency, availability)
If the database cluster is down or allows only read access, arveo is not available (denial of service). If the database has a long-term outage and the data files are affected, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since then.
Eventually consistent metadata storage (NoSQL document database Apache Solr 8.6)
arveo uses modern NoSQL storage technologies to guarantee high search performance and horizontal scalability at all times. We store semi-structured or dynamic document metadata in the NoSQL document database Apache Solr 8.6.
Solr is an open source search platform that has been partially integrated into arveo.
Based on the type definitions that are created in arveo, arveo automatically creates a schema that Solr uses. In addition, for each client that is created in arveo, a new collection is also created in Solr, so that there is also a separation of data there.
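The derivation above — type definition in arveo, schema and per-client collection in Solr — can be illustrated with a small sketch. The field-type mapping and the collection naming scheme here are assumptions for illustration, not arveo's internal implementation; they only show the shape of the payloads Solr's Schema API would receive.

```python
# Hypothetical mapping from arveo attribute types to Solr field types.
TYPE_TO_SOLR = {"STRING": "string", "LONG": "plong", "DATE": "pdate"}

def solr_schema_fields(type_definition: dict) -> list:
    """Build the 'add-field' payloads for Solr's Schema API from a type definition."""
    return [
        {"name": name, "type": TYPE_TO_SOLR[t], "indexed": True, "stored": True}
        for name, t in type_definition["properties"].items()
    ]

def collection_name(client: str, type_name: str) -> str:
    """One collection per arveo client keeps tenant data separated in Solr."""
    return f"{client}_{type_name}"

invoice = {"name": "Invoice",
           "properties": {"invoiceNo": "STRING", "amount": "LONG"}}
assert collection_name("tenant1", "invoice") == "tenant1_invoice"
assert solr_schema_fields(invoice)[0] == {
    "name": "invoiceNo", "type": "string", "indexed": True, "stored": True}
```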
Data Replication (Redundancy)
Set up a cluster of replicated nodes for Apache Solr 8.6. Refer to the Apache Solr 8.6 documentation on how to set up a redundant cluster.
Fail safety (availability, partition tolerance)
If the Solr cluster is down, arveo is still available but free customer searches fail. If one node is down or the cluster is read-only, arveo is still available but searches may return outdated results. If the cluster has a long-term outage and the data files are affected, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since then.
The provider of the Apache Solr 8.6 cluster must guarantee that data is replicated between the nodes and that the backup strategy prevents data loss.
Clustering
Each arveo service can be configured as a service cluster to achieve HA. Depending on the deployment, you can either set up an application server cluster (WAR deployment) or run the containerized applications on a cloud platform like OpenStack with Kubernetes.
Fail safety (consistency, availability)
| Service | Failure risks | Recommended |
|---|---|---|
| User Management Service | No login possible, system outage | Cluster 2 |
| Config Service | Configuration not available to all nodes, system outage | Cluster 2 |
| Registry Service | Service registry not available, system outage | Cluster 2 |
| Document Service | Store, edit and version documents and metadata not available, system outage | Cluster 2-n, automatic scale up/down by load |
| SAP Archive Link Service | SAP archive link not available, SAP outage | Cluster 2-n, automatic scale up/down by load |
| Document Conversion Service | Conversion to PDF/A not available | Cluster 2-n, automatic scale up/down by load |
| Enterprise Integration Service | Job execution paused and integration with external systems not available | Cluster 2-n |
| Federation Service | Access to external repositories (Documentum, Saperion) not available | Cluster 2-n, automatic scale up/down by load |
| Access Control Service | Access to objects with access control lists fails, partial system outage | Cluster 2 |
Required 3rd party services
To operate arveo successfully with high availability, the operator of the platform must provide the following services as clusters.
| Service | Failure risks | Recommended |
|---|---|---|
| Active MQ | Asynchronous operations are not triggered | Cluster 2 |
| PostgreSQL 12 | Access to metadata not available, system outage | Cluster 2-n depending on load and configuration of the PostgreSQL 12 cluster |
| Apache Solr 8.6 | Enterprise search not available | Cluster 2-n depending on load and configuration of the Apache Solr 8.6 cluster |
| Content Storage | Content access not available, system outage | Storage cluster depending on provider |
| Authentication Service (optional) | Login not available via OAuth 2.0, system outage | Cluster 2 |
| Monitoring (optional) | ELK (Elasticsearch, Logstash, and Kibana) | Cluster 2 |
To achieve high availability for arveo, the provider must guarantee that all required content services run as a cluster.
Data deletion
By default, documents of a specific document type stored in arveo keep their metadata in the configured database and their content in the object storage. When a version is created, the content or metadata is stored as a traceable and immutable version (Versioning) in the database and on the storage system. That means there are separate content objects and database entries for each version. Each document can have a retention period that ensures that the document cannot be deleted before the period expires.
You can delete or purge any object with the arveo Delete API if you have the DELETE right for the document type and the object's ACL, and the retention period has not expired.
The delete method deletes all entities including all versions of the object in the database, but it does not delete the content objects or files. The delete operation cannot be undone; the data is permanently deleted.
The purge method additionally erases the content objects or files from the content storage.
If you delete objects only in the database, the content objects are orphaned: it is impossible to restore them and almost impossible to delete them later, because no relation is left in the database. The content objects remain as data trash in the system and cannot be accessed via the API.
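The difference between delete and purge can be sketched with in-memory stores standing in for the database and the content storage. The function and variable names are illustrative, not the arveo API; the semantics match the text: delete removes the database entities and leaves the content behind, purge erases both.

```python
# Toy stand-ins for the metadata database and the content storage.
database = {"doc-1": {"versions": [1, 2], "content_key": "obj-1"}}
content_storage = {"obj-1": b"binary content"}

def delete(doc_id: str) -> None:
    """Remove all database entities; content objects stay behind (orphaned)."""
    del database[doc_id]

def purge(doc_id: str) -> None:
    """Remove database entities AND erase the content objects from the storage."""
    content_key = database[doc_id]["content_key"]  # look up before deleting
    delete(doc_id)
    del content_storage[content_key]

purge("doc-1")
assert "doc-1" not in database
assert "obj-1" not in content_storage  # no orphaned data trash left behind
```

Had we called `delete("doc-1")` instead, `content_storage` would still hold `obj-1` with no database relation pointing at it — exactly the orphaned-content situation warned about above.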
Recycle bin
Any document, container or folder type can use the optional recycle bin feature. If it is enabled, entities in the type definition can be moved to and restored from the recycle bin.
The recycle bin is implemented as a boolean database system property DELETED. Entities in the recycle bin will be filtered from normal queries by default, but a client can compose search expressions that override this behavior (see Recycle Bin).
If you delete or purge an object in the recycle bin, it is deleted like a document without the recycle bin feature and cannot be restored.
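The DELETED system property and its default filtering can be illustrated with a small SQLite sketch. The table and column names are illustrative, not arveo's actual schema; the point is that normal queries exclude recycle bin entries while an explicit search expression can include them.

```python
import sqlite3

# Toy stand-in for a document type table with the boolean DELETED system property.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoice (id TEXT, deleted INTEGER DEFAULT 0)")
db.executemany("INSERT INTO invoice VALUES (?, ?)",
               [("doc-1", 0), ("doc-2", 1)])  # doc-2 is in the recycle bin

# Normal queries filter out recycle bin entries by default.
normal = db.execute("SELECT id FROM invoice WHERE deleted = 0").fetchall()
# A client can compose a search expression that overrides this behavior.
override = db.execute("SELECT id FROM invoice").fetchall()

assert normal == [("doc-1",)]
assert len(override) == 2  # includes the entry in the recycle bin
```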
For compliance reasons, the audit entries in the database are not deleted by the Delete API; the delete operation is written to the audit log. The operator of the platform must clean up the audit table after the legal retention period has expired. We recommend backing up the audit logs to meet the legal requirements of data protection and to ensure that the backups can be restored within the legal retention period.
Automated recycle bin emptying
It is possible to empty your recycle bin with an automated job scheduled in the Enterprise Integration Service of arveo. You can activate the predefined empty-recycle-bin job and change the age from the default value of 6 months to an age of your choice. The job permanently deletes all entries that have been in the recycle bin for longer than the set age.
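The age check the job performs can be sketched in a few lines; the function name and the approximation of 6 months as 182 days are assumptions for this illustration.

```python
from datetime import datetime, timedelta, timezone

def expired(deleted_at: datetime, now: datetime,
            age: timedelta = timedelta(days=182)) -> bool:
    """True if an entry has been in the recycle bin longer than the set age
    (default roughly the 6-month default mentioned in the text)."""
    return now - deleted_at > age

now = datetime(2024, 12, 1, tzinfo=timezone.utc)
assert expired(datetime(2024, 1, 1, tzinfo=timezone.utc), now)       # ~11 months
assert not expired(datetime(2024, 11, 1, tzinfo=timezone.utc), now)  # 1 month
```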
Recovery log
In addition to the recycle bin feature, arveo offers an additional safety layer to recover permanently deleted entities. By annotating a type definition with @Recovery, it is possible to define a time period in which permanently deleted entities are kept in a system-wide recovery table before they are removed completely. An entity in such a type definition that is deleted (or purged) is removed from the type definition's table (and its version table). A copy of each version of the entity is stored in the recovery table, making it possible to restore it manually. If the entity is a document, its content is not deleted from the storage until the entity is removed from the recovery table.
There is no API to restore data from the recovery table. This feature is only intended as a last backup in order to make accidentally deleted data available to the business through an administrator. The admin can copy the content file from the storage together with the JSON metadata and send it to the business department.
Recovery log emptying
The system management API provides a method to remove expired entities from the recovery table. An entity is considered expired when its keep-until timestamp is in the past at the moment the method is invoked. A user who calls this method needs the ECR_PURGE_RECOVERY_TABLE authority (see Access Rights).
The recovery of deleted entities is a manual process. The recovery table contains a JSONB column containing a JSON representation of the entire entity including attributes, content information and modification information. Each version of an entity is contained in the recovery table as a separate row.
It is possible to empty the recovery log with an automated custom job scheduled in the Enterprise Integration Service of arveo. The job must execute the Management API method to empty the recovery table.
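The recovery table layout described above — one row per entity version, each carrying a JSON snapshot and a keep-until timestamp — can be sketched with SQLite. Column names are illustrative, not arveo's actual schema.

```python
import json
import sqlite3

# Toy stand-in for the system-wide recovery table: one row per entity version.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE recovery
              (entity_id TEXT, version INTEGER, keep_until TEXT, snapshot TEXT)""")

def log_deletion(entity_id: str, versions: list, keep_until: str) -> None:
    """Store a JSON snapshot of every version of a permanently deleted entity."""
    for v in versions:
        db.execute("INSERT INTO recovery VALUES (?, ?, ?, ?)",
                   (entity_id, v["version"], keep_until, json.dumps(v)))

log_deletion("doc-1",
             [{"version": 1, "title": "draft"}, {"version": 2, "title": "final"}],
             "2025-06-30")

# Manual recovery: an administrator reads the JSON snapshots back out.
rows = db.execute("SELECT snapshot FROM recovery WHERE entity_id = ?",
                  ("doc-1",)).fetchall()
assert [json.loads(r[0])["title"] for r in rows] == ["draft", "final"]
```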
Installation
Deployment options
The lightweight and stateless services are delivered as containers for all platforms and allow arveo to scale horizontally automatically. Customers have the choice between an on-premises, cloud or hybrid installation.
The deployment may be done as:

- Docker images (for the arveo services): A Docker image is a template that contains a set of instructions for creating a container. Several containers can be started from one image;
- an executable JAR: integrate the content services in your Java application and run them on any platform that provides a JVM;
- a .war file: deploy the services as web applications in an application server like Tomcat;
- a Spring Boot application: deployed as a self-running service using an embedded Undertow servlet container;
- a Debian package: Debian packages are used for software installation on Debian-based operating systems;
- Kubernetes Helm charts: deploy the content services as containerized applications in your Kubernetes environment with flexible Helm charts. This enables load-dependent, automated service provision.
System requirements
This chapter describes the system requirements for an on-premises installation. The configuration and deployment of all required artefacts is performed by Eitco or a partner using the automated deployment tool Puppet.
General prerequisites
Firewall
Some firewall permissions are required. The IP addresses and the ports are customer-specific. In order to notify the provider of this, the customer must fill out the form customer-specific information.
Network Access
SSH access from the Eitco network to all customer-specific systems (including server) is required so that the installation can be carried out. Access to the official Ubuntu package sources is required. This is done either in the form of direct access via the Internet or by providing a local copy of the corresponding repository.
SMTP Mail
In addition, SMTP server access is required for sending mail, as well as access to the Eitco Puppet Master via VPN. The following parameters must be provided by the customer so that any error messages from the HL7 Integration Service can be sent by email:
- SMTP_SERVER
- SMTP_PORT
- SMTP_STARTTLS = true / false
- SMTP_USER
- SMTP_PASSWORD
- MAIL_TO
- MAIL_FROM
MAIL_TO is the address to which the mails are sent and MAIL_FROM is the sender address.
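How the parameters fit together can be sketched with Python's standard mail modules. Server coordinates and credentials below are placeholders, and the actual send is commented out because it needs a reachable SMTP server; only the message construction runs.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values for the customer-provided parameters listed above.
SMTP_SERVER, SMTP_PORT, SMTP_STARTTLS = "mail.example.com", 587, True
SMTP_USER, SMTP_PASSWORD = "svc-hl7", "secret"
MAIL_FROM, MAIL_TO = "hl7@example.com", "ops@example.com"

msg = EmailMessage()
msg["From"] = MAIL_FROM
msg["To"] = MAIL_TO
msg["Subject"] = "HL7 Integration Service error"
msg.set_content("Job X failed: ...")

# The send itself (requires a reachable SMTP server):
# with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as smtp:
#     if SMTP_STARTTLS:
#         smtp.starttls()
#     smtp.login(SMTP_USER, SMTP_PASSWORD)
#     smtp.send_message(msg)

assert msg["To"] == MAIL_TO and msg["From"] == MAIL_FROM
```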
Reference Integration System
A reference system (in the form of a VM or similar) is required to test the system. It must have the same setup as the customer's client systems (i.e. the same web browser, with the same settings, etc.). In addition, terminal/RDP access must be provided so that Eitco can test the client installation.
Web browser
For the administration user interfaces the following web browsers are supported: Safari, Google Chrome, Microsoft Edge, Mozilla Firefox, each in the current version.
Containerized Applications
For the installation of the product, certain requirements regarding the hardware, software and infrastructure to be provided must be met. In a typical cloud environment each arveo service is deployed as a containerized application and is hosted and scaled by a cloud operating system. However, a different setup can be used, depending on the customer infrastructure and the load of the system (see Deployment Options).
The following chapter describes the minimum CPU and RAM requirements of each arveo service in a production environment.
| Service | CPU | RAM |
|---|---|---|
| Document Service | 4x > 2 GHz | >= 32 GB |
| User Management Service | 1x > 2 GHz | >= 2 GB |
| Registry Service | 1x > 2 GHz | >= 512 MB |
| Config Service | 1x > 2 GHz | >= 512 MB |
| Access Control Service | 1x > 2 GHz | >= 2 GB |
| Audit Service | 1x > 2 GHz | >= 512 MB |
| SAP Archive Link Service (optional) | 1x > 2 GHz | >= 1 GB |
| Document Conversion Service (optional) | 1x > 2 GHz | >= 2 GB |
| Enterprise User Management Service (optional) | 1x > 2 GHz | >= 1 GB |
| Enterprise Integration Service (optional) | 1x > 2 GHz | >= 1 GB |
| Federation Service (optional) | 1x > 2 GHz | >= 2 GB |
The number of started services in each service group and the assigned CPU and RAM depend very much on the load and on the number of documents and objects in the database. You should always monitor the system and scale up or down on demand. Services like the Document Conversion Service or the Enterprise Integration Service in particular can produce heavy load and require many containers consuming RAM and CPU.
For a test or development system the requirements are lower and each service requires: < 1 CPU, 256 MB for all services. |
Typical Non-Containerized Installation
Assuming that the installation is performed as Spring Boot services, we recommend setting up a minimum of 3 machines. The database and the Document Service carry the highest load and should be deployed on separate machines. All other services and 3rd-party services can run on one OS instance. Some services, like Archive Link or Document Conversion, may consume a lot of CPU and RAM, which can make it necessary to move them to separate machines.
-
System machine 1 - database. The PostgreSQL database is installed here.
Component | Recommendation | Note |
---|---|---|
CPU | 4x (> 2 GHz) | |
RAM | At least 16 GB | Depending on the size of the database |
DB Storage | Proportional to the number and kind of the entities | Recommendation: should be stored on separate storage |
Log files | Depending on the volume of changes to the database | Recommendation: should be stored on separate storage |
OS | Ubuntu 18.04/20.04 | The operating system recommendation is optional; any system satisfying the requirements of the PostgreSQL database may be used |
-
System machine 2 - Document Service is installed here.
Component | Recommendation | Note |
---|---|---|
CPU | 4x (> 2 GHz) | |
RAM | 32 GB | |
Storage | Proportional to the size of the content objects | These storages are supported: |
OS | Ubuntu 18.04/20.04 | The tests are performed on a Debian machine, hence it is recommended to install a Debian-based distribution, for example a current LTS version of Ubuntu |
The storage is meant for storing the arveo content objects of type Document, meaning binary content. All metadata and system properties are stored in the database, see System machine 1 above. |
-
System machine 3 - Here all other services of arveo are installed: see Content Services, 3rd party services
Component | Recommendation | Note |
---|---|---|
CPU | 4x (> 2 GHz) | |
RAM | 16 GB | |
OS | Ubuntu 18.04/20.04 | The operating system should be Debian-based |
The importance of testing should not be underestimated: there should always be a way to test specific cases without trying them out on a production system. For this reason, it is important to create a test system that has the same specification and a similar data set as the production system. |
For the arveo services, JDK 11 or 16 is required. All the other recommendations listed above are non-binding, but they have proven to work well. In some cases, other recommendations can be made according to your individual project setup and the requirements of the project.
Installation
General concept
These instructions describe the installation procedure, the installation content and the items required for commissioning the product. We recommend controlling the rollout of the arveo services by a continuous integration process that provides all artefacts required for the deployment of the required content services and of your web solution and integrations.
Depending on the underlying platform, deployment takes place via binary service artifacts that are deployed on pre-installed VMs or via containerized applications that are made available in the host cloud system.
On-Premises Installation by Eitco
This chapter describes the compliant on-premises installation provided by Eitco. The configuration and deployment of all required artefacts is performed by Eitco or a partner using the automated deployment tool Puppet.
The customer provides several virtual machines that are configured by Eitco with the automated deployment tool Puppet (Puppet Deployment) in order to ensure a problem-free software rollout in the customer system.
Depending on the service level agreement Eitco can guarantee high availability, reliability and high performance at all times. The system has to be protected from manipulation attempts by technical or organizational measures. The data that is stored and managed in the system is protected via the API. The access and editing rights are managed via ACLs. User rights are based on the concepts for roles, groups and ACLs. More detailed information on this is provided in the relevant chapters of this manual.
All changes to the system and the data are logged via the API, and the changes are traceable via the audit log. If auditing is activated, every database change is logged. In order to guarantee the atomicity of the transactions and to avoid inconsistent states, all aborted transactions are removed and rolled back.
Access to all data (documents, metadata) is exclusively provided via the API, with the corresponding protection mechanisms so that the security of the data is guaranteed at all times.
Puppet
Puppet is open-source software developed by Puppet Labs and is used for the automated configuration and deployment of software deliveries. It provides network-based configuration management for servers running Unix-like operating systems as well as Windows. The admin tool allows the automated configuration of computers and servers as well as of the services installed on them. The arveo services are installed and configured with Puppet. After the server has been provided (see System Requirements), the Puppet Agent is installed on it, which then takes care of setting up the environment and the actual application. The duration of the installation process can vary and requires an adequate internet connection. The individual installation components are installed in the form of .deb packages. The installation is completely automated and carried out remotely.
Installed services
-
PostgreSQL 12 Database
-
Apache Solr 8.6 Document Database (full text)
-
JDK 11 or 16
-
Keycloak, Active Directory Authentication Service
-
ActiveMQ Message Service Hub
-
Tomcat 9 Application Server
-
Document Service
-
Registry Service
-
Configuration Service
-
User Management Service
-
Access Control Service
-
Audit Service (optional)
-
Document Conversion Service (optional)
-
Enterprise Integration Service (optional)
-
Enterprise User Management Service (optional)
-
Enterprise Federation Service (optional)
Customer Applications & Services
-
Eitco or customer application and integration services (typically a web client and Apache Camel integration endpoints)
Order of services
Below you find the order in which the services must be started. The content services may not work before the services they depend on are running.
All commands should be executed as root. When running as a non-root user, prefix the systemctl commands with sudo. |
The services are initially started by Puppet. After the installation of arveo has been successfully completed, the customer applications can be started. Additional information on registration, user management and the use of the web client can be found in the user and admin manual.
-
PostgreSQL 12: systemctl start/stop postgresql
-
Apache Solr 8.6: systemctl start/stop solr.service, systemctl start/stop zookeeper.service
-
Config Service: systemctl start/stop common_config_service.service
-
Registry Service: systemctl start/stop common_registry_service.service
-
User Management Service: systemctl start/stop common_user_management.service
-
ACL Service: systemctl start/stop common_access_control.service
-
Enterprise User Management Service: systemctl start/stop common_enterprise_user_management.service (optional)
-
Federation Service: systemctl start/stop cr_federation.service (optional)
-
Audit Service: systemctl start/stop common_audit.service (optional)
-
Document Service: systemctl start/stop ecr_repository_service.service
-
Document Conversion Service: systemctl start/stop common_document_conversion.service (optional)
-
Enterprise Integration Service: systemctl start/stop common_enterprise_integration.service (optional)
The current status of a service can be determined with systemctl status <service>. |
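The start order above can be captured in a small helper script. The following is a sketch only: the unit names are taken from the list above, the optional services are omitted, and ZooKeeper is started before Solr because SolrCloud depends on it. The script prints the commands as a dry run; pipe its output to sh (as root) to actually start the services.

```shell
# Required start order for the core services (optional services omitted).
SERVICES=(
  postgresql
  zookeeper.service
  solr.service
  common_config_service.service
  common_registry_service.service
  common_user_management.service
  common_access_control.service
  ecr_repository_service.service
)

# Dry run: print each start command in order instead of executing it.
for service in "${SERVICES[@]}"; do
  echo "systemctl start $service"
done
```

To shut the system down, issue the corresponding `systemctl stop` commands in reverse order.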
SSL Certificates
If all connections between the services are to be encrypted, SSL certificates are required. The following requirements apply: an X.509 certificate with an associated private key is required for each server. The certificate should be signed by an official CA or the company’s own CA; self-signed certificates can also be used. Note the following requirement: the X.509 extension “Subject Alternative Name” must contain all DNS names and IP addresses via which the respective systems are accessed.
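For test setups, a self-signed certificate satisfying the Subject Alternative Name requirement can be generated with OpenSSL (version 1.1.1 or newer for the -addext option). This is a sketch: the hostname and IP address are placeholders and must be replaced with the names actually used to reach the server.

```shell
# Generate a private key and a self-signed X.509 certificate whose
# Subject Alternative Name lists every DNS name and IP address used
# to access the server (placeholder values below).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout server.key -out server.crt \
  -subj "/CN=arveo.example.com" \
  -addext "subjectAltName=DNS:arveo.example.com,IP:192.0.2.10"

# Inspect the SAN extension to verify all names are present:
openssl x509 -in server.crt -noout -ext subjectAltName
```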
Licensing
The client software uses several 3rd-party licenses. The list of licenses can be accessed via the following link: https://<customername>.eitco.de/3rdpartylicenses.txt.
Backups
The database backup script is located at /var/lib/postgresql/backup.sh and can also be started manually at any time. The script is controlled by cron and is started automatically at 10 p.m. every day; its logs are written to /var/log/postgresql/backup.log on the database server. Before a manual run there must not be a folder with the current date under /backup/full/; if such a folder exists, it must be moved beforehand.
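The schedule described above corresponds to a cron entry like the following sketch; the actual entry installed on the server may differ.

```
# /etc/cron.d entry (sketch): run the backup script daily at 22:00
# as the postgres user, appending output to the backup log.
0 22 * * * postgres /var/lib/postgresql/backup.sh >> /var/log/postgresql/backup.log 2>&1
```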
Getting Started
In this guide you will create a simple application that implements a basic project file scenario. It will consist of a document type that represents documents used in a project.
Prerequisites
To complete the steps in this guide, you need the following tools installed on your machine:
-
JDK 11 or newer (https://adoptium.net/). Please use only LTS versions!
-
Apache Maven (https://maven.apache.org/)
-
An IDE of your choice (we recommend IntelliJ)
Maven configuration
To be able to access the maven artifacts of arveo, you need access to the EITCO Nexus repository.
Internal
When you are inside the company network or the VPN, you can use the internal Nexus that does not require authentication. The following maven settings.xml file shows how to configure the required repositories. The settings.xml file can be found in the .m2 directory in your user home directory.
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<pluginGroups>
</pluginGroups>
<proxies>
</proxies>
<servers>
</servers>
<mirrors>
</mirrors>
<profiles>
<profile>
<id>repos-default</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<properties>
</properties>
<repositories>
<repository> (1)
<id>nexus</id>
<url>https://nexus-intern.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository> (2)
<id>nexus</id>
<url>https://nexus-intern.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
</profiles>
</settings>
1 | The maven repository that contains maven artifacts of arveo |
2 | The plugin repository that contains maven plugins used when building the demo project |
External
When you are outside the company network and the VPN, you need to use the public Nexus repository that requires authentication. To do so, maven requires credentials. For security reasons, the credentials should be encrypted. Follow the instructions in the Maven documentation to configure a master password and to create an encrypted password.
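The Maven CLI provides two options for this. Run them and paste the generated values into the settings-security.xml and settings.xml files described in this section; the commands prompt for the respective plain-text password.

```shell
# Create the encrypted master password for ~/.m2/settings-security.xml:
mvn --encrypt-master-password

# Then create the encrypted server password for settings.xml:
mvn --encrypt-password
```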
You should now have created a settings-security.xml file in the .m2 directory like the one shown below:
<settingsSecurity>
<master>{encrypted-master-password}</master>
</settingsSecurity>
Then you have to adapt your maven settings.xml as follows:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<pluginGroups>
</pluginGroups>
<proxies>
</proxies>
<servers>
<server>
<id>nexus</id> (1)
<username>username</username> (2)
<password>{your-encrypted-password}</password> (3)
</server>
</servers>
<mirrors>
</mirrors>
<profiles>
<profile>
<id>repos-default</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<properties>
<solr.repositoryUrl>https://nexus.eitco.de/repository/raw-public</solr.repositoryUrl> (4)
</properties>
<repositories>
<repository>
<id>nexus</id> (5)
<url>https://nexus.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>nexus</id>
<url>https://nexus.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
</profiles>
</settings>
1 | The server id is used to tie credentials to repositories |
2 | The username you use to logon to nexus |
3 | The password encrypted by maven using the master password |
4 | Sets the repository to use for the SOLR plugin used in the system tests |
5 | Tells maven to use the credentials for the server with id 'nexus' |
Make sure to use only https repositories when using credentials. Current maven versions already block the usage of unencrypted repository connections.
Step 1 - Type definitions
In the first step you will define the data model of your application. In arveo, this is done by creating Java (or Kotlin) interfaces which contain getters and setters for the fields that will be available on each individual entity type. There is a maven archetype to create a project that will contain those type definition interfaces and integration tests to try out the created types. More information about the archetype can be found here.
First, create a directory that will contain the project files for the demo application. Open a command line in this directory and perform the following operation.
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion=<arveo-version>
The archetype version is the arveo version that you are working with. The current version is 13.0.4.
Maven will start by downloading a couple of required artifacts. After that, the archetype plugin will be started in interactive mode. It will query for several settings required for the generated project. Some of the settings have default values that can be used.
-
class-name-prefix: A prefix that will be used for the generated classes. Use Demo for this guide.
-
groupId: The group-id of the artifact that will contain the types. Use de.eitco.demo.
-
artifactId: The artifact-id of the artifact that will contain the type. Use demo-types.
-
version: The version of the artifact. You can use the default value.
-
package: The package that will contain the types. You can use the default value.
In the last step, the archetype plugin shows the selected property values and asks for confirmation. After the settings are confirmed, the project will be generated in a folder called demo-types.
The archetype documentation contains a description of the generated project. For this guide, the files in implementation/types are the most important ones:
-
DemoModel.java: This file contains the type definition interface. The generated example is a simple document type with a name, a number and some system properties. The annotations used to define the type are documented here.
-
DemoTypeRegistration.java: A spring component that automatically registers your type(s) in the arveo service. Only types that have been registered can be used in your application.
-
spring.factories: This file tells spring to autoconfigure the DemoRegistration component.
The archetype has generated integration tests for the generated type definition, too. You can find them in the directory test/system-test. The file DemoClientIT.java contains some tests that show how to perform basic CRUD operations on the generated document type.
You will notice an additional class called DemoModelId. This class demonstrates how to create a typed ID for a specific model class. It is not required for the system to be able to use the DemoModel type. If you do not require typed IDs, you can remove the class.
Running the tests
The tests are run automatically in a full maven build. The system-test module is configured to automatically start a complete arveo system including all required services and a database. If you want to run the tests manually from the IDE, you can still use maven to start the arveo system. Open a command line in the system-test directory and run mvn -Denv. Maven will start the following processes:
-
A PostgreSQL database server
-
An ActiveMQ message broker
-
A SOLR server
-
The Service Registry
-
The Configuration Service
-
The User management Service
-
The Audit Service
-
The Access Control Service
-
The arveo Service
The services will be kept alive until you press enter in the command line.
This will only work if a complete build has been performed at least once (which can be done through mvn install).
The system set up by maven in the system-test module is already configured to contain the type definitions that were defined in this project. To use those definitions in another system, you have to add the jar containing the definitions to the classpath of the arveo service instances. This can be done by copying the jar to a lib directory and adding the following command line option when starting the arveo service instances:
-Dloader.path=path/to/libs
Adapt the model
Now you can adapt the generated type definition so that it fits the requirements for our project scenario. In this scenario, documents are organized in a two-level folder structure. For example, the project could contain a folder called "invoices" which again contains two folders named "inbound" and "outbound". Each document is contained in exactly one folder and belongs to exactly one project. The document type will contain the following meta data fields:
-
projectName: The name of the project the document belongs to
-
type: The type of document, e.g. whether it is an invoice, a contract or something else
-
structureLevel1: This field is used to represent the first level of the folder structure
-
structureLevel2: This field is used to represent the second level of the folder structure
-
status: Represents the current status of the document
-
customerName: The name of the customer associated to the project
-
contactPerson: The contact person for the document
-
assignedTo: The employee currently assigned to work on the document
-
fileSystemCreationDate: The timestamp at which the file was created in the file system (not the time it was imported to arveo - see system fields)
In addition to these custom fields, the document will contain some system fields like content metadata (filename, size, mimetype…) and versioning information like creation and update timestamps. The two metadata fields name and number that are already contained in DemoModel.java can be removed.
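To make the two structure levels concrete, the following sketch builds an example directory tree matching this scenario. All names are illustrative; the INVOICE/CONTRACT file-name prefixes match the DemoModelType enum constants used for type detection in step 2.

```shell
# One project ("website-relaunch") with a two-level folder hierarchy:
# level 1 = invoices/contracts, level 2 = inbound/outbound.
mkdir -p projects/website-relaunch/invoices/inbound
mkdir -p projects/website-relaunch/invoices/outbound
mkdir -p projects/website-relaunch/contracts

# Example files whose prefixes drive the document type detection:
touch projects/website-relaunch/invoices/inbound/INVOICE_2024-001.pdf
touch projects/website-relaunch/contracts/CONTRACT_hosting.pdf

# Show the resulting structure:
find projects -type d | sort
```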
Adding getters for system fields
Complete listings for the steps below can be found at the end of this chapter.
Let’s first add some getters for system fields. Those will provide access to system information that is generated automatically when an entity is created or updated. The generated DemoModel class already contains getters for the ID- and ACL- system properties. Add the following lines to DemoModel.java:
@SystemProperty(SystemPropertyName.CONTENT)
Map<String, ContentInformation> getContentInformation();
@SystemProperty(SystemPropertyName.VERSION_INFO)
VersionInformation getVersionInformation();
@SystemProperty(SystemPropertyName.MODIFICATION_INFO)
ModificationInformation getModificationInformation();
The JavaDoc for the SystemPropertyName enum constants contains information about each field. The data type for the contentInformation field is a map because each document can contain multiple content elements. For example, a document could contain a TIFF image and a PDF rendition of the TIFF.
Adding getters and setters for custom fields
Now we can add the getters and setters for the custom metadata fields:
@Mandatory
String getProjectName();
void setProjectName(String projectName);
@Mandatory
String getStructureLevel1();
void setStructureLevel1(String structureLevel1);
@Optional
String getStructureLevel2();
void setStructureLevel2(String structureLevel2);
@Optional
String getCustomerName();
void setCustomerName(String customerName);
@Optional
String getContactPerson();
void setContactPerson(String contactPerson);
@Mandatory
ZonedDateTime getFileSystemCreationDate();
void setFileSystemCreationDate(ZonedDateTime fileSystemCreationDate);
@Optional
Long getAssignedTo();
void setAssignedTo(Long assignedTo);
The annotations @Mandatory
and @Optional
can be used to control which fields have to be set by the client and which can be left empty.
The annotations for the arveo type definitions always have to be added to the getters. You can find an overview of the supported data types here. |
For the type field we want to limit the possible values that can be set. This can be done by defining an enumeration. Create the following enumeration type:
package de.eitco.demo.types;
import de.eitco.ecr.type.definition.annotations.Enumeration;
@Enumeration
public enum DemoModelType {
INVOICE,
CONTRACT,
OTHER
}
This enum class will be mapped to an enumeration type in the database. It needs to be registered in the type registration just like the DemoModel type. Adapt the class DemoTypeRegistration as follows:
@Component
@Register(DemoModel.class)
@Register(DemoModelType.class)
@Register(DemoModelStatus.class)
public class DemoTypeRegistration implements TypeDefinitionRegistration {
}
We will do the same for the status field. Add and register the following enum class:
@Enumeration
public enum DemoModelStatus {
IN_PROGRESS,
DONE
}
Don’t forget to register it in the DemoTypeRegistration class.
Now you can add the getters and setters for the two fields in the DemoModel class:
@Mandatory
DemoModelType getType();
void setType(DemoModelType type);
@Optional
DemoModelStatus getStatus();
void setStatus(DemoModelStatus status);
Your DemoModel class should now look like this:
package de.eitco.demo.types;
import de.eitco.commons.asdl.annotation.AsdlIgnore;
import de.eitco.commons.asdl.annotation.Model;
import de.eitco.commons.user.management.common.model.ModificationInformation;
import de.eitco.ecr.common.ContentInformation;
import de.eitco.ecr.common.VersionInformation;
import de.eitco.ecr.common.document.DocumentId;
import de.eitco.ecr.type.definition.annotations.ObjectType;
import de.eitco.ecr.type.definition.annotations.Type;
import de.eitco.ecr.type.definition.annotations.constraint.Mandatory;
import de.eitco.ecr.type.definition.annotations.constraint.Optional;
import de.eitco.ecr.type.definition.annotations.system.SystemProperty;
import de.eitco.ecr.type.definition.annotations.system.SystemPropertyName;
import java.time.ZonedDateTime;
import java.util.Map;
@Model
@Type(ObjectType.DOCUMENT)
public interface DemoModel {
@SystemProperty(SystemPropertyName.ID)
DocumentId getDocumentId();
@AsdlIgnore
default DemoModelId id() {
return DemoModelId.of(getDocumentId());
}
@SystemProperty(SystemPropertyName.ACL_ID)
Long getAclId();
void setAclId(Long aclId);
@SystemProperty(SystemPropertyName.CONTENT)
Map<String, ContentInformation> getContentInformation();
@SystemProperty(SystemPropertyName.VERSION_INFO)
VersionInformation getVersionInformation();
@SystemProperty(SystemPropertyName.MODIFICATION_INFO)
ModificationInformation getModificationInformation();
@Mandatory
String getProjectName();
void setProjectName(String projectName);
@Mandatory
String getStructureLevel1();
void setStructureLevel1(String structureLevel1);
@Optional
String getStructureLevel2();
void setStructureLevel2(String structureLevel2);
@Optional
String getCustomerName();
void setCustomerName(String customerName);
@Optional
String getContactPerson();
void setContactPerson(String contactPerson);
@Mandatory
ZonedDateTime getFileSystemCreationDate();
void setFileSystemCreationDate(ZonedDateTime fileSystemCreationDate);
@Mandatory
DemoModelType getType();
void setType(DemoModelType type);
@Optional
DemoModelStatus getStatus();
void setStatus(DemoModelStatus status);
@Optional
Long getAssignedTo();
void setAssignedTo(Long assignedTo);
}
Before you can build and use the adapted type, you have to adapt the generated integration tests.
Step 2 - Command line tool
In the second step you will implement a simple command line application that uses the model defined in step 1. We will use the Spring Initializr to generate a maven project with the required dependencies for a Spring command line application.
Generating the project
-
Go to https://start.spring.io/
-
Under "Project", select "Maven Project"
-
Under "Language", select "Java"
-
Select Spring Boot version 2.7.10. If your required version is not available, select the most compatible one in terms of major.minor.patch.
-
Define project metadata. For example, use Group = de.eitco.demo, Artifact = demo-tool, Name = demo-tool, Package name = de.eitco.demo.tool
-
Select "Jar" Packaging
-
Select Java version 11 or newer
-
Add a dependency to "Picocli"
Click Generate and download the zip file containing the generated project. Unzip the file to a directory of your choice and open the project in your IDE. Delete the 'test' directory.
Adding arveo dependencies
Open the generated pom.xml file and add the following dependencies:
<dependency>
<groupId>de.eitco.ecr</groupId> (1)
<artifactId>ecr-sdk-http</artifactId>
<version>13.0.4</version>
<exclusions>
<exclusion> (2)
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency> (3)
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client-non-web</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>de.eitco.demo</groupId> (4)
<artifactId>demo-types-types</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
1 | This dependency contains a spring boot starter for the arveo SDK |
2 | We have to exclude the OAuth2 client for web applications because the tool will be a console application |
3 | This dependency contains the OAuth2 client for non-web applications |
4 | The data model that was defined in step 1 |
You have to set the version of arveo that was used in the project containing the data model.
Additionally, you have to define a dependency management for the EITCO Commons Spring Security library:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-commons-spring-security</artifactId>
<version>7.0.10</version>
</dependency>
</dependencies>
</dependencyManagement>
Implementing the tool
The tool will use the picocli library to make it easy to write a command line application with features like usage help and simple parameter binding. You can read more about picocli here.
Go to the de.eitco.demo.tool package and create a new class called "ArveoCommand". This will contain the business logic behind the commands available in the command line application. In picocli, those commands need to implement Runnable, so we have to implement this interface:
@Component (1)
@CommandLine.Command( (2)
mixinStandardHelpOptions = true,
version = "1.0-SNAPSHOT",
description = "arveo demo tool")
public class ArveoCommand implements Runnable {
private static final Logger LOGGER = Logger.getLogger(ArveoCommand.class); (3)
@Override
public void run() {
}
}
1 | Defines ArveoCommand as an injectable spring component |
2 | Activate picocli features like usage help |
3 | We will use the de.eitco.commons.lang.Logger to log exception messages |
To be able to access arveo, the command line tool will have to authenticate to the arveo service. We will use a simple username/password authentication, so the user must be able to enter credentials. With picocli, we can implement this with some annotated fields in ArveoCommand:
@CommandLine.Option(names = {"-u", "--username"}, required = true, interactive = true,
description = "The username used to log on to arveo")
private String username;
@CommandLine.Option(names = {"-p", "--password"}, required = true, interactive = true,
description = "The password used to log on to arveo")
private String password;
@CommandLine.Option(names = {"-t", "--tenant"}, required = true,
description = "The tenant used to log on to arveo")
private String tenant;
With the interactive=true option, the user will be prompted to enter the username and password while the program is running.
The ArveoCommand class will need to know the directory to import from and (optionally) a customer name. To be able to access the arveo API, we have to get an instance of the TypeDefinitionServiceClient. This can be done using dependency injection:
@CommandLine.Option(names = {"-d", "--directory"}, required = true,
description = "The base directory to import from")
private File baseDirectory;
@CommandLine.Option(names = {"-c", "--customer"}, description = "The name of the customer")
private String customer;
@Autowired
private TypeDefinitionServiceClient typeDefinitionServiceClient;
Now it is time to implement the import. Add the following methods to the ArveoCommand class:
private void importProject(File root) {
String projectName = root.getName();
Arrays.stream(root.listFiles()).forEach(file -> {
if (file.isFile()) {
LOGGER.warn(() -> "Ignored file " + file);
} else {
importLevel1(projectName, file);
}
});
}
The importProject method will be used to import a project located in the provided root directory. The scenario does not support files located directly in the root of the project, so we will log a warning when we encounter such a file.
private void importLevel1(String projectName, File level1) {
String level1Value = level1.getName();
Arrays.stream(level1.listFiles()).forEach(file -> {
if (file.isDirectory()) {
importLevel2(projectName, level1Value, file);
} else {
importFile(projectName, level1Value, null, file);
}
});
}
The importLevel1 method will collect all files and directories located in the first level of the project structure. Files will be imported directly, directories will be passed to the next importer method.
private void importLevel2(String projectName, String level1Value, File level2) {
String level2Value = level2.getName();
Arrays.stream(level2.listFiles()).forEach(file -> {
if (file.isDirectory()) {
LOGGER.warn(() -> "Ignoring directory " + file);
} else {
importFile(projectName, level1Value, level2Value, file);
}
});
}
This method collects all files located in the second level of the project structure. We do not support deeper structures, so we log a warning when we encounter a directory below level 2.
The method used to actually import data into arveo is shown below:
private void importFile(String projectName, String level1, String level2, File file) {
AuthenticationHelper.runAsUser(username, password, tenant, () -> { (1)
TypedDocumentServiceClient<DemoModel> serviceClient = (2)
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DemoModel.class);
DemoModel model = serviceClient.createTypeInstance(); (3)
model.setProjectName(projectName);
model.setStructureLevel1(level1);
model.setStructureLevel2(level2);
model.setCustomerName(customer);
model.setFileSystemCreationDate(ZonedDateTime.ofInstant(
Instant.ofEpochMilli(file.lastModified()),
ZoneId.systemDefault())
);
DemoModelType type = DemoModelType.OTHER; (4)
String fileName = file.getName();
if (fileName.startsWith(DemoModelType.CONTRACT.name())) {
type = DemoModelType.CONTRACT;
} else if (fileName.startsWith(DemoModelType.INVOICE.name())) {
type = DemoModelType.INVOICE;
}
model.setType(type);
try (InputStream stream = Files.newInputStream(file.toPath())) {
ContentUpload contentUpload = new ContentUpload(fileName, stream);
Map<String, ContentUpload> contentElements = Map.of("content", contentUpload); (5)
serviceClient.create(new TypedDocumentInput<>(contentElements, model)); (6)
System.out.println("Imported file " + fileName + " belonging to project " + projectName);
} catch (IOException e) {
LOGGER.exception(e);
}
});
}
1 | The AuthenticationHelper takes care of populating Spring’s security context with the required credentials. The OAuth2 client will use the provided username and password to retrieve an access token from the authentication service to authenticate the requests to the arveo service. |
2 | We use the injected TypeDefinitionServiceClient to get a service client for the type definition of our model class. |
3 | The service client can provide an instance of the interface defining the model. This instance is then populated with the metadata. |
4 | We will use a simple file name prefix to determine the type of the document. |
5 | Here we define the content elements of the new document. |
6 | Finally, we send the create request to the arveo service. |
We can now implement the run() method of the ArveoCommand class:
@Override
public void run() {
if (!baseDirectory.isDirectory()) {
throw new IllegalArgumentException("Base directory option must point to a directory.");
}
Arrays.stream(baseDirectory.listFiles(File::isDirectory)).forEach(this::importProject); (1)
}
1 | We use a filter to ignore files in the base directory, as they obviously do not belong to any project. |
Now we have to adapt the application class that was generated by the Spring Initializr. Spring provides a CommandLineRunner interface for command line applications. Adapt the DemoToolApplication class as shown below:
@SpringBootApplication
public class DemoToolApplication implements CommandLineRunner {
@Autowired
private ArveoCommand arveoCommand;
public static void main(String[] args) {
new SpringApplicationBuilder(DemoToolApplication.class)
.web(WebApplicationType.NONE) (1)
.run(args);
}
@Override
public void run(String... args) {
int exitCode = new CommandLine(arveoCommand).execute(args); (2)
System.exit(exitCode);
}
}
1 | Turns off spring boot web features that are not required in a command line application |
2 | Initialize the picocli command line and execute our command with the options from the command line |
In the last step, we have to set some configuration properties for our command line tool. Rename the generated application.properties file in src/main/resources to application.yaml and add the following settings:
spring:
security:
oauth2:
client:
registration:
cmn-user-service-client: (1)
provider: user-service
client-id: "test-client"
client-secret: "my-secret"
authorization-grant-type: "password"
scope: "arveo"
provider:
user-service: (2)
authorization-uri: "http://localhost:39004/oauth/auth"
token-uri: "http://localhost:39004/oauth/token"
eureka:
client:
registerWithEureka: false (3)
logging: (4)
file:
name: "demo-tool.log"
level:
root: ERROR
1 | Configures an OAuth2 client that uses the resource owner password grant type. Client-id and secret are configured in the test system provided by the system test module of the project created in step 1. |
2 | Tells the OAuth2 client where to get a token from |
3 | The command line tool should not register itself in the service registry |
4 | Log only errors to a file |
Building and running the tool
Now we can build and run the command line tool. You can either use the IDE or run mvn clean install in a command line for the project containing the demo tool. After the build has finished, you have to start the test system: open a command line in the system-test module of the type definition project and execute the command mvn -Denv (see Running the tests). Now we can use another command line in the target directory of the command line tool project to run the tool. Running java -jar .\demo-tool-0.0.1-SNAPSHOT.jar without arguments prints out usage help for the tool:
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v2.5.8)

Missing required options: '--username', '--password', '--directory=<baseDirectory>'
Usage: <main class> [-hV] -p -u [-c=<customer>] -d=<baseDirectory> [-t=<tenant>]
arveo demo tool
  -c, --customer=<customer>   The name of the customer
  -d, --directory=<baseDirectory>
                              The base directory to import from
  -h, --help                  Show this help message and exit.
  -p, --password              The password used to log on to arveo
  -t, --tenant=<tenant>       The tenant used to log on to arveo
  -u, --username              The username used to log on to arveo
  -V, --version               Print version information and exit.
The test system already contains a user that can be used for testing. The user’s credentials are:
- username: ecr-user
- password: password
The following example shows how to use the tool to import projects from a folder:
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar -p -u -t=integrationtest -c=Customer1 "-d=C:\test-data\"

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v2.5.8)

Enter value for --password (The password used to log on to arveo):
Enter value for --username (The username used to log on to arveo):
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by de.eitco.commons.reflection.MethodLookup to constructor java.lang.invoke.MethodHandles$Lookup(java.lang.Class,int)
WARNING: Please consider reporting this to the maintainers of de.eitco.commons.reflection.MethodLookup
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Imported file INVOICE_invoice1.txt belonging to project TestProject1
Imported file INVOICE_invoice2.txt belonging to project TestProject1
Imported file customer meeting 1.txt belonging to project TestProject1
Imported file standup 1.txt belonging to project TestProject1
Imported file standup 2.txt belonging to project TestProject1
Imported file standup 3.txt belonging to project TestProject1
Imported file uncategorized meeting 1.txt belonging to project TestProject1
The warning message can be ignored. The reflective access operation will be replaced in a future version.
Finally, here is a complete listing of the ArveoCommand class for copy&paste:
package de.eitco.demo.tool;
import de.eitco.commons.lang.Logger;
import de.eitco.commons.spring.security.AuthenticationHelper;
import de.eitco.demo.types.DemoModel;
import de.eitco.demo.types.DemoModelType;
import de.eitco.ecr.common.ContentUpload;
import de.eitco.ecr.sdk.TypeDefinitionServiceClient;
import de.eitco.ecr.sdk.document.TypedDocumentInput;
import de.eitco.ecr.sdk.document.TypedDocumentServiceClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import picocli.CommandLine;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.Map;
@Component
@CommandLine.Command(
mixinStandardHelpOptions = true,
version = "1.0-SNAPSHOT",
description = "arveo demo tool")
public class ArveoCommand implements Runnable {
private static final Logger LOGGER = Logger.getLogger(ArveoCommand.class);
@CommandLine.Option(names = {"-u", "--username"}, required = true, interactive = true,
description = "The username used to log on to arveo")
private String username;
@CommandLine.Option(names = {"-p", "--password"}, required = true, interactive = true,
description = "The password used to log on to arveo")
private String password;
@CommandLine.Option(names = {"-t", "--tenant"}, required = true,
description = "The tenant used to log on to arveo")
private String tenant;
@CommandLine.Option(names = {"-d", "--directory"}, required = true,
description = "The base directory to import from")
private File baseDirectory;
@CommandLine.Option(names = {"-c", "--customer"}, description = "The name of the customer")
private String customer;
@Autowired
private TypeDefinitionServiceClient typeDefinitionServiceClient;
@Override
public void run() {
if (!baseDirectory.isDirectory()) {
throw new IllegalArgumentException("Base directory option must point to a directory.");
}
Arrays.stream(baseDirectory.listFiles(File::isDirectory)).forEach(this::importProject);
}
private void importProject(File root) {
String projectName = root.getName();
Arrays.stream(root.listFiles()).forEach(file -> {
if (file.isFile()) {
LOGGER.warn(() -> "Ignored file " + file);
} else {
importLevel1(projectName, file);
}
});
}
private void importLevel1(String projectName, File level1) {
String level1Value = level1.getName();
Arrays.stream(level1.listFiles()).forEach(file -> {
if (file.isDirectory()) {
importLevel2(projectName, level1Value, file);
} else {
importFile(projectName, level1Value, null, file);
}
});
}
private void importLevel2(String projectName, String level1Value, File level2) {
String level2Value = level2.getName();
Arrays.stream(level2.listFiles()).forEach(file -> {
if (file.isDirectory()) {
LOGGER.warn(() -> "Ignoring directory " + file);
} else {
importFile(projectName, level1Value, level2Value, file);
}
});
}
private void importFile(String projectName, String level1, String level2, File file) {
AuthenticationHelper.runAsUser(username, password, tenant, () -> {
TypedDocumentServiceClient<DemoModel> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DemoModel.class);
DemoModel model = serviceClient.createTypeInstance();
model.setProjectName(projectName);
model.setStructureLevel1(level1);
model.setStructureLevel2(level2);
model.setCustomerName(customer);
model.setFileSystemCreationDate(ZonedDateTime.ofInstant(
Instant.ofEpochMilli(file.lastModified()),
ZoneId.systemDefault())
);
DemoModelType type = DemoModelType.OTHER;
String fileName = file.getName();
if (fileName.startsWith(DemoModelType.CONTRACT.name())) {
type = DemoModelType.CONTRACT;
} else if (fileName.startsWith(DemoModelType.INVOICE.name())) {
type = DemoModelType.INVOICE;
}
model.setType(type);
try (InputStream stream = Files.newInputStream(file.toPath())) {
ContentUpload contentUpload = new ContentUpload(fileName, stream);
Map<String, ContentUpload> contentElements = Map.of("content", contentUpload);
serviceClient.create(new TypedDocumentInput<>(contentElements, model));
System.out.println("Imported file " + fileName + " belonging to project " + projectName);
} catch (IOException e) {
LOGGER.exception(e);
}
});
}
}
Step 3 - Perform a task with arveo
In the third step you will perform a specific task using arveo.
Creating a standardized project structure
Your project folder must follow a certain structure to be successfully imported and/or archived in arveo.
Projects
|_ Project 1
   |_ Element 1.1
   |_ Element 1.2
|_ Project 2
   |_ Element 2.1
Here is an example of implementing this structure:
Projects
|_ Webclient
   |_ Orders
   |_ Invoices
|_ Server_maintenance
   |_ Invoices
   |_ Email_correspondence
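To try the import locally, a structure like the one above can be created with a few lines of standard Java. This is just a convenience sketch; all directory and file names are examples, and the INVOICE_ prefix matches the file name based type detection used by the demo tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Creates a minimal example of the two-level project structure described above.
// The names used here are examples; any project and element names work.
public final class CreateTestData {

    public static void main(String[] args) {
        Path base = Path.of("test-data"); // the base directory passed to the tool via -d
        try {
            Path invoices = base.resolve("Webclient").resolve("Invoices"); // project / element
            Path orders = base.resolve("Webclient").resolve("Orders");
            Files.createDirectories(invoices);
            Files.createDirectories(orders);
            // The INVOICE_ prefix lets the demo tool classify the document as an invoice.
            Files.writeString(invoices.resolve("INVOICE_invoice1.txt"), "example invoice");
            Files.writeString(orders.resolve("order1.txt"), "example order");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Running this class once creates a test-data folder that can be imported with -d=test-data.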
In the last step of the tutorial you executed the command
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar -p -u -t=integrationtest -c=Customer1 "-d=C:\test-data\"
Now you can replace the last element with your actual project folder:
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar -p -u -t=integrationtest -c=MedicalInsuranceAG "-d=C:\Projects"
After this command has been executed, you will see the report about imported files and folders:
Enter value for --password (The password used to log on to arveo): Enter value for --username (The username used to log on to arveo): ... Imported file Orders - Received_invoices.png belonging to project Webclient Imported file Incoming_invoice.png belonging to project Webclient ...
The Types archetype
This archetype creates a rather small project. It consists of an arveo scenario and tests for it.
The maven coordinates of this archetype are:
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-types-archetype</artifactId>
<version>{project-technical-version}</version>
To create an arveo scenario project use the maven archetype plugin:
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion={project-technical-version}
Here, the variable {project-technical-version} must be replaced with the actual version, e.g. 5.0.1.
Note that this command generates the project structure into a project folder. Before you run it, make sure you have created the folder in which your project structure should live and have switched to this folder on your command line.
This will start a process that will ask for some parameters and then generate a maven project according to the parameters. The following parameters will be asked for:

Parameter | Description
---|---
groupId | The maven groupId of the new project
artifactId | The maven artifactId of the new project
version | The maven version of the new project
class-name-prefix | A prefix for the names of the generated classes
scm-locator | The location in the eitco bitbucket server where the sources are (or will be). For a project located in https://git.eitco.de/scm/<project>/<repository>.git, this would be <project>/<repository>.git
Some or all of these parameters can also be given on the command line via -D options. The process will not ask for parameters given on the command line. So the command
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion={project-technical-version} -DgroupId=my.group.id -DartifactId=my-artifact-id -Dversion=0.0.1-SNAPSHOT -Dclass-name-prefix=My -Dscm-locator=prj/repo.git
would not ask for any parameters and just create the project.
Overview of the generated project
The project generated by the archetype will consist of two modules:
implementation\types
This module contains your arveo scenario. An example type will be created with the name <class-name-prefix>Model. You can define more types here, but you will need to register them in <class-name-prefix>TypeRegistration. The chapter arveo type definitions describes how to define types.
test\system-test
This module contains tests for your scenario. These tests will be executed during the build. For that, a complete arveo environment will be created, so you can add tests that simply connect to arveo via the http client and can assume that your scenario is deployed.
This module can also be used to set up an arveo environment with your scenario on which you can then run tests manually. In the module run
mvn -Denv
to set up the environment. It will be torn down when you press <enter> in the console.
The full-featured archetype
This archetype creates a more complex project. It is based on the eitco commons archetype. It will contain a simple web service with an automatically generated client layer, based on eitco commons. The maven coordinates of this archetype are:
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-service-archetype</artifactId>
<version>{project-technical-version}</version>
To create an arveo based service project use the maven archetype plugin:
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-service-archetype -DarchetypeVersion={project-technical-version}
This will start a process that will ask for some parameters and then generate a maven project according to the parameters. The following parameters will be asked for:

Parameter | Description
---|---
groupId | The maven groupId of the new project
artifactId | The maven artifactId of the new project
version | The maven version of the new project
class-name-prefix | A prefix for the names of the generated classes
scm-locator | The location in the eitco bitbucket server where the sources are (or will be). For a project located in https://git.eitco.de/scm/<project>/<repository>.git, this would be <project>/<repository>.git
disable-optional-features | When set to false, a somewhat more complex project is created, including the audit service, the user-management enterprise service and jmeter samplers. If set to true (the default value), these features will be disabled, but they can be activated by uncommenting certain source locations.
Some or all of these parameters can also be given on the command line via -D options. The process will not ask for parameters given on the command line. So the command
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-service-archetype -DarchetypeVersion={project-technical-version} -DgroupId=my.group.id -DartifactId=my-artifact-id -Dversion=0.0.1-SNAPSHOT -Dclass-name-prefix=My -Dscm-locator=prj/repo.git -Ddisable-optional-features=false
would not ask for any parameters and just create the project.
Overview of the generated project
The project generated by the archetype will consist of four modules:
- documentation
- implementation
- packaging
- test
The documentation module
This module holds a frame for an asciidoc based documentation of your project.
The implementation module
This module contains the actual source code. It is separated into five submodules.
- common: This submodule contains classes that are available on the server side as well as the client side.
- generated: This submodule contains modules that are automatically generated. Normally, developers will not add code in these modules; they are, however, relevant for building the project. The following submodules exist:
  - serialization: This submodule contains automatically generated serialization meta information.
  - client: This submodule contains a few submodules itself, holding client side APIs for:
    - a java spring based http client api,
    - a java spring based embedded client api,
    - a typescript http client api.
  - jmeter-sampler: This submodule generates jmeter samplers for the service’s api, usable in load tests.
- server: This submodule contains the server side implementation.
- types: This submodule contains the arveo based model. The generated interface named <class-name-prefix>Model describes an arveo type definition, as will every interface you register in <class-name-prefix>TypeRegistration. The jar compiled by this module will be available on the server side and client side. Additionally, it needs to be in the class path of your arveo instance. For the system tests (see below) this is already taken care of.
The packaging module
This module contains delivery artifacts to deliver the service to or with different runtimes. This includes:
- a stand-alone jar
- a java web archive (war)
- a helm chart for deployment in a kubernetes cluster
The test module
This module contains a system test module. When building this module, maven will start a complete arveo system (containing all required services) with the newly generated service in the pre-integration-test phase, so that tests written here (like the generated example <class-name-prefix>ClientIT) may simply call the new service via the generated http-client (see above).
Working on the generated project
Most implementation work will be done in the implementation\server module, since it contains the server side code. Your api and model will be defined in the implementation\common and implementation\types modules. The latter is only used for classes that are part of your arveo model and need to be in the classpath of arveo.
When testing your code, the test\system-test module comes in handy. As mentioned above, it will start a complete arveo system so that your tests can simply use the generated http client api to test your functionality. However, you can use this to manually test and debug your service, too. In case you simply need to start up the environment, in the test\system-test directory call:
mvn -Denv
If you want to debug your service, call
mvn -Denv -Dservice.skip
This will start the environment except for your service. You can then start your service in debug mode from your IDE.
In both cases you can now start tests manually or call the service api directly to test your code.
Administration
Configure Database access
arveo uses the default spring datasource configuration for the JDBC datasource. The datasource must be configured as shown in the following example:
spring:
datasource:
url: "jdbc:postgresql://localhost:5432/postgres?currentSchema=arveo&ApplicationName=${spring.application.name}"
driver-class-name: org.postgresql.Driver
username: username
password: password
Specifying the ApplicationName property is optional but can be helpful when analyzing database issues. The name of the Spring application will then be visible in Postgres query analytics.
The username and password should not be stored in the configuration files. Instead, they should be stored in Vault.
Advanced configuration properties can be found in the Spring boot documentation. To configure the connection pool, use the spring.datasource.hikari properties.
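For example, the connection pool size and timeout can be tuned via the standard Spring Boot HikariCP properties; the values below are purely illustrative, not recommendations:

```yaml
spring:
  datasource:
    hikari:
      maximum-pool-size: 20    # maximum number of pooled connections
      minimum-idle: 5          # connections kept open when idle
      connection-timeout: 30000 # milliseconds to wait for a free connection
```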
Configure Storage Locations
Content and type definitions
Only Documents can contain content elements. A Document in the repository can contain several content elements. For example, a document could contain a content element with the original content (like a TIFF image or a Word document) and a PDF rendition. Each content element has a contentName and some more properties like the media type. The contentName is a label that uniquely identifies a single content element contained in a Document. For example, a Document might contain two content elements that are identified by the contentNames 'content' and 'rendition'.
The contentNames are not only relevant for uniquely identifying a content element contained in a document; they also serve as references for further customization of the repository. The repository accepts configuration options that are directly related to contentNames, and the Document type definitions define restrictions regarding the allowed contentNames.
Type definitions define which contentNames can be contained in the entities stored in the definition.
Each content element is stored in a storage profile, which defines the place where the actual content will be stored. The contentType parameter can be used to define what kind of content a content element can contain. When the media type is set to application/octet-stream, any kind of content can be used.
The name of a content element must start with a letter and can consist only of letters (upper- and lower-case), numbers and the _ character. More formally, the name must match the regular expression [a-zA-Z][a-zA-Z0-9_]*.
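The naming rule above can be checked with a one-line regular expression; the helper class and method names here are my own, not part of the arveo API:

```java
import java.util.regex.Pattern;

// Checks a content element name against the rule quoted above:
// a letter, followed by letters, digits or underscores.
public final class ContentNameCheck {

    private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");

    public static boolean isValidContentName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }
}
```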
Types of content elements
The type definition specifies which content elements entities of this type may have.
Usually, the content elements of the entities are stored in a JSON field in the database which contains the storage-ID and additional metadata like size, media type and a hash. The actual content data is not stored in the JSON field. If required, a content element can also be stored in a separate field of type text. The separate field will contain only the storage-ID but no additional metadata. Additional metadata for content elements using separate fields have to be handled by the client application, for example by storing them in a custom metadata attribute.
The following example is an object of type Document for which two content elements are defined: "content" and "LARGE_CONTENT". In this example, "separateField = true" means that a separate column is used in the database; otherwise the content information is written to the corresponding JSON field of the database. The name of a separate column in the database is derived from the name of the content element.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content", separateField = true)
@ContentElement(name = "LARGE_CONTENT", separateField = true)
public interface TwoContentsDocument {
@SystemProperty(SystemPropertyName.ID)
DocumentId getId();
@SystemProperty(SystemPropertyName.CONTENT)
Map<String, ContentInformation> getContentInformation();
String getName();
void setName(String name);
}
Using the contentType attribute of the @ContentElement annotation, one can define the required content type for a content element. The content type application/octet-stream is used as a wildcard type for any type of content. For example, if the value of the contentType attribute is set to application/pdf, only PDF files can be stored in the content element.
It is possible to define the content type of a new content element when it is uploaded. The server will trust this information, so the client is responsible for sending the correct content type. If the client does not define the content type, the server will automatically detect the content type of the uploaded binary data.
The default content element
If a type definition of type DOCUMENT does not contain any @ContentElement
annotations, the server will automatically
assign a content element with the name content
to it. This content element’s metadata will be stored in the JSON
field of the type definition and it accepts any kind of content type.
The ContentElement annotation
The following table contains an overview of the available attributes of the @ContentElement annotation.

Attribute | Default value | Explanation
---|---|---
name | | The name of the content element. This attribute is mandatory.
profile | | The name of the storage profile used to store the content element. This attribute is optional.
contentType | application/octet-stream | The type of content supported by the content element.
separateField | false | Whether to store only the content ID and no additional metadata in a separate database field.
fulltextExtraction | false | If true, the fulltext content of the content element will be extracted and stored in the NoSQL database.
Storage profiles
A StorageProfile defines on which storage the content elements are saved. Access to the storage backends (like filesystem or S3) is handled by storage plugins.
A StoragePlugin is defined in the StorageProfile; it is used to access the connected storage. The same plugin can be used in several StorageProfiles, and each StorageProfile can have a different set of parameters (access data, URLs, …) for the plugin.
ecr:
server:
storage:
profiles:
fileSystemProfile: (1)
defaultProfile: true (4)
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin (2)
pluginSettings: (3)
storagePath: /storage
s3Profile: (1)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin (2)
pluginSettings: (3)
pathStyleAccessEnabled: true
serviceEndpoint: "http://localhost:49999"
region: us-west-2
accessKey: myaccesskey
secretAccessKey: mysecretaccesskey
bucket: testbucket
1 | profile name |
2 | class name of the plugin |
3 | plugin specific configuration data like the path for the filesystem plugin or the bucket for the S3 plugin |
4 | defines this profile as the default profile (see Mapping content elements to storage profiles) |
Each profile is identified by its name and defines the storage plugin to use. Plugin-specific settings can be configured in the pluginSettings map: the fully qualified plugin class name determines the storage technology, and the plugin settings (arbitrary name-value pairs) configure it.
When a content element is saved, the plugin defined in the profile returns a contentID with which the stored data can be retrieved later. This ID, usually of type String, is saved with the document. Its exact format is up to the storage plugin; usually it is a UUID, but it may be any text string.
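Conceptually, a storage plugin fulfills a store/retrieve contract built around the contentID. The following in-memory class is an illustration of that contract only; it is NOT the real arveo StoragePlugin SPI, and all names in it are made up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the contract described above: storing content yields
// a contentID (here a UUID string) that later retrieves the stored data.
// This is NOT the arveo StoragePlugin interface.
public class InMemoryStoragePluginSketch {

    private final Map<String, byte[]> blobs = new HashMap<>();

    public String store(byte[] content) {
        // The plugin decides the ID format; a UUID is typical, but any string works.
        String contentId = UUID.randomUUID().toString();
        blobs.put(contentId, content.clone());
        return contentId;
    }

    public byte[] retrieve(String contentId) {
        return blobs.get(contentId);
    }
}
```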
Using aliases for storage profiles
It is possible to assign aliases to storage profile names. This might be required when storage profiles are mapped to content elements by configuration, as described below. Assigning aliases can be done in the configuration by defining alias: profileName entries as shown below:
ecr:
server:
storage:
profile-aliases:
alias1: encryptedProfile
another_alias: encryptedProfile
It is possible to define more than one alias for a storage profile.
Aliases are resolved before a content element is saved. The resulting ContentId will contain the resolved profile name, not the alias.
The bucket selector plugin does not support aliases when selection rules are evaluated.
Mapping content elements to storage profiles
There are two ways to map a specific content element to a storage profile.
Mapping by code
To define the mapping of the content elements to storage profiles in the application code, the storage profile name can be set in the @ContentElement annotation using the profile attribute.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content", profile = "fileSystemProfile")
public interface MyDocument {
}
The example above shows a document type with a single named content element that will be stored in a storage profile called fileSystemProfile.
Mapping by configuration
If the mapping should be controlled by the configuration instead of being defined in the code, storage profiles with auto-matchable names must be used. The matching is based on the name of the type definition (in snake case) and the name of the content element, separated by -.
The following type definition is used as an example in the following explanations. It uses one content element:
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "rendition")
public interface MyDocument {
}
A matching profile for the content element named rendition of the interface MyDocument would be selected using the following steps:

1. Check if there is a profile called my_document-rendition. If so, use it.
2. If not, check if there is a profile called my_document. If so, use it.
3. If not, check if there is a default storage profile. If so, use it.
4. If none of the steps above succeeded, an exception is thrown.
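The selection steps above can be sketched in plain Java. The class, method and parameter names here are illustrative and not part of the arveo API; the sketch only assumes a lookup of configured profile names and an optional default profile:

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the storage profile matching order described above. Not arveo code.
public final class ProfileResolutionSketch {

    static Optional<String> resolveProfile(Map<String, ?> profiles,
                                           String defaultProfileName,
                                           String typeNameSnakeCase,
                                           String contentElementName) {
        // Step 1: <type_name>-<content_element_name>, e.g. my_document-rendition
        String specific = typeNameSnakeCase + "-" + contentElementName;
        if (profiles.containsKey(specific)) {
            return Optional.of(specific);
        }
        // Step 2: the type name alone, e.g. my_document
        if (profiles.containsKey(typeNameSnakeCase)) {
            return Optional.of(typeNameSnakeCase);
        }
        // Step 3: the default storage profile, if one is configured
        if (defaultProfileName != null && profiles.containsKey(defaultProfileName)) {
            return Optional.of(defaultProfileName);
        }
        // Step 4: no match - arveo throws an exception in this case
        return Optional.empty();
    }
}
```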
Examples
The following example shows the simplest possible configuration. The type definition does not contain any content element. It implicitly uses the default content element named content. The content element will be stored in a storage profile called my_document or, if no such profile exists, in the default storage profile.
@Type(ObjectType.DOCUMENT)
public interface MyDocument {
}
The next example shows the same type definition, but with an annotation that defines which storage profile to use.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = ContentElement.CONTENT, profile="fileSystemProfile")
public interface MyDocument {
}
The next example shows a type definition that contains two content elements. The "rendition" content element will support only PDF documents. The PDFs contained in the rendition content element will be stored in an S3 storage. The content in the other element will either be stored in a profile called my_document-content, in a profile called my_document or, if neither of those profiles exists, in the default profile.
@Type(ObjectType.DOCUMENT)
@ContentElement(name="content")
@ContentElement(name="rendition", contentType="application/pdf", profile="s3Profile")
public interface MyDocument {
}
Plugin configuration
The service uses a plugin interface to connect to the specific storage provider. The following plugins are currently available:
File system
Class name: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin.
The FileSystemPlugin offers storage of the data as files in the file system.
Parameter | Meaning
---|---
storagePath | Path to the directory that is used to store the files
AWS, NetApp or EMC Elastic Cloud Storage
Class name: de.eitco.ecr.storage.plugin.s3.S3Plugin.
The S3 plug-in stores data in an Amazon S3 compatible storage.
If arveo has no permissions to create buckets, then the administrator has to create the buckets manually. |
Parameter | Meaning | Default value
---|---|---
pathStyleAccessEnabled | Configures the client to use path-style access for all requests. Amazon S3 supports virtual-hosted-style and path-style access in all regions. The path-style syntax, however, requires that you use the region-specific endpoint when attempting to access a bucket. | false
serviceEndpoint | The URL of the S3 endpoint to be used by the plugin | 
region | The region for access to AWS | 
accessKey | AWS Access Key | 
secretAccessKey | AWS Secret Access Key | 
bucket | The name of the S3 bucket to be created by the plugin. The name can only contain lowercase letters. | 
signer | Sets the name of the signature algorithm to use for signing requests made by this client. If not set, the default configuration of the Amazon S3 SDK will be used. | 
proxyhost | The optional proxy host used by the client when connecting to the S3 storage. | 
proxyprotocol | The protocol (HTTP or HTTPS) used to connect to the proxy. | 
proxyport | The port used by the client to connect to the proxy. | 
streambuffersize | Size of the send and receive buffers in bytes. | 32768
uploadpresignedurl | If set to true, the client will use pre-signed URL requests to communicate with the S3 storage. | false
acceleratemode | Configures the client to use the S3 accelerate endpoint for all requests. | false
maxconnection | The maximum number of allowed open HTTP connections. | -1 (no limit)
maxErrorRetries | The maximum number of retries for failed requests. | -1 (no retries)
baseDelay | The base delay in milliseconds for the retry policy. | -1 (no delay)
maxBackoffTime | The maximum backoff time in milliseconds for the retry policy. | -1 (no maximum backoff time)
backoffStrategy | The backoff strategy used by the retry policy. | 
retentionEnabled | Enables the use of S3 object locks for object retention. | false
retentionMode | Specifies the protection level of retention object locks. Can be GOVERNANCE or COMPLIANCE. | GOVERNANCE
Configuring the retry policy of the S3 plugin
The Amazon S3 SDK used to connect to an S3-compatible storage supports different ways to retry failed requests. By default, a retry policy using jitter and 3 retries is used. To configure a custom retry policy, all three parameters baseDelay, maxBackoffTime and backoffStrategy have to be configured. The backoffStrategy parameter must be set to one of the following values:
- FULL_JITTER
- EQUAL_JITTER
- EXPONENTIAL
The Amazon documentation contains an explanation of the different strategies.
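As a rough illustration of how these strategies differ (a sketch based on the commonly documented AWS formulas, not the SDK's internal implementation; delays are in milliseconds):

```java
import java.util.concurrent.ThreadLocalRandom;

public class Backoff {

    // Plain exponential backoff: baseDelay doubled per retry, capped at maxBackoff.
    static long exponential(long baseDelay, long maxBackoff, int retry) {
        return Math.min(baseDelay * (1L << retry), maxBackoff);
    }

    // FULL_JITTER: a random delay between 0 and the exponential delay.
    static long fullJitter(long baseDelay, long maxBackoff, int retry) {
        return ThreadLocalRandom.current().nextLong(exponential(baseDelay, maxBackoff, retry) + 1);
    }

    // EQUAL_JITTER: half the exponential delay plus a random half.
    static long equalJitter(long baseDelay, long maxBackoff, int retry) {
        long exp = exponential(baseDelay, maxBackoff, retry);
        return exp / 2 + ThreadLocalRandom.current().nextLong(exp / 2 + 1);
    }
}
```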
Retention
The S3 plugin supports the usage of S3 object locks to set a retention time and litigation hold status on content elements stored in the S3-compatible storage. To enable the feature, set the parameter retentionEnabled to true.
When the retention support is enabled, the bucket used by the storage profile must be created manually. The S3 Object Locks option must be enabled for the bucket. |
The S3 plugin uses the governance retention mode by default, which means that retention-protected objects can be deleted or overwritten by any user of the AWS account with the required privileges. When the compliance retention mode is used, no user (not even the root administrator of the S3 account) is able to delete or overwrite retention-protected objects. To configure this behavior, set the property retentionMode to GOVERNANCE or COMPLIANCE. More information about object locks can be found in the AWS documentation.
When the COMPLIANCE retention mode is used, it is impossible to delete objects from the S3 storage account before the end of the retention interval is reached.
|
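Putting the retention parameters together, an S3 profile with compliance-mode retention might be configured as in the following sketch (profile name, endpoint and credentials are placeholders):

```yaml
storage:
  profiles:
    complianceProfile:              # hypothetical profile name
      pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
      pluginSettings:
        serviceEndpoint: "<cloudstorage url>"
        region: eu
        accessKey: <myaccesskey>
        secretAccessKey: <mysecret>
        bucket: compliancebucket    # must be created manually with S3 Object Locks enabled
        retentionEnabled: true
        retentionMode: COMPLIANCE
```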
Azure blob storage
Class name: de.eitco.ecr.storage.plugin.azureblob.AzureBlobStoragePlugin
The Azure blob storage plugin can be used to connect to a storage account in Microsoft Azure.
Parameter | Meaning | Default value
---|---|---
connectionString | The connection string used to connect to the storage account. The connection string can be obtained from the Azure portal. | 
containerName | The name of the container in the storage account that will contain the data of the storage profile. | 
timeoutMillis | The timeout in milliseconds for requests to Azure. | 5000
retentionSupport | Enables usage of the immutability policy feature of Azure. | false
policyMode | Sets the protection level of the immutability policies. Can be LOCKED or UNLOCKED. | UNLOCKED
Additional parameters contained in the plugin configuration will be passed on to the Configuration used for the Azure SDK.
Retention
The Azure blob storage plugin supports the immutability policy feature of Azure blob storage. Using this feature enables an additional security level for retention protected content elements. If a content element is retention protected or in a litigation hold, it will not be possible to delete it using the Azure management interface or the Azure SDK.
To enable the retention support, the parameter retentionSupport must be set to true.
When the retention support is enabled, the container used by the storage profile must be created manually in Azure. The setting version-level immutability support must be enabled when the container is created. To be able to enable the version-level immutability support, the storage account must support versioning for blobs. More information can be found in the Azure documentation. |
The plugin creates unlocked immutability policies by default. Unlocked policies can be altered by Azure users with the required privileges. Locked immutability policies can neither be deleted nor can the expiry time be shortened. Prolonging the expiry time (and with it, the retention period) is still possible. Note that even the administrator of the storage account is not able to delete objects with a locked immutability policy. To configure the policy mode, set the parameter policyMode to LOCKED or UNLOCKED.
When the policyMode is set to LOCKED, it is not possible to delete retention-protected objects from the storage account before the end of the retention interval is reached.
|
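An Azure profile with locked immutability policies might be configured as in the following sketch (profile and container names are placeholders):

```yaml
storage:
  profiles:
    azureProfile:                   # hypothetical profile name
      pluginClassName: de.eitco.ecr.storage.plugin.azureblob.AzureBlobStoragePlugin
      pluginSettings:
        connectionString: "<connection string from the Azure portal>"
        containerName: mycontainer  # must be created manually with version-level immutability support
        retentionSupport: true
        policyMode: LOCKED
```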
BucketOrganizer
Class name: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin.
The BucketOrganizer is not specific to a particular storage technology or storage interface but delegates storage requests to other storage plugins. The selection of the target plugin depends on the retention information of the document that contains the content element to be stored. The selection criteria used to select the target plugin can be configured as a list of bucket selection rules.
The relevant retention information of the document is defined by the values of the system fields RETENTION_DATE and LITIGATION_HOLD. This value pair is matched against the bucket selection rules. The matching process starts with the first rule and continues to the next rule if the rule does not match the value pair. The matching process ends at the first rule that matches the value pair. The storage profile named in this rule will be used to store the content. Each bucket selection rule consists of three parts that are separated by the pipe (|) symbol.
1. retention date match expression
The retention date match expression is usually a time interval that begins at some calendar day and extends to some later calendar day. The notation for the interval is inspired by ISO 8601 and may read like this: 2021-01-01+01:00--2022-01-01+01:00. The general format is begin_date--end_date, that is, both dates are separated by "--". A retention date matches the expression if begin date <= retention date < end date.
The begin and end dates are specified as YYYY-MM-DD followed by a time zone offset as +hh:mm or -hh:mm. It is possible to define open intervals by specifying one of the boundary dates as UNBOUNDED.
Retention dates may be NULL if the retention date has not (yet) been set on the document. A NULL retention date will not match any interval specified in a match rule. For this reason, the retention date match expression may be specified as NULL to match NULL retention dates.
A retention date match expression can also be specified as * if the rule should always match.
2. litigation hold match expression
The litigation hold match expression can be one of these literals: true, false, *.
While the literal * will always match, the other literals will match the denoted value only.
3. target storage profile name
The name of the target storage profile to be used if both expressions match the corresponding system field values.
Configuration parameters
Parameter | Meaning
---|---
bucketSelectionRules | A list of bucket selection rules
storage:
profiles:
bucketProfile: (1)
pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin (2)
pluginSettings:
bucketSelectionRules: (3)
- "*|true|fsProfileLitigationHold" (4) (5)
- "NULL|false|fsProfileForever" (4)
- "2021-01-01+01:00--2022-01-01+01:00|false|fsProfile2021" (4)
- "2022-01-01+01:00--2023-01-01+01:00|false|fsProfile2022" (4)
- "2023-01-01+01:00--2024-01-01+01:00|false|fsProfile2023" (4)
- "2024-01-01+01:00--2025-01-01+01:00|false|fsProfile2024" (4)
- "2025-01-01+01:00--2026-01-01+01:00|false|fsProfile2025" (4)
- "2026-01-01+01:00--2027-01-01+01:00|false|fsProfile2026" (4)
- "2027-01-01+01:00--2028-01-01+01:00|false|fsProfile2027" (4)
- "2028-01-01+01:00--2029-01-01+01:00|false|fsProfile2028" (4)
- "2029-01-01+01:00--2030-01-01+01:00|false|fsProfile2029" (4)
- "2030-01-01+01:00--2031-01-01+01:00|false|fsProfile2030" (4)
- "*|*|fsProfileAnotherEra" (4)
fsProfileLitigationHold: (5)
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/litigationHold
fsProfileForever:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/forever
fsProfile2021:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/2021
#...
fsProfile2030:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/2030
fsProfileAnotherEra:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/anotherEra
1 | The profile name. |
2 | The plugin type, i.e. its class name. |
3 | The list of rules. |
4 | A bucket selection rule, consisting of a retention date match expression, a litigation hold match expression and a target storage profile name. |
5 | The referenced profile name. |
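The matching of a single rule against the (RETENTION_DATE, LITIGATION_HOLD) value pair can be sketched as follows. This is a simplified illustration of the documented rule semantics, not arveo's actual implementation; UNBOUNDED intervals and error handling are omitted.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class BucketRule {

    // Matches one rule "dateExpression|litigationExpression|profileName" against
    // the value pair. Returns the profile name on a match, null otherwise.
    static String matchRule(String rule, OffsetDateTime retentionDate, boolean litigationHold) {
        String[] parts = rule.split("\\|");
        String dateExpr = parts[0];
        String holdExpr = parts[1];
        String profile = parts[2];

        if (!holdExpr.equals("*") && Boolean.parseBoolean(holdExpr) != litigationHold) {
            return null;                     // litigation hold expression does not match
        }
        if (dateExpr.equals("*")) {
            return profile;                  // wildcard matches any retention date
        }
        if (dateExpr.equals("NULL")) {
            return retentionDate == null ? profile : null;
        }
        if (retentionDate == null) {
            return null;                     // NULL dates never match an interval
        }
        String[] interval = dateExpr.split("--");
        OffsetDateTime begin = parseBoundary(interval[0]);
        OffsetDateTime end = parseBoundary(interval[1]);
        // begin <= retentionDate < end
        return !retentionDate.isBefore(begin) && retentionDate.isBefore(end) ? profile : null;
    }

    // Boundary format: YYYY-MM-DD followed by a zone offset, e.g. 2021-01-01+01:00
    static OffsetDateTime parseBoundary(String value) {
        LocalDate date = LocalDate.parse(value.substring(0, 10));
        ZoneOffset offset = ZoneOffset.of(value.substring(10));
        return date.atStartOfDay().atOffset(offset);
    }
}
```

In arveo, the rules in the configured list would be tried in order and the first non-null result would decide the target profile.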
Storage profile templates
To reduce the number of required entries in the list of bucket selection rules, storage profile templates can be used. A storage profile template consists of a name template with placeholders, a specific time range and the regular configuration parameters like the class name of the storage profile. The <year> placeholder can be used as a variable for the current year.
Storage profile templates are configured in a separate section as shown below:
ecr:
server:
storage:
profiles:
bucketProfile:
pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin
pluginSettings:
bucketSelectionRules:
- "*|true|fsProfileLitigationHold"
- "NULL|false|fsProfileForever"
- "<year>-01-01+01:00|false|fsProfile<year>|2021--2030" (1)
profile-templates:
- nameTemplate: "fsProfile<year>" (2)
genericTimeRange: "2021--2029" (3)
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/<year> (4)
1 | A bucket selection rule using a profile template with the year placeholder for the years between 2021 and 2030. |
2 | A name template that will create profiles for the years 2021 to 2029. |
3 | Defines the time range used to create profiles based on the template |
4 | The year placeholder can be used in the configuration properties of the plugin. |
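Conceptually, the template expansion replaces the <year> placeholder once per year in the configured range. A minimal sketch of that idea (illustrative only; `expand` is a hypothetical helper, not part of arveo):

```java
import java.util.ArrayList;
import java.util.List;

public class ProfileTemplate {

    // Expands a name template like "fsProfile<year>" over a generic time range
    // like "2021--2029" into concrete profile names.
    static List<String> expand(String nameTemplate, String genericTimeRange) {
        String[] range = genericTimeRange.split("--");
        int from = Integer.parseInt(range[0]);
        int to = Integer.parseInt(range[1]);
        List<String> names = new ArrayList<>();
        for (int year = from; year <= to; year++) {
            names.add(nameTemplate.replace("<year>", String.valueOf(year)));
        }
        return names;
    }
}
```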
Writing a custom storage plugin
As mentioned above, arveo uses a plugin interface for the connection to the storage backends. This section describes how to write a new storage plugin.
All classes and interfaces required to implement a custom plugin are contained in the dependency
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-server</artifactId>
<version>13.0.4</version>
<scope>provided</scope>
</dependency>
A storage plugin must implement the interface de.eitco.ecr.server.storage.StoragePlugin. There are two abstract implementations that can be extended to simplify the implementation:
- AbstractStoragePlugin: Provides several methods that make it easier to get plugin configuration settings.
- AbstractSimplifiedStoragePlugin: The superclass of all plugins that do not need retention support.
In addition to the interface to implement, there are some guidelines to respect when writing a custom storage plugin:
- The plugin must provide a default no-argument constructor because it will be instantiated using reflection.
- The plugin can use dependency injection, but because of the need for a default constructor, only field injection using @Autowired is possible.
- There will be one instance of the plugin for each storage profile configured to use the plugin, so the plugin must be thread-safe.
Configuration settings
The StoragePlugin
interface contains a method called configure
, which will be called once for each plugin instance.
It is used to process the generic parameter values that might be required to configure the plugin. For example, the
parameters might contain a path to a file system directory or credentials for a remote storage system. Because storage
plugins can be configured in profile templates, it might be necessary to replace
placeholders configured in the template. The class AbstractStoragePlugin
already contains helper methods like
getMandatoryProperty
that take care of these replacements. The configure
method is expected to return the actual
configuration with all replacements that is used by this plugin instance. The returned configuration settings are used
by the health checks.
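To make the placeholder handling concrete, the following sketch shows what such a helper might do. This is an illustration only; the actual signatures of the helper methods in AbstractStoragePlugin may differ.

```java
import java.util.Map;

public class ConfigHelper {

    // Illustrative sketch: reads a mandatory plugin setting and replaces
    // template placeholders such as <year> with their concrete values.
    static String getMandatoryProperty(Map<String, String> settings, String key,
                                       Map<String, String> placeholders) {
        String value = settings.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing mandatory plugin setting: " + key);
        }
        for (Map.Entry<String, String> placeholder : placeholders.entrySet()) {
            value = value.replace("<" + placeholder.getKey() + ">", placeholder.getValue());
        }
        return value;
    }
}
```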
Using the custom storage plugin
To use the custom plugin, it is enough to add its classes to the classpath of the repository service. The plugin can then
be used for a storage profile by specifying its qualified class name in the pluginClassName
parameter. To add the
plugin’s class to the classpath, use the -Dloader.path=<path>
argument to start the service. The argument must point
to a directory containing the required jar files.
Renditions
Renditions of content elements, for example a PDF rendition of an image, can be created automatically. To create a
rendition, the @Rendition
annotation can be used as shown in the following example.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "original", separateField = true) (1)
@Rendition(name = "rendition", sourceElement = "original", contentType = MediaType.APPLICATION_PDF_VALUE, separateField = true) (2)
@OverwriteAllowed
public interface DocumentWithRendition {
String getName();
void setName(String name);
@ContentType(contentElement = "original") (3)
String getContentType();
void setContentType(String contentType);
@SystemProperty(SystemPropertyName.RENDITION_STATUS) (4)
Map<String, RenditionStatusInformation> getRenditionStatus();
}
1 | The content element containing the original content |
2 | The rendition content element to create automatically |
3 | The content type of the original content |
4 | A getter for the current status of the renditions of the document |
The above example shows a document type with one content element and one rendition. Both the original and the rendition
element are stored in separate fields. This is possible but not required for renditions. When the original content
is stored in a separate field, meta information like the content type is not stored in the database. As the content
type of the original content is required to create a rendition, it is recommended to define an attribute of type 'String'
that contains the original content’s type. The attribute must be annotated with @ContentType
to bind its value to
the original content element and return a valid mime type string like "image/jpeg". If no such attribute is present,
the system will try to detect the content type automatically.
The current status of the renditions can be retrieved as shown in the example above. The returned map contains a
RenditionStatusInformation
instance for each rendition content element of the document. The status information contains
a status value and the number of times the system tried to create the rendition, if available. The status of a rendition can
be one of the following values:
parameter | meaning |
---|---|
AVAILABLE |
The rendition was created successfully (or was uploaded by a client) and is available. |
PENDING |
The rendition is not yet available but is expected to be available in the future. |
FAILED |
Creating the rendition has failed permanently. |
EMPTY |
The rendition is not available because the source content element does not exist. |
RESET |
Creating the rendition has failed and the status was manually reset (see error handling). |
The @Rendition
annotation accepts the following parameters:
parameter | meaning |
---|---|
name |
The name of the rendition content element |
sourceElement |
The name of the content element to create a rendition of |
contentType |
The type of the rendition to create (a mime type string like "application/pdf") |
profile |
The name of the profile used to store the rendition content element (optional) |
separateField |
(optional) whether to store the rendition’s meta data in a separate field or in the JSON content field |
Renditions are created asynchronously. When a document is created or updated, a message will be posted to a queue in ActiveMQ. The messages are processed by event listeners in the repository service. Depending on the current load it might take some time until the rendition is available.
The system will not try to create a rendition when the rendition content element is written by the client. |
The actual rendering will be done by the Document Conversion Service. Which conversions are supported, depends on the plugins available on the classpath of the service.
Error handling
When the creation of a rendition fails, the system will re-try to create the rendition. The number of re-tries can be configured, the default is three (see configuration properties). When all retries have failed, the rendition message will be added to a dead letter queue and the status field of the rendition will be set to -1 (FAILED). For this to work, the message queue in ActiveMQ must be configured to use an individual dead letter queue as described in the ActiveMQ documentation.
<policyEntry queue="ecr-queue-create-renditions">
<deadLetterStrategy>
<individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
</deadLetterStrategy>
</policyEntry>
Reset status of failed renditions
The status of failed renditions can be set to RESET (-2) either by using the API method
de.eitco.ecr.sdk.document.TypedDocumentServiceClient.resetFailedRenditionStatus
or simply by setting the value in the
database directly. A system job polls the database and will enqueue new rendition messages in ActiveMQ to re-try to
create the renditions. The interval in which the job polls the database can be configured using the parameter
retry-renditions.cron-expression
(see configuration properties).
Dynamically skipping renditions
There are cases where the decision whether to create a rendition for a content element can only be made at run-time. For cases like this a type can provide a method implementing that decision. This method is marked by the annotation @RenditionCreationCondition. Only one method of a type may have this annotation. The method
- must have the return type boolean, java.lang.Boolean or kotlin.Boolean
  - in case it is java.lang.Boolean, it may not return null
- must not be abstract
  - should the defining class be an interface, this means that it is either a static or a default method
  - note that, should the type be defined in Kotlin and the method not be static, it has to be compiled with -Xjvm-default=all or -Xjvm-default=all-compatibility
- can have up to two parameters of type RenditionInfo
  - the first representing the source to render
  - the second representing the target to render to
  - if only one parameter is given, it is assumed to be the source
If such a method exists, arveo evaluates it before posting rendition messages. If the method returns false, the message is not posted. Such a method may be present on types that are not documents, but it will not have any effect there. This might be helpful in scenarios with complex inheritance structures.
Example 1
Let’s assume a scenario where we have a document with a content element "content" that can have an arbitrary type. It is supposed to be a multi-page document, so in most cases it is a PDF file. However, there are cases where a document is created with the content element being an MS Word document, and in some cases it is just a single-page image. Even multi-page TIFFs are possible, and in some seldom cases the content type is unclear and simply "application/octet-stream".
In this scenario there is a web viewer that is supposed to show the document’s content. For the viewer, PDF files are no problem whatsoever. It is also capable of viewing the images, except multi-page TIFF files, which pose a problem. It is unable to view MS Office files, and for "application/octet-stream" it can only provide a download link.
Thus it is decided that the backend needs to create a PDF rendition for MS Office formats and TIFF files. This could be implemented with the following class:
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content") (1)
@Rendition(name = "rendition", sourceElement = "content", contentType = MediaType.APPLICATION_PDF_VALUE) (2)
public interface DocumentWithDynamicRenditionDecision {
String getName();
void setName(String name);
@RenditionCreationCondition
default boolean decideRendition( (3)
RenditionInfo source,
RenditionInfo target (4)
) {
if (source.getMediaType().equals("application/msword")) { (5)
return true;
}
if (source.getMediaType().equals("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
return true;
}
if (source.getMediaType().equals("image/tiff")) {
return true;
}
return false;
}
}
1 | A content element with the name "content" is defined. |
2 | A pdf rendition of that element is defined with the name "rendition". |
3 | A default method "decideRendition" is created and marked with @RenditionCreationCondition. |
4 | Note that the second parameter is unused. It could be omitted. |
5 | The implementation of the method is pretty simple. It checks whether the mime type of the source element is one that we want to create a rendition for: MS Word files (old and new format) or TIFF. If so, it returns true, indicating that arveo should create a rendition for the element. Otherwise, it returns false so that no rendition is created. |
Example 2
Assume the application described in example 1. Assume further that at some point it becomes necessary to migrate some older documents to this application. An importer is written; however, most of the imports fail. This is due to the fact that many of the documents are in an older MS Word format that the current render engine is incapable of transforming into PDF. So it is decided not to create a rendition for those elements and simply provide a download link in the application’s client.
This poses a problem in the decideRendition() method: MS Word documents that are created regularly (not imported from the old source) should still have a rendition created for them. Thus, it is not possible to decide whether to render from the source type alone. A simple solution could be to add a new property create_rendition to the type. This nullable Boolean could be set on creation to indicate whether to create a rendition for the content element. A value of null would activate the behaviour already implemented:
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content")
@Rendition(name = "rendition", sourceElement = "content", contentType = MediaType.APPLICATION_PDF_VALUE)
public interface DynamicRenditionExample2 {
String getName();
void setName(String name);
(1)
Boolean getCreateRendition();
void setCreateRendition(Boolean value);
@RenditionCreationCondition
default boolean decideRendition(
RenditionInfo source (2)
) {
if (getCreateRendition() != null) { (3)
return getCreateRendition();
}
(4)
if (source.getMediaType().equals("application/msword")) {
return true;
}
if (source.getMediaType().equals("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
return true;
}
if (source.getMediaType().equals("image/tiff")) {
return true;
}
return false;
}
}
1 | The property create_rendition is defined. Note that with the type java.lang.Boolean it is nullable. |
2 | Note that in this case the unused parameter is omitted. This is the only change in the method signature. |
3 | At the start of the method, it is checked, whether the new property is set, simply by calling the getter. If so the value is returned. |
4 | Otherwise, the code from example 1 is executed. |
Text renditions
The rendition feature can be used to store extracted fulltext data as content elements of a document. To achieve this, simply add a rendition content element with the content type text/plain.
The content types of the source content element that can be used for text-extraction depend on the available extraction plugins of the Document Conversion Service. |
To be able to use the extracted fulltext data for searches, use the fulltext extraction feature for the SOLR integration. See SOLR for details. The service will automatically use available text renditions when the document is transferred to SOLR. If no text rendition is available, the text extraction will be performed on the fly. |
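For example, following the rendition example shown earlier, a text rendition could be declared like this (a sketch; "fulltext" is a hypothetical element name):

```java
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content")
// the text/plain content type marks "fulltext" as a text rendition of "content"
@Rendition(name = "fulltext", sourceElement = "content", contentType = "text/plain")
public interface DocumentWithTextRendition {
    String getName();
    void setName(String name);
}
```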
Configure retention container
Configure storage containers for yearly retention periods
Once you have deployed your new data type with retention enabled, all your data is stored in your default storage profile and has a default retention period of 10 years. The following example defines separate buckets, each containing all objects with a retention period within one year. Configure the buckets in the ecr-service.yaml of your config service in the section arveo:storage:profiles:. You can configure a new storage profile with an unlimited number of data buckets for your content.
Mandatory properties of your new bucket profile:
Property | Description
---|---
pluginClassName | must always be "de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin"
pluginSettings:bucketSelectionRules | array of rules of the form filter (string)|litigationHold (boolean)|storageProfile (string)
- filter (string): must be * to match all objects, NULL to match unset retention dates, or a valid zoned date time range like 2031-01-01+01:00--2032-01-01+01:00; the bucket selection is based on the document type property RETENTION_DATE.
- litigationHold (boolean): true = the litigation hold bucket, false for all other regular retention buckets.
- storageProfile (string): a valid storage profile name (arveo:storage:profiles:).
Find more details about selection rules in Retention Bucket Selection Rules.
If the configuration is not correct, you will find more information in the startup log and will most likely find a MissingConfigurationException.
Defining storage containers in arveo-service.yaml and your storage system is an ongoing task for your operating team. Eitco will try to create the buckets or subdirectories on your storage system but can also use already existing ones. |
ecr-service.yaml example snippet for content definitions and storages. Adapt your ecr-service.yaml and replace rules, profile names and cloud storage url, etc. with your values.
arveo:
server:
content:
default-definition:
mediaType: "application/octet-stream"
storageProfile: bucketProfile (1)
definitions:
content:
mediaType: "application/octet-stream"
storageProfile: bucketProfile (1)
rendition:
mediaType: "application/octet-stream"
storageProfile: bucketProfile (1)
documentTypeA: (2)
mediaType: "application/octet-stream"
storageProfile: storageProfileDocumentTypeA
documentTypeB: (2)
mediaType: "application/octet-stream"
storageProfile: storageProfileDocumentTypeB
1 | Assign your bucket storage profile to the content types with a retention period. |
2 | The example provides two more storage profiles for other document types (storageProfileDocumentTypeA, storageProfileDocumentTypeB). To write all content of a document type to a storage profile you must assign this content type to the document type. The upload API will only accept content of this type for the document type. |
storage:
profiles:
bucketProfile:
pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin
pluginSettings:
bucketSelectionRules:
- "*|true|storageProfileRetentionLitigationHold"
- "NULL|false|storageProfileRetentionNone"
- "2031-01-01+01:00--2032-01-01+01:00|false|storageProfileRetention2031"
- "2032-01-01+01:00--2033-01-01+01:00|false|storageProfileRetention2032"
- "2033-01-01+01:00--2034-01-01+01:00|false|storageProfileRetention2033"
- "2034-01-01+01:00--2035-01-01+01:00|false|storageProfileRetention2034"
- "2035-01-01+01:00--2036-01-01+01:00|false|storageProfileRetention2035"
- "2036-01-01+01:00--2037-01-01+01:00|false|storageProfileRetention2036"
- "2037-01-01+01:00--2038-01-01+01:00|false|storageProfileRetention2037"
- "2038-01-01+01:00--2039-01-01+01:00|false|storageProfileRetention2038"
- "2039-01-01+01:00--2040-01-01+01:00|false|storageProfileRetention2039"
- "2040-01-01+01:00--2041-01-01+01:00|false|storageProfileRetention2040"
- "2041-01-01+01:00--2042-01-01+01:00|false|storageProfileRetention2041"
- "*|*|storageProfileRetention2042Plus"
storageProfileRetentionLitigationHold: (1)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: LitigationHold
storageProfileRetentionNone: (2)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: NoRetention
storageProfileRetention2042Plus: (3)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: RetentionPeriod2042Plus
storageProfileRetention2031: (4)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: RetentionPeriod2031
storageProfileRetention2032:
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: RetentionPeriod2032
... (5)
storageProfileDocumentTypeA: (6)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: DocumentTypeA
storageProfileDocumentTypeB: (6)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>" (7)
region: eu (7)
accessKey: <myaccesskey> (7)
secretAccessKey: <mysecret> (7)
bucket: DocumentTypeB
1 | always configure a litigation hold bucket |
2 | also configure a bucket for data that has no retention, just in case |
3 | fallback bucket for all content with a retention period past 2041. If you leave out this bucket, you get an exception when storing content that cannot be assigned to a bucket |
4 | one bucket for each year |
5 | configure as many buckets as needed for your content |
6 | two more storage profiles for other document types without retention; see the content types without retention above (arveo:server:content:DocumentTypeA/B) |
7 | replace the placeholders with your S3 URL, region, access key and access secret |
For more details on storage profiles and content types, see Content types.
If you want to use directories instead of buckets, you can configure file system storage profiles and assign a sub-directory (see File system storage profile configuration).
storageProfileLitigationHold:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/litigationHold
storageProfileRetentionNone:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/retentionNone
storageProfile2031:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/2031
Configure Encryption
arveo provides transparent encryption for data stored in the profiles. The encryption can be configured individually for each storage profile.
Overview
Encrypting and decrypting is performed by configurable encryption providers. Each provider is identified by a unique name. The available providers are described below.
The following table gives an overview of the encryption settings for a storage profile:
Parameter | Description | Default value |
---|---|---|
enabled | enables or disables the encryption | false |
providerName | name of the encryption provider to use | commons-aes |
To make sure all content of a specific type definition is encrypted, limit the content types supported by the type definition to types that use an encrypting storage profile.
When the BucketOrganizerPlugin is used, the encryption settings must be configured for each plugin referenced by the bucket selection rules. Configuring the encryption for the BucketOrganizerPlugin itself is not supported.
Commons AES provider
The commons-aes provider supports AES encryption with 256-bit keys. When a new content element is created in an encrypted profile, the provider generates a random cipher key for the element. The key is encrypted using a master password that is configured in the profile’s encryption settings. It is then stored in the database, which creates an identifier for the key. The keys are stored in individual tables for each profile called ecr_keys_<profileName>. After that, the content is encrypted and stored using the profile’s storage plugin. The key ID is stored in a header together with the encrypted data. When the data is read, the cipher key is loaded from the database using the key ID read from the header. The key is decrypted using the master password and used to decrypt the data read by the profile’s storage plugin.
If the database table containing the keys or the master password is lost, it is impossible to restore the data stored in the profile. When the master password for a profile is changed, all stored keys for the profile must be re-encrypted.
A mechanism to re-encrypt keys is planned but has not been implemented yet.
There is a second database table for each profile called ecr_keys_assoc_<profileName>. This table contains mappings of key IDs to content element IDs and is intended for system administration purposes. The encryption feature is configured as shown in the following example:
storage:
profiles:
encryptedProfile:
pluginClassName: "de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin"
pluginSettings:
storagePath: "/storage/encrypted"
encryptionSettings:
enabled: true
providerName: "commons-aes"
providerSettings:
password: "changeme"
The following table gives an overview of the encryption settings for the commons-aes provider:
Parameter | Description | Default value |
---|---|---|
password | the master password used to encrypt the cipher keys | |
rngAlgorithm | the algorithm used to generate secure random data | platform specific; see the docs for SecureRandom.getInstanceStrong(). If not specified, the most secure algorithm available is used |
Vault AES provider
The vault-aes encryption provider uses the transit secrets engine of Hashicorp Vault to encrypt and decrypt a generated random cipher key. The cipher key is generated using a configurable random data generation algorithm and then used to encrypt the content with AES as described below. The cipher key is then encrypted by Vault and stored in a header together with the encrypted content data. When the data is decrypted, the encrypted cipher key is read from the header, decrypted using Vault and then used to decrypt the content. The advantage in comparison to the commons-aes provider is that no master key and no encryption keys stored in the database are required. The keys required to decrypt the cipher keys (and thus the data, too) are securely stored in Vault and are never known to arveo.
If the Vault instance containing the keyring used to encrypt the random cipher keys is lost, it is impossible to decrypt the content data!
The following table gives an overview of the encryption settings for the vault-aes provider:
Parameter | Description | Default value |
---|---|---|
keyring | name of the key ring contained in Vault’s transit secrets engine used to encrypt the cipher keys | |
transitEnginePath | (optional) path of the transit engine. If null, the default path is used. | |
rngAlgorithm | the algorithm used to generate secure random data | platform specific; see the docs for SecureRandom.getInstanceStrong(). If not specified, the most secure algorithm available is used |
The following example shows a storage profile configuration using the vault-aes encryption provider:
vaultEncryptedProfile:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: /storage/vault-encrypted
encryptionSettings:
enabled: true
providerName: vault-aes
providerSettings:
keyring: arveo
Implementation details of AES encryption
This chapter describes the implementation details of the AES encryption used by arveo.
Header
The encryption library is designed to encrypt data in such a way that it can be stored permanently in encrypted form and possibly only decrypted after a long time. In order to guarantee decryption, all data required for this (except the key, of course) are stored in a header together with the encrypted data. Using the data from the header, the library can thus obtain, for example, the algorithm used and the data for key derivation, and only needs the password or the derived key for decryption.
AES
The library uses AES according to the recommendation of the Federal Office for Information Security of March 2020:
-
Operating mode: Galois/Counter-Mode
-
Hash function for key derivation: Argon2
The library allows the configuration of different parameters, but offers default values according to the recommendation of the BSI:
-
Key length: 256 bit
-
Length of GCM checksums: 128 bit
-
Length of the initialisation vector: 96 bit
-
Length of the salt for the key derivation: 32 bit
-
Parallelism for Argon2: 1
-
Memory cost for Argon2: 4096 KB
-
Iterations for Argon2: 3
The initialisation vector is randomly generated each time the encryption methods are called by using SecureRandom. The salt for the key derivation is generated in the same way each time the password derivation method is called. The fact that the initialisation vector is always regenerated ensures that the same combination of initialisation vector and key can never be used more than once. For both the AES algorithm and the Argon2 hash function, the implementations of the BouncyCastle library are used. For performance and compatibility reasons, the BouncyCastle implementations are used directly and not via the JCA:
GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
Argon2BytesGenerator generator = new Argon2BytesGenerator();
Since the default implementation of the CipherInputStream from javax.crypto is not suitable for block ciphers with data authentication, the implementations for CipherInputStream and CipherOutputStream from the BouncyCastle library are used. To generate the random data for the initialisation vector and the salt, a SecureRandom instance created with SecureRandom.getInstanceStrong() is used by default. However, the library allows you to specify a different RNG algorithm (see Note on Linux below).
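The recommended parameter set above (256-bit key, 96-bit initialisation vector, 128-bit GCM tag) can also be exercised with the JDK's built-in AES/GCM implementation. The following sketch is independent of the BouncyCastle-based classes arveo actually uses and only illustrates the parameter choices:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// 256-bit AES key, as recommended by the BSI
KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
keyGenerator.init(256);
SecretKey key = keyGenerator.generateKey();

// 96-bit initialisation vector, freshly generated for every encryption.
// new SecureRandom() is used here instead of getInstanceStrong() to avoid
// blocking on /dev/random (see the note on Linux below).
byte[] iv = new byte[12];
new SecureRandom().nextBytes(iv);

// Galois/Counter mode with a 128-bit authentication tag
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
byte[] ciphertext = cipher.doFinal("hello arveo".getBytes(StandardCharsets.UTF_8));

cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
byte[] plaintext = cipher.doFinal(ciphertext);
```

Note that the GCM ciphertext is the encrypted data followed by the 16-byte authentication tag, which is why it is 16 bytes longer than the plaintext.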
Header Format
The header begins with a string to identify data encrypted with the library followed by the length of the payload data in the header. The header is divided into blocks and can be read serially.
++>~ENC~<++|97|AES_GCM_ARGON2|1|256|128|10|4096|1|aWFtYW5pbml0aWFsaXphdGlvbnZlY3Rvcg==|aWFtYXNhbHQ=|bXlLZXlJZA==

Marker|length|method|header version|key length|checksum length|iteration|storage cost|parallelism|initialisation vector|salt|key ID
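Because the header is pipe-separated and its binary fields are Base64-encoded, reading it back is straightforward. A minimal sketch that decodes the example header above (the field indices follow the legend; this is not arveo's actual parser):

```java
import java.util.Base64;

String header = "++>~ENC~<++|97|AES_GCM_ARGON2|1|256|128|10|4096|1"
        + "|aWFtYW5pbml0aWFsaXphdGlvbnZlY3Rvcg==|aWFtYXNhbHQ=|bXlLZXlJZA==";

String[] fields = header.split("\\|");                 // 12 fields, see the legend
String method = fields[2];                             // AES_GCM_ARGON2
byte[] initialisationVector = Base64.getDecoder().decode(fields[9]);
byte[] salt = Base64.getDecoder().decode(fields[10]);  // salt for key derivation
String keyId = new String(Base64.getDecoder().decode(fields[11]));
```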
Key
The keys used for encryption are either generated using random data or derived from any password using the Argon2 hash function. Since deriving keys can be very computationally intensive depending on the configuration, a key ID can be stored in the header. This makes it possible to store a key once it has been derived and to reuse it for decryption, which avoids having to derive the key from the password again. The library is not responsible for the secure storage of the key. Generating keys using random data is a much faster operation compared to key derivation. The disadvantage is that it is not possible to derive the key from a master password in case it was lost. When generated keys are used, it is crucial to store those keys in a secure location. In this case, the header will not contain a salt but only the ID of the stored key. When an external system like Vault is used to encrypt generated keys, the encrypted generated key is stored in the header instead.
Usage
Instantiation of the AesEncryptorAndDecryptor with default values:
AesEncryptorAndDecryptor encryptorAndDecryptor = new AesEncryptorAndDecryptor.Builder().build();
Instantiation with custom parameters:
AesEncryptorAndDecryptor encryptorAndDecryptor = new AesEncryptorAndDecryptor.Builder()
.with128BitKeys()
.withInitializationVectorLength(128)
.withTagLength(128)
.withIterations(5)
.withMemoryCost(1024)
.withParallelism(3)
.withSaltLength(64)
.withRngAlgorithm("SHA1PRNG")
.build();
Examples of usage can be found in the test class de.eitco.commons.crypto.AesEncryptionTest.
Note on Linux
On Linux, Java uses the NativePRNG algorithm by default for generating random data with SecureRandom.getInstanceStrong(). This implementation uses /dev/random and may block if there is not enough data available there. This can lead to very long waiting times for key derivation and encryption. You can then either use a weaker RNG algorithm or make sure that /dev/random always contains enough data. This can be achieved with the haveged daemon, for example:
apt-get install haveged
update-rc.d haveged defaults
service haveged start
Configure Active MQ
arveo uses Apache ActiveMQ to queue asynchronous tasks. Access to the message broker is configured in the YAML file of the arveo service using the default configuration properties of the Spring ActiveMQ integration:
spring:
activemq:
broker-url: "tcp://127.0.0.1:61616"
user: "system"
password: "manager"
ActiveMQ’s OpenWire protocol is used to connect to the broker. The queues and topics used by arveo can be identified by the arveo- name prefix. arveo uses text messages containing JSON data to make it possible to consume messages in components not implemented in Java. The JSON data uses the same serialization mechanism as the REST API.
arveo uses ActiveMQ’s scheduler support for features like automated deletion of entities in the recycle bin after a configurable time. Therefore it is required to enable the scheduler in ActiveMQ by setting schedulerSupport="true" in the broker tag in activemq.xml.
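For example, the attribute is added to the existing broker element in activemq.xml (fragment; the brokerName attribute is just an example, the rest of the broker configuration is omitted):

```xml
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="localhost"
        schedulerSupport="true">
    <!-- existing broker configuration -->
</broker>
```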
Some features like the automatic creation of renditions or the removal of stored data for data protection compliance require dead letter queues in ActiveMQ. See renditions for details. The queue-specific dead letter queues must be activated by adding the following policy entries to activemq.xml:
<policyEntry queue="ecr-queue-create-renditions">
<deadLetterStrategy>
<individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
</deadLetterStrategy>
</policyEntry>
<policyEntry queue="ecr-queue-delete-audit-entries">
<deadLetterStrategy>
<individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
</deadLetterStrategy>
</policyEntry>
Configure arveo user management as Authentication Service
The User Management Service can also be used as an OAuth2.0 Authorization Server. The service can issue JSON web tokens that can be used to log in to services that are also secured with OAuth2.
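A JSON web token consists of three Base64URL-encoded sections separated by dots, with the claims as plain JSON in the middle section. A minimal sketch of that anatomy (the claim values are hypothetical examples and the signature is omitted; real tokens are built and signed by the Authorization Server):

```java
import java.util.Base64;

Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

// hypothetical header and claims, only to illustrate the token structure
String headerPart  = encoder.encodeToString("{\"alg\":\"RS256\",\"typ\":\"JWT\"}".getBytes());
String payloadPart = encoder.encodeToString(
        "{\"tenant\":\"master\",\"authorities\":[\"USER_MANAGEMENT_SERVICE_USER\"]}".getBytes());
String token = headerPart + "." + payloadPart + ".signature-omitted";

// a resource server splits the token on '.' and decodes the middle section
String claims = new String(Base64.getUrlDecoder().decode(token.split("\\.")[1]));
```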
Configuration of the Authorization Server
To enable the Authorization Server, the user-service.authorization-server.enabled setting must be enabled and a keystore must be configured. The keystore must contain an RSA keypair under the specified alias:
user-service:
authorization-server:
enabled: true
keystore:
file: "path/to/keystore/keystore.jks"
password: test
alias: test
OAuth clients
To obtain a token, a client application must authenticate against a specific client configured in the Authorization Server. Clients can be created both by API and by configuration. At least one client must be configured to be able to log in via OAuth. Clients are always stored in the master tenant. In the configuration, clients can be specified as follows:
user-service:
config-data:
tenants:
- tenant-id: master
oauth2-clients:
- clientId: test-client
resourceIds:
- user-management-service
clientSecret: my-secret
authorizedGrantTypes:
- password
- client_credentials
- refresh_token
authorities:
- USER_MANAGEMENT_SERVICE_USER
accessTokenValiditySeconds: 300
refreshTokenValiditySeconds: 600
In the above example, a client with the ID "test-client" is configured to have access to the arveo User Management Service (resourceIds and authorities) and to offer the authorization grants password, client_credentials and refresh_token. The grants are the same as those defined in the OAuth2.0 standard.
By default, the client’s configured authorities are included in the issued tokens. In addition, the user’s authorities (= privileges) configured in the user service are entered in the tokens. To prevent the client’s authorities from being included in the tokens, the user-service.authorization-server.inherit-authorities setting can be set to false.
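For example, to keep the client's authorities out of the issued tokens:

```yaml
user-service:
  authorization-server:
    inherit-authorities: false
```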
The clients are always stored in the master tenant. For systems with multiple tenants, care must be taken to specify the master tenant in the configuration.
Refreshing tokens
When a new token is issued, a refresh token is also generated (except for the client_credentials grant). This refresh token can be used to renew an expiring token without requiring the user to log in again. By default, when a token refresh request is made, the user also receives a new refresh token whose validity is still that of the first refresh token. This ensures that the service cannot issue new access tokens to a user indefinitely. If this behavior is not desired and the refresh tokens should each have an extended validity, the user-service.authorization-server.reuse-refresh-tokens parameter can be set to false.
Client login
To get a token, the client application must send the respective client ID and the client secret as an HTTP Basic Auth header in the token request. The remaining parameters are sent as form data via POST to the endpoint https://user-management-service/oauth/token.
Configure authentication/SSO with Keycloak
The arveo content services support OAuth2.0 with OpenID Connect to authenticate users and services. You can install Keycloak as your identity management system and use it as the OAuth2.0 authentication service instead of the arveo User Management Service. This also allows you to enable single sign-on for your web clients.
The content services take either the role of a "resource server", or the roles of both a "resource server" and a "client" if they use resources of other services.
In principle, any authentication server that supports OAuth2.0 and OpenID Connect can be used. Currently, Keycloak and Active Directory are approved for use with arveo.
Install Keycloak
-
Download and install Keycloak: https://www.keycloak.org/downloads.html
-
Start the server. With standalone.bat -Djboss.socket.binding.port-offset=100 the used ports can be adjusted.
-
Call the configuration interface (e.g. localhost:8180) and define an administrator user for the first login.
-
Refer to Keycloak documentation to configure Keycloak on your system.
Create Keycloak realm
-
Create your own realm (e.g. "arveo") with the Keycloak configuration interface.
-
Copy the public RSA key from the realm keys tab. We will use it later for the configuration of the content services.
Create Keycloak clients
-
Next, the Keycloak clients for the arveo services are set up. Clients may access resources; resources validate access to themselves. There is also a mixed form (confidential) that accesses resources but can also be a resource itself.
All clients have Client Protocol=openid-connect set.
Client Authenticator must be Client Id and Secret to allow a secure OAuth2.0 flow.
-
Create a client arveo-service for the secure communication between the arveo services,
This client behaves like a technical user for service/service calls
access-type=confidential -
Create client for your applications e.g. arveo-webclient which is public and accesses all arveo services
This client is used for the users of your application that have logged in with credentials.
Client-Protocol=public
Valid Redirect URIs=<URI of your web client>
Client Protocol=openid-connect
-
Implicit flow is no longer recommended. The standard flow should be used. Furthermore, the extension PKCE (Proof Key for Code Exchange) should be used (Authorization Code Flow with PKCE).
Configure a client
-
To allow the client to access the arveo services, add the role arveo-service-user to the client.
-
Add token mappers to allow arveo to get information from the token
-
Tenant, the tenant in arveo. This is used to assign the user to a tenant.
Name=Tenant
Mapper-Type=User Attribute
User Attribute=tenant
Token Claim Name=tenant
Claim JSON Type=String
Multi Valued = Off
Add To Id Token=On
Add To Access Token=On
Add To User Info=On -
Audience for the repository service. For arveo to accept access tokens issued to the web client at all, the client must also be contained in the token as an audience.
Name=Audience for arveo services
Mapper-Type=Audience
Included Client Audience=arveo-webclient
Add To ID Token=Off
Add To Access Token=On -
GUUID, important for authentication via LDAP. In the access token, the user_name attribute is set to the GUUID from the LDAP. This is largely stable, in contrast to the Keycloak internal user ID.
Name=GUUID
Mapper-Type=User Attribute
User Attribute=LDAP_ID
Token Claim Name=user_name
Claim JSON Type=String
Multi Valued = Off
Add To Id Token=On
Add To Access Token=On
Add To User Info=On -
Client ID, the Keycloak ClientID.
Name=Client ID
Mapper-Type=User Session Note
User Attribute=clientid
Token Claim Name=clientid
Claim JSON Type=String
Add To Id Token=On
Add To Access Token=On -
Client roles, required for the arveo services. The client needs the authority arveo-user-role to access the service. All roles from the client with the ClientID arveo-client are added to the claim authorities.
Name=client roles
Mapper-Type=User Client Role
User Attribute=LDAP_ID
Multi Valued = On
Token Claim Name=authorities
Claim JSON Type=String
Add To Id Token=Off
Add To Access Token=On
Add To UserInfo = Off -
Service user, required to identify the user of the access token as a service user.
Name=Service user
Mapper-Type=Script Mapper
Script=exports=user.getUserName.startsWith("service-account");
Multi Valued = Off
Token Claim Name=technical-user
Claim JSON Type=boolean
Add To Id Token=On
Add To Access Token=On
Add To UserInfo = On
-
Configure Keycloak for SSO with Kerberos
-
Configure Keycloak user federation for SSO with Active Directory using Kerberos
-
Two additional LDAP mappers have to be added:
-
Adding the tenant to the user attributes, since the tenant does not come from the AD.
Name: add-arveo-tenant
Mapper Type: hardcoded-attribute-mapper
User Model Attribute Name: tenant
Attribute Value: master -
The role so that the user is authorized to access the arveo services.
Name: add-arveo-user
Mapper Type: hardcoded-ldap-role-mapper
Role: arveo-service-user
-
As soon as a user logs on to the arveo web client application for the first time, the user is imported from the LDAP into Keycloak. Users who are not in the LDAP can be created locally in Keycloak.
Configure Keycloak:
-
Install package freeipa-client (Ubuntu)
-
Setup /etc/krb5.conf
[libdefaults]
    default_realm = <your realm>

    # The following krb5.conf variables are only for MIT Kerberos.
    kdc_timesync = 1
    ccache_type = 4
    forwardable = true
    proxiable = true

    # The following encryption type specification will be used by MIT Kerberos
    # if uncommented. In general, the defaults in the MIT Kerberos code are
    # correct and overriding these specifications only serves to disable new
    # encryption types as they are added, creating interoperability problems.
    #
    # The only time when you might need to uncomment these lines and change
    # the enctypes is if you have local software that will break on ticket
    # caches containing ticket encryption types it doesn't know about (such as
    # old versions of Sun Java).
    # default_tgs_enctypes = des3-hmac-sha1
    # default_tkt_enctypes = des3-hmac-sha1
    # permitted_enctypes = des3-hmac-sha1

    # The following libdefaults parameters are only for Heimdal Kerberos.
    fcc-mit-ticketflags = true

[realms]
    YOURDOMAIN.COM = {
        kdc = yourdomaincontroller:port
    }

[domain_realm]
    yourdomain.com = YOURDOMAIN.COM
    .yourdomain.com = YOURDOMAIN.COM
-
chown the file to arveo:arveo and chmod 600
-
Import the CA certificate to your Java truststore
e.g. %javahome%/keytool -import -alias YourDomain.com -keystore truststore.jks -file ~/ca.pem -
Activate Kerberos Single Sign On:
To allow SSO, set the requirement for all flows to ALTERNATIVE
-
Add a non-LDAP test user in manage users
Details:
Name=TestUser
User Enabled=On
Attributes:
LDAP_ID=<new UUID>
Tenant=master
Role Mappings=<add arveo-service-user>
Configure authentication between Content Services
All arveo content services use Spring Security for user authentication and authorization. Spring Security supports several standardized protocols as well as custom implementations. The basic configuration is independent of the protocol used.
When configuring the service, it is important to consider the role that the service plays in the overall system. Some services are only used by different clients and do not communicate with other services. These services only take the role of a "resource server". Other services, such as the repository service, communicate with other services themselves and assume the role of a "resource server" and a "client" at the same time.
Resource and client configuration
The following configuration can be used to make a service an OAuth2.0 resource and/or an OAuth2.0 client in the service’s application.yaml:
Service | Client | Resource |
---|---|---|
Document Service | yes | yes |
User Management Service | no | yes |
Access Control Service | yes | no |
Audit Service | yes | no |
SAP Archive Link Service (optional) | yes | no |
Document Conversion Service (optional) | no | yes |
Enterprise User Management Service (optional) | no | yes |
Enterprise Integration Service (optional) | yes | no |
Federation Service (optional) | no | yes |
Configure resource
Configure the respective application.yaml of the service like this:
security:
general:
secured-ant-matchers: "/api/**"
open-ant-matchers: "/actuator/health,/actuator/info"
role-for-secured-access: "<service - name>"
cors-configuration:
allowed-origins: "*"
allowed-headers: "*"
allowed-methods: "GET,POST,PUT,PATCH,DELETE,OPTIONS"
max-age: 3600
spring:
security:
oauth2:
resourceserver:
jwt:
# public key for user management service
# public-key-location: "http://localhost:39002/oauth/public_key"
# public key location for keycloak
jwk-set-uri: "http://localhost:8080/auth/realms/ecr/protocol/openid-connect/certs"
(1) Generally, these parameters shouldn’t be changed.
(2) CORS defines a way in which a browser and server can interact to determine whether it is safe to allow the cross-origin request.
Configure the client
Configure the respective application.yaml of the service like this:
spring:
security:
oauth2:
client:
registration:
cmn-user-service-client-credentials:
provider: user-service
client-id: "arveo-service"
client-secret: "my-secret"
authorization-grant-type: "client_credentials"
scope: "arveo"
provider:
user-service:
authorization-uri: "http://localhost:39002/oauth/auth"
token-uri: "http://localhost:39002/oauth/token"
keycloak:
authorization-uri: "http://localhost:8080/auth/realms/arveo/protocol/openid-connect/auth"
token-uri: "http://localhost:8080/auth/realms/arveo/protocol/openid-connect/token"
Parameter | Description |
---|---|
oauth2.resourceserver.jwt.public-key-location | Validation key of the authentication service used to validate the token, e.g. a PEM or RSA public key. For Keycloak see Realm Settings → Keys; for the User Management Service see the documentation on user-management/securing rest endpoints |
security.general.role-for-secured-access | unique identifier of the service; see the names of the services in the table above |
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.client-id | Client ID configured in your authentication service. In our Keycloak example: arveo-service; for the User Management Service see the documentation on user management/client context/client ID |
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.client-secret | The client secret of the authentication service. For Keycloak see Client Secret; for the User Management Service see the documentation on user management/client context/client secret |
spring.security.oauth2.client.provider.user-service.authorization-uri | endpoint for user authorization |
spring.security.oauth2.client.provider.user-service.token-uri | endpoint to get an access token |
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.scope | The scope is always arveo |
-
The public-key-location defines a path to a resource containing the public key of the service that issued the signed tokens. If the issuing service supports JSON Web Keys, the URL to the JWK endpoint can be set using jwk-set-uri.
-
To enable the user impersonation feature, add the following to the application.yaml configuration:
commons:
security:
oauth2:
impersonation-enabled: true
-
It is possible to disable the auto configuration of the server components by setting spring.security.oauth2.resourceserver.enabled to false.
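As a YAML fragment, this corresponds to:

```yaml
spring:
  security:
    oauth2:
      resourceserver:
        enabled: false
```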
Troubleshooting
Some common error messages and ways to fix them:
-
Principal cannot be null from OAuth2AuthorizeRequest: There probably was no Authentication in the application’s SecurityContext. Check if the application sets an Authentication.
-
Startup fails because no bean of type ClientRegistrationRepository was found: Check the configuration. This usually happens when the values in spring.security.oauth2.client are either missing or invalid. Check indentation!
OAuth2.0 Authentication
All arveo services require authentication, ensuring that only another arveo service or an authenticated user can use the REST API. Authentication of a user is done either by an authentication service like Keycloak or by the arveo User Management Service. The user context is passed on to invoked services. Single sign-on is supported with OAuth2.0 and OpenID Connect.
The arveo User Management Service can also be used as an OAuth2.0 Authorization Server. The service can issue JSON web tokens that can be used to log in to services that are also secured with OAuth2.
This chapter describes
-
how arveo's content services act as an OAuth2.0 resource server for applications using the arveo REST API
-
how the arveo services use OAuth2.0 to authenticate to other services as a technical user.
All content services use Spring Security for user authentication and authorization. The services support OAuth2.0 with OpenID Connect.
Spring Security enables both the OAuth2.0 support for the service’s web resources and the OAuth2.0 client support. The content services retrieve a new OAuth2.0 token from the configured OAuth2.0 authorization service when authentication is required. This OAuth2.0 authorization service can be the arveo User Management Service, Keycloak or Active Directory.
OAuth2.0 Flows (Grant types)
OAuth2.0 defines four flows to get an access token. These flows are called grant types. arveo supports the following flows for user authentication and service authentication.
-
Client Credentials Flow: used for machine-to-machine content services communication.
-
Authorization Code Flow with the Proof Key for Code Exchange (PKCE) technique: used by arveo web applications and also used by mobile apps.
-
Resource Owner Password Flow: can be used by highly-trusted web apps.
In the following paragraphs we describe the flows. The authentication service can either be the arveo User Management Service or Keycloak. If you use Keycloak, you must turn off the arveo-internal authentication service.
Client Credentials Flow
Our machine-to-machine content services authenticate and authorize the app, not a user. For this scenario, typical authentication schemes like username + password or social logins don’t make sense. Instead, the services use the Client Credentials Flow, in which they pass along their Client ID and Client Secret to authenticate themselves and get a token.
-
arveo authenticates with Authorization Service using its Client ID and Client Secret (/oauth/token endpoint).
-
The Authorization Service validates the Client ID and Client Secret.
-
The Authorization Service responds with an Access Token.
-
arveo can use the Access Token to call an API on behalf of itself.
-
The Service API responds with requested data.
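The token request in step 1 can be sketched with the JDK's HttpClient classes. The host, client ID and secret below are placeholders taken from the configuration examples in this chapter, and only the request construction is shown (nothing is sent):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

String clientId = "arveo-service";   // placeholder: a client configured in the authorization service
String clientSecret = "my-secret";   // placeholder

// client ID and secret are sent as an HTTP Basic Auth header
String basicAuth = Base64.getEncoder()
        .encodeToString((clientId + ":" + clientSecret).getBytes());

// grant type and scope are sent as form data
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://user-management-service/oauth/token"))
        .header("Authorization", "Basic " + basicAuth)
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString("grant_type=client_credentials&scope=arveo"))
        .build();
```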
Authorization Code Flow (PKCE)
When the arveo single-page web applications request Access Tokens, additional security concerns are posed that are not mitigated by the Authorization Code Flow alone. This is because:
-
Native apps cannot securely store a Client Secret. Decompiling the app will reveal the Client Secret, which is bound to the app and is the same for all users and devices.
-
Single-page apps cannot securely store a Client Secret because their entire source is available to the browser.
Given these situations, OAuth 2.0 provides a version of the Authorization Code Flow which makes use of a Proof Key for Code Exchange (PKCE).
The PKCE-enhanced Authorization Code Flow introduces a secret created by the calling application that can be verified by the authorization server; this secret is called the Code Verifier. Additionally, the calling app creates a transform value of the Code Verifier called the Code Challenge and sends this value over HTTPS to retrieve an Authorization Code. This way, a malicious attacker can only intercept the Authorization Code, and they cannot exchange it for a token without the Code Verifier.
-
The user clicks Login within the application.
-
The OAuth2 JavaScript SDK creates a cryptographically random code_verifier and from this generates a code_challenge.
-
The OAuth2 JavaScript SDK redirects the user to the authorization service (/authorize endpoint) along with the code_challenge.
-
The authorization service redirects the user to the login and authorization prompt.
-
The user authenticates using one of the configured login options and may see a consent page listing the permissions the authorization service will grant to the application.
-
The authorization service stores the code_challenge and redirects the user back to the application with an authorization code, which is good for one use.
-
The OAuth2 JavaScript SDK sends this code and the code_verifier (created in step 2) to the authorization service (/oauth/token endpoint).
-
The authorization service verifies the code_challenge and code_verifier.
-
The authorization server responds with an ID Token and Access Token (and optionally, a Refresh Token).
-
The arveo web application can use the Access Token to call an API to access information about the user.
-
The API responds with requested data.
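Step 2 of the flow, the creation of the code_verifier and the code_challenge, can be illustrated in plain Java. The transformation shown is the standard S256 method from RFC 7636 (the base64url-encoded SHA-256 hash of the verifier); the class and method names are only for illustration and are not part of the arveo SDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class Pkce {

    // Creates a cryptographically random code_verifier (43 base64url characters).
    static String createCodeVerifier() {
        byte[] bytes = new byte[32];
        new SecureRandom().nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Derives the code_challenge using the S256 method:
    // BASE64URL(SHA-256(ASCII(code_verifier)))
    static String createCodeChallenge(String codeVerifier) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(codeVerifier.getBytes(StandardCharsets.US_ASCII));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String verifier = createCodeVerifier();
        System.out.println("code_verifier:  " + verifier);
        System.out.println("code_challenge: " + createCodeChallenge(verifier));
    }
}
```

The code_challenge is sent with the /authorize request, while the code_verifier stays in the application and is only revealed when exchanging the authorization code for a token.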
Maintenance mode for the database schema
This chapter documents the arveo parameters used to start the service in maintenance mode, alter the database schema, and stop the service.
arveo can be started in a special mode that ensures that only this instance changes the schema and prevents other instances from being started while the change is in progress. If the database schema change fails, the instance terminates in a way that can easily be evaluated by the administrator, who can then react to the error.
The service does not start if the registry query returns other running instances. The service terminates after the Liquibase script has been executed. The following two parameters are set:
system:
terminateAfterCreation: true
updateSchema: true
The maintenance mode can therefore be used to update the database schema. When maintenance mode is enabled, arveo starts, performs the necessary schema updates, and terminates once the schema has been updated. Requests from clients are not processed while the system is in maintenance mode; clients will receive an HTTP 503 response code. Schema updates must be performed by one single arveo instance to avoid race conditions. The recommended procedure for a schema update is as follows:
-
Shut down all arveo instances
-
If required: Update to a newer arveo version
-
Enable maintenance mode by setting system.maintenanceMode: true in the configuration
-
Start one single arveo instance and wait for it to shut down after the schema was updated
-
Disable maintenance mode in the configuration
-
Start all arveo instances.
The database schema of an existing system can be changed by adapting the type definition classes and restarting the repository service with the setting arveo.server.system.maintenance-mode=true. The service will update the database schema and shut down once the update is finished. It will not accept requests while the schema is updated.
Supported schema changes
The following list contains the supported schema changes. Note that some changes, like removing an attribute or adding constraints, might not be possible when existing data or existing constraints would be violated by the change.
-
Adding a new attribute.
-
Removing an existing attribute. Note that the column will be dropped from the schema.
-
Adding and removing indexes as well as changing index properties.
-
Changing the primary key (only for META types).
-
Adding and removing of foreign keys.
-
Adding new content elements (only for DOCUMENT types).
-
Adding and removing unique constraints.
-
Adding and removing not-null constraints.
It is also possible to enable certain features on existing type definitions. Disabling the features is not supported.
-
Enabling ACL support.
-
Enabling document filing.
-
Enabling optimistic locking.
-
Enabling the recycle bin.
-
Enabling retention support.
Checking for schema changes
By setting the properties arveo.server.system.maintenanceMode and arveo.server.system.logSchemaChanges to true, the system will start up, check for required schema changes, write them to a special log file, and shut down again. The database schema will not be changed. This makes it possible to check for unsupported changes to the schema before performing the actual schema update.
The directory used to store the schema update log can be specified using the property arveo.server.system.schemaChangeLogDirectory. The default value is logs. The system will create one logfile for each tenant. The contents of the file will look like the following example:
Supported changes for attributes of type definition my_document:
- document_name: IS_UNIQUE
- container_id: FOREIGN_KEY, IS_UNIQUE
Unsupported changes for attributes of type definition my_document:
- document_name: none
- container_id: none
In this example, there are three supported changes for the type definition named my_document. A unique constraint will be added to or removed from the attributes container_id and document_name and a foreign key will be added to or removed from the attribute container_id. There are no unsupported changes, so the actual schema update should succeed.
Please note that some advanced schema checks can only be done correctly when the types are actually stored in the database. For example, the checks for the correctness of parent and child types of a relation type are not possible when the schema update itself is skipped.
Configure audit
A @Type may be declared to be audited. This means that any write access, i.e. any create, update, or delete operation on any entity of this type, will be logged into another table. This is done with the annotation @Audit:
@Type(ObjectType.CONTAINER)
@Audit(AuditLocation.TYPE_SPECIFIC) (1)
public interface AuditedContainer {
@Optional
String getName();
void setName(String name);
@Optional
Integer getInteger();
void setInteger(Integer integer);
}
1 | The annotation @Audit activates auditing on a type |
The name of the table to be audited to is derived from the table name of the given type, following the form <table-name>_log. You can choose to specify one audit table per entity table, or alternatively to audit to one global table:
@Type(ObjectType.DOCUMENT)
@Audit(
value = AuditLocation.GLOBAL, (1)
indexOn = {AuditJsonField.CURRENT} (2)
)
public interface AuditedDocument {
@Optional
String getName();
void setName(String name);
@Optional
Integer getInteger();
void setInteger(Integer integer);
}
1 | Note a different AuditLocation |
2 | With indexOn it is possible to specify on which JSON fields of the audit table indices should be set |
In this case, the table to be audited to will be the audit service’s default audit table, default_audit_log.
Access audit
To access the audit, the audit service provides a REST API. Access is restricted depending on the type: If ACLs are activated on a @Type, only users that have read access to an entity are allowed to audit this entity. If ACLs are deactivated on a @Type, only users with the authority AUDITOR are allowed to audit the entities of this type.
Solr
In order to use Solr in connection with arveo, an installation of a Solr service is required. The current versions of Solr can be downloaded from the following link: https://solr.apache.org/downloads.html.
The Solr service must be configured in arveo within the application.yaml. See Configuration properties for details.
When a type definition is annotated with @NOSql, the entities stored in this type definition will be stored in Solr, too. See NOSQL Example for how to enable this feature. The system will create a special queue table in the relational database for such a type definition. The queue table will contain the entities that have to be stored in Solr. A system job is used to process the entries in the queue table.
Solr deployment
A small tutorial for setting up Solr for arveo.
arveo yaml configuration:
In the following example you can see a minimal arveo yaml configuration for Solr:
ecr:
server:
solr:
defaultConfigName: "ecr-config"
host: "http://localhost:38983/solr"
username: "ecr-solr-user"
password: "password"
Collections will be created automatically by arveo. Every tenant gets its own collection. |
More information can be found in the chapter ecr.server.solr.
Solr security
By default, Solr has no security settings set. It is very important to add the security settings for Solr! |
For more information on setting up the security settings for Solr, see the chapter solr security configuration.
If you use the arveo ACL functionality, you can set up the solr-acl-plugin. More information about the solr-acl-plugin can be found here.
Solr Configurations
If you want to use Solr with arveo, you must upload an ecr-config to the Solr ZooKeeper. The ecr-config can be found as a zip file at the following URL: nexus.eitco.de.
The zip file can be uploaded to ZooKeeper with the following command on the command line interface.
Linux:
[path to zookeeper in solr]/zkcli.sh -cmd upconfig -confdir [path to solr-config from nexus.eitco.de]/solr-config -confname ecr-config -z [zookeeper host]:[zookeeper port]
Windows:
[path to zookeeper in solr]\zkcli.bat -cmd upconfig -confdir [path to solr-config from nexus.eitco.de]\solr-config -confname ecr-config -z [zookeeper host]:[zookeeper port]
Type definitions
You can annotate a type definition class with @NOSql. If you do that, every attribute in the type definition will be created as a field in the Solr managed-schema.xml. If you want to exclude an attribute of a type definition from Solr, you can do the following:
If you have a type definition which uses the annotation @NOSql, you must set up Solr. Otherwise, arveo will not start! |
@NOSql
public interface PersonSimple {
String getFirstName();
void setFirstName(String value);
@NOSql(value = false)
String getLastName();
void setLastName(String value);
}
In this example, only the firstName attribute will be automatically created in the managed-schema.xml of Solr.
Fulltext Extraction
arveo automatically extracts the content of the type definitions which are annotated with @Type(ObjectType.DOCUMENT) and @NOSql. You must set up the document conversion service for this functionality. Below, we describe the process of the automatic extraction for Solr:
-
arveo saves the documents which are annotated with @NOSql in a queue table named type_definition_name + _nq;
-
A job then fetches all entries in the queue table;
-
For each entry whose type definition is annotated with @Type(ObjectType.DOCUMENT), arveo calls the document conversion service to extract the fulltext from the content;
-
At the end, all information is saved to Solr.
More general information about the document service fulltext extraction can be found here and here.
More general information about extracting text with arveo and Solr can be found in the chapter Fulltext extraction.
Solr security configuration
By default, Solr does not use any kind of authorization or authentication. In productive systems, Solr must be secured by enabling transport encryption, authentication and authorization.
The transport encryption can be enabled by enabling HTTPS in Solr. See the Solr documentation for instructions. When SSL is enabled, the URL in the configuration property ecr.server.solr.host must use the https scheme.
arveo can use basic authentication to authenticate requests sent to Solr. Solr provides a basic authentication plugin that must be enabled as described in the documentation. Enabling authentication and authorization in Solr requires uploading a security.json file to Zookeeper. The following example shows a security.json file that enables the basic auth plugin and a rule based authorization plugin.
There is a GitHub project containing a tool to generate the values for the salt and password used in the credentials property of the basic auth plugin.
{
"authentication": {
"blockUnknown": true,
"class": "solr.BasicAuthPlugin",
"credentials": {
"ecr-solr-user": "qkxp6hmEeGTaqnEvSmH7f+qytLWd/JcwaUyqpdjt5rg= NERXZefDt7lXYvdZfB0hT3ZCgNFSqI4nJ7kGgbhaTWs="
},
"realm": "My Solr users",
"forwardCredentials": false
},
"authorization": {
"class": "solr.RuleBasedAuthorizationPlugin",
"permissions": [
{
"name": "schema-edit",
"role": "admin"
},
{
"name": "update",
"role": "admin"
},
{
"name": "read",
"role": [
"user",
"admin"
]
}
],
"user-role": {
"ecr-solr-user": [
"user",
"admin"
]
}
}
}
To enable basic auth support for the Solr client used by arveo, you have to set the following parameters in the configuration for arveo:
ecr:
server:
solr:
username: "ecr-solr-user"
password: "password"
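With these settings, every request the Solr client sends carries an HTTP Basic Authorization header. As a sketch of what that header looks like, the following helper derives the header value from the configured username and password. The class and method names are illustrative, not part of arveo or Solr.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SolrBasicAuth {

    // Builds the value of the HTTP "Authorization" header for basic auth:
    // the word "Basic" followed by base64("username:password").
    static String authorizationHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(authorizationHeader("ecr-solr-user", "password"));
    }
}
```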
ACL filter plugin
When searching for ACL-protected entities in Solr, the search result is filtered by ACL rights. This is achieved by using a custom Solr plugin, which must be installed manually by following the steps below:
-
Download the cmn-user-management-access-control-solr-plugin (version 4.1.0) from the nexus repository.
-
Install the plugin in Solr as described in the Solr documentation.
-
Configure the plugin in solrconfig.xml as shown below:
<queryParser name="aclright" class="de.eitco.commons.user.management.access.control.solr.AclRightParserPlugin">
<str name="solrAclPlugin.jdbcUrl">jdbc:postgresql://localhost:5432/mydatabase?currentSchema=mytenant</str>
<str name="solrAclPlugin.jdbcUser">myuser</str>
<str name="solrAclPlugin.jdbcPassword">mypassword</str>
</queryParser>
The plugin requires a JDBC connection to the relational database to load ACL rights. Do not change the name of the query parser. Solr queries generated by arveo will contain a filter query using the aclright prefix to perform the actual filtering.
The Access Control Solr Plugin does not yet support multiple tenants! |
Alternatively, the configuration parameters for the ACL filter plugin can be set as Java system properties for the Solr server, as environment variables, or they can be stored as secrets in Vault. To use Vault, set the following parameters either in the solrconfig.xml file, as Java system properties, or as environment variables:
-
solrAclPlugin.vaultEnabled=true
-
solrAclPlugin.vaultAddress=https://myvaultserver:port (optional, the default value is "http://127.0.0.1:8200")
-
solrAclPlugin.vaultToken=token (optional, can be a token value or the path to a token file. By default, the plugin will try to load a token from ~/.vault-token)
-
solrAclPlugin.vaultSecretEnginePath=path (optional, the default is "secret")
The order in which the plugin loads configuration properties is:
-
Vault
-
System properties
-
Environment variables
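The lookup order above can be sketched as a simple resolver. This is an illustrative sketch, not the plugin's actual implementation: the vaultLookup function stands in for the real Vault client, and the class and method names are hypothetical.

```java
import java.util.function.Function;

public class PluginConfigResolver {

    // Resolves a configuration property using the precedence described above:
    // Vault first, then Java system properties, then environment variables.
    static String resolve(String key, Function<String, String> vaultLookup) {
        String value = vaultLookup.apply(key);
        if (value == null) {
            value = System.getProperty(key);
        }
        if (value == null) {
            value = System.getenv(key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Vault provides a value here, so system properties and the
        // environment are never consulted for this key.
        System.out.println(resolve("solrAclPlugin.jdbcUser", key -> "myuser"));
    }
}
```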
Storing entities in Solr
To be able to use Solr for advanced queries, the entities must be stored in Solr, too. To enable this, simply add the @NOSql annotation to the getters of the attributes to be stored in Solr in your type definitions. The annotation can also be added to the class to store all attributes in Solr. arveo will automatically create the required fields in the Solr schema. All entities of one tenant will be stored in a single collection in Solr. To avoid name collisions, the names of the fields in Solr will consist of the name of the type definition and the name of the attribute, separated by a dot. For example, an attribute called "name" in a type definition called "invoice" would be named "invoice.name".
Fulltext extraction
It is also possible to store fulltext data of the content of documents in Solr. For this, set the fulltextExtraction attribute of the @ContentElement annotation to true (see Content Elements). The fulltext data of the content elements will be stored in a field called <typeDefinitionName>.<contentElementName>.fulltext in Solr. This field can be used in queries just like any other field.
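The two naming conventions, attribute fields and fulltext fields, can be captured in a small helper. The class below is hypothetical and only restates the conventions described above; in arveo, the auto-generated name constant classes provide comparable helper methods.

```java
public class SolrFieldNames {

    // Attribute fields follow <typeDefinitionName>.<attributeName>,
    // e.g. "invoice.name" for attribute "name" of type definition "invoice".
    static String attributeField(String typeDefinitionName, String attributeName) {
        return typeDefinitionName + "." + attributeName;
    }

    // Fulltext fields follow <typeDefinitionName>.<contentElementName>.fulltext,
    // e.g. "invoice.pdf.fulltext" for content element "pdf".
    static String fulltextField(String typeDefinitionName, String contentElementName) {
        return typeDefinitionName + "." + contentElementName + ".fulltext";
    }

    public static void main(String[] args) {
        System.out.println(attributeField("invoice", "name"));
        System.out.println(fulltextField("invoice", "pdf"));
    }
}
```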
The fulltext extraction is performed by the document conversion service. Which types of content are supported for fulltext extraction depends on the active plugins of the document conversion service. The open source plugins contained in the document conversion service project support fulltext extraction for PDFs with text content and Microsoft Office documents. The mime type of a content element is required for the fulltext extraction. By default, the mime type is contained in the metadata of a content element. When the content element is stored in a separate field on the database, the mime type is not available in the metadata, but it can be defined in the @ContentElement annotation. If the mime type is set to the default value (application/octet-stream), the service will try to auto-detect the mime type.
Searching in Solr
The arveo API provides methods to perform queries in Solr. The methods use the EQL just like the regular search methods that perform queries on the relational database. There are some EQL features that are not supported when searching in Solr:
-
joins and unions
-
subselects
-
exists expressions
-
toLower
-
less than
-
greater than
-
is null
The behavior of the supported query expressions depends on the configuration of the field in Solr. For example, a text field with a tokenizer that splits text by whitespace will deliver different results for equality expressions than a text field without such a tokenizer.
To perform a query in Solr for one specific type definition, use the getNoSqlSearchService method of the service client for the type definition. In the following example, this method is used to find an entity by its ID in Solr:
Optional<TypedNOSqlSearchHit<FieldTypeContainer>> optional = serviceClient.getNoSqlSearchService()
.where().id().equalTo().value(identifier).holds().uniqueResult();
The next example shows how to search in the fulltext data stored in Solr.
list = serviceClient.getNoSqlSearchService().where().contextReference(SimpleInvoiceNames.CONTENT_FULLTEXT)
.contains().value("Gubergren accumsan takimata").holds().unpaged();
To search for values in a specific field, the correct field name must be used in the context reference. The auto-generated name constant classes for the type definition interfaces contain a helper method to compute the name. The following example shows how to use this method to search in a specific field of an entity.
list = serviceClient.getNoSqlSearchService().where()
.contextReference(FieldTypeContainerNames.noSqlName(FieldTypeContainerNames.INTEGER_FIELD))
.equalTo().value(7).holds().unpaged();
When using the getNoSqlSearchService method, the query performed in Solr will automatically be limited to entities belonging to one single type definition. To search for entities in multiple type definitions, the method de.eitco.ecr.sdk.SearchClient.searchServiceForNoSql can be used.
Combined search
arveo creates a special multi-valued field called ecr_attributes in Solr. When a getter (or an entire interface) is annotated with @NOSql(combinedSearch = true), the attribute of the getter (or all attributes of the interface) will be copied into this field. The ecr_attributes field is of type string. The values of the attributes copied into this field will be converted automatically by Solr.
The ecr_attributes field can be used for a combined search of all attribute values. This makes it possible to provide a search method where the user does not need to know the name of the attribute to search for:
Optional<TypedNOSqlSearchHit<SimpleInvoice>> hit =
serviceClient1.getNoSqlSearchService().where().noSqlCombinedField().equalTo()
.value(invoiceNumber).holds().uniqueResult();
Optional<TypedNOSqlSearchHit<SimpleInvoice>> hit =
serviceClient1.getNoSqlSearchService().where().noSqlCombinedField().equalTo()
.value(invoiceNumber).and().noSqlCombinedField().equalTo().value("29.99").holds().uniqueResult();
Optional<TypedNOSqlSearchHit<SimpleInvoice>> hit =
serviceClient1.getNoSqlSearchService().where().noSqlCombinedField().like()
.value("Kasd invidunt stet dolor")
.and().contextReference(EcrQueryLanguage.COMBINED_SEARCH_FIELD).equalTo().value(invoiceNumber)
.holds().uniqueResult();
It is possible to copy the extracted fulltext data of content elements to the combined ecr_attributes field, too. To do so, configure the respective content element as shown below:
@ContentElement(name = "pdf", contentType = "application/pdf", fulltextExtraction = true, textCombinedSearch = true, textCombinedSearchLimit = 200)
The relevant settings are textCombinedSearch=true, which enables the copying of the extracted fulltext data, and textCombinedSearchLimit, which limits the number of characters to copy. This makes it possible to reduce the size of the Solr index. A value of 0 means no limit.
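The semantics of textCombinedSearchLimit can be sketched as follows. This is an illustrative sketch of the rule stated above, not arveo's actual implementation: a positive limit truncates the extracted text before it is copied into ecr_attributes, while a limit of 0 copies it unchanged.

```java
public class CombinedSearchCopy {

    // Applies the textCombinedSearchLimit semantics: a limit greater than 0
    // truncates the fulltext to that many characters, 0 means no limit.
    static String applyLimit(String fulltext, int limit) {
        if (limit <= 0 || fulltext.length() <= limit) {
            return fulltext;
        }
        return fulltext.substring(0, limit);
    }

    public static void main(String[] args) {
        System.out.println(applyLimit("Kasd invidunt stet dolor", 4));
        System.out.println(applyLimit("Kasd invidunt stet dolor", 0));
    }
}
```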
Relations in Solr
Relations between entities can be described using relation type definitions. While those relations can easily be resolved in the relational database, NoSQL databases like Solr are not designed to support a relational data model like the one shown in the following diagram:
+------------+          +--------------+          +------------+
|   Parent   |  source  |   Relation   |  target  |   Child    |
|------------|<---------|--------------|--------->|------------|
|            |          |              |          |            |
+------------+          +--------------+          +------------+
It is still possible to search for Parent entities by the IDs of the related Child entities in Solr using arveo. To make this possible, arveo stores the IDs of the Child entities related to a Parent entity in a multi-value field in Solr. To enable this feature, the type definition interface of the Parent type must be annotated with @NOSqlResolvedRelations. The annotation requires a parameter containing the type definition classes of the relations to store in Solr.
Only relations from the current version of the Parent entity to the current version of the Child entity can be stored in Solr. |
The following example shows how to search for entities by related child IDs:
List<TypedNOSqlSearchHit<SimpleInvoice>> list = documentServiceClient.getNoSqlSearchService().where() (1)
.contextReference(noSqlSearchHelper.getRelationChildIdsField(SimpleInvoice.class, InvoicePersonRelation.class)) (2)
.contains().value(containerClient.getIdentifier()).holds().unpaged(); (3)
1 | Obtain a NoSql search service from the service client for the Parent entity type definition |
2 | Using an (injectable) instance of NoSqlSearchHelper it is possible to get the name of the field containing the child IDs in Solr |
3 | Get the ID of the child entity to search for from the entity client of the relation child entity |
Rebuilding the Solr index
Currently, there is no automated way to rebuild the Solr index. If the data in Solr was lost or corrupted, it can be rebuilt using the NOSql queue tables of the affected type definitions.
When an entity is added or updated in a type definition that is annotated with @NOSql, an entry for this entity is added to a queue table by a trigger on the database. This queue table is named like the main table for the type definition with a _nq suffix. So for example, if the main table is called my_type_definition, the queue table would be called my_type_definition_nq. It contains a column for each attribute that is supposed to be contained in Solr and the following system fields:
Field | Explanation
---|---
nosql_queue_id | The ID of the entry in the queue. This value is assigned automatically by the database.
nosql_processing_counter | A counter that is incremented each time the system has tried to store the entry in Solr. The initial value is supposed to be 0.
nosql_trigger_operation | The name of the operation that caused the entry to be added to the queue table. Could be INSERT, UPDATE or DELETE.
A system job periodically reads the entries contained in the queue table and stores them in Solr. When the entry was successfully stored in Solr, it is deleted from the queue. If the entry could not be stored in Solr, the processing counter is incremented and the job will try again until the maximum number of tries has been reached.
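The behavior of this queue job can be sketched as follows. The types and callbacks are placeholders, not the actual arveo job implementation: storeInSolr, deleteFromQueue, and incrementCounter stand in for the real persistence operations.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class NoSqlQueueJobSketch {

    // A queue table row: its ID and how often storing it in Solr has been tried.
    record QueueEntry(long queueId, int processingCounter) {}

    // Processes one batch: successfully stored entries are deleted from the
    // queue, failed ones get their counter incremented; entries that already
    // reached the maximum number of attempts are skipped.
    static int process(List<QueueEntry> batch, int maxTries,
                       Predicate<QueueEntry> storeInSolr,
                       Consumer<QueueEntry> deleteFromQueue,
                       Consumer<QueueEntry> incrementCounter) {
        int stored = 0;
        for (QueueEntry entry : batch) {
            if (entry.processingCounter() >= maxTries) {
                continue; // maximum number of attempts reached
            }
            if (storeInSolr.test(entry)) {
                deleteFromQueue.accept(entry);
                stored++;
            } else {
                incrementCounter.accept(entry);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        List<QueueEntry> batch = List.of(new QueueEntry(1, 0), new QueueEntry(2, 5));
        int stored = process(batch, 5, e -> true, e -> {}, e -> {});
        System.out.println(stored + " entries stored in Solr");
    }
}
```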
To rebuild the Solr index, the data to be stored in Solr has to be added to the queue table by copying the data contained in the main table of the type definition. By definition, only the most recent version of each entity is contained in Solr, so the data contained in the version table of the type definition is of no interest here. The value of the nosql_processing_counter field must be set to 0. To treat the entity as a new object in Solr, set the value of the nosql_trigger_operation field to INSERT. To perform an update on an already existing object in Solr, set the value to UPDATE.
The code of the trigger function used to populate the queue table might be useful when creating the SQL script that copies the entries. The trigger function is named like the main table of the type definition with a _ntf suffix.
When the type definition makes use of the @NOSqlResolvedRelations annotation, an additional field containing the resolved IDs of the child entities of the relations on each entity will be contained in the queue table. This field will be called relation_<type-id>_child_ids, where type-id is the numeric ID of the relation type definition. This ID can be found in the ecr_types table. To store the resolved child IDs in Solr, additional entries for each entity containing the entity ID, the resolved child IDs, the update counter and the trigger operation are added to the queue table. Note that the value for the trigger operation field must be set to UPDATE in this case.
The resolved child IDs can be copied from the table containing the relation. The parent_id field in this table will contain the ID of the entity that is to be stored in Solr. The child_id field contains the IDs of the related child entities.
The code of the trigger function used to populate the queue with the entries containing the resolved child IDs might be useful when writing the SQL script to copy these values. The trigger function is named like the main table of the relation type definition with a _nrtf suffix.
Example scripts
The following examples show how to copy data to the queue table. The main table of the type definition in the example is called test_simple_invoice.
insert
into
"test_simple_invoice_nq"(
"version_number",
"latest_version_id",
"version_comment",
"initial_creation_date",
"creator_user_id",
"retention_date",
"modification_user_id",
"creation_date",
"update_counter",
"last_delete_restore_date",
"litigation_hold",
"deleted",
"parent_id",
"modification_date",
"id",
"acl_id",
"amount",
"invoice_number",
"nosql_processing_counter",
"nosql_trigger_operation"
)
select
"version_number",
"latest_version_id",
"version_comment",
"initial_creation_date",
"creator_user_id",
"retention_date",
"modification_user_id",
"creation_date",
"update_counter",
"last_delete_restore_date",
"litigation_hold",
"deleted",
"parent_id",
"modification_date",
"id",
"acl_id",
"amount",
"invoice_number",
0,
'INSERT'
from "test_simple_invoice";
insert
into
"test_simple_invoice_nq"(
"id",
"relation_32877_child_ids",
"nosql_processing_counter",
"nosql_trigger_operation"
)
select distinct
a."parent_id",
array(select b."child_id" from "test_invoice_person_relation" b
where b."parent_id" = a."parent_id" and b."parent_version_id" is null),
0,
'UPDATE'
from "test_invoice_person_relation" a;
System jobs
The arveo system uses several background jobs to perform essential functions. These jobs are managed by a clustered Quartz scheduler running inside the repository service and/or in a dedicated job service. The scheduler instances are synchronized using the database. The repository service creates the jobs and initial trigger configurations when the system is started for the first time. Afterwards, it is possible to modify the scheduled jobs manually.
By default, the scheduler embedded in the repository service is used to create and to execute the jobs. Dedicated job service instances configured to use the same database as the repository service can be used to execute the jobs as well. It is also possible to start the scheduler embedded in the repository service in standby mode. In standby mode, the repository service will create the jobs (if required), but it will not execute them.
The available configuration parameters for the scheduler are listed here: Job service
The available configuration parameters for the jobs are listed here: Job configuration
It is required to configure the user and the password to be used for the jobs. The user has to own the required authorities to execute the jobs: ECR_PURGE_RECOVERY_TABLE and the authority configured in security.general.role-for-secured-access (by default ECR_SERVICE_USER). |
Clean recovery table job
The expired entries in the recovery table (see Recovery) are deleted by the clean recovery table job. By default, the job is triggered every day at 3 a.m. The user configured to execute the system jobs needs the ECR_PURGE_RECOVERY_TABLE authority to be able to perform this operation.
NOSQL queue job
Data that is supposed to be stored in the Solr NOSQL database is stored in dedicated queue tables in the relational database used by arveo. The NOSQL queue job is used to read the data from the queue tables and to write it to Solr. By default, the job is scheduled once per second individually for every queue table in the system. It is possible to configure the number of entries to process in one run of the job as well as the maximum number of attempts to write the data to Solr.
When the NOSQL feature is disabled for a type definition, the queue job and the triggers for the queue job have to be disabled or removed manually from the scheduler. |
Using external Job Service instances
It is possible to use one or more external Job Service instances to execute the scheduled system jobs. The service must be configured to use the same tenants as the repository service. To be able to execute the system jobs, the job implementations must be present in each of the Job Service’s class paths. The jobs are available as a ZIP file (ecr-packaging-jobs-external<version>.zip) that contains all required libraries. Simply extract the contents of the ZIP file to a directory (e.g. libs) and start the Job Service with the following parameter: -Dloader.path=libs.
The configuration parameters for the jobs are already configured in the database. No further configuration parameters for the jobs are required in the Job Service’s configuration. However, the service must be able to authenticate to the repository service. As the jobs use a username and password to obtain an access token, the service needs OAuth client registrations both for the client_credentials and for the password grant types. The following example shows how to configure two client registrations for the service:
spring:
security:
oauth2:
resourceserver:
jwt:
public-key-location: "http://localhost:39004/oauth/public_key"
client:
registration:
cmn-user-service-client-credentials:
provider: user-service
client-id: "tech-client"
client-secret: "tech-secret"
authorization-grant-type: "client_credentials"
scope: "oauth2"
cmn-user-service-password:
provider: user-service
client-id: "test-client"
client-secret: "my-secret"
authorization-grant-type: "password"
scope: "oauth2"
provider:
user-service:
authorization-uri: "http://localhost:39004/oauth/auth"
token-uri: "http://localhost:39004/oauth/token"
By default, the scheduler included in the repository service instances will be used to execute the scheduled jobs, too. When the scheduler in the repository service is started in standby mode, only the external Job Service instances will execute the scheduled jobs. The following configuration can be used to start the repository service with a scheduler in standby mode:
job-service:
  standbyOnlyScheduler: true
Using Hashicorp Vault
Hashicorp Vault can be used to store sensitive configuration parameters like database passwords or encryption master keys. Each arveo service tries to load configuration data from a Vault instance at startup. To configure the location and access method for Vault, the following application arguments can be used:
- spring.cloud.vault.host: Defines the host name of the Vault host.
- spring.cloud.vault.port: Sets the port used to connect to Vault.
- spring.cloud.vault.scheme: Either https or http.
- spring.cloud.vault.authentication: Sets the authentication mechanism to use.
These properties cannot be configured using the Configuration Service. Configuration data from the Configuration Service
is loaded after the connection to Vault has been established. Instead, these properties must be set as application
parameters. Example: java -jar service.jar --spring.cloud.vault.port=8200
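Taken together, a startup with all four Vault parameters could look like the following (host name, service jar name, and authentication method are placeholders to be adapted to the actual environment):

```shell
java -jar ecr-service.jar \
  --spring.cloud.vault.host=vault.example.com \
  --spring.cloud.vault.port=8200 \
  --spring.cloud.vault.scheme=https \
  --spring.cloud.vault.authentication=TOKEN
```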
It is possible to disable the Vault integration by setting spring.cloud.vault.enabled=false.
Additional information about the configuration parameters, especially the possible authentication mechanisms, can be found in the documentation of the Spring Cloud Vault project.
Defining secrets
Vault features several ways to provide secrets to applications. Configuration properties for the arveo services must be stored in the key-value secrets engine. Each secret consists of a path and several key-value pairs. The path defines the scope of the property: it can either be set to application to store a secret for all services, or to the name of the service, just like the names of the configuration files in the Configuration Service. For example, to configure the password of the JDBC datasource used by all services, a key-value pair of spring.datasource.password=password would be stored under the path application. A property for the repository service (ecr-service) would be stored in a key-value pair property=value under the path ecr-service. The following table contains the application names of the different services.
Service | Application name |
---|---|
Repository Service | ecr-service |
User Management Service | cmn-user-management-service |
User Management Access Control Service | cmn-user-management-access-control-service |
Enterprise User Management Service | cmn-user-management-enterprise-service |
Document Conversion Service | document-conversion-service |
Administration Service | cmn-administration-service |
Audit Service | cmn-audit-service |
Integration Service | cmn-integration-service |
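Assuming the key-value secrets engine is mounted at secret (the Vault default), the secrets described above could be created using the Vault CLI like this (mount path and values are environment-specific):

```shell
# Store a property for all services under the path "application"
vault kv put secret/application spring.datasource.password=password

# Store a property only for the repository service
vault kv put secret/ecr-service property=value
```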
Configuration Properties
ecr.server.caching
Property | Type | Description | Default value |
---|---|---|---|
content-access-tokens.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
content-access-tokens.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
default-acls.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
default-acls.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
enums.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
enums.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
type-definition-access.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
type-definition-access.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
type-definitions.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
type-definitions.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
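The property keys in the tables combine with the group prefix shown in the heading. For example, the cache settings above could be adjusted in the configuration of the repository service as follows (the values are illustrative):

```yaml
ecr:
  server:
    caching:
      type-definitions:
        expire-after: 30m
        size: 1000
```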
ecr.server.content-access-tokens
Property | Type | Description | Default value |
---|---|---|---|
alias |
java.lang.String |
The alias of the certificate used to sign the tokens. |
|
key-password |
java.lang.String |
The password for the alias. |
|
key-store-password |
java.lang.String |
The password for the keystore. |
|
key-store-path |
java.lang.String |
The absolute path to the keystore that contains the certificate used to sign the tokens. |
|
key-store-type |
java.lang.String |
The type of the keystore (e.g. PKCS12, JKS…) |
|
max-token-lifetime |
java.time.Duration |
The maximum allowed lifetime of the generated tokens. |
1d |
ecr.server.http
Property | Type | Description | Default value |
---|---|---|---|
file.directory |
java.io.File |
The directory used to store the temporary files. |
|
file.prefix |
java.lang.String |
The prefix to use for the names of the temporary files. |
temp |
file.suffix |
java.lang.String |
The suffix to use for the names of the temporary files. |
.dat |
file.threshold |
java.lang.Integer |
The size of the file in bytes from which on a temporary file will be used for buffering. |
131072 |
ecr.server.jobs
Property | Type | Description | Default value |
---|---|---|---|
clean-recovery-table.cron-expression |
java.lang.String |
Defines the CRON expression used to schedule the job. |
0 0 3 * * ? |
clean-recovery-table.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
true |
jms-statistics.cron-expression |
java.lang.String |
Sets the CRON expression used to schedule the job. |
*/15 * * * * ? |
jms-statistics.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
false |
jms-statistics.jms-receive-timeout |
java.time.Duration |
Defines the time the job will wait for a reply from the message broker. |
1s |
no-sql-queue.batch-size |
java.lang.Integer |
Sets the number of entries to load from the queue table in one batch. |
100 |
no-sql-queue.cron-expression |
java.lang.String |
Sets the CRON expression used to schedule the job. |
*/1 * * * * ? |
no-sql-queue.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
true |
no-sql-queue.retries |
java.lang.Integer |
Sets the maximum number of attempts to write an entry in the queue to solr. |
3 |
password |
java.lang.String |
Defines the password of the user used to run the jobs. |
|
retention-cleanup.global-settings.asynchronous |
java.lang.Boolean |
If true, content of documents will be removed from the storage asynchronously using a message queue. The database entries will not be removed but marked as deleted using the COMPLIANCE_DELETED field. |
false |
retention-cleanup.global-settings.batch-size |
java.lang.Integer |
Defines the size of a single batch of entities processed by the job. |
1000 |
retention-cleanup.global-settings.max-message-queue-size |
java.lang.Integer |
The maximum acceptable size of the message queues used by the job. Checked when the job is started. If the size of one of the queues exceeds the limit, the job is cancelled. |
100000 |
retention-cleanup.global-settings.max-runtime |
java.time.Duration |
Maximum acceptable runtime for the job. When the time is exceeded, the job is cancelled. |
|
retention-cleanup.global-settings.purge-content |
java.lang.Boolean |
If true, all content elements of a document and all its versions will be deleted. |
true |
retry-renditions.batch-size |
java.lang.Integer |
Defines the number of document versions to select in one run of the job. |
1000 |
retry-renditions.cron-expression |
java.lang.String |
Defines the CRON expression used to schedule the job. |
0 0 3 * * ? |
retry-renditions.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
true |
username |
java.lang.String |
Defines the name of the user used to run the jobs. |
ecr.server.jobs.retention-cleanup
Property | Type | Description | Default value |
---|---|---|---|
global-settings.asynchronous |
java.lang.Boolean |
If true, content of documents will be removed from the storage asynchronously using a message queue. The database entries will not be removed but marked as deleted using the COMPLIANCE_DELETED field. |
false |
global-settings.batch-size |
java.lang.Integer |
Defines the size of a single batch of entities processed by the job. |
1000 |
global-settings.max-message-queue-size |
java.lang.Integer |
The maximum acceptable size of the message queues used by the job. Checked when the job is started. If the size of one of the queues exceeds the limit, the job is cancelled. |
100000 |
global-settings.max-runtime |
java.time.Duration |
Maximum acceptable runtime for the job. When the time is exceeded, the job is cancelled. |
|
global-settings.purge-content |
java.lang.Boolean |
If true, all content elements of a document and all its versions will be deleted. |
true |
ecr.server.liquibase
Property | Type | Description | Default value |
---|---|---|---|
auto-change-log |
java.lang.String |
Defines the location used to store the auto generated changelog. |
changeLog/auto.xml |
changelog-directory |
java.lang.String |
The directory used when generated changelogs are kept. This setting is only relevant when keepChangelogs is set to true. |
changelog |
custom-change-log |
java.lang.String |
Defines the location of a custom liquibase changelog to execute on startup after the database schema was initialized. Changelogs can be loaded from the classpath by adding the 'classpath:' prefix. Files must be identified by an absolute path using the prefix 'file:/'. |
|
keep-changelogs |
java.lang.Boolean |
If set to true, generated changelogs will be kept in separate files in the configured directory. |
false |
pre-initialization-change-log |
java.lang.String |
Defines the location of a custom liquibase changelog to execute on startup before the database schema was initialized. Changelogs can be loaded from the classpath by adding the 'classpath:' prefix. Files must be identified by an absolute path using the prefix 'file:/'. |
ecr.server.memory
Property | Type | Description | Default value |
---|---|---|---|
buffer-size |
java.lang.Integer |
Defines how many bytes of data to keep in memory when working with streams before switching to a temporary file. |
1024000 |
ecr.server.messaging
Property | Type | Description | Default value |
---|---|---|---|
json-messages |
java.lang.Boolean |
If enabled, the payload of JMS messages will be a JSON string. |
true |
queue-listener-concurrency |
java.lang.String |
Specify the number of threads used for listeners for queues (NOT topics!) via a "lower-upper" String, e.g. "5-10", or a simple upper limit String, e.g. "10" (the lower limit will be 1 in this case). |
1-10 |
redelivery.back-off-multiplier |
java.lang.Integer |
The number to multiply the redelivery delay with for every redelivery attempt. |
5 |
redelivery.initial-redelivery-delay |
java.lang.Long |
The time in milliseconds to wait until a failed message will be redelivered. |
1000 |
redelivery.maximum-redeliveries |
java.lang.Integer |
The maximum number of redelivery attempts for failed messages. |
3 |
redelivery.use-exponential-back-off |
java.lang.Boolean |
If true, the time between redeliveries of a failed message will be multiplied with the backOffMultiplier for each redelivery. |
true |
ecr.server.query
Property | Type | Description | Default value |
---|---|---|---|
in-condition-optimization-limit |
java.lang.Integer |
Sets the number of entries in an IN clause from which on the optimized query is used. -1 disables this feature. |
-1 |
no-sql-query-time-warning-millis |
java.lang.Integer |
Sets the maximum duration in milliseconds for the execution time of queries on the noSql database after which a warning will be logged. |
5000 |
statement-execution-time-warning-millis |
java.lang.Integer |
Sets the maximum duration in milliseconds for the execution time of a database statement after which a warning will be logged. This setting applies to the relational database. |
5000 |
ecr.server.security
Property | Type | Description | Default value |
---|---|---|---|
type-definition-access-checks-enabled |
java.lang.Boolean |
Defines whether type definition specific access checks are enabled or not. |
true |
ecr.server.solr
Property | Type | Description | Default value |
---|---|---|---|
collection-name |
java.lang.String |
The name of the collection to use. |
ecr |
collection-replicas |
java.lang.Integer |
The default number of replicas for a new collection created by the repository service. |
1 |
collection-shards |
java.lang.Integer |
The default number of shards for a new collection created by the repository service. |
1 |
commit-within-millis |
java.lang.Integer |
Defines the maximum time in milliseconds after which the solr client will perform a commit. |
1000 |
default-config-name |
java.lang.String |
Defines the default SolrConfig. |
solr-plugin-config |
host |
java.lang.String |
Defines the host for the connection of the Solr Client. |
|
http-client-connection-timeout |
java.lang.Integer |
Defines the connection timeout for the Solr HTTP client in milliseconds. |
10000 |
password |
java.lang.String |
The password used for basic authorization to SOLR. |
|
schema-name |
java.lang.String |
The name of the schema. If null, the collection name is used. |
|
ssl-key-store |
java.lang.String |
The path to the keystore to use for SSL communication. |
|
ssl-key-store-password |
java.lang.String |
The password for the SSL keystore. |
|
ssl-truststore |
java.lang.String |
The truststore to use for SSL communication. |
|
ssl-truststore-password |
java.lang.String |
The password for the SSL truststore. |
|
use-ssl-client-auth |
java.lang.Boolean |
Whether to use SSL client authentication. |
false |
username |
java.lang.String |
The username used for basic authorization to SOLR. |
ecr.server.storage
Property | Type | Description | Default value |
---|---|---|---|
profile-aliases |
java.util.Map<java.lang.String,java.lang.String> |
A mapping of alias names to storage profile names. |
|
profile-templates |
java.util.List<de.eitco.ecr.server.config.StorageProfileTemplate> |
A list of profile templates used by the bucket selector plugin. |
|
profiles |
java.util.Map<java.lang.String,de.eitco.ecr.server.config.StorageProfileSettings> |
A map containing all configured storage profiles. |
ecr.server.system
Property | Type | Description | Default value |
---|---|---|---|
batch-update-statement-cache-enabled |
java.lang.Boolean |
Enables or disables the cache for generated batch update SQL statements. |
true |
create-solr-changes |
java.lang.Boolean |
If set to false, Solr changes will not be executed when initializing the type schema. |
true |
event-listeners-enabled |
java.lang.Boolean |
Enables or disables the JMS event listeners used to process system events like recycle bin cleanup and the creation of renditions. |
true |
fetch-jms-statistics |
java.lang.Boolean |
Enables or disables the regular fetching of JMS statistics from the ecr_jms_statistics table. |
true |
initialize-empty-database |
java.lang.Boolean |
If set to true, the system will create the schema when the table ecr_types is empty, even if the service is not in maintenance mode. |
true |
log-schema-changes |
java.lang.Boolean |
If set to true together with maintenanceMode, the system will only log required changes to the database schema and shut down after the log was written. |
false |
maintenance-mode |
java.lang.Boolean |
If true, the server will update the database schema at startup and shut down after the update was finished. This is actually a combination of updateSchema = true and terminateAfterCreation = true. |
false |
schema-change-log-directory |
java.lang.String |
The location of the logfile used when checkForSchemaChanges is set to true. |
logs |
terminate-after-creation |
java.lang.Boolean |
If true, the server will terminate after the database schema was created. |
false |
update-schema |
java.lang.Boolean |
Whether to update the database schema at startup or not. |
false |
ecr.server.upload
Property | Type | Description | Default value |
---|---|---|---|
maximum-file-size |
java.lang.Long |
Defines the maximum size of a single file in one multipart upload in bytes. -1 means no limit. |
-1 |
maximum-in-memory-size |
java.lang.Integer |
Defines the maximum size of data to keep in memory before using a temporary file (in bytes). |
1048576 |
maximum-total-size |
java.lang.Long |
Defines the maximum total size of all files in one multipart upload in bytes. -1 means no limit. |
-1 |
job-service
Property | Type | Description | Default value |
---|---|---|---|
standby-only-scheduler |
java.lang.Boolean |
If true, the scheduler used by the job service will be in standby mode. It will not process any jobs. |
false |
wait-for-event |
java.lang.Boolean |
If true, the scheduler will not start to process events until the de.eitco.commons.job.service.common.StartSchedulerEvent event is sent. |
false |
Monitoring
arveo uses Spring Boot Actuator to expose a monitoring REST API that can be consumed by monitoring systems like Prometheus or the Administration Service. The Actuator documentation linked above contains information about the available monitoring data, how to enable or disable specific endpoints and how to configure security.
The overview of all actuator endpoints is available at /actuator. Health information is available at /actuator/health.
Custom health indicators
In addition to the default health indicators, arveo provides the following additional health indicators:
- storagePlugins: Checks if at least one storage profile is configured and if all storage plugins configured in the storage profiles are able to store data.
  - FileSystemPlugin: Checks if the configured storage directory exists and whether the database sequence used to generate storage IDs is available.
  - S3Plugin: Checks if the configured bucket exists. When the last storage operation has failed, the endpoint checks if the S3 service is available.
  - SwiftV2Plugin, SwiftV3Plugin: Checks if the configured container exists. When the last storage operation has failed, the endpoint checks if the Swift service is available.
- typeDefinitions: Checks if there is at least one registered type definition.
The custom health indicators can be disabled like any other health indicator by setting the configuration property management.health.key.enabled (where key is the name of the indicator) to false.
Custom endpoints
In addition to the default actuator endpoints, arveo provides the following custom actuator endpoints.
- storageProfiles: Provides a list of all storage profiles and the storage plugin used by each profile.
- typeDefinitions: Provides a list of all type definitions.
The custom endpoints can be disabled like any other actuator endpoint by setting the property management.endpoint.key.enabled (where key is the name of the endpoint) to false.
Custom metrics
In addition to the default metrics, arveo provides additional metrics that can be used to monitor the performance of the system.
Storage
For each storage profile a metric is available that records the following statistics:
Metric | Description |
---|---|
|
Number of read operations |
|
Number of write operations |
|
Total amount of bytes read |
|
Total amount of bytes written |
|
Number of read errors |
|
Number of write errors |
|
Read times |
|
Write times |
Each metric contains a tag named profile with a value for each configured storage profile.
These metrics are reset each time the repository service instance is restarted.
Profiles that use the BucketOrganizerPlugin are not included in the metrics. Instead, a separate metric for each of the referenced profiles used by the bucket organizer profile is available.
It is possible to disable the recording of these metrics by setting the following parameter to false. This does not only hide the metrics but disables the entire recording mechanism.
management:
  metrics:
    enable:
      ecr:
        storage: false
Relational database
The following metrics are collected for operations on the relational database:
Metric | Description |
---|---|
|
Maximum and total execution time as well as the number of executed database statements |
|
Number of database errors |
|
Number of statements that took longer than the configured threshold to execute |
Each of these metrics contains a tag for the current tenant and the type of statement that was executed. The threshold time after which an execution time warning is logged and the counter is incremented can be configured using the setting ecr.server.query.statementExecutionTimeWarningMillis (in milliseconds).
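For example, to lower the threshold to two seconds:

```yaml
ecr:
  server:
    query:
      statement-execution-time-warning-millis: 2000
```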
The recording of these metrics can be disabled using the following configuration parameter:
management:
  metrics:
    enable:
      ecr:
        rdb: false
Solr
The following metrics are collected for the Solr object database:
Metric | Description |
---|---|
|
Maximum and total time as well as a counter for add operations |
|
Maximum and total time as well as a counter for query operations |
|
Maximum and total time as well as a counter for delete-by-id operations |
|
A counter for all Solr exceptions |
|
Number of queries that took longer than the configured threshold to execute. |
All of the above metrics contain a tag for the current tenant. The error counter contains a tag for the type of operation that has failed. The threshold time after which an execution time warning is logged and the counter is incremented can be configured using the setting ecr.server.query.noSqlQueryTimeWarningMillis (in milliseconds).
It is possible to disable the collection of these metrics using the following configuration parameter:
management:
  metrics:
    enable:
      ecr:
        solr: false
JMS queues
arveo provides metrics to monitor the state of the JMS queues used for asynchronous operations. These
metrics rely on the statistics plugin of ActiveMQ to retrieve statistics.
The plugin must be activated in activemq.xml as shown below:
<broker ...>
  <plugins>
    <statisticsBrokerPlugin/>
  </plugins>
</broker>
The statistics collected by the statistics plugin are polled periodically by a job running in the repository service. The job is disabled by default. To activate it, set the following parameters in the configuration of the repository service:
ecr:
  server:
    jobs:
      jms-statistics:
        cron-expression: "*/15 * * * * ?"
        enabled: true
In the example configuration above, the job is triggered every 15 seconds.
Unlike other system jobs, this job always runs within the repository service. It cannot be offloaded to a separate job service instance.
The following metrics will be available once the job has been activated:
Metric | Description |
---|---|
|
The number of messages currently contained in the queue. |
|
The total number of messages that have been enqueued in the queue. |
|
The total number of messages that have been dequeued from the queue. |
|
The average time a message was enqueued before it was dequeued. |
Each metric contains a tag called queue containing the name of the queue. The following queues are currently used by the repository service:
Queue | Description |
---|---|
|
Contains messages containing IDs of content elements that have to be purged from the storage. |
|
Contains messages with a delivery delay that will cause an entity to be deleted from the recycle bin once its recycle delay has expired. |
|
Contains messages of renditions that have to be created for new content elements. |
|
The dead letter queue for the ecr-queue-create-renditions queue. This is used to set the rendition availability status to failed once all retries for the creation of a rendition have failed. |
Type definitions
arveo provides metrics for several operations for each type definition. The following metrics are available:
Metric | Description |
---|---|
|
Counter and time measurements for read operations. |
|
Counter for read operation errors caused by the client. |
|
Counter for read operation errors caused by the server. |
|
Counter and time measurements for delete operations. |
|
Counter for delete operation errors caused by the client. |
|
Counter for delete operation errors caused by the server. |
|
Counter and time measurements for create operations. |
|
Counter for create operation errors caused by the client. |
|
Counter for create operation errors caused by the server. |
|
Counter and time measurements for update operations. |
|
Counter for update operation errors caused by the client. |
|
Counter for update operation errors caused by the server. |
|
Counter and time measurements for recycle operations. |
|
Counter for recycle operation errors caused by the client. |
|
Counter for recycle operation errors caused by the server. |
|
Counter and time measurements for restore operations. |
|
Counter for restore operation errors caused by the client. |
|
Counter for restore operation errors caused by the server. |
|
Counter and time measurements for find operations. |
|
Counter for find operation errors caused by the client. |
|
Counter for find operation errors caused by the server. |
|
Counter and time measurements for batchupdate operations. |
|
Counter for batch update operation errors caused by the client. |
|
Counter for batch update operation errors caused by the server. |
Each of these metrics has a tag called type-definition containing the name of the type definition the measurement was taken for.
Prometheus
arveo provides an actuator endpoint that can be used to collect metrics data using Prometheus.
Prometheus collects data by periodically calling configured sources ("scrapes"). The following example shows an entry in the prometheus.yml file for a scrape configuration that collects data from the prometheus actuator endpoint every 15 seconds:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'arveo'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:39001']
The metrics support of arveo is based on Micrometer. To support monitoring systems like Prometheus, Micrometer remembers the last maximum value of time-based metrics for a configurable amount of time. This time should be close to the scrape interval of Prometheus and can be configured in the configuration properties of arveo as shown in the example below:
management:
  metrics:
    export:
      prometheus:
        step: 15s
The data collected by Prometheus can be visualized using Grafana.
Other monitoring systems
Support for monitoring systems other than Prometheus can be enabled by adding the required library to the classpath. The Spring Boot documentation contains a list of the supported monitoring systems and further information about how to configure them.
Attributes in MDC
arveo adds the following additional attributes to the mapped diagnostic context (MDC) of the logging framework to make it easier to analyze the system’s behavior:
- ecr.user-id: The ID of the user performing the current request, if available.
- ecr.tenant: The tenant for which the current request is executed.
These attributes are added using a HandlerInterceptor for the REST endpoints using Spring WebMVC. Note that these attributes will not be added when arveo is used embedded. In this case, the application using the embedded arveo instance is responsible for adding the required information to the MDC.
Depending on the logger appender in use, it is possible to add these attributes to log messages. See the documentation of logback for details.
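With logback, for example, the MDC attributes can be included in a pattern layout via the %X conversion word (the appender and pattern below are illustrative):

```xml
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
  <encoder>
    <!-- %X{key} inserts the MDC value for the given key into each log line -->
    <pattern>%d{HH:mm:ss.SSS} %-5level [%X{ecr.tenant}/%X{ecr.user-id}] %logger{36} - %msg%n</pattern>
  </encoder>
</appender>
```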
Open Telemetry
arveo supports using Open Telemetry to monitor the system’s behavior. Most notably, it is possible to view traces of requests across the different services using a tracing backend like Jaeger or Zipkin. The outermost span of a trace that was started by a user’s request will contain the ecr.user-id and ecr.tenant attributes containing the user’s ID and the current tenant. This is done by the same mechanism as described above for the MDC.
Because arveo is based on several widely used open source libraries, the automatic instrumentation mode of Open Telemetry can be used to record traces. This is done by the Open Telemetry java agent as described in the Open Telemetry documentation.
The following example shows the required parameters to use Open Telemetry with Jaeger for the repository service.
-Dotel.traces.exporter=jaeger
-Dotel.metrics.exporter=none
-Dotel.service.name=repository-service
-javaagent:<path>/opentelemetry-javaagent.jar
The metrics export is disabled in the above example. As described in the sections above, metrics can be collected using the actuator endpoints.
Access Control
Access rights
The REST API has the following user-rights (authorities) for different endpoints:
- ECR_SERVICE_USER (configurable): Required authority for all API endpoints. Must always be present.
- ECR_ADMIN: Allows editing type and attribute definitions as well as other administrative operations.
- ECR_DSGVO_ADMIN: Allows a user to change the litigation hold and retention settings of entities contained in type definitions using the retention feature.
- ECR_DSGVO_PRIVILEGED_DELETE: An addition to ECR_DSGVO_ADMIN that allows a user to delete an entity that is still within its retention period. Organisational precautions must be put in place to ensure DSGVO compliance when making use of this authority.
- ECR_ALL_TYPES_READ: Allows read access to all type definitions that use type level access restrictions.
- ECR_ALL_TYPES_WRITE: Allows write access to all type definitions that use type level access restrictions.
- ECR_PURGE_RECOVERY_TABLE: Allows a user to trigger the removal of expired entries in the recovery table.
Access control lists
arveo makes use of access control lists (ACLs) to protect entities. Each entity can be protected by one ACL. The handling of ACLs is performed by the Access Control Service. The documentation of the Access Control Service contains more information about the general concept of the ACLs used by arveo.
Mapping of the Access Control List values
Although the module user-management-access-control defines the concepts and functionality of the ACLs, the actual mapping of the values is implemented in arveo. The class de.eitco.ecr.acl.AclRight implements the following permissions using the numeric values shown below.
/**
* The user is allowed to see the object's meta data but not the content.
*/
BROWSE(4000),
/**
* The user is allowed to see the meta data and content of the object.
*/
READ(8000),
/**
* The user is allowed to add annotations to the object.
*/
COMMENT(12000),
/**
* The user is allowed to change meta data and content of the object creating a new version.
*/
WRITE(16000),
/**
* The user is allowed to overwrite an existing version of the object.
*/
OVERWRITE(20000),
/**
* The user is allowed to delete the object.
*/
DELETE(24000),
/**
* The user is allowed to change the ACL of the object.
*/
CHANGE_ACL((Short.MAX_VALUE - 1));
To illustrate the information above, here are some examples:
- The permission COMMENT is assigned to a group or a job position. In this case, the assignee is by default also granted the permissions BROWSE and READ.
- The prohibition WRITE is assigned to a group or a job position. In this case, the assignee is by default also prohibited all higher rights, i.e. OVERWRITE, DELETE and CHANGE_ACL.
- A job position J1 with the permission COMMENT acts as a substitute for a job position J2 with the permission OVERWRITE. The job position J1 is then assigned the permission OVERWRITE for the time of the substitution.
- A job position J1 with the prohibition WRITE acts as a substitute for a job position J2 with the prohibition READ. The job position J1 is still assigned the prohibition WRITE for the time of the substitution. This way it is guaranteed that J1 is still able to perform its tasks (which would become impossible if it were assigned the stronger prohibition of J2).
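The ordering logic behind these examples can be sketched as follows. AclRightSketch is a hypothetical illustration, not part of the arveo API: the numeric values define a total order, so a granted right implies every right with a smaller value, while a prohibition implies every right with a larger value.

```java
// Illustrative sketch of the right ordering (not the actual arveo class).
public class AclRightSketch {

    public enum Right {
        BROWSE(4000),
        READ(8000),
        COMMENT(12000),
        WRITE(16000),
        OVERWRITE(20000),
        DELETE(24000),
        CHANGE_ACL(Short.MAX_VALUE - 1);

        private final int value;

        Right(int value) {
            this.value = value;
        }

        /** A granted right implicitly grants every right with a smaller numeric value. */
        public boolean grantImplies(Right other) {
            return other.value <= this.value;
        }

        /** A prohibited right implicitly prohibits every right with a larger numeric value. */
        public boolean denyImplies(Right other) {
            return other.value >= this.value;
        }
    }
}
```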
Batch updates for ACLs
It is possible to change the values of multiple ACLs at once: all ACLs that satisfy a certain condition can be processed in one call. For every ACL that fulfills the given condition, the following modifications can be specified:
- addgroupright (adds a given right to a given group in every ACL that fulfills the condition)
- adduserright (adds a given right to a given user in every ACL that fulfills the condition)
- keepgroupright (keeps the current right of a given group in every ACL that fulfills the condition)
- keepuserright (keeps the current right of a given user in every ACL that fulfills the condition)
All other entries in the ACLs that fulfill the condition are removed.
In the Client SDK, ACL batch updates are performed in the module 'common' by the class AclServiceClient. Its method updateAclsWhere() takes two parameters: an Expression of type Boolean (the condition mentioned above) and a List of AccessControlListModification objects (the four modifications mentioned above) to apply to every matching ACL. The method updateAclsWhere() executes the given ACL batch updates.
There is also a more convenient overload of updateAclsWhere() that returns a ConditionBuilder, which can be used in searches (see Search Service).
Example of usage
Consider the following snippet from a test class as an example of the batch update functionality for the ACLs. Pay attention to the method setRightsTo(), which is called to modify the current rights.
aclServiceClient.updateAclsWhere().contextReference("id").in().values(
acl1.getIdentifier().getValue(),
acl2.getIdentifier().getValue()
).holds()
.setRightsTo(GrantAndDeny.grant(AclRight.READ)).of(umAdmin.getIdentifier())
.execute();
Attribute Based Access Control (ABAC)
arveo allows entity access to be restricted based on attributes of the entity itself.
This can be specified per entity type by a static method annotated with @Security.
The method must return an EQL expression that resolves to a boolean, i.e. a condition.
arveo calls this method whenever entities of the given type are accessed, in order to retrieve an additional filter for the access.
Operations will only affect entities for which the condition evaluates to true.
Every operation on entities of the type will execute the method and add the resulting expression to the filter of the operation:
-
Searches will add the expression to the filter of the search request.
-
Batch operations will add the expression to their filter.
-
Calls that operate on a specific id will fail if the expression yields false.
The simplest case would look like this:
@Type(ObjectType.DOCUMENT)
public interface UnsecuredDocuments {
@Security
static Expression<Boolean> calculateAccess() {
return Eql.alwaysTrue();
}
}
This would add the filter 'true' to every operation on the entity, which would allow anyone to access entities.
In most cases, one would want to compare attributes of the entity with properties of the user requesting the current operation. Referencing entity attributes can be accomplished with the EQL:
@Type(ObjectType.CONTAINER)
public interface ThresholdContainer {
int getThreshold();
void setThreshold(int threshold);
@Security
static Expression<Boolean> calculateAccess(Alias alias) {
return EcrQueryLanguage.condition()
.alias(alias).field("threshold")
.greaterThan().value(300).holds();
}
}
Users may only access entities of the type above where the field 'threshold' is greater than 300.
In order to check the user requesting an operation, one can define a parameter of the type AuthenticationContext on the method. Other information may be accessed this way, too.
The method can have up to four parameters of the following types:
-
AuthenticationContext: this class holds information about the user requesting the operation.
-
AclRight: the right needed to perform the operation.
-
Alias: identifies the part of the query that holds the entity.
-
DSLContext: an entry point to the jOOQ API, bound to the database and schema in which the table containing the entities is located.
Parameters
AuthenticationContext: who requests the operation?
The AuthenticationContext holds information about the user requesting the operation.
This parameter will most likely be used in every such method, except for the most basic cases.
Take a case where access to a document is specified by a field named access_token.
It holds the name of a user-management authority that every user with access to the document must have.
If it is null, every user has access to the document:
@Type(ObjectType.DOCUMENT)
@OverwriteAllowed
public interface DocumentWithAccessToken {
@Mandatory(false)
String getAccessToken(); (1)
void setAccessToken(String accessToken);
@Security
static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext, DSLContext dslContext) { (4)
return EcrQueryLanguage.condition()
.alias(alias).field("access_token").isNull() (2)
.or()
.value(authenticationContext.getAuthorities())
.contains().alias(alias).field("access_token")
.holds();
}
// ...
// more attributes (3)
}
1 | The type defines the attribute that specifies access. |
2 | The generated query uses this attribute. |
3 | Other elements of the type are omitted for the sake of readability. |
4 | Note that the third parameter is unused. In such a case it could be omitted. |
AclRight: what will the operation do?
The AclRight parameter holds the right necessary to perform the requested operation.
This is a hint for the method about what the operation will actually do.
It allows differentiating between read and write access:
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface ContainerAccessedByUserId {
long getOwner(); (1)
void setOwner(long owner);
List<Long> getAudience(); (2)
void setAudience(List<Long> audience);
@Security
static Expression<Boolean> checkAccess(Alias alias, AuthenticationContext authenticationContext, AclRight right) {
long userId = authenticationContext.getUser().getIdentifier().getValue();
if (AclRight.READ.getValue() < right.getValue()) { (3)
return EcrQueryLanguage.condition().alias(alias).field("owner").equalTo().value(userId).holds();
}
return EcrQueryLanguage.condition() (4)
.alias(alias).field("audience").contains().value(userId)
.or().alias(alias).field("owner").equalTo().value(userId)
.holds();
}
// ...
// more attributes
}
1 | This type defines an attribute owner holding the user id of the responsible user.
The owner of an entity is the only user allowed to modify it. |
2 | The type also defines a list of user ids, audience, holding the ids of users that may read the entity.
Users that are neither owner nor part of the audience have no access to the entity. |
3 | Thus, in cases where a right greater than READ is requested, the method returns an expression that checks whether the current user is the owner of the document. |
4 | In every other case, i.e. when the requested access right is READ or below, an expression is returned that checks whether the current user is the owner or part of the audience. |
Alias
The alias identifies the part of the executed query that contains the entity and should be used to reference its members.
Always use the alias as given in the examples. Other ways to reference the entity might work in most cases, but only using the alias ensures that referencing entity attributes works in every case. |
The full class name is de.eitco.ecr.common.search.Alias. Take care not to confuse it with other classes named Alias. |
DSLContext
In some cases, using expressions on the entity itself may become cumbersome or slow. For these cases, one can use the DSLContext parameter. It allows access via jOOQ to any table in the same schema as the table of the requested entity and can be used to obtain specific data directly.
Since this accesses the database directly, there are no further access checks on queries using the DSLContext.
Depending on the operation requested, the method may even be able to execute INSERT or UPDATE statements.
It is the responsibility of the security method's author to make sure changes do not create an inconsistent or otherwise corrupted state of the database.
The simplest way to ensure this is to use the DSLContext only to read data. |
Examples
Subselect
There might be cases where the attribute defining access is not part of the entity itself, but part of another entity referred to by a foreign key or a relation.
In such cases, a subselect comes in handy.
Assume two entity types: documents, access to which is restricted by an attribute named owner_group, which is part of the second entity, a container. An owner group must be given. Documents are linked to their container with a foreign key named contained_in:
@Type(ObjectType.CONTAINER)
public interface OwnedContainer {
long getOwnerGroup(); (1)
void setOwnerGroup(long ownerGroup);
// ...
// more attributes
}
@Type(ObjectType.DOCUMENT)
public interface OwnedDocument {
@ForeignKey(target = OwnedContainer.class, targetProperty = "id")
ContainerId getContainedIn(); (2)
void setContainedIn(ContainerId container);
@Security
static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext) {
List<Long> groupIds = authenticationContext.getAllGroups().stream().map(group -> group.getIdentifier().getValue()).collect(Collectors.toList()); (3)
return EcrQueryLanguage.condition().alias(alias).field("contained_in").in() (4)
.select("id").from("owned_container").as("container").where().
contextReference("container", "owner_group").in().values(groupIds).holds().holds();
}
// ...
// more attributes
}
1 | The entity OwnedContainer holds the attribute that specifies access. |
2 | The entity OwnedDocument is linked with a container by its attribute contained_in . |
3 | The AuthenticationContext is used to obtain the ids of every group the current user is a member of. |
4 | The group ids are used to create a check whether the entity is contained in a container whose owner_group is one of the user's groups. |
Interface inheritance
Since attribute-based security is, by definition, based on attributes, it must be specified per type. However, in some cases a more general solution is desired. In these cases, Java interface inheritance comes in handy.
Assume the class DocumentWithAccessToken from above. Assume further that there are other types (ContainerWithAccessToken and FolderWithAccessToken) that should be secured by their access token as well. In this case, it is good practice to combine the access method and the field in a common superinterface:
(2)
public interface WithAccessToken {
@Mandatory(false)
String getAccessToken();
void setAccessToken(String accessToken);
@Security
static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext) {
return EcrQueryLanguage.condition() (1)
.alias(alias).field("access_token").isNull()
.or()
.alias(alias).field("access_token").in()
.values(new ArrayList<>(authenticationContext.getAuthorities()))
.holds();
}
}
1 | The check for the access token is defined here. |
2 | Note that this interface does not specify an entity type by itself, since it lacks a @Type annotation. |
Then the types themselves can simply inherit this feature:
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface ContainerWithAccessToken extends WithAccessToken {
}
@Type(ObjectType.FOLDER)
@OverwriteAllowed
public interface FolderWithAccessToken extends WithAccessToken {
}
Complex scenario: a Hospital
Here we take a look at a more complex example: a hospital. The hospital manages documents concerning cases. A case belongs to a patient. Users of the system are hospital employees and may access data about documents, cases and patients. These users are part of one or several wards. For every ward, there is a group in the system containing the users that are part of this ward. Cases have a list of wards where the patient was treated for that case; this list may change over time. Access is specified as follows:
-
A user may only access cases whose ward list contains at least one ward the user is a member of.
-
A user may only access patients whose cases they may access.
-
A user may only access documents whose cases they may access.
Cases could be modeled as follows:
@Type(ObjectType.CONTAINER)
public interface MedicalRecordCase {
@Mandatory
@ForeignKey(target = MedicalRecordPatient.class, targetProperty = "id")
ContainerId getPatient(); (1)
void setPatient(ContainerId containerId);
@Mandatory
List<String> getWards(); (2)
void setWards(List<String> wards);
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
List<String> groupNames = authenticationContext.getAllGroups() (3)
.stream().map(group -> group.getEntityName().getValue()).collect(Collectors.toList());
Expression<Boolean> result = null;
for (String groupName : groupNames) { (4)
Expression<Boolean> wardCondition = EcrQueryLanguage.condition() (5)
.alias(alias).field("wards").contains().value(groupName)
.holds();
if (result == null) {
result = wardCondition;
} else {
result = Eql.or(result, wardCondition); (6)
}
}
if (result == null) {
return Eql.alwaysFalse(); (7)
}
return result;
}
// case attributes ... (8)
}
1 | A case holds a foreign key to a patient. Since a case must have a patient, this attribute is mandatory. |
2 | A case has a list of wards, where it was treated. This attribute is also mandatory. |
3 | When computing access, the groups - and thus the wards - of the current user are obtained from the AuthenticationContext. |
4 | Since it is necessary to check whether the intersection between the wards of the case and the groups of the user is non-empty, the method iterates over all the groups of the user. |
5 | A condition is created that checks whether the entity's wards contain the current group. |
6 | Access is granted when one of the created conditions yields true. |
7 | If the user is in no group whatsoever, they may not access any case at all. |
8 | Further attributes are omitted for the sake of readability. |
Now Patients specify their security as follows:
@Type(ObjectType.CONTAINER)
public interface MedicalRecordPatient {
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
Alias caseAlias = Alias.byName("case"); (2)
Expression<Boolean> caseAccessCondition = MedicalRecordCase.access(caseAlias, authenticationContext); (1)
return EcrQueryLanguage.condition().exists()
.select("id").from(MedicalRecordCase.class).as(caseAlias.getValue()) (3)
.where()
.alias(caseAlias).field("patient").equalTo().alias(alias).id() (4)
.and(caseAccessCondition).holds().holds(); (5)
}
// patient attributes ...
}
1 | Access to a patient depends on access to cases.
So, the MedicalRecordCase.access() is called (see above). |
2 | In order to do that, a custom alias is specified that is used both for the method call and in the query below. |
3 | Using a subselect, it is checked whether there is a case … |
4 | … that is assigned to the patient whose access is being checked and … |
5 | … that the current user may access. |
Documents may specify their security method in a very similar way; only the document-to-case link points in the other direction:
@Type(ObjectType.DOCUMENT)
public interface MedicalRecordDocument {
@ForeignKey(target = MedicalRecordCase.class, targetProperty = "id")
@Mandatory
ContainerId getCase(); (1)
void setCase(ContainerId containerId);
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
Alias caseAlias = Alias.byName("case");
Expression<Boolean> caseAccessCondition = MedicalRecordCase.access(caseAlias, authenticationContext); (2)
return EcrQueryLanguage.condition()
.exists().select("id").from(MedicalRecordCase.class).as(caseAlias.getValue())
.where()
.alias(caseAlias).id().equalTo().alias(alias).field("case") (3)
.and(caseAccessCondition).holds().holds();
}
// document attributes ...
}
1 | A document is assigned to a case. This is mandatory. |
2 | As for patients, the access check for documents depends on the access check for cases. |
3 | A subselect similar to the one above is created; however, here the outer select holds the link to the inner one. |
Revision history and ABAC
In the example above access to the entities is defined by one attribute: the wards of a case. It is assumed that a case may be treated in several wards - one after another - and every employee belonging to those wards needs access to the case, its patients data and its documents. Visiting the wards one after another will result in several updates on the case - each adding another ward - and thus in a revision history where the list of wards will build up over time.
This has an interesting consequence in the scenario above: access to an older version of the case is granted only to users who were allowed to access it at the time that version was created.
For instance, if a case started in pulmonology, it would have the following revision list:
revision | ward(s)
---|---
1 | pulmonology
If it was moved to intensive care after that, it would result in the following revision list:
revision | ward(s)
---|---
1 | pulmonology
2 | pulmonology, intensive care
Employees working in intensive care would be unable to access data of revision 1 of this case. Depending on the scenario this might or might not be desired.
If this is not desired, it can be fixed with a simple annotation on the case interface:
@Mandatory
@Versioned(value = false)
List<String> getWards();
By simply specifying the wards attribute as not versioned, changes to the attribute will affect every revision of the case.
If a case started in pulmonology, it would at first have the same revision history as above:
revision | ward(s)
---|---
1 | pulmonology
However, if it was moved to intensive care now, the revision list would look like this:
revision | ward(s)
---|---
1 | pulmonology, intensive care
2 | pulmonology, intensive care
Now all employees in pulmonology and intensive care have access to every revision of this case.
This solution can be used generally. When access control to entities depends on attributes, deciding whether those attributes are versioned or not is an important detail.
Accessing external tables
Assume that in the hospital from the example above, the information which employee belongs to which ward is kept in a separate table named 'employee_to_ward'. This table is managed by an external application.
Using direct database access
As stated earlier, it is possible to add a parameter of the type org.jooq.DSLContext to a security method in order to gain direct access to the database.
This can be used to access the 'employee_to_ward' table:
@Security
static Expression<Boolean> access(
Alias alias,
AuthenticationContext authenticationContext,
DSLContext context (1)
) {
long userId = authenticationContext.getUser().getIdentifier().getValue(); (2)
final List<String> wards = context.selectFrom("test_employee_to_ward")
.where(DSL.field(DSL.name("employee")).eq(DSL.value(userId))) (3)
.fetch(DSL.field("ward", String.class));
Expression<Boolean> result = null;
for (String ward : wards) {
// ... (as above) (4)
1 | The DSLContext is defined as another parameter. |
2 | The AuthenticationContext is only used to get the current user's id. |
3 | The wards of the user are obtained using the jooq-api to directly access the database. Depending on the scenario, it might improve performance to cache the result of this query. |
4 | After that, the same code as above is executed. |
Using a Metadata type
Alternatively, an arveo custom @Type could be used to access the external table:
@View (1)
@Name("employee_to_ward") (2)
@Type(ObjectType.META)
public interface UserToWard {
long getEmployee();
void setEmployee(long employee);
String getWard();
void setWard(String ward);
}
1 | The @View annotation marks the type as external.
This means arveo will not create the corresponding table. |
2 | The @Name annotation specifies the name of the table the type's entities are stored in. |
Now, in the security method this type can be accessed with a subselect:
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
final Alias userWard = Alias.byName("user_ward"); (1)
return EcrQueryLanguage.condition()
.exists().select("ward").from(UserToWard.class).as(userWard.getValue()) (2)
.where()
.alias(userWard).field("employee").equalTo()
.value(authenticationContext.getUser().getIdentifier().getValue()) (3)
.and()
.alias(alias).field("wards").contains() (4)
.alias(userWard).field("ward").holds()
.holds();
}
1 | First, an alias is declared for the subselect. |
2 | Then, a query is created that checks whether there is a ward, that … |
3 | … the current user is assigned to and … |
4 | … that is contained in the current entity's wards attribute. |
Data Modelling
Entity types
The following chapter defines the entity types and type definitions used in arveo.
To be able to store objects in the database, a class is defined for each entity definition. An entity thus represents a type of data structure used in arveo. There are five supported entity types:
-
Document: an entity that can contain metadata and content. Documents are the only objects that can have content; the content may be binary. Documents can be contained in folders (Document).
-
Container: a simple folder-like object that is not organized in a tree structure but can have relations to other objects. A container contains only metadata and cannot be contained in a folder (Container).
-
Relation: an entity that represents a relation between two other entities. A relation can contain metadata (Relation).
-
Folder: an entity that contains metadata and is organized in a tree structure, like in a file system (Folder).
-
Meta: an entity that contains only metadata. Unlike containers, metadata entities do not support system attributes like ID and creation date (Metadata).
Each type definition is represented by one (or more) tables in the database.
Each entity is referred to by its system-wide unique id, which consists of a tenant id and its type definition id, followed by the sequential database id of this entity:
[12bit Tenant id][14bit Type Definition id][38bit Entity id].
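The 64-bit layout described above can be illustrated with simple bit arithmetic. The following sketch is an illustrative reconstruction of the documented layout, not arveo code:

```java
// Illustrative sketch of the entity id layout described above:
// [12 bit tenant id][14 bit type definition id][38 bit entity id] = 64 bits.
public class EntityIdLayout {

    static final int TYPE_BITS = 14;
    static final int ENTITY_BITS = 38;

    // Pack the three components into a single 64-bit id.
    static long compose(long tenantId, long typeDefinitionId, long entityId) {
        return (tenantId << (TYPE_BITS + ENTITY_BITS))
                | (typeDefinitionId << ENTITY_BITS)
                | entityId;
    }

    static long tenantId(long id) {
        return id >>> (TYPE_BITS + ENTITY_BITS);
    }

    static long typeDefinitionId(long id) {
        return (id >>> ENTITY_BITS) & ((1L << TYPE_BITS) - 1);
    }

    static long entityId(long id) {
        return id & ((1L << ENTITY_BITS) - 1);
    }

    public static void main(String[] args) {
        long id = compose(5, 17, 123456789L);
        System.out.println(tenantId(id));         // 5
        System.out.println(typeDefinitionId(id)); // 17
        System.out.println(entityId(id));         // 123456789
    }
}
```

This layout allows up to 4096 tenants, 16384 type definitions per tenant, and roughly 2.7 * 10^11 entities per type definition.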
Versioned entities
All entities listed above (except for meta) are versioned by default. This means that they store version and modification information. The class VersionInformation combines information about a version, including the version id, version number and version comment. The version modification object stores a modification stamp, consisting of a user id and a ZonedDateTime object, both for the creation and for the last modification of the entity. The version information is stored in a separate table for each typed entity.
When specifying a type definition, you can decide which attributes of this type definition are versioned.
If none of the attributes are versioned, the entire object is not versioned. For the type Document, content changes are always versioned. |
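As a rough sketch of the version bookkeeping described above (the record names and fields below are assumptions for illustration, not the actual arveo classes):

```java
import java.time.ZonedDateTime;

// Illustrative sketch of the version bookkeeping described above.
// Shapes and names are assumptions, not the actual arveo API.
public class VersioningSketch {

    // A modification stamp: which user acted, and when.
    record ModificationStamp(long userId, ZonedDateTime timestamp) {}

    // Combines version id, version number and version comment with the
    // stamps for creation and last modification of the entity.
    record VersionInformation(long versionId, int versionNumber, String versionComment,
                              ModificationStamp created, ModificationStamp lastModified) {}

    public static void main(String[] args) {
        ModificationStamp stamp = new ModificationStamp(42L, ZonedDateTime.now());
        VersionInformation info =
                new VersionInformation(1L, 1, "initial import", stamp, stamp);
        System.out.println(info.versionNumber()); // 1
    }
}
```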
Custom types
You can make your class a type and add features by annotating your classes. You can define the custom metadata schema with simple getter and setter methods.
When you start a project, you have to create your own types. Simply annotate the class with the @Type annotation and define your schema with type-safe getter/setter methods (Example). |
You can find the arveo-specific annotations in the module type-definition-annotations. The goal is to create a type and specify its properties, so the annotations precisely define the behavior of the type definitions. When a type is defined by annotating it with @Type, a corresponding database table is created. There is one exception: when annotating with @View or @Partial_View, no database table is created.
There are two types of annotations:
-
annotations on types (interfaces): @Target({ElementType.TYPE, ElementType.ANNOTATION_TYPE})
-
annotations on properties (getter-methods): @Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})
Some annotations can be used both on interfaces and on getter methods. The target ElementType.ANNOTATION_TYPE is used for inherited annotations. The following annotation groups are used in arveo:
-
constraint: contains annotations that define specific properties or behaviour of attributes;
-
defaults: contains annotations that define default values of attributes;
-
index: contains annotations that define indexes on type definitions;
-
naming: contains annotations that specify names for tables, attribute definitions, type definitions, enumeration types and enumeration values;
-
reference: contains annotations that specify references between types or attributes;
-
system: contains annotations that concern system properties;
-
view: contains annotations that mark an interface as a view;
-
other: contains annotations like @Type, @EcrIgnore and others, which stand out and cannot be classified into a group.
You can use the five entity classes to create custom entity types that serve the needs of your system. The customized entity types reflect the structure of your project or organization and can be created in a flexible way by extending the five entity types of the arveo system.
To create your first project using arveo you may want to review the following examples and follow the pattern.
Inherited annotations
Certain annotation properties are used widely throughout the code, so it is more convenient to define such an annotation once for frequent reuse.
The following listing shows the annotation definition @CustomAnnotation, which defines itself as the system property version id. If you mark a getter method with this annotation, there is no need to specify the system property name again.
@Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})
@SystemProperty(SystemPropertyName.VERSION_ID)
public @interface CustomAnnotation {
}
To take advantage of this annotation, we annotate getter methods with it, as shown in the listing below:
public interface InterfaceInheritanceExample {
@SystemProperty(SystemPropertyName.ID)
DocumentId getId();
@CustomAnnotation
VersionId getVersionId();
}
Examples
Enumeration example
Define an enum class and use it in another object type (Example).
@Enumeration(typeName = "my_enum")
public enum MyEnum {
ENUM1, ENUM2, ENUM3, ENUM4
}
Document type example
@Type(ObjectType.DOCUMENT) (1)
@RetentionProtected
@ContentElement(defaultDefinition = true, separateField = true)
@OverwriteAllowed
@RecycleBin
@Audit
public interface Resume {
// Immutable identifier documentid of the resume document: unique and readonly
@Unique
@ReadOnly
// alternatively: use autoincrement instead of unique and readonly to let the service create a unique sequence
//@Autoincrement
long getDocumentId(); (2)
void setDocumentId(long value);
// title of the resume document
String getTitle(); (2)
void setTitle(String value);
// relation to Person by person.id()
@ForeignKey (target = Person.class, targetProperty = "id") (3)
String getPersonId();
void setPersonId(String value);
// Multi value with former employers
List<String> getEmployers();
void setEmployers(List<String> employers);
MyEnum getEnum();
void setEnum(MyEnum myEnum);
}
1 | Definition of the object type as DOCUMENT, which allows the entity to carry content. |
2 | A database column is created for this property with the default name document_id. With @Unique and @ReadOnly, the id must be set on creation and is immutable from that moment on. Alternatively, with @Autoincrement, the database creates a sequence of integer values and assigns the id automatically. Either way, the value is unique, which allows users and third-party applications to identify and find the object. |
3 | This annotation specifies a foreign key to the class Person. |
Container type example
The following example class is marked as type Container. To use an entity type, we annotate the class using the @Type annotation.
@Type(ObjectType.CONTAINER) (1)
public interface Person {
String getFirstName(); (2)
void setFirstName(String value);
@Name("last_name") (3)
String getSurname();
void setSurname(String value);
@Unique (4)
String getVatNumber();
void setVatNumber(String value);
}
1 | Definition of the object type to be Container |
2 | A database column is created for this property with a default name first_name |
3 | This annotation specifies the name of the database column, which is different from the default |
4 | This annotation specifies a unique column, in this case vat_number. |
Referencing attributes by name
The system creates a column for each attribute of a type definition in the type definition's database table. The name of the column will be a snake case representation of the camel case name of the attribute's getter method. For example, the getter getInvoiceNumber will be mapped to an attribute (and a column) named invoice_number. To make it easy to reference these names in a compile-safe manner, classes with string constants for all type definitions are generated automatically. For example, for a type definition class called SimpleInvoice, a class named SimpleInvoiceNames will be generated in the same package as SimpleInvoice.
The classes containing the constants are generated using an annotation processor that is contained in the library containing the type annotations. The processor is picked up by the compiler automatically. |
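The naming rule can be sketched as a simple camel case to snake case conversion. This is an illustrative reimplementation of the documented mapping, not the actual generator:

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch of the naming rule described above: a getter like
// getInvoiceNumber maps to a column named invoice_number.
public class ColumnNames {

    // Matches the position between a lowercase letter or digit and an uppercase letter.
    private static final Pattern CAMEL_BOUNDARY =
            Pattern.compile("(?<=[a-z0-9])(?=[A-Z])");

    static String columnNameFor(String getterName) {
        // Strip the "get" prefix, then insert underscores at camel-case
        // boundaries and lowercase the result.
        String property = getterName.startsWith("get")
                ? getterName.substring(3)
                : getterName;
        return CAMEL_BOUNDARY.matcher(property)
                .replaceAll("_")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(columnNameFor("getInvoiceNumber")); // invoice_number
        System.out.println(columnNameFor("getAmount"));        // amount
    }
}
```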
The following example shows how these constants can be used to perform a search referencing two different attributes.
EcrSearchService<SimpleInvoice> searchService = serviceClient.asEntitySearchService(); (1)
List<SimpleInvoice> list = searchService.where() (2)
.entity().field(SimpleInvoiceNames.INVOICE_NUMBER).like().value("2021-08-*")
.and()
.entity().field(SimpleInvoiceNames.AMOUNT).greaterThan().value(90D)
.holds()
.unpaged();
1 | serviceClient is a TypedDocumentServiceClient obtained using the TypeDefinitionServiceClient |
2 | A query is formulated with the fluent API of the EQL, referencing the attributes invoice_number and amount. |
Type annotations
Annotation | Parameter | Description |
---|---|---|
@Type |
ObjectType |
Define the entity type of your class by setting a valid ObjectType: DOCUMENT, FOLDER, RELATION, CONTAINER, META: Example |
@AccessChecks |
boolean |
This annotation specifies whether type-based access-checking will be enabled on a type. Default = false. See permissions on type definitions for details. |
@AclDisabled |
boolean |
Support for ACLs is enabled by default but can be disabled by annotating the type class with @AclDisabled. Additionally, annotating a getter for the ACL-Id system property with @Mandatory enforces the assignment of an ACL to every entity. Meta types do not support ACLs. |
@FilingEnabled |
boolean |
The filing feature makes it possible to assign a document to a folder. The feature is disabled by default and can be activated on type classes of type DOCUMENT by annotating the class with @FilingEnabled. |
@RetentionProtected |
boolean |
The retention and litigation hold feature is disabled by default and can be enabled by annotating a type class with @RetentionProtected. Meta types do not support retention. Example |
@OptimisticLocking |
boolean |
The optimistic locking feature makes it possible for clients to ensure that updates do not accidentally overwrite changes made by other clients. The feature is disabled by default and can be enabled by annotating a type class with @OptimisticLocking. |
@RecycleBin |
boolean |
The recycle bin feature makes it possible to move entities to the recycle bin and restore them again if required. The feature is disabled by default but can be enabled by annotating a type class with @RecycleBin. Recycle Bin |
@Recovery |
boolean |
Enables the recovery log. Content objects or files are deleted after a configurable time: Recovery Log |
@ContentElement |
String |
Define the allowed content types: Example |
@Audit |
boolean |
This annotation enables auditing of create-, update- and delete-operations on the type definition. |
@Versioned |
boolean |
This annotation defines if all properties of a type are versioned or not. If the annotation is present on a type and on a getter in the type, the annotation on the getter wins. |
@OverwriteAllowed |
boolean |
By default, arveo creates a new version if the content object of a document is changed. You can always read and restore all older versions of a content element. If overwrite is allowed you can replace a content element and overwrite it on the content store. The old version is lost. |
@View |
boolean |
The metadata type is a database view: Example |
@Tablename |
String |
Sets the actual database table name for the type. Database names are snake case, not camel case: Example |
@SourceType |
boolean |
This annotation specifies the class being the source (parent) of a relation. |
@TargetType |
boolean |
This annotation specifies the class being the target of a foreign key or relation. |
@InheritedProperty |
boolean |
This annotation marks a property as an inherited property. |
@Enumeration |
String |
This annotation can be used to configure a registered enumeration type. You must pass the database snake-case name of the enumeration type: Example |
@EcrIgnore |
boolean |
This annotation marks a method to be ignored as a property, or a class to be ignored as a type. The property is not stored in the database table. |
@NOSql |
boolean |
This annotation enables full-text support for all columns of the annotated type. By default, full-text support is disabled: Example. |
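The type annotations above can be combined on a single type definition. The following sketch is illustrative only (the interface and attribute names are hypothetical), assuming the annotation API described in this table:

```java
// Hypothetical document type combining several type annotations:
// retention protection, auditing and optimistic locking.
@Type(ObjectType.DOCUMENT)   // entity type DOCUMENT
@RetentionProtected          // enable retention and litigation hold
@Audit                       // audit create-, update- and delete-operations
@OptimisticLocking           // protect against accidental concurrent overwrites
public interface ContractDocument {
    @Mandatory
    String getContractNumber();
    void setContractNumber(String contractNumber);
}
```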
Property annotations
Annotation | Parameter | Description |
---|---|---|
@AutoIncrement |
boolean |
The annotation AutoIncrement indicates that the value of an attribute will be auto-incremented by the database. |
@Indexed |
String |
The annotation ensures that an index will be created for one or more properties. You must pass the index name as a parameter. When several attributes are annotated with the same index name, a multi-column index will be created for these columns. Use {@link Index} to configure additional properties of the index. |
@Unique |
boolean |
Defines a unique column. If you try to create an entity with a duplicate value, a unique constraint violation is raised. arveo creates a unique index or a unique constraint on the database and ensures the integrity of the documents. Example |
@Mandatory |
boolean |
Defines a mandatory column (default = false). The create operation fails with an exception if the property is not set. Example |
@Readonly |
boolean |
The property must be set when the entity is created (like @Mandatory) and cannot be changed afterwards. If a column has the annotations @Readonly and @Unique, you have an immutable index value that can be used as a business primary key. This ensures that users and third-party systems can clearly identify and find a document. Example |
@Versioned |
boolean |
This annotation defines if an attribute of a type is versioned or not (when placed on a getter). If the annotation is present on a type and on a getter in the type, the annotation on the getter wins. |
@Length |
Long |
This annotation specifies the length of a string or binary attribute |
@Precision |
Long |
This annotation specifies the precision of a decimal; the parameter defines the digits before and after the decimal point |
@Casesensitive |
boolean |
This annotation marks a field of type String as case-sensitive. This affects how searches on this field will be performed. The value itself will always be stored preserving the case. |
@DefaultValue |
String |
It is possible to specify default values for properties. Pass the database name of the property (camel vs. snake case!) and define a function returning the required type. If an instance of a type with a field that has a default value specified is created, and a value for that field is not defined, the default value will be used instead. However, if the field is explicitly set to null, then null will be used instead. Example |
@DefaultSystemPropertyValue |
String |
It is possible to calculate the initial value of the retention period and set it as a default value for RETENTION_DATE system column. Pass the database column name "retention_date" and a function returning a ZonedDateTime value. Example |
@PrimaryKey |
boolean |
This annotation marks a custom property as part of the element's primary key. The primary key will be composed of every custom property annotated with this annotation and the system property id. The property will be mandatory. |
@SecondaryKey |
boolean |
This annotation marks a property as a secondary key. The property becomes mandatory and unique. |
@ForeignKey |
String |
Defines a foreign key. You must pass the class name and the column for the foreign key. arveo creates the foreign key on the database and ensures the data integrity of your entities. Example |
@CascadeDelete |
boolean |
It is possible to define foreign keys that cascade a delete operation to the referencing entity. Example |
@SystemProperty |
SystemPropertyName |
To access system properties you can use the annotation @SystemProperty and pass one of the following names (Example). |
@FormattedCounter |
String |
This annotation marks an attribute of type String as a formatted counter. Formatted counters can be used to generate string valued attributes with a counter backed by a sequence as well as a prefix and a suffix. The name of the sequence can be user defined, or it can be auto-generated by the system. Prefix, suffix, and the name of the sequence can contain placeholders. Currently, the only supported placeholder is $date(<format>). |
@RelationCounter |
Class |
This annotation marks a property of type int as a counter for a specific relation type identified by the type definition class. |
@EcrIgnore |
boolean |
This annotation marks a method to be ignored as a property, or a class to be ignored as a type. The property is not stored in the database table. |
@NOSql |
boolean |
This annotation enables or disables full-text support for this property. |
Please note the following tips regarding unique identifiers:
To allow users and third-party applications to identify and find objects in arveo, you should define a unique and immutable property. The property must be @Unique to ensure that an application can identify the item. Make the property @Readonly to ensure that the identifier is always set and immutable.
Your business application or the user must set the value when the object is created. Use the @AutoIncrement annotation instead of @Unique and @Readonly if a simple sequential Long id meets your requirements. If you need a more sophisticated unique identifier, you can use the annotation @FormattedCounter, which allows you to create e.g. String identifiers like <year>-<sequence> (Example).
If overwrite is turned on, it is possible to manipulate the originally saved content and compromise the document without creating a versioned copy. Ensure that the @OverwriteAllowed annotation is not present on legally compliant document types. |
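As a sketch of the recommended business key pattern (hypothetical names, based on the annotations described above):

```java
// Hypothetical document type with an immutable business primary key.
@Type(ObjectType.DOCUMENT)
public interface Invoice {
    @Unique      // no two invoices may share a number
    @Readonly    // must be set on creation, cannot be changed afterwards
    String getInvoiceNumber();
    void setInvoiceNumber(String invoiceNumber);
}
```

The application sets the invoice number once when the document is created; afterwards users and third-party systems can rely on it to identify the document.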
Examples
Default values
@Type(ObjectType.CONTAINER)
public interface ContainerWithSimpleDefaultProperty {
String DEFAULT_STRING = "default string"; (2)
@Mandatory
String getMyStringField();
void setMyStringField(String myStringField);
@DefaultValue("my_string_field") (1)
default String defaultStringField() {
return DEFAULT_STRING;
}
// ...
// your custom attribute definitions
// ...
}
1 | With @DefaultValue("my_string_field") the method defaultStringField is defined to return the default value of my_string_field. Note that the reference in the annotation is in snake case while the actual property getMyStringField is camel case. |
2 | In a simple case like this it is considered good practice to declare the default value as a public constant. However, the default method does not need to return a constant. For example, date-time fields could use ZonedDateTime.now() to specify the timestamp of the creation as default value. |
Index example
As an example of annotation usage, let us define an interface BookIndex with two properties, page and chapter. These properties have to be indexed.
Meta
@Type(ObjectType.META)
@Index("book-chapter-page-index")
public interface BookIndex {
@PrimaryKey
@AutoIncrement
int getId();
@Indexed("book-chapter-page-index")
int getChapter();
void setChapter(int chapter);
@Indexed("book-chapter-page-index")
int getPage();
void setPage(int page);
}
The above-mentioned properties are thus marked with the annotation @Indexed, which ensures that an index will be created for these attributes. Here, the annotation @Index on the type is an example of an annotation on a type, as described above.
Formatted counters example
Using the @FormattedCounter annotation it is possible to define counters with prefix and suffix that are backed by a sequence on the database. There are several properties that can be defined in the annotation:
Property | Description |
---|---|
prefix |
The prefix used for the counter values. Can contain placeholders. |
suffix |
The suffix used by the counter values. Can contain placeholders. |
digits |
The number of digits for the counter. Shorter numbers will be padded with zero. |
sequenceName |
The name of the sequence to use. Can contain placeholders. |
autoGenerateSequences |
The number of sequences to auto-generate when the system is started in maintenance mode. |
startValue |
The start value of the generated sequence(s). |
The parameters prefix, suffix and sequenceName support placeholders. Currently, the system supports a placeholder for dates in the form $date(<format>) where format is a Java date format string supported by java.time.format.DateTimeFormatter#ofPattern(String). |
The autoGenerateSequences property can only be used when the sequenceName contains the placeholder $date(uuuu). It must not contain any other placeholders. |
The following example shows a formatted counter attribute used as an invoice number that will produce counter values in the form 2021#0103. It will be backed by a sequence called inv_no_seq_2021. The system will create the next 10 sequences automatically (inv_no_seq_2021 to inv_no_seq_2030). The start value of each sequence will be 100. The sequence to use will be determined automatically because of the date placeholder in the sequenceName property. So on January 1st 2022, the generated counter values will use another prefix and the counter will start over at 100 (2022#0100). Each time the system is started in maintenance mode, it will make sure that sequences for the next 10 years will be present.
@FormattedCounter(prefix = "$date(uuuu)#", digits = 4, sequenceName = "inv_no_seq_$date(uuuu)", autoGenerateNextSequences = 10, startValue = 100)
String getInvoiceNumber();
Foreign keys with ON DELETE CASCADE example
Add the @CascadeDelete annotation to the getter for the foreign key attribute. For relation types it is possible to add the cascade delete option to the foreign keys to the parent and child of the relation. To do that, add a system property for the parent- and/or child-id and annotate it with @CascadeDelete.
// simple foreign key
@CascadeDelete
@Mandatory(false)
@ForeignKey(target = BookIndex.class, targetProperty = "id")
Integer getReferencedIndex();
// parent- and child-id of a relation
@CascadeDelete
@SystemProperty(SystemPropertyName.PARENT_ID)
short getParentId();
@CascadeDelete
@SystemProperty(SystemPropertyName.CHILD_ID)
short getChildId();
The cascade delete option is supported only for entities that are not versioned (hence it cannot be used on Document types) and do not support retention or inheritance. It is also not possible to inherit attribute values from a type definition that has a foreign key with the cascade delete option. |
Property-like system fields
If a getter for a system field is defined, it is also possible to define a setter, provided the system field is property-like. The following fields are property-like:
-
acl_id;
-
retention_date.
The following listing shows the definition of a getter and a setter method on a property-like field.
public interface Secured {
@SystemProperty(SystemPropertyName.ACL_ID)
AccessControlListId getAclId();
void setAclId(AccessControlListId aclId);
}
Define a view
To define your type as a view or a partial view, you have to annotate your type with @View or @PartialView. The @View annotation specifies that the defined type is a view, i.e. it controls whether tables are created for it. The @PartialView annotation marks a class to be a partial view of the type definition created by another class via the @Type annotation. Partial views can be used for updates and selects with limited select clauses. No tables will be created for classes annotated this way. The interfaces that are to be defined as views of an object type have to be registered on the interface representing this object type. For instance, if an interface NamedFile inherits from the interface NamedEntity, and NamedEntity is a partial view of NamedFile, it has to be registered on the object from which it inherits:
@PartialView(NamedFile.class)
public interface NamedEntity {
//...
}
Note: An interface may also be a partial view of more than one type definition.
External views
It is possible to expose tables that are under the control of other applications to arveo and include them in its type system. This assumes that the given tables are in the same database schema as the tables of arveo. Also, one needs to know the names of these tables as well as the types of their columns. In this case one can define a meta type annotated with @View.
External views will only be read from ecr. It will never write to an external view. |
For example, the access-control-service defines several tables, one of them named usrv_acl. In this table there are, amongst others, two fields: id (a bigint) and name (a varchar). With this knowledge one can define the following external view:
@View (1)
@Type(ObjectType.META) (2)
@TableName("usrv_acl") (3)
public interface AclView {
@Unique
String getName(); (4)
@PrimaryKey
long getId(); (5)
}
1 | We annotate the class with @View to declare it as an external view. |
2 | Specifying the type as a META type is good practice, since every other type would expect specific system fields. |
3 | Specifying the table name is good practice here, since an external table most likely follows its own naming convention. However, it would be possible to omit the @TableName annotation here and instead name the class UsrvAcl. |
4 | Since we know that the table usrv_acl has a field name of type varchar we can define the property name of type String . |
5 | We know that the table usrv_acl has a field id of type bigint , so we specify a java property accordingly. |
NoSQL example
Annotate a type definition with @NOSql if it should also be created in the Solr schema; the whole class is then created there with all its fields.
@Type(ObjectType.CONTAINER)
@NOSql
public interface PersonSimple {
String getFirstName();
void setFirstName(String value);
String getLastName();
void setLastName(String value);
}
If you don’t want to create a field, you can disable it with the annotation @NOSql(value = false).
@Type(ObjectType.CONTAINER)
@NOSql
public interface PersonSimple {
String getFirstName();
void setFirstName(String value);
@NOSql(value = false)
String getLastName();
void setLastName(String value);
}
@SystemProperty annotation
To access system properties you can use the annotation @SystemProperty and pass one of the following names (Retention Information Getter)
general system fields:
-
ID: The unique identifier of the entity. Use on EcrId properties (or subclasses as applicable). Can be used on any entity.
-
CREATION_DATE: The date and time the relation was created. Use on ZonedDateTime properties. Can only be used on relations.
-
CREATOR_USER_ID: The id of the user that created this relation. Use on UserId properties. Can only be used on relations.
-
ACL_ID: The id of the ACL currently assigned to the entity. Might be null. This is not supported for metadata entities.
-
ACL_RIGHT: The resolved right based on the ACL currently assigned to the entity and the current user. This is not supported for metadata entities.
-
RETENTION_INFO: Information about the retention properties of the entity. It contains the RETENTION_DATE and the LITIGATION_HOLD flag described below. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.
-
RETENTION_DATE: The retention date defines the minimum storage date, i.e. the related object cannot be deleted until this date has passed. The storage period may be extended but never shortened. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.
-
LITIGATION_HOLD: A flag that indicates whether a document is related to a litigation. If the flag is set, the document must never be deleted, even if the retention date has passed. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.
versioned system fields:
-
VERSION_NUMBER: The number of the version of the versioned entity. Use on int/Integer properties. Can only be used on versioned entities.
-
VERSION_ID: The unique identifier of the version of the entity. Use on VersionId properties. Can only be used on versioned entities
-
UPDATE_COUNTER: A counter that is incremented each time an entity is updated. It is used for the optimistic locking feature and therefore is only available on type definitions that use optimistic locking.
-
IS_CURRENT_VERSION: A boolean that indicates whether the entity was the current version at the time it was loaded from the backend. Can only be used on versioned entities.
-
MODIFICATION_INFO: Information about the date and time as well as the user of the first and last modification of the entity. Use on ModificationInformation properties. Can only be used on potentially versioned entities i.e. Folders, Documents, Relations.
document system fields:
-
CONTENT: Information about the content of the document. Use on Map<String, ContentInformation> properties. Can only be used on documents.
-
CONTAINING_FOLDER: The id of the folder containing the document (if any). Use on FolderId properties. Can only be used on documents.
folder system fields:
-
FOLDER_NAME: The name of the folder. Use on String properties. Can only be used on folders.
-
PARENT_FOLDER: The id of this folder's parent. Use on FolderId properties. Can only be used on folders.
relation system fields:
-
PARENT_ID: The id of the parent of this relation. Use on TypedId properties (or applicable subclasses). Can only be used on relations.
-
PARENT_VERSION_ID: The version-id of the parent of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.
-
CHILD_ID: The id of the child of this relation. Use on TypedId properties (or applicable subclasses). Can only be used on relations.
-
CHILD_VERSION_ID: The version-id of the child of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.
Data Types
Java Type | Database Type | Description |
---|---|---|
String |
text |
Unlimited unicode text. Limit the length with the @Length annotation |
Integer or int |
int |
32 bit integer value, Integer = null is allowed |
Long or long |
bigint |
64 bit long value, Long = null is allowed |
Double or double |
double |
double value, Double = null is allowed |
Boolean or boolean |
boolean |
Boolean value; a Boolean allows three states (true, false, null) |
Decimal or decimal |
decimal(precision) |
Decimal value, Decimal = null is allowed, add @Precision annotation |
UUID |
uuid |
uuid type |
byte[ length ] |
bytea |
Binary data with a length specified by a Java int (max. 4 GB). |
String |
text |
String based ID with a non-null length. |
EnumerationType |
EnumerationType |
arveo creates an enumeration object on PostgreSQL 12. |
ZonedDateTime |
datetime |
arveo stores a GMT based date time value in PostgreSQL 12 |
LocalDate |
datetime |
arveo stores a date time value in PostgreSQL 12, but only the date is relevant |
LocalTime |
datetime |
arveo stores a date time value in PostgreSQL 12, but only the time is relevant |
List<String> |
array(text) |
arveo stores multiple text values in an array column of PostgreSQL 12. |
List<Long> |
array(bigint) |
arveo stores multiple bigint values in an array column of PostgreSQL 12. |
By default, PostgreSQL 12 does not limit the length of String values. Typically, it is not necessary to define a length using the @Length annotation because PostgreSQL 12 handles strings of any length very well. Your strings should have a length of up to 4 kByte. Even larger strings are allowed, but you should take care that you do not inadvertently consume too much storage space if you store very large strings. |
List data types allow you to store more than one String or Long value for a property. You can search for each value using the array search operation of the arveo query language. |
Enumeration data types allow you to set one or more values from a fixed set of values. |
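To illustrate, a hedged sketch of a container type using some of the data types listed above (the interface and attribute names are hypothetical):

```java
// Hypothetical container demonstrating several supported data types.
@Type(ObjectType.CONTAINER)
public interface CustomerRecord {
    @Length(255)                       // limit the otherwise unlimited text column
    String getCustomerName();
    void setCustomerName(String customerName);

    List<String> getTags();            // persisted as array(text)
    void setTags(List<String> tags);

    ZonedDateTime getContractStart();  // persisted as a GMT based datetime
    void setContractStart(ZonedDateTime contractStart);
}
```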
System Properties
The following chapter describes types of system properties in arveo.
There are different types of system properties:
-
General system properties: system properties that are available on all types of entity (except for meta data entities).
-
Versioned entity system properties: system properties that are only available on entities that can be versioned (Containers, Documents, Folders, Relations). Those properties are contained in the main table of a type definition.
-
Document system properties: system properties that are only available on documents.
-
Folder system properties: system properties that are only available on folders.
-
Relation system properties: system properties that are only available on relations.
-
Version system properties: system properties that are only available on versions of entities. Those properties are contained in the version table of a type definition.
System Property Names
All system columns in the database are snake case, not camel case; e.g. the Java RetentionDate property is persisted as "retention_date". |
Name | Database Type | Description |
---|---|---|
id |
bigint |
The unique identifier of the entity. Use EcrId properties (or subclasses as applicable). Can be used on any entity and is applied by arveo for all types but metadata. |
acl_id |
bigint |
The id of the ACL currently assigned to the entity. Might be null. This is not supported for metadata entities or types with disabled ACLs. |
creation_date |
datetime |
GMT timestamp when the entity or version was created, precision (1/1000 second) |
creator_user_id |
bigint |
The ID of the user who created the entity or version (User Management) |
deleted |
boolean |
Optional flag that indicates that an entity is currently contained in the recycle bin. |
last_delete_restore_date |
datetime |
Optional GMT timestamp of when the entity was last moved in or out of the recycle bin. |
retention_date |
datetime |
The GMT based retention timestamp defines the minimum storage date, i.e. the related object cannot be deleted until this date has passed. Cannot be used on meta data entities and is only available on entity types that are declared to be retention protected (Retention) |
litigation_hold |
boolean |
The boolean indicates whether a document is related to a litigation. If the flag is set, the document must never be deleted, even if the retention date has passed. Cannot be used on meta data entities and is only available on entity types that are declared to be retention protected (Retention) |
update_counter |
int |
Optional counter for the number of updates on an entity used for optimistic locking. |
Name | Database Type | Description |
---|---|---|
version_number |
bigint |
The sequential number of the latest version of the versioned entity. |
latest_version_id |
bigint |
The unique identifier of the latest version of the entity. |
version_comment |
string |
A comment set by the client when a new version is created. |
modification_date |
datetime |
GMT timestamp when the version was created or changed, precision (1/1000 second) |
modification_user_id |
bigint |
The ID of the user who created or changed the version |
initial_creation_date |
datetime |
GMT timestamp of when the first version of an entity was created. |
Name | Database Type | Description |
---|---|---|
content |
json |
JSON containing content properties: |
parent_id |
bigint |
Optional field that contains the ID of the folder the document is contained in. |
Name | Database Type | Description |
---|---|---|
folder_name |
String |
The name of the folder. |
parent_id |
bigint |
The ID of the parent of the folder in the folder tree. |
Name | Database Type | Description |
---|---|---|
parent_id |
bigint |
The id of the parent of this relation. |
parent_version_id |
bigint |
The version-id of the parent of this relation. Can only be used on relation types that support relations to or from versions. |
child_id |
bigint |
The id of the child of this relation. |
child_version_id |
bigint |
The version-id of the child of this relation. Can only be used on relation types that support relations to or from versions. |
Name | Database Type | Description |
---|---|---|
version_number |
bigint |
The sequential number of the version. |
version_id |
bigint |
The unique identifier of the version. |
version_comment |
string |
A comment set by the client when a new version is created. |
entity_id |
bigint |
The ID of the entity the version belongs to. |
Timestamps
All timestamp system properties (creation_date, initial_creation_date, modification_date) are stored in the database using the GMT timezone and a precision of 1 millisecond. When using the Java API, the values will be returned as ZonedDateTime instances.
The initial_creation_date field will contain the timestamp of when the very first version of an entity was created. This field is never updated. The creation_date field on the other hand will contain the time a specific version of an entity was created. Thus, the creation_date field in the main table will be updated when a new version is created because the main table will always contain the latest version of an entity. The modification timestamp field (modification_date) will contain the timestamp of when a version was created or overwritten. This field, too, will be updated in the main table each time a new version is created. It will be updated in the main table and in the version table when a version gets overwritten.
Document type
The following chapter provides a more detailed overview of the type Document.
A Document is one of five entity types supported by the arveo system. Unlike the other entity types, documents are always versioned to keep track of changes to the binary content.
A Document consists of the following components:
-
Technical metadata, which is filled by arveo and cannot be changed, see System properties
-
Typed metadata as defined in the annotated interface (the type definition)
-
0-n content objects: A content object has a content type that is freely configured in the system. A maximum of one element can be inserted per content type. Examples of content types are: original object, rendition, full text, text notes, XML properties, etc.
-
content metadata like content size, mime-type and hash
-
0-n annotations per content object: Only for image objects (TIFF, JPEG, PNG, BMP, PDF/A) annotations can be created in a layer independent of the document.
Any number of versions can be created for a Document. All the versions are traceable in the repository and can be referenced via independent system-wide unique IDs.
Container type
The following chapter provides a more detailed overview of the type Container.
A Container is an object without content. It supports all system managed metadata attributes and custom attributes defined by the type definition. It is called 'Container' because its primary use case is to serve as an entity that contains custom metadata and that is related to other entities like a document via foreign keys or relations.
Use container objects to build records and cases that contain documents. You can map the relationship between file, case and documents either as a foreign key (@ForeignKey annotation) or using the relation type objects (Relation Type). |
If you use Foreign keys to create the relationship between objects you can inherit values from the parent to its children (Inheritance) |
Containers can be versioned. A Container consists of the following components:
-
Technical meta information, which is filled by arveo and cannot be changed, see System properties
-
Typed container type metadata according to the type definition of the container type.
Any number of versions can be created for a Container. All the versions are traceable in the repository and can be referenced via independent IDs.
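A minimal sketch of this pattern (hypothetical names): a case-file container that documents reference via a foreign key, using the annotations described earlier:

```java
// Hypothetical case-file container without content.
@Type(ObjectType.CONTAINER)
public interface CaseFile {
    @Unique
    @Readonly
    String getCaseNumber();            // immutable business key of the case
    void setCaseNumber(String caseNumber);
}

// Hypothetical document type referencing the container via a foreign key.
@Type(ObjectType.DOCUMENT)
public interface CaseDocument {
    @ForeignKey(target = CaseFile.class, targetProperty = "id")
    Long getCaseId();                  // id of the containing case file
    void setCaseId(Long caseId);
}
```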
Relation type
The following chapter provides a more detailed overview of the type Relation.
A Relation represents a connection between two entities (document, container, folder or meta). It is directed, having a parent and a child, and it can contain custom metadata attributes. A Relation type must specify the types of the parent and child entities. Any number of versions can be created for a Relation. All the versions are traceable in the repository and can be referenced via independent IDs.
Changes of the child-id or parent-id are not tracked in the version table. |
Relation
+------------+          +--------------+          +------------+
|   Parent   |  source  |   Relation   |  target  |   Child    |
|------------|<---------|--------------|--------->|------------|
| attributes |          |  attributes  |          | attributes |
|            |          |              |          |            |
+------------+          +--------------+          +------------+
Relation type definition
@Type(ObjectType.RELATION) (1)
@SourceType(Customer.class) (2)
@TargetType(Invoice.class) (3)
public interface CustomerInvoiceRelation {
@SystemProperty(SystemPropertyName.CHILD_ID) (4)
@InputProperty(InputPropertyName.RELATION_CHILD) (5)
DocumentId getChildId();
void setChildId(DocumentId childId);
@SystemProperty(SystemPropertyName.PARENT_ID) (6)
@InputProperty(InputPropertyName.RELATION_PARENT) (7)
ContainerId getParentId();
void setParentId(ContainerId parentId);
String getStatus();
void setStatus(String status);
}
1 | Specifies that the type definition is used for relations |
2 | Defines the type of the source or parent of the relation |
3 | Defines the type of the target or child of the relation |
4 | Marks an attribute to return the value of the childId property of the relation |
5 | Marks an attribute to set the value of the childId property of the relation |
6 | Marks an attribute to return the value of the parentId property of the relation |
7 | Marks an attribute to set the value of the parentId property of the relation |
Relations vs. foreign keys
Instead of using relations, it is possible to model a dependency between two entities using foreign keys. The key difference between the two approaches is that a relation can carry its own metadata attributes, which a foreign key cannot. This possibility requires an additional database table (or two, in the case of versioned relations) for a relation, which might have a negative impact on performance. If the dependency between the two entities does not require its own metadata attributes (and is not a many-to-many relation), it is recommended to use foreign keys instead of relations.
Foreign keys can be defined by adding the @ForeignKey annotation to an attribute in a type definition. The targetProperty attribute of the annotation must point to the ID or to a custom metadata attribute with a unique constraint of the target type. The type of the annotated attribute must match the type of the target property of the foreign key. The chapter Foreign Keys contains a more detailed overview of the foreign key feature.
@ForeignKey(name = "fk_invoice_customer", target = Customer.class, targetProperty = "id")
long getCustomerNumber();
+------------+                 +------------+
|   Parent   |   foreign key   |   Child    |
|------------|---------------->|------------|
| attributes |                 | attributes |
|            |                 |            |
+------------+                 +------------+
Relations to versions
By default, a relation can point to the current version or to a specific version of its parent or child, when the parent- or child-type supports versions. This behavior can be controlled by the supportedNodeVersion property of the @Source and @Target annotations used for relation type definitions. The attribute supports three different values (defined in de.eitco.ecr.type.definition.annotations.reference.SupportedNodeVersion):
Value | Meaning |
---|---|
CURRENT_VERSION | The relation must point to the current version of the node, identified by the node's ID (NOT the VersionId of the current version). |
SPECIFIC_VERSION | The relation must point to a specific version of the node, identified by its VersionId. |
CURRENT_OR_SPECIFIC_VERSION | The relation can point to either the current version or a specific version of the node. This is the default. |
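As a sketch of how this could look in a relation type definition (assumption: @Source and @Target accept a supportedNodeVersion attribute as described above; the Customer and Invoice types are those from the relation example at the top of this chapter):

```java
// Sketch only: this relation always follows the current version of the customer (parent),
// but is pinned to one specific, immutable version of the invoice (child).
@Type(ObjectType.RELATION)
@Source(value = Customer.class, supportedNodeVersion = SupportedNodeVersion.CURRENT_VERSION)
@Target(value = Invoice.class, supportedNodeVersion = SupportedNodeVersion.SPECIFIC_VERSION)
public interface PinnedInvoiceRelation {
    // relation attributes as in the CustomerInvoiceRelation example above
}
```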
Unique relations
A single relation always has exactly one parent and one child. However, by default a single entity can be the parent or child of multiple relations (many-to-many). By adding unique constraints to the parentId and/or childId system properties of the relation type, it is possible to define one-to-many, many-to-one or one-to-one relations.
@SystemProperty(SystemPropertyName.CHILD_ID)
@Unique(constraintName = "uccr_parent_child_uc")
ContainerId getChildId();
Relation counters
By using the @RelationCounter annotation it is possible to create counters on the parent and child entities for both incoming and outgoing relations. The counters are persisted in the database and are updated automatically when relations are added or removed.
The @RelationCounter annotation contains two attributes: the relationType attribute defines the type of relation to count, and the direction attribute defines whether to count incoming relations (the entity is the child or target of the relation) or outgoing relations (the entity is the parent or source of the relation). By annotating the relation counter attribute with @Versioned it is possible to control whether the counter attribute is stored in the version table for each version or in the main table for all versions. When the counter is stored in the version table, it contains the count for a single version of the entity. If it is stored in the main table, it contains the count for all versions of the entity. The following example shows how to define relation counter attributes. The @Name annotation is used because the attribute name is too long for a database column name.
@RelationCounter(relationType = TypedContainerContainerRelation.class, direction = RelationCounterDirection.INCOMING)
@Versioned(false)
int getIncomingRelationCounter();
@RelationCounter(relationType = TypedContainerContainerRelation.class, direction = RelationCounterDirection.INCOMING)
@Versioned
@Name("v_in_relation_counter")
int getVersionedIncomingRelationCounter();
Working with relations
The arveo API provides several methods that can be used to create, modify and resolve relations. Relations themselves are treated just like any other entity type. Entities that can be the parent or child of a relation (containers, folders, documents and metadata entities) provide additional relation-specific methods in the client API. The available methods are defined in the interface de.eitco.ecr.sdk.TypedBaseRelationNodeEntityClient, which is a super interface of the clients used in the API for documents, folders, containers and metadata entities. The injectable de.eitco.ecr.sdk.SearchClient offers additional methods to search for relations using filters on the relation, the parent or the child.
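As a rough, hypothetical sketch of what creating a relation could look like: the createTypeInstance()/createEntity() pattern mirrors the container and document examples later in this document, but the getRelationServiceClient() accessor and the client type name are illustrative assumptions, not confirmed API:

```java
// Hypothetical sketch: creating a relation between an existing customer and invoice.
TypedRelationServiceClient<CustomerInvoiceRelation> relationServiceClient =
        typeDefinitionServiceClient.getRelationServiceClient().byClass(CustomerInvoiceRelation.class);

CustomerInvoiceRelation relation = relationServiceClient.createTypeInstance();
relation.setParentId(customerId);   // the customer acting as parent/source
relation.setChildId(invoiceId);     // the invoice acting as child/target
relation.setStatus("open");         // relations can carry their own metadata

relationServiceClient.createEntity(relation);
```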
Folder type
The following chapter provides a more detailed overview of the type Folder.
A Folder is an entity that is organized in a file-system-like tree structure. A Folder can contain custom metadata attributes. Documents can be filed in a Folder.
A Folder consists of the following components:
-
Technical meta information, which is filled by arveo and cannot be changed, see System properties.
-
Typed folder metadata according to a schema defined for the folder type.
Any number of versions can be created for a Folder. All versions are traceable in the repository and can be referenced via independent IDs.
Only documents can be filed in a Folder. To enable the filing feature, add the @FilingEnabled annotation to your document type. |
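A minimal sketch of a folder type together with a document type that can be filed in it. Only the @FilingEnabled annotation is taken from the note above; ObjectType.FOLDER is assumed by analogy with the other object types, and the attribute names are illustrative:

```java
// Sketch: folder type with one custom metadata attribute.
@Type(ObjectType.FOLDER)
public interface ProjectFolder {
    String getProjectName();
    void setProjectName(String projectName);
}

// Sketch: document type that may be filed in folders, enabled by @FilingEnabled.
@Type(ObjectType.DOCUMENT)
@FilingEnabled
public interface ProjectDocument {
    String getName();
    void setName(String name);
}
```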
Metadata type
Metadata types are used, for example, to connect external tables. They do not contain any specific system fields and no typed ID as a primary key. The database table can be created by arveo, or an existing table can be used.
Use the @View annotation to mark a metadata type as a view for which the system should not create a table, and use the @TableName annotation to define the name of the table of the external system. |
Metadata types do not support versioning and retention protection.
You can use the @PrimaryKey annotation to define one or more properties of a Metadata type as the primary key. |
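A minimal sketch combining these annotations, following the same pattern as the AclView example later in this document. The external table ext_customer and its columns are hypothetical:

```java
// Sketch: metadata type mapped onto an existing external table.
// Because of @View, arveo will not create this table itself.
@View
@Type(ObjectType.META)
@TableName("ext_customer")
public interface ExternalCustomer {
    @PrimaryKey
    long getCustomerNumber();

    String getCustomerName();
}
```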
Inheritance
Simple direct inheritance
The following chapter describes the inheritance scheme used in arveo. The object to be inherited from and its initial state are shown in the following table.
Company | Create | Initial state |
---|---|---|
ID (Company) | - | 888 |
Name | CTuX | CTuX |
CountryCode | DE | DE |
PhoneNumber | - | [NULL] |
The following table describes direct inheritance (i.e. with no intermediate objects). Here, Invoice is an object that inherits from Company. The table shows its initial state and the state after four different updates.
Invoice | Create | Initial state | Update 1 | After Update 1 | Update 2 | After Update 2 | Update 3 | After Update 3 | Update 4 | After Update 4 |
---|---|---|---|---|---|---|---|---|---|---|
ID (Invoice) | - | 931 | - | 931 | - | 931 | - | 931 | - | 931 |
InvoiceNumber | EIT-53 | EIT-53 | - | EIT-53 | - | EIT-53 | - | - | - | EIT-53 |
companyID | - | [NULL] | 888 | 888 | [NULL] | [NULL] | [NULL] | [NULL] | - | [NULL] |
companyName | - | [NULL] | SAP | CTuX | Eitco | Eitco | - | [NULL] | - | [NULL] |
companyCountryCode | - | [NULL] | - | DE | - | [NULL] | - | [NULL] | - | [NULL] |
companyPhone | - | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | +49 (30) 408191-425 | - | [NULL] | +41 123456 | +41 123456 |
Error: no change! |
Not possible: faulty update parameters! |
Note the following principles:
After Update 2: All inherited fields are set to NULL if the inheritance key is set to NULL, unless values are explicitly specified. After Update 3: All inherited fields are set to NULL if the inheritance key is set to NULL, unless values are explicitly specified - even if the inheritance key was already NULL before. |
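The tables above can be expressed as type definitions roughly like this. This is a sketch based on the @ForeignKey and @InheritedProperty annotations described elsewhere in this document; the attribute and column names are illustrative:

```java
// Sketch: the object inherited from.
@Type(ObjectType.CONTAINER)
public interface Company {
    String getName();
    void setName(String name);

    String getCountryCode();
    void setCountryCode(String countryCode);

    String getPhoneNumber();
    void setPhoneNumber(String phoneNumber);
}

// Sketch: the inheriting object.
@Type(ObjectType.DOCUMENT)
public interface Invoice {
    String getInvoiceNumber();
    void setInvoiceNumber(String invoiceNumber);

    // The inheritance key: setting it to NULL clears all inherited fields.
    @ForeignKey(target = Company.class, targetProperty = "id")
    ContainerId getCompanyId();
    void setCompanyId(ContainerId companyId);

    // Inherited from Company via the foreign key (column names in snake_case).
    @InheritedProperty(foreignKeyPropertyName = "company_id", sourcePropertyName = "name")
    String getCompanyName();
}
```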
Multilevel inheritance
This inheritance form has an object to be inherited from, just like direct inheritance. A second object inherits from it, and a third object then inherits from the second. The initial object is the same as before; its initial state is described in the table above.
In the following table, the second object Creditor, which inherits from the first object, is described.
Creditor | Create | Initial state |
---|---|---|
ID (Creditor) | - | 999 |
CreditorNumber | 471147114711 | 471147114711 |
CompanyID | 888 | 888 |
companyName | - | CTuX |
companyCountryCode | - | DE |
companyPhone | - | [NULL] |
In the table above, the object Creditor inherited the following properties through the companyID: companyName, companyCountryCode, companyPhone. |
The results of multilevel inheritance through an intermediate object are shown in the table below:
Invoice | Create | Initial state | Update 1 | After Update 1 | Update 2 | After Update 2 |
---|---|---|---|---|---|---|
ID (Invoice) | - | 931 | - | 931 | - | 931 |
InvoiceNumber | EIT-11 | EIT-11 | - | EIT-11 | - | EIT-11 |
creditorID | - | [NULL] | 999 | 999 | [NULL] | [NULL] |
companyName | - | [NULL] | SAP | CTuX | Eitco | EITCO |
companyCountryCode | - | [NULL] | - | DE | - | [NULL] |
companyPhone | - | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | +49 (30) 408191-425 |
Indirect inheritance
The third form of inheritance is indirect inheritance. It is much like multilevel inheritance, except that the inheriting object holds the IDs of both objects it inherits from. In the example below, the object Invoice holds both the creditorID and the companyID.
In the following table, the object Creditor is described.
Creditor | Create | Initial state |
---|---|---|
ID (Creditor) | - | 999 |
CreditorNumber | 471147114711 | 471147114711 |
CompanyID | 888 | 888 |
The table below describes the mechanism of indirect inheritance.
Invoice | Create | Initial state | Update 1 | After Update 1 | Update 2 | After Update 2 | Update 2a | After Update 2a |
---|---|---|---|---|---|---|---|---|
ID (Invoice) | - | 931 | - | 931 | - | 931 | - | 931 |
InvoiceNumber | EIT-11 | EIT-11 | - | EIT-11 | - | EIT-11 | - | EIT-11 |
creditorID | - | [NULL] | 999 | 999 | [NULL] | [NULL] | [NULL] | [NULL] |
companyID | - | [NULL] | - | 888 | - | 888 | [NULL] | [NULL] |
companyName | - | [NULL] | SAP | CTuX | Eitco | CTuX | Eitco | EITCO |
companyCountryCode | - | [NULL] | - | DE | - | DE | - | DE |
companyPhone | - | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | +49 (30) 408191-425 |
This form of inheritance is currently not needed and therefore not supported by ECR. |
Inheritance of ACLs
The acl_id system field is property-like, thus a type can define it as inherited.
This permits scenarios where one main entity provides the access definition and several entities are linked to it. If the ACL of the main entity changes, the ACLs of the linked entities change as well:
@Type(ObjectType.CONTAINER)
public interface MainEntity {
@Mandatory (2)
@SystemProperty(SystemPropertyName.ACL_ID) (1)
AccessControlListId getMainAcl(); (7)
void setMainAcl(AccessControlListId id);
// ...
// your custom attribute definitions
// ...
}
@Type(ObjectType.DOCUMENT)
public interface ChildEntity {
@Mandatory (4)
@ForeignKey(target = MainEntity.class, targetProperty = "id") (3)
ContainerId getCurrentMainEntity(); (6)
void setCurrentMainEntity(ContainerId mainEntity);
@SystemProperty(SystemPropertyName.ACL_ID)
@InheritedProperty(foreignKeyPropertyName = "current_main_entity", sourcePropertyName = "acl_id") (5)
AccessControlListId getAcl();
// ...
// your custom attribute definitions
// ...
}
1 | The main entity defines a property that accesses the ACL system property. |
2 | This property is defined mandatory - thus the main entity will always have an ACL. |
3 | The child entity defines a foreign key to the main entity. |
4 | By specifying the foreign key property as mandatory, every child entity will be linked to a main entity |
5 | Now we can specify an ACL property being inherited. |
6 | Note that the value of foreignKeyPropertyName is written in snake_case ("current_main_entity"), while the actual property getter is written in camel case (getCurrentMainEntity). |
7 | Note further that, while the referenced property is actually defined by the getter getMainAcl, sourcePropertyName is set to the name of the underlying system field "acl_id" to derive the property. |
Let’s see this behaviour in action.
Assume that we have a TypeDefinitionServiceClient named typeDefinitionServiceClient and also the IDs of two ACLs (firstAclId and differentAclId).
First we can create service clients for the two types defined above:
TypedContainerServiceClient<MainEntity> mainEntityServiceClient =
typeDefinitionServiceClient.getContainerServiceClient().byClass(MainEntity.class);
TypedDocumentServiceClient<ChildEntity> childEntityServiceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(ChildEntity.class);
With these service clients we can now create several entity instances of MainEntity
and ChildEntity
:
MainEntity mainEntity = mainEntityServiceClient.createTypeInstance();
mainEntity.setMainAcl(firstAclId);
TypedContainerClient<MainEntity> mainEntityClient = mainEntityServiceClient.createEntity(mainEntity);
ChildEntity childEntity1 = childEntityServiceClient.createTypeInstance();
childEntity1.setCurrentMainEntity(mainEntityClient.getIdentifier());
TypedDocumentClient<ChildEntity> childEntityClient1 = childEntityServiceClient.createEntity(childEntity1);
// ...
ChildEntity childEntityN = childEntityServiceClient.createTypeInstance();
childEntityN.setCurrentMainEntity(mainEntityClient.getIdentifier());
TypedDocumentClient<ChildEntity> childEntityClientN = childEntityServiceClient.createEntity(childEntityN);
The instances of ChildEntity
will automatically have the same ACL as mainEntity
:
Assert.assertEquals(childEntityClient1.getEntity().getAcl(), firstAclId);
// ...
Assert.assertEquals(childEntityClientN.getEntity().getAcl(), firstAclId);
If the ACL of the parent is updated…
mainEntity.setMainAcl(differentAclId);
mainEntityClient.updateAttributes(mainEntity);
…then the ACLs of the instances of ChildEntity change as well:
childEntityClient1 = childEntityClient1.reload();
// ...
childEntityClientN = childEntityClientN.reload();
Assert.assertEquals(childEntityClient1.getEntity().getAcl(), differentAclId);
// ...
Assert.assertEquals(childEntityClientN.getEntity().getAcl(), differentAclId);
Default ACLs and Inheritance
In many cases it will be desirable to specify a default ACL for a given type. But the naive approach to defining a default ACL proves cumbersome:
@Type(ObjectType.CONTAINER)
public interface ContainerWithDefaultAcl extends WithData {
@Mandatory
@SystemProperty(SystemPropertyName.ACL_ID)
long getAclId();
void setAclId(long aclId);
@DefaultValue("acl_id") (1)
default long defaultAcl() {
return ?? (2)
}
}
1 | Of course one can define the ACL system property with a default value. |
2 | However, when specifying the default value one faces a problem. The id of an ACL is set by the Access Control Service automatically and will vary from deployment to deployment, even between test and production environments. |
However, the concepts presented so far can be used for a better solution. The main idea is to specify the ACL by its name instead of its id. For that we will need access to a table containing ACL names and their respective ids. Here external views can be used. We have already seen an external view exposing the ACL table to arveo:
@View
@Type(ObjectType.META)
@TableName("usrv_acl")
public interface AclView {
@Unique
String getName();
@PrimaryKey
long getId();
}
Since this exposes ACLs as arveo type instances, ACLs can be used for inheritance. And since ACL names are unique, they can be used as the target of a foreign key, in particular one defining inheritance. That way the actual ACL id can be inherited via a key that is an ACL name, for which we can easily define a default value that is stable across all environments:
@Type(ObjectType.CONTAINER)
public interface ContainerWithDefaultAcl extends WithData {
String DEFAULT_ACL_NAME = "default-container-acl"; (6)
@Id
ContainerId getId();
@Optional (7)
@ForeignKey(target = AclView.class, targetProperty = "name") (2)
String getAcl(); (1)
void setAcl(String acl);
@DefaultValue("acl")
default String defaultAcl() { (3)
return DEFAULT_ACL_NAME;
}
@Mandatory (7)
@InheritedProperty(foreignKeyPropertyName = "acl", sourcePropertyName = "id") (5)
@SystemProperty(SystemPropertyName.ACL_ID)
long getAclId(); (4)
void setAclId(long aclId);
}
1 | In our type we define a property ACL, that holds the name of the ACL. |
2 | This property is a foreign key that targets the name field of the table usrv_acl. |
3 | For this property we can easily specify a default value. |
4 | Now we specify the ACL property. |
5 | It is simply defined to be inherited by the foreign key to the ACL table. |
6 | It is good practice to store constant default values in constants. |
7 | Marking the ACL id as @Mandatory enforces that every instance of the entity must have an ACL. However, this does not need to be an inherited one (since the ACL name is marked @Optional), so the more cumbersome way - setting the ACL by its id - is still possible. Marking the ACL name property as @Mandatory would forbid this. |
Retention
Annotations @RetentionProtected
An object may be annotated as @RetentionProtected. This enables all further retention annotations listed below. Every retention-enabled object extends the data model by
-
Datetime Retention_Date: contains the fixed retention date in ZonedDateTime format
-
Boolean LitigationHold: stores the litigation hold property
The convenience class 'Retention_Info' contains both values and can be used to read the retention information with one call. |
Annotations @DefaultSystemPropertyValue(RETENTION_DATE)
It is possible to define a default value for the RETENTION_DATE system column (see Default Values).
If a retention date is not explicitly set, a default value for the retention period is calculated using the default value function implemented by the document type.
@DefaultSystemPropertyValue(SystemPropertyName.RETENTION_DATE)
default ZonedDateTime defaultDatum() {
return ZonedDateTime.now().plusYears(10);
}
The @RetentionProtected annotation is required if you want to set a default for RETENTION_DATE. |
If you have defined foreign keys, you can inherit the retention date from container or folder objects. This is very helpful if you have records in your data model (see Defaults and Inheritance). |
Examples
Document Type: 10 year retention period
The following example shows how to set the default retention date to the creation date + 10 years. It also shows how to set a default value for the property warrantyEnd based on the receiptDate + 3 years.
It is still possible to set the retention date and warrantyEnd when you upload the document, overwriting the default values. |
/*
* Copyright (c) 2020 EITCO GmbH
* All rights reserved.
*
* Created on 02.10.2020
*
*/
package de.eitco.ecr.system.test.types.defaultvalues;
import de.eitco.ecr.common.RetentionInformation;
import de.eitco.ecr.type.definition.annotations.ContentElement;
import de.eitco.ecr.type.definition.annotations.ObjectType;
import de.eitco.ecr.type.definition.annotations.OverwriteAllowed;
import de.eitco.ecr.type.definition.annotations.Type;
import de.eitco.ecr.type.definition.annotations.constraint.Mandatory;
import de.eitco.ecr.type.definition.annotations.constraint.SecondaryKey;
import de.eitco.ecr.type.definition.annotations.defaults.DefaultSystemPropertyValue;
import de.eitco.ecr.type.definition.annotations.defaults.DefaultValue;
import de.eitco.ecr.type.definition.annotations.system.Id;
import de.eitco.ecr.type.definition.annotations.system.RetentionProtected;
import de.eitco.ecr.type.definition.annotations.system.SystemProperty;
import de.eitco.ecr.type.definition.annotations.system.SystemPropertyName;
import org.springframework.http.MediaType;
import java.time.ZoneId;
import java.time.ZonedDateTime;
@Type(ObjectType.DOCUMENT)
@RetentionProtected
@ContentElement(name = "content", separateField = true)
@OverwriteAllowed
public interface DocumentWithDefaultRetention {
@Id
Object identifier();
@SystemProperty(value = SystemPropertyName.RETENTION_INFO)
RetentionInformation getRetentionInformation();
@SystemProperty(value = SystemPropertyName.RETENTION_DATE)
ZonedDateTime getRetentionDate();
void setRetentionDate(ZonedDateTime retentionDate);
@SystemProperty(value = SystemPropertyName.LITIGATION_HOLD)
Boolean getLitigationHold();
@SecondaryKey
String getName();
void setName(String name);
@Mandatory
ZonedDateTime getReceiptDate();
void setReceiptDate(ZonedDateTime receiptDate);
@Mandatory
ZonedDateTime getWarrantyEnd();
void setWarrantyEnd(ZonedDateTime warrantyEnd);
@Mandatory
String getMimeType();
void setMimeType(String value);
// Helper constants for the snake_case DB column names derived from the camelCase getter/setter names.
// Attention: you MUST use the snake_case DB column names in default value annotations! If a name is wrong, you will get a model exception during startup.
String DB_COL_WARRANTYEND = "warranty_end"; (1)
String DB_COL_MIMETYPE = "mime_type";
String DB_COL_RECEIPTDATE = "receipt_date";
String DB_COL_NAME = "name";
String DB_COL_RETENTIONDATE = "retention_date";
ZoneId ZoneIdEuropeBerlin = ZoneId.of("Europe/Berlin");
// set default values
@DefaultValue(DB_COL_WARRANTYEND)
default ZonedDateTime defaultWarrantyEnd() {
return getReceiptDate().withZoneSameInstant(ZoneIdEuropeBerlin).plusYears(3);
}
@DefaultSystemPropertyValue(SystemPropertyName.RETENTION_DATE)
default ZonedDateTime defaultRetentionDate() {
return ZonedDateTime.now(ZoneIdEuropeBerlin).plusYears(10);
}
@DefaultValue(DB_COL_MIMETYPE)
default String defaultMimeType() {
return MediaType.APPLICATION_OCTET_STREAM_VALUE;
}
}
(1) The annotation @DefaultValue() only accepts the database column name as a static string parameter. As the document type properties are camelCase and the database column names are snake_case, you must convert your property names, e.g. MyCamelCaseProperty = my_camel_case_property. In the example above, the constants are defined in the type. |
The retention annotations also work for the entity types container, folder and relation. |
Tenant separation
Objects can be separated by tenant. If this is activated on a type definition, the corresponding table will contain a system field tenant_id, which stores the id of the tenant an entity resides in.
The field is assigned the tenant id of the creating user and never changes. When entities are queried, a filter is added that matches only entities whose tenant_id field has the value of the querying user's tenant id.
Fallback-tenant
Should the creating user be in the fallback-tenant, the tenant_id field will be set to NULL. Should a querying user reside in the fallback-tenant, the query will not be filtered by tenant. This means that the fallback-tenant behaves like a view on the data that combines all tenants. If this is unwanted, deactivate the fallback-tenant using the multi-tenancy.mode configuration property.
Usage
The behaviour is primarily defined by the multi-tenancy.mode configuration property.
It can be controlled in a more fine-grained way by the annotation @TenantSeparation; however, this mostly makes sense in environments where multi-tenancy.mode is set to allowed.
@Type(ObjectType.CONTAINER)
@TenantSeparation(true)
public interface Person {
// property definitions ...
}
There are aliases specified: @TenantSeparated for @TenantSeparation(true) and @TenantAgnostic for @TenantSeparation(false).
The default behaviour depends on the multi-tenancy.mode configuration property:
If it is enforced, types are separated by tenant by default. Any type specified as not separated by tenant will cause arveo to fail at startup.
If it is disabled, types are not separated by tenant by default. Any type specified as separated by tenant will cause arveo to fail at startup.
Advanced db schema changes
Simple changes of the database schema like adding a new attribute are performed automatically by the system in maintenance mode. In some cases it might be required to perform more complex schema changes, which cannot be handled by the system automatically. The following changes cannot be performed automatically on tables that already contain data:
-
setting NOT NULL for an existing column;
-
type changes especially to non-string columns;
-
foreign keys;
-
making a column UNIQUE.
For example, changing the data type of an attribute is not supported because it usually requires project specific migration steps. Advanced changes like this can be performed by custom liquibase scripts.
To perform custom database schema migrations, arveo offers several ways to define custom liquibase migration scripts:
-
A global script that will be executed before the first type definition will be created or updated. This script can be configured using the property
ecr.server.liquibase.preInitializationChangeLog
. -
A global script that will be executed after the last type definition was created or updated. This script can be configured using the property
ecr.server.liquibase.customChangeLog
. -
A script for a specific type definition that will be executed before the type definition is created or updated. This script can be configured using the annotation
@PreSchemaInitialization
on the class representing the type definition. -
A script for a specific type definition that will be executed after the type definition was created or updated. This script can be configured using the annotation
@PostSchemaInitialization
on the class representing the type definition.
The values of the configuration properties for the global scripts and the annotations must be valid URIs pointing to a liquibase changelog script. The URIs can point to a filesystem resource (using file:/) or a classpath resource (using classpath:). Each script will be executed in every configured tenant.
Schema initialization steps
For a better understanding of how the schema initialization works, the following list shows the steps performed by the system at startup, for each tenant:
1. Create or update the system tables
2. Execute the custom pre-initialization changelog, if configured
3. For each registered type definition class:
   a. Execute the custom class-specific pre schema initialization script, if configured
   b. Create or update the type definition table(s)
   c. Execute the custom class-specific post schema initialization script, if configured
4. Execute the custom liquibase changelog, if configured
Note that the actions performed by the automatic schema initialization in step 3.b. can be influenced by the changes that were already performed by the custom scripts executed before. For example, the system will not try to create a new attribute if the custom script has already performed the required schema changes.
Example
The following example shows a type definition class that defines a custom script that will be executed before the type definition is updated. The script expects that the type definition table already exists on the database and is used to change the data type of the attribute postal_code from Long to String. Note that for the sake of simplicity, the script does not perform an actual data migration but simply drops and re-creates the database column for the attribute.
@Type(ObjectType.CONTAINER)
@Index(value = "my_container_name_index", onVersionTable = true)
@PreSchemaInitialization("classpath:liquibase/my-container-changelog.xml")
public interface MyContainer {
<?xml version="1.1" encoding="UTF-8"?>
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd"
logicalFilePath="my-container-changelog.xml">
<changeSet id="update-my-container-1" author="root">
<dropColumn tableName="my_container" columnName="postal_code"/>
<addColumn tableName="my_container">
<column name="postal_code" type="text"/>
</addColumn>
<dropColumn tableName="my_container_ver" columnName="postal_code"/>
<addColumn tableName="my_container_ver">
<column name="postal_code" type="text"/>
</addColumn>
</changeSet>
</databaseChangeLog>
Note that the script in the above example first updates the content of the type definition system tables to reflect the changed data type of the attribute postal_code of the type my_container. Doing this causes the automatic migration performed afterwards to ignore the change. Other changes in the type class would still be performed automatically, if possible. The script then simply drops and re-creates the column for the attribute. In a real-life scenario, this is the place where the actual data migration would happen.
Changes not checked during startup
The following changes in the type system will not be checked for:
-
Inheritance: Changing the source key or the source property of an inherited property is allowed. The system will accept it (and not even check it). This can have subtle consequences: the data of an entity created before such a change will remain as before. However, the next time the entity is updated, the inheritance will be computed anew and the data will change according to the new inheritance rule.
-
Formatted counter sequence names: Changing the name of the sequence of a formatted counter will take effect. This can have an impact on your application: it will result in the creation of a new sequence and effectively reset the counter's value. This might be the desired effect - it could also be the result of an oversight in the type changes. To protect oneself from accidental changes, it is deemed good practice to mark formatted counter fields with @Unique.
-
Indexes prefixed with ecr_mnl_: Indexes defined on tables belonging to ecr types will be created and deleted according to changes in the types. However, indexes whose names start with the prefix ecr_mnl_ are excluded from this. This enables admins to quickly react to slow systems without a system update interfering with such a patch. This has two consequences:
-
Admins that manually add an index should pick a name for the index that starts with ecr_mnl_.
-
Developers that add an index to an arveo type should pick a name that does not start with ecr_mnl_.
In a case where an index prefixed with ecr_mnl_ is in use, it will be beneficial in the long run to add the index to the type. In this case the prefix ecr_mnl_ must be omitted when defining the index on the type. |
Document Service
The Document Service is responsible for handling the various repository entities, such as documents and folders. The following entity types are supported: document, folder, container, relation and metadata.
The service stores the binary data belonging to the documents and delivers it on request. Various plugins are available for connecting storage devices and services. A plugin is assigned to a profile and configured there. When saving data, the client has to specify the profile to be used and thereby decides where the data will be stored.
Upload data
Content, annotations (see below) and metadata can be uploaded as a coherent document. Zero or more content elements of different content types are possible, and each content element is named. As a result, you get a globally unique ID (DocumentID), which can be used to reference the content, annotations and/or just the metadata of the latest version of the document. It is possible to clone content elements from one document to another, creating a copy of the content on the storage. For that, a ContentReference can be supplied when the document is created.
TypedDocumentServiceClient<SingleContentDocument> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(SingleContentDocument.class); (1)
SingleContentDocument document = serviceClient.createTypeInstance();
document.setName("some name");
TypedDocumentClient<SingleContentDocument> client = serviceClient.create(
new TypedDocumentInput<>(Map.of("content", (2)
new ContentUpload(inputStream)), document)); (3)
1 | The typeDefinitionServiceClient is an instance of TypeDefinitionServiceClient, that can be injected. |
2 | The type definition SingleContentDocument uses only the default content definition, hence the default name is used. |
3 | The actual content is passed as an InputStream. |
TypedDocumentServiceClient<TypedTargetDocument> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(TypedTargetDocument.class); (1)
TypedTargetDocument document = serviceClient.createTypeInstance();
DocumentContentReference reference = new DocumentContentReference(documentId, "content"); (2)
TypedDocumentClient<TypedTargetDocument> client =
serviceClient.create(new TypedDocumentInput<>(document, Map.of("content", reference)));
1 | The typeDefinitionServiceClient is an instance of TypeDefinitionServiceClient, that can be injected. |
2 | Here the documentId is the ID of an already existing document that uses the default content definition. |
Base64EncodedData data = new Base64EncodedData(base64Data); (1)
Map<String, ContentUpload> contentElements = Map.of("content", new ContentUpload(data)); (2)
serviceClient.create(new TypedDocumentInput<>(contentElements, document)); (3)
1 | Wrap the Base64 encoded data in a new Base64EncodedData instance |
2 | Create a content upload with the created Base64EncodedData |
3 | Upload the document |
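The Base64EncodedData wrapper above expects an already encoded string. As a self-contained sketch using only the JDK (class name is illustrative; this is not part of the arveo SDK), this shows how such a payload could be produced and how the receiving side restores the original bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64UploadSketch {
    public static void main(String[] args) {
        // Encode raw bytes to the Base64 string that would be passed to Base64EncodedData
        byte[] raw = "lorem ipsum".getBytes(StandardCharsets.UTF_8);
        String base64Data = Base64.getEncoder().encodeToString(raw);
        System.out.println(base64Data);

        // Decoding restores the original bytes
        byte[] decoded = Base64.getDecoder().decode(base64Data);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // lorem ipsum
    }
}
```

Note that Base64 inflates the payload by roughly a third, so for large files a plain InputStream upload (as in the first example) is preferable.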
Validating uploaded content
There are several different ways to validate the content of an uploaded document. The method to use depends on the requirements of the client application. Some applications might already have computed a hash of the content while others might offload this to the server.
Validating content on the client side
When content is uploaded to a type definition that supports content metadata, the server computes an SHA-256 hash for the received data and returns it in the result of the upload request. The client can use this hash value to compare the data received by the server with the original data. The following example shows how to compare the hash values:
ContentTest entity = client.create(input).getEntity(); (1)
Hash hash = entity.getContent().get("content").getHash(); (2)
Hash expectedHash = Hash.sha256Hash(inputStream, 1000000, tempFile); (3)
Assert.assertEquals(expectedHash, hash);
1 | The document is uploaded using a type definition service client |
2 | Get the hash returned from the server. getContent is a getter for the system property SystemPropertyName.CONTENT . |
3 | Use de.eitco.ecr.common.Hash to compute the expected hash |
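For orientation, the comparison above boils down to a plain SHA-256 hex digest, which the JDK can compute directly with java.security.MessageDigest. A minimal, self-contained sketch (class name and buffer size are illustrative; de.eitco.ecr.common.Hash is the SDK's own wrapper for this):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class HashSketch {

    // Stream the input through a SHA-256 digest and return the lowercase hex string
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A SHA-256 hex digest is always 64 characters long
        String hash = sha256Hex(new ByteArrayInputStream("abcde".getBytes()));
        System.out.println(hash.length()); // 64
    }
}
```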
The TypedDocumentServiceClient offers an additional method to validate uploaded content. The createAndValidate method automatically computes a hash of the uploaded data and compares it with the hash value returned from the server. If the two hashes do not match, a HashValidationException is thrown and the created document is purged.
TypedDocumentServiceClient<ContentTest> client = typeDefinitionServiceClient
.getDocumentServiceClient().byClass(ContentTest.class);
ContentUpload contentUpload = new ContentUpload(data);
Map<String, ContentUpload> content = Map.of("content", contentUpload);
ContentTest instance = client.createTypeInstance();
TypedDocumentInput<ContentTest> input = new TypedDocumentInput<>(content, instance);
client.createAndValidate(input);
Validating content on the server side
It is also possible to pass a hex representation of an SHA-256 hash code of the uploaded content to the server. If such a hash is present, the server will compare the computed hash value with the one specified by the client. If the values do not match, the upload fails and the uploaded file will not be stored.
Hash hash = Hash.sha256Hash(inputStream, 1000000, tempFile); (1)
ContentUpload contentUpload = new ContentUpload(
"lorem_ipsum.txt", (2)
null, (3)
null, (4)
data,
hash
);
Map<String, ContentUpload> content = Map.of("content", contentUpload);
ContentTest document = client.createTypeInstance();
TypedDocumentInput<ContentTest> input = new TypedDocumentInput<>(content, document);
client.create(input);
1 | Use de.eitco.ecr.common.Hash to compute the hash |
2 | The filename |
3 | null for the length, will be computed by the server |
4 | null for the content type, will be computed by the server |
Validating the content of an existing document
The TypedDocumentServiceClient
provides a method called hashMatches
that can be used to check if the content of an existing document is valid. The client has to provide the expected hash, the document’s ID and the name of the content element to check. An additional parameter called loadContent
defines if the server should use the hash value stored in the database or if it should load the content from the storage and compute a new hash value to compare. It is possible to check the content of a specific version of a document, too.
Hash hash = Hash.sha256Hash(inputStream, 1000000, tempFile);
boolean hashMatches = documentServiceClient.hashMatches(documentId, "content", hash, false);
Download data
Content, annotations and metadata of a document can be downloaded via API. It is possible to load the entire document as a multipart or a structure of the document that includes all metadata, annotations and a list of content elements with their IDs, types and identifiers. Each content element can then be loaded using the document ID / content ID or the document ID / content type. Access to individual content elements without a document ID is not possible for reasons of access control. Access control based on the document ID is ensured with every access.
Update metadata without a version
The meta information of a document can be changed, and the changes can be persisted in the database without creating a version. This makes it possible to maintain frequently changing information on a document quickly, without the overhead of creating a version. However, in the event of an audit, such changes are not traceable.
Delete an object
Documents contain one or more content elements which are not stored in the database but in the storage system. When a document is deleted using one of the delete-calls, the content elements will remain on the storage. To delete both the database entries and all content elements (including those referenced from older versions), a client can use the purge methods provided by the document clients.
A type definition can use the optional recycle bin feature. If it is enabled, entities of the type definition can be moved to and restored from the recycle bin. The Delete-API provides the following methods:
-
MoveToRecycleBin(): moves an object to the recycle bin. The DELETE property of the latest version is set to 1; content and older versions are not affected.
-
Delete(): all versions of the object are deleted from the database.
-
Purge(): all versions of the object are deleted from the database and the content objects or files are erased.
-
RestoreFromRecycleBin(): restores an object from the recycle bin; the DELETE property is set to 0.
If an object has relations to other objects or is referenced by other objects, the delete or purge method will fail with a foreign key exception. The Relation API provides methods to delete the relations (Remove Relations). |
Filter recycle bin
Entities in the recycle bin will be filtered from normal queries by default, but a client can compose search expressions that override this behavior. To do that it is sufficient to include a reference to the deleted system field in the expression. The following example shows a part of a query that will show only deleted entities:
....and().systemField(SystemFieldList.GeneralSystemField.Deleted.INSTANCE).equalTo().value(true)
Note that the deleted system field can contain null values, which have the same meaning as false. When a client uses one of the delete calls to delete one or more entities, all database entries for those entities will be deleted (including all versions).
There is no option to restore entities once they have been deleted. |
If there are relations between entities that are to be deleted, the relations are not deleted. Instead, a ForeignKeyException is thrown - and has to be handled by the caller.
Removing all relations of an entity
To delete all relations that originate from a certain entity, the method removeAllRelations() has to be used. The method returns the deleted relations:
List<Relation> removed = sourceContainerClient.removeAllRelations();
You can also delete all relations that point to a specific entity. For this, there is the method removeAllIncomingRelations(). This also returns the deleted relations:
List<Relation> removed = targetContainerClient.removeAllIncomingRelations();
Once all relations have been removed, the entity can also be deleted.
Locking
If your applications update objects from different processes at the same time, you must decide whether to use no locking or optimistic locking. No locking means that the latest update wins and overwrites the concurrent update. Depending on the database configuration, it can happen that one update becomes a deadlock victim and an exception is thrown. If optimistic locking is enabled for the document type, the API ensures that updates do not accidentally overwrite changes made by other clients. The feature is disabled by default and can be enabled by annotating a type class with @OptimisticLocking.
Example: two processes A and B load the same object, including content and versions, at the same time and get the same version of the document. Both processes then modify the document, change some metadata and add additional content. A is faster than B. With no locking, B overwrites the changes made by A. With optimistic locking, B cannot save its changes and receives a locking exception. Process B has to load the changes made by A and retry the operation.
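The load-modify-retry cycle that optimistic locking forces on process B can be simulated without the SDK. The following self-contained sketch (a hypothetical in-memory store, not arveo's implementation) uses a compare-and-set to mimic the version check: A's update succeeds, B's stale update is rejected, and B succeeds only after reloading:

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticLockingSketch {

    // A versioned entity snapshot; each successful update produces a new snapshot
    record Versioned(int versionNumber, String name) {}

    static final AtomicReference<Versioned> store = new AtomicReference<>(new Versioned(1, "initial"));

    /** Apply the update only if 'base' is still the current snapshot; otherwise reject it. */
    static boolean update(Versioned base, String newName) {
        return store.compareAndSet(base, new Versioned(base.versionNumber() + 1, newName));
    }

    public static void main(String[] args) {
        Versioned seenByA = store.get();
        Versioned seenByB = store.get(); // B loaded the same version as A

        boolean aSucceeded = update(seenByA, "changed by A");
        boolean bSucceeded = update(seenByB, "changed by B"); // stale base -> rejected

        System.out.println(aSucceeded + " " + bSucceeded); // true false

        // B reloads the current state and retries, as the text describes
        boolean retry = update(store.get(), "changed by B");
        System.out.println(retry + " " + store.get().name());
    }
}
```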
Download links for external users
You can create download links for content elements that can be used by external users who do not exist in the user management service. Such a link has an expiration date and can be used to download a single content element. The links are digitally signed using a configurable certificate, so that the receiver cannot alter the referenced content element or the expiration date of the link. The arveo service provides a special HTTP endpoint to process download links. This endpoint does not require authentication. Instead, the download link contains credentials that allow the user to access the referenced content element.
This feature must be activated by configuring a keystore containing an RSA keypair that will be used to sign the links. |
The configuration options are listed here.
Download links (or content access tokens) can be created using the Java SDK as shown in the following example:
@Autowired
private ContentAccessTokenResourceClient contentAccessTokenResourceClient;
ContentAccessTokenInput input = new ContentAccessTokenInput(
documentId,(1)
"content",(2)
ZonedDateTime.now().plusHours(3)(3)
);
String token = contentAccessTokenResourceClient.createToken(input);
1 | The ID of the document containing the content element to download |
2 | The name of the content element |
3 | The expiration date of the link (can be omitted) |
The returned token can then be used to download the content by performing a GET request to the following endpoint:
GET http://my-arveo-instance/api/streaming/<token>
Download links can only be created for content elements that are accessible to the user creating the link. When the client does not define an expiration date when the content access token is created, the configured maximum lifetime is used.
The creation of new tokens will fail when the client specifies an expiration date that would exceed the configured maximum lifetime. |
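The lifetime rules described above amount to a simple fallback-or-reject check. A self-contained sketch of this logic (the 24-hour maximum and the method names are assumptions, not arveo's actual configuration or API):

```java
import java.time.Duration;
import java.time.ZonedDateTime;

public class TokenExpirationSketch {

    // Assumed configuration value - arveo reads the maximum lifetime from its configuration
    static final Duration MAX_LIFETIME = Duration.ofHours(24);

    static ZonedDateTime effectiveExpiration(ZonedDateTime now, ZonedDateTime requested) {
        ZonedDateTime latestAllowed = now.plus(MAX_LIFETIME);
        if (requested == null) {
            return latestAllowed; // no expiration given: fall back to the configured maximum
        }
        if (requested.isAfter(latestAllowed)) {
            throw new IllegalArgumentException("expiration exceeds configured maximum lifetime");
        }
        return requested;
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now();
        // A 3-hour expiration is within the maximum and is used as given
        System.out.println(effectiveExpiration(now, now.plusHours(3)).equals(now.plusHours(3)));
        // A missing expiration falls back to the configured maximum lifetime
        System.out.println(effectiveExpiration(now, null).equals(now.plus(MAX_LIFETIME)));
    }
}
```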
Versioning
The goal of using the concept of versioning is to create and work with version-safe archives and track the history of each change in the system.
Versioning basics
All entity types in arveo can be versioned; versioning is optional. The attributes of an entity type specify in their definition whether they are versioned. If an entity type has at least one versioned attribute, a version table is created. The version number of an existing entity is created automatically and can be retrieved via the system property version_number.
The version table lists the version changes to the metadata as well as the changes to one or more content elements. Optionally, a Unicode version comment can be specified. Each version gets a version ID, which is unique within this bundle of version tables. The version ID allows a developer to retrieve content and metadata of exactly this version of the entity. Using the API, a developer can query all versions, including their metadata and content elements, for each entity ID or version ID. It is ensured that the existing content of a version is not changed or deleted by a new version; however, there is an exception to this rule that does allow overwriting a version change.
There is a function that allows you to make a change without having to note it in the version table. And there is a way to forbid this for a certain entity type. |
Implementation of versioning
The concept of versioning is implemented using the annotation @Versioned, which is defined by the interface Versioned. This annotation defines if an attribute of a type is versioned or not (when placed on a getter) or if all attributes of a type are versioned or not (when placed on a type). When the annotation is present on a type and on a getter in the type, the annotation on the getter wins.
The following example of an object of type Container contains an attribute "name", which is a versioned attribute. The other attribute "counter" in this example is marked as not versioned.
Example:
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface TypedSourceContainer {
@Name("counter")
@Versioned(false)
int getCounter();
@Name("name")
@Versioned
String getName();
}
Data model for versioning
The actual search table only contains the current state of metadata and system fields. The version table, however, lists all entities and their versions including the metadata. Only versioned attributes are included in the version table. An internal version counter (1, 2, …, n) is maintained in the system column version_number.
During versioning, the service increments the internal version counter by increasing the value of the system column version_number by 1. The value is stored in the version table.
Changes to non-versioned fields cannot be tracked because they are not written to the version table. To prevent accidental overwriting of such fields, optimistic locking can be activated. In this case, a version property lets the system detect that a client's copy of an entity is outdated.
Optimistic locking
Activating optimistic locking prevents overwriting of versioned fields. When two users edit an entity simultaneously and one tries to overwrite changes already saved by the other, an error is thrown; overwriting is thus not possible. Hence, by activating optimistic locking on an entity type definition (using the annotation @OptimisticLocking), you prevent data corruption.
Optimistic locking is used only for single updates, not for batch updates.
Structure of the version system table
The version system table consists of the following columns (this is not a complete list):
column | db data type | java data type | nullable? |
---|---|---|---|
version_id | bigserial | long | no |
entity_id | int8 | long | yes |
version_acl_id | int8 | long | yes |
modification_date | timestamp | ZonedDateTime | no |
modification_user_id | int8 | long | no |
version_comment | text | String | yes |
version_number | int4 | int | no |
In this table, version_id is the primary key. The foreign key entity_id references the corresponding entity table.
Version ID
The version ID has the following structure:
[12bit Tenant id][14bit Type Definition id][38bit Version id]
Here the tenant may be for instance a database scheme or a customer. It is followed by a type definition, for instance Container. The third part is the version id in the database. The composed version id is unique in arveo system.
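Assuming the 12/14/38-bit layout stated above, composing and decomposing such an ID is plain bit arithmetic. A self-contained sketch (class and method names are illustrative, not the arveo SDK):

```java
public class VersionIdSketch {

    // [12-bit tenant][14-bit type definition][38-bit version] packed into one 64-bit long
    static long compose(long tenantId, long typeDefinitionId, long versionId) {
        if (tenantId >= (1L << 12) || typeDefinitionId >= (1L << 14) || versionId >= (1L << 38)) {
            throw new IllegalArgumentException("component out of range");
        }
        return (tenantId << 52) | (typeDefinitionId << 38) | versionId;
    }

    static long tenantOf(long composed)         { return composed >>> 52; }
    static long typeDefinitionOf(long composed) { return (composed >>> 38) & ((1L << 14) - 1); }
    static long versionOf(long composed)        { return composed & ((1L << 38) - 1); }

    public static void main(String[] args) {
        long id = compose(3, 17, 123456789L);
        System.out.println(tenantOf(id) + " " + typeDefinitionOf(id) + " " + versionOf(id)); // 3 17 123456789
    }
}
```

Because the three components share one long, the composed version ID is unique across tenants and type definitions, as the text states.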
Search language
Concept
Any client application that needs a search function can use the Search Service with a suitable type parameter. An example of such an implementation is the class DocumentServiceClient in the Client API. The search queries are always formulated in the same way; what differs is the search result, which is always typed. In arveo the type is Entity.
Technical implementation
Search Service is part of the module 'commons'. It was created to enable more convenient searching. The Search Service works on the basis of EQL (Eitco Query Language). This query language is also used for some other services, like Access Control Service. The main interface is SearchService. It is a functional interface, providing just one method to be implemented: search(). However, this functional interface has a variety of convenience methods, enabling faster and more convenient search, like firstResult(), uniqueResult(), count(), stream() and others.
Page<EntityType> search(@NotNull SearchRequest searchRequest);
The method accepts a search request as its only parameter and returns a Page of results. A Page has a page definition, a completeCount and a parameterized list of results. The Search Service also provides a method where() with a condition builder that filters results based on a specific condition.
SearchServiceFactory is a server class, which builds search queries. It has methods for creating an instance of search service for Documents (searchServiceForDocument()), but also for all the other entities, including Metadata. The result of the search is transformed into a Document (or respectively another entity) by the DocumentMapper.
The class SearchResourceImplementation provides an API for searches that are not bound to one and only one type definition.
The interface SearchService is implemented by the class EcrSearchService.
The search client creates different search services, which can be used to search for corresponding entities, for instance a folder search service, a document search service and so on. And there is also a GenericUnionSearchService, that can be used to create any joins on search statements.
Usage
The following example demonstrates the usage of the Search Service to retrieve an object page.
SearchService<Object> searchService = <a valid instance>;
Page<Object> objectPage = searchService.where()
.contextReference("field").equalTo().value(7).or()
.contextReference("other_field").greaterEqual().contextReference("another_field")
.holds()
.order().descendingBy("field").from(5).pageSize(7);
It is possible to check the type of object searched for:
searchService.where() (1)
.entity().typeId() (2)
.equalTo()
.typeId(NamedFile.class) (3)
.or()
.entity().typeName() (4)
.in().expressions(x -> x
.typeName(NamedTextFile.class) (5)
.typeName(NamedFolder.class)
).and()
.entity().typeId().notEqual().typeId("named_relation") (6)
1 | The variable searchService is an EcrSearchService . |
2 | The id of the type of given entity is referenced by the method typeId() . |
3 | The type id is checked to be the id of the type defined by the class NamedFile (which is obtained by the method typeId() ). |
4 | Here the type name is referenced instead of the type id. |
5 | As with the type id, the name of the type defined by the class NamedTextFile is obtained. |
6 | The type id can also be obtained if only the type name is given. |
Search endpoints
Using the ecr sdk you will be able to obtain a SearchClient by Spring injection.
@Autowired
private SearchClient searchClient;
A search client has several methods to search in different ways or different contexts.
Aggregation searches
In some situations one needs to accumulate values that are listed in a database. In SQL this is done using aggregate functions and the group by clause. For example, in an invoice archive one might be interested in the number of invoices per customer, or the sum of their totals (per customer).
Queries like this can be executed using the aggregated search. As opposed to the other search methods, the result entity type of this search method is Map<String, Object>, since aggregating properties will potentially result in a different type - one that might not be specified. Thus, a more general return type is used.
To start an aggregated search query, you will need to build a search service for your aggregated search first. We will build a service for the example above: querying the number and total sum of invoices per user.
Assume that we have a type customer defined by the class Customer and a type invoice defined by the class Invoice:
@Type(ObjectType.DOCUMENT)
@FilingEnabled
public interface Customer {
@SystemProperty(SystemPropertyName.ID)
DocumentId id();
@Unique
String getName();
void setName(String name);
}
@Type(ObjectType.DOCUMENT)
@FilingEnabled
public interface Invoice {
@ForeignKey(name = "fk_invoice_customer", target = Customer.class, targetProperty = "id")
long getCustomerNumber();
void setCustomerNumber(long number);
@Optional
String getCustomerName();
void setCustomerName(String customerName);
@Optional
String getName();
void setName(String name);
@Optional
Integer getTotal();
void setTotal(Integer total);
@Optional
Boolean getOpen();
void setOpen(Boolean open);
}
As you can see, the invoice references the customer with the property customer_number defined by the method getCustomerNumber(). Now we can build a search service as follows:
final EcrSearchService<Map<String, Object>> aggregationSearchService = searchClient.aggregate() (1)
.count("i", "id").as("invoice_count") (2)
.sum("i", "total").as("invoice_total")
.groupedBy("c", "name").as("customer")
.from().type(Invoice.class).as("i").join().type(Customer.class).as("c") (3)
.on().alias("i").field("customer_number").equalTo().alias("c").id() (4)
.holds().build();
1 | Calling SearchClient.aggregate() is the entry point to the fluent api to build a search service for aggregation search requests. |
2 | At first, we need to specify what to aggregate and what to group by: In our case we want to get the count (of the invoice ids) and the sum of the invoice totals grouped by customer name (which is unique). Every field that is grouped by will also be part of the result. |
3 | Now we need to specify from where the data to aggregate comes from. We join the type Invoice with the type Customer . Note that we specify aliases for the types "i" and "c" , which we used in the step before to reference the types fields. |
4 | Now we specify the condition for the join. The condition is that the invoices customer_number must equal the customers' id - as the foreign key fk_invoice_customer above specifies. |
Now we can query for the aggregated data:
final List<Map<String, Object>> all = aggregationSearchService.where().alwaysTrue().holds().unpaged();
This will result in a list of maps - one map per customer - where every map contains the keys "customer", "invoice_count" and "invoice_total", holding the customer's name, the number of their invoices and their total sum, respectively.
Additionally, we can query specific customers and invoices using the same search service. In this scenario, for example, we could query every customer's "invoice_count" and "invoice_total" of invoices that are open, i.e. that they haven’t paid yet:
final List<Map<String, Object>> open = aggregationSearchService.where()
.alias("i").field("open").equalTo().value(true)
.holds().unpaged();
This will again result in a list of maps - one map per customer - with the keys "customer", "invoice_count" and "invoice_total", holding the customer's name, the number of their invoices and their total sum, respectively - this time counting only open invoices.
Note that we can reference the invoice field open by using the alias i we provided earlier, even though it is not part of the result.
Enterprise Search
NoSQL Document Database Apache Solr 8.6
Apache Solr is a search server and is used as an independent full-text search server for ECR Healthcare. Solr uses the Apache Lucene search library as the core for full-text indexing and search.
Retention periods
arveo supports a range of retention management features:
-
Full support of document life cycle;
-
Supports prolongation and litigation hold for data retention managers;
-
Privileged delete before retention expires;
-
Privileges for data protection officers (delete) and data protection managers (litigation);
-
Flexible storage container definition (e.g. months, years) for documents with identical retention period (S3 buckets or file system folders);
-
Fast erasure of storage container by asynchronous delete jobs.
Concept
arveo is able to store content with a fixed retention date to ensure that the legal or tax relevant retention period of a document is taken into account and the content is protected from deletion. You can configure retention rules for arveo document types and automatically apply the appropriate retention period to uploaded documents.
If some of your documents could be required in a legal proceeding but the retention period expires before the end of dispute you can set a litigation hold or prolong the retention period to protect the data until the dispute has finished.
Let us describe why the storage container concept is used by arveo. Most storage systems can create objects much faster than they can delete them. Once the retention has expired it is much faster to remove a bucket (cloud storage) or partition/directory (file system). You can setup retention rules to define which documents are stored to the containers. All documents within a certain retention range (e.g. 1 year or 3 months) will be stored to one storage container (S3 bucket or directory). arveo allows you to delete millions of content objects in a very short time by simply removing the entire storage container.
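The mapping from retention requirements to containers can be pictured as a simple naming rule. The following self-contained sketch (the container names and the one-container-per-year granularity are assumptions; real mappings are configured via Bucket Organizer rules) illustrates the idea:

```java
import java.time.ZonedDateTime;

public class RetentionContainerSketch {

    // Map a document's retention requirements to a container name (one container per year here)
    static String containerFor(ZonedDateTime retentionDate, boolean litigationHold) {
        if (litigationHold) {
            return "litigation-hold"; // kept until the hold is removed, never bulk-erased
        }
        return "retention-" + retentionDate.getYear();
    }

    public static void main(String[] args) {
        ZonedDateTime retention = ZonedDateTime.parse("2031-06-30T00:00:00Z");
        System.out.println(containerFor(retention, false)); // retention-2031
        System.out.println(containerFor(retention, true));  // litigation-hold
    }
}
```

Once the year 2031 has passed, the entire "retention-2031" container (bucket or directory) can be dropped in one step instead of deleting millions of objects individually.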
If a document needs to be deleted, e.g. for data privacy reasons, arveo also provides an API call to erase single objects by their ID. To delete an object before its retention period has expired, the user needs the dataprivacy_admin privilege in addition to the delete_right.
Because the new legal data privacy / protection act makes it necessary to erase data even before the expected retention period has expired, arveo does not use hardware retention features, which protect data from erasure on the hardware level. arveo protects the content by software design: it stores the retention information in the database and only allows access to the content and metadata through the arveo REST API. The REST API prevents any delete operation before the retention period has expired. As only arveo and highly authorized administrators have write access to the database and the storage, content cannot be deleted or manipulated before the retention period expires.
The operator must take appropriate technical or organizational measures to ensure that the data is stored in the storage in such a way that it cannot be changed within the legally prescribed retention period. The provider of the arveo services should ensure that only authorized data protection officers and administrators have data write (INSERT, UPDATE, DELETE) permissions for the database and the content repository. |
Storage container and document life cycle
Since deleting large amounts of documents is a performance critical task, the arveo repository service provides special support for mass deletion of documents whose retention period has expired.
The basic idea is to define separate storage locations, which are exclusively used to store documents with similar retention requirements. The deletion of documents with specific retention requirements is then a matter of deleting all contents of a specific storage location in one step. Storage locations containing documents with the same retention period will be called storage container for the rest of this section.
arveo allows you to store data with the same retention in one storage container and is able to create storage containers automatically.
The storage containers are either folders (file system storage) or buckets (S3 object storage). The actual selection of the storage container for a document with specific retention requirements can be configured by rules, that select the storage container based on the retention period and litigation hold status of the uploaded document.
When the litigation hold is set, the object is moved to the litigation hold directory or bucket and will not be deleted when the initial retention period expires. When the litigation hold ends, the document is deleted the next time a delete job runs. The number of objects under litigation hold is typically small and does not affect the overall erasure performance.
When a litigation hold is removed, the objects are moved to another storage container that does not have a litigation hold on it.
The following diagram shows the life cycle of a document with a fixed retention period set on upload, a legal dispute and automatic erasure at the end of the document’s life cycle:
Each storage container in fact corresponds to a separate storage profile that is used to store the contents of that storage container. The rules that are used to map the retention requirements of documents to storage container are defined as rules for the Bucket Organizer Plugin, see Bucketorganizer.
Litigation hold
arveo provides a system property LITIGATION_HOLD that allows you to prolong the retention until you remove the litigation hold property.
This function requires the DATAPRIVACY_ADMIN privilege. |
Prolongation
You can prolong the retention period but not shorten it. You can use the API call to set the initial retention period if the retention is null. When the retention is prolonged, arveo moves the object to the appropriate storage container.
This function requires the DATAPRIVACY_ADMIN privilege. |
Erase a document
The arveo delete API deletes the respective objects, just as it does for all other objects without a retention period. See also Deletion of objects and Recovery table.
After the retention period has expired, the function requires the DELETE privilege, but before the retention period has expired, DATAPRIVACY_PRIVILEGED_DELETE privilege is required. |
This API should not be used for operations like deleting the objects of a certain year. This should be done using the erasure storage container API. |
Erase storage container
If you have used the storage container feature to speed up the deletion of documents at the end of their life cycle, you can delete all documents within a retention period range with one API REST call 'EraseStorageContainer'.
You can either erase the storage containers (buckets, folders) manually through your operating team or with an automated arveo job. You can set up a scheduled job in the arveo integration service: use the erasure storage container template job and adapt it to your needs. The erasure job deletes all entities of a document type within the given retention period range where litigation hold is not set. The job writes an entry for each erased object into the corresponding audit log table. For a more detailed explanation, see the erasure job template example.
Mass deletion of documents under retention requires the SUPER_USER privilege. |
Enable the audit log feature for all document types and dependent document types if you need a report of the erased objects. Audit Log |
Grant the deletion right for your storage containers to arveo. If arveo cannot delete the containers, your operating team is in charge of this task and you must set the option delete rows only. |
Privileges & roles
Privilege | DATAPRIVACY_ADMIN (Data Protection Manager) | DATAPRIVACY_PRIVILEGED_DELETE (Data Protection Officer) | SUPER_USER (Data Protection Administrator) |
---|---|---|---|
Prolongation | yes | no | no |
Litigation Hold | yes | no | no |
Delete before retention | no | yes | no |
Mass Delete | no | no | yes |
Examples
Create document with retention and set litigation hold
public void createDocumentWithRetention() throws IOException {
final String TEST_IDENTIFIER = "SetLitigationHold test timestamp in ms=";
final String TEST_DATA = "abcde";
final String TEST_DATA_MIMETYPE = MediaType.APPLICATION_OCTET_STREAM_VALUE;
TypedDocumentServiceClient<DocumentWithRetention> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DocumentWithRetention.class);
ZonedDateTime now = ZonedDateTime.now(ZoneOffset.UTC);
DocumentWithRetention newDocument = serviceClient.createTypeInstance();
newDocument.setName(TEST_IDENTIFIER + System.currentTimeMillis());
newDocument.setReceiptDate(now);
newDocument.setMimeType(TEST_DATA_MIMETYPE);
newDocument.setRetentionDate(now);
ByteArrayInputStream data = new ByteArrayInputStream(TEST_DATA.getBytes());
Map<String, ContentUpload> content = Map.of("content", new ContentUpload(data));
TypedDocumentClient<DocumentWithRetention> newClient = serviceClient.create(new TypedDocumentInput<>(content, newDocument));
Assert.assertEquals(IOUtils.toByteArray(newClient.readContent("content")), TEST_DATA.getBytes());
DocumentWithRetention loadedDocument = newClient.getEntity();
Assert.assertNotNull(loadedDocument);
Assert.assertTrue(loadedDocument.getName().startsWith(TEST_IDENTIFIER));
Assert.assertEquals(loadedDocument.getMimeType(), TEST_DATA_MIMETYPE);
assertDateEquals(loadedDocument.getReceiptDate(), now);
assertDateEquals(loadedDocument.getRetentionInformation().getRetentionDate(), now);
Assert.assertFalse(loadedDocument.getRetentionInformation().isLitigationHold());
// set LitigationHold = true
newClient.updateLitigationHold(true);
newClient = newClient.reload();
DocumentWithRetention litigationOnDocument = newClient.getEntity();
Assert.assertTrue(litigationOnDocument.getRetentionInformation().isLitigationHold());
// set LitigationHold = false
newClient.updateLitigationHold(false);
newClient = newClient.reload();
DocumentWithRetention litigationOffDocument = newClient.getEntity();
Assert.assertFalse(litigationOffDocument.getRetentionInformation().isLitigationHold());
}
Set retention / prolong retention
public void createDocumentWithoutRetention() throws IOException {
final String TEST_IDENTIFIER = "SetRetention test timestamp in ms=";
final String TEST_DATA = "abcde";
final String TEST_DATA_MIMETYPE = MediaType.APPLICATION_OCTET_STREAM_VALUE;
TypedDocumentServiceClient<DocumentWithRetention> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DocumentWithRetention.class);
// store document without retention
DocumentWithRetention newDocument = serviceClient.createTypeInstance();
newDocument.setName(TEST_IDENTIFIER + System.currentTimeMillis());
newDocument.setReceiptDate(ZonedDateTime.now());
newDocument.setMimeType(TEST_DATA_MIMETYPE);
ByteArrayInputStream data = new ByteArrayInputStream(TEST_DATA.getBytes());
Map<String, ContentUpload> content = Map.of("content", new ContentUpload(data));
TypedDocumentClient<DocumentWithRetention> newClient = serviceClient.create(new TypedDocumentInput<>(content, newDocument));
Assert.assertEquals(IOUtils.toByteArray(newClient.readContent("content")), TEST_DATA.getBytes());
DocumentWithRetention emptyRetentionDocument = newClient.getEntity();
RetentionInformation retentionInformation = emptyRetentionDocument.getRetentionInformation();
Assert.assertNotNull(retentionInformation);
Assert.assertNull(retentionInformation.getRetentionDate());
Assert.assertFalse(retentionInformation.isLitigationHold());
// set initial retention
ZonedDateTime initialRetentionDate = ZonedDateTime.now();
emptyRetentionDocument.setRetentionDate(initialRetentionDate);
TypedDocumentClient<DocumentWithRetention> initialRetentionClient = newClient.updateAttributes(emptyRetentionDocument);
DocumentWithRetention initialRetentionDocument = initialRetentionClient.getEntity();
assertDateEquals(initialRetentionDocument.getRetentionInformation().getRetentionDate(), initialRetentionDate);
// prolong retention
ZonedDateTime prolongedRetentionDate = ZonedDateTime.of(2050, 1, 1, 0, 0, 0, 0, ZoneId.of("Europe/Berlin"));
initialRetentionDocument.setRetentionDate(prolongedRetentionDate);
TypedDocumentClient<DocumentWithRetention> prolongedRetentionClient = initialRetentionClient.updateAttributes(initialRetentionDocument);
DocumentWithRetention prolongedRetentionDocument = prolongedRetentionClient.getEntity();
assertDateEquals(prolongedRetentionDocument.getRetentionInformation().getRetentionDate(), prolongedRetentionDate);
}
Retention cleanup job
The retention cleanup job can be used to remove entities with an expired retention period that are not currently in litigation hold status. The job can be triggered to run in the internal job scheduler of the repository service or in a separate instance of the job service. It expects two configuration parameters to be present in the job context of the triggered execution:
-
type-definition-name
: The name of the type definition that contains the entities to remove. -
retention-cleanup-retention-end-time
: The point in time at which the retention period must have expired. All entities whose retention period expired before the specified time will be removed. The specified time must be in the past.
On systems supporting multiple tenants, the tenant to run the job for must be configured using the tenant property.
|
The following optional properties can be set in the context of the triggered execution:
-
retention-cleanup-asynchronous
: Enables the asynchronous mode of the job. The asynchronous mode is described below. -
retention-cleanup-batch-size
: The size of a single batch of entities to process. The default is 1000 and the maximum is 10000. -
retention-cleanup-filter
: An optional filter in the form of an EQLExpression<Boolean>
to apply to the query used to find entities with an expired retention period. -
retention-cleanup-maximum-queue-size
: The maximum acceptable size of the message queue used by the job in asynchronous mode. If the queue size exceeds the configured maximum, the job will stop. The job will check the queue size once a minute. -
retention-cleanup-duration
: An optional maximum duration of the job’s runtime. If the duration is exceeded, the job will stop. The default is null (no limit). The value must be a java.time.Duration
. -
retention-cleanup-max-entity-count
: The maximum number of entities to process. If this number is reached, the job will stop. The default is -1 (no limit). -
retention-cleanup-protocol-file
: Optional property that can contain a fully qualified path to a file that will contain a list of all deleted entity IDs.
The retention-cleanup-maximum-queue-size limitation mechanism relies on statistics data for the JMS queues that is collected by a separate system job. This job is not enabled by default; to use this feature, enable it by setting the property ecr.server.jobs.jms-statistics.enabled=true.
|
All properties except the type definition name, the retention end time and the protocol file can be configured in the configuration file of the service, either globally or for each type definition. See the configuration reference for details.
Asynchronous mode
In the asynchronous mode, content is not deleted from the storage immediately. Instead, a message queue is used to delete
the content asynchronously. The database entries for the documents are not removed but marked as deleted using the
COMPLIANCE_DELETED
field.
The entries that were marked as deleted are automatically excluded from query results. It is not possible to read those entries using the arveo client API.
The arveo client API can be used to delete all entities that were marked as deleted as shown in the following example:
Expression<Boolean> expression = EcrQueryLanguage.condition()
.entity().systemField(SystemFieldList.GeneralSystemField.ComplianceDeleted.INSTANCE)
.equalTo().value(true).holds();
serviceClient.delete(expression, 1000); (1)
1 | Set a limit to reduce database load |
Triggering the job
Both the repository service and the job service offer an API that provides methods to create triggers for the job. The following example shows how to use this API to create a simple trigger that will fire once at a specified time. The API requires administrator privileges.
EcrSchedulerResourceClient schedulerClient = systemManagementClient.getSchedulerClient();
JobKeyModel jobKey = new JobKeyModel(SystemJobIdentities.ECR_JOBS_GROUP, SystemJobIdentities.RETENTION_CLEANUP);
TriggerKeyModel triggerKey = new TriggerKeyModel(SystemJobIdentities.ECR_JOBS_GROUP, "test-trigger-retention-cleanup");
SimpleTriggerModel trigger = new SimpleTriggerModel(triggerKey, jobKey);
trigger.setNextFireTime(ZonedDateTime.now());
trigger.setJobDataMap(Map.of(
SystemJobDataKeys.TYPE_DEFINITION_NAME, SimpleInvoiceNames.getTypeDefinitionName(),
SystemJobDataKeys.RETENTION_CLEANUP_RETENTION_END_TIME, ZonedDateTime.now(),
SystemJobDataKeys.TENANT, "master",
SystemJobDataKeys.RETENTION_CLEANUP_PROTOCOL_FILE, getTargetDir() + File.separator + "retention-cleanup-job.log"
));
schedulerClient.scheduleSimpleTrigger(trigger);
The API provides additional methods to create cron expression based triggers and to unschedule a job.
REST API
Client SDKs
The client SDKs provide APIs for applications using arveo. SDKs exist for both Java and TypeScript. Client applications should not use the REST API of arveo directly but instead use one of the provided SDKs.
JSON serialization
arveo uses a custom serialization for the JSON data in the REST API to support advanced features like polymorphism. Additionally, the custom serialization allows the arveo server and the client SDKs to pass type information. This way it is, for example, possible to distinguish between number types like short, int and long. The client SDKs take care of the serialization, and direct usage of the REST API is discouraged.
If it is necessary to (de-)serialize the custom JSON data, use the preconfigured Jackson ObjectMapper that is used by the server and the SDKs. This ObjectMapper is equipped with mixin types that contain information about how to (de-)serialize the custom JSON content. The internal ObjectMapper can be obtained by injecting an instance of de.eitco.commons.spring.web.json.AsdlObjectMapperHolder.
The service offers an overview page containing the REST resources and details about the models. It can generate examples for the models, too. The overview page is located at the root URL of the service.
Type information
Each object contains a type identifier in a JSON property called @type
. The required value is listed in the API
overview page for each model class. Example:
"identifier": {
"@type": "container-id",
"identifier": {
"@long": "1"
}
}
Type information for data types
There are some special type identifiers used to identify the type of JSON fields.
The following table lists types and their corresponding identifiers.
Type (Java) | Identifier |
---|---|
Byte | @byte |
Short | @short |
Long | @long |
BigInteger | @big-int |
Float | @float |
Instant | @utc-date-time |
ZonedDateTime | @zoned-date-time |
Class<?> | @type-reference |
UUID | @uuid |
byte[] | @binary |
LocalDate | @date |
LocalTime | @time |
Other data types do not require specific type identifiers.
The following example shows a special type identifier:
"retentionDate": {
"@zoned-date-time": "2020-12-15T15:52:21.5193002+01:00[Europe/Berlin]"
}
Collections
To distinguish between different types of collections (lists and sets) there are type identifiers for collection types.
Type (Java) | Identifier |
---|---|
List | @list |
Set | @set |
The following is an example of the type List:
"list": {
"@list": []
}
Java SDK
The SDK contains the general API for accessing arveo. The SDK can be used both to access arveo via HTTP and to use arveo as an embedded library.
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-sdk-http</artifactId>
<version>${ecr.version}</version>
</dependency>
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-embedded</artifactId>
<version>${ecr.version}</version>
</dependency>
The SDK offers both a generic API, where attributes of objects are mapped as a generic map, and a typed API. The typed API uses project-defined classes that represent the objects and their attributes. The main entry point for the API is the class de.eitco.ecr.sdk.TypeDefinitionServiceClient. An instance of this class can be obtained using Spring dependency injection. With the methods
-
getDocumentServiceClient()
-
getContainerServiceClient()
-
getFolderServiceClient()
-
getRelationServiceClient()
-
getMetaDataServiceClient()
you obtain a client factory that can be used to create a service client for a specific type definition. This service client can then be used to create new objects or load existing ones. For created or loaded objects, you in turn receive an entity client that offers methods for accessing the object. Special version clients are also available for concrete versions of entities.
Using the SDK in a non-web application
The SDK can be used both in applications that provide web functionality like REST endpoints and in applications that do not contain any web functionality. For non-web applications, some differences need to be considered.
Dependencies
By default, the SDK contains an OAuth2 client implementation that relies on some web-related spring beans. For non-web applications, a different OAuth2 client implementation is available. The default implementation needs to be excluded from the SDK dependency and replaced by the non-web implementation as shown in the following example:
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-sdk-http</artifactId>
<version>${ecr.version}</version>
<exclusions>
<exclusion>
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client-non-web</artifactId>
<version>${commons-oauth2-version}</version>
</dependency>
The current version of the OAuth2 client can be found in the Nexus.
Application initialization
The SDK contains some dependencies that cause Spring to initialize some web functionality automatically. This can cause problems like missing Spring Security configuration errors. Non-web applications can simply turn off all of Spring's web functionality by using the SpringApplicationBuilder class as shown in the following example:
@SpringBootApplication
public class MyApplication {
public static void main(String[] args) {
new SpringApplicationBuilder(MyApplication.class)
.web(WebApplicationType.NONE)
.run(args);
}
}
Batch Operations
The SDK provides various methods for batch operations. For example, several objects can be created or updated at once.
Create, update or delete multiple objects of the same type
All service clients provide methods for creating, updating and deleting multiple objects. Since a service client is bound to a specific type definition, only objects of the same type can be created, updated or deleted in this way. The objects to be updated or deleted are identified by an arbitrary selector. For updates, there are methods that return the updated objects and methods that return only the number of updated objects. Especially when a large number of objects is updated at once, only the latter methods should be used. With these methods, all objects receive the same changes; if each object is to be updated individually, the methods from the BatchOperationServiceClient (see below) must be used.
Create or update several objects of different types
The BatchOperationServiceClient class provides methods to create or update multiple objects of different types.
Create several interdependent objects
To create multiple objects of different types, special BatchCreateInput input objects are used that bundle the type of the object and its properties. The order in which the objects are created corresponds to the order in which the input objects are passed. Each of these input objects contains a virtual ID that identifies it within the batch operation. In this way, for example, a relation as well as its source and target can be created in a batch operation. The relation only has to be created with the virtual IDs of source and target.
If the relation between the objects consists not only of the ID but also of a foreign key to an arbitrary attribute, a reference to the corresponding attribute of the referenced object must be passed to the dependent object. For this purpose, the class BatchAttributeReference is available, which bundles the name of the foreign key attribute, the referenced attribute and the virtual ID of the other object in the batch operation. Code examples can be found in the class de.eitco.ecr.system.test.BatchCreationIT.
Update multiple objects of different types
The BatchOperationServiceClient also provides methods to update several different objects of different types in a batch operation. A separate input object is passed for each object to be updated, which contains the ID of the object and the properties to be updated. This means that individual changes can also be made to each object with these methods. The BatchUpdateUtility class provides auxiliary methods with which the respective input objects can be created. Code examples can be found in the class de.eitco.ecr.system.test.BatchUpdateIT.
Automatic update in case of collision
The BatchCreateInput objects used to create various types make it possible to automatically update the existing object in the event of a collision. To do this, the BatchCreateInput only has to be made aware of the field on which the collision could occur:
TypedContainerBatchCreateInput<Person> containerBatchCreateInput =
new TypedContainerBatchCreateInput<>(new TypedContainerInput<>(person), List.of());
containerBatchCreateInput.setCollisionCheckAttribute("first_name");
In the above example, a container is to be created in a batch where a collision could possibly occur on the attribute
first_name
.
The attribute that is to be used to detect the collisions must be provided with a unique constraint.
Create or update (upsert) operations
The SDK provides methods to perform create or update (upsert) operations on entities. The entity to update, if it exists, is identified by an EQL selector. If a matching entity is found, it is updated using the provided data. If no matching entity is found, the provided data is used to create a new entity. The selector must match exactly one or zero existing entities; if it matches more than one entity, an exception is thrown. The following example shows how to perform an upsert operation.
TypedContainerServiceClient<Person> serviceClient =
typeDefinitionServiceClient.getContainerServiceClient().byClass(Person.class);
LocalDate birthday = LocalDate.of(1995, Month.SEPTEMBER, 16);
Person person = serviceClient.createTypeInstance();
person.setBirthday(birthday);
person.setSurname("Smith");
person.setFirstName("John");
person.setBreakTime(LocalTime.NOON);
person.setProcedureDate(ZonedDateTime.now());
TypedContainerClient<Person> client = serviceClient.createOrUpdate(
EcrQueryLanguage.condition().entity().field(PersonNames.FIRST_NAME).equalTo().value("John").holds(), (1)
person
);
1 | The selector that uniquely identifies the entity to update |
Generic batch operations
The generic batch operation API can be used to perform different operations like create, update or delete in one transaction. The batch operations use the same input types as the other batch functions described above, which makes it possible to use the result of one operation in a subsequent operation. The following operation types are available:
Read operations
-
TypedContainerBatchReadOperation
-
TypedDocumentBatchReadOperation
-
TypedFolderBatchReadOperation
-
TypedMetaDataBatchReadOperation
-
TypedRelationDataBatchReadOperation
The purpose of read operations is to provide input data for other operations. For example, a read operation could be used to read an entity of which only the ID is known, and then use the entity’s attribute values as input for a create operation. When the entity cannot be read, the entire batch of operations fails and the transaction is rolled back.
Delete operations
-
TypedContainerBatchDeleteOperation
-
TypedDocumentBatchDeleteOperation
-
TypedFolderBatchDeleteOperation
-
TypedMetaDataBatchDeleteOperation
-
TypedRelationBatchDeleteOperation
Delete operations are used to delete a single entity. Unlike the other operations, it is not possible to reference a delete operation. When the entity cannot be deleted, the entire batch of operations fails and the transaction is rolled back.
Update operations
-
TypedContainerBatchUpdateOperation
-
TypedDocumentBatchUpdateOperation
-
TypedFolderBatchUpdateOperation
-
TypedMetaDataBatchUpdateOperation
-
TypedRelationBatchUpdateOperation
Update operations are used to update a single entity. When the entity cannot be updated, the entire batch of operations fails and the transaction is rolled back.
Create or update operations
-
TypedContainerBatchCreateOrUpdateOperation
-
TypedDocumentBatchCreateOrUpdateOperation
-
TypedFolderBatchCreateOrUpdateOperation
-
TypedMetaDataBatchCreateOrUpdateOperation
-
TypedRelationBatchCreateOrUpdateOperation
Create or update operations perform an upsert as described in Create or update (upsert) operations. When the operation cannot update or create the entity, the entire batch of operations fails and the transaction is rolled back.
Create operations
-
TypedContainerBatchCreateOperation
-
TypedDocumentBatchCreateOperation
-
TypedFolderBatchCreateOperation
-
TypedMetaDataBatchCreateOperation
-
TypedRelationBatchCreateOperation
Create operations are used to create a new entity. When the entity cannot be created, the entire batch of operations fails and the transaction is rolled back.
Examples
The first example implements a solution for the following problem: An invoice was archived with a relation to an invalid customer. The customer must be replaced with a new customer and the reference in the invoice must be updated.
Customer customer = customerServiceClient.createTypeInstance();
customer.setName(UUID.randomUUID().toString());
TypedDocumentBatchCreateOperation<Customer> createCustomerOperation = (1)
new TypedDocumentBatchCreateOperation<>(customer);
Invoice invoice = invoiceServiceClient.createTypeInstance();
BatchAttributeReference reference = new BatchAttributeReference( (2)
InvoiceNames.CUSTOMER_NUMBER,
SystemFieldList.GeneralSystemField.Id.INSTANCE.getName(),
createCustomerOperation.getVirtualId().getUuid()
);
TypedDocumentBatchUpdateInput<Invoice> invoiceInput =
new TypedDocumentBatchUpdateInput<>(invoiceId, new TypedDocumentInput<>(invoice), List.of(reference));
TypedDocumentBatchUpdateOperation<Invoice> updateInvoiceOperation = (3)
new TypedDocumentBatchUpdateOperation<>(invoiceInput);
TypedDocumentBatchDeleteOperation deleteCustomerOperation = new TypedDocumentBatchDeleteOperation(customerId); (4)
List<EcrId> ids = batchOperationServiceClient.performTypedBatchOperations( (5)
createCustomerOperation, updateInvoiceOperation, deleteCustomerOperation);
1 | The operation to create the new customer |
2 | A reference to the ID of the new customer to be used for the customer_number field of the updated invoice |
3 | The operation to update the existing invoice |
4 | The operation to delete the invalid customer |
5 | An injected instance of de.eitco.ecr.sdk.BatchOperationServiceClient |
The second example shows how to use a read operation.
TypedDocumentBatchReadOperation readCustomerOperation = new TypedDocumentBatchReadOperation(customerId);
Invoice invoice = invoiceServiceClient.createTypeInstance();
invoice.setCustomerNumber(customerId.getIdentifier());
BatchAttributeReference attributeReference = new BatchAttributeReference( (1)
InvoiceNames.CUSTOMER_NAME,
CustomerNames.NAME,
readCustomerOperation.getVirtualId().getUuid()
);
TypedDocumentBatchCreateInput<Invoice> invoiceInput =
new TypedDocumentBatchCreateInput<>(new TypedDocumentInput<>(invoice), List.of(attributeReference));
TypedDocumentBatchCreateOperation<Invoice> createInvoiceOperation =
new TypedDocumentBatchCreateOperation<>(invoiceInput);
List<EcrId> ids = batchOperationServiceClient.performTypedBatchOperations(readCustomerOperation, createInvoiceOperation); (2)
1 | A reference to the name attribute of the customer read by the read operation used for the customer_name attribute of the invoice |
2 | An injected instance of de.eitco.ecr.sdk.BatchOperationServiceClient |
System tables
This section contains information about the system tables used by arveo.
Tables for type definitions
The system stores some information like the ID of type definitions in the database. For this, the following tables are used:
-
ecr_types
: Contains an entry for each type definition -
ecr_types_content_elements
: Contains 1:n mappings of content elements to type definitions.
+---------------+         +----------------------------+
|   ecr_types   |         | ecr_types_content_elements |
+---------------+         +----------------------------+
| id            |<----+   | ce_name                    |
|---------------|     +---| ce_type_id                 |
| creation_date |         |----------------------------|
| ecr_version   |         | ce_content_type            |
| object_type   |         | ce_profile                 |
| type_name     |         | ce_store_json              |
+---------------+         +----------------------------+
Column | Type | Description |
---|---|---|
id | int4 | ID of the type definition |
creation_date | timestamp | Creation date and time |
object_type | text | Type of the objects in the type definition |
type_name | text | The name of the type definition |

Column | Type | Description |
---|---|---|
ce_name | text | The name of the content element |
ce_type_id | int4 | The ID of the type definition containing the content element |
ce_content_type | text | The allowed content type of the content element |
ce_profile | text | The name of the storage profile used by the content element |
ce_store_json | boolean | Whether the content element uses the JSON field or not |
Folder structure tables
The object type FOLDER is used to create tree-like structures with parent and child relationships. The structure is stored in the ecr_folder_structure table. The table ecr_folder_structure_closure contains the transitive closure of the parent and child relationships to allow fast database queries in the tree.
+----------------------+   +------------------------------+
| ecr_folder_structure |   | ecr_folder_structure_closure |
+----------------------+   +------------------------------+
| child_id             |   | id                           |
|----------------------|   |------------------------------|
| child_name           |   | child_id                     |
| child_type_id        |   | child_type_id                |
| parent_id            |   | depth                        |
| parent_type_id       |   | parent_id                    |
+----------------------+   | parent_type_id               |
                           +------------------------------+
Column | Type | Description |
---|---|---|
child_id | int8 | The ID of the child folder |
child_name | varchar(128) | The name of the child folder |
parent_id | int8 | The ID of the parent folder |
parent_type_id | int4 | The ID of the parent type definition |

Column | Type | Description |
---|---|---|
id | uuid | The ID of the entry in the closure table |
child_id | int8 | The ID of the child folder |
child_type_id | int4 | The ID of the child type definition |
depth | int4 | The distance between the child and the parent on the direct path in the tree |
parent_id | int8 | The ID of the parent folder |
parent_type_id | int4 | The ID of the parent type definition |
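The closure table makes subtree queries cheap: it stores one row per ancestor/descendant pair together with the depth between them, so all descendants of a folder can be found with a single indexed lookup instead of a recursive query. The following self-contained sketch illustrates the pattern in memory; class and method names are illustrative and not arveo internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClosureTableSketch {

    // One row of the closure table: parent is an ancestor of child at the given depth.
    static final class ClosureRow {
        final long parentId;
        final long childId;
        final int depth;

        ClosureRow(long parentId, long childId, int depth) {
            this.parentId = parentId;
            this.childId = childId;
            this.depth = depth;
        }
    }

    // Inserting child under parent copies all of the parent's ancestor rows
    // with depth + 1 and adds the direct edge with depth 1.
    static void insert(List<ClosureRow> closure, long parentId, long childId) {
        List<ClosureRow> newRows = new ArrayList<>();
        for (ClosureRow row : closure) {
            if (row.childId == parentId) {
                newRows.add(new ClosureRow(row.parentId, childId, row.depth + 1));
            }
        }
        newRows.add(new ClosureRow(parentId, childId, 1));
        closure.addAll(newRows);
    }

    // All descendants of a folder at any depth: a linear scan here,
    // a single indexed SELECT on the closure table in the database.
    static Set<Long> descendants(List<ClosureRow> closure, long parentId) {
        Set<Long> result = new TreeSet<>();
        for (ClosureRow row : closure) {
            if (row.parentId == parentId) {
                result.add(row.childId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ClosureRow> closure = new ArrayList<>();
        insert(closure, 1, 2); // folder 2 under folder 1
        insert(closure, 2, 3); // folder 3 under folder 2
        System.out.println(descendants(closure, 1)); // prints [2, 3]
    }
}
```

Note that the real table additionally stores type IDs for multi-type trees, and many closure-table implementations also keep a depth-0 self row per node; the sketch omits both for brevity.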
Recovery table
The recovery table ecr_recovery is used for the recovery feature.
+----------------+
|  ecr_recovery  |
+----------------+
| deleted_date   |
| entity         |
| entity_id      |
| keep_until     |
| type_id        |
| version_id     |
+----------------+
Column | Type | Description |
---|---|---|
deleted_date | timestamp | Date and time at which the entity was deleted |
entity | jsonb | A JSON representation of the deleted entity |
entity_id | int8 | The ID of the entity |
keep_until | timestamp | The date and time until which to keep the entity in the recovery table |
type_id | int4 | The ID of the type definition |
version_id | int8 | The version ID of the deleted entity |
Keystore tables
When the encryption feature is enabled for a storage profile, the generated keys are stored in profile-specific database tables. For each encrypted profile, a table called ecr_keys_<profile> and a table called ecr_keys_assoc_<profile> are created. The ecr_keys_<profile> table contains the generated keys, and the ecr_keys_assoc_<profile> table contains the associations between content elements and keys.
+--------------------+       +------------------------+
|  ecr_keys_profile  |       | ecr_keys_assoc_profile |
+--------------------+       +------------------------+
| id                 |<--+   | content_id             |
|--------------------|   +---| id                     |
| key                |       +------------------------+
+--------------------+
Column | Type | Description |
---|---|---|
id | int8 | The ID of the key |
key | bytea | The encryption key |

Column | Type | Description |
---|---|---|
content_id | text | The ID of the content element |
id | int8 | The ID of the key |
Compatibility list
To operate arveo successfully, the operator of the platform must provide and manage the following services.
The following table lists the third-party services used by arveo.
Service | Supported Version | Comment |
---|---|---|
JDK | Java 11 | Integration tests run on AdoptOpenJDK 11, but all JDKs are supported |
ActiveMQ | ActiveMQ 5.15, 5.16 | |
PostgreSQL | PostgreSQL 12, 13 | |
Apache Solr | Apache Solr 8.6 | |
S3 Storage | Ceph 15, 16 | Retention is not supported yet, even if provided by the vendor |
File System | NFS | |
Linux OS | Ubuntu 18.04, 20.04 | |
Application Server | Tomcat 9, 10 | |
Kubernetes | 1.19 | If Helm deployment is used |
Docker | 20.10.8 | If Helm deployment is used |
OAuth | OAuth 2.0 | Grant flows: |
Authentication Services | Keycloak 15 | |
LDAP Server | MS Active Directory | |
MS Graph | Document Conversion with Microsoft 365, requires M365 account | |
SSO | Kerberos | The Kerberos authentication service is MS Active Directory |
Important Terminology
- ECR
-
Short for Enterprise Content Services; this is the collection of the arveo content services providing all document and record features.
- EQL
-
Eitco Query Language.
Used for search operations.
- Entity
-
Object that represents a type of data structure used in arveo.
- Document
-
An entity that can contain metadata and content.
- Folder
-
An entity that contains metadata and is organized in a tree structure like in a file system.
- Relation
-
An entity that represents a relation between two other entities.
- Container
-
Simple folder-like object not organized in a tree structure but with relations to other objects.
- Meta
-
An entity that contains only metadata.
- Content type
-
A meta specification that classifies the data.
Examples of content types are: original object, rendition, full text, text notes, XML properties, etc.
- Retention
-
Continuous audit-proof storage of all company data for compliance or own business purposes.
- Litigation hold
-
A flag that indicates whether a document is related to a litigation.
If the flag is set, the document must never be deleted - even if the retention date has passed.
- Bucket
-
Object storage.
- Encryption
-
Translating data into an unreadable form by means of electronic or digital codes or keys.
A specific key, in the form of a procedure or an algorithm, is required for the reverse transformation, after which the legitimate user can access the original data.
- Annotation
-
A construct used on interfaces or getter-methods to specify their properties.
- Storage profile
-
Storage profiles define on which storage the content elements are saved.
- Storage Container
-
Folders or buckets on the content storage that contain documents with the same retention period (e.g. Jan-Dec 2031).