Introduction

What is arveo?

arveo is a Headless Content Service Platform.

arveo expands your digital company platform and your public cloud or data center solutions with cloud-based enterprise content management (ECM).

arveo is a multi-tenant and 100% cloud-ready content services platform. With arveo you can manage the entire life cycle of your documents and files in a legally secure (GoBD-certified) and GDPR/DSGVO-compliant manner and process all your content. arveo ensures data and legal security even when using cloud storage services and meets the requirements of the GDPR and DSGVO with regard to the secure deletion of data.

With arveo, enterprise-ready solutions can be created, ranging from audit-proof content archives to complex file and transaction processing.

What is a Content Service Platform?
… is a cloud-ready enterprise content management system
… is a collection of microservices sharing the same data repositories
… provides REST interfaces
… typically offers ECM services, AI services, BPM, conversion, enterprise search, etc.
… provides access to all kinds of content such as documents, videos, images and audio
… serves all kinds of use cases within the organization
… stores content once so that it can be edited and read by many applications

arveo’s modern architecture, based on microservices and state-of-the-art technologies, was natively built for the cloud. Connect our lightweight arveo content services via a single, lean API to your system landscape, to other open systems and to the most suitable services for you from the cloud or on-premises. With this Best-Of-Breed approach, you can easily realize your company’s dream of a “single source of truth” across all systems.

Figure 1. Our Vision: Single Truth

The arveo content services manage the entire life cycle of your content like

  • Documents

  • Images

  • Videos

  • Audio

  • Text

arveo allows the free configuration of content objects, including metadata, and the mapping of folder hierarchies and electronic files.

NOSQL technologies allow you to search across all metadata values and document content with high performance, regardless of the complexity of your search. The additional use of horizontally scalable NOSQL technologies provides decisive advantages in mass data processing and search performance. The Apache Solr 8.6 enterprise search engine used, combined with key-value caches, leads to an increase in speed of up to a factor of 1,000 compared to relational database systems.

Headless Content Services?

The market for "headless systems" has been growing for some time. These systems offer backend functions that can be used without any user interface delivered by the system itself. The concept is best known from content management systems (CMS) used in web development. With the increasing use of different end devices such as smartphones, tablets and wearables, the requirements for content management systems are growing as well. In addition, users want to reach content on many different channels. Headless CMS dispense with the front end and thus enable your content to be delivered to various channels through a single REST API.

So if products are to be fully and seamlessly integrated into a platform and a dependency on a user interface or client is no longer desired, one speaks of so-called "headless systems".

The wide availability of different cloud services and solutions makes it possible to set up a modern platform for your business processes. Instead of relying on a monolithic ECM as before, companies combine the most suitable cloud content services and, with the "best-of-breed" approach, create targeted added value for their digital company platforms.

Regardless of whether you want to add secure and legally compliant ECM functions to your own solution, an open cloud application or your company portal: you can access all of your documents and information directly via a single interface (REST API).

arveo is headless by design. All modules are hosted as pure backend cloud services by Eitco or, optionally, hybrid in your private cloud or on-premises in your data center. Of course, they are natively suitable for mobile applications.

API First

The stateless REST API is our product and is used by all arveo components and user interfaces. The web services are stable over the long term and are fully available to every customer.

It is important to us that our services have open interfaces and can be easily integrated into an enterprise service infrastructure. As a modern content services platform, arveo uses standards wherever possible in order to make use of the steadily growing number of cloud-enabled services inside or outside the company infrastructure. Whether operating system, database, text recognition, machine learning or object storage: arveo can access services from different manufacturers and combine them with its own services in order to quickly create added value.

Best-Of-Breed Strategy

There are many ECM products and the market is constantly changing. A manufacturer-independent ECM standard such as SQL for relational databases has not fully established itself for ECM applications, despite several attempts ranging from WebDAV to JSR 170 to CMIS. The market is dominated by monolithic packages that cover all ECM applications. A customer who implements a complex ECM application for its company often becomes highly dependent on a single manufacturer and faces costs that are difficult to calculate when changing providers.

Due to the availability of platforms such as Amazon Web Services (AWS) or Microsoft Azure, which make a wide variety of services easily usable via web services, we are seeing a change in the behavior of companies, which want to buy fewer complete solutions and instead look for specialized services that can easily be combined and thus create targeted added value for the digital company platform. Companies choose the best features from different manufacturers and combine them to create their own solutions, whereby they control the services used via their own API management or API gateways. This creates company platforms that access not just one but often several repositories.

Figure 2. Best-Of-Breed Strategy

This approach, often called the Best-Of-Breed strategy, benefits from the fact that the services available in the marketplaces have become increasingly standardized in recent years.

arveo consistently relies on a microservice architecture. The individual services are loosely connected to one another via lightweight, stateless web service interfaces (HTTP, REST), and each service can run and scale independently. All arveo functions are available via a uniform REST API gateway, which also takes care of intelligent load distribution and the detection of defective services.

Scalability

Modern cloud-ready platforms rely on horizontal scaling: the load is distributed over many nodes, which can consist of inexpensive commodity hardware. Such a structure can also save costs through automated scale-out and scale-down by switching nodes on or off as required. The arveo platform has a high tolerance for the failure of individual nodes. High-performance availability is also required, since end users nowadays show only limited patience with long response times and, in case of doubt, quickly switch to the competition.

All arveo services support containerized deployment and use stateless REST APIs so that they can be easily integrated into any cloud infrastructure. Through the use of containerized applications (Docker) and the service management of the open source Spring Framework, which well-known providers such as Netflix use and continuously improve, the services can be installed automatically as often as required and thus scale out and down when you use the cloud orchestration framework Kubernetes. You can cluster Linux containers and build an auto-scaling, highly available platform with high fail-safety. A blue-green deployment for the risk-free, downtime-free rollout of new software versions is also possible.
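
As an illustration of such a containerized, horizontally scalable deployment, a Kubernetes Deployment for an arveo-style service might look like the following minimal sketch. The image name, port and replica count are illustrative assumptions, not actual arveo defaults:

```yaml
# Hypothetical Deployment for a containerized content service.
# Image name, port and replica count are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: document-service
spec:
  replicas: 3               # scale out or down by changing the replica count
  selector:
    matchLabels:
      app: document-service
  template:
    metadata:
      labels:
        app: document-service
    spec:
      containers:
        - name: document-service
          image: registry.example.com/arveo/document-service:latest
          ports:
            - containerPort: 8080
```

Scaling out or down then becomes a single command, e.g. `kubectl scale deployment document-service --replicas=5`.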

Future Proof

Our services use standards as far as possible, so that services from different providers can be delivered without great integration effort and the customer can react quickly to changes in the market. Due to the secure web service interfaces, all services including the database can be obtained from the cloud at any time.

With arveo services, you can build a sustainable system architecture. By design, arveo allows you to separate your business logic from the arveo ECM standard services and all other available cloud services such as OCR, AI, document conversion (e.g. to PDF) and identity management. arveo solutions are designed to be manufacturer-independent, so that the underlying REST ECM and other services can be exchanged at easily calculable costs.

This approach makes it possible to exchange individual services, up to and including the arveo content services themselves, with little and easily calculable effort. Even arveo ECM services can be replaced by comparable services, and via a supplied open source S3 connector, third-party systems can access the content objects migration-free using the standard S3 API.

Hybrid Operation

arveo is a native cloud platform and is based on Open Source libraries and services. Through the consistent microservice architecture and the use of open source cloud technology, you can keep arveo's operating costs low.

Advantages of arveo operation

  • All services are separately horizontally scalable and can therefore also be operated on simple hardware. arveo runs on all Linux and Windows operating systems.

  • No additional license costs thanks to the consistent use of open source technology such as Linux, PostgreSQL 12 and Apache Solr 8.6 NOSQL.

  • Container deployment: Simple integration into existing cloud platforms enables load-dependent, automated service provision up to blue-green deployment for seamless updates to new software versions.

  • Hybrid architecture: Flexible use of cloud services or on-premise services.

  • Low manufacturer dependency: By separating the user interface and business logic from the ECM / BPM services while using standards such as REST, S3 or BPMN2, there is less dependency on one manufacturer.

  • Web applications: We deliver templates for PWAs (Progressive Web Apps) based on the state-of-the-art Angular framework, which are completely open source. That means the user interfaces belong to you and can be used independently of arveo.

  • Use of standards: Low training costs and high availability of know-how on the market through the use of standard frameworks (Angular), standard interfaces (REST, S3, SAP Archive Link) and SDKs for JavaScript, Java and C#.

Micro Frontends

In addition, you can also use our ready-made, modern, clearly structured, responsive and functional micro frontends to make the arveo content services, and thus their content, easily available at the right time and in the right place in your business processes.

Mobile First: All user interface components and interfaces are designed for mobile use.

Figure 3. Micro Frontends

Architecture Overview

Content Services

arveo is a content service platform and provides a set of lightweight, operating system-independent content microservices.

All services and clients exclusively use the secure, stateless, state-of-the-art HTTPS REST API. For the highest possible security on the web, and to be suitable for mobile access, arveo uses token security based on the state-of-the-art Spring Security framework.

Java, C# and JavaScript SDKs are available.

arveo has multi-tenant support and separates content and metadata per tenant.

As arveo is built for cloud operating systems like OpenStack, you can automatically deploy and scale the arveo containerized applications with the cloud orchestration framework Kubernetes. You can cluster Linux containers and build an auto-scaling, highly available platform with high fail-safety. Containerized applications scale horizontally and can run on commodity hardware.

arveo is available as a containerized application or as a WAR/JAR file and allows a hybrid deployment: on-premises or in the cloud.

Figure 4. Architecture Overview
Table 1. Content services in arveo
Service Description

Document Service

Store, edit and version documents, records/folders and their metadata.

Manage storage locations with retention periods (GoBD certificate & GDPR/DSGVO compliant)

Search of metadata with the relational database PostgreSQL 12 and the NOSQL document database Apache Solr 8.6

User Management Service

User management with users, groups and roles.
Secure login and token authentication

Registry Service

Service registry for all arveo content services managing the availability of the services.

Config Service

Secure storage of configuration data in git or database

Access Control Service

Object access control providing permissions to users/groups

Audit Service

Creates and manages audit tables for all other entity types like document types, user management objects, etc. Provides an API to access the audit trail of any object by its entity ID

SAP Archive Link Service (optional)

Web server that processes documents in accordance with the SAP Archive Link standard

Document Conversion Service (optional)

Conversion of document formats like docx, xlsx, etc. to image formats or PDF/A

Enterprise User Management Service (optional)

Extends arveo with organisation structure features like positions or substitutes

Enterprise Integration Service (optional)

The arveo enterprise integration service supports over 300 data formats and interfaces like XML, REST, CSV, mail, etc.
Easily integrate all your applications and IT systems, e.g. via scheduled data imports or by listening to events.

Federation Service (optional)

Multi repository architecture: The open connector plugin interface allows to access data from other repositories (Saperion, Documentum, file systems directories)

3rd Party Services

To operate arveo successfully, the operator of the platform must provide and manage the following services.

Table 2. 3rd Party Services in arveo
Service Description

ActiveMQ

Message queue service to process JMS and AMQP messages

PostgreSQL 12

Relational database cluster for arveo system properties and customer metadata

Apache Solr 8.6

NOSQL document database supporting high-performance full-text search of content and metadata

Content Storage

Either an S3-API-capable object store service or a redundant file system server

Authentication Service (optional)
Keycloak, Active Directory

Identity management implementing the OAUTH2 workflow for secure login.
Implements Single Sign-On (SSO) with identity management providers such as Keycloak or Active Directory

Monitoring (optional)

Supports logging/monitoring via ELK (Elasticsearch, Logstash and Kibana).

Supports Spring Service Admin Monitor

Supports Prometheus + Grafana Monitoring frontends

Industry standards

arveo relies on industry standards as much as possible to make integrations as easy as possible.

  • API: REST (JSON)

  • Storage: S3 (Cloud Object Storage API)

  • Authentication: OAUTH2, X.509 or Basic Auth.

  • Relational Database: JDBC access for PostgreSQL, Oracle, SQL Server

  • SAP: Archive Link Service

  • Containerized application deployment

Opensource Technology Stack

The technology stack has been chosen to ensure the creation of high-performance, cloud-capable, client-capable and scalable state-of-the-art (micro)services with a modern web user interface. Our chosen tech stack enables the implementation of both small projects, which consist of only a single backend component, and large projects with various distributed components. The created components are deployable both locally on the customer’s hardware and in a cloud environment.

So the stack consists of the following components:

  • Spring Framework

The implementation of the backend components has been done in Java and Kotlin. The Spring Framework is used as the basis. Spring is an Open Source (Apache License) framework that has existed since 2004 with a large and very active developer community. The framework has a modular structure, which is why it is suitable for both simple and complex applications. It provides dependency injection, externalized configuration, and assistance with things like database access, transactions, messaging, etc.

  • Spring MVC, WebFlux

Spring MVC is a framework for creating web applications, especially for REST services. It is based on the servlet stack, in which a request is processed in a dedicated thread. WebFlux is also a framework for web applications, but is based on the reactive stack, in which the processing of a request is not restricted to one thread.

  • Spring Security

Spring Security is a component that provides authentication and authorization functionality. It can be used to secure web applications and also offers support for SSO technologies such as OAuth and SAML.

  • Spring Cloud

Spring Cloud is a collection of additional Spring components that provide the typical functionality required in a distributed or cloud application. The individual components can be used independently of one another and partly consist of integrable dependencies as well as independent applications. Which of the Spring Cloud components are used therefore depends entirely on the project requirements. Spring Cloud applications can be operated in managed cloud environments such as Cloud Foundry.

  • Spring Cloud Config

Spring Cloud Config offers a central configuration service as well as a client library for components that consume the configuration. In a Spring Boot application, it is sufficient to add the corresponding dependency. From then on, Spring will automatically read from the configuration service if it is available. The configuration data can be stored in simple files, in a database, a Git repository or in a protected repository such as Vault.

In a distributed application with several components running on different machines, Spring Cloud Config can be used to implement central management for the configuration of all components.
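
As a sketch of what consuming such a central configuration looks like, a Spring Boot client component could point at the config server with a fragment like the following. The server URL and application name are placeholders, not arveo defaults:

```yaml
# application.yml of a hypothetical client component (placeholders, not arveo defaults)
spring:
  application:
    name: document-service          # key under which the config server resolves properties
  config:
    import: "configserver:http://config-service:8888"  # central Spring Cloud Config server
```

With this in place, Spring Boot fetches the externalized configuration from the config server at startup; prefixing the import with `optional:` lets the application start even when the server is unavailable.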

  • Spring Cloud Bus

Spring Cloud Bus provides a bus for communication between the components or for connecting external components. The communication is based on the AMQP protocol and requires a backend such as RabbitMQ or ActiveMQ. With the help of the bus, components can, for example, be notified when their configuration in the configuration service has changed.

  • Eureka

Eureka is a Spring Cloud component provided by Netflix that provides a service registry. A service registry is a central directory of all service instances. A service or a client application therefore only needs to know the URL of the service registry in order to access one of the other services. Eureka is an independently executable component and offers a client library for access to the registry.

  • Hystrix

Hystrix is a Spring Cloud component provided by Netflix that can be imagined as a fuse in an electrical installation. If one component of a cloud environment fails, Hystrix can isolate it from the other components to prevent further failures. Another instance of the component can then provide the functionality.

  • Zuul

Zuul is a Spring Cloud component provided by Netflix that provides an API gateway. An API gateway acts like a reverse proxy and hides the individual microservices from a client application. The client application only knows the API gateway and does not have to worry about the URLs of the various services.

  • Ribbon

Ribbon is a Spring Cloud component provided by Netflix that provides a client-side load balancer.

  • Archetypes

There are Maven archetypes that can be used to easily start a new project based on our technology stack. Different archetypes are available for different types of applications. The generated projects contain a Jenkins file with a preconfigured CI environment, including static code analysis with Sonar, OWASP dependency checks, load tests based on JMeter, a push-button release mechanism and an optional Teams hook. Also included are packaging modules with which the application can be packaged as a Linux daemon or as a Windows service, as well as IDE configuration files for IntelliJ and Eclipse.

  • Logging

In order not to depend on a specific logging implementation, logging has been implemented with the logging facade SLF4J, or to be exact, with its specific implementation Logback. In contrast to Log4J, Logback is actively maintained and is less complicated during initialization. It can be combined with SLF4J. Logback is one of the standard Spring dependencies.

  • Caching

Caching frameworks are available in many variants that cover very different use cases. Frameworks are listed here sorted according to their primary use case.

  • Local in-memory cache

Caffeine has proven itself as a fast local in-memory cache. It can be combined with Spring’s caching abstraction layer.

  • JDBC connection pool

HikariCP has proven itself for JDBC connection pooling. This pool is also Spring’s standard dependency.

Security

Application Security

arveo is a content service platform you can trust. We are continuously working to ensure that our services can be operated securely in the cloud.

All arveo content services and clients communicate via state-of-the-art secure REST interfaces over the secure HTTPS (SSL/TLS) protocol. All services require the web standard OAUTH2 with OpenID Connect authentication using tokens. A central authentication service (Keycloak, Active Directory or the arveo user management service) issues tokens with an expiry date. This ensures that only clients authenticated against the central service can use the content service APIs.
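
From a client's perspective, such a token-authenticated call can be sketched with the JDK's HTTP client: the OAUTH2 access token obtained from the central authentication service is sent as a bearer token in the Authorization header. The endpoint URL and token below are placeholders, not actual arveo paths:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BearerTokenExample {

    /** Builds a GET request that authenticates with an OAuth2 bearer token. */
    static HttpRequest authenticatedRequest(String url, String accessToken) {
        return HttpRequest.newBuilder(URI.create(url))
                // The token was previously issued by the central authentication
                // service (e.g. Keycloak) and carries an expiry date.
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = authenticatedRequest(
                "https://arveo.example.com/api/documents/42", // placeholder URL
                "example-token");                             // placeholder token
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```

Because the API is stateless, every request carries the token; no server-side session is needed, which is what makes the services easy to scale horizontally.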

Data Security

arveo can encrypt the content with AES-256 and thus protect it against unauthorized access. The key is stored in such a way that maximum security is guaranteed. So that not all data has to be re-encrypted if a key is compromised, dedicated data keys are generated. Only the keys used are encrypted with the customer key and stored separately (Encryption).
See also Data Integrity. arveo allows you to organize documents into folders and records. arveo can control access rights such as reading, writing or deleting for each document via attributes or access control lists and thus grant or deny the corresponding access to groups or users.
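
The key handling described above resembles the common envelope-encryption pattern: each content object is encrypted with its own AES-256 data key, and only that data key is encrypted with the customer (master) key and stored separately. The following JDK-only sketch illustrates the principle; it is not arveo's actual implementation:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

public class EnvelopeEncryptionSketch {

    /** Result of encrypting one content object: ciphertext plus its wrapped data key. */
    public record Envelope(byte[] iv, byte[] ciphertext, byte[] wrappedKey, byte[] wrapIv) {}

    static final SecureRandom RANDOM = new SecureRandom();

    static byte[] randomIv() {
        byte[] iv = new byte[12];           // 96-bit IV, as recommended for GCM
        RANDOM.nextBytes(iv);
        return iv;
    }

    static byte[] aesGcm(int mode, SecretKey key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(data);
    }

    /** Encrypts content with a fresh AES-256 data key; wraps the data key with the master key. */
    static Envelope encrypt(SecretKey masterKey, byte[] content) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        SecretKey dataKey = gen.generateKey();   // one dedicated key per content object
        byte[] iv = randomIv();
        byte[] ciphertext = aesGcm(Cipher.ENCRYPT_MODE, dataKey, iv, content);
        // Only the data key is encrypted with the customer/master key and stored separately.
        byte[] wrapIv = randomIv();
        byte[] wrappedKey = aesGcm(Cipher.ENCRYPT_MODE, masterKey, wrapIv, dataKey.getEncoded());
        return new Envelope(iv, ciphertext, wrappedKey, wrapIv);
    }

    /** Unwraps the data key with the master key, then decrypts the content. */
    static byte[] decrypt(SecretKey masterKey, Envelope e) throws Exception {
        byte[] keyBytes = aesGcm(Cipher.DECRYPT_MODE, masterKey, e.wrapIv(), e.wrappedKey());
        SecretKey dataKey = new SecretKeySpec(keyBytes, "AES");
        return aesGcm(Cipher.DECRYPT_MODE, dataKey, e.iv(), e.ciphertext());
    }
}
```

The benefit of this design is that if the customer key is rotated, only the small wrapped data keys need to be re-encrypted; the bulk content on the storage system stays untouched.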

ACL Permissions

  • None - no authorization (object not visible)

  • Browse - the user is allowed to see the metadata of the object, but not the content

  • Read - the user can read metadata and content

  • Relate - The user can add an annotation

  • Version - The user may change the content, but may not overwrite it

  • Write - The user can change metadata and content with the possibility to overwrite

  • Delete - The user can delete the object
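
These permission levels can be read as an ordered hierarchy from None to Delete. Assuming each granted level implies the weaker ones (an assumption for illustration, not a statement about arveo's ACL semantics), the hierarchy can be modeled as an ordered enum:

```java
public class AclSketch {

    /** ACL permission levels, ordered from weakest to strongest. */
    enum Permission {
        NONE,     // object not visible
        BROWSE,   // metadata only
        READ,     // metadata and content
        RELATE,   // may add annotations
        VERSION,  // may change content as a new version, never overwrite
        WRITE,    // may change metadata and content, overwrite possible
        DELETE;   // may delete the object

        /** Assumption for this sketch: a granted level implies every weaker level. */
        boolean allows(Permission required) {
            return this.ordinal() >= required.ordinal();
        }
    }

    public static void main(String[] args) {
        Permission granted = Permission.VERSION;
        System.out.println(granted.allows(Permission.READ));   // a VERSION grant implies READ
        System.out.println(granted.allows(Permission.WRITE));  // but not WRITE
    }
}
```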

Tenant Security

The metadata and the content of the tenants are separated. Each tenant has its own storage container and database. It is ensured that all data of a tenant is protected from unauthorized access by another tenant.
The data of a tenant can be easily exported.

Security Patches

It is important to us to continuously ensure that all known vulnerabilities are fixed and that we deliver security patches and hotfixes to our customers as early as possible.

To achieve this goal, we integrated state-of-the-art tools such as the OWASP Dependency-Check into our build process, which performs automated static code analysis. We also perform penetration tests on a regular basis.

What is OWASP?
The Open Web Application Security Project® (OWASP) is a nonprofit foundation that works to improve the security of software. Through community-led open-source software projects, hundreds of local chapters worldwide, tens of thousands of members, and leading educational and training conferences, the OWASP Foundation is the source for developers and technologists to secure the web.
OWASP is dedicated to enabling organizations to conceive, develop, acquire, operate, and maintain applications that can be trusted.
All of our projects, tools, documents, forums, and chapters are free and open to anyone interested in improving application security (https://owasp.org).

Application protection by design

What does Eitco do to develop, operate and maintain a secure content service platform?

  • we only use open source software from secure and accepted projects such as Apache or Spring.

  • we implemented an open source review and monitoring process:

    • software architecture review by the Eitco software architects

    • security check using the OWASP Dependency-Check

    • legal licence check to ensure that it is a real open source project in the long term

    • we continuously check our open source dependencies with regard to architecture, security leaks and maintainability.

  • to ensure that all known vulnerabilities of 3rd party open source projects are eliminated, we integrated the OWASP Dependency-Check tool into our nightly build. Dependency-Check matches our dependencies against a database of all known vulnerabilities.

  • in case a severe vulnerability is found we take the appropriate countermeasures.

    • provide a security patch for our customers with a new version of the 3rd party library

    • change the implementation or configuration using the 3rd party component

    • inform our customers to update or reconfigure components like database, message queue, application server, etc.

    • replace the 3rd party component. This typically requires a major update.

OWASP Dependency-Check tool:
It is a software composition analysis tool that tries to find publicly disclosed vulnerabilities within the project dependencies.
The tool checks whether there is an issue tracked in the "Common Platform Enumeration (CPE)" for the dependency.
If a vulnerability is found, it creates a report with a link to the CVE entry.
It is a command line tool that can be easily integrated into any nightly build process.
National Vulnerability Database (NVD) – (https://nvd.nist.gov).
Also read Jeff Williams and Arshan Dabirsiaghi, "The Unfortunate Reality of Insecure Libraries"
(https://owasp.org/www-pdf-archive/ASDC12-The_Unfortunate_Reality_of_Insecure_Libraries.pdf).

Compliance Recommendations (GoBD)

All companies using electronic data processing for legally or tax-relevant documents have to comply with the "Principles for the proper management and storage of books, records and documents in electronic form and for data access" (GoBD, BMF letter of November 28, 2019).

In addition to the proper use of the arveo and 3rd party services, we recommend implementing the following measures when using arveo as a compliant repository for the legally compliant storage of records and documents.

Indexing And Retrieval

To allow users and 3rd party applications to identify and find objects in arveo, you should define a unique and immutable identifier property (Data Modelling). The property must be @Unique to ensure that a user or business application can clearly identify the item. The unique identifier should follow the taxonomy of your business processes and contain all information needed to clearly recognize the document. Make the property @Readonly to ensure that the identifier, once set, is immutable.

This minimizes the risk of incorrect indexing and of undetectable documents, because the index is immutable, duplicate identifiers are rejected, and the compliant taxonomy ensures that every user can find documents easily and quickly. We strongly recommend building a documented, simple but clear taxonomy.

Your business application or the user must set the value when the object is created (@Mandatory annotation), or you can let arveo create a unique value by adding counter annotations. Add the @Autoincrement annotation if a simple sequential Long id meets your requirements.

If you need a more sophisticated unique identifier, you can use the @FormattedCounter annotation, which allows you to create e.g. String identifiers like <year>-<sequence> (Unique Identifier Example).
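
To illustrate the effect of such a formatted counter, the following sketch generates <year>-<sequence> identifiers with a zero-padded sequence. The generator and the chosen format are hypothetical; the real @FormattedCounter syntax is described in the Unique Identifier Example:

```java
import java.time.Year;
import java.util.concurrent.atomic.AtomicLong;

public class FormattedCounterSketch {

    private final AtomicLong sequence = new AtomicLong();

    /** Produces identifiers like "2024-000001": year plus a zero-padded sequence. */
    String nextId(int year) {
        return String.format("%d-%06d", year, sequence.incrementAndGet());
    }

    public static void main(String[] args) {
        FormattedCounterSketch counter = new FormattedCounterSketch();
        int year = Year.now().getValue();
        System.out.println(counter.nextId(year));
        System.out.println(counter.nextId(year));
    }
}
```

Because the sequence is strictly increasing and the year is part of the value, each identifier is unique and follows a simple, documentable taxonomy.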

List data types allow you to store more than one String or Long value for a property. You can search for each value using the array search operation of the arveo query language (Data Types).

Enumeration data types allow you to set one or more values from a fixed set of values.

Retention Periods

Ensure that the statutory retention periods are assigned to the records, cases and document types (Retention Periods, Retention Rules) and that the storage containers are configured correctly (Retention Container).

Check whether the technically assigned retention periods actually correspond to the statutory retention periods. Monitor the audit logs to ensure that the retention period is set and correct. Monitoring can be automated or take the form of random checks by an employee.

The operating team must ensure that storage containers contain only documents with the same retention period. Do not use the same bucket in different storage profiles, and do not assign a storage profile containing content with retention to different document types.

Grant arveo the deletion right for your storage containers. If arveo cannot delete the containers, your operating team is in charge of this task, and you must set the option "delete rows only".

Configuring storage containers in arveo-service.yaml and your content storage is an ongoing task for your operating team. Eitco will try to create the buckets or subdirectory on your storage system but can also use already existing ones.

It must be ensured that the system time cannot be manipulated (e.g. by using an NTP server). Take suitable measures to ensure that a change in the system time is detected promptly.

Because the new data privacy/protection legislation makes it necessary to erase data even before the expected retention period has expired, please take care that the content storage has no default hardware retention activated.

Audit Log

Enable the audit option for all types containing legally compliant content (Audit Log). If the platform is operated securely (Platform Security), users and applications can write content and metadata exclusively via the arveo REST API. arveo logs all user or application update operations on content and metadata to the audit table.

All changes to content or metadata are persisted as a traceable and immutable version (Versioning) on your storage system, and an audit entry containing the author and the timestamp of the change is written to the audit log table (Audit Log). If a document is updated, the version is incremented and saved in the version number. Although all versions are traceable and accessible via the API, we recommend making the version number system property visible in the application so that copies of the original can be identified easily.
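
The update behavior described above, where every change appends a new immutable version together with author and timestamp instead of overwriting the original, can be sketched as follows (an illustrative model, not arveo's actual data structures):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class VersioningSketch {

    /** One immutable version: content snapshot, author and timestamp (the audit data). */
    record Version(int number, byte[] content, String author, Instant changedAt) {}

    private final List<Version> versions = new ArrayList<>();

    /** An update never overwrites; it appends a new version with an incremented number. */
    void update(byte[] newContent, String author) {
        versions.add(new Version(versions.size() + 1, newContent, author, Instant.now()));
    }

    Version current() {
        return versions.get(versions.size() - 1);
    }

    List<Version> auditTrail() {
        return List.copyOf(versions);   // every change remains traceable
    }
}
```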

Ensure that the @Overwrite option is not set for legally compliant document types. If overwrite is turned on it is possible to manipulate the originally saved content and compromise the document without creating a versioned copy.

The audit logs are subject to the retention periods of commercial and tax law. Ensure that the audit logs are kept for the legal retention period (10 years). We recommend that the operator of the platform exports and clears the audit tables using database tools after 2 years. Save the dumps as arveo documents with a 10-year retention period. If you need access to older audit logs, you can easily download the dumps and upload them to the database.

The audit tables must be protected against unauthorized access by users. Do not allow write-access to the audit tables to anyone but the arveo services. Only data protection officers are allowed to have controlled read access to the audit data.

Check the audit logs regularly to find unauthorized user activities.

Download And Migration

All documents in arveo that are subject to retention are available via the REST API and can be downloaded. The integrity and availability of the content are the responsibility of the provider and operator of the platform. The provider must ensure that failures of the storage systems for database and content are identified at an early stage and that appropriate countermeasures are taken. See the chapter Fail Safety for technical and organizational measures for high availability of the arveo platform.

In the event that data has to be migrated, arveo offers an extensive export API that enables content and metadata to be exported. arveo saves the hash value (https://en.wikipedia.org/wiki/Cryptographic_hash_function) in the database that was determined when the content was first uploaded (Upload Data). This hash value can be used as a checksum to detect accidental or intentional corruption of data. If the hash value of the content after the migration is identical to the original hash, the migration report proves the correctness of the migration process. To report the completeness of the migration process, the arveo API allows you to export a list of all records, cases and documents in a document type.

Legally Compliant Migration

  • Prerequisite for the migration

    • enable the verify option and use the strongest hash check possible in your solution when uploading content to arveo.

  • During the migration

    • download content and metadata (including the original hash and retention period)

    • upload metadata and content to the migrated platform and set the retention period to the exact same value.

    • calculate hash of the migrated platform by downloading the content

  • After the migration

    • Correctness: compare the hash, metadata and retention period for each original and migrated record, case and document.

    • Completeness: check that each migrated document can be found using its unique identifier.

    • Traceability: Create a report for each document type. Report the content hash evidence and the metadata for all migrated objects.
      Upload the migration report to the migrated platform and set the retention period to the retention date of the document with the longest retention period within the report.
      Depending on your retention policy you can create separate reports for a retention period range (e.g. by year).
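The correctness and completeness checks described above can be sketched as follows. This is a minimal illustration in Python, not the arveo export API: the record structure and the field names (`hash`, `content`, `retention`) are hypothetical assumptions for the sketch.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Compute the SHA-256 checksum used to compare original and migrated content."""
    return hashlib.sha256(content).hexdigest()


def verify_migration(original_records, migrated_records):
    """Compare hash and retention period for each original/migrated pair
    (correctness) and check that every original id exists in the migrated
    platform (completeness). Both arguments are dicts mapping a unique id to
    a record dict - a hypothetical structure, not the arveo export format."""
    report = {"correct": [], "mismatched": [], "missing": []}
    for uid, orig in original_records.items():
        migrated = migrated_records.get(uid)
        if migrated is None:
            report["missing"].append(uid)        # completeness violation
            continue
        same_hash = sha256_hex(migrated["content"]) == orig["hash"]
        same_retention = migrated["retention"] == orig["retention"]
        if same_hash and same_retention:
            report["correct"].append(uid)
        else:
            report["mismatched"].append(uid)     # correctness violation
    return report
```

The resulting report can then be archived as a document on the migrated platform, with its retention period set to the longest retention date found among the reported documents.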

Data Integrity

arveo guarantees high availability, reliability and high performance at all times. The system must be protected from manipulation attempts by proven and well-thought-out concepts. The data that is stored and managed in the system is protected via the API. Access and editing rights are managed via ACLs. User rights are based on the concepts developed for roles, groups and ACLs. More detailed information on this is provided in the relevant chapters of this manual.

Access to all data (documents, metadata) takes place exclusively via the API, with the corresponding protection mechanisms so that the security of the data is guaranteed at all times.

Content Storage

The operator must take appropriate technical or organizational measures to ensure that the data is stored in the storage in such a way that it cannot be changed within the legally prescribed retention period.

Enable the verify option for all clients and integrations. The upload API can optionally verify the uploaded content: the content service downloads the just-uploaded stream from the content storage and compares the hash once again with the expected value (Upload Content). arveo stores the hash value in a system property and persists the value in the document type metadata table.

In case of very sensitive data you can enable transparent encryption (Encryption) to follow the data protection rules and prevent your administrators from accessing document content.

Databases

For the supported database postgreSQL 12 you can select between different data replication strategies:

  • Asynchronous replication (backup or mirror): Enables asynchronous disaster recovery. Your database is periodically mirrored.

  • Synchronous database cluster: Transactions are synchronously replicated on more than one master node.

The provider of the postgreSQL 12 cluster must guarantee that data is stored redundantly and must reduce potential data loss.

The provider of the apache solr 8.6 cluster must guarantee that data is replicated between the nodes and that the backup strategy prevents data loss.

Fail Safety

The system operator is responsible for data security and recovery. The operator must ensure that backups of the data are checked regularly and that recovery is reliably possible in the event of a failure. The IT processes that ensure the secure, redundant and highly available storage of arveo data in databases and in object or file system storage are particularly decisive for the proper operation of the platform. They are the responsibility of the operator of the platform, who must implement the availability and security of the systems in accordance with legal and organizational requirements.

We strongly recommend using a redundant file system or object storage system. If you do not at least back up your data periodically, data loss is likely.
For high availability with almost zero data loss, your storage system should replicate the written content and data synchronously.
The operating team of the platform must ensure that appropriate replication is set up and monitored.

Object storages with REST APIs are designed for the cloud. If you decide to use storage from the cloud (public or private), we recommend using object storage via the S3 API. Object storages provide a high level of redundancy (even geo-redundancy) and fail safety. The S3 REST API is very tolerant of network and infrastructure failures.

Ensure technically and organizationally that there is sufficient space for storing the data.

For best high availability, the provider of your storage system must protect the stored data against accidental, malicious, or disaster-induced loss. The better your data replication, the better your availability in case of a failure.

To achieve high availability for arveo the provider must guarantee that all required content services (Content Services) run as a cluster.

Security

Operators

The provider of the arveo services should ensure that only authorized data protection officers and administrators have data write (INSERT, UPDATE, DELETE) permissions for the database and the content repository.

An administrator can only manipulate content illegally if he can access both the database and the content storage, because the control hash value of the content is stored in the database. Take care that none of your administrators has exclusive and unattended access to both the content storage and the database.

Distributed management roles of the storage systems and the arveo transparent encryption feature (Encryption) make your system more forgery-proof!

The activities of administrators with extensive rights must be logged by the operator. The logs are subject to the retention periods of tax law and must be checked regularly.

Platform

To prevent unauthorized access to the arveo platform the provider must:

  • ensure that HTTPS communication is enabled for all clients, applications, 3rd-party components and services (Services).

  • enable OAuth2.0 or X.509 certificate authentication and authorization for all arveo services (OAuth2.0). All arveo services require authentication, ensuring that only arveo services or authenticated and authorized users can use the API. We recommend using a state-of-the-art authentication service like Keycloak and enabling SSO with at least 2-factor authentication.

  • take suitable technical or organizational actions against unauthorized changes to the data, such as firewalls, VPN, or transparent encryption with arveo or at hardware level.

  • provide adequate protection of passwords by using a state-of-the-art IDP such as Keycloak or MS Active Directory and increasing the password complexity accordingly.

  • take actions against denial-of-service attacks.

arveo Content Services

The administrators of the arveo platform must:

  • make sure that only authorized persons receive an account that grants access to arveo documents.

  • ensure that objects are protected against unauthorized access using ACLs. We recommend defining a separation of functions and implementing it via ACLs. To achieve the best data security, assign ACLs to all records, cases and documents. Make sure that for all ACLs in use, the assignment of access rights to users and groups is reviewed regularly (e.g. invoice document type, accounting: write, employees: read).

  • ensure that the activities of managers who can change ACLs are logged via arveo audit (User Management) and checked at regular intervals.

  • organizationally ensure that the passwords of the arveo administration users are changed regularly.

Data Store

Persistence Architecture

arveo guarantees forgery-proof long term availability of your content and metadata.

All revisions of content or metadata are stored as a traceable and immutable version (Versioning) on the storage systems. The content service checks the integrity of uploaded content by computing SHA-256 hashes on the client and server side. Additionally, an audit entry is written to the audit log table (Audit Log). arveo provides role-based access control on object level and allows you to prevent unauthorized access to content and metadata.

arveo protects content and metadata by software design. arveo only allows access to content and metadata via the arveo REST API. As only arveo and highly authorized administrators have data write rights for the database and the storage, content cannot be deleted or manipulated by unauthorized persons.

Together with arveo's capabilities to manage the retention periods of documents and records (Retention Periods), arveo guarantees GDPR/DSGVO-compliant data protection and data privacy.

arveo meets the requirements of a revision-proof long-term archive and is a cornerstone for the legal compliance of your IT systems.

Because the new legal data privacy/protection act makes it necessary to erase data even before the expected retention period has expired, arveo does not use hardware retention features.
If needed, you can add verifiable evidence records to the documents (signatures, timestamps) to prove the integrity and authenticity of content and author. The creation of the evidence is not a feature of arveo; it only stores the record together with the content.

In this chapter you will find all information on how to set up a secure and legally compliant content service platform with arveo.

Data Kinds

arveo distinguishes three kinds of data and stores each to the most suitable storage system.

  • Content: arveo stores unstructured content like documents, audio, video and images either on a cloud object storage or on a file system storage.
    Most cloud and storage providers like AWS S3, NetApp ONTAP, EMC Elastic Cloud, etc. provide file system storage or object storage systems. Object storages are organized in buckets and allow you to store an almost unlimited number of objects in a bucket. arveo accesses the content via the standard S3 REST API.
    For optimized and fast access to frequently used content objects, arveo can integrate a NoSQL key-value cache DB like Redis.

  • Structured system properties: contain all primary keys and technical information about documents, containers and folders. The data has a fixed data model and requires the highest performance, consistency and transaction support. arveo saves the data in a relational database.

  • Customer specific metadata: The data model is different for each document, container or folder type. This metadata is semi structured and new properties might be added during the life cycle of the application.

    • Eventually consistent customer information: Sometimes the consistency of the data is not important, but high performance and facet support must be guaranteed when filtering by any value without the risk of a full table scan. arveo saves this customer metadata in the NoSQL document DB apache solr 8.6, which is highly efficient for inserting and searching and offers automatic completion and facets.

    • Consistent customer keys: The properties require the highest performance, consistency and transaction support. arveo saves the data in a relational database.

High Availability

The high availability (HA) of arveo depends highly on the HA of the storage systems for all kinds of data. Each of the storage systems, and as a result the arveo services, is subject to the CAP (Consistency, Availability and Partition Tolerance) theorem, which relates the availability and fail safety of a system to:

  • Consistency: All clients see the same content and metadata.

  • Availability: All clients can read and write.

  • Partition Tolerance: The system remains fail-safe when one or more nodes fail.

CAP theorem

In a nutshell, the CAP theorem states that you cannot have all three properties, but only two of them.

As arveo is an ECM cloud platform, consistency and availability (read/write) of content and metadata are most important. arveo tolerates that a network or message failure of either the primary content storage or the database node can cause exceptions in the client application. The arveo services do not store data within their containers and focus on scalability and partition tolerance.

The arveo microservices should be deployed as containers in your cloud environment (e.g. Kubernetes) and auto-scaling should be implemented.

Data Integrity

arveo ensures the immutability and integrity of all your digital content and evidence records by an automated hash check each time content is uploaded or downloaded.

Upload

Hash check: When you use the upload content API, the client side and the content service compute the SHA-256 hash of the streamed data. The upload is successful only if both values are identical. The upload API also allows you to pass an expected SHA-256 value; the API will only return OK if the server-side hash matches the expected hash.

Verify: The upload API can optionally verify the uploaded content. The content service downloads the just-uploaded stream from the content storage and compares the hash once again with the expected value (Upload Content). arveo stores the hash value in a system property and persists the value in the document type metadata table.
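The client-side part of the hash check can be illustrated as follows. This is a sketch of the principle only, not the arveo SDK; the function names and the chunked-streaming setup are assumptions for the example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # stream in chunks so large files are not loaded into memory


def stream_sha256(chunks) -> str:
    """Compute the SHA-256 hash incrementally over a streamed upload."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def upload_succeeds(client_hash, server_hash, expected=None):
    """The upload succeeds only if the client- and server-side hashes match,
    and - if the caller passed an expected hash - that one matches as well."""
    if client_hash != server_hash:
        return False
    return expected is None or expected == client_hash
```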

The verify option of the Upload API may slow down your system when uploading a huge amount of data.
Transactions

The arveo REST API is stateless and has no session. This means that all REST API calls are atomic and all database commands are executed within one transaction. arveo guarantees the atomicity of the transactions; to avoid inconsistent states, aborted transactions are rolled back, and hanging transactions are removed and rolled back to avoid database locks.

The database provider should configure the transaction deadlock timeout on your database to avoid locks on the database that can decrease the performance of your UPDATE and DELETE calls.
Download

When you use the download API (Download Content), the client SDK computes the SHA-256 hash of the downloaded stream and compares it to the hash value in the system property of the document type. If the hash does not match the upload hash value in the database, the download fails with a data integrity exception telling the caller that the data on the storage was most likely manipulated.
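The download-side integrity check can be sketched like this. The exception type and function name are illustrative assumptions; the real arveo SDK may use different names.

```python
import hashlib


class DataIntegrityError(Exception):
    """Raised when downloaded content does not match the stored upload hash.
    (Illustrative only - the actual exception type of the arveo SDK may differ.)"""


def verify_download(content: bytes, stored_hash: str) -> bytes:
    """Recompute the SHA-256 hash of the downloaded stream and compare it with
    the hash stored in the document's system property at upload time."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != stored_hash:
        raise DataIntegrityError(
            "hash mismatch: content on the storage was most likely manipulated")
    return content
```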

An administrator can only manipulate content illegally if he can access both the database and the content storage, because the control hash value of the content is stored in the database. Take care that none of your administrators has exclusive and unattended access to both the content storage and the database.
Distributed management roles for the storage systems and the arveo transparent encryption feature can make your system forgery-proof!

Content Storage

arveo supports evidence-proof long-term storage of your content and metadata by storing the content in a legally secure manner on either an S3 object storage or a file system. The storage must be redundant. Object storage systems like AWS, NetApp or EMC Elastic Cloud Storage guarantee the long-term availability and integrity of your content.

All changes to content or metadata are persisted as a traceable and immutable version (Versioning) on your storage system, and an audit entry is written to the audit log table (Audit Log). Each time metadata (including comments and annotations) or content of a document is changed via the API, arveo creates a new version entry containing the author and the timestamp of the change in the version management table. The Update API allows you to add a comment to each version. The Version Management API provides access to all version information as well as the metadata and content of previous versions.

To ensure that the content is immutable only arveo should have write access to the storage system.
Only authorized data protection officers & administrators should have write-access to the storage system.
In case of very sensitive data you can enable encryption (Encryption) to follow the data protection rules and prevent your administrators from accessing document content.

For best high availability, the provider of your storage system must protect the stored data against accidental, malicious, or disaster-induced loss. The better your data replication, the better your availability in case of a failure.

Data Replication (Redundancy)

For both supported storages (S3, file system) you can select between different data replication strategies:

  • Asynchronous replication (backup or mirror): Enables asynchronous disaster recovery. Your content data is periodically mirrored.

  • Synchronous replication: Written content is replicated synchronously to redundant storage nodes.

Fail Safety (Consistency, Availability)

As arveo stores each version of the content as an immutable object, clients cannot get outdated data. If the replication is asynchronous, it can only happen that clients get a read error.

In case the storage is offline, arveo is not available and the system has an outage. In case the storage allows only read access, arveo can download content but upload operations fail.

If the storage node has a long term outage the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since that time.

We strongly recommend using a redundant file system or object storage system. If you do not at least back up your data periodically, data loss is likely.
For high availability with almost zero data loss, your storage system should replicate the written content and data synchronously.
The operating team of the platform must ensure that appropriate replication is set up and monitored.
You can configure different storage locations (cloud storage or on-premises) for your content and document types (Storage Configuration).
Reduce costs by storing data that is not compliance-relevant or legally required, like PDF/A renditions of documents, on storage systems with lower availability and performance SLAs.
Object storages with REST APIs are designed for the cloud. If you decide to use storage from the cloud (public or private), we recommend using object storage via the S3 API. Object storages provide a high level of redundancy (even geo-redundancy) and fail safety. The S3 REST API is very tolerant of network and infrastructure failures.

Consistent Meta Data Storage (relational Database)

The relational database postgreSQL 12 is responsible for 100% consistent processing of the structured metadata and transactions.

Data Replication (Redundancy)

For the supported databases postgreSQL 12 you can select between different data replication strategies:

  • Asynchronous replication (backup or mirror): Enables an asynchronous disaster recovery. Your database is periodically mirrored.

  • Synchronous database cluster: Transactions are synchronously replicated on more than one master node.

The provider of the postgreSQL 12 cluster must guarantee that data is stored redundantly and must reduce potential data loss.

Fail Safety (Consistency, Availability)

In case the database cluster is down or allows only read access, arveo is not available (denial of service/DoS). If the database has a long-term outage and the data files are affected, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since that time.

Eventually Consistent Meta Data Storage (NOSQL Document Database apache solr 8.6)

arveo uses modern NOSQL storage technologies to guarantee high search performance and horizontal scalability at all times. We store semi-structured or dynamic document metadata to a NOSQL document database apache solr 8.6.

Solr is an open source search platform that has been partially integrated into arveo.

solr arveo

Based on the type definitions that are created in arveo, arveo automatically creates a schema that Solr uses. In addition, a new collection is created in Solr for each client that is created in arveo, so that data is separated there as well.

Data Replication (Redundancy)

Set up a cluster of replicated nodes for apache solr 8.6. Refer to the apache solr 8.6 documentation to set up a redundant cluster.

Fail Safety (Availability, Partition Tolerance)

In case the database cluster is down, arveo is still available but free customer searches fail. In case one database node is down or the database is read-only, arveo is still available but searches may return outdated results. If the database has a long-term outage and the data files are affected, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since that time.

The provider of the apache solr 8.6 cluster must guarantee that data is replicated between the nodes and that the backup strategy prevents data loss.

Clustering

Each arveo service can be configured as a service cluster to achieve HA. Depending on the deployment, you can either set up an application server cluster (WAR deployment) or run our containerized applications on a cloud platform like OpenStack with Kubernetes.

Fail Safety (Consistency, Availability)
Table 3. Content services in arveo

  Service                        | Failure risks                                                               | Recommended
  User Management Service        | No login possible, system outage                                            | Cluster 2
  Config Service                 | Configuration not available to all nodes, system outage                     | Cluster 2
  Registry Service               | Service registry not available, system outage                               | Cluster 2
  Document Service               | Store, edit and version documents and metadata not available, system outage | Cluster 2-n, automatic scale up/down by load
  SAP Archive Link Service       | SAP archive link not available, SAP outage                                  | Cluster 2-n, automatic scale up/down by load
  Document Conversion Service    | Conversion to PDF/A not available                                           | Cluster 2-n, automatic scale up/down by load
  Enterprise Integration Service | Job execution paused and integration with external systems not available    | Cluster 2-n
  Federation Service             | Access to external repositories (Documentum, Saperion) not available        | Cluster 2-n, automatic scale up/down by load
  Access Control Service         | Access to objects with access control lists fails, partial system outage    | Cluster 2

Required 3rd Party services

To operate arveo successfully with high availability the operator of the platform must provide the following services as a cluster.

  Service                           | Failure risks                                   | Recommended
  Active MQ                         | Asynchronous operations are not triggered       | Cluster 2
  postgreSQL 12                     | Access to metadata not available, system outage | Cluster 2-n depending on load and configuration of the postgreSQL 12 cluster
  apache solr 8.6                   | Enterprise search not available                 | Cluster 2-n depending on load and configuration of the apache solr 8.6 cluster
  Content Storage                   | Content access not available, system outage     | Storage cluster depending on provider
  Authentication Service (optional) | Login not available via OAuth2, system outage   | Cluster 2
  Monitoring (optional)             | ELK (Elasticsearch, Logstash, and Kibana)       | Cluster 2

To achieve high availability for arveo the provider must guarantee that all required content services (Content Services) run as a cluster.

Data Deletion

By default, all documents of a document type stored in arveo save their metadata in the configured database and their content in the object storage. When a version is created, the content or metadata is stored as a traceable and immutable version (Versioning) in the database and on the storage system. This means that there are separate content objects and database entries for each version. Each document can have a retention period that ensures that the document cannot be deleted before the period expires.

You can delete or purge any object with the arveo Delete-API if you have the DELETE right for the document type and the object ACL, and the retention period has not expired.

The delete method deletes all entities including all versions of the object in the database, but it does not delete the content objects or files. The delete operation cannot be reversed and the data is permanently deleted.

The purge method additionally erases the content objects or files from the content storage.

If you delete objects only in the database, the content objects are orphaned; it is impossible to restore them and almost impossible to delete them later, because no relation is left in the database. The content objects remain as data trash in the system and cannot be accessed via the API.
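The difference between delete and purge can be sketched with a toy model of the two stores. The class and method names are illustrative only, not the arveo Delete-API:

```python
class ToyRepository:
    """Toy model: metadata rows live in a 'database', content objects in a 'storage'."""

    def __init__(self):
        self.database = {}   # id -> metadata (all versions)
        self.storage = {}    # id -> content object

    def create(self, doc_id, metadata, content):
        self.database[doc_id] = metadata
        self.storage[doc_id] = content

    def delete(self, doc_id):
        """Removes all database entities; content objects remain as orphans."""
        del self.database[doc_id]

    def purge(self, doc_id):
        """Additionally erases the content objects from the content storage."""
        del self.database[doc_id]
        del self.storage[doc_id]
```

After a plain delete, the toy storage still holds the orphaned content object, which mirrors the situation described above: the content remains as data trash that can no longer be reached via the API.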

Recycle Bin

Any document, container or folder type can use the optional recycle bin feature. If it is enabled, entities in the type definition can be moved to and restored from the recycle bin.

The recycle bin is implemented as a boolean database system property DELETED. Entities in the recycle bin will be filtered from normal queries by default, but a client can compose search expressions that override this behavior (see Recycle Bin).

If you delete or purge an object in the recycle bin it is deleted like a document without recycle bin feature and cannot be restored.
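The effect of the DELETED system property on queries can be illustrated with a small sketch. The entity representation is a hypothetical assumption; in arveo the filtering happens inside the search expressions:

```python
def query(entities, include_deleted=False):
    """Entities in the recycle bin (DELETED=True) are filtered from normal
    queries by default; a client can override this to search the recycle bin
    explicitly, mirroring the search-expression override described above."""
    return [e for e in entities if include_deleted or not e["DELETED"]]
```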

For compliance reasons, the audit entries in the database are not deleted by the Delete-API, and the delete operation is written to the audit log. The operator of the platform must clean up the audit table after the legal retention period has expired. We recommend backing up the audit logs to meet the legal requirements of data protection and to ensure that the backups can be restored within the legal retention period.

Automated Recycle Bin Emptying

You can empty your recycle bin with an automated job scheduled in the Enterprise Integration Service of arveo. You can activate the predefined empty-recycle-bin job and change the age from the 6-month default value to the age you want. The job permanently deletes all entries that have been in the recycle bin longer than the set age.
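The logic of such an age-based cleanup job can be sketched as follows. This is an assumption-laden illustration, not the predefined arveo job: the entry structure and the default of roughly 6 months (183 days) are chosen for the example.

```python
from datetime import datetime, timedelta


def empty_recycle_bin(entries, now, max_age_days=183):
    """Permanently delete all entries that have been in the recycle bin longer
    than the configured age (default roughly the 6-month default mentioned
    above). 'entries' maps an id to the timestamp at which the entity was
    moved to the recycle bin - a hypothetical structure for this sketch."""
    cutoff = now - timedelta(days=max_age_days)
    expired = [eid for eid, moved_at in entries.items() if moved_at < cutoff]
    for eid in expired:
        del entries[eid]          # permanent deletion - cannot be restored
    return expired
```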

Recovery Log

In addition to the recycle bin feature, arveo offers an additional safety layer to recover permanently deleted entities. By annotating a type definition with @Recovery, it is possible to define a time period in which permanently deleted entities are kept in a system-wide recovery table before they are removed completely. An entity in such a type definition that is deleted (or purged) is removed from the type definition's table (and its version table). A copy of each version of the entity is stored in the recovery table, making it possible to restore it manually. If the entity is a document, its contents are not deleted from the storage until the entity is removed from the recovery table.

There is no API to restore data from the recovery table. This feature is only intended as a last backup that allows an administrator to make accidentally deleted data available to the business again. The admin can copy the content file from the storage together with the JSON metadata and send it to the business department.

Recovery Log Emptying

The system management API provides a method to remove expired entities from the recovery table. An entity is considered expired when its keep-until timestamp is in the past compared from the moment the method is invoked. A user who calls this method needs the ECR_PURGE_RECOVERY_TABLE authority (see Access Rights).

The recovery of deleted entities is a manual process. The recovery table contains a JSONB column containing a JSON representation of the entire entity including attributes, content information and modification information. Each version of an entity is contained in the recovery table as a separate row.
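The per-version layout of the recovery table can be sketched as follows: each row holds one version of an entity as JSON, and an administrator reassembles the versions manually. The column and field names (`data`, `id`, `version`) are hypothetical assumptions for this sketch, not the actual table schema.

```python
import json


def group_recovery_rows(rows):
    """Group recovery-table rows by entity id and order them by version number.
    Each row's 'data' column is assumed to hold a JSON string with the entire
    entity: attributes, content information and modification information."""
    entities = {}
    for row in rows:
        entity = json.loads(row["data"])
        entities.setdefault(entity["id"], []).append(entity)
    for versions in entities.values():
        versions.sort(key=lambda e: e["version"])
    return entities
```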

It is possible to empty the recovery log by an automated custom job scheduled in the Enterprise Integration Service of arveo. The job must execute the Management-API method to empty the recovery table.

Installation

Deployment Options

The lightweight and stateless services are delivered as containers for all platforms and allow arveo to scale horizontally and automatically. Customers have the choice between an on-premises, cloud or hybrid installation.

The deployment may be done as one of the following:

  1. Docker images (for the arveo services): A Docker image is a template that contains a set of instructions for creating a container. Several containers can be started from one image.

  2. executable jar: Integrate the content services in your java application and run on any platform that provides a JVM.

  3. a .war file: Deploy the services as web applications in an application server like Tomcat.

  4. Spring Boot application: Deployed as a self running service using an embedded undertow servlet container.

  5. Debian package: Debian packages are used for software installation on Debian-based operating systems.

  6. Kubernetes HELM charts: Deploy the content services as containerized applications in your kubernetes environment with flexible HELM charts. That will enable load-dependent, automated service provision.

System requirements

This chapter describes the system requirements for an on premise installation. The configuration and deployment of all required artefacts is performed by Eitco or a partner by the automated deployment tool "Puppet".

General prerequisites

Firewall

Some firewall permissions are required. The IP addresses and the ports are customer-specific. In order to notify the provider of this, the customer must fill out the form customer-specific information.

Network Access

SSH access from the Eitco network to all customer-specific systems (including server) is required so that the installation can be carried out. Access to the official Ubuntu package sources is required. This is done either in the form of direct access via the Internet or by providing a local copy of the corresponding repository.

SMTP Mail

In addition, SMTP server access is required for sending mail, as well as access to the Eitco Puppet Master via VPN. The following parameters must also be provided by the customer so that any error messages from the HL7 Integration Service can be sent by email:

  • SMTP_SERVER
  • SMTP_PORT
  • SMTP_STARTTLS = true / false
  • SMTP_USER
  • SMTP_PASSWORD
  • MAIL_TO
  • MAIL_FROM

MAIL_TO is the address to which the mails are sent and MAIL_FROM is the sender address.

Reference Integration System

A reference system (in the form of a VM or similar) is required to test the system. It must have the same setup as the customer client systems (i.e. the same web browser, with the same settings, etc.). In addition, terminal/RDP access must be provided so that Eitco can test the client installation.

Web Browser

For the administration user interfaces the following web browsers are supported: Safari, Google Chrome, Microsoft Edge, Mozilla Firefox, each in the current version.

Containerized Applications

For the installation of the product, certain requirements for the hardware, software and infrastructure to be provided must be met. In a typical cloud environment, each arveo service is deployed as a containerized application and is hosted and scaled by a cloud operating system. However, a different setup can be used, depending on the customer infrastructure and the load of the system (see Deployment Options).

The following table describes the minimum CPU and RAM requirements of each arveo service in a production environment.

Table 4. Content services requirements
Service                                       | CPU        | RAM
Document Service                              | 4x > 2 GHz | >= 32 GB
User Management Service                       | 1x > 2 GHz | >= 2 GB
Registry Service                              | 1x > 2 GHz | >= 512 MB
Config Service                                | 1x > 2 GHz | >= 512 MB
Access Control Service                        | 1x > 2 GHz | >= 2 GB
Audit Service                                 | 1x > 2 GHz | >= 512 MB
SAP Archive Link Service (optional)           | 1x > 2 GHz | >= 1 GB
Document Conversion Service (optional)        | 1x > 2 GHz | >= 2 GB
Enterprise User Management Service (optional) | 1x > 2 GHz | >= 1 GB
Enterprise Integration Service (optional)     | 1x > 2 GHz | >= 1 GB
Federation Service (optional)                 | 1x > 2 GHz | >= 2 GB

The number of instances started for each service group and the CPU and RAM assigned to them depend heavily on the load and on the number of documents and objects in the database. You should always monitor the system and scale up or down on demand. Services like the Document Conversion Service or the Enterprise Integration Service in particular can produce heavy load and may require many containers, consuming considerable RAM and CPU.
For a test or development system the requirements are lower: each service needs less than 1 CPU and 256 MB of RAM.

Typical Non-Containerized Installation

Assuming that the installation is performed as Spring Boot services, we recommend setting up a minimum of 3 machines. The database and the Document Service carry the highest load and should be deployed on separate machines. All other services and 3rd-party services can run on one OS instance. Some services, like Archive Link or Document Conversion, may consume a lot of CPU and RAM, which can make it necessary to move them to separate machines.

  • System machine 1 - database. The PostgreSQL database is installed here.

Table 5. Requirements for the database machine
Component  | Recommendation                                      | Note
CPU        | 4x (> 2 GHz)                                        |
RAM        | At least 16 GB                                      | Depending on the size of the database
DB Storage | Proportional to the number and kind of the entities | Recommendation: should be stored on separate storage
Log files  | Depending on the volume of changes to the database  | Recommendation: should be stored on separate storage
OS         | Ubuntu 18.04/20.04                                  | The operating system recommendation is optional; any system satisfying the requirements of the PostgreSQL database may be used

  • System machine 2 - Document Service is installed here.

Table 6. Requirements for the arveo machine
Component | Recommendation                                  | Note
CPU       | 4x (> 2 GHz)                                    |
RAM       | 32 GB                                           |
Storage   | Proportional to the size of the content objects | Supported storages: 1) a separate file storage, 2) AWS, NetAPP or EMC Elastic Cloud Storage
OS        | Ubuntu 18.04/20.04                              | The tests are performed on a Debian machine, hence it is recommended to install a Debian-based distribution, for example a current LTS version of Ubuntu

The storage is meant for storing the arveo content objects of type Document, meaning binary content. All metadata and system properties are stored in the database, see System machine 1 above.
Table 7. Requirements for the Services machine
Component | Recommendation     | Note
CPU       | 4x (> 2 GHz)       |
RAM       | 16 GB              |
OS        | Ubuntu 18.04/20.04 | The operating system should be Debian-based

The importance of testing shouldn’t be underestimated, so there should always be a way to test specific cases without trying it out on a production system. For this reason, it is important to create a test system, which has the same specification and a similar data set as the original system.

For the arveo services, JDK 11 or 16 is required. All other recommendations listed above are non-binding, but they have proven to work well. In some cases other recommendations can be made, according to your individual project setup and the requirements of the project.

Installation

General Concept

These instructions describe the installation procedure, the installation content and the items required for commissioning the product. We recommend controlling the rollout of the arveo services by a continuous integration process that provides all artefacts required for the deployment of the required content services as well as your web solution and integrations.

Depending on the underlying platform, deployment takes place via binary service artifacts that are deployed on pre-installed VMs or via containerized applications that are made available in the host cloud system.

On Premise Installation By Eitco

This chapter describes the compliant On Premise installation provided by Eitco. The configuration and deployment of all required artefacts is performed by Eitco or a partner by the automated deployment tool "Puppet".

The customer provides several virtual machines that are configured by Eitco with the automated deployment tool Puppet (Puppet Deployment) in order to ensure a problem-free software rollout in the customer system.

Depending on the service level agreement Eitco can guarantee high availability, reliability and high performance at all times. The system has to be protected from manipulation attempts by technical or organizational measures. The data that is stored and managed in the system is protected via the API. The access and editing rights are managed via ACLs. User rights are based on the concepts for roles, groups and ACLs. More detailed information on this is provided in the relevant chapters of this manual.

All changes to the system and the data are logged via the API, and the changes are traceable via the audit log. If auditing is activated, every database change is logged. In order to guarantee the atomicity of transactions and to avoid inconsistent states, all aborted transactions are removed and rolled back.

Access to all data (documents, metadata) is exclusively provided via the API, with the corresponding protection mechanisms so that the security of the data is guaranteed at all times.

Puppet

Puppet is open source software developed by Puppet Labs and is used for the automated configuration and deployment of software deliveries. It provides configuration management for servers with both Unix-like operating systems and the Windows operating system via the network. The admin tool allows the automated configuration of computers and servers as well as the services installed on them. The arveo services are installed and configured with Puppet. After the server has been provided (see System Requirements), the Puppet Agent is installed on it, which then takes care of setting up the environment and the actual application. The duration of the installation process can vary and requires an adequate internet connection. The individual installation components are installed in the form of .deb packages. The installation is completely automated and carried out remotely.

Installed Services

  • postgreSQL 12 Database

  • apache solr 8.6 Document Database (full text)

  • JDK 11, 16

  • Keycloak, Active Directory Authentication Service

  • Active MQ Message Service Hub

  • Tomcat 9 Application Server

  • Document Service

  • Registry Service

  • Configuration Service

  • User Management Service

  • Access Control Service

  • Audit Service (optional)

  • Document Conversion Service (optional)

  • Enterprise Integration Service (optional)

  • Enterprise User Management Service (optional)

  • Enterprise Federation Service (optional)

Customer Applications & Services

  • Eitco or customer application and integration services (typically a web client and Apache Camel integration endpoints)

Order Of Services

The services must be started in the order listed below; the content services will not work until the services they depend on have been started.

All commands should be executed as root. When running as a non-root user, sudo should be set in front of systemctl.

The services are initially started by Puppet. After the installation of arveo has been successfully completed, the customer applications can be started. Additional information on registration, user management and the use of the web client can be found in the user and admin manual.

  • postgreSQL 12: systemctl start/stop postgresql

  • apache solr 8.6: systemctl start/stop solr.service, systemctl start/stop zookeeper.service

  • Config Service: systemctl start/stop common_config_service.service

  • Registry Service: systemctl start/stop common_registry_service.service

  • User Management Service: systemctl start/stop common_user_management.service

  • ACL Service: systemctl start/stop common_access_control.service

  • Enterprise User Management Service: systemctl start/stop common_enterprise_user_management.service (optional)

  • Federation Service: systemctl start/stop ecr_federation.service (optional)

  • Audit Service: systemctl start/stop common_audit.service (optional)

  • Document Service: systemctl start/stop ecr_repository_service.service

  • Document Conversion Service: systemctl start/stop common_document_conversion.service (optional)

  • Enterprise Integration Service: systemctl start/stop common_enterprise_integration.service (optional)

The current status of the service can also be determined with systemctl status <service>.

SSL Certificates

If all connections between the services are to be encrypted, SSL certificates are required. The following requirements apply: an X.509 certificate with an associated private key is required for each server. The certificate should be signed by an official CA or the company’s own CA; self-signed certificates can also be used. One special feature must be observed: the X.509 extension “Subject Alternative Name” must contain all DNS names and IP addresses via which the respective systems are accessed.
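As a sketch of the SAN requirement above, a self-signed certificate can be created with OpenSSL 1.1.1 or newer. The hostnames and the IP address are placeholders; use the names under which your systems are actually reached:

```shell
# Generate a self-signed certificate whose Subject Alternative Name lists
# all DNS names and IP addresses used to reach the server.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout server.key -out server.crt -days 365 \
  -subj "/CN=arveo.example.com" \
  -addext "subjectAltName=DNS:arveo.example.com,DNS:arveo,IP:192.0.2.10"

# Inspect the generated certificate and its SAN extension.
openssl x509 -in server.crt -noout -ext subjectAltName
```

For a certificate signed by a CA, the same `subjectAltName` content has to be present in the certificate the CA issues.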

Licensing

The client software uses several 3rd party licenses. The list of licenses can be called up via the following link: https://<customername>.eitco.de/3rdpartylicenses.txt.

Backups

The backup log is located on the database server at /var/log/postgresql/backup.log. The database backup script is located at /var/lib/postgresql/backup.sh and can also be started manually at any time. Before a manual run, there must not yet be a folder with the current date under /backup/full/; if such a folder exists, it must be moved beforehand. The script is controlled by cron and is started automatically every day at 10 p.m.
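The schedule described above would correspond to a crontab entry like the following. This is only an illustration; the actual entry is set up during installation:

```
# /etc/cron.d entry: run the database backup daily at 22:00 as the postgres user
0 22 * * * postgres /var/lib/postgresql/backup.sh
```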

Getting Started

To get started developing an arveo based application there are maven archetypes to create projects. An example of a well documented application based on such an archetype - but extended since - can be found here.

The slim archetype

This archetype creates a rather small project. It consists of an arveo scenario and tests for that.

The maven coordinates of this archetype are:

    <groupId>de.eitco.ecr</groupId>
    <artifactId>ecr-types-archetype</artifactId>
    <version>{project-technical-version}</version>

To create an arveo scenario project use the maven archetype plugin:

mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion={project-technical-version}

Here, the variable {project-technical-version} must be replaced with the actual version, e.g. 5.0.1.

Note that this command generates the project structure into the current directory. Before you run it, make sure you have prepared a folder for your project and switched into it on your command line.

This will start a process that will ask for some parameters and then generate a maven project according to the parameters. The following parameters will be asked for:

groupId

The maven groupId of the new project

artifactId

The maven artifactId of the new project

version

The maven version of the new project

class-name-prefix

A prefix for the names of the generated classes.

scm-locator

The location in the eitco bitbucket server where the sources are (or will be). For a project located in https://git.eitco.de/scm/<project>/<repository>.git, this would be <project>/<repository>.git. This configures the maven release plugin. If this is omitted (or set to a wrong value) the project will work for now - however the release process will not work - unless it is fixed.

Some or all of these parameters can also be given on the command line via -D. The process will not ask for parameters given on the command line. So the command

mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion={project-technical-version} -DgroupId=my.group.id -DartifactId=my-artifact-id -Dversion=0.0.1-SNAPSHOT -Dclass-name-prefix=My -Dscm-locator=prj/repo.git

would not ask for any parameters and just create the project.

Overview of the generated project

The project generated by the archetype will consist of two modules:

implementation\types

This module contains your arveo scenario. An example type will be created with the name <class-name-prefix>Model. You can define more types here, but you will need to register them in <class-name-prefix>TypeRegistration. The chapter Object types and definitions describes how to define types.

test\system-test

This module contains tests for your scenario. These tests will be executed during the build. For that, a complete arveo environment will be created, so you can add tests that simply connect to arveo via the HTTP client and can assume that your scenario is deployed.

This module can also be used to set up an arveo environment with your scenario on which you can then run tests manually. In the module run

mvn -Denv

to set up the environment. It will be torn down when you press <enter> in the console.

The service archetype

This archetype creates a more complex project. It is based on the eitco commons archetype. It will contain a simple web service with an automatically generated client layer, based on eitco commons. The maven coordinates of this archetype are:

    <groupId>de.eitco.ecr</groupId>
    <artifactId>ecr-service-archetype</artifactId>
    <version>{project-technical-version}</version>

To create an arveo based service project use the maven archetype plugin:

mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-service-archetype -DarchetypeVersion={project-technical-version}

This will start a process that will ask for some parameters and then generate a maven project according to the parameters. The following parameters will be asked for:

groupId

The maven groupId of the new project

artifactId

The maven artifactId of the new project

version

The maven version of the new project

class-name-prefix

A prefix for the names of the generated classes.

scm-locator

The location in the eitco bitbucket server where the sources are (or will be). For a project located in https://git.eitco.de/scm/<project>/<repository>.git, this would be <project>/<repository>.git. This configures the maven release plugin. If this is omitted (or set to a wrong value) the project will work for now - however the release process will not work - unless it is fixed.

disable-optional-features

When set to false, a slightly more complex project will be created, including the audit service, the user-management enterprise service and jmeter samplers. If set to true (the default value), these features are disabled but can be activated by uncommenting certain source locations.

Some or all of these parameters can also be given on the command line via -D. The process will not ask for parameters given on the command line. So the command

mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-service-archetype -DarchetypeVersion={project-technical-version} -DgroupId=my.group.id -DartifactId=my-artifact-id -Dversion=0.0.1-SNAPSHOT -Dclass-name-prefix=My -Dscm-locator=prj/repo.git -Ddisable-optional-features=false

would not ask for any parameters and just create the project.

Overview of the generated project

The project generated by the archetype will consist of four modules:

  • documentation

  • implementation

  • packaging

  • test

The documentation module

This module holds a frame for an asciidoc based documentation of your project.

The implementation module

This module contains the actual source code. It is separated into five submodules.

  • common

    • This submodule contains classes that are available on the server side as well as the client side.

  • generated

    • This submodule contains modules that are automatically generated.

    • Normally developers will not add code in these modules.

      • They are however relevant for building the project.

    • The following submodules exist

      • serialization

        • This submodule contains automatically generated serialization meta information.

      • client

        • This submodule contains a few submodules itself, holding client side applications for:

          • a java spring based http client api,

          • a java spring based embedded client api,

          • a typescript http client api.

      • jmeter-sampler

        • This submodule generates jmeter samplers for the service's api, usable in load tests.

  • server

    • This submodule contains the server side implementation.

  • types

    • This submodule contains the arveo-based model. The generated interface named <class-name-prefix>Model describes an arveo type definition, as will every interface you register in <class-name-prefix>TypeRegistration. The jar compiled by this module will be available on the server side and the client side. Additionally, it needs to be in the class path of your arveo instance. For the system tests (see below) this is already taken care of.

The packaging module

This module contains delivery artifacts to deliver the service to or with different runtimes. This includes:

  • a stand-alone jar

  • a java web archive (war)

  • a helm chart for deployment in a kubernetes cluster

The test module

This module contains a system test module. When building this module maven will start a complete arveo system (containing all required services) with the newly generated service in the pre-integration-test-phase so that tests written here (like the generated example <class-name-prefix>ClientIT) may simply call the new service via the generated http-client (see above).

Working on the generated project

Most of the implementation will be done in the implementation\server module, since it contains the server-side code. Your API and model will be defined in the implementation\common and implementation\types modules. The latter is only used for classes that are part of your arveo model and need to be in the classpath of arveo.

When testing your code, the test\system-test module comes in handy. As mentioned above, it will start a complete arveo system so that your tests can simply use the generated HTTP client api to test your functionality. However, you can also use this to manually test and debug your service. If you simply need to start up the environment, call the following in the test\system-test directory:

mvn -Denv

If you want to debug your service call

mvn -Denv -Dservice.skip

This will start the environment except for your service. You can then start your service in debug mode from your IDE.

In both cases you can now start tests manually or call the service api directly to test your code.

Administration

Configure Database access and Tenants

arveo (and the User Service, Enterprise User Service and the Access Control Service) supports multiple tenants. The separation between the tenants' data is done on the database layer by using separate JDBC connections for each tenant. The tenants can be located in different schemas or in different database instances. Each user is associated with a single tenant, so that all database queries performed in the user's context are executed on the correct database connection. On systems that have only one tenant, it is not required to associate users with the tenant. Systems that use more than one tenant have to have one tenant called 'master'. This tenant is used to store some tenant-spanning configuration properties. The available tenants are configured in the service's configuration file:

Configuration of the service
tenants:
  - tenant-id: "master"
    numeric-id: 1
    db-url: "jdbc:postgresql://localhost:5432/postgres?currentSchema=master"
    db-username: username
    db-password: password
    db-driver-class-name: org.postgresql.Driver
    db-maximum-pool-size: 5
    db-minimum-idle: 2
    db-connection-timeout: 10000
    db-idle-timeout: 60000
    db-max-lifetime: 1800000
    db-leak-detection-threshold: 20000
  - tenant-id: "tenant1"
    numeric-id: 2
    db-url: "jdbc:postgresql://localhost:5432/postgres?currentSchema=tenant1"
    db-username: username
    db-password: password
    db-driver-class-name: org.postgresql.Driver
    db-maximum-pool-size: 5
    db-minimum-idle: 2
    db-connection-timeout: 10000
    db-idle-timeout: 60000
    db-max-lifetime: 1800000
    db-leak-detection-threshold: 20000
  - tenant-id: "tenant2"
    numeric-id: 3
    db-url: "jdbc:postgresql://localhost:5432/postgres?currentSchema=tenant2"
    db-username: username
    db-password: password
    db-driver-class-name: org.postgresql.Driver
    db-maximum-pool-size: 5
    db-minimum-idle: 2
    db-connection-timeout: 10000
    db-idle-timeout: 60000
    db-max-lifetime: 1800000
    db-leak-detection-threshold: 20000
Table 8. Database parameters
Parameter                   | Meaning
tenant-id                   | The human-readable name of the tenant, as used to associate a user with a tenant
numeric-id                  | A numeric identifier for the tenant. The numeric identifier is used in IDs of entities to distinguish entities of different tenants
db-url                      | The JDBC URL for the tenant
db-username                 | The username used to log on to the database
db-password                 | The password used to log on to the database
db-driver-class-name        | The name of the JDBC driver
db-maximum-pool-size        | The maximum number of connections in the connection pool for the tenant
db-minimum-idle             | The minimum number of idle connections to keep in the pool
db-connection-timeout       | Time in milliseconds to wait for a connection to the database
db-idle-timeout             | Time in milliseconds an idle connection is left in the pool
db-max-lifetime             | The maximum lifetime of a connection in the pool, in milliseconds
db-leak-detection-threshold | Time in milliseconds a connection can be active before a warning is logged
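The per-tenant routing described above can be sketched as follows. This is an illustration of the idea only, not arveo's actual implementation; the class and method names are hypothetical:

```java
import java.util.Map;

// Minimal sketch: resolve the JDBC URL configured for a tenant, so that all
// queries in a user's context run against that tenant's database connection.
public class TenantRouter {

    private final Map<String, String> jdbcUrlByTenant;

    public TenantRouter(Map<String, String> jdbcUrlByTenant) {
        this.jdbcUrlByTenant = jdbcUrlByTenant;
    }

    public String jdbcUrlFor(String tenantId) {
        String url = jdbcUrlByTenant.get(tenantId);
        if (url == null) {
            // A user associated with an unconfigured tenant cannot be served.
            throw new IllegalArgumentException("Unknown tenant: " + tenantId);
        }
        return url;
    }
}
```

In a real setup, each tenant entry would hold a full connection pool configured with the db-* parameters from Table 8 rather than a bare URL.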

Configure Storage Locations

Content and type definitions

Only Documents can contain content elements. A Document in the repository can contain several content elements. For example, a document could contain a content element with the original content (like a TIFF image or a Word document) and a PDF rendition. Each content element has a contentName and some more properties like the media type. The contentName is a label that uniquely identifies a single content element contained in a Document. For example, a Document might contain two content elements that are identified by the contentNames 'content' and 'rendition'.

The contentNames are not only relevant for uniquely identifying a content element contained in a document, but also serve as references for further customization of the repository. The repository accepts configuration options that are directly related to contentNames, and the Document type definitions can define restrictions regarding the allowed contentNames.

Type definitions can define which contentNames can be contained in the entities stored with the definition. For example, there might be a type definition that only accepts the 'rendition' content element but no other ones.

The configured content definitions must refer to contentNames explicitly if specific settings need to be applied. In addition to content definitions referring to specific contentNames, a default content definition can be configured. If the default content definition is configured, documents can contain content elements with arbitrary names, and it is not required to list the complete set of contentNames in the repository configuration.

Each content element is stored via a storage profile, which defines the place where the actual content will be stored. The media type configuration parameter can be used to define what kind of content a content element can contain. When the media type is set to application/octet-stream, any kind of content can be used. The supported content elements are configured as shown below. If no specifications have been made for a contentName, the settings listed under default-definition apply.

ecr-service.yaml
content:
  default-definition:
    mediaType: "application/octet-stream"
    storageProfile: testProfile
  definitions:
    content:
      mediaType: "application/octet-stream"
      storageProfile: testProfile
    rendition:
      mediaType: "application/octet-stream"
      storageProfile: testProfile
    LARGE_CONTENT:
      mediaType: "application/octet-stream"
      storageProfile: testProfile
    s3stuff:
      mediaType: "application/octet-stream"
      storageProfile: s3Profile
    rollback-test:
      mediaType: "application/octet-stream"
      storageProfile: rollbackProfile

In this example, content elements named 'content' and 'rendition' as well as the default definition are configured. If the media type is defined as "application/pdf" instead of "application/octet-stream", only PDF content can be saved in that content element.

Types of content elements

The type definition specifies which content elements entities of that type may have. Objects of type Document may have several content elements.

A content element that uses the default content definition must be named 'default'. Usually, the content elements of the entities are stored in a JSON field which contains the storage ID and additional metadata like size, media type and a hash. If required, a content element can also be stored in a separate field. The separate field will contain only the storage ID but no additional metadata. A type definition that uses a separate field for a content element using the default definition cannot store content elements with arbitrary names, but only content elements named 'default'. Additional metadata for content elements using separate fields has to be handled by the client application, for example by storing it in a custom metadata attribute.

The following example is an object of type Document for which two content elements are defined: "content" and "LARGE_CONTENT". The names of the content elements are defined in the configuration file of arveo. In this example, "separateField = true" means that a separate column in the database is used; otherwise the content information is written to the corresponding JSON field of the database. The name of a separate column in the database is derived from the name of the content element.

Example of a Document with two content elements
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content", separateField = true)
@ContentElement(name = "LARGE_CONTENT", separateField = true)
public interface TwoContentsDocument {

    @SystemProperty(SystemPropertyName.ID)
    DocumentId getId();

    @SystemProperty(SystemPropertyName.CONTENT)
    Map<String, ContentInformation> getContentInformation();

    String getName();

    void setName(String name);
}

Storage profiles

Storage profiles are bound to contentNames. A StorageProfile defines on which storage the content elements are saved. arveo uses different storage profiles that define how and where to store binary data. Access to the storage backends (like filesystem or S3) is handled by storage plugins.

A StoragePlugin is defined in the StorageProfile and is used to access the connected storage. The same plugin can be used in several StorageProfiles. Each StorageProfile can have a different set of parameters (access data, URLs, …​) for the plugin.

StorageProfile Definition
storage:
  profiles:
    version-attributes-profile: (1)
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin (2)
      pluginSettings:
        storagePath: ${project.build.directory}/attribute-storage
    testProfile: (1)
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin (2)
      pluginSettings:
        storagePath: ${project.build.directory}/storage
    s3Profile: (1)
      pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin (2)
      pluginSettings:
        pathStyleAccessEnabled: true
        serviceEndpoint: "http://localhost:49999"
        region: us-west-2
        accessKey: myaccesskey
        secretAccessKey: mysecretaccesskey
        bucket: testbucket (3)
    rollbackProfile: (1)
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin (2)
      pluginSettings:
        storagePath: ${project.build.directory}/storage/rollback-test
  1. Profile name.

  2. The plugin type, i.e. its fully qualified class name.

  3. The bucket name.

Each profile is identified by its name and defines the storage plugin to use. Plugin-specific settings can be configured in the pluginSettings map. The plugin class name thus determines the storage technology, while the plugin settings configure it.

When a document is saved using the configured StoragePlugin, the plugin defined in the profile returns a contentID with which the stored data can be retrieved later. This ID, which is usually of type String, is saved with the document. The format of this ID is defined by the storage plugin; usually it is a UUID, but it may be any text string.

A plugin is assigned to each profile via its fully qualified class name, and arbitrary name-value pairs can be specified for the plugin's configuration.
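The plugin contract described above can be illustrated with a minimal in-memory sketch. The interface names and method signatures here are hypothetical, not arveo's real plugin SPI; the point is only that the plugin chooses the contentID format and resolves it again on retrieval:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical storage plugin: store() returns a plugin-defined contentID,
// retrieve() resolves that ID back to the stored bytes.
public class InMemoryStoragePlugin {

    private final Map<String, byte[]> contents = new HashMap<>();

    public String store(byte[] content) {
        // The plugin decides the ID format; here a UUID, but any string works.
        String contentId = UUID.randomUUID().toString();
        contents.put(contentId, content);
        return contentId;
    }

    public byte[] retrieve(String contentId) {
        return contents.get(contentId);
    }
}
```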

Plugin configuration

The service uses a plug-in interface for connection to the specific storage provider. The following plugins are currently available:

File system

Class name: de.eitco.ecr.server.storage.plugins.FileSystemPlugin.

The FileSystemPlugin offers storage of the data as files in the file system.

Table 9. Configuration parameters of the file system plugin
Parameter   | Meaning
storagePath | Path to the directory that is used to store the files

AWS, NetAPP or EMC Elastic Cloud Storage

Class name: de.eitco.ecr.server.storage.plugins.S3Plugin.

The S3 plug-in stores data in an Amazon S3 compatible storage.

Table 10. Configuration parameters of the S3 plugin
Parameter              | Meaning
pathStyleAccessEnabled | From the S3 documentation: configures the client to use path-style access for all requests. Amazon S3 supports virtual-hosted-style and path-style access in all regions. The path-style syntax, however, requires that you use the region-specific endpoint when attempting to access a bucket
serviceEndpoint        | The URL of the S3 endpoint to be used by the plugin
region                 | The region for access to AWS
accessKey              | AWS access key
secretAccessKey        | AWS secret access key
bucket                 | The name of the S3 bucket to be created by the plugin. The name can only contain lowercase letters. The name of the current tenant will be appended to the bucket name, separated by a "-" sign

If arveo has no permission to create buckets, the administrator has to create them manually. Buckets have to be created separately for every tenant.
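The bucket naming scheme described above can be sketched as follows. The class name is illustrative, not part of arveo:

```java
// Illustrative sketch: the effective S3 bucket name is the configured bucket
// name plus "-" plus the tenant name, and must consist of lowercase letters.
public class BucketNames {

    public static String effectiveBucket(String configuredBucket, String tenantId) {
        String name = configuredBucket + "-" + tenantId;
        if (!name.equals(name.toLowerCase())) {
            throw new IllegalArgumentException("Bucket names must be lowercase: " + name);
        }
        return name;
    }
}
```

With the configuration from the example above (bucket: testbucket) and a tenant 'tenant1', the bucket used would be testbucket-tenant1.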

BucketOrganizer

Class name: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin.

The BucketOrganizer is not tied to a specific storage technology or storage interface; instead, it delegates storage requests to other storage plugins. The selection of the target plugin depends on the retention information of the document that contains the content element to be stored. The selection criteria used to choose the target plugin can be configured as a list of bucket selection rules.

The relevant retention information of the document is defined by the values of the system fields RETENTION_DATE and LITIGATION_HOLD. This value pair is matched against the bucket selection rules. The matching process starts with the first rule and continues to the next rule if a rule does not match the value pair. It ends at the first rule that matches the value pair; the storage profile named in this rule will be used to store the content. Each bucket selection rule consists of three parts separated by the pipe (|) symbol:

  1. retention date match expression

     The retention date match expression is usually a time interval that begins on one calendar day and extends to a
     later calendar day. The notation for the interval is inspired by ISO 8601 and may read like this:
     "2021-01-01+01:00--2022-01-01+01:00". The general format is "begin_date--end_date", i.e. both dates are separated
     by "--". A retention date matches the expression if "begin date" <= "retention date" < "end date".
     The begin and end dates are specified as "YYYY-MM-DD" followed by a time zone offset as "+hh:mm" or "-hh:mm".
     It is possible to define open intervals by specifying one of the boundary dates as "UNBOUNDED".
     Retention dates may be NULL if the retention date has not (yet) been set on the document. A NULL retention date
     will not match any interval specified in a match rule. For this reason, the retention date match expression may be
     specified as "NULL" to match NULL retention dates.
     A retention date match expression can also be specified as "*" if the rule should always match.
  2. litigation hold match expression

    The litigation hold match expression can be one of these literals:
        "true", "false", "*"
    The literal "*" always matches; the other literals match the denoted value only.
  3. target storage profile name

    The name of the target storage profile to be used if both expressions match the corresponding system field values.
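A minimal sketch of the first-match evaluation described above; the class and method names are hypothetical, and unlike the real plugin this simplified version parses plain ISO dates without time zone offsets and does not support "UNBOUNDED" begin/end dates being combined with offsets:

```java
import java.time.LocalDate;
import java.util.List;

// Illustrative sketch of the first-match rule evaluation: each rule is
// "retentionExpr|litigationExpr|profile"; the first matching rule wins.
public class BucketRuleMatcher {

    /** Returns the target profile of the first rule matching the value pair. */
    public static String selectProfile(List<String> rules,
                                       LocalDate retentionDate,
                                       boolean litigationHold) {
        for (String rule : rules) {
            String[] parts = rule.split("\\|");          // retention|litigation|profile
            if (matchesRetention(parts[0], retentionDate)
                    && matchesLitigation(parts[1], litigationHold)) {
                return parts[2];                          // first match wins
            }
        }
        throw new IllegalStateException("no rule matched");
    }

    private static boolean matchesRetention(String expr, LocalDate date) {
        if (expr.equals("*")) return true;                // always matches
        if (expr.equals("NULL")) return date == null;     // only unset retention dates
        if (date == null) return false;                   // NULL matches no interval
        String[] bounds = expr.split("--");
        boolean afterBegin = bounds[0].equals("UNBOUNDED")
                || !date.isBefore(LocalDate.parse(bounds[0]));
        boolean beforeEnd = bounds[1].equals("UNBOUNDED")
                || date.isBefore(LocalDate.parse(bounds[1]));
        return afterBegin && beforeEnd;                   // begin <= date < end
    }

    private static boolean matchesLitigation(String expr, boolean hold) {
        return expr.equals("*") || Boolean.parseBoolean(expr) == hold;
    }
}
```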
Table 11. Configuration parameters of BucketOrganizer plugin
parameter meaning

bucketSelectionRules

A list of bucket selection rules

storage:
  profiles:
    bucketProfile: (1)
      pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin (2)
      pluginSettings:
        bucketSelectionRules: (3)
          - "*|true|fsProfileLitigationHold"  (4) (5)
          - "NULL|false|fsProfileForever" (4)
          - "2021-01-01+01:00--2022-01-01+01:00|false|fsProfile2021" (4)
          - "2022-01-01+01:00--2023-01-01+01:00|false|fsProfile2022" (4)
          - "2023-01-01+01:00--2024-01-01+01:00|false|fsProfile2023" (4)
          - "2024-01-01+01:00--2025-01-01+01:00|false|fsProfile2024" (4)
          - "2025-01-01+01:00--2026-01-01+01:00|false|fsProfile2025" (4)
          - "2026-01-01+01:00--2027-01-01+01:00|false|fsProfile2026" (4)
          - "2027-01-01+01:00--2028-01-01+01:00|false|fsProfile2027" (4)
          - "2028-01-01+01:00--2029-01-01+01:00|false|fsProfile2028" (4)
          - "2029-01-01+01:00--2030-01-01+01:00|false|fsProfile2029" (4)
          - "2030-01-01+01:00--2031-01-01+01:00|false|fsProfile2030" (4)
          - "*|*|fsProfileAnotherEra" (4)
    fsProfileLitigationHold: (5)
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
      pluginSettings:
        storagePath: ${project.build.directory}/storage/litigationHold
    fsProfileForever:
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
      pluginSettings:
        storagePath: ${project.build.directory}/storage/forever
    fsProfile2021:
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
      pluginSettings:
        storagePath: ${project.build.directory}/storage/2021
 #...
    fsProfile2030:
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
      pluginSettings:
        storagePath: ${project.build.directory}/storage/2030
    fsProfileAnotherEra:
      pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
      pluginSettings:
        storagePath: ${project.build.directory}/storage/anotherEra
  1. profile name

  2. type of plugin, i.e. its class name

  3. the list of bucket selection rules

  4. a bucket selection rule, consisting of retention date match expression, litigation hold match expression and target storage profile name

  5. the referenced profile name

Configure Retention Container

Configure storage containers for yearly retention periods

Once you have deployed your new data type with enabled retention, all your data is stored in your default storage profile and has a default retention of 10 years. The following example defines separate buckets, each containing all objects whose retention date falls within one year. Configure the buckets in the ecr-service.yaml of your config service in the section arveo:storage:profiles:. You can configure a new storage profile with an unlimited number of data buckets for your content.

Mandatory properties of your new bucket profile:

Property Description

pluginClassName:

must always be "de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin"

pluginSettings:bucketSelectionRules:

array of rules containing filter (string)|litigationHold (boolean)|storageProfile (string)

filter (string): Must be * to match all objects or a valid zoned date time range like 2031-01-01+01:00--2032-01-01+01:00. The bucket selection is based on the document type property RETENTION_DATE.

litigationHold (boolean): true marks the litigation hold bucket, false applies to all other regular retention buckets.

storageProfile (string): a valid storage profile name (arveo:storage:profiles:).

Find more details about selection rules in Retention Bucket Selection Rules.

If the configuration is not correct, you will find more information in the startup log, most likely a MissingConfigurationException.

Defining storage containers in the ecr-service.yaml and in your storage system is an ongoing task for your operating team. Eitco will try to create the buckets or subdirectories on your storage system but can also use already existing ones.

ecr-service.yaml example snippet for content definitions and storages. Adapt your ecr-service.yaml and replace rules, profile names and cloud storage url, etc. with your values.

arveo:
  server:
    content:
      default-definition:
        mediaType: "application/octet-stream"
        storageProfile: bucketProfile (1)
      definitions:
        content:
          mediaType: "application/octet-stream"
          storageProfile: bucketProfile (1)
        rendition:
          mediaType: "application/octet-stream"
          storageProfile: bucketProfile (1)
        documentTypeA: (2)
          mediaType: "application/octet-stream"
          storageProfile: storageProfileDocumentTypeA
        documentTypeB: (2)
          mediaType: "application/octet-stream"
          storageProfile: storageProfileDocumentTypeB
1 Assign your bucket storage profile to the content types with a retention period.
2 The example provides two more storage profiles for other document types (storageProfileDocumentTypeA, storageProfileDocumentTypeB). To write all content of a document type to a storage profile you must assign this content type to the document type. The upload API will only accept content of this type for the document type.
    storage:
      profiles:
        bucketProfile:
          pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin
          pluginSettings:
            bucketSelectionRules:
              - "*|true|storageProfileRetentionLitigationHold"
              - "NULL|false|storageProfileRetentionNone"
              - "2031-01-01+01:00--2032-01-01+01:00|false|storageProfileRetention2031"
              - "2032-01-01+01:00--2033-01-01+01:00|false|storageProfileRetention2032"
              - "2033-01-01+01:00--2034-01-01+01:00|false|storageProfileRetention2033"
              - "2034-01-01+01:00--2035-01-01+01:00|false|storageProfileRetention2034"
              - "2035-01-01+01:00--2036-01-01+01:00|false|storageProfileRetention2035"
              - "2036-01-01+01:00--2037-01-01+01:00|false|storageProfileRetention2036"
              - "2037-01-01+01:00--2038-01-01+01:00|false|storageProfileRetention2037"
              - "2038-01-01+01:00--2039-01-01+01:00|false|storageProfileRetention2038"
              - "2039-01-01+01:00--2040-01-01+01:00|false|storageProfileRetention2039"
              - "2040-01-01+01:00--2041-01-01+01:00|false|storageProfileRetention2040"
              - "2041-01-01+01:00--2042-01-01+01:00|false|storageProfileRetention2041"
              - "*|*|storageProfileRetention2042Plus"
        storageProfileRetentionLitigationHold: (1)
          pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin
          pluginSettings:
            pathStyleAccessEnabled: true
            serviceEndpoint: "<cloudstorage url>"
            region: eu
            accessKey: <myaccesskey>
            secretAccessKey: <mysecret>
            bucket: LitigationHold
        storageProfileRetentionNone: (2)
          pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin
          pluginSettings:
            pathStyleAccessEnabled: true
            serviceEndpoint: "<cloudstorage url>"
            region: eu
            accessKey: <myaccesskey>
            secretAccessKey: <mysecret>
            bucket: NoRetention
        storageProfileRetention2042Plus: (3)
          pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin
          pluginSettings:
            pathStyleAccessEnabled: true
            serviceEndpoint: "<cloudstorage url>"
            region: eu
            accessKey: <myaccesskey>
            secretAccessKey: <mysecret>
            bucket: RetentionPeriod2042Plus
        storageProfileRetention2031: (4)
          pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin
          pluginSettings:
            pathStyleAccessEnabled: true
            serviceEndpoint: "<cloudstorage url>"
            region: eu
            accessKey: <myaccesskey>
            secretAccessKey: <mysecret>
            bucket: RetentionPeriod2031
        storageProfileRetention2032:
          pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin
          pluginSettings:
            pathStyleAccessEnabled: true
            serviceEndpoint: "<cloudstorage url>"
            region: eu
            accessKey: <myaccesskey>
            secretAccessKey: <mysecret>
            bucket: RetentionPeriod2032
             ... (5)
        storageProfileDocumentTypeA: (6)
          pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin
          pluginSettings:
            pathStyleAccessEnabled: true
            serviceEndpoint: "<cloudstorage url>"
            region: eu
            accessKey: <myaccesskey>
            secretAccessKey: <mysecret>
            bucket: DocumentTypeA
        storageProfileDocumentTypeB: (6)
          pluginClassName: de.eitco.ecr.server.storage.plugins.S3Plugin
          pluginSettings:
            pathStyleAccessEnabled: true
            serviceEndpoint: "<cloudstorage url>" (7)
            region: eu (7)
            accessKey: <myaccesskey> (7)
            secretAccessKey: <mysecret> (7)
            bucket: DocumentTypeB
1 Always configure a litigation hold bucket.
2 You should also configure a bucket for data that has no retention, just in case.
3 Fallback bucket for all content with a retention period past 2041. If you omit this bucket, you will get an exception when storing content that cannot be assigned to a bucket.
4 One bucket for each year.
5 Configure as many buckets as needed for your content.
6 Two more storage profiles for other document types without retention. See the content types without retention above (arveo:server:content:DocumentTypeA/B).
7 Replace the placeholders with your S3 URL, region, access key and access secret.

For more details on storage profiles and content types, see Content Types.

If you want to use directories instead of buckets, you can configure file system storage profiles and assign a subdirectory (File system storage profile configuration):
        storageProfileLitigationHold:
          pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
          pluginSettings:
            storagePath: ${storage.base.directory}/storage/litigationHold
        storageProfileRetentionNone:
          pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
          pluginSettings:
            storagePath: ${storage.base.directory}/storage/retentionNone
        storageProfile2031:
          pluginClassName: de.eitco.ecr.server.storage.plugins.FileSystemPlugin
          pluginSettings:
            storagePath: ${storage.base.directory}/storage/2031

Configure Encryption

arveo provides transparent encryption for data stored in the storage profiles. The encryption can be configured individually for each storage profile.

Overview

Encrypting and decrypting is performed by configurable encryption providers. Currently, there is one provider that supports AES encryption with 256-bit keys. When a new content element is created in an encrypted profile, arveo generates a random cipher key for the element. The key is encrypted using a master password that is configured in the profile’s encryption settings and is stored in the database, which yields an identifier (key-id) for the key. The keys are stored in individual tables for each profile called ecr_keys_<profileName>. After that, the content is encrypted and stored using the profile’s storage plugin. The key-id is stored in a header together with the encrypted data. When the data is read, the cipher key is loaded from the database using the key-id read from the header. The key is decrypted using the master password and used to decrypt the data read by the profile’s storage plugin.
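The flow described above is essentially envelope encryption. The following sketch illustrates it with the JDK's standard JCA for brevity; arveo itself uses BouncyCastle directly and derives the master key from the configured password with Argon2, for which a fixed stand-in key is used here:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Sketch of the per-element envelope encryption: a random content key is
// wrapped with the master key, then the content is encrypted with the
// content key. Class and method names are hypothetical.
public class EnvelopeEncryptionSketch {

    static final SecureRandom RANDOM = new SecureRandom();

    public static byte[] encrypt(SecretKey key, byte[] plaintext, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv)); // 128-bit tag
        return cipher.doFinal(plaintext);
    }

    public static byte[] decrypt(SecretKey key, byte[] ciphertext, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        // 1. random 256-bit cipher key for the new content element
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        SecretKey contentKey = generator.generateKey();

        // 2. wrap the content key with the master key; the wrapped key would
        //    be stored in ecr_keys_<profileName> under a key-id
        SecretKey masterKey = new SecretKeySpec(new byte[32], "AES"); // stand-in key
        byte[] keyIv = new byte[12];                                  // 96-bit IV
        RANDOM.nextBytes(keyIv);
        byte[] wrappedKey = encrypt(masterKey, contentKey.getEncoded(), keyIv);

        // 3. encrypt the content itself with the content key
        byte[] contentIv = new byte[12];
        RANDOM.nextBytes(contentIv);
        byte[] stored = encrypt(contentKey, "content".getBytes(StandardCharsets.UTF_8), contentIv);

        // Reading reverses the steps: unwrap the key, then decrypt the content.
        byte[] unwrapped = decrypt(masterKey, wrappedKey, keyIv);
        byte[] plain = decrypt(new SecretKeySpec(unwrapped, "AES"), stored, contentIv);
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```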

When the database table containing the keys or the master password is lost, it is impossible to restore the data stored in the profile. When the master password for a profile is changed, it is required to re-encrypt all stored keys for the profile.

A mechanism to re-encrypt keys is planned, but has not been implemented yet.

There is a second database table for each profile called ecr_keys_assoc_<profileName>. This table contains mappings of key IDs to content element IDs and is intended for system administration purposes. The encryption feature is configured as shown in the following example:

Configuration of encryption for a storage profile
storage:
  profiles:
    encryptedProfile:
      pluginClassName: "de.eitco.ecr.server.storage.plugins.FileSystemPlugin"
      pluginSettings:
        storagePath: "/storage/encrypted"
      encryptionSettings:
        enabled: true
        providerName: "commons-aes"
        providerSettings:
          password: "changeme"

The following tables give an overview of encryption settings and specific settings for the module commons-aes:

Table 12. Encryption settings
Parameter Description Default value

enabled

enables or disables the encryption

false

providerName

name of the encryption provider to use

commons-aes

Table 13. Provider specific settings for 'commons-aes'
Parameter Description Default value

password

the master password used to encrypt the cipher keys

(none)

rngAlgorithm

the algorithm used to generate secure random data

Platform-specific. If not specified, the strongest SecureRandom algorithm available will be used.

To make sure all content of a specific type definition is encrypted, make sure to limit the content types supported by the type definition to types that use an encrypting storage profile.

Implementation details

Encryption in arveo is based on the implementation in the module commons.

Header

The library is designed to encrypt data in such a way that it can be stored permanently in encrypted form and possibly only decrypted after a long time. In order to guarantee decryption, all data required for this (except the key, of course) are stored in a header together with the encrypted data. Using the data from the header, the library can thus obtain, for example, the algorithm used and the data for key derivation, and only needs the password or the derived key for decryption.

AES

The library uses AES according to the recommendation of the Federal Office for Information Security of March 2020:

  • Operating mode: Galois/Counter-Mode

  • Hash function for key derivation: Argon2

The library allows the configuration of different parameters, but offers default values according to the recommendation of the BSI:

  • Key length: 256 bit

  • Length of GCM checksums: 128 bit

  • Length of the initialisation vector: 96 bit

  • Length of the salt for the key derivation: 32 bit

  • Parallelism for Argon2: 1

  • Memory cost for Argon2: 4096 KB

  • Iterations for Argon2: 3

The initialisation vector is randomly generated each time the encryption methods are called by using SecureRandom. The salt for the key derivation is generated in the same way each time the password derivation method is called. The fact that the initialisation vector is always regenerated ensures that the same combination of initialisation vector and key can never be used more than once. For both the AES algorithm and the Argon2 hash function, the implementations of the BouncyCastle library are used. For performance and compatibility reasons, the BouncyCastle implementations are used directly and not via the JCA:

Generation of the AES cipher with GCM
GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
Generation of the Argon2 hash generator
Argon2BytesGenerator generator = new Argon2BytesGenerator();

Since the default implementation of the CipherInputStream from javax.crypto is not suitable for block ciphers with data authentication, the implementations for CipherInputStream and CipherOutputStream from the BouncyCastle library are used. To generate the random data for the initialisation vector and the salt, a SecureRandom instance created with SecureRandom.getInstanceStrong() is used by default. However, the library allows you to specify a different RNG algorithm (see Note on Linux below).

Header Format

The header begins with a string to identify data encrypted with the library (>ENC<) followed by the length of the payload data in the header. The header is divided into blocks and can be read serially.

>ENC<|97|AES_GCM_ARGON2|1|256|128|10|4096|1|aWFtYW5pbml0aWFsaXphdGlvbnZlY3Rvcg==|aWFtYXNhbHQ=|bXlLZXlJZA==

Marker|length|method|header version|key length|checksum length|iteration|storage cost|parallelism|initialisation vector|salt|key ID
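A hypothetical reader for this header layout, splitting on the pipe separator and decoding the Base64-encoded key ID field; the class and method names are illustrative, not part of the library:

```java
import java.util.Base64;

// Sketch of parsing the header layout shown above: marker, payload length,
// method, header version, key length, checksum length, iterations, storage
// cost, parallelism, IV, salt and key ID, separated by "|".
public class EncryptionHeaderSketch {

    /** Extracts the key ID (the last, Base64-encoded field) from a header line. */
    public static String keyId(String header) {
        String[] fields = header.split("\\|");
        if (fields.length < 12 || !fields[0].equals(">ENC<")) {
            throw new IllegalArgumentException("data was not encrypted with this library");
        }
        // Fields 9-11 are Base64 encoded: initialisation vector, salt, key ID.
        return new String(Base64.getDecoder().decode(fields[11]));
    }
}
```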

Key

The keys used for encryption are derived from any password using the Argon2 hash function. Since this process can be very computationally intensive depending on the configuration, a key ID can be stored in the header. This makes it possible to store a key once it has been derived and to reuse it for decryption, which avoids having to derive the key from the password again. The library is not responsible for the secure storage of the key.

Usage

Instantiation of the AesEncryptorAndDecryptor:

Instantiation with default parameters
AesEncryptorAndDecryptor encryptorAndDecryptor = new AesEncryptorAndDecryptor.Builder().build();
Instantiation with customised parameters
AesEncryptorAndDecryptor encryptorAndDecryptor = new AesEncryptorAndDecryptor.Builder()
    .with128BitKeys()
    .withInitializationVectorLength(128)
    .withTagLength(128)
    .withIterations(5)
    .withMemoryCost(1024)
    .withParallelism(3)
    .withSaltLength(64)
    .withRngAlgorithm("SHA1PRNG")
    .build();

Examples of usage can be found in the test class de.eitco.commons.crypto.AesEncryptionTest.

Note on Linux

On Linux, Java uses the NativePRNG algorithm by default for generating random data with SecureRandom.getInstanceStrong(). This implementation uses /dev/random and may block if there is not enough data available there. This can lead to very long waiting times for key derivation and encryption. You can then either use a weaker RNG algorithm or make sure that /dev/random always contains enough data. This can be achieved with the haveged daemon, for example:

apt-get install haveged
update-rc.d haveged defaults
service haveged start

Configure Active MQ

arveo uses Apache ActiveMQ to queue asynchronous tasks. Access to the message broker is configured in the YAML file of the arveo service using the default configuration properties of the Spring ActiveMQ integration:

Configuration of ActiveMQ
spring:
  activemq:
    broker-url: "tcp://127.0.0.1:61616"
    user: "system"
    password: "manager"

ActiveMQ’s OpenWire protocol is used to connect to the broker. The queues and topics used by arveo can be identified by the arveo- name prefix. arveo uses text messages containing JSON data to make it possible to consume messages in components not implemented in Java. The JSON data uses the same serialization mechanism as the REST API.

arveo uses ActiveMQ’s scheduler support for features like the automated deletion of entities in the recycle bin after a configurable time. Therefore, it is required to enable the scheduler in ActiveMQ by setting schedulerSupport="true" in the broker tag in activemq.xml.

Configure arveo User Management As Authentication Service

The User Management Service can also be used as an OAuth2.0 Authorization Server. The service can issue JSON web tokens that can be used to log in to services that are also secured with OAuth2.

Configuration of the Authorization Server

To enable the Authorization Server, the user-service.authorization-server.enabled setting must be enabled and a keystore must be configured. The keystore must contain an RSA keypair under the specified alias:

Excerpt from a configuration file
user-service:
  authorization-server:
    enabled: true
    keystore:
      file: "path/to/keystore/keystore.jks"
      password: test
      alias: test

OAuth Clients

To obtain a token, a client application must log on to a specific client configured in the Authorization Server. Clients can be created both by API and by configuration. At least one client must have been configured to be able to log in via OAuth. Clients are always stored in the master tenant. In the configuration, clients can be specified as follows:

Excerpt from a configuration file
user-service:
  config-data:
    tenants:
      - tenant-id: master
        oauth2-clients:
          - clientId: test-client
            resourceIds:
              - user-management-service
            clientSecret: my-secret
            authorizedGrantTypes:
              - password
              - client_credentials
              - refresh_token
            authorities:
              - USER_MANAGEMENT_SERVICE_USER
            accessTokenValiditySeconds: 300
            refreshTokenValiditySeconds: 600

In the above example, a client with the ID "test-client" is configured to have access to the arveo User Management Service (resourceIds and authorities) and to offer the authorization grants password, client_credentials and refresh_token. The grants are the same as those in the OAuth2.0 standard.

By default, the client’s configured authorities are included in the issued tokens. In addition, the user’s authorities (= privileges) configured in the user service are entered in the tokens. To prevent the client’s authorities from being included in the tokens, the user-service.authorization-server.inherit-authorities setting can be set to false.

The clients are always stored in the master tenant. For systems with multiple tenants, care must be taken to specify the master tenant in the configuration.
Refresh Tokens

When a new token is issued, a refresh token is also generated (except for the client_credentials grant). This refresh token can be used to renew an expiring token without requiring the user to log in again. By default, when a token refresh request is made, the user also receives a new refresh token whose validity is still that of the first refresh token. This ensures that a user cannot be issued new access tokens by the service indefinitely. If this behavior is not desired, and the refresh tokens should each have an extended validity, the user-service.authorization-server.reuse-refresh-tokens parameter can be set to false.

Client login

To get a token, the client application must send the respective client id and client secret as an HTTP Basic Auth header in the token request. The remaining parameters are sent as form data via POST to the endpoint https://user-management-service/oauth/token.
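A sketch of such a token request using the JDK's HTTP client; the endpoint, client id and credentials below are the placeholders from the text, and nothing is actually sent:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds a password-grant token request as described above: client id and
// secret go into the HTTP Basic Auth header, the remaining parameters are
// sent as form data via POST. Names are illustrative.
public class TokenRequestSketch {

    public static HttpRequest buildTokenRequest(String tokenUrl, String clientId,
                                                String clientSecret, String user,
                                                String password) {
        // HTTP Basic Auth header from client id and client secret
        String basic = Base64.getEncoder().encodeToString(
                (clientId + ":" + clientSecret).getBytes(StandardCharsets.UTF_8));
        // remaining parameters as URL-encoded form data
        String form = "grant_type=password"
                + "&username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(tokenUrl))
                .header("Authorization", "Basic " + basic)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }
}
```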

Configure Authentication/SSO with Keycloak

The arveo content services support OAuth2.0 with OpenID Connect to authenticate users and services. You can install Keycloak as your identity management system and use it as the OAuth2.0 authentication service instead of the arveo user management service. This also allows you to enable single sign-on for your web clients.

The content services take either the role of a "resource server", or the roles of both a "resource server" and a "client" if they use resources of other services.

In principle, any authentication server that supports OAuth2.0 and OpenID Connect can be used. Currently, Keycloak and Active Directory are approved for use with arveo.

Install Keycloak
  • Download and install Keycloak: https://www.keycloak.org/downloads.html

  • Start the server. The ports used can be adjusted with standalone.bat -Djboss.socket.binding.port-offset=100.

  • Call the configuration interface (e.g. localhost:8180) and define an administrator user for the first login.

  • Refer to Keycloak documentation to configure Keycloak on your system.

Create Keycloak Realm
  • Create your own realm (e.g. "arveo") with the Keycloak configuration interface.

  • Copy the public RSA key from the realm keys tab. It will be used later for the configuration of the content services.

Create Keycloak Clients
  • Next, the Keycloak clients for the arveo services are set up. Clients may access resources; resources validate access to themselves. There is also a mixed form (confidential) that accesses resources but can also be a resource itself.
    All clients have Client Protocol=openid-connect set.
    The Client Authenticator must be Client Id and Secret to allow a secure OAuth2.0 flow.

    • Create a client arveo-service for the secure communication between the arveo services.
      This client behaves like a technical user for service-to-service calls.
      access-type=confidential

    • Create a client for your applications, e.g. arveo-webclient, which is public and accesses all arveo services.
      This client is used for the users of your application that have logged in with credentials.
      Access Type=public
      Valid Redirect URIs=<URI of your web client>
      Client Protocol=openid-connect

The implicit flow is no longer recommended; the standard flow should be used. Furthermore, the extension PKCE (Proof Key for Code Exchange) should be used (Authentication Standard Flow with PKCE).
Configure A Client
  • To allow the client to access the arveo services, add the role arveo-service-user to the client.

  • Add token mappers to allow arveo to get information from the token

    • Tenant, the tenant in arveo. This is used to assign the user to a tenant.
      Name=Tenant
      Mapper-Type=User Attribute
      User Attribute=tenant
      Token Claim Name=tenant
      Claim JSON Type=String
      Multi Valued = Off
      Add To Id Token=On
      Add To Access Token=On
      Add To User Info=On

    • Audience for the repository service. For arveo to accept access tokens issued to the web client at all, the corresponding client must also be present in the token as an audience.
      Name=Audience for arveo services
      Mapper-Type=Audience
      Included Client Audience=arveo-webclient
      Add To ID Token=Off
      Add To Access Token=On

    • GUUID, important for authentication via LDAP. In the access token, the user_name attribute is set to the GUUID from the LDAP. This is largely stable, in contrast to the Keycloak internal user ID.
      Name=GUUID
      Mapper-Type=User Attribute
      User Attribute=LDAP_ID
      Token Claim Name=user_name
      Claim JSON Type=String
      Multi Valued = Off
      Add To Id Token=On
      Add To Access Token=On
      Add To User Info=On

    • Client ID, the Keycloak client ID.
      Name=Client ID
      Mapper-Type=User Session Note
      User Attribute=clientid
      Token Claim Name=clientid
      Claim JSON Type=String
      Add To Id Token=On
      Add To Access Token=On

    • Client roles, required for the arveo services. The client needs the authority arveo-user-role to access the service. All roles from the client with the ClientID arveo-client are added to the claim authorities.
      Name=client roles
      Mapper-Type=User Client Role
      Token Claim Name=authorities
      Claim JSON Type=String
      Multi Valued = On
      Add To Id Token=Off
      Add To Access Token=On
      Add To UserInfo = Off

    • Service user, required to identify the user of the access token as a service user.
      Name=Service user
      Mapper-Type=Script Mapper
      Script=exports = user.getUsername().startsWith("service-account");
      Multi Valued = Off
      Token Claim Name=technical-user
      Claim JSON Type=boolean
      Add To Id Token=On
      Add To Access Token=On
      Add To UserInfo = On
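Code that receives such an access token can read the claims produced by these mappers from the token's payload segment. A minimal sketch: a JWT consists of three Base64url-encoded segments separated by dots, and the claims live in the middle one (the sample token in the test is hand-built and unsigned, purely for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper showing where claims like "tenant", "user_name",
// "authorities" and "technical-user" end up inside an access token.
public class TokenClaimsSketch {

    /** Decodes the payload (claims) segment of a JWT into its JSON text. */
    public static String decodeClaims(String jwt) {
        String payload = jwt.split("\\.")[1]; // header.payload.signature
        return new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
    }
}
```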

Configure Keycloak for SSO with Kerberos
  • Configure Keycloak user federation for SSO with Active Directory using Kerberos

LDAP Mapper
Figure 5. LDAP Mapping
  • Two additional LDAP mappers have to be added:

    • Adding the tenant to the user attributes, since the tenant does not come from the AD.
      Name: add-arveo-tenant
      Mapper Type: hardcoded-attribute-mapper
      User Model Attribute Name: tenant
      Attribute Value: master

    • The role so that the user is authorized to access the arveo services.
      Name: add-arveo-user
      Mapper Type: hardcoded-ldap-role-mapper
      Role: arveo-service-user

As soon as a user logs on to the arveo web client application for the first time, the user is imported from the LDAP into Keycloak. Users who are not in the LDAP can be created locally in Keycloak.

Configure Keycloak:

  • Install package freeipa-client (Ubuntu)

  • Setup /etc/krb5.conf

[libdefaults]
default_realm = <your realm>
# The following krb5.conf variables are only for MIT Kerberos.
kdc_timesync = 1
ccache_type = 4
forwardable = true
proxiable = true
# The following encryption type specification will be used by MIT Kerberos
# if uncommented. In general, the defaults in the MIT Kerberos code are
# correct and overriding these specifications only serves to disable new
# encryption types as they are added, creating interoperability problems.
#
# The only time when you might need to uncomment these lines and change
# the enctypes is if you have local software that will break on ticket
# caches containing ticket encryption types it doesn't know about (such as
# old versions of Sun Java).
# default_tgs_enctypes = des3-hmac-sha1
# default_tkt_enctypes = des3-hmac-sha1
# permitted_enctypes = des3-hmac-sha1
# The following libdefaults parameters are only for Heimdal Kerberos.
fcc-mit-ticketflags = true
[realms]
YOURDOMAIN.COM={
kdc=yourdomaincontroller:port
}
[domain_realm]
yourdomain.com=YOURDOMAIN.COM
.yourdomain.com=YOURDOMAIN.COM
  • chown the file to arveo:arveo and chmod it to 600

  • Import the CA certificate to your Java truststore
    e.g. %javahome%/keytool -import -alias YourDomain.com -keystore truststore.jks -file ~/ca.pem

  • Activate Kerberos single sign-on:
    To allow SSO, set the requirement for all flows to ALTERNATIVE.

Figure 6. Keycloak Flows
  • Add a non-LDAP test user in manage users
    Details:
    Name=TestUser
    User Enabled=On
    Attributes:
    LDAP_ID=<new UUID>
    Tenant=master
    Role Mappings=<add arveo-service-user>

Configure Authentication between Content Services

All arveo content services use Spring Security for user authentication and authorization. Spring Security supports several standardized protocols as well as custom implementations. The basic configuration is independent of the protocol used.

When configuring the service, it is important to consider the role that the service plays in the overall system. Some services are only used by different clients and do not communicate with other services. These services only take the role of a "resource server". Other services, such as the repository service, communicate with other services themselves and assume the role of a "resource server" and a "client" at the same time.

Resource and Client Configuration

The following configuration can be used to make a service an OAuth2.0 resource and/or an OAuth2.0 client in the service’s application.yaml

Service | Client | Resource
Document Service | yes | yes
User Management Service | no | yes
Access Control Service | yes | no
Audit Service | yes | no
SAP Archive Link Service (optional) | yes | no
Document Conversion Service (optional) | no | yes
Enterprise User Management Service (optional) | no | yes
Enterprise Integration Service (optional) | yes | no
Federation Service (optional) | no | yes

Configure Resource

Configure the respective application.yaml of the service like this

security:
  general:
    secured-ant-matchers: "/api/**"
    open-ant-matchers: "/actuator/health,/actuator/info"
    role-for-secured-access: "<service-name>"
    cors-configuration:
      allowed-origins: "*"
      allowed-headers: "*"
      allowed-methods: "GET,POST,PUT,PATCH,DELETE,OPTIONS"
      max-age: 3600

spring:
  security:
    oauth2:
      resourceserver:
        jwt:
# public key for user management service
#          public-key-location: "http://localhost:39002/oauth/public_key"
# public key location for keycloak
          jwk-set-uri: "http://localhost:8080/auth/realms/ecr/protocol/openid-connect/certs"

(1) Generally, these parameters shouldn’t be changed.

(2) CORS defines a way in which a browser and server can interact to determine whether it is safe to allow the cross-origin request.

Configure Client

Configure the respective application.yaml of the service like this

spring:
  security:
    oauth2:
      client:
        registration:
          cmn-user-service-client-credentials:
            provider: user-service
            client-id: "arveo-service"
            client-secret: "my-secret"
            authorization-grant-type: "client_credentials"
            scope: "arveo"
        provider:
          user-service:
            authorization-uri: "http://localhost:39002/oauth/auth"
            token-uri: "http://localhost:39002/oauth/token"
          keycloak:
            authorization-uri: "http://localhost:8080/auth/realms/arveo/protocol/openid-connect/auth"
            token-uri: "http://localhost:8080/auth/realms/arveo/protocol/openid-connect/token"

Parameter | Description
oauth2.resourceserver.jwt.public-key-location | Validation key of the authentication service used to validate the token, e.g. a PEM or RSA public key. For Keycloak see Realm Settings, Keys; for the User Management Service see its OAuth2.0 support.
security.general.role-for-secured-access | Unique identifier of the service; see the service names in the table above.
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.client-id | Client ID configured in your authentication service (in the Keycloak example: arveo-service).
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.client-secret | Client secret configured in your authentication service.
spring.security.oauth2.client.provider.user-service.authorization-uri | Endpoint for user authorization.
spring.security.oauth2.client.provider.user-service.token-uri | Endpoint to obtain an access token.
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.scope | The scope is always arveo.

  • The public-key-location defines a path to a resource containing the public key of the service that issued the signed tokens. If the issuing service supports JSON Web Keys, the URL to the JWK endpoint can be set using jwk-set-uri.

  • To enable the user impersonation feature, add the following to the application.yaml configuration:

    commons:
        security:
            oauth2:
                impersonation-enabled: true
  • It is possible to disable the auto configuration of the server components by setting spring.security.oauth2.resourceserver.enabled to false.

Troubleshooting

Some common error messages and ways to fix them:

  • Principal cannot be null from OAuth2AuthorizeRequest: There probably was no Authentication in the application’s SecurityContext. Check if the application sets an Authentication.

  • Startup fails because no bean of type ClientRegistrationRepository was found: Check the configuration. This usually happens when the values in spring.security.oauth2.client are either missing or invalid. Check indentation!

Maintenance mode for the database schema

This chapter documents the arveo parameters used to start the database service in maintenance mode, alter the schema, and stop the service.

arveo can be started in a special mode which ensures that only this instance changes the schema: the instance refuses to start while other instances are running, and other instances cannot be started while it runs. If the database schema change fails, the instance terminates in a way that can easily be evaluated by the administrator, who can then react to the failure.

The service does not start if the registry query returns other running instances. The service terminates after the Liquibase scripts have been executed. The following two parameters are set:

system:
  terminateAfterCreation: true
  updateSchema: true

The maintenance mode can therefore be used to update the database schema. When maintenance mode is enabled, arveo starts, performs the necessary schema updates, and terminates once the schema has been updated. Requests from clients are not processed while the system is in maintenance mode; clients will receive an HTTP 503 response code. Schema updates must be performed by one single arveo instance to avoid race conditions. The recommended procedure for a schema update is as follows:

  • Shut down all arveo instances

  • If required: Update to a newer arveo version

  • Enable maintenance mode by setting system.maintenanceMode: true in the configuration

  • Start one single arveo instance and wait for it to shut down after the schema was updated

  • Disable maintenance mode in the configuration

  • Start all arveo instances.

The database schema of an existing system can be changed by adapting the type definition classes and restarting the repository service with the setting arveo.server.system.maintenance-mode=true. The service will update the database schema and shut down once the update is finished. It will not accept requests while the schema is updated.

Supported schema changes

The following list contains the supported schema changes. Note that some changes, like removing an attribute or adding constraints, might not be possible when existing data or existing constraints would be violated by the change.

  • Adding a new attribute.

  • Removing an existing attribute. Note that the column will be dropped from the schema.

  • Adding and removing indexes as well as changing index properties.

  • Changing the primary key (only for META types).

  • Adding and removing of foreign keys.

  • Adding new content elements (only for DOCUMENT types).

  • Adding and removing unique constraints.

  • Adding and removing not-null constraints.

It is also possible to enable certain features on existing type definitions. Disabling the features is not supported.

  • Enabling ACL support.

  • Enabling document filing.

  • Enabling optimistic locking.

  • Enabling the recycle bin.

  • Enabling retention support.

Checking for schema changes

By setting the properties arveo.server.system.maintenanceMode and arveo.server.system.logSchemaChanges to true, the system will start up, check for required schema changes, write them to a special log file, and shut down again. The database schema will not be changed. This makes it possible to check for unsupported changes to the schema before performing the actual schema update.
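A minimal configuration fragment for such a dry run, using the property names given above (written in the relaxed kebab-case form), might look like this:

```yaml
arveo:
  server:
    system:
      maintenance-mode: true     # perform maintenance startup
      log-schema-changes: true   # only log the required changes, do not apply them
```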

The directory used to store the schema update log can be specified using the property arveo.server.system.schemaChangeLogDirectory. The default value is logs. The system will create one logfile for each tenant. The contents of the file will look like the following example:

Supported changes for attributes of type definition my_document:
        - document_name: IS_UNIQUE
        - container_id: FOREIGN_KEY, IS_UNIQUE

Unsupported changes for attributes of type definition my_document:
        - document_name: none
        - container_id: none

In this example, there are three supported changes for the type definition named my_document. A unique constraint will be added to or removed from the attributes container_id and document_name and a foreign key will be added to or removed from the attribute container_id. There are no unsupported changes, so the actual schema update should succeed.

Please note that there are some advanced schema checks that can only be done correctly when the types are actually stored in the database. For example, the check for the correctness of the parent and child types of a relation type is not possible when the schema update itself is skipped.

Configure Audit

A @Type can be declared as audited. This means that any write access, i.e. any create, update, or delete operation on any entity of this type, will be logged to a separate table. This is done with the annotation @Audit:

@Type(ObjectType.CONTAINER)
@Audit(AuditLocation.TYPE_SPECIFIC) (1)
public interface AuditedContainer {

    @Optional
    String getName();
    void setName(String name);

    @Optional
    Integer getInteger();
    void setInteger(Integer integer);
}
1 The annotation @Audit activates auditing on a type

The name of the audit table is derived from the table name of the given type, following the form <table-name>_log. You can choose to specify one audit table per entity table, or alternatively audit to one global table:

@Type(ObjectType.DOCUMENT)
@Audit(
    value = AuditLocation.GLOBAL, (1)
    indexOn = {AuditJsonField.CURRENT} (2)
)
public interface AuditedDocument {

    @Optional
    String getName();
    void setName(String name);

    @Optional
    Integer getInteger();
    void setInteger(Integer integer);
}
1 Note a different AuditLocation
2 With indexOn it is possible to specify which JSON fields of the audit table should be indexed

In this case, the audit service's default audit table default_audit_log will be used.

Access audit

To access the audit, the audit service provides a REST API. Access will be restricted depending on the type: If ACLs are activated on a @Type, only users that have read access to an entity will be allowed to audit this entity. If ACLs are deactivated on a @Type, only users with the authority AUDITOR are allowed to audit the entities of this type.

SOLR

In order to use Solr in connection with arveo, an installation of a Solr service is required. The current versions of Solr can be downloaded from https://solr.apache.org/downloads.html.

The Solr service must be configured in arveo within the application.yaml. See Configuration properties for details.

When a type definition is annotated with @NOSql, the entities stored in this type definition will be stored in SOLR, too. See NOSQL Example for how to enable this feature. The system will create a special queue table in the relational database for such a type definition. The queue table will contain the entities that have to be stored in SOLR. A system job is used to process the entries in the queue table.

SOLR security configuration

By default, SOLR does not use any kind of authorization or authentication. In productive systems, SOLR must be secured by enabling transport encryption, authentication, and authorization.

Transport encryption can be enabled by enabling HTTPS in SOLR. See the SOLR documentation for instructions. When SSL is enabled, the URL in the configuration property ecr.server.solr.host must use the https scheme.
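For example, with transport encryption enabled, the host property might look like this (the host name is a placeholder):

```yaml
ecr:
  server:
    solr:
      host: "https://solr.example.com:8983/solr"  # must use the https scheme when SSL is enabled
```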

Security configuration data is stored in Zookeeper, so it is required to enable security measures in Zookeeper, too. arveo supports digest authentication for the Zookeeper client used to upload configuration data required by arveo. To enable digest authentication for the Zookeeper client, you need to set the following parameters in the arveo configuration:

ecr:
  server:
    solr:
      zkDigestUsername: zookeeperDigestUsername
      zkDigestPassword: zookeeperDigestPassword

The username and password must match the configured values for the admin role in the SOLR configuration. The SOLR documentation contains instructions for how to enable digest authentication in Zookeeper.

arveo can use OAuth2 JWTs to authenticate requests sent to SOLR. SOLR provides a JWT authentication plugin that must be enabled as described in the SOLR documentation. Enabling authentication and authorization in SOLR requires uploading a security.json file to Zookeeper. The following example shows a security.json file that enables an authentication plugin (here, the basic authentication plugin) and a rule-based authorization plugin.

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "ecr-solr-user": "qkxp6hmEeGTaqnEvSmH7f+qytLWd/JcwaUyqpdjt5rg= NERXZefDt7lXYvdZfB0hT3ZCgNFSqI4nJ7kGgbhaTWs="
    },
    "realm": "My Solr users",
    "forwardCredentials": false
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {
        "name": "schema-edit",
        "role": "admin"
      },
      {
        "name": "update",
        "role": "admin"
      },
      {
        "name": "read",
        "role": [
          "user",
          "admin"
        ]
      }
    ],
    "user-role": {
      "ecr-solr-user": [
        "user",
        "admin"
      ]
    }
  }
}

To enable OAuth support for the SOLR client used by arveo, you have to set the following parameters in the configuration for arveo:

ecr:
  server:
    solr:
      oauth2Enabled: true
      oauth2ClientRegistrationId: "cmn-user-service-client-credentials"

The value for the oauth2ClientRegistrationId must match a configured client registration that uses the client_credentials grant.

System jobs

The arveo system uses several background jobs to perform essential functions. These jobs are managed by a clustered Quartz scheduler running inside the repository service and/or in a dedicated job service. The scheduler instances are synchronized using the database. The repository service creates the jobs and initial trigger configurations when the system is started for the first time. Afterwards, it is possible to modify the scheduled jobs manually.

By default, the scheduler embedded in the repository service is used to create and to execute the jobs. Dedicated job service instances configured to use the same database as the repository service can be used to execute the jobs as well. It is also possible to start the scheduler embedded in the repository service in standby mode. In standby mode, the repository service will create the jobs (if required), but it will not execute them.

The available configuration parameters for the scheduler are listed here: Job service

The available configuration parameters for the jobs are listed here: Job configuration

It is required to configure the user and the password to be used for the jobs. The user must have the authorities required to execute the jobs: ECR_PURGE_RECOVERY_TABLE and the authority configured in security.general.role-for-secured-access (by default ECR_SERVICE_USER).
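For example, the job credentials could be configured as follows; the values are placeholders and must match a user with the authorities listed above:

```yaml
ecr:
  server:
    jobs:
      username: "job-user"    # placeholder user name
      password: "job-secret"  # placeholder password
```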

Clean recovery table job

The expired entries in the recovery table (see Recovery) are deleted by the clean recovery table job. By default, the job is triggered every day at 3 a.m. The user configured to execute the system jobs needs to have the ECR_PURGE_RECOVERY_TABLE authority to be able to perform this operation.

NOSQL queue job

Data that is supposed to be stored in the SOLR NOSQL database is first stored in dedicated queue tables in the relational database used by arveo. The NOSQL queue job reads the data from the queue tables and writes it to SOLR. By default, the job is scheduled once per second, individually for every queue table in the system. It is possible to configure the number of entries to process in one run of the job as well as the maximum number of attempts to write the data to SOLR.
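The queue processing can be tuned via the job configuration; the values shown here are the documented defaults:

```yaml
ecr:
  server:
    jobs:
      no-sql-queue:
        batch-size: 100                    # entries loaded from the queue table per run
        cron-expression: "*/1 * * * * ?"   # run once per second
        retries: 3                         # maximum attempts to write an entry to SOLR
```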

When the NOSQL feature is disabled for a type definition, the queue job and the triggers for the queue job have to be disabled or removed manually from the scheduler.

Using external Job Service instances

It is possible to use one or more external Job Service instances to execute the scheduled system jobs. The service must be configured to use the same tenants as the repository service. To be able to execute the system jobs, the job implementations must be present in each of the Job Service’s class paths. The jobs are available as a ZIP file (ecr-packaging-jobs-external<version>.zip) that contains all required libraries. Simply extract the contents of the ZIP file to a directory (e.g. libs) and start the Job Service with the following parameter: -Dloader.path=libs.

The configuration parameters for the jobs are already configured in the database. No further configuration parameters for the jobs are required in the Job Service’s configuration. However, the service must be able to authenticate to the repository service. As the jobs use a username and password to obtain an access token, the service needs OAuth client registrations both for the client_credentials and for the password grant types. The following example shows how to configure two client registrations for the service:

OAuth client registrations
spring:
  security:
    oauth2:
      resourceserver:
        jwt:
          public-key-location: "http://localhost:39004/oauth/public_key"
      client:
        registration:
          cmn-user-service-client-credentials:
            provider: user-service
            client-id: "tech-client"
            client-secret: "tech-secret"
            authorization-grant-type: "client_credentials"
            scope: "oauth2"
          cmn-user-service-password:
            provider: user-service
            client-id: "test-client"
            client-secret: "my-secret"
            authorization-grant-type: "password"
            scope: "oauth2"
        provider:
          user-service:
            authorization-uri: "http://localhost:39004/oauth/auth"
            token-uri: "http://localhost:39004/oauth/token"

By default, the scheduler included in the repository service instances will be used to execute the scheduled jobs, too. When the scheduler in the repository service is started in standby mode, only the external Job Service instances will execute the scheduled jobs. The following configuration can be used to start the repository service with a scheduler in standby mode:

Scheduler in standby mode
job-service:
  standbyOnlyScheduler: true

Configuration Properties

ecr.server.caching

Property | Type | Description | Default value
default-acls.expire-seconds | java.lang.Long | The time in seconds after which an entity in the cache will be expired. | 900
default-acls.size | java.lang.Long | The maximum number of entities in the cache. | 500
enums.expire-seconds | java.lang.Long | The time in seconds after which an entity in the cache will be expired. | 900
enums.size | java.lang.Long | The maximum number of entities in the cache. | 500
type-definition-access.expire-seconds | java.lang.Long | The time in seconds after which an entity in the cache will be expired. | 900
type-definition-access.size | java.lang.Long | The maximum number of entities in the cache. | 500
type-definitions.expire-seconds | java.lang.Long | The time in seconds after which an entity in the cache will be expired. | 900
type-definitions.size | java.lang.Long | The maximum number of entities in the cache. | 500

ecr.server.content

Property | Type | Description | Default value
default-definition.media-type | java.lang.String | The supported media type of the content element. Use 'application/octet-stream' as a wildcard that supports all content types. |
default-definition.storage-profile | java.lang.String | The name of the storage profile. |
definitions | java.util.Map<java.lang.String,de.eitco.ecr.server.config.ContentDefinitionSettings> | A map containing all content definitions. |

ecr.server.http

Property | Type | Description | Default value
file.directory | java.io.File | The directory used to store the temporary files. |
file.prefix | java.lang.String | The prefix to use for the names of the temporary files. | temp
file.suffix | java.lang.String | The suffix to use for the names of the temporary files. | .dat
file.threshold | java.lang.Integer | The size of the file in bytes from which on a temporary file will be used for buffering. | 131072

ecr.server.jobs

Property | Type | Description | Default value
clean-recovery-table.cron-expression | java.lang.String | Defines the CRON expression used to schedule the job. | 0 0 3 * * ?
no-sql-queue.batch-size | java.lang.Integer | Sets the number of entries to load from the queue table in one batch. | 100
no-sql-queue.cron-expression | java.lang.String | Sets the CRON expression used to schedule the job. | */1 * * * * ?
no-sql-queue.retries | java.lang.Integer | Sets the maximum number of attempts to write an entry in the queue to SOLR. | 3
password | java.lang.String | Defines the password of the user used to run the jobs. |
username | java.lang.String | Defines the name of the user used to run the jobs. |

ecr.server.liquibase

Property | Type | Description | Default value
auto-change-log | java.lang.String | Defines the location used to store the auto-generated changelog. | changeLog/auto.xml
changelog-directory | java.lang.String | The directory used when generated changelogs are kept. This setting is only relevant when keepChangelogs is set to true. | changelog
custom-change-log | java.lang.String | Defines the location of a custom Liquibase changelog to execute on startup after the database schema was initialized. Changelogs can be loaded from the classpath by adding the 'classpath:' prefix. Files must be identified by an absolute path using the prefix 'file:/'. |
keep-changelogs | java.lang.Boolean | If set to true, generated changelogs will be kept in separate files in the configured directory. | false
pre-initialization-change-log | java.lang.String | Defines the location of a custom Liquibase changelog to execute on startup before the database schema is initialized. Changelogs can be loaded from the classpath by adding the 'classpath:' prefix. Files must be identified by an absolute path using the prefix 'file:/'. |

ecr.server.memory

Property | Type | Description | Default value
buffer-size | java.lang.Integer | Defines how many bytes of data to keep in memory when working with streams before switching to a temporary file. | 1024000

ecr.server.messaging

Property | Type | Description | Default value
json-messages | java.lang.Boolean | If enabled, the payload of JMS messages will be a JSON string. | true

ecr.server.query

Property | Type | Description | Default value
in-condition-optimization-limit | java.lang.Integer | Sets the number of entries in an IN clause from which on the optimized query is used. -1 disables this feature. | -1

ecr.server.security

Property | Type | Description | Default value
type-definition-access-checks-enabled | java.lang.Boolean | Defines whether type definition specific access checks are enabled or not. | true

ecr.server.storage

Property | Type | Description | Default value
profile-templates | java.util.List<de.eitco.ecr.server.config.StorageProfileTemplate> |  | null
profiles | java.util.Map<java.lang.String,de.eitco.ecr.server.config.StorageProfileSettings> | A map containing all configured storage profiles. |

ecr.server.system

Property | Type | Description | Default value
create-solr-changes | java.lang.Boolean | If set to false, SOLR changes will not be executed when initializing the type schema. | true
initialize-empty-database | java.lang.Boolean | If set to true, the system will create the schema even if not in maintenance mode, should the table ecr_types be empty. | true
log-schema-changes | java.lang.Boolean | If set to true together with maintenanceMode, the system will only log required changes to the database schema and shut down after the log was written. | false
maintenance-mode | java.lang.Boolean | If true, the server will update the database schema at startup and shut down after the update has finished. This is effectively a combination of updateSchema = true and terminateAfterCreation = true. | false
schema-change-log-directory | java.lang.String | The location of the logfile used when checkForSchemaChanges is set to true. | logs
terminate-after-creation | java.lang.Boolean | If true, the server will terminate after the database schema was created. | false
update-schema | java.lang.Boolean | Whether to update the database schema at startup or not. | false

ecr.server.upload

Property | Type | Description | Default value
maximum-file-size | java.lang.Long | Defines the maximum size of a single file in one multipart upload in bytes. -1 means no limit. | -1
maximum-in-memory-size | java.lang.Integer | Defines the maximum size of data to keep in memory before using a temporary file (in bytes). | 1048576
maximum-total-size | java.lang.Long | Defines the maximum total size of all files in one multipart upload in bytes. -1 means no limit. | -1

ecr.server.solr

Property | Type | Description | Default value
commit-within-millis | java.lang.Integer | Defines the maximum time in milliseconds after which the Solr client will perform a commit. | 1000
default-config-name | java.lang.String | Defines the default SolrConfig. | solr-plugin-config
host | java.lang.String | Defines the host for the connection of the Solr client. | http://localhost:38983/solr
http-client-connection-timeout | java.lang.Integer | Defines the connection timeout for the Solr HTTP client in milliseconds. | 10000
http-client-socket-timeout | java.lang.Integer | Defines the socket timeout for the Solr HTTP client in milliseconds. | 60000
password | java.lang.String | The password used for basic authorization to SOLR. |
username | java.lang.String | The username used for basic authorization to SOLR. |
zk-digest-password | java.lang.String | The password used for digest authentication for the SOLR Zookeeper. |
zk-digest-username | java.lang.String | The username used for digest authentication for the SOLR Zookeeper. |
zookeeper-client-host | java.lang.String | Defines the host for the connection of the Solr Zookeeper client. | localhost:39983
zookeeper-client-timeout | java.lang.Integer | Defines the timeout for the connection of the Solr Zookeeper client in milliseconds. | 60000
zookeeper-connect-timeout | java.lang.Integer | Defines the connect timeout for the Zookeeper client. | 30000

job-service

Property | Type | Description | Default value
configuration-tenant | java.lang.String | Defines the tenant to use to store job information in the database. |
standby-only-scheduler | java.lang.Boolean | If true, the scheduler used by the job service will be in standby mode. It will not process any jobs. | false

Access Control

Access Rights

The REST API has the following user-rights (authorities) for different endpoints:

  • ECR_SERVICE_USER (configurable): Required authority for all API endpoints. Must always be present.

  • ECR_ADMIN: Allows editing type- and attribute definitions as well as other administrative operations.

  • ECR_DSGVO_ADMIN: Allows a user to change the litigation hold and retention settings of entities contained in type definitions using the retention feature.

  • ECR_DSGVO_PRIVILEGED_DELETE: An addition to ECR_DSGVO_ADMIN that allows a user to delete an entity which is still within its retention period. Organisational precautions must be put in place to ensure DSGVO compliance when making use of this authority.

  • ECR_ALL_TYPES_READ: Allows read access to all type definitions that use type level access restrictions.

  • ECR_ALL_TYPES_WRITE: Allows write access to all type definitions that use type level access restrictions.

  • ECR_PURGE_RECOVERY_TABLE: Allows a user to trigger the removal of expired entries in the recovery table.

Permissions

This page contains an overview of the required permissions in the arveo Repository REST Service.

Endpoints

The actuator endpoints /actuator/health and /actuator/info are accessible without authentication.

All API endpoints (/api/*) can only be used by authenticated users. The users need the authorization ECR_SERVICE_USER for this.

Certain endpoints that offer administrative functions can only be used by users with the additional authorization ECR_ADMIN. These are:

  • Updating a type definition: PUT /definitions/{id}

  • Update a type definition (by name): PUT /definitions/named/{id}

  • Create a new type definition: POST /definitions

  • Create several new type definitions: POST /definitions/multiple

  • Create a new attribute definition: POST /attributes

  • Create several new attribute definitions: POST /attributes/multiple

Permissions on Type Definitions

It is possible to restrict the access of users to certain type definitions. For this purpose, two SecTokens can be created for each type definition in the user administration: one for read access and one for write access. The authorization check on the type definition is enabled or disabled via the property 'accessCheck'. If the property is enabled, the user must have the authorization ECR-TYPE_$ID$_READ for reading or ECR-TYPE_$ID$_WRITE for writing data in this type definition. The placeholder $ID$ stands for the ID of the type definition.

Additionally, there are two authorizations, ECR_ALL_TYPES_READ and ECR_ALL_TYPES_WRITE, which give a user read or write access to all type definitions. The check of the permissions on the type definitions can also be disabled completely using the switch security.type-definition-access-checks-enabled (true/false).

Within a folder structure, child elements of a parent folder can belong to different type definitions. To filter the result, the user’s permissions on the type definitions are used. This means that a user will only see folders and documents from type definitions for which he or she has read permission.

Permissions of folders are inherited by the documents they contain. For example, if a document of a type definition to which the user has read permission is located in a folder on whose type definition the user does not have read permission, the user is not allowed to access the document either. The folder permissions are currently only checked for the direct parent folder. There is no inheritance within the folder hierarchy.

Access Control Lists

The Access Control Service extends the Eitco User Management Service with the possibility of access control. Access control lists (ACLs) are used for this purpose.

The Concept of Access Control Lists

An ACL assigns rights and prohibitions to users and groups. Normally, ACLs are then assigned to objects. How ACLs are assigned to objects and what rights exist is not defined in the module user-management-access-control. However, there are requirements for the rights concept of the applications that use this component:

  • Rights and prohibitions share the same value range. You can only forbid what you can allow and vice versa.

  • The rights are fully ordered and imply their respective subordinate rights. That is, there is a clear hierarchy of rights. If someone has a right, then he also has all the rights that are smaller.

  • Even the smallest right implies access. Once someone has rights on an object, they will be able to find it in searches. With these requirements, the permissions of a user on an ACL can be calculated as described below.

Let G be the set of all permissions assigned to a user (directly or indirectly via the groups he is in) in the ACL.
Let D be the set of all prohibitions assigned to a user (directly or indirectly via the groups he is in) in the ACL.

If max(D) > max(G), then the user has the permission max(G); otherwise the user has the permission max(D) - 1.

So generally, the greater the value of an ACL, the more rights this group or job position possesses. If no ACL is assigned to an object, then it is accessible to everyone.

It is possible to prohibit all rights on an object, but then nobody would have access to it making it effectively removed.

Substitution

In order to implement concepts such as substitutions or extensions (i.e. group memberships that propagate authorisations but not prohibitions), the rights determination must be adapted. In addition, an extension of the user service is required.

Weak group membership

Group memberships of users require the information whether they are "weak". Weak affiliations only propagate permissions, never prohibitions.

The permissions of a user on an ACL would then be calculated as follows:

Let S be the set of all groups to which the user is strongly (i.e. not weakly) connected.
Let S' be the set of all groups to which the user is weakly connected.

Let G be the set of all permissions assigned to the groups in S in the ACL.
Let D be the set of all prohibitions assigned to groups in S in the ACL.

For all s_i in S', now calculate the permissions g_i as follows:

    Let G' be the set of all permissions assigned to groups in S in the ACL.
    Let D' be the set of all prohibitions assigned to the groups in S in the ACL.

    If max(D') > max(G'), then g_i = max(G')
    otherwise g_i = max(0, max(D') - 1)

Let G' be the set of all calculated permissions g_i

If max(D) > max(G + G'), then the user has the permission max(G + G'); otherwise the user has the permission max(D) - 1
Mapping of the Access Control List values

Although the module user-management-access-control defines the concepts and functionality of the ACLs, the actual mapping of the values is implemented in arveo. The class de.eitco.ecr.acl.AclRight implements the following permissions:

  • BROWSE(5): The user is allowed to see the object’s metadata but not the content

  • READ(10): The user is allowed to see the metadata and content of the object

  • COMMENT(15): The user is allowed to add annotations to the object

  • WRITE(20): The user is allowed to change metadata and content of the object creating a new version

  • OVERWRITE(25): The user is allowed to overwrite an existing version of the object

  • DELETE(30): The user is allowed to delete the object

  • CHANGE_ACL( (Short.MAX_VALUE - 1).toShort()): The user is allowed to change the ACL of the object.

To illustrate the information above, here are some examples:

1: The permission COMMENT is assigned to a group or a job position. In this case, the assignee is per default granted the permissions BROWSE and READ;

2: The prohibition WRITE is assigned to a group or a job position. In this case, the assignee is per default prohibited all the higher rights, so OVERWRITE, DELETE and CHANGE_ACL;

3: A job position J1 with the permission COMMENT acts as a substitute for a job position J2 with the permission OVERWRITE. So the job position J1 is assigned the permission OVERWRITE for the time of the substitution.

4: A job position J1 with the prohibition WRITE acts as a substitute for a job position J2 with the prohibition READ. The job position J1 is still assigned the prohibition WRITE for the time of the substitution. This way, it is guaranteed, that J1 is still able to perform their tasks (which would become impossible if they were assigned stronger prohibitions, that is, the prohibitions of J2).

User Management Access Control

The arveo uses the functionality of the module user-management-access-control to control access to entities. This module includes a client module, containing interface methods to work with it.

Usage of the module user-management-access-control in an application

The client application has to be a Spring boot application. It can connect to the server in two ways:

1) by http;

In this case, add the maven dependency below to your project:

Maven dependency for the http Spring starter
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-user-management-access-control-http-client-spring-boot-starter</artifactId>
<version>${project.version}</version>

2) embedded by java calls.

For this, use the following maven dependency:

Maven dependency for the embedded starter
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-user-management-access-control-embedded-spring-boot-starter</artifactId>
<version>${project.version}</version>
Batch updates for ACLs

It is possible to change values of multiple ACLs. So lists of ACLs, that satisfy a certain condition, can be processed. For every ACL, that fulfills a given condition, the following modifications can be specified:

  • addgroupright (adds a given right to a given group in every ACL that fulfills the condition);

  • adduserright (adds a given right to a given user in every ACL that fulfills the condition);

  • keepgroupright (keeps the current right of a given group in every ACL that fulfills the condition);

  • keepuserright (keeps the current right of a given user in every ACL that fulfills the condition).

All the other entries in the ACLs that fulfill the condition, are removed.

The ACL updates are performed in the the module 'common', in the Client SDK by the class AclServiceClient. It calls the method updateAclsWhere() and passes two parameters: an Expression of type Boolean (the condition mentioned above) and a List of type AccessControlListModification as modifications (the four modifications mentioned above) to apply to every ACL. The method updateAclsWhere() executes the given ACL batch updates.

There is also a more convenient method with the same name updateAclsWhere(), that returns a ConditionBuilder, that can be used in searches (see [Search Service]).

Example of usage

Consider the following snippet from a test class as an example of the batch update functionality for the ACLs. Pay attention to the method setRightsTo(), which is called to modify the current rights.

Example of batch update functionality
        aclServiceClient.updateAclsWhere().contextReference("id").in().values(
            acl1.getIdentifier().getValue(),
            acl2.getIdentifier().getValue()
        ).holds()
            .setRightsTo(GrantAndDeny.grant(AclRight.READ)).of(umAdmin.getIdentifier())
            .execute();

ACL Data Model

An Access Control List (ACL) or “Access Control List” is used to restrict access to a data object. ACLs consist of entries, called Access Control Entries, each of which describes an access rule in more detail. For example, an entry can grant access to a specific object (for example a file in a file system; the respective ACL is assigned to this) for a specific user (or a group of users) to an access level (for example "View only" or " Edit ").

Attribute Based Access Control provides access based on the evaluation of attributes. DAC In the Discretionary Access Control (DAC) model, access to resources is based on user’s identity. A user is granted permissions to a resource by being placed on an access control list (ACL) associated with resource. An entry on a resource’s ACL is known as an Access Control Entry (ACE). When a user (or group) is the owner of an object in the DAC model, the user can grant permission to other users and groups. The DAC model is based on resource ownership.

Attribute Based Access Control (ABAC)

arveo allows entity access based on attributes of that entity. This can be specified per entity by a static method annotated with @Security. The method must return an eql expression that resolves to a boolean i.e. a condition. It will be called by arveo when entities of the given type are accessed to retrieve an additional filter for the access. Operations will only affect entities where the condition evaluates to true.

Every operation on entities of the type will execute the method and add the resulting expression to the filter of the operation:

  • Searches will add the expression to the filter of the search request.

  • Batch operations will add the expression to their filter

  • Calls that operate on a specific id will fail if the expression yields false.

The simplest case would look as follows:

the simplest access check
1
2
3
4
5
6
7
8
9
10
@Type(ObjectType.DOCUMENT)
public interface UnsecuredDocuments {

    @Security
    static Expression<Boolean> calculateAccess() {

        return Eql.alwaysTrue();
    }
}

This would add the filter 'true' to every operation on the entity, which would allow anyone to access entities.

In most cases, one would want to compare attributes of the entity with properties of the user requesting the current operation. The first can be accomplished with the eql:

accessing an attribute
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
@Type(ObjectType.CONTAINER)
public interface ThresholdContainer {

    int getThreshold();

    void setThreshold(int threshold);

    @Security
    static Expression<Boolean> calculateAccess(Alias alias) {

        return EcrQueryLanguage.condition().
                        alias(alias).field("threshold").
                                greaterThan().value(300);
    }
}

Users may only access entities of the type above where the field 'threshold' is greater than 300.

In order to check the user requesting an operation, one can define a parameter to the method of the type AuthenticationContext. Other information may be accessed this way, too. The method can have up to four parameters of the following types:

  • AuthenticationContext: this class holds information about the user requesting the operation.

  • AclRight: the right needed to perform the operation.

  • Alias: identifies the part of the query that holds the entity

  • DSLContext: an entrypoint to the jooq api bound to the database and schema the table containing the entities is located in.

Parameters

AuthenticationContext: Who Requests the Operation?

The AuthenticationContext holds information about the user requesting the operation. This parameter will most likely be used in every such method, except for the most basic cases.

Take a case where access to a document is specified by a field named access_token. It holds the name of a user-management authority every user with access to it must have. If it is null, every user has access to the document:

a type specifying different access to different users.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
@Type(ObjectType.DOCUMENT)
@OverwriteAllowed
public interface DocumentWithAccessToken {

    @Mandatory(false)
    String getAccessToken(); (1)

    void setAccessToken(String accessToken);

    @Security
    static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext, DSLContext dslContext) { (4)

        return EcrQueryLanguage.condition()
            .alias(alias).field("access_token").isNull() (2)
            .or()
            .value(authenticationContext.getAuthorities())
            .contains().alias(alias).field("access_token")
            .holds();
    }

    // ...
    // more attributes (3)
}
1 the type defines the attribute that specifies access
2 the query generated uses this attribute.
3 other elements of the type are omitted for the sake of readability
4 note that the third parameter is unused. In such a case it could be omitted.

AclRight: What Will the Operation Do?

The AclRight parameter holds the right necessary to perform the operation requested. This is a hint for the method about what should actually be done in the operation. It allows differentiating between read and write access:

a type differentiating between read and write access.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface ContainerAccessedByUserId {

    long getOwner(); (1)

    void setOwner(long owner);

    List<Long> getAudience(); (2)

    void setAudience(List<Long> audience);

    @Security
    static Expression<Boolean> checkAccess(Alias alias, AuthenticationContext authenticationContext, AclRight right) {

        long userId = authenticationContext.getUser().getIdentifier().getValue();

        if (AclRight.READ.getValue() < right.getValue()) { (3)

            return EcrQueryLanguage.condition().alias(alias).field("owner").equalTo().value(userId).holds();
        }

        return EcrQueryLanguage.condition()  (4)
            .alias(alias).field("audience").contains().value(userId)
            .or().alias(alias).field("owner").equalTo().value(userId)
            .holds();
    }

    // ...
    // more attributes
}
1 This type defines an attribute owner holding the user id of the user, responsible. The owner of an entity will be the only user to modify the entities.
2 The type also defines a list of user ids audience, holding the ids of users that may read the entity. Users that are neither owner nor audience have no access on the entity.
3 Thus, in cases where a right greater than READ is requested, the method returns an expression, that checks whether the current user is the owner of the document.
4 In every other case .i.e. the requested access right is READ or below, an expression is returned, that checks whether the current user is the owner or part of the audience.

Alias

The alias identifies the part of the query executed that contains the entity and should be used to reference its members.

Always use the alias as given in the examples. Other ways to reference the entity might work in most cases but only using the alias assures that referencing entity attributes works in every case.
The full class name is de.eitco.ecr.common.search.Alias. Avoid confusion with another Alias class.

DSLContext

In some cases using expressions on the entity itself may become cumbersome or slow. For that, one can use the DSLContext parameter. This allows access by jooq to any table in the same schema the table of the requested entity is located in. It can be used to obtain specific data directly.

Since the access is directly to the database, there are no further access checks on queries using DSLContext.
Depending on the operation requested, the method may be able to execute INSERT or UPDATE statements. It is the responsibility of the security methods author to make sure changes do not create an inconsistent or otherwise corrupted state of the database. The simplest way to assure this, is to use the DSLContext only to read data.

Examples

Subselect

There might be cases where the attribute defining access is not part of the entity itself, but part of another entity referred to by a foreign key or a relation. In such cases a subselect comes handy. Assume two entity types: documents, to which access is restricted by an attribute named owner_group which is part of the second entity a container. An owner group must be given Documents are linked to their container with a foreign key named contained_in:

the container entity
1
2
3
4
5
6
7
8
9
10
11
12
13
@Type(ObjectType.CONTAINER)
public interface OwnedContainer {

    long getOwnerGroup(); (1)

    void setOwnerGroup(long ownerGroup);


    // ...
    // more attributes

}
the document entity
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
@Type(ObjectType.DOCUMENT)
public interface OwnedDocument {

    @ForeignKey(target = OwnedContainer.class, targetProperty = "id")
    ContainerId getContainedIn(); (2)

    void setContainedIn(ContainerId container);

    @Security
    static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext) {

        List<Long> groupIds = authenticationContext.getAllGroups().stream().map(group -> group.getIdentifier().getValue()).collect(Collectors.toList()); (3)

        return EcrQueryLanguage.condition().alias(alias).field("contained_in").in() (4)
            .select("id").from("owned_container").as("container").where().
            contextReference("container", "owner_group").in().values(groupIds).holds().holds();
    }

    // ...
    // more attributes
}
1 The entity OwnedContainer holds the attribute that specifies access.
2 The entity OwnedDocument is linked with a container by its attribute contained_in.
3 The AuthenticationContext is used to obtain the ids of every group the current user is a member of.
4 The group ids are used to create a check whether the entity is contained in a container whose owner_group is one of the users groups.

Interface Inheritance

Since attribute based security - by definition - is based on attributes, it must be able to be specified by type. However, in some cases a more general solution is desired. In these cases, java interface inheritance comes handy.

Assume the class DocumentWithAccessToken from above. Assume further that there are other types (ContainerWithAccessToken and FolderWithAccessToken) that should be secured by their access-token as well. In this case it is a good practice to combine the access method and field in a common superinterface:

superinterface
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
(2)
public interface WithAccessToken {

    @Mandatory(false)
    String getAccessToken();

    void setAccessToken(String accessToken);

    @Security
    static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext) {

        return EcrQueryLanguage.condition() (1)
            .alias(alias).field("access_token").isNull()
            .or()
            .alias(alias).field("access_token").in()
            .values(new ArrayList<>(authenticationContext.getAuthorities()))
            .holds();
    }

}
1 The check for the access token is defined here.
2 note that this interface does not specify an entity by itself, since it lacks a @Type annotation.

Then the types itself can simply inherit this feature:

inheriting entity 1
1
2
3
4
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface ContainerWithAccessToken extends WithAccessToken {
}
inheriting entity 2
1
2
3
4
@Type(ObjectType.FOLDER)
@OverwriteAllowed
public interface FolderWithAccessToken extends WithAccessToken {
}

Complex Scenario: Hospital

Here we look at a more complex example: a Hospital. The hospital manages documents concerning cases. A case belongs to a patient. Users of the system are hospital employees and may access data about documents, cases and patients. These users are part of one or several wards. For every ward there is a group in the system containing the users that are part of this ward. Cases have a list of wards - that may change over time - where the patient was treated for that case. Access is specified as follows

  • A user may only access cases whose wards contain at least one ward, the user is a member of.

  • A user may only access patients whose cases he may access.

  • A user may only access document whose cases he may access.

Cases could be modeled as follows:

The case
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
@Type(ObjectType.CONTAINER)
public interface MedicalRecordCase {

    @Mandatory
    @ForeignKey(target = MedicalRecordPatient.class, targetProperty = "id")
    ContainerId getPatient();  (1)

    void setPatient(ContainerId containerId);

    @Mandatory
    List<String> getWards(); (2)

    void setWards(List<String> wards);

    @Security
    static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {

        List<String> groupNames = authenticationContext.getAllGroups() (3)
            .stream().map(group -> group.getEntityName().getValue()).collect(Collectors.toList());

        Expression<Boolean> result = null;

        for (String groupName : groupNames) { (4)

            Expression<Boolean> wardCondition = EcrQueryLanguage.condition() (5)
                .alias(alias).field("wards").contains().value(groupName)
                .holds();

            if (result == null) {

                result = wardCondition;

            } else {

                result = Eql.or(result, wardCondition); (6)
            }
        }

        if (result == null) {

            return Eql.alwaysFalse(); (7)
        }

        return result;
    }


    // case attributes ... (8)
}
1 A case holds a foreign key to a patient. Since a case must have a patient, this attribute is mandatory.
2 A case has a list of wards, where it was treated. This attribute is also mandatory.
3 When computing access, the groups - and thus the wards - of the current user are obtained from the AuthenticationContext
4 Since it is necessary to check whether the intersection between the wards of the case and the groups of the user is not empty, it is iterated over all the groups of the user.
5 A condition is created that checks whether the entities wards contain the current group.
6 Access is granted when one of the conditions created yields true.
7 If the user is in no group whatsoever he may access no case at all.
8 Further attributes are omitted for the sake of readability.

Now Patients specify their security as follows:

The patient
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
@Type(ObjectType.CONTAINER)
public interface MedicalRecordPatient {

    @Security
    static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {

        Alias caseAlias = Alias.byName("case"); (2)

        Expression<Boolean> caseAccessCondition = MedicalRecordCase.access(caseAlias, authenticationContext);  (1)

        return EcrQueryLanguage.condition().exists()
            .select("id").from(MedicalRecordCase.class).as(caseAlias.getValue()) (3)
            .where()
            .alias(caseAlias).field("patient").equalTo().alias(alias).id() (4)
            .and(caseAccessCondition).holds().holds(); (5)
    }

    // patient attributes ...
}
1 Access to a patient depends on access to cases. So, the MedicalRecordCase.access() is called (see above).
2 In order to do that a custom alias is specified, that is used for the method call and in the query below.
3 Using a subselect its is checked whether there is a case …​
4 …​ that is assigned to the patient the access is checked for and …​
5 …​ and to which the current user may access.

Documents may specify their security method very similar, only the document-to-case link is specified the other way around:

The document
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
@Type(ObjectType.DOCUMENT)
public interface MedicalRecordDocument {

    @ForeignKey(target = MedicalRecordCase.class, targetProperty = "id")
    @Mandatory
    ContainerId getCase(); (1)

    void setCase(ContainerId containerId);

    @Security
    static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {

        Alias caseAlias = Alias.byName("case");

        Expression<Boolean> caseAccessCondition = MedicalRecordCase.access(caseAlias, authenticationContext); (2)

        return EcrQueryLanguage.condition()
            .exists().select("id").from(MedicalRecordCase.class).as(caseAlias.getValue())
            .where()
            .alias(caseAlias).id().equalTo().alias(alias).field("case") (3)
            .and(caseAccessCondition).holds().holds();
    }

    // patient attributes ...
}
1 A document is assigned to a case. This is mandatory.
2 As for patients, the access check for documents depends on the access check for cases.
3 A similar subselect to the one above is created, however here the outer select holds the link to the inner one.

Revision History and Attribute Based Access Control

In the example above access to the entities is defined by one attribute: the wards of a case. It is assumed that a case may be treated in several wards - one after another - and every employee belonging to those wards needs access to the case, its patients data and its documents. Visiting the wards one after another will result in several updates on the case - each adding another ward - and thus in a revision history where the list of wards will build up over time.

In the scenario above this has an interesting consequence: The access to older versions of the case will be granted to users that were allowed to access it at the time the version was created. For example if a case started in the pulmonology it would have the following revision list:

revision wards

1

pulmonology

If it was moved to intensive care after that, it would result in the following revision list:

revision wards

1

pulmonology

2

pulmonology, intensive care

Employees working in intensive care would be unable to access data of revision 1 of this case. Depending on the scenario this might or might not be desired.

If this is not desired, it can be fixed with a simple annotation on the case interface:

alternative case
1
2
3
4
    @Mandatory
    @Versioned(value = false)
    List<String> getWards();

By simply specifying the wards attribute as not versioned, changes on the attribute will affect every revision of the case. If a case started in the pulmonology it would at first have the same revision history as above:

revision wards

1

pulmonology

However, if it was moved to intensive care now, the revision list would look like this:

revision wards

1

pulmonology, intensive care

2

pulmonology, intensive care

Now all employees in pulmonology and intensive care have access to every revision of this case.

This solution can be used generally. When access control to entities depends on attributes, deciding whether those attributes are versioned or not is an important detail.

Accessing External Tables

Assume that in the hospital from the example above, the information which employee belongs to which ward is kept in a separate table named 'employee_to_ward'. This table is managed by an external application.

Using Direct Database Access

As stated earlier, it is possible to add a parameter of the type org.jooq.DSLContext to a security method in order to gain direct access to the database. This could be used to access the 'employee_to_ward' table:

using DSLContext
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
    @Security
    static Expression<Boolean> access(
        Alias alias,
        AuthenticationContext authenticationContext,
        DSLContext context  (1)
    ) {

        long userId = authenticationContext.getUser().getIdentifier().getValue(); (2)

        final List<String> wards = context.selectFrom("test_employee_to_ward")
            .where(DSL.field(DSL.name("employee")).eq(DSL.value(userId))) (3)
            .fetch(DSL.field("ward", String.class));

        Expression<Boolean> result = null;

        for (String ward : wards) {

 // ... (as above) (4)
1 The DSLContext is defined as another parameter.
2 The AuthenticationContext is only used to get the current users id.
3 The wards of the user are obtained using the jooq-api to directly access the database. Depending on the scenario, it might improve performance to cache the result of this query.
4 After that, the same code as above is executed.

Using a Metadata Type

Alternatively, an arveo custom type could be used to access the external table:

an external type
1
2
3
4
5
6
7
8
9
10
11
12
13
@View (1)
@Name("employee_to_ward") (2)
@Type(ObjectType.META)
public interface UserToWard {

    long getEmployee();

    void setEmployee(long employee);

    String getWard();

    void setWard(String ward);
}
1 The @View annotation marks the type as external. This means arveo will not create the corresponding table.
2 The @Name annotation specifies the name of the table the types entities are stored in.

Now, in the security method this type can be accessed with a subselect:

using subselect
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
    @Security
    static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {

        final Alias userWard = Alias.byName("user_ward");  (1)

        return EcrQueryLanguage.condition()
            .exists().select("ward").from(UserToWard.class).as(userWard.getValue())(2)
            .where()
                .alias(userWard).field("employee").equalTo()
                .value(authenticationContext.getUser().getIdentifier().getValue()) (3)
            .and()
                .alias(alias).field("wards").contains() (4)
                .alias(userWard).field("ward").holds()
            .holds();
    }
1 First, an alias is declared for the subselect.
2 Then, a query is created that checks whether there is a ward, that …​
3 …​ the current user is assigned to and …​
4 …​ that is contained in the current entities wards attribute.

Data Modelling

Entity Types

The following chapter defines entity types and type definitions, used in arveo.

To be able to store objects in the database we define a class for entity definitions.So an entity represents a type of data structure used in the arveo.There are five supported entity types.

  • Document: an entity that can contain metadata and content. Documents are the only objects that can have content, the content may be binary. Documents can be contained in folders (Document).

  • Container: simple folder-like object not organized in a tree structure but with relations to other objects. A Container contains only metadata and cannot be contained in a folder (Container).

  • Relation: an entity that represents a relation between two other entities. A relation can contain metadata (Relation).

  • Folder: an entity that contains metadata and is organized in a tree structure like in a file system (Folder)

  • Meta: an entity that contains only metadata. Unlike containers, metadata entities do not support system attributes like ID and creation date (Metadata)

Each type definition is represented by one (or more) tables in the database. Each entity is referred by its system-wide unique id, which consists of a tenant id and its type definition id, followed by the sequential database id of this entity:

[12bit Tenant id][14bit Type Definition id][38bit Entity id].

Versioned Entities

All above listed entities (except for meta) are versioned by default. It means that they store version information, modification information. The class VersionInformation combines information about a version, including version id, version number and version comment. The version modification object stores a modification stamp, consisting of a user id and a ZonedDateTime object, both for the events of creation and last modification of the entity. The version information is stored in a separate table for each typed entity.

When specifying a type definition, you can decide which attributes of this type definition are versioned.

If none of the attributes are versioned, the entire object is not versioned. For the type Document the content changes are always versioned.

Custom Types

You can make your class a type and add features by annotating your classes. You can define the custom metadata schema with simple getter and setter methods.

When you start a project you have to create your own types. Simply annotate the class with the TYPE annotation and define your schema with type safe getter/setter methods (Example.

You can find the arveo-specific annotations in the module type-definition-annotations. The goal is to create a type, and specify its properties. So annotations precisely define the behavior of the type definitions. When defining a type, a database table is created. To achieve this, you annotate the type definition with @Type. There is an exception to that: when annotating with @View or @Partial_View, no database table is created.

There are 2 types of annotations:

  • annotations on types (interfaces): @Target({ElementType.TYPE, ElementType.ANNOTATION_TYPE})

  • annotations on properties (getter-methods): @Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})

Some annotations can be used both on interfaces and on getter methods. The target ElementType.ANNOTATION_TYPE makes an annotation usable in inherited annotations. The following annotation groups are used in arveo:

  • constraint: contains annotations that define specific properties or behaviour of attributes;

  • defaults: contains annotations that define default values of attributes;

  • index: contains annotations that define indexes on type definitions;

  • naming: contains annotations that specify names for tables, attribute definitions, type definitions, enumeration types and enumeration values;

  • reference: contains annotations that specify references between types or attributes;

  • system: contains annotations that concern system properties;

  • view: contains annotations that mark an interface as a view;

  • other: contains annotations like @Type, @EcrIgnore and others, which stand out and cannot be classified into a group.

You can use the five entity classes to create custom entity types that serve the needs of your system. The customized entity types reflect the structure of your project or organization and can be created flexibly by extending the five entity types of the arveo system.

To create your first project using arveo you may want to review the following examples and follow the pattern.

Inherited annotations

Certain annotation properties are used widely throughout the code, so it is more convenient to define such an annotation once for frequent use.

The following listing shows the interface definition @CustomAnnotation, which defines itself as the system property version id. If you mark a getter method with this annotation, there is no need to repeat the system property name.

Listing of the interface @CustomAnnotation
@Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})
@SystemProperty(SystemPropertyName.VERSION_ID)
public @interface CustomAnnotation {

}

To take advantage of this interface, we annotate getter-methods with it as shown in the listing below:

public interface InterfaceInheritanceExample {

    @SystemProperty(SystemPropertyName.ID)
    DocumentId getId();

    @CustomAnnotation
    VersionId getVersionId();
}

Examples

Enumeration Example

Define an enum class and use it in another object type (Example).

@Enumeration(typeName = "my_enum")
public enum MyEnum {
    ENUM1, ENUM2, ENUM3, ENUM4
}
Document Type Example
Example of a type definition using the object type Document
@Type(ObjectType.DOCUMENT) (1)
@RetentionProtected
@ContentElement(defaultDefinition = true, separateField = true)
@OverwriteAllowed
@RecycleBin
@Audit
public interface Resume {
// Immutable identifier documentid of the resume document: unique and readonly
@Unique
@ReadOnly
// alternatively: use autoincrement instead of unique and readonly to let the service create a unique sequence
//@Autoincrement
long getDocumentId(); (2)
void setDocumentId(long value);

// title of the resume document
String getTitle(); (2)
void setTitle(String value);

// relation to Person by person.id()
@ForeignKey (target = Person.class, targetProperty = "id") (3)
String getPersonId();
void setPersonId(String value);

// Multi value with former employers
List<String> getEmployers();
void setEmployers(List<String> employers);

MyEnum getEnum();
void setEnum(MyEnum myEnum);

}
1 Definition of the object type Document, which allows content to be uploaded
2 A database column with the default name document_id is created for this property. With @Unique and @ReadOnly the value must be set on creation and is unique and immutable from that moment on. This allows users and 3rd-party applications to identify and find the object. Alternatively, annotate the property with @Autoincrement instead of @Unique and @ReadOnly to let the database generate a unique sequence of integer values.
3 This annotation specifies a foreign key to class Person
Container Type Example

The following example class is marked as type Container. To use an entity type, we annotate the class using the @Type annotation.

Example of an object definition of entity type Container
@Type(ObjectType.CONTAINER) (1)
public interface Person {
    String getFirstName(); (2)
    void setFirstName(String value);
    @Name("last_name")  (3)
    String getSurname();
    void setSurname(String value);
    @Unique  (4)
    String getVatNumber();
    void setVatNumber(String value);
}
1 Definition of the object type to be Container
2 A database column is created for this property with a default name first_name
3 This annotation specifies the name of the database column, which is different from the default
4 This annotation specifies a unique column, in this case vat_number.

Referencing attributes by name

The system creates a column for each attribute of a type definition in the type definition’s database table. The name of the column will be a snake case representation of the camel case name of the getter method of the attribute. For example, the getter getInvoiceNumber will be mapped to an attribute (and a column) named invoice_number. To make it easy to reference these names in a compile-safe manner, classes with string constants for all type definitions will be generated automatically. For example, for a type definition class called SimpleInvoice a class named SimpleInvoiceNames will be generated in the same package as SimpleInvoice.
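
The camel-case to snake-case mapping described above can be sketched in a few lines of plain Java. The class and method names here are illustrative only, and the actual generator may handle additional edge cases.

```java
// Sketch of the getter-name to column-name mapping: strip the "get" prefix,
// then insert '_' before each upper-case letter and lower-case it.
public final class ColumnNames {

    static String columnNameFor(String getterName) {
        String property = getterName.startsWith("get")
                ? getterName.substring(3)
                : getterName;
        StringBuilder sb = new StringBuilder();
        for (char c : property.toCharArray()) {
            if (Character.isUpperCase(c)) {
                if (sb.length() > 0) sb.append('_');
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, this maps getInvoiceNumber to invoice_number, matching the column name described above.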

The classes containing the constants are generated using an annotation processor that is contained in the library containing the type annotations. The processor is picked up by the compiler automatically.

The following example shows how these constants can be used to perform a search referencing two different attributes.

Example of a search using generated attribute name constants
        EcrSearchService<SimpleInvoice> searchService = serviceClient.asEntitySearchService(); (1)

        List<SimpleInvoice> list = searchService.where() (2)
            .entity().field(SimpleInvoiceNames.INVOICE_NUMBER).like().value("2021-08-*")
            .and()
            .entity().field(SimpleInvoiceNames.AMOUNT).greaterThan().value(90D)
            .holds()
            .unpaged();
1 serviceClient is a TypedDocumentServiceClient obtained using the TypeDefinitionServiceClient
2 A query is formulated using the fluent API of the EQL using the attributes invoice_number and amount

Type Annotations

Table 14. Type Annotations
Annotation Parameter Description

@Type

ObjectType

Define the entity type of your class by setting a valid ObjectType: DOCUMENT, FOLDER, RELATION, CONTAINER, META (Example)

@AccessCheck

boolean

This annotation specifies whether type-based access checking is enabled on a type (default = false). If enabled, type permission tokens can be set in the user administration: ECR-TYPE_TypeName_READ/WRITE, e.g. ECR-TYPE_PERSON_WRITE or ECR-TYPE_PERSON_READ. The type name must be snake case and match the table name.

@AclDisabled

boolean

Support for ACLs is enabled by default but can be disabled by annotating the type class with @AclDisabled. Additionally, annotating a getter for the ACL-Id system property with @Mandatory enforces the assignment of an ACL to every entity. Meta types do not support ACLs.

@FilingEnabled

boolean

The filing feature makes it possible to assign a document to a folder. The feature is disabled by default and can be activated on type classes of type DOCUMENT by annotating the class with @FilingEnabled.

@RetentionProtected

boolean

The retention and litigation hold feature is disabled by default and can be enabled by annotating a type class with @RetentionProtected. Meta types do not support retention. Example

@OptimisticLocking

boolean

The optimistic locking feature makes it possible for clients to ensure that updates do not accidentally overwrite changes made by other clients. The feature is disabled by default and can be enabled by annotating a type class with @OptimisticLocking.

@RecycleBin

boolean

The recycle bin feature makes it possible to move entities to the recycle bin and restore them again if required. The feature is disabled by default but can be enabled by annotating a type class with @RecycleBin. Recycle Bin

@Recovery

boolean

Enables the recovery log. Deleted content objects or files are finally removed after a configurable time. Recovery Log

@ContentElement

String

Define the allowed content types. Example

@Audit

boolean

This annotation enables auditing of create-, update- and delete-operations on the type definition.

@Versioned

boolean

This annotation defines if all properties of a type are versioned or not. If the annotation is present on a type and on a getter in the type, the annotation on the getter wins.

@OverwriteAllowed

boolean

By default, arveo creates a new version if the content object of a document is changed. You can always read and restore all older versions of a content element. If overwrite is allowed you can replace a content element and overwrite it on the content store. The old version is lost.

@View

boolean

The metadata type is a database view. (Example)

@TableName

String

Sets the actual database table name. Table names are snake case, not camel case. Example

@SourceType

boolean

This annotation marks a setter method as setting a property that is part of an update or create call and not a member of the entity itself. Examples are the revision comment or the update counter.

@TargetType

boolean

This annotation specifies the class that is the target of a foreign key or relation.

@InheritedProperty

boolean

This annotation marks a property as an inherited property.

@Enumeration

String

This annotation can be used to configure a registered enumeration type. You must pass the snake-case database name of the enumeration type (Example).

@EcrIgnore

Ignore Property

This annotation marks a method to be ignored as a property or a class to be ignored as a type. The property is not stored in the database table.

@NOSql

boolean

This annotation enables full-text support for all columns of the document type. By default, full-text support is disabled (Example).

Property Annotations

Table 15. Property Annotations
Annotation Parameter Description

@AutoIncrement

boolean

The annotation AutoIncrement indicates that the value of an attribute will be auto-incremented by the database.

@Indexed

String

The annotation ensures that an index will be created for one or more properties. You must pass the index name as a parameter. When several attributes are annotated with an index of the same name, a multi-column index will be created for these columns. Use @Index to configure additional properties of the index.

@Unique

boolean

Defines a unique column. If you try to create an entity with a duplicate value, a unique constraint violation is thrown. arveo creates a unique index or a unique constraint on the database and ensures the integrity of the documents. Example

@Mandatory

boolean

Defines a mandatory column (default = false). The create operation fails with an exception if the property is not set. Example

@Readonly

boolean

The property must be set when the entity is created (like @Mandatory) and cannot be changed afterwards. If a column has the annotations @Readonly and @Unique you have an immutable index value that can be used as a business primary key. This ensures that users and third-party systems can clearly identify and find a document. Example

@Versioned

boolean

This annotation defines if an attribute of a type is versioned or not (when placed on a getter). If the annotation is present on a type and on a getter in the type, the annotation on the getter wins.

@Length

Long

This annotation specifies the length of a string or binary attribute

@Precision

Long
Long

This annotation specifies the precision of a decimal; parameters: digits before and after the decimal point

@Casesensitive

boolean

This annotation marks a field of type String as case-sensitive. This affects how searches on this field are performed. The value itself is always stored preserving its case.

@DefaultValue

String
default<T>

It is possible to specify default values for properties. Pass the database name of the property (snake case, not camel case!) and define a default method returning the required type. If an instance of a type with a field that has a default value specified is created, and a value for that field is not defined, the default value will be used instead. However, if the field is explicitly set to null, then null will be used. Example

@DefaultSystemPropertyValue

String
ZonedDateTime

It is possible to calculate the initial value of the retention period and set it as a default value for the RETENTION_DATE system column. Pass the database column name "retention_date" and a function returning a ZonedDateTime value. Example

@PrimaryKey

boolean

This annotation marks a custom property as part of the element's primary key. The primary key is combined from every custom property annotated with this annotation and the system property id. The property will be mandatory.

@SecondaryKey

boolean

This annotation marks a property as a secondary key. The property is mandatory and unique.

@ForeignKey

String

Defines a foreign key. You must pass the target class and the target property for the foreign key. arveo creates the foreign key on the database and ensures the data integrity of your entities. Example

@CascadeDelete

boolean

It is possible to define foreign keys that cascade a delete operation to the referencing entity. Example

@SystemProperty

SystemPropertyName

To access system properties you can use the annotation @SystemProperty and pass one of the following names (Example).

@FormattedCounter

String

This annotation marks an attribute of type String as a formatted counter. Formatted counters can be used to generate string valued attributes with a counter backed by a sequence as well as a prefix and a suffix. The name of the sequence can be user defined, or it can be auto-generated by the system. Prefix, suffix, and the name of the sequence can contain placeholders. Currently, the only supported placeholder is $date(<format>). The format string is a simple date format string as supported by DateTimeFormatter.ofPattern. Example

@RelationCounter

Class

This annotation marks a property of type int as a counter for a specific relation type identified by the type definition class.

@EcrIgnore

boolean

This annotation marks a method to be ignored as a property or a class to be ignored as a type. The property is not stored in the database table.

@NOSql

boolean

This annotation enables or disables full-text support for this property.

Unique Identifiers
To allow users and 3rd party applications to identify and find objects in arveo you should define a unique and immutable property.
The property must be @Unique to ensure that an application can identify the item.
Make the property @ReadOnly to ensure that the identifier is always set and immutable.
Your business application or the user must set the value when the object is created.
Use the @AutoIncrement annotation instead of @Unique and @Readonly if a simple sequential Long id meets your requirements.
If you need a more sophisticated unique identifier you can use the annotation @FormattedCounter which allows you to create e.g. String identifiers like <year>-<sequence> (Example)
If overwrite is turned on it is possible to manipulate the originally saved content and compromise the document without creating a versioned copy. Ensure that the @OverwriteAllowed annotation is not present on legally compliant document types.

Examples

Default Values
Example for a default value definition
@Type(ObjectType.CONTAINER)
public interface ContainerWithSimpleDefaultProperty {

    String DEFAULT_STRING = "default string"; (2)

    @Mandatory
    String getMyStringField();

    void setMyStringField(String myStringField);

    @DefaultValue("my_string_field") (1)
    default String defaultStringField() {

        return DEFAULT_STRING;
    }

    // ...
    // your custom attribute definitions
    // ...
}
1 With @DefaultValue("my_string_field") the method defaultStringField is defined to return the default value of my_string_field. Note that the reference in the annotation is in snake case while the actual property getMyStringField is camel case.
2 In a simple case like this it is considered good practice to declare the default value as a public constant. However, the default method does not need to return a constant. For example, date-time fields could use ZonedDateTime.now() to use the creation timestamp as the default value.
Index Example

As an example of annotation usage, let us define an interface BookIndex with two properties, page and chapter. These properties have to be indexed.

Object annotation of type Meta
@Type(ObjectType.META)
@Index("book-chapter-page-index")
public interface BookIndex {

    @PrimaryKey
    @AutoIncrement
    int getId();

    @Indexed("book-chapter-page-index")
    int getChapter();
    void setChapter(int chapter);

    @Indexed("book-chapter-page-index")
    int getPage();
    void setPage(int page);
}

The above-mentioned properties are thus marked with the annotation @Indexed, which ensures that an index will be created for these attributes. Here, the annotation @Index on the type is an example of an annotation on a type, as described above.

Formatted Counters Example

Using the @FormattedCounter annotation it is possible to define counters with a prefix and a suffix that are backed by a sequence on the database. There are several properties that can be defined in the annotation:

Property Description

prefix

The prefix used for the counter values. Can contain placeholders.

suffix

The suffix used by the counter values. Can contain placeholders.

digits

The number of digits for the counter. Shorter numbers will be padded with zeros.

sequenceName

The name of the sequence to use. Can contain placeholders.

autoGenerateSequences

The number of sequences to auto-generate when the system is started in maintenance mode.

startValue

The start value of the generated sequence(s).

The parameters prefix, suffix and sequenceName support placeholders. Currently, the system supports a placeholder for dates in the form $date(<format>) where format is a Java date format string supported by java.time.format.DateTimeFormatter#ofPattern(String).
The autoGenerateSequences property can only be used when the sequenceName contains the placeholder $date(uuuu). It must not contain any other placeholders.

The following example shows a formatted counter attribute used as an invoice number that will produce counter values in the form 2021#0103. It will be backed by a sequence called inv_no_seq_2021. The system will create the next 10 sequences automatically (inv_no_seq_2021 to inv_no_seq_2030). The start value of each sequence will be 100. The sequence to use will be determined automatically because of the date placeholder in the sequenceName property. So on January 1st 2022, the generated counter values will use another prefix and the counter will start over at 100 (2022#0100). Each time the system is started in maintenance mode, it will make sure that sequences for the next 10 years are present.

Example: Defining a formatted counter attribute
@FormattedCounter(prefix = "$date(uuuu)#", digits = 4, sequenceName = "inv_no_seq_$date(uuuu)", autoGenerateNextSequences = 10, startValue = 100)
String getInvoiceNumber();
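
How such counter values are composed can be illustrated with a small, self-contained sketch. It only mimics the $date(<format>) placeholder resolution and the zero padding described above; the class and method names are hypothetical, and the real implementation draws the counter from a database sequence.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not the arveo implementation) of formatted counter values:
// resolve $date(<format>) placeholders, then concatenate prefix, zero-padded
// counter and suffix.
public final class FormattedCounterSketch {

    private static final Pattern DATE_PLACEHOLDER = Pattern.compile("\\$date\\(([^)]+)\\)");

    // replace every $date(<format>) with the given date formatted accordingly
    static String resolvePlaceholders(String template, ZonedDateTime now) {
        Matcher m = DATE_PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, DateTimeFormatter.ofPattern(m.group(1)).format(now));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static String format(String prefix, String suffix, int digits, long counter, ZonedDateTime now) {
        return resolvePlaceholders(prefix, now)
                + String.format("%0" + digits + "d", counter)
                + resolvePlaceholders(suffix, now);
    }
}
```

With the prefix "$date(uuuu)#", four digits and the counter value 103, this yields a value of the form 2021#0103 for a date in 2021, matching the example above.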
Foreign keys with ON DELETE CASCADE example

Add the @CascadeDelete annotation to the getter for the foreign key attribute. For relation types it is possible to add the cascade delete option to the foreign keys to the parent and child of the relation. To do that, add a system property for the parent and/or child id and annotate it with @CascadeDelete.

Usage of the annotation @CascadeDelete
// simple foreign key
@CascadeDelete
@Mandatory(false)
@ForeignKey(target = BookIndex.class, targetProperty = "id")
Integer getReferencedIndex();

// parent- and child-id of a relation
@CascadeDelete
@SystemProperty(SystemPropertyName.PARENT_ID)
short getParentId();

@CascadeDelete
@SystemProperty(SystemPropertyName.CHILD_ID)
short getChildId();
The cascade delete option is supported only for entities that are not versioned (hence it cannot be used on Document types) and do not support retention or inheritance. It is also not possible to inherit attribute values from a type definition that has a foreign key with the cascade delete option.
Property-like system fields

If a getter for a system field is defined, it is also possible to define a setter, provided the system field is property-like. The following fields are property-like:

  • acl_id;

  • retention_date.

The following listing shows the definition of a getter and a setter method on a property-like field.

Example of property-like fields
public interface Secured {

    @SystemProperty(SystemPropertyName.ACL_ID)
    AccessControlListId getAclId();

    void setAclId(AccessControlListId aclId);

}

Define a view

To define your type as a view or a partial view, annotate it with @View or @PartialView. The @View annotation specifies that the defined type is a view, i.e. no database table is created for it. The @PartialView annotation marks a class as a partial view of the type definition created by another class via the @Type annotation. Partial views can be used for updates and for selects with limited select clauses. No tables will be created for classes annotated this way. Interfaces that are to be defined as views of an object type have to be registered on the interface representing that object type. For instance, if an interface NamedFile inherits from the interface NamedEntity, and NamedEntity is a partial view of NamedFile, it has to be registered on the object from which it inherits:

@PartialView(NamedFile.class)
public interface NamedEntity {
    //...
}

Note: An interface may also be a partial view of more than one type definition.

External Views

It is possible to expose tables that are under the control of other applications to arveo and include them in its type system. This assumes that the given tables are in the same database schema as the tables of arveo. Also, one needs to know the names of these tables as well as their column types. In this case one can define a meta type annotated with @View.

External views will only be read by arveo. It will never write to an external view.

For example, the access-control-service defines several tables, one of them named usrv_acl. In this table there are - amongst others - two fields: id (a bigint) and name (a varchar). With this knowledge one can define the following external view:

Example for an external view
@View (1)
@Type(ObjectType.META) (2)
@TableName("usrv_acl") (3)
public interface AclView {

    @Unique
    String getName(); (4)

    @PrimaryKey
    long getId(); (5)
}
1 We annotate the class with @View to declare it as an external view.
2 Specifying the type as a META type is good practice, since every other type would expect specific system fields.
3 Specifying the table name is good practice here, since an external table most likely follows its own naming convention. However, it would be possible to omit the @TableName annotation here and instead name the class UsrvAcl.
4 Since we know that the table usrv_acl has a field name of type varchar we can define the property name of type String.
5 We know that the table usrv_acl has a field id of type bigint, so we specify a java property accordingly.

NoSQL Example

You can add the @NOSql annotation to type definitions that should also be created in the Solr schema, so that the whole class is created there with all its fields.

Example of usage of the @NOSql annotation
@Type(ObjectType.CONTAINER)
@NOSql
public interface PersonSimple {
    String getFirstName();
    void setFirstName(String value);
    String getLastName();
    void setLastName(String value);
}

If you don’t want a field to be created, you can disable it with the annotation @NOSql(value = false).

Example of usage of the @NOSql annotation with value set to false
@Type(ObjectType.CONTAINER)
@NOSql
public interface PersonSimple {
    String getFirstName();
    void setFirstName(String value);
   @NOSql(value = false)
    String getLastName();
    void setLastName(String value);
}

@SystemProperty Annotation

To access system properties you can use the annotation @SystemProperty and pass one of the following names (Retention Information Getter)

general system fields:

  • ID: The unique identifier of the entity. Use on EcrId properties (or subclasses as applicable). Can be used on any entity

  • CREATION_DATE: The date and time the relation was created. Use on ZonedDateTime properties. Can only be used on relations.

  • CREATOR_USER_ID: The id of the user that created this relation. Use on UserId properties. Can only be used on relations.

  • ACL_ID: The id of the ACL currently assigned to the entity. Might be null. This is not supported for metadata entities.

  • ACL_RIGHT: The resolved right based on the ACL currently assigned to the entity and the current user. This is not supported for metadata entities.

  • RETENTION_INFO: Information about the retention properties of the entity. It contains the RETENTION_DATE and the LITIGATION_HOLD flag described below. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.

  • RETENTION_DATE: The retention date defines the minimum storage date, i.e. the related object cannot be deleted until this date has passed. The storage period may be extended but never shortened. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.

  • LITIGATION_HOLD: A flag that indicates whether a document is related to a litigation. If the flag is set, the document must never be deleted, even if the retention date has passed. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.

versioned system fields:

  • VERSION_NUMBER: The number of the version of the versioned entity. Use on int/Integer properties. Can only be used on versioned entities.

  • VERSION_ID: The unique identifier of the version of the entity. Use on VersionId properties. Can only be used on versioned entities

  • UPDATE_COUNTER: A counter that is incremented each time an entity is updated. It is used for the optimistic locking feature and therefore is only available on type definitions that use optimistic locking.

  • IS_CURRENT_VERSION: A boolean that indicates whether the entity was the current version at the time it was loaded from the backend. Can only be used on versioned entities.

  • MODIFICATION_INFO: Information about the date and time as well as the user of the first and last modification of the entity. Use on ModificationInformation properties. Can only be used on potentially versioned entities i.e. Folders, Documents, Relations.

document system fields:

  • CONTENT: Information about the content of the document. Use on Map<String, ContentInformation> properties. Can only be used on documents.

  • CONTAINING_FOLDER: The id of the folder containing the document (if any). Use on FolderId properties. Can only be used on documents.

folder system fields:

  • FOLDER_NAME: The name of the folder. Use on String properties. Can only be used on folders.

  • PARENT_FOLDER: The id of this folders parent. Use on FolderId properties. Can only be used on folders.

relation system fields:

  • PARENT_ID: The id of the parent of this relation. Use on TypedId properties (or applicable subclasses). Can only be used on relations.

  • PARENT_VERSION_ID: The version-id of the parent of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.

  • CHILD_ID: The id of the child of this relation. Use on TypedId properties (or applicable subclasses). Can only be used on relations.

  • CHILD_VERSION_ID: The version-id of the child of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.

Data Types

Table 16. Property Data Types
Java Type Database Type Description

String

text

Unlimited unicode text. Limit the length with @Length annotation

Long or long

bigint

64 bit long value, Long = null is allowed

Double or double

double

double value, Double = null is allowed

Boolean or boolean

boolean

Boolean value, Boolean = three-state boolean (true, false, null)

Decimal or decimal

decimal(precision)

Decimal value, Decimal = null is allowed, add @Precision annotation

UUID

uuid

uuid type

byte[]

bytea

Binary data with a length specified by a Java int (max. 4 GB).

String

text

String based ID with a non-null length.

EnumerationType

EnumerationType

arveo creates an enumeration object on postgreSQL 12.

ZonedDateTime

datetime

arveo stores a GMT based date time value in postgreSQL 12

LocalDate

datetime

arveo stores a date time value in postgreSQL 12, but only the date is relevant

LocalTime

datetime

arveo stores a date time value in postgreSQL 12, but only the time is relevant

List<String>

array(text)

arveo stores multiple text values in an array column of postgreSQL 12.

List<Long>

array(bigint)

arveo stores multiple bigint values in an array column of postgreSQL 12.

By default, postgreSQL 12 does not limit the length of String values. Typically, it is not necessary to define a length using the @Length annotation because postgreSQL 12 handles strings of all lengths very well.
Your strings should have a length of up to 4 kByte. Larger strings are allowed, but take care that you do not inadvertently consume too much storage space if you store very large strings.
List data types allow you to store more than one String or Long value for a property. You can search for each value using the array search operation of the arveo query language.
Enumeration data types allow you to set one or more values from a fixed set of values.

System Properties

The following chapter describes types of system properties in arveo.

There are different types of system properties:

  • General system properties (e.g. ID, creation date)

  • Versioned entity system properties (e.g. the latest version ID, the last modification date)

  • Document system properties (content specific fields)

  • Folder system properties (e.g. parent ID)

  • Relation system properties (e.g. the parent- and child ID)

  • Version System properties (e.g. the version ID or the version number)

System Property Names

All system columns in the database are snake case, not camel case; e.g. the Java RetentionDate property is persisted as "retention_date".
Table 17. General system properties:
Name Database Type Description

id

bigint

The unique identifier of the entity. Use on EcrId properties (or subclasses as applicable). Can be used on any entity and is applied by arveo for all types but metadata.

acl_id

bigint

The id of the ACL currently assigned to the entity. Might be null. This is not supported for metadata entities or types with disabled ACLs.

Table 18. Versioned system properties:
Name Database Type Description

version_number

bigint

The sequential number of the version of the versioned entity. Use on int/Integer properties.

version_id

bigint

The unique identifier of the version of the entity. Use on VersionId properties.

update_counter

bigint

A counter that is incremented each time an entity is updated. It is used for the optimistic locking feature and therefore is only available on type definitions that use optimistic locking.

creation_date

datetime

GMT timestamp when the object was created, precision (1/100 second)

modification_date

datetime

GMT timestamp when the object was last modified, precision (1/100 second)

creator_user_id

bigint

The ID of the user who created the object (User Management)

modification_user_id

bigint

The ID of the user who last modified the object (User Management)

retention_date

datetime

The GMT-based retention timestamp defines the minimum storage period, i.e. the related object cannot be deleted until this date has passed. Can only be used on potentially versioned entities, i.e. folders, documents, relations and containers that are declared to be retention protected (Retention)

litigation_hold

boolean

This boolean indicates whether a document is related to a litigation. If the flag is set, the document must never be deleted, even if the retention date has passed. Can only be used on potentially versioned entities, i.e. folders, documents, relations and containers that are declared to be retention protected (Retention)

Table 19. Document system properties:
Name Database Type Description

content

json

JSON containing content properties:
ID : unique id of the content
Hash : SHA256 hash of the content stream
Hash-Algorithm: Algorithm of the hash
MediaType : mime type of the content, e.g. octet-stream
Creation: GMT based ZonedDateTime timestamp of the creation of the object
FileName: Name of the file, if stored on a file system storage
Size: bigint value containing the size of the content stream in bytes

parent_id

bigint

The id of the folder containing the document (if any). Use on FolderId properties. Can only be used on documents.

Table 20. Folder system properties:
Name Database Type Description

folder_name

String

The name of the folder. Use on String properties. Can only be used on folders.

parent_id

bigint

The id of this folder's parent. Use on FolderId properties. Can only be used on folders.

Table 21. Relation system properties:
Name Database Type Description

parent_id

bigint

The id of the parent of this relation. Use on TypedId properties (or applicable subclasses)

parent_version_id

bigint

The version-id of the parent of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.

child_id

bigint

The id of the child of this relation. Use on TypedId properties (or applicable subclasses).

child_version_id

bigint

The version-id of the child of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.

Document Type

The following chapter provides a more detailed overview of the type Document.

A Document is one of five entity types supported by the arveo system. Unlike the other entity types, documents are always versioned to keep track of changes to the binary content. A Document consists of the following components:

  • Technical metadata, which is filled by arveo and cannot be changed, see System properties

  • Typed metadata as defined in the annotated interface (the type definition)

  • 0-n content objects: A content object has a content type that is freely configured in the system. A maximum of one element can be inserted per content type. Examples of content types are: original object, rendition, full text, text notes, XML properties, etc.

  • content metadata like content size, mime-type and hash

  • 0-n annotations per content object: Only for image objects (TIFF, JPEG, PNG, BMP, PDF/A) annotations can be created in a layer independent of the document.

Any number of versions can be created for a document. All the versions are traceable in the repository and can be referenced via independent, system-wide unique IDs.

Container Type

The following chapter provides a more detailed overview of the type Container.

A Container is an object without content. It supports all system-managed metadata attributes and custom attributes defined by the type definition. It is called 'container' because its primary use case is to serve as an entity that contains custom metadata and that is related to other entities like a document via foreign keys or relations.

Use container objects to build records and cases that contain documents. You can map the relationship between file, case and documents either as a foreign key (@ForeignKey annotation) or using relation type objects (Relation Type).
If you use foreign keys to create the relationship between objects, you can inherit values from the parent to its children (Inheritance)

Containers can be versioned. A Container consists of the following components:

  • Technical meta information, which is filled by arveo and cannot be changed, see System properties

  • Typed container type metadata according to the type definition of the container type

Any number of versions can be created for a container. All the versions are traceable in the repository and can be referenced via independent IDs.

Relation Type

The following chapter provides a more detailed overview of the type Relation.

A relation represents a connection between two entities (document, container, folder or meta). It is directed, having a parent and a child and it can contain custom metadata attributes. A relation type must specify the type of the parent and child entities. Any number of versions can be created for a relation. All the versions are traceable in the repository and can be referenced via independent IDs.

Changes of the child-id or parent-id are not tracked in the version table.
Data model of a relation
 +------------+               +--------+-----+              +------------+
 |   Parent   |               +   Relation   +              |   Child    |
 |------------|      source   |--------------|  target      |------------|
 |            |<--------------|              |------------->|            |
 | attributes |               |  attributes  |              | attributes |
 |            |               |              |              |            |
 +---+--------+               +--------------+              +------------+
Example: A relation type definition
@Type(ObjectType.RELATION) (1)
@SourceType(Customer.class) (2)
@TargetType(Invoice.class) (3)
public interface CustomerInvoiceRelation {

    @SystemProperty(SystemPropertyName.CHILD_ID) (4)
    @InputProperty(InputPropertyName.RELATION_CHILD) (5)
    DocumentId getChildId();

    void setChildId(DocumentId childId);

    @SystemProperty(SystemPropertyName.PARENT_ID) (6)
    @InputProperty(InputPropertyName.RELATION_PARENT) (7)
    ContainerId getParentId();

    void setParentId(ContainerId parentId);

    String getStatus();

    void setStatus(String status);
}
1 Specifies that the type definition is used for relations
2 Defines the type of the source or parent of the relation
3 Defines the type of the target or child of the relation
4 Marks an attribute to return the value of the childId property of the relation
5 Marks an attribute to set the value of the childId property of the relation
6 Marks an attribute to return the value of the parentId property of the relation
7 Marks an attribute to set the value of the parentId property of the relation

Relations vs. foreign keys

Instead of using relations, it is possible to model a dependency between two entities using foreign keys. The key difference between the two approaches is that a relation can carry its own metadata attributes, which a foreign key can not. This possibility requires an additional database table (or two, in case of versioned relations) for a relation, which might have a negative impact on the performance. If the dependency between the two entities does not require its own metadata attributes (and is not a many-to-many relation), it is recommended to use foreign keys instead of relations.

Foreign keys can be defined by adding the @ForeignKey annotation to an attribute in a type definition. The targetProperty attribute of the annotation must point to the ID or to a custom metadata attribute with a unique constraint of the target type. The type of the annotated attribute must match the type of the target property of the foreign key. The chapter Foreign Keys contains a more detailed overview of the foreign key feature.

Example: Defining a foreign key
@ForeignKey(name = "fk_invoice_customer", target = Customer.class, targetProperty = "id")
long getCustomerNumber();
Data model of a foreign key relationship
 +------------+                 +------------+
 |   Parent   |                 |   Child    |
 |------------|   foreign key   |------------|
 |            |---------------->|            |
 | attributes |                 | attributes |
 |            |                 |            |
 +---+--------+                 +------------+

Relations to versions

By default, a relation can point to the current version or to a specific version of its parent or child, when the parent- or child-type supports versions. This behavior can be controlled by the supportedNodeVersion property of the @Source and @Target annotations used for relation type definitions. The attribute supports three different values (defined in de.eitco.ecr.type.definition.annotations.reference.SupportedNodeVersion):

Table 22. Possible values of the supportedNodeVersion attribute
Value Meaning

CURRENT_VERSION

The relation must point to the current version of the node identified by the node’s ID (NOT the VersionId of the current version)

SPECIFIC_VERSION

The relation must point to a specific version of the node identified by its VersionId.

CURRENT_OR_SPECIFIC_VERSION

The relation can point to either the current version or a specific version of the node. This is the default.
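As a hedged sketch: the text above names the supportedNodeVersion property of the @Source and @Target annotations, while the earlier relation example uses @SourceType and @TargetType, so the exact attribute syntax below is an assumption. A relation type restricted to specific versions of its target might look like this:

```java
// Hypothetical relation type that must point to a specific version of its
// target. The annotation attribute and the CHILD_VERSION_ID constant follow
// the descriptions in this chapter and are not verified API.
@Type(ObjectType.RELATION)
@SourceType(Customer.class)
@TargetType(value = Invoice.class, supportedNodeVersion = SupportedNodeVersion.SPECIFIC_VERSION)
public interface CustomerInvoiceVersionRelation {

    // The version of the invoice this relation points to.
    @SystemProperty(SystemPropertyName.CHILD_VERSION_ID)
    VersionId getChildVersionId();
}
```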

Unique relations

A single relation always has exactly one parent and one child. However, by default a single entity can be the parent or child of multiple relations (many-to-many). By adding unique constraints to the parentId and/or childId system properties of the relation type, it is possible to define one-to-many, many-to-one or one-to-one relations.

Example: Adding a unique constraint to the child ID of a relation
@SystemProperty(SystemPropertyName.CHILD_ID)
@Unique(constraintName = "uccr_parent_child_uc")
ContainerId getChildId();

Relation counters

By using the @RelationCounter annotation it is possible to create counters on the parent- and child-entities for both incoming and outgoing relations. The counters are persisted in the database and are updated automatically when relations are added or removed.

The @RelationCounter annotation contains two attributes: The relationType attribute defines the type of relation to count and the direction attribute defines whether to count incoming (the entity is the child or target of the relation) or outgoing (the entity is the parent or source of the relation). By annotating the relation counter attribute with @Versioned it is possible to control whether the counter attribute is stored in the version table for each version or in the main table for all versions. When the counter is stored in the version table it will contain the count for a single version of the entity. If it is stored in the main table it will contain the count for all versions of the entity. The following example shows how to define relation counter attributes. The @Name annotation is used because the attribute name is too long for a database column name.

Example: Defining relation counter attributes
@RelationCounter(relationType = TypedContainerContainerRelation.class, direction = RelationCounterDirection.INCOMING)
@Versioned(false)
int getIncomingRelationCounter();

@RelationCounter(relationType = TypedContainerContainerRelation.class, direction = RelationCounterDirection.INCOMING)
@Versioned
@Name("v_in_relation_counter")
int getVersionedIncomingRelationCounter();

Working with relations

The arveo API provides several methods that can be used to create, modify and resolve relations. Relations themselves are treated just like any other entity type. Entities that can be the parent or child of a relation (containers, folders, documents and metadata entities) provide additional relation-specific methods in the client API. The available methods are defined in the interface de.eitco.ecr.sdk.TypedBaseRelationNodeEntityClient, which is a super interface of the clients used in the API for documents, folders, containers and metadata entities. The injectable de.eitco.ecr.sdk.SearchClient offers additional methods to search for relations using filters on the relation, the parent or the child.
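Since relations are treated like any other entity type, creating one presumably follows the same typed-client pattern shown for containers and documents elsewhere in this documentation. The client variable names and the way the relation service client is obtained are assumptions:

```java
// Sketch: linking an existing customer container to an invoice document
// using the CustomerInvoiceRelation type defined earlier. The service
// client is assumed to be obtained analogously to the container and
// document service clients shown later in this documentation.
CustomerInvoiceRelation relation = relationServiceClient.createTypeInstance();
relation.setParentId(customerClient.getIdentifier());
relation.setChildId(invoiceClient.getIdentifier());
relation.setStatus("open");
relationServiceClient.createEntity(relation);
```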

Folder Type

The following chapter provides a more detailed overview of the type Folder.

A folder is an entity that is organized in a file-system-like tree structure. A folder can contain custom metadata attributes. Documents can be filed in a folder.

A Folder consists of the following components:

  • Technical meta information, which is filled by arveo and cannot be changed, see System properties

  • Typed folder type metadata according to the type definition of the folder type

Any number of versions can be created for a folder. All the versions are traceable in the repository and can be referenced via independent IDs.

Only documents can be filed in a folder. To enable the filing feature, add the @FilingEnabled annotation to your document type.
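Putting this together, a minimal document type that can be filed in folders might look like the following sketch. The parent_id mapping follows Table 19; the type and attribute names are illustrative:

```java
// Hypothetical filing-enabled document type.
@Type(ObjectType.DOCUMENT)
@FilingEnabled
public interface FiledDocument {

    // The id of the folder the document is filed in, mapped to the
    // parent_id system property.
    @SystemProperty(SystemPropertyName.PARENT_ID)
    FolderId getParentId();

    void setParentId(FolderId parentId);

    String getTitle();

    void setTitle(String title);
}
```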

Metadata Type

Metadata types are used e.g. to connect external tables. They do not contain any specific system fields and no typed ID as a primary key. The database table can be created by arveo, or an existing table can be used.

Use the @View annotation to mark a metadata type as a view for which the system should not create a table and use the @TableName annotation to define the name of the table of the external system.
Metadata types do not support versioning and retention protection.
You can use the @PrimaryKey annotation to define one or more properties of a metadata type to be the primary key.
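For example, an existing table of an external system could be exposed to arveo with a metadata view like the following sketch (the table and column names are invented for illustration):

```java
// Hypothetical metadata view on an external table. @View prevents table
// creation, @TableName points at the existing table, and @PrimaryKey
// marks the key column, as described above.
@View
@Type(ObjectType.META)
@TableName("legacy_customers")
public interface LegacyCustomerView {

    @PrimaryKey
    long getCustomerNumber();

    String getName();
}
```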

Inheritance

Simple direct inheritance

The following chapter describes the inheritance scheme used in arveo. The object to be inherited from and its initial state are shown in the following table.

Table 23. The object to be inherited

| Company      | Create | Initial state |
|--------------|--------|---------------|
| ID (Company) | -      | 888           |
| Name         | CTuX   | CTuX          |
| CountryCode  | DE     | DE            |
| PhoneNumber  | -      | [NULL]        |

The following table describes direct inheritance (i.e. with no intermediate objects). Here, Invoice is an object that inherits from Company. The table shows its initial state and its state after four different updates.

Table 24. Inheritance Scheme

| Invoice            | Create | Initial state | Update 1            | After Update 1 | Update 2            | After Update 2      | Update 3 | After Update 3 | Update 4   | After Update 4 |
|--------------------|--------|---------------|---------------------|----------------|---------------------|---------------------|----------|----------------|------------|----------------|
| ID (Invoice)       | -      | 931           | -                   | 931            | -                   | 931                 | -        | 931            | -          | 931            |
| InvoiceNumber      | EIT-53 | EIT-53        | -                   | EIT-53         | -                   | EIT-53              | -        | -              | -          | EIT-53         |
| companyID          | -      | [NULL]        | 888                 | 888            | [NULL]              | [NULL]              | [NULL]   | [NULL]         | -          | [NULL]         |
| companyName        | -      | [NULL]        | SAP                 | CTuX           | Eitco               | Eitco               | -        | [NULL]         | -          | [NULL]         |
| companyCountryCode | -      | [NULL]        | -                   | DE             | -                   | [NULL]              | -        | [NULL]         | -          | [NULL]         |
| companyPhone       | -      | [NULL]        | +49 (30) 408191-425 | [NULL]         | +49 (30) 408191-425 | +49 (30) 408191-425 | -        | [NULL]         | +41 123456 | +41 123456     |

Error: no change!

Not possible: faulty update parameters!

Note the following principles:

After Update 2: all inherited fields become NULL when the inheritance key is set to NULL, unless values are explicitly specified. After Update 3: all inherited fields become NULL when the inheritance key is set to NULL, unless values are explicitly specified, even if the inheritance key was already NULL before.
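The inheritance key and the inherited fields of the Invoice object above could be declared roughly as follows. This is a sketch: it combines the @ForeignKey and @InheritedProperty annotations that appear later in this chapter, and all names are illustrative:

```java
// Hypothetical type definition for the Invoice object of the tables above.
@Type(ObjectType.CONTAINER)
public interface Invoice {

    String getInvoiceNumber();

    void setInvoiceNumber(String invoiceNumber);

    // The inheritance key: a foreign key pointing to the Company entity.
    // Setting it to NULL resets all inherited fields to NULL.
    @ForeignKey(name = "fk_invoice_company", target = Company.class, targetProperty = "id")
    Long getCompanyId();

    void setCompanyId(Long companyId);

    // Inherited from the referenced Company entity. Note that
    // foreignKeyPropertyName uses the snake_case column name.
    @InheritedProperty(foreignKeyPropertyName = "company_id", sourcePropertyName = "name")
    String getCompanyName();
}
```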

Multilevel Inheritance

This form of inheritance has an object to be inherited from, just like direct inheritance. One object inherits from it, and then another object inherits from that second object. The initial object is still the same; its initial state is described in the table above.

In the following table, the second object Creditor, which inherits from the first object, is described.

Table 25. The object to be inherited and inheriting

| Creditor           | Create       | Initial state |
|--------------------|--------------|---------------|
| ID (Creditor)      | -            | 999           |
| CreditorNumber     | 471147114711 | 471147114711  |
| CompanyID          | 888          | 888           |
| companyName        | -            | CTuX          |
| companyCountryCode | -            | DE            |
| companyPhone       | -            | [NULL]        |

In the table above, the object Creditor inherited the following properties through the companyID: companyName, companyCountryCode, companyPhone.

The results of multilevel inheritance through an intermediate object are shown in the table below:

Table 26. Inheritance Scheme in Multilevel inheritance

| Invoice            | Create | Initial state | Update 1            | After Update 1 | Update 2            | After Update 2      |
|--------------------|--------|---------------|---------------------|----------------|---------------------|---------------------|
| ID (Invoice)       | -      | 931           | -                   | 931            | -                   | 931                 |
| InvoiceNumber      | EIT-11 | EIT-11        | -                   | EIT-11         | -                   | EIT-11              |
| creditorID         | -      | [NULL]        | 999                 | 999            | [NULL]              | [NULL]              |
| companyName        | -      | [NULL]        | SAP                 | CTuX           | Eitco               | EITCO               |
| companyCountryCode | -      | [NULL]        | -                   | DE             | -                   | [NULL]              |
| companyPhone       | -      | [NULL]        | +49 (30) 408191-425 | [NULL]         | +49 (30) 408191-425 | +49 (30) 408191-425 |

Indirect inheritance

The third form of inheritance is indirect inheritance. It is much like the second form, except that the inheriting object carries the IDs of both objects it inherits from. In the example above, the object Invoice holds both the creditorID and the companyID.

In the following table, the object Creditor is described.

Table 27. An object to be inherited

| Creditor       | Create       | Initial state |
|----------------|--------------|---------------|
| ID (Creditor)  | -            | 999           |
| CreditorNumber | 471147114711 | 471147114711  |
| CompanyID      | 888          | 888           |

The table below describes the mechanism of indirect inheritance.

Table 28. Inheritance Scheme in Indirect inheritance

| Invoice            | Create | Initial state | Update 1            | After Update 1 | Update 2            | After Update 2 | Update 2a           | After Update 2a     |
|--------------------|--------|---------------|---------------------|----------------|---------------------|----------------|---------------------|---------------------|
| ID (Invoice)       | -      | 931           | -                   | 931            | -                   | 931            | -                   | 931                 |
| InvoiceNumber      | EIT-11 | EIT-11        | -                   | EIT-11         | -                   | EIT-11         | -                   | EIT-11              |
| creditorID         | -      | [NULL]        | 999                 | 999            | [NULL]              | [NULL]         | [NULL]              | [NULL]              |
| companyID          | -      | [NULL]        | -                   | 888            | -                   | 888            | [NULL]              | [NULL]              |
| companyName        | -      | [NULL]        | SAP                 | CTuX           | Eitco               | CTuX           | Eitco               | EITCO               |
| companyCountryCode | -      | [NULL]        | -                   | DE             | -                   | DE             | -                   | DE                  |
| companyPhone       | -      | [NULL]        | +49 (30) 408191-425 | [NULL]         | +49 (30) 408191-425 | [NULL]         | +49 (30) 408191-425 | +49 (30) 408191-425 |

This form of inheritance is currently not needed and therefore not supported by ECR.

Inheritance of ACLs

The acl_id system field is property-like, so a type can define it as inherited. This permits scenarios with one main entity providing the access definition and several entities linked to it. If the ACL of the main entity changes, the ACLs of the linked entities change as well:

Example for the main entity
@Type(ObjectType.CONTAINER)
public interface MainEntity {

    @Mandatory (2)
    @SystemProperty(SystemPropertyName.ACL_ID) (1)
    AccessControlListId getMainAcl(); (7)

    void setMainAcl(AccessControlListId id);

    // ...
    // your custom attribute definitions
    // ...
}
Example for linked entities
@Type(ObjectType.DOCUMENT)
public interface ChildEntity {

    @Mandatory (4)
    @ForeignKey(target = MainEntity.class, targetProperty = "id") (3)
    ContainerId getCurrentMainEntity(); (6)

    void setCurrentMainEntity(ContainerId mainEntity);


    @SystemProperty(SystemPropertyName.ACL_ID)
    @InheritedProperty(foreignKeyPropertyName = "current_main_entity", sourcePropertyName = "acl_id") (5)
    AccessControlListId getAcl();

    // ...
    // your custom attribute definitions
    // ...
}
1 The main entity defines a property that accesses the ACL system property.
2 This property is defined mandatory - thus the main entity will always have an ACL.
3 The child entity defines a foreign key to the main entity.
4 By specifying the foreign key property as mandatory, every child entity will be linked to a main entity
5 Now we can specify an ACL property being inherited.
6 Note that foreignKeyPropertyName is written in snake_case while the actual property getter is written in camelCase.
7 Note further that while the referenced property is actually defined by the getter getMainAcl, sourcePropertyName is set to the name of the system field "acl_id" to derive the property.

Let’s see this behaviour in action. Assume that we have a TypeDefinitionServiceClient named typeDefinitionServiceClient and the IDs of two ACLs (firstAclId and differentAclId). First, we create service clients for the two types defined above:

        TypedContainerServiceClient<MainEntity> mainEntityServiceClient =
            typeDefinitionServiceClient.getContainerServiceClient().byClass(MainEntity.class);
        TypedDocumentServiceClient<ChildEntity> childEntityServiceClient =
            typeDefinitionServiceClient.getDocumentServiceClient().byClass(ChildEntity.class);

With these service clients we can now create several entity instances of MainEntity and ChildEntity:

        MainEntity mainEntity = mainEntityServiceClient.createTypeInstance();
        mainEntity.setMainAcl(firstAclId);
        TypedContainerClient<MainEntity> mainEntityClient = mainEntityServiceClient.createEntity(mainEntity);

        ChildEntity childEntity1 = childEntityServiceClient.createTypeInstance();
        childEntity1.setCurrentMainEntity(mainEntityClient.getIdentifier());
        TypedDocumentClient<ChildEntity> childEntityClient1 = childEntityServiceClient.createEntity(childEntity1);

        // ...

        ChildEntity childEntityN = childEntityServiceClient.createTypeInstance();
        childEntityN.setCurrentMainEntity(mainEntityClient.getIdentifier());
        TypedDocumentClient<ChildEntity> childEntityClientN = childEntityServiceClient.createEntity(childEntityN);

The instances of ChildEntity will automatically have the same ACL as mainEntity:

        Assert.assertEquals(childEntityClient1.getEntity().getAcl(), firstAclId);
        // ...
        Assert.assertEquals(childEntityClientN.getEntity().getAcl(), firstAclId);

If the ACL of the parent is updated…​

        mainEntity.setMainAcl(differentAclId);
        mainEntityClient.updateAttributes(mainEntity);

…​then the ACLs of the instances of ChildEntity change as well:

        childEntityClient1 = childEntityClient1.reload();
        // ...
        childEntityClientN = childEntityClientN.reload();

        Assert.assertEquals(childEntityClient1.getEntity().getAcl(), differentAclId);
        // ...
        Assert.assertEquals(childEntityClientN.getEntity().getAcl(), differentAclId);

Default ACLs and Inheritance

In many cases it is desirable to specify a default ACL for a given type. But the naive approach to defining one proves cumbersome:

@Type(ObjectType.CONTAINER)
public interface ContainerWithDefaultAcl extends WithData {

    @Mandatory
    @SystemProperty(SystemPropertyName.ACL_ID)
    long getAclId();

    void setAclId(long aclId);

    @DefaultValue("acl_id") (1)
    default long defaultAcl() {

        return ?? (2)
    }
}
1 Of course one can define the ACL system property with a default value.
2 However, when specifying the default value one faces a problem. The id of an ACL is set by the access-control-service automatically and will vary from deployment to deployment, even between test and production environments.

However, the concepts presented so far can be used for a better solution. The main idea is to specify the ACL by its name instead of its id. For that we will need access to a table containing ACL names and their respective ids. Here external views can be used. We have already seen an external view exposing the ACL table to arveo:

Definition of a view to the ACL table
@View
@Type(ObjectType.META)
@TableName("usrv_acl")
public interface AclView {

    @Unique
    String getName();

    @PrimaryKey
    long getId();
}

Since this exposes ACLs as arveo type instances, ACLs can be used for inheritance. And since ACL names are unique, they can be used as a foreign key, in particular as one defining inheritance. That way, the actual ACL id can be inherited via a key that is an ACL name, for which we can easily define a default value that is stable across every environment:

A sophisticated example of an ACL default value
@Type(ObjectType.CONTAINER)
public interface ContainerWithDefaultAcl extends WithData {

    String DEFAULT_ACL_NAME = "default-container-acl"; (6)

    @Id
    ContainerId getId();

    @Optional (7)
    @ForeignKey(target = AclView.class, targetProperty = "name") (2)
    String getAcl(); (1)

    void setAcl(String acl);

    @DefaultValue("acl")
    default String defaultAcl() { (3)

        return DEFAULT_ACL_NAME;
    }

    @Mandatory (7)
    @InheritedProperty(foreignKeyPropertyName = "acl", sourcePropertyName = "id") (5)
    @SystemProperty(SystemPropertyName.ACL_ID)
    long getAclId(); (4)

    void setAclId(long aclId);
}
1 In our type we define a property ACL, that holds the name of the ACL.
2 This property is a foreign key that targets the name field of the table usrv_acl.
3 For this property we can easily specify a default value.
4 Now we specify the ACL property.
5 It is simply defined to be inherited by the foreign key to the ACL table.
6 It is good practice to store constant default values in constants.
7 Marking the ACL id as @Mandatory enforces that every instance of the entity has an ACL. However, the ACL does not need to be an inherited one, since the ACL name is marked @Optional. So the more cumbersome way, setting the ACL by its id, is still possible. Marking the ACL name property as @Mandatory would forbid this.

Retention

Annotation: @RetentionProtected

An object may be annotated as @RetentionProtected. This enables all further retention annotations listed below. Every retention-enabled object extends the data model by

  • Datetime Retention_Date: contains the fixed retention period as ZonedDateTime format

  • Boolean LitigationHold: stores the litigation hold property

The convenience class RetentionInformation contains both values and can be used to read the retention information with one call.

Annotation: @DefaultSystemPropertyValue(RETENTION_DATE)

It is possible to define a default value for the RETENTION_DATE system column (Default Values).

If a retention date is not explicitly set, a default value for the retention period is calculated using the default value function implemented by the document type.

@DefaultSystemPropertyValue(SystemPropertyName.RETENTION_DATE)
default ZonedDateTime defaultDatum() {
    return ZonedDateTime.now().plusYears(10);
}
The @RetentionProtected annotation is required if you want to set a default for retention_date.
If you have defined foreign keys, you can inherit the retention date from container or folder objects. This is very helpful if you have records in your data model (Defaults and Inheritance).
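Inheriting the retention date through a foreign key might be sketched as follows. The combination of @SystemProperty and @InheritedProperty mirrors the ACL inheritance example earlier in this chapter; all names and the exact interplay of the annotations are assumptions:

```java
import java.time.ZonedDateTime;

// Hypothetical retention-protected document that inherits its retention
// date from a record container through a foreign key.
@Type(ObjectType.DOCUMENT)
@RetentionProtected
public interface RecordDocument {

    // Foreign key to the record container the document belongs to.
    @ForeignKey(name = "fk_document_record", target = RecordContainer.class, targetProperty = "id")
    ContainerId getRecordId();

    void setRecordId(ContainerId recordId);

    // Retention date derived from the referenced record container.
    @SystemProperty(SystemPropertyName.RETENTION_DATE)
    @InheritedProperty(foreignKeyPropertyName = "record_id", sourcePropertyName = "retention_date")
    ZonedDateTime getRetentionDate();
}
```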

Examples

Document Type: 10 year retention period

The following example shows how to set the default retention to creation date + 10 years. It also shows how to set a default value for the property warrantyEnd based on the ReceiptDate + 3 years.

It is still possible to set the retention date and warrantyEnd when you upload the document and overwrite the default values.
Example: A document type with default values for retention and warranty end
/*
 * Copyright (c) 2020 EITCO GmbH
 * All rights reserved.
 *
 * Created on 02.10.2020
 *
 */
package de.eitco.ecr.system.test.types.defaultvalues;

import de.eitco.ecr.common.RetentionInformation;
import de.eitco.ecr.type.definition.annotations.*;
import de.eitco.ecr.type.definition.annotations.constraint.Mandatory;
import de.eitco.ecr.type.definition.annotations.constraint.SecondaryKey;
import de.eitco.ecr.type.definition.annotations.defaults.DefaultSystemPropertyValue;
import de.eitco.ecr.type.definition.annotations.defaults.DefaultValue;
import de.eitco.ecr.type.definition.annotations.system.Id;
import de.eitco.ecr.type.definition.annotations.system.RetentionProtected;
import de.eitco.ecr.type.definition.annotations.system.SystemProperty;
import de.eitco.ecr.type.definition.annotations.system.SystemPropertyName;
import org.springframework.http.MediaType;
import org.springframework.util.MimeType;

import java.time.ZoneId;
import java.time.ZonedDateTime;

@Type(ObjectType.DOCUMENT)
@RetentionProtected
@ContentElement(defaultDefinition = true, separateField = true)
@OverwriteAllowed
public interface DocumentWithDefaultRetention {

    @Id
    Object identifier();

    @SystemProperty(value = SystemPropertyName.RETENTION_INFO)
    RetentionInformation getRetentionInformation();

    @SystemProperty(value = SystemPropertyName.RETENTION_DATE)
    ZonedDateTime getRetentionDate();

    void setRetentionDate(ZonedDateTime retentionDate);

    @SystemProperty(value = SystemPropertyName.LITIGATION_HOLD)
    Boolean getLitigationHold();

    @SecondaryKey
    String getName();

    void setName(String name);

    @Mandatory
    ZonedDateTime getReceiptDate();

    void setReceiptDate(ZonedDateTime receiptDate);

    @Mandatory
    ZonedDateTime getWarrantyEnd();

    void setWarrantyEnd(ZonedDateTime warrantyEnd);

    @Mandatory
    String getMimeType();

    void setMimeType(String value);


    // helper for snake case db column names based on camel case getter/setter names
    // attention: you MUST use snake_case db column names in default value annotations! if the name is wrong you will get a model exception during start up
    String DB_COL_WARRANTYEND = "warranty_end"; (1)
    String DB_COL_MIMETYPE = "mime_type";
    String DB_COL_RECEIPTDATE = "receipt_date";
    String DB_COL_NAME = "name";
    String DB_COL_RETENTIONDATE = "retention_date";

    ZoneId ZoneIdEuropeBerlin = ZoneId.of("Europe/Berlin");

    // set default values
    @DefaultValue(DB_COL_WARRANTYEND)
    default ZonedDateTime defaultWarrantyEnd() {

        return getReceiptDate().withZoneSameInstant(ZoneIdEuropeBerlin).plusYears(3);
    }

    @DefaultSystemPropertyValue(SystemPropertyName.RETENTION_DATE)
    default ZonedDateTime defaultRetentionDate() {

        return ZonedDateTime.now(ZoneIdEuropeBerlin).plusYears(10);
    }

    @DefaultValue(DB_COL_MIMETYPE)
    default String defaultMimeType() {
        return MediaType.APPLICATION_OCTET_STREAM_VALUE;
    }

}
(1) The annotation @DefaultValue() only accepts the database column name as a static string parameter. As the document type properties are camelCase and the database column names are snake_case, you must convert your property names, e.g. MyCamelCaseProperty = my_camel_case_property. In the example above, constants for the column names are defined in the type.
The retention annotations also work for the entity types container, folder and relation.

Advanced database schema changes

Simple changes to the database schema, like adding a new attribute, are performed automatically by the system in maintenance mode. In some cases it might be required to perform more complex schema changes which cannot be handled by the system automatically. For example, changing the data type of an attribute is not supported because it usually requires project-specific migration steps. Advanced changes like this can be performed by custom liquibase scripts.

To perform custom database schema migrations, arveo offers several ways to define custom liquibase migration scripts:

  • A global script that will be executed before the first type definition will be created or updated. This script can be configured using the property ecr.server.liquibase.preInitializationChangeLog.

  • A global script that will be executed after the last type definition was created or updated. This script can be configured using the property ecr.server.liquibase.customChangeLog.

  • A script for a specific type definition that will be executed before the type definition is created or updated. This script can be configured using the annotation @PreSchemaInitialization on the class representing the type definition.

  • A script for a specific type definition that will be executed after the type definition was created or updated. This script can be configured using the annotation @PostSchemaInitialization on the class representing the type definition.

The values of the configuration properties for the global scripts and the annotations must be valid URIs pointing to a Liquibase changelog script. The URIs can point to a filesystem resource (using file:/) or a classpath resource (using classpath:). Each script will be executed in every configured tenant.
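For illustration, the two global scripts could be configured like this in the application properties (the file names and paths here are placeholders, not defaults):

```properties
# executed before the first type definition is created or updated
ecr.server.liquibase.preInitializationChangeLog=classpath:liquibase/pre-init-changelog.xml
# executed after the last type definition was created or updated
ecr.server.liquibase.customChangeLog=file:/opt/arveo/config/custom-changelog.xml
```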

Schema initialization steps

For a better understanding of how the schema initialization works, the following list shows the steps performed by the system at startup:

  • for each tenant:

    1. Create or update the system tables

    2. Execute custom pre initialization changelog if configured

    3. For each registered type definition class:

      1. Execute custom class-specific pre schema initialization script if configured

      2. Create or update the type definition table(s)

      3. Execute custom class-specific post schema initialization script if configured

    4. Execute custom liquibase changelog if configured

Note that the actions performed by the automatic schema initialization in step 3.b. can be influenced by the changes that were already performed by the custom scripts executed before. For example, the system will not try to create a new attribute if the custom script has already performed the required schema changes.

Example

The following example shows a type definition class that defines a custom script that will be executed before the type definition is updated. The script expects that the type definition table already exists on the database and is used to change the data type of the attribute postal_code from Long to String. Note that for the sake of simplicity, the script does not perform an actual data migration but simply drops and re-creates the database column for the attribute.

Example for a type definition with custom pre schema initialization script
@Type(ObjectType.CONTAINER)
@Index(value = "my_container_name_index", onVersionTable = true)
@PreSchemaInitialization("classpath:liquibase/my-container-changelog.xml")
public interface MyContainer {
Example for a custom liquibase script
<?xml version="1.1" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd"
        logicalFilePath="my-container-changelog.xml">

    <changeSet id="update-my-container-1" author="root">
        <dropColumn tableName="my_container" columnName="postal_code"/>
        <addColumn tableName="my_container">
            <column name="postal_code" type="text"/>
        </addColumn>
        <dropColumn tableName="my_container_ver" columnName="postal_code"/>
        <addColumn tableName="my_container_ver">
            <column name="postal_code" type="text"/>
        </addColumn>
    </changeSet>

</databaseChangeLog>

Note that the script in the above example first updates the content of the type definition system tables to reflect the changed data type of the attribute postal_code of the type my_container. Doing this causes the automatic migration performed afterwards to ignore the change. Other changes in the type class would still be performed automatically, if possible. The script then simply drops and re-creates the column for the attribute. In a real-life scenario, this is the place where the actual data migration would happen.

Changes not checked during startup

The following changes in the type system will not be checked for:

  • Inheritance: Changing the source key or the source property of an inherited property is allowed. The system will accept it (and not even check it). This can have subtle consequences. The data of an entity created before such a change will remain as before. However, the next time the entity is updated, the inheritance will be computed anew and the data will change according to the new inheritance rule.

  • Formatted counter sequence names: Changing the name of the sequence of a formatted counter will take effect. This can have an impact on your application: it will result in the creation of a new sequence and effectively reset the counter's value. This might be the desired effect, but it could also be the result of an oversight in the type changes. To protect against accidental changes, it is considered good practice to mark formatted counter fields with @Unique.

Document Service

The Document Service is responsible for handling various repository entities such as documents and folders. The following entity types are supported: document, folder, container, relation and metadata.

The service saves the binary data belonging to the documents and delivers it again. Various plugins are available for connecting storage devices and services. A plugin is assigned to a profile and configured. When saving data, the client has to specify the profile to be used and thereby decides where the data will be saved.

Upload Data

Content, annotations (see below) and metadata can be uploaded as a coherent document. 0-n content elements of different content types are possible. Each content element is named. As a result, you get a globally unique ID (DocumentID), which can be used to reference content, annotations and / or just metadata of the latest version of the document. It is possible to clone content elements from one document to another, creating a copy of the content on the storage. For that, a ContentReference can be supplied when the document is created.

Example: Upload a document with new content
TypedDocumentServiceClient<SingleContentDocument> serviceClient =
    typeDefinitionServiceClient.getDocumentServiceClient().byClass(SingleContentDocument.class); (1)

SingleContentDocument document = serviceClient.createTypeInstance();
document.setName("some name");

TypedDocumentClient<SingleContentDocument> client = serviceClient.create(
    new TypedDocumentInput<>(Map.of(ContentElementNames.DEFAULT_NAME, (2)
        new ContentUpload(inputStream)), document)); (3)
1 The typeDefinitionServiceClient is an instance of TypeDefinitionServiceClient, that can be injected.
2 The type definition SingleContentDocument uses only the default content definition, hence the default name is used.
3 The actual content is passed as an InputStream.
Example: Upload a document with cloned content
TypedDocumentServiceClient<TypedTargetDocument> serviceClient =
    typeDefinitionServiceClient.getDocumentServiceClient().byClass(TypedTargetDocument.class); (1)

TypedTargetDocument document = serviceClient.createTypeInstance();

DocumentContentReference reference = new DocumentContentReference(documentId, ContentElementNames.DEFAULT_NAME); (2)

TypedDocumentClient<TypedTargetDocument> client =
    serviceClient.create(new TypedDocumentInput<>(document, Map.of("content", reference)));
1 The typeDefinitionServiceClient is an instance of TypeDefinitionServiceClient, that can be injected.
2 Here the documentId is the ID of an already existing document that uses the default content definition.

Validating uploaded content

There are several different ways to validate the content of an uploaded document. The method to use depends on the requirements of the client application. Some applications might already have computed a hash of the content while others might offload this to the server.

Validating content on the client side

When content is uploaded to a type definition that supports content metadata, the server computes an SHA-256 hash for the received data and returns it in the result of the upload request. The client can use this hash value to compare the data received by the server with the original data. The following example shows how to compare the hash values:

Example: Checking hash values on the client side
ContentTest entity = client.create(input).getEntity(); (1)
Hash hash = entity.getContent().get("content").getHash(); (2)

Hash expectedHash = Hash.sha256Hash(inputStream, 1000000, tempFile); (3)
Assert.assertEquals(expectedHash, hash);
1 The document is uploaded using a type definition service client
2 Get the hash returned from the server. getContent is a getter for the system property SystemPropertyName.CONTENT.
3 Use de.eitco.ecr.common.Hash to compute the expected hash

The TypedDocumentServiceClient offers an additional method to validate uploaded content. The createAndValidate method automatically computes a hash of the uploaded data and compares it with the hash value returned from the server. If the two hashes do not match, a HashValidationException is thrown and the created document will be purged.

Example: Using the createAndValidate method
TypedDocumentServiceClient<ContentTest> client = typeDefinitionServiceClient
    .getDocumentServiceClient().byClass(ContentTest.class);

ContentUpload contentUpload = new ContentUpload(data);

Map<String, ContentUpload> content = Map.of("content", contentUpload);

ContentTest instance = client.createTypeInstance();
TypedDocumentInput<ContentTest> input = new TypedDocumentInput<>(content, instance);

client.createAndValidate(input);
Validating content on the server side

It is also possible to pass a hex representation of an SHA-256 hash code of the uploaded content to the server. If such a hash is present, the server will compare the computed hash value with the one specified by the client. If the values do not match, the upload fails and the uploaded file will not be stored.

Example: Checking hash values on the server side
Hash hash = Hash.sha256Hash(inputStream, 1000000, tempFile); (1)

ContentUpload contentUpload = new ContentUpload(
    "lorem_ipsum.txt", (2)
    null, (3)
    null, (4)
    data,
    hash
);

Map<String, ContentUpload> content = Map.of("content", contentUpload);

ContentTest document = client.createTypeInstance();
TypedDocumentInput<ContentTest> input = new TypedDocumentInput<>(content, document);

client.create(input);
1 Use de.eitco.ecr.common.Hash to compute the hash
2 The filename
3 null for the length, will be computed by the server
4 null for the content type, will be computed by the server
Validating the content of an existing document

The TypedDocumentServiceClient provides a method called hashMatches that can be used to check if the content of an existing document is valid. The client has to provide the expected hash, the document’s ID and the name of the content element to check. An additional parameter called loadContent defines if the server should use the hash value stored in the database or if it should load the content from the storage and compute a new hash value to compare. It is possible to check the content of a specific version of a document, too.

Example: Checking hash values of an existing document
Hash hash = Hash.sha256Hash(inputStream, 1000000, tempFile);
boolean hashMatches = documentServiceClient.hashMatches(documentId, "content", hash, false);

Download Data

Content, annotations and metadata of a document can be downloaded via API. It is possible to load the entire document as a multipart or a structure of the document that includes all metadata, annotations and a list of content elements with their IDs, types and identifiers. Each content element can then be loaded using the document ID / content ID or the document ID / content type. Access to individual content elements without a document ID is not possible for reasons of access control. Access control based on the document ID is ensured with every access.

Update Metadata Without Version

The meta information of a document can be changed. The changes can be persisted in the database without creating a version. This makes it possible to maintain frequently changing information on the document quickly without the overhead of a version. However, in the event of an audit, the changes are not traceable.

Delete An Object

Documents contain one or more content elements which are not stored in the database but in the storage system. When a document is deleted using one of the delete-calls, the content elements will remain on the storage. To delete both the database entries and all content elements (including those referenced from older versions), a client can use the purge methods provided by the document clients.

A type definition can use the optional recycle bin feature. If it is enabled, entities in the type definition can be moved to and restored from the recycle bin. The Delete-API allows you to execute the methods:

  • moveToRecycleBin(): moves an object to the recycle bin. The DELETE property of the latest version is set to 1; content and older versions are not affected.

  • delete(): all versions of the object are deleted from the database.

  • purge(): all versions of the object are deleted from the database and the content objects or files are erased.

  • restoreFromRecycleBin(): restores an object from the recycle bin; the DELETE property is set to 0.

If an object has relations to other objects or is referenced by other objects, the delete or purge method will fail with a foreign key exception. The Relation API provides methods to delete the relations (Remove Relations).

Filter Recycle Bin

Entities in the recycle bin will be filtered from normal queries by default, but a client can compose search expressions that override this behavior. To do that it is sufficient to include a reference to the deleted system field in the expression. The following example shows a part of a query that will show only deleted entities:

Excerpt of an example query
....and().systemField(SystemFieldList.GeneralSystemField.Deleted.INSTANCE).equalTo().value(true)

Note that the deleted system field can contain null values, which have the same meaning as false. When a client uses one of the delete calls to delete one or more entities, all database entries for those entities will be deleted (including all versions).
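The null-equals-false semantics of the deleted system field can be mirrored on the client side, for example (plain Java, not arveo API):

```java
// Helper mirroring the semantics described above: a null value of the
// deleted system field has the same meaning as false.
public class RecycleBinFlags {

    public static boolean isDeleted(Boolean deletedFlag) {
        // null and false both mean "not deleted"
        return Boolean.TRUE.equals(deletedFlag);
    }
}
```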

There is no option to restore entities once they have been deleted.

If there are relations between entities that are to be deleted, the relations are not deleted. Instead, a ForeignKeyException is thrown - and has to be handled by the caller.

Removing all relations of an entity

To delete all relations that originate from a certain entity, the method removeAllRelations() has to be used. The method returns the deleted relations:

List<Relation> removed = sourceContainerClient.removeAllRelations();

You can also delete all relations that point to a specific entity. For this, there is the method removeAllIncomingRelations(). This also returns the deleted relations:

List<Relation> removed = targetContainerClient.removeAllIncomingRelations();

Once all relations have been removed, the entity can also be deleted.

Locking

If your applications want to update objects from different processes at the same time, you must decide whether to use no locking or optimistic locking. No locking means that the latest update wins and overwrites the concurrent update. Depending on the database configuration, it might happen that one update becomes a deadlock victim and an exception is thrown. If optimistic locking is enabled for the document type, the API ensures that updates do not accidentally overwrite changes made by other clients. The feature is disabled by default and can be enabled by annotating a type class with @OptimisticLocking.

For example, two processes A and B load the same object including content and versions at the same time and get the same version of the document. Both processes then process the document, change some metadata and add additional content. A is faster than B. With no locking, B overwrites the changes made by A. With optimistic locking, B cannot save its changes and receives a locking exception. Process B has to load the changes made by A and retry the operation.
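The optimistic locking behavior described above can be sketched independently of the arveo API. This is a minimal simulation using a version counter, not the actual arveo implementation:

```java
// Minimal sketch of optimistic locking: an update only succeeds if the caller
// still holds the version it last read. In arveo, a failed update surfaces as
// a locking exception instead of a boolean result.
import java.util.concurrent.atomic.AtomicLong;

public class OptimisticEntity {

    private final AtomicLong version = new AtomicLong(1);
    private volatile String name = "initial";

    // Returns the version the caller read; must be passed back on update.
    public long readVersion() {
        return version.get();
    }

    // Fails (returns false) if another writer updated the entity in between.
    public boolean update(long expectedVersion, String newName) {
        if (!version.compareAndSet(expectedVersion, expectedVersion + 1)) {
            return false; // stale version: caller must re-read and retry
        }
        name = newName;
        return true;
    }

    public String getName() {
        return name;
    }
}
```

In the scenario above, A and B both read version 1; A's update succeeds and bumps the version to 2, so B's update with the stale version 1 is rejected.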

Checkin / Checkout

You can check out a document via the API, blocking it for other users. You can optionally specify a timeout after which the lock is automatically reset. Otherwise, the document is locked indefinitely or until the global lock timeout takes effect.

You can check in a checked-out document via the API and save the changes (stream) as a new version in the repository or overwrite the existing version (traceable via the audit log). You can optionally specify whether the lock (checkout) will be retained. The same optional parameters as when creating a new version apply.

You can discard a checked-out document via the API and thus reset the checkout lock.

You can use the API to query, for each document, whether, since when and by whom it has been checked out. This information is also available for lists.

As a system administrator, you can set a global timeout after which the lock is automatically released. An INFO message is written to the central log that contains the name of the blocking user and the document ID.

Versioning

The goal of using the concept of versioning is to create and work with version-safe archives and track the history of each change in the system.

Versioning basics

All entity types in arveo may be versioned; versioning itself is optional. The attributes of an entity type specify in their definition whether they are versioned. If an entity type has at least one versioned attribute, a version table is created. The version number of an existing entity is created automatically and can be retrieved via the system property version_number.

The version table lists the version changes to the metadata as well as the changes to one or more content elements. Optionally, you can specify a Unicode version comment. Each version gets a version ID which is unique for this bundle of version tables. The version ID allows a developer to retrieve content and metadata of exactly this version of the entity. Using the API, a developer can query all versions including their metadata and content elements for each entity ID or version ID. It is ensured that the existing content of a version is not changed or deleted by a new version, but there is an exception to this rule which does allow overwriting a version change.

There is a function that allows you to make a change without having to note it in the version table. And there is a way to forbid this for a certain entity type.

Implementation of versioning

The concept of versioning is implemented using the annotation @Versioned, which is defined by the interface Versioned. This annotation defines if an attribute of a type is versioned or not (when placed on a getter) or if all attributes of a type are versioned or not (when placed on a type). When the annotation is present on a type and on a getter in the type, the annotation on the getter wins.

The following example of an object of type Container contains an attribute "name", which is a versioned attribute. The other attribute "counter" in this example is marked as not versioned.

Example:

@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface TypedSourceContainer {

    @Name("counter")
    @Versioned(false)
    int getCounter();

    @Name("name")
    @Versioned
    String getName();
}

Data model for versioning

The actual search table only contains the current status of metadata and system fields. The version table, however, lists all entities and their versions including the metadata. Only versioned attributes are included in the version table. A current internal version counter (1, 2, …, n) is maintained in the system column version_number.

During versioning the service counts up the internal version counter by incrementing the value of the system column version_number by 1. The value is stored in the version table.

Changes to non-versioned fields cannot be tracked because they are not written to the version table. To prevent accidental overwriting of such fields, optimistic locking can be activated. In this case, a certain property is defined to let the system know that a certain version of an entity is outdated.

Optimistic locking

Activating optimistic locking prevents overwriting of versioned fields. When an entity is edited simultaneously and one user tries to overwrite changes already saved by another user, an error is thrown. Overwriting is thus not possible. Hence, by activating optimistic locking on an entity type definition (using the annotation @OptimisticLocking), you prevent data corruption.

Optimistic locking is used only for single updates, not for batch updates.

Structure of the version system table

The version system table consists of the following columns (this is not a complete excerpt):

Table 29. Structure of the system table

column               | db data type | java data type | nullable?
---------------------|--------------|----------------|----------
version_id           | bigserial    | long           | no
entity_id            | int8         | long           | yes
version_acl_id       | int8         | long           | yes
modification_date    | timestamp    | ZonedDateTime  | no
modification_user_id | int8         | long           | no
version_comment      | text         | String         | yes
version_number       | int4         | int            | no

In this table, version_id is the primary key. The foreign key entity_id references the corresponding entity table.

Version ID

The version ID has the following structure:

[12bit Tenant id][14bit Type Definition id][38bit Version id]

Here the tenant may be, for instance, a database schema or a customer. It is followed by the type definition, for instance Container. The third part is the version ID in the database. The composed version ID is unique in the arveo system.
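The 12/14/38-bit layout can be illustrated with plain bit arithmetic. Note that the exact bit order used internally by arveo is an assumption here; this sketch simply packs the three parts from most to least significant bits:

```java
// Sketch of packing and unpacking the composed 64-bit version ID based on the
// 12-bit tenant / 14-bit type definition / 38-bit version layout described
// above. The internal arveo encoding may differ.
public class VersionId {

    public static long compose(long tenantId, long typeDefinitionId, long versionId) {
        return (tenantId << 52) | (typeDefinitionId << 38) | versionId;
    }

    public static long tenantId(long composed) {
        return (composed >>> 52) & 0xFFFL;          // upper 12 bits
    }

    public static long typeDefinitionId(long composed) {
        return (composed >>> 38) & 0x3FFFL;         // next 14 bits
    }

    public static long versionId(long composed) {
        return composed & 0x3FFFFFFFFFL;            // lower 38 bits
    }
}
```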

Search Language

Concept

Any client application that needs a search function can implement the Search Service with a suitable parameter. An example of such an implementation is the class DocumentServiceClient in the Client API. The search queries are always formulated similarly; what differs is the search result, which is always typed. In arveo, the type is Entity.

Technical implementation

Search Service is part of the module 'commons'. It was created to enable more convenient searching. The Search Service works on the basis of EQL (Eitco Query Language). This query language is also used for some other services, like Access Control Service. The main interface is SearchService. It is a functional interface, providing just one method to be implemented: search(). However, this functional interface has a variety of convenience methods, enabling faster and more convenient search, like firstResult(), uniqueResult(), count(), stream() and others.

Listing for the search method definition
Page<EntityType> search(@NotNull SearchRequest searchRequest);

As the only parameter, a search request is accepted, returning a Page of results. A Page has a page definition, a completeCount and a parameterized list of results. The Search Service also provides a method where() with a condition builder, filtering results based on a specific condition.

SearchServiceFactory is a server class, which builds search queries. It has methods for creating an instance of search service for Documents (searchServiceForDocument()), but also for all the other entities, including Metadata. The result of the search is transformed into a Document (or respectively another entity) by the DocumentMapper.

The class SearchResourceImplementation provides an API for searches that are not bound to one and only one type definition.

The interface SearchService is implemented by the class EcrSearchService.

The search client creates different search services, which can be used to search for corresponding entities, for instance a folder search service, a document search service and so on. And there is also a GenericUnionSearchService, that can be used to create any joins on search statements.

Usage

The following example demonstrates the usage of the Search Service to retrieve an object page.

Example of Search Service usage
SearchService<Object> searchService = <a valid instance>;
Page<Object> objectPage = searchService.where()
    .contextReference("field").equalTo().value(7).or()
    .contextReference("other_field").greaterEqual().contextReference("another_field")
    .holds()
    .order().descendingBy("field").from(5).pageSize(7);

It is possible to check the type of object searched for:

Example for type checking
searchService.where() (1)
    .entity().typeId() (2)
    .equalTo()
    .typeId(NamedFile.class) (3)
    .or()
    .entity().typeName() (4)
    .in().expressions(x -> x
        .typeName(NamedTextFile.class) (5)
        .typeName(NamedFolder.class)
    ).and()
    .entity().typeId().notEqual().typeId("named_relation") (6)
1 The variable searchService is an EcrSearchService.
2 The id of the type of given entity is referenced by the method typeId().
3 The type id is checked to be the id of the type defined by the class NamedFile (which is obtained by the method typeId()).
4 Here the type name is referenced instead of the type id.
5 As with the type id, the name of the type defined by the class NamedTextFile is obtained.
6 The type id can also be obtained if only the type name is given.

NoSQL Document Database: Apache Solr 8.6

Apache Solr is a search server and is used as an independent full-text search server for ECR Healthcare. Solr uses the Apache Lucene search library as the core for full-text indexing and search.

Retention Periods

arveo supports a range of retention management features:

  • Full support of document life cycle

  • Supports prolongation and litigation hold for data retention managers

  • Privileged delete before retention expires

  • Privileges for data protection officers (delete) and data protection managers (litigation)

  • Flexible storage container definition (e.g. months, years) for documents with identical retention period (S3 buckets or file system folders)

  • Fast erasure of storage containers by asynchronous delete jobs

Concept

arveo is able to store content with a fixed retention date to ensure that the legal or tax relevant retention period of a document is taken into account and the content is protected from deletion. You can configure retention rules for arveo document types and automatically apply the appropriate retention period to uploaded documents.

If some of your documents could be required in a legal proceeding but the retention period expires before the end of dispute you can set a litigation hold or prolong the retention period to protect the data until the dispute has finished.

Let us describe why the storage container concept is used by arveo. Most storage systems can create objects much faster than they can delete them. Once the retention has expired it is much faster to remove a bucket (cloud storage) or partition/directory (file system). You can setup retention rules to define which documents are stored to the containers. All documents within a certain retention range (e.g. 1 year or 3 months) will be stored to one storage container (S3 bucket or directory). arveo allows you to delete millions of content objects in a very short time by simply removing the entire storage container.
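As an illustration of the container concept, a year-based naming scheme might look like the following sketch. The names and the one-year granularity are assumptions for illustration; in arveo the mapping is configured by retention rules, not by code like this:

```java
// Hypothetical naming scheme for retention-based storage containers: all
// documents whose retention period ends in the same year share one container
// (S3 bucket or directory), so the whole container can be dropped in one step
// once that year's retention has expired.
import java.time.ZonedDateTime;

public class RetentionContainers {

    public static String containerFor(ZonedDateTime retentionDate, boolean litigationHold) {
        if (litigationHold) {
            return "litigation-hold"; // kept separate, never mass-deleted
        }
        return "retention-" + retentionDate.getYear();
    }
}
```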

If a document needs to be deleted, e.g. for data privacy reasons, arveo also provides an API call to erase single objects by their ID. To delete an object before its retention period has expired, the user needs the dataprivacy_admin privilege in addition to the delete right.

Because recent data privacy and protection legislation makes it necessary to erase data even before the expected retention period has expired, arveo does not use hardware retention features, which protect data from erasure on the hardware level. Instead, arveo protects the content by software design. arveo stores the retention information in the database and only allows access to the content and metadata via the arveo REST API. The REST API prevents any delete operation before the retention period has expired. Since only arveo and highly authorized administrators have write permissions for the database and the storage, content cannot be deleted or manipulated before the retention expires.

The operator must take appropriate technical or organizational measures to ensure that the data is stored in the storage in such a way that it cannot be changed within the legally prescribed retention period.
The provider of the arveo services should ensure that only authorized data protection officers and administrators have data write (INSERT, UPDATE, DELETE) permissions for the database and the content repository.

Storage container and document life cycle

Since deleting large amounts of documents is a performance critical task, the arveo repository service provides special support for mass deletion of documents whose retention period has expired.

The basic idea is to define separate storage locations which are exclusively used to store documents with similar retention requirements. Deleting documents with specific retention requirements is then a matter of deleting all contents of a specific storage location in one step. Storage locations containing documents with the same retention period are called storage containers in the rest of this section.

arveo allows you to store data with the same retention in one storage container and is able to create storage containers automatically.

The storage containers are either folders (file system storage) or buckets (S3 object storage). The actual selection of the storage container for a document with specific retention requirements can be configured by rules that select the storage container based on the retention period and litigation hold status of the uploaded document.

When the litigation hold is set, the object is moved to the litigation hold directory or bucket and will not be deleted when the initial retention period expires. When the litigation hold ends, the document is deleted the next time a delete job runs. The number of objects under litigation hold is typically small and does not affect the overall erasure performance.

The following diagram shows the life cycle of a document with a fixed retention period set on upload, a legal dispute and automatic erasure at the end of the document’s life cycle:

*Retention in Buckets*
Figure 7. Retention in buckets

Litigation Hold

arveo provides a system property LITIGATION_HOLD that allows you to prolong the retention until you remove the litigation hold property.

This function requires the DATAPRIVACY_ADMIN privilege.

Prolongation

You can prolong the retention period but not shorten it. You can use the API call to set the initial retention period if the retention is null. When the retention is prolonged, arveo moves the object to the appropriate storage container.

This function requires the DATAPRIVACY_ADMIN privilege.
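The prolongation rule (set when null, extend, never shorten) can be sketched as follows. The method and class names are illustrative, not part of the arveo API:

```java
// Sketch of the prolongation rule described above: a retention date may be
// set when it is null and may be extended, but never shortened.
import java.time.ZonedDateTime;

public class RetentionRules {

    // Returns the new retention date to store, or throws if it would shorten it.
    public static ZonedDateTime prolong(ZonedDateTime current, ZonedDateTime requested) {
        if (current == null) {
            return requested; // initial retention period may be set freely
        }
        if (requested.isBefore(current)) {
            throw new IllegalArgumentException("retention period must not be shortened");
        }
        return requested;
    }
}
```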

Erase a document

The arveo delete API deletes the respective objects just as it does for all other objects without a retention period. See also Deletion of objects and Recovery table.

After the retention period has expired, the function requires the DELETE privilege, but before the retention period has expired, DATAPRIVACY_PRIVILEGED_DELETE privilege is required.
This API should not be used for operations like deleting the objects of a certain year. This should be done using the erasure storage container API.

Erase storage container

If you have used the storage container feature to speed up the deletion of documents at the end of their life cycle, you can delete all documents within a retention period range with one API REST call 'EraseStorageContainer'.

You can erase the storage containers (buckets, folders) either manually through your operating team or with an automated arveo job. To automate the erasure, set up a scheduled job in the arveo integration service: use the erase storage container template job and adapt it to your needs. The erasure job deletes all entities of a document type within the given retention period range where litigation hold is not set, and writes an entry for each erased object to the corresponding audit log table. For a more detailed explanation, see the erasure job template example.

Mass deletion of documents under retention requires the SUPER_USER privilege.
Enable the audit log feature for all document types and dependent document types if you need a report of the erased objects. See Audit Log.
Grant the deletion right for your storage containers to arveo. If arveo cannot delete the containers, your operating team is in charge of this task and you must set the delete rows only option.
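The selection the erasure job performs can be sketched as follows; the record and method names are illustrative only and not part of the arveo API:

```java
import java.time.ZonedDateTime;
import java.util.List;

// Hypothetical sketch of the erasure job's selection logic (names are
// illustrative, not the arveo API): erase every entity of the given document
// type whose retention date lies in the requested range and that is not under
// litigation hold. Each erased object would additionally receive an entry in
// the audit log table.
public class ErasureJobSketch {

    record Doc(String type, ZonedDateTime retentionDate, boolean litigationHold) {}

    static boolean eligibleForErasure(Doc d, String documentType, ZonedDateTime from, ZonedDateTime to) {
        return d.type().equals(documentType)
                && d.retentionDate() != null
                && !d.retentionDate().isBefore(from)
                && !d.retentionDate().isAfter(to)
                && !d.litigationHold();              // held objects survive the job
    }

    static List<Doc> selectForErasure(List<Doc> docs, String documentType, ZonedDateTime from, ZonedDateTime to) {
        return docs.stream()
                .filter(d -> eligibleForErasure(d, documentType, from, to))
                .toList();
    }
}
```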

Privileges & Roles

Privilege | DATAPRIVACY_ADMIN (Data Protection Manager) | DATAPRIVACY_PRIVILEGED_DELETE (Data Protection Officer) | SUPER_USER (Data Protection Administrator)

Prolongation | yes | no | no

Litigation Hold | yes | no | no

Delete before retention | no | yes | no

Mass Delete | no | no | yes

Examples

Create document with retention and set litigation hold
public void createDocumentWithRetention() throws IOException {
final String TEST_IDENTIFIER = "SetLitigationHold test timestamp in ms=";
final String TEST_DATA = "abcde";
final String TEST_DATA_MIMETYPE = MediaType.APPLICATION_OCTET_STREAM_VALUE;
TypedDocumentServiceClient<DocumentWithRetention> serviceClient =
        typeDefinitionServiceClient.getDocumentServiceClient().byClass(DocumentWithRetention.class);
ZonedDateTime now = ZonedDateTime.now(ZoneOffset.UTC);
DocumentWithRetention newDocument = serviceClient.createTypeInstance();
newDocument.setName(TEST_IDENTIFIER + System.currentTimeMillis());
newDocument.setReceiptDate(now);
newDocument.setMimeType(TEST_DATA_MIMETYPE);
newDocument.setRetentionDate(now);
ByteArrayInputStream data = new ByteArrayInputStream(TEST_DATA.getBytes());
Map<String, ContentUpload> content = Map.of(ContentElementNames.DEFAULT_NAME, new ContentUpload(data));
TypedDocumentClient<DocumentWithRetention> newClient = serviceClient.create(new TypedDocumentInput<>(content, newDocument));
Assert.assertEquals(IOUtils.toByteArray(newClient.readContent()), TEST_DATA.getBytes());
DocumentWithRetention loadedDocument = newClient.getEntity();
Assert.assertNotNull(loadedDocument);
Assert.assertTrue(loadedDocument.getName().startsWith(TEST_IDENTIFIER));
Assert.assertEquals(loadedDocument.getMimeType(), TEST_DATA_MIMETYPE);
assertDateEquals(loadedDocument.getReceiptDate(), now);
assertDateEquals(loadedDocument.getRetentionInformation().getRetentionDate(), now);
Assert.assertFalse(loadedDocument.getRetentionInformation().isLitigationHold());
// set LitigationHold = true
newClient.updateLitigationHold(true);
newClient = newClient.reload();
DocumentWithRetention litigationOnDocument = newClient.getEntity();
Assert.assertTrue(litigationOnDocument.getRetentionInformation().isLitigationHold());
// set LitigationHold = false
newClient.updateLitigationHold(false);
newClient = newClient.reload();
DocumentWithRetention litigationOffDocument = newClient.getEntity();
Assert.assertFalse(litigationOffDocument.getRetentionInformation().isLitigationHold());
}
Set retention / prolong retention
public void createDocumentWithoutRetention() throws IOException {
final String TEST_IDENTIFIER = "SetRetention test timestamp in ms=";
final String TEST_DATA = "abcde";
final String TEST_DATA_MIMETYPE = MediaType.APPLICATION_OCTET_STREAM_VALUE;
TypedDocumentServiceClient<DocumentWithRetention> serviceClient =
        typeDefinitionServiceClient.getDocumentServiceClient().byClass(DocumentWithRetention.class);
// store document without retention
DocumentWithRetention newDocument = serviceClient.createTypeInstance();
newDocument.setName(TEST_IDENTIFIER + System.currentTimeMillis());
newDocument.setReceiptDate(ZonedDateTime.now());
newDocument.setMimeType(TEST_DATA_MIMETYPE);
ByteArrayInputStream data = new ByteArrayInputStream(TEST_DATA.getBytes());
Map<String, ContentUpload> content = Map.of(ContentElementNames.DEFAULT_NAME, new ContentUpload(data));
TypedDocumentClient<DocumentWithRetention> newClient = serviceClient.create(new TypedDocumentInput<>(content, newDocument));
Assert.assertEquals(IOUtils.toByteArray(newClient.readContent()), TEST_DATA.getBytes());
DocumentWithRetention emptyRetentionDocument = newClient.getEntity();
RetentionInformation retentionInformation = emptyRetentionDocument.getRetentionInformation();
Assert.assertNotNull(retentionInformation);
Assert.assertNull(retentionInformation.getRetentionDate());
Assert.assertFalse(retentionInformation.isLitigationHold());
// set initial retention
ZonedDateTime initialRetentionDate = ZonedDateTime.now();
emptyRetentionDocument.setRetentionDate(initialRetentionDate);
TypedDocumentClient<DocumentWithRetention> initialRetentionClient = newClient.updateAttributes(emptyRetentionDocument);
DocumentWithRetention initialRetentionDocument = initialRetentionClient.getEntity();
assertDateEquals(initialRetentionDocument.getRetentionInformation().getRetentionDate(), initialRetentionDate);
// prolong retention
ZonedDateTime prolongedRetentionDate = ZonedDateTime.of(2050, 1, 1, 0, 0, 0, 0, ZoneId.of("Europe/Berlin"));
initialRetentionDocument.setRetentionDate(prolongedRetentionDate);
TypedDocumentClient<DocumentWithRetention> prolongedRetentionClient = initialRetentionClient.updateAttributes(initialRetentionDocument);
DocumentWithRetention prolongedRetentionDocument = prolongedRetentionClient.getEntity();
assertDateEquals(prolongedRetentionDocument.getRetentionInformation().getRetentionDate(), prolongedRetentionDate);
}
@Test(expectedExceptions = MissingBucketSelectionRuleException.class)
public void testForcedRetention() {
TypedDocumentServiceClient<MultiProfileDocumentWithRetention> serviceClient =
    typeDefinitionServiceClient.getDocumentServiceClient().byClass(MultiProfileDocumentWithRetention.class);
MultiProfileDocumentWithRetention document = serviceClient.createTypeInstance();
document.setName(UUID.randomUUID().toString());
ContentUpload contentUpload = new ContentUpload("Just some random content.".getBytes(StandardCharsets.UTF_8));
serviceClient.create(new TypedDocumentInput<>(Map.of("content_forced_retention", contentUpload), document));
}
Automated erase job template

With the StorageContainer API you can erase a storage container whose storage profile is used in a retention management rule. See the example in Retention Rules.

EraseStorageContainer("storageProfileRetention2032", DocumentWithRetention)

Parameter

  • String storageProfile

  • String Array documentTypes

The StorageContainer API erases the bucket assigned to the storage profile and deletes all rows in the given document types. If arveo cannot find the passed storage profile, or the profile is not used in a retention management rule, the erasure API fails with an invalid argument exception. If the retention configured in the retention rule has not yet expired, the API returns an error.

The operating team must ensure that storage profiles contain only documents with the same retention period. Do not use the same bucket in different storage profiles, and do not assign a storage profile containing content with retention to different document types.
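The preconditions the API checks before removing any data can be sketched as follows; the class, method, and parameter names are hypothetical and only illustrate the behavior described above:

```java
import java.time.ZonedDateTime;
import java.util.Map;

// Hypothetical sketch of the checks EraseStorageContainer performs before any
// data is removed (names and structure are illustrative, not the arveo API).
public class EraseStorageContainerSketch {

    // retentionRules maps a storage profile name to the retention expiry
    // configured in its retention management rule.
    static String checkErasure(String storageProfile, Map<String, ZonedDateTime> retentionRules, ZonedDateTime now) {
        ZonedDateTime expiry = retentionRules.get(storageProfile);
        if (expiry == null) {
            // unknown profile, or profile not used in a retention management rule
            throw new IllegalArgumentException("invalid storage profile: " + storageProfile);
        }
        if (now.isBefore(expiry)) {
            return "error: retention not expired";   // the API returns an error
        }
        return "ok: erase bucket and delete rows";   // bucket and rows are removed
    }
}
```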

SAP ArchiveLink Integration

Archiving or exporting print lists, documents and data immediately improves the performance of your SAP environment.

The SAP user does not need any special training, since documents are displayed with the existing SAP document viewer in the familiar SAP application. A separate viewer client rollout is therefore unnecessary.

Eitco Content Server supports all SAP archiving scenarios, from early capture, simultaneous capture, and late capture with or without barcode up to the SAP business workflow. It provides a standardized SAP ArchiveLink™ interface and is certified for SAP ArchiveLink™ 7.20.

The Eitco ArchiveLink Server fully implements the SAP ArchiveLink standard and is certified for SAP and SAP HANA.

Basic SAP knowledge is sufficient to administer the integration.

The Eitco ArchiveLink supports all SAP releases from R/3 onwards.

The Eitco ArchiveLink offers the ECR repository to store SAP business objects in accordance with GoBD and GoBS and thus supports legally secure archiving.

Document display using standard SAP functions

All digitized documents stored via the ArchiveLink can be displayed with the SAP standard viewer. In addition, the Eitco Web Viewer component can be used via the Eitco ArchiveLink to display digitized invoices, delivery notes, etc. inside or outside SAP in any web portal and to add comments and annotations.

Support of all archiving scenarios

There are two basic ways of processing incoming documents. With early capture, the document is provided in digital form at the earliest possible point in the process. With late capture, the entire process is operated unchanged, as before the use of SAP: the paper document continues to flow through the full process until at some point it is scanned and archived.

Late capture with barcode (paper-based):

An invoice is recorded with a barcode and accounting records the posting:

  • At the beginning of the processing sequence, after receipt of the mail, the incoming document is provided with a barcode sticker on the first page (e.g. from a barcode roll or label).

  • Incoming documents initially become SAP objects via SAP transactions (e.g. FB60 or MIGO). When the documents are posted later, there is an automatic barcode query.

  • So that the barcode can later be assigned to the corresponding document, the barcode is recorded, for example with a handheld barcode scanner, and the result is automatically transferred to the SAP dialog window. Alternatively, the barcode can be read and typed into the data entry mask.

  • With transaction OAM1, all SAP objects that are provided with a barcode are visible.

  • The paper document goes through a paper-based approval process and is finally scanned and transferred to SAP to complete the posting.

  • The documents that have already been provided with a barcode and recorded with a hand scanner are scanned in batches (document separation via barcode), and the barcode is read out. The scanning software sends the content including the recognized barcode to the Eitco ArchiveLink Server, which links the archived documents with the SAP business object (TA01). This process of linking a document with a "waiting", i.e. previously created, SAP object is known as "late archiving".

  • In SAP, the archived document can then be accessed directly from the posting record with a mouse click, without leaving SAP. The payment can now be triggered as the result of the completed approval process in SAP.

Early capture with barcode (paper-based):

An invoice is recorded with metadata and sent directly to SAP:

  • At the beginning of the processing sequence, after the incoming mail, the document is provided with a barcode sticker directly on the first page.

  • The central scanning point in the inbox digitizes (scans) the document. The capture software reads the barcode, which also serves as a document separator.

  • The barcode that was read is reported to SAP together with the document ID via the interface of the Eitco ArchiveLink.

  • The paper document goes through the paper-based approval process.

  • During posting (creation of the business object in SAP), the employee can, for example, read the barcode of the document with a wand or barcode gun; the result is automatically transferred to the SAP dialog box. Alternatively, the barcode can be read and entered manually into the data entry mask.

  • SAP then connects the business object in SAP with the digital document stored by the Eitco ArchiveLink.

  • In SAP, the archived document can then be accessed directly from the posting record with a mouse click, without leaving SAP.

Early capture with/without barcode (SAP business process):

An invoice is recorded with metadata and sent directly to SAP. The central scanning point digitizes the incoming documents; the capture software reads out the barcode and, if necessary, other metadata. Any existing barcode also serves as a document separator. The barcode that was read is sent together with the metadata via the interface to the Eitco ArchiveLink. Based on this metadata, the Eitco ArchiveLink starts an SAP business process for incoming invoice verification and transfers the data to the SAP process. The approval process and all further steps take place in SAP without further use of the Eitco ArchiveLink.

Late capture without barcode:

The invoice is digitized and the content is transferred to SAP via the Eitco ArchiveLink, but not linked to a business object. All further work steps are carried out manually in SAP.

Simultaneous capture:

At the time the security-critical data (e.g. personnel data) is processed, the scanning of the documents for this process is triggered at the clerk's desk. All further steps, such as inserting the document into the personnel file, also take place at the clerk's desk.

Interface for linking and archiving in late/early archiving

Save barcode in SAP: a suitable entry is created in the SAP barcode table BDS_BAR_IN and linked to the SAP invoice object (SAP transaction OAM1).

Our ArchiveLink service has a REST interface to which you can send a barcode and the document type.

The Eitco ArchiveLink archives the object, creates a SAPDOCID and reports it to SAP so that an entry is created in the link table TA01.

The Eitco ArchiveLink then sends the barcode to SAP, including a SAPDOCID generated by us and the appropriate SAP repository name. The ArchiveLink uses the SAP Java Connector (JCo, https://help.sap.com/saphelp_nwpi711/helpdata/de/48/70792c872c1b5ae10000000a42189c/frameset.htm) and calls the SAP function module BAPI_BARCODE_SENDLIST (https://www.sapdatasheet.org/abap/func/bapi_barcode_sendlist.html) (not yet fully confirmed; possibly other SAP transactions are involved as well).

SAP then enters this information in the BarcodeExt table (BDS_BAR_EX). SAP transaction OAM1/SBDS7 then compares the BarcodeExt and BDS_BAR_IN tables. If a barcode match is found, an entry is made in TA01 and the SAP business object is linked to the invoice in the ArchiveLink.

REST API

Client SDKs

The client SDKs provide APIs for applications using arveo. SDKs exist for both Java and TypeScript. Client applications should not use the REST API of arveo directly but instead use one of the provided SDKs.

JSON serialization

arveo uses a custom serialization for the JSON data in the REST API to support advanced features like polymorphism. Additionally, the custom serialization allows the arveo server and the client SDKs to pass type information. This makes it possible, for example, to distinguish between number types like short, int and long. The client SDKs take care of the serialization, and direct usage of the REST API is discouraged.

If it is necessary to (de-)serialize the custom JSON data, use the preconfigured Jackson ObjectMapper that the server and the SDKs use. This ObjectMapper is equipped with mixin types that describe how to (de-)serialize the custom JSON content. The internal ObjectMapper can be obtained by injecting an instance of de.eitco.commons.spring.web.json.AsdlObjectMapperHolder.

The service offers an overview page containing the REST resources and details about the models. It can generate examples for the models, too. The overview page is located at the root URL of the service.

Type information

Each object contains a type identifier in a JSON property called @type. The required value is listed in the API overview page for each model class. Example:

"identifier": {
  "@type": "container-id",
  "identifier": {
    "@long": 1
  }
}

Type information for data types

There are some special type identifiers used to identify the type of JSON fields.

The following table lists types and their corresponding identifiers.

Table 30. Types in Java and their Identifiers in arveo
Type (Java) | Identifier

Byte | @byte

Short | @short

Long | @long

BigInteger | @big-int

Float | @float

Instant | @utc-date-time

ZonedDateTime | @zoned-date-time

Class<?> | @type-reference

UUID | @uuid

byte[] | @binary

LocalDate | @date

LocalTime | @time

Other data types do not require specific type identifiers.

The following example shows a special type identifier:

"retentionDate": {
  "@zoned-date-time": "2020-12-15T15:52:21.5193002+01:00[Europe/Berlin]"
}

Collections

To distinguish between different types of collections (lists and sets) there are type identifiers for collection types.

Table 31. Identifiers for the Types List and Set
Type (Java) | Identifier

List | @list

Set | @set

The following is an example of the Type List:

"list": {
  "@list": []
}

Java SDK

The SDK contains the general API for accessing arveo. It can be used both to access arveo via HTTP and to use arveo as an embedded library.

Maven dependency of the Client SDK for usage via HTTP
<dependency>
   <groupId>de.eitco.ecr</groupId>
   <artifactId>ecr-sdk-http</artifactId>
   <version>${ecr.version}</version>
</dependency>
Maven dependency of the Client SDK for embedded usage
<dependency>
   <groupId>de.eitco.ecr</groupId>
   <artifactId>ecr-embedded</artifactId>
   <version>${ecr.version}</version>
</dependency>

The SDK offers both a generic API, where attributes of objects are mapped as a generic map, and a typed API. The typed API uses classes, created by the project, that represent the objects with their attributes. The main entry point for the API is the class de.eitco.ecr.sdk.TypeDefinitionServiceClient. An instance of this class can be obtained using Spring dependency injection. With the methods

  • getDocumentServiceClient()

  • getContainerServiceClient()

  • getFolderServiceClient()

  • getRelationServiceClient()

  • getMetaDataServiceClient()

you obtain a client factory that can be used to create a service client for a specific type definition. This service client can then be used to create new objects or load existing ones. For created or loaded objects, you in turn receive an entity client that offers methods for accessing the object. Special version clients are also available for concrete versions of entities.

Using the SDK in a non-web application

The SDK can be used both in applications that provide web functionality like REST endpoints and in applications that do not contain any web functionality. For non-web applications, some differences need to be considered.

Dependencies

By default, the SDK contains an OAuth2 client implementation that relies on some web-related spring beans. For non-web applications, a different OAuth2 client implementation is available. The default implementation needs to be excluded from the SDK dependency and replaced by the non-web implementation as shown in the following example:

<dependency>
    <groupId>de.eitco.ecr</groupId>
    <artifactId>ecr-sdk-http</artifactId>
    <version>${ecr.version}</version>
    <exclusions>
        <exclusion>
            <groupId>de.eitco.commons</groupId>
            <artifactId>cmn-spring-security5-oauth2-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>cmn-spring-security5-oauth2-client-non-web</artifactId>
    <version>${commons-oauth2-version}</version>
</dependency>

The current version of the OAuth2 client can be found in the Nexus.

Application initialization

The SDK contains some dependencies that cause Spring to initialize some web functionality automatically. This can cause problems like missing Spring Security configuration errors. Non-web applications can simply turn off all of Spring's web functionality by using the SpringApplicationBuilder class as shown in the following example:

@SpringBootApplication
public class MyApplication {

        public static void main(String[] args) {
                new SpringApplicationBuilder(MyApplication.class)
                        .web(WebApplicationType.NONE)
                        .run(args);
        }
}

Batch Operations

The SDK provides various methods for batch operations. For example, several objects can be created or updated at once.

Create, update or delete multiple objects of the same type

All service clients provide methods for creating, updating and deleting multiple objects. Since a service client is bound to a specific type definition, only objects of the same type can be created, updated or deleted in this way. The objects to be updated or deleted are identified by an arbitrary selector. For updates, methods are available that return the updated objects, and methods that return only the number of updated objects; especially when a large number of objects is updated at once, only the latter should be used. With these methods, all objects receive the same update. If the objects are to be modified individually, the methods of the BatchOperationServiceClient (see below) must be used.

Create or update several objects of different types

The BatchOperationServiceClient class provides methods to create or update multiple objects of different types.

Create several interdependent objects

To create multiple objects of different types, special BatchCreateInput input objects are used that bundle the type of the object and its properties. The order in which the objects are created corresponds to the order in which the input objects are passed. Each of these input objects contains a virtual ID that identifies it within the batch operation. In this way, for example, a relation as well as its source and target can be created in a batch operation. The relation only has to be created with the virtual IDs of source and target.

If the relation between the objects consists not only of the ID, but also of a foreign key to any attribute, a reference to the corresponding attribute of the referenced object must be given to the dependent object. For this purpose, the class BatchAttributeReference is available, which bundles the name of the foreign key attribute, the referenced attribute and the virtual ID of the other object in the batch operation. Code examples can be found in the class de.eitco.ecr.system.test.BatchCreationIT.

Update multiple objects of different types

The BatchOperationServiceClient also provides methods to update several different objects of different types in a batch operation. A separate input object is passed for each object to be updated, which contains the ID of the object and the properties to be updated. This means that individual changes can also be made to each object with these methods. The BatchUpdateUtility class provides auxiliary methods with which the respective input objects can be created. Code examples can be found in the class de.eitco.ecr.system.test.BatchUpdateIT.

Automatic update in case of collision

The BatchCreateInput objects used to create various types make it possible to automatically update the existing object in the event of a collision. To do this, the BatchCreateInput only has to be made aware of the field on which the collision could occur:

TypedContainerBatchCreateInput<Person> containerBatchCreateInput =
    new TypedContainerBatchCreateInput<>(new TypedContainerInput<>(person), List.of());
containerBatchCreateInput.setCollisionCheckAttribute("first_name");

In the above example, a container is to be created in a batch where a collision could possibly occur on the attribute first_name.

The attribute that is to be used to detect the collisions must be provided with a unique constraint.

Release Policy

include::../development-process/release-model.adoc[]

Roadmap

include::../development-process/release-timeline.adoc[]

Compatibility List

To operate arveo successfully, the operator of the platform must provide and manage the following services.

diagram
Figure 8. Architecture Overview
Table 32. 3rd Party Services in arveo
Service | Supported Version | Comment

JDK | Java 11 | Integration tests run on AdoptOpenJDK 11, but all JDKs are supported

ActiveMQ | 5.15, 5.16 |

PostgreSQL | 12, 13 |

Apache Solr | 8.6 |

S3 Storage | Ceph 15, 16; NetApp ONTAP 9; Dell Elastic Cloud Storage (ECS); AWS S3 | Retention is not supported yet, even if provided by the vendor

File System | NFS, CIFS |

Linux OS | Ubuntu 18.04, 20.04 |

Application Server | Tomcat 9, 10 |

Kubernetes | 1.19 | If Helm deployment is used

Docker | 20.10.8 | If Helm deployment is used

OAuth | OAuth 2.0 | Grant flows: Client Credentials Flow, Authorization Code Flow with PKCE, Resource Owner Password Flow

Authentication Services | Keycloak 15; ADFS 2.0 |

LDAP Server | MS Active Directory |

MS Graph | | Document conversion with Microsoft 365, requires an M365 account

SSO | Kerberos | Kerberos authentication service is MS Active Directory

Important Terminology

ECR

Short for Enterprise Content Services; this is the collection of the arveo content services providing all document and record features.

EQL

Eitco Query Language.

Used for search operations.

Entity

Object that represents a type of data structure used in arveo.

Document

An entity that can contain metadata and content.

Folder

An entity that contains metadata and is organized in a tree structure like in a file system.

Relation

An entity that represents a relation between two other entities.

Container

Simple folder-like object not organized in a tree structure but with relations to other objects.

Meta

An entity that contains only metadata.

Content type

A meta specification that classifies the data.

Examples of content types are: original object, rendition, full text, text notes, XML properties, etc.

Retention

Continuous audit-proof storage of all company data for compliance or own business purposes.

Litigation hold

A flag that indicates whether a document is related to a litigation.

If the flag is set, the document must never be deleted - even if the retention date has passed.

Bucket

Object storage.

Encryption

Translating data into unreadable forms by means of electronic or digital codes or keys.

A specific key in the form of a procedure or an algorithm is required for the reverse transformation. Then the legitimate user can access the original data.

Annotation

A construct used on interfaces or getter-methods to specify their properties.

Storage profile

Storage profiles define on which storage the content elements are saved.

Storage Container

Folders or buckets on the content storage that contain documents with the same retention period (e.g. Jan-Dec 2031).