Introduction
What is arveo?
arveo is a Headless Content Service Platform.
arveo expands your digital company platform and your public cloud or data center solutions with cloud-based enterprise content management (ECM).
arveo is a multi-client, 100% cloud-ready content services platform. With arveo you can manage the entire life cycle of your documents and files and process all your content in a legally secure (GoBD-certified) and GDPR/DSGVO-compliant way. arveo ensures data and legal security even when using cloud storage services and takes into account the requirements of the GDPR/DSGVO with regard to the secure deletion of data.
With arveo, enterprise-ready solutions can be created, from revision-proof content archives to complex file and transaction processing.
What is a Content Service Platform?

- a cloud-ready enterprise content management system
- a collection of microservices sharing the same data repositories
- provides REST interfaces
- typically includes ECM services, AI services, BPM, conversion, enterprise search, etc.
- provides access to all kinds of content such as documents, videos, images and audio
- serves all kinds of use cases within the organization
- content is stored once and edited and read by many applications
arveo's modern architecture, based on microservices and state-of-the-art technologies, was built natively for the cloud. Connect our lightweight arveo content services to your system landscape, other open systems and the services that suit you best, from the cloud or on-premises, through a single, lean API. With this best-of-breed approach, you can easily realize your company's dream of a "single source of truth" across all systems.
The arveo content services manage the entire life cycle of your content, such as:

- Documents
- Images
- Videos
- Audio
- Text
arveo allows the free configuration of content objects, including metadata, and the mapping of folder hierarchies and electronic files.
NoSQL technologies allow you to search across all metadata values and document content with high performance, regardless of the complexity of your search. The additional use of horizontally scalable NoSQL technologies provides decisive advantages in mass data processing and search performance: the Apache Solr 8.6 enterprise search engine, combined with key-value caches, can increase search speed by up to a factor of 1,000 compared to relational database systems.
Headless Content Services
The market for "headless systems" has been growing for some time. These systems offer backend functions without a user interface of their own; they are used entirely through their APIs. The concept is best known from content management systems (CMS) used in web development. With the increasing use of different end devices such as smartphones, tablets and wearables, the requirements for content management systems are also increasing, and users consume content on many different channels. A headless CMS dispenses with the front end and thus enables your content to be delivered to various channels through a single REST API.
So if products are to be fully and seamlessly integrated into a platform and a dependency on a user interface or client is no longer desired, one speaks of so-called "headless systems".
The wide availability of different cloud services and solutions enables the setup of a modern platform for your business processes. Instead of relying on a monolithic ECM as before, companies combine the most suitable cloud content services and, with this best-of-breed approach, create targeted added value for their digital company platforms.
Regardless of whether you want to add secure and legally compliant ECM functions to your own solution, an open cloud application or your company portal: you can access all of your documents and information directly via a single interface (REST API).
arveo is headless by design. All modules are hosted as pure backend cloud services by Eitco or, optionally, hybrid in your private cloud or on-premises in your data center. Naturally, they are also natively suitable for mobile applications.
API first
The stateless REST API is our product: it is used by all arveo components and user interfaces. The web services are stable over the long term and fully available to every customer.
It is important to us that our services have open interfaces and can be easily integrated into an enterprise service infrastructure. As a modern content services platform, arveo uses standards wherever possible in order to tap into the steadily growing number of cloud-enabled services inside or outside the company infrastructure. Whether operating system, database, text recognition, machine learning or object storage: arveo can access services from different manufacturers and combine them with its own services in order to create added value quickly.
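From an integrator's point of view, "API first" means that everything goes through one authenticated HTTPS REST call. The following sketch builds such a request with only the JDK's HTTP client; the base URL, document path and token value are illustrative assumptions, not the documented arveo API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ApiRequestSketch {
    // Builds an authenticated GET request against a hypothetical arveo endpoint.
    // The bearer-token header is the standard OAuth2 pattern the platform requires.
    static HttpRequest documentRequest(String baseUrl, String documentId, String accessToken) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/documents/" + documentId))
                .header("Authorization", "Bearer " + accessToken)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(...)` then returns the document's JSON representation, assuming the endpoint exists and the token is valid.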
Best-Of-Breed strategy
There are many ECM products and the market is constantly changing. A manufacturer-independent ECM standard comparable to SQL for relational databases has never fully established itself, despite several attempts from WebDAV to JSR 170 to CMIS. The market is dominated by monolithic packages that try to cover all ECM applications. A customer who implements a complex ECM application for their company often becomes highly dependent on one manufacturer and faces costs that are difficult to calculate when changing providers.
Due to the availability of platforms such as Amazon Web Services (AWS) or Microsoft Azure, which make a wide variety of services easily usable via web services, we are seeing a change in the behavior of companies: they want to buy fewer complete solutions and instead look for specialized services that can easily be combined and thus create targeted added value for the digital company platform. Companies choose the best features from different manufacturers and combine them into their own solutions, controlling the services used via their own API management or API gateways. This creates company platforms that access not just one, but often several repositories.
This strategy, often called best-of-breed, benefits from the fact that the services available in the marketplaces have become increasingly standardized in recent years.
arveo consistently relies on a microservice architecture. The individual services are loosely connected to one another via lightweight stateless web service interfaces (http, REST) and each service can run and scale independently. All arveo functions are available via a uniform REST API gateway, which also takes care of the intelligent load distribution and the detection of defective services.
Scalability
Modern cloud-ready platforms rely on horizontal scaling: the load is distributed over many nodes, which can consist of inexpensive commodity hardware. Such a structure can also save costs through automated scale-out and scale-down, switching nodes on or off as required. The arveo platform has a high tolerance for the failure of individual nodes. High availability and performance are also required, since end users nowadays show only limited patience with long response times and, in case of doubt, quickly switch to the competition.
All arveo services support containerized deployment and use stateless REST APIs, so they can be easily integrated into any cloud infrastructure. Through the use of containerized applications (Docker) and the service management of the open source Spring framework, which well-known providers such as Netflix use and continuously improve, the services can be installed automatically as often as required and thus scale out and down when you use the cloud orchestration framework Kubernetes. You can cluster Linux containers and build an auto-scaling, highly available platform with high fail safety. A blue-green deployment for the risk-free, downtime-free rollout of new software versions is also possible.
Future proof
Our services use standards as far as possible, so that services from different providers can be integrated without great effort and the customer can react quickly to changes in the market. Thanks to the secure web service interfaces, all services, including the database, can be obtained from the cloud at any time.
With arveo services, you can build a sustainable system architecture. By design, arveo allows you to separate your business logic from the arveo ECM standard services and all other available cloud services such as OCR, AI, document conversion (e.g. to PDF) and identity management. arveo solutions are designed to be manufacturer-independent, so that the underlying REST ECM and other services can be exchanged at easily calculable costs.
This approach makes it possible to exchange individual services, up to and including the arveo content services themselves, with little and easily calculable effort. Even arveo ECM services can be replaced by comparable services, and via the supplied open source S3 connector, third-party systems can access the content objects migration-free using the standard S3 API.
Hybrid operation
arveo is a native cloud platform and is based on Open Source libraries and services. Through the consistent microservice architecture and the use of open source cloud technology, you can keep arveo's operating costs low.
Advantages of arveo operation
- All services are separately horizontally scalable and can therefore also be operated on simple hardware. arveo runs on all Linux and Windows operating systems.
- No additional license costs thanks to the consistent use of open source technology such as Linux, PostgreSQL 12 and Apache Solr 8.6 (NoSQL).
- Container deployment: Simple integration into existing cloud platforms enables load-dependent, automated service provisioning up to blue-green deployment for seamless updates to new software versions.
- Hybrid architecture: Flexible use of cloud services or on-premises services.
- Low manufacturer dependency: By separating the user interface and business logic from the ECM/BPM services while using standards such as REST, S3 or BPMN2, there is less dependency on a single manufacturer.
- Web applications: We deliver templates for PWAs (Progressive Web Apps) based on the state-of-the-art Angular framework, which are completely open source; i.e. the user interfaces belong to you and can be used independently of arveo.
- Use of standards: Low training costs and high availability of know-how on the market through the use of standard frameworks (Angular), standard interfaces (REST, S3, SAP ArchiveLink) and SDKs for JavaScript, Java and C#.
Micro frontends
In addition, you can use our ready-made, modern, clear, responsive and functional micro frontends to make the arveo content services, and thus their content, easily available at the right time and in the right place in your business processes.
Mobile first: All user interface components and interfaces are designed for mobile use.
Architecture Overview
Content Services
arveo is a content service platform and provides a set of lightweight, operating system-independent content microservices.
All services and clients exclusively use the secure, stateless, state-of-the-art HTTPS REST API. For the highest possible security on the web and suitability for mobile access, arveo uses token security based on the state-of-the-art Spring Security framework.
Java, C# and JavaScript SDKs are available.
arveo has multi-tenant support and separates content and metadata per tenant.
As arveo is built for cloud operating systems like OpenStack, you can automatically deploy and scale the containerized arveo applications with the cloud orchestration framework Kubernetes. You can cluster Linux containers and build an auto-scaling, highly available platform with high fail safety. Containerized applications scale horizontally and can run on commodity hardware.
arveo is available as a containerized application or as a WAR/JAR file and allows hybrid deployment: on-premises or in the cloud.
| Service | Description |
|---|---|
| Document Service | Store, edit and version documents, records/folders and their metadata. Manage storage locations with retention periods (GoBD-certified and GDPR/DSGVO-compliant). Search metadata with the relational database PostgreSQL 12 and the NoSQL document database Apache Solr 8.6. |
| User Management Service | User management with users, groups and roles. |
| Registry Service | Service registry for all arveo content services, managing the availability of the services. |
| Config Service | Secure storage of configuration data in Git or a database. |
| Access Control Service | Object access control providing permissions to users/groups. |
| Audit Service | Creates and manages audit tables for all other entity types like document types, user management objects, etc. Provides an API to access the audit trail of any object by its entity ID. |
| SAP ArchiveLink Service (optional) | Web server that processes documents in accordance with the SAP ArchiveLink standard. |
| Document Conversion Service (optional) | Conversion of document formats like docx, xlsx, etc. to image formats or PDF/A. |
| Enterprise User Management Service (optional) | Extends arveo with organisation structure features like positions or substitutes. |
| Enterprise Integration Service (optional) | The arveo enterprise integration service supports over 300 data formats and interfaces like XML, REST, CSV and mail. |
| Federation Service (optional) | Multi-repository architecture: The open connector plugin interface allows access to data from other repositories (Saperion, Documentum, file system directories). |
3rd Party Services
To operate arveo successfully, the operator of the platform must provide and manage the following services.

| Service | Description |
|---|---|
| ActiveMQ | Message queue service to process JMS and AMQP messages. |
| PostgreSQL 12 | Relational database cluster for arveo system properties and customer metadata. |
| Apache Solr 8.6 | NoSQL document database to support high-performance full-text search of content and metadata. |
| Content Storage | Either an S3-API-capable object store service or a redundant file system server. |
| Authentication Service (optional) | Identity management implementing the OAuth2 workflow for secure login. |
| Monitoring (optional) | Supports logging/monitoring via ELK (Elasticsearch, Logstash and Kibana), the Spring Service Admin monitor, and Prometheus + Grafana monitoring frontends. |
Industry standards
arveo relies on industry standards wherever possible to make integrations as easy as possible.

- API: REST (JSON)
- Storage: S3 (cloud object storage API)
- Authentication: OAuth2, X.509 or Basic Auth
- Relational database: JDBC access for PostgreSQL, Oracle, SQL Server
- SAP: ArchiveLink service
- Containerized application deployment
Open source technology stack
The technology stack has been chosen to enable the creation of high-performance, cloud- and client-capable, scalable, state-of-the-art (micro)services with a modern web user interface. Our chosen tech stack enables the implementation of both small projects, which consist of only a single backend component, and large projects with various distributed components. The created components can be deployed both locally on the customer's hardware and in a cloud environment.
The stack consists of the following components:
- Spring Framework: The backend components are implemented in Java and Kotlin, with the Spring Framework as the basis. Spring is an open source (Apache License) framework that has existed since 2004 and has a large and very active developer community. The framework has a modular structure, which makes it suitable for both simple and complex applications. It provides dependency injection, externalized configuration, and assistance with database access, transactions, messaging, etc.
- Spring MVC, WebFlux: Spring MVC is a framework for creating web applications, especially REST services. It is based on the servlet stack, in which a request is processed in a dedicated thread. WebFlux is also a framework for web applications, but is based on the reactive stack, in which the processing of a request is not restricted to one thread.
- Spring Security: Spring Security is a component that provides authentication and authorization functionality. It can be used to secure web applications and also offers support for SSO technologies such as OAuth and SAML.
- Spring Cloud: Spring Cloud is a collection of additional Spring components that provide the functionality typically required in a distributed or cloud application. The individual components can be used independently of one another and consist partly of integrable dependencies and partly of independent applications. Which of the Spring Cloud components are used therefore depends entirely on the project requirements. Spring Cloud applications can be operated in managed cloud environments such as Cloud Foundry.
- Spring Cloud Config: Spring Cloud Config offers a central configuration service as well as a client library for components that consume the configuration. In a Spring Boot application, it is sufficient to add the corresponding dependency; from then on, Spring automatically reads from the configuration service if it is available. The configuration data can be stored in simple files, in a database, in a Git repository or in a protected repository such as Vault. In a distributed application with several components running on different machines, Spring Cloud Config can be used to implement central management of the configuration of all components.
- Spring Cloud Bus: Spring Cloud Bus provides a bus for communication between the components and for connecting external components. The communication is based on the AMQP protocol and requires a backend such as RabbitMQ or ActiveMQ. With the help of the bus, components can, for example, be notified when their configuration in the configuration service has changed.
- Eureka: Eureka is a Spring Cloud component provided by Netflix that provides a service registry, a central directory of all service instances. A service or client application therefore only needs to know the URL of the service registry in order to reach any of the other services. Eureka is an independently executable component and offers a client library for access to the registry.
- Hystrix: Hystrix is a Spring Cloud component provided by Netflix that can be thought of as a fuse in an electrical installation. If one component of a cloud environment fails, Hystrix can isolate it from the other components to prevent cascading failures. Another instance of the component can then provide the functionality.
- Zuul: Zuul is a Spring Cloud component provided by Netflix that provides an API gateway. An API gateway acts like a reverse proxy and hides the individual microservices from a client application. The client application only knows the API gateway and does not have to worry about the URLs of the various services.
- Ribbon: Ribbon is a Spring Cloud component provided by Netflix that provides a client-side load balancer.
- Archetypes: Maven archetypes are available for easily starting a new project based on our technology stack, with different archetypes for different types of applications. The generated projects contain a Jenkinsfile with a preconfigured CI environment including static code analysis with Sonar, OWASP dependency checks, load tests based on JMeter, a push-button release mechanism and an optional Teams hook. Also included are packaging modules with which the application can be packaged as a Linux daemon or as a Windows service, as well as IDE configuration files for IntelliJ and Eclipse.
- Logging: In order not to depend on a specific logging implementation, logging is implemented with the logging facade SLF4J, or to be exact, with its implementation Logback. In contrast to Log4j, Logback is actively maintained and less complicated to initialize, and it combines well with SLF4J. Logback is one of the standard Spring dependencies.
- Caching: Caching frameworks are available in many variants that cover very different use cases. The frameworks listed here are sorted by their primary use case.
- Local in-memory cache: Caffeine has proven itself as a fast local in-memory cache. It can be combined with Spring's caching abstraction layer.
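To illustrate what a bounded local in-memory cache does, here is a minimal LRU sketch built on the JDK's `LinkedHashMap`. This is only a teaching aid: Caffeine is the production choice, adding thread safety, time-based expiry and statistics that this sketch deliberately omits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bounded LRU cache illustrating the idea behind a local in-memory cache.
// Not thread-safe; a real service would use Caffeine instead.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // access-order = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry when full
    }
}
```

With capacity 2, inserting a third entry evicts whichever of the first two was accessed least recently.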
- JDBC connection pool: HikariCP has proven itself for JDBC connection pooling. This pool is also Spring's standard dependency.
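The borrow/return principle behind a JDBC pool can be sketched with a few lines of stdlib Java. This toy pool only shows the concept; HikariCP additionally handles connection validation, timeouts, leak detection and concurrency tuning, which is why it is used instead of hand-rolled pools.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Toy object pool illustrating the borrow/return principle of JDBC pooling.
public class SimplePool<T> {
    private final BlockingQueue<T> idle;

    public SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pre-create all pooled objects
        }
    }

    // Non-blocking variant; a real pool blocks with a configurable timeout.
    public T borrow() {
        T obj = idle.poll();
        if (obj == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return obj;
    }

    public void release(T obj) {
        idle.offer(obj); // return the object for reuse
    }

    public int available() {
        return idle.size();
    }
}
```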
Security
Application security
arveo is a content service platform you can trust. We are continuously working to ensure that our services can be operated securely in the cloud.
All arveo content services and clients communicate via state-of-the-art secure REST interfaces over the secure HTTPS (SSL/TLS) protocol. All services require the web standard OAuth2 with OpenID Connect authentication using tokens. A central authentication service (Keycloak, Active Directory or the arveo user management service) issues tokens with an expiry date. This ensures that only clients authenticated against the central service can use the content service APIs.
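The expiry date on issued tokens is what limits the damage of a leaked token. The following sketch shows how an expiry (`exp`) claim in a JWT-style token is read and checked; it is illustrative only, since in practice the signature must also be verified with a proper library, and the crude string parsing below stands in for a real JSON parser.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

// Sketch: checking the expiry claim of a JWT-style bearer token.
// Real deployments must also verify the token signature; this only
// illustrates why an expiry date bounds the lifetime of a stolen token.
public class TokenExpiry {
    static long expiryEpochSeconds(String jwt) {
        String payload = jwt.split("\\.")[1]; // header.payload.signature
        String json = new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
        // Crude extraction of the numeric "exp" claim (epoch seconds).
        int start = json.indexOf("\"exp\":") + 6;
        int end = start;
        while (end < json.length() && Character.isDigit(json.charAt(end))) {
            end++;
        }
        return Long.parseLong(json.substring(start, end));
    }

    static boolean isExpired(String jwt, Instant now) {
        return now.getEpochSecond() >= expiryEpochSeconds(jwt);
    }
}
```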
Data security
arveo can encrypt content with AES-256 and thus protect it against unauthorized access. The keys are stored in such a way that maximum security is guaranteed. In order not to have to re-encrypt all data if a key is compromised, separate data keys are generated; only these keys are encrypted with the customer key and stored separately (Encryption). See also Data Integrity.
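The scheme described above, where content is encrypted with data keys and only the data keys are wrapped with the customer key, is commonly called envelope encryption. The sketch below shows the principle with the JDK's `javax.crypto` API; the per-object data key, the AES-GCM mode and all class and method names are our illustrative assumptions, not arveo's actual implementation.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

// Envelope-encryption sketch: content is encrypted with a data key,
// and only the data key is wrapped with the customer master key.
public class EnvelopeEncryption {
    private static final SecureRandom RNG = new SecureRandom();

    static SecretKey newAesKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256); // AES-256 as stated in the text
        return kg.generateKey();
    }

    // AES-GCM encrypt: returns the 12-byte IV prepended to the ciphertext.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = new byte[12 + ct.length];
        System.arraycopy(iv, 0, out, 0, 12);
        System.arraycopy(ct, 0, out, 12, ct.length);
        return out;
    }

    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, ivAndCiphertext, 0, 12));
        return c.doFinal(ivAndCiphertext, 12, ivAndCiphertext.length - 12);
    }

    // Wrapping a data key = encrypting its raw bytes with the master key.
    static byte[] wrapDataKey(SecretKey masterKey, SecretKey dataKey) throws Exception {
        return encrypt(masterKey, dataKey.getEncoded());
    }

    static SecretKey unwrapDataKey(SecretKey masterKey, byte[] wrapped) throws Exception {
        return new SecretKeySpec(decrypt(masterKey, wrapped), "AES");
    }
}
```

If the master key is rotated, only the small wrapped keys need to be re-encrypted, not the content itself, which is exactly the advantage the text describes.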
arveo allows you to organize documents into folders and records. arveo can control access rights such as reading, writing or deleting for each document via attributes or access lists and thus grant or deny the corresponding access to groups or users.
ACL Permissions
- None: no authorization (object not visible)
- Browse: the user may see the metadata of the object, but not the content
- Read: the user may read metadata and content
- Relate: the user may add an annotation
- Version: the user may change the content, but may not overwrite it
- Write: the user may change metadata and content, with the possibility to overwrite
- Delete: the user may delete the object
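The ordering of the list above suggests cumulative permission levels, where each level includes all lower ones. Under that assumption (which is ours; arveo's actual ACL evaluation may differ), a permission check can be sketched as an ordered enum:

```java
// Sketch of a cumulative permission check based on the ACL levels listed above.
// The assumption that each level includes all lower ones is suggested by the
// ordering of the list; it is not confirmed by the arveo documentation.
public enum AclPermission {
    NONE, BROWSE, READ, RELATE, VERSION, WRITE, DELETE;

    // A granted level satisfies any required level at or below it.
    public boolean allows(AclPermission required) {
        return this.ordinal() >= required.ordinal();
    }
}
```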
Tenant security
The metadata and the content of the tenants are separated. Each tenant has its own storage container and database. It is ensured that all data of a tenant is protected from unauthorized access by another tenant.
The data of a tenant can be easily exported.
Security patches
For us it is important to continuously ensure that all known vulnerabilities are fixed and that we deliver security patches and hotfixes as early as possible to our customers.
To achieve this goal, we integrated state-of-the-art tools such as the OWASP dependency-check and automated static code analysis into our build process. We also perform penetration tests on a regular basis.
What is OWASP? The Open Web Application Security Project® (OWASP) is a nonprofit foundation that works to improve the security of software. Through community-led open-source software projects, hundreds of local chapters worldwide, tens of thousands of members, and leading educational and training conferences, the OWASP Foundation is the source for developers and technologists to secure the web. OWASP is dedicated to enabling organizations to conceive, develop, acquire, operate, and maintain applications that can be trusted. All of our projects, tools, documents, forums, and chapters are free and open to anyone interested in improving application security (https://owasp.org).
Application protection by design
What does Eitco do to develop, operate and maintain a secure content service platform?
- We only use open source software from secure and accepted projects like Apache or Spring.
- We implemented an open source review and monitoring process:
  - software architecture review by the Eitco software architects
  - security check using the OWASP dependency-check
  - legal license check to ensure that it is a real open source project in the long term
- We continuously check our open source dependencies with regard to architecture, security leaks and maintainability.
- To ensure that all known vulnerabilities of third-party open source projects are eliminated, we integrated the OWASP dependency-check tool into our nightly build. Dependency-check checks our dependencies against a database of all known vulnerabilities.
- In case a severe vulnerability is found, we take the appropriate countermeasures:
  - provide a security patch for our customers with a new version of the third-party library
  - change the implementation or configuration using the third-party component
  - inform our customers to update or reconfigure components like database, message queue, application server, etc.
  - replace the third-party component (this typically requires a major update)
- OWASP dependency-check tool: It is a software composition analysis tool that tries to find publicly disclosed vulnerabilities within the project dependencies. The tool checks whether an issue is tracked in the Common Platform Enumeration (CPE) for a dependency. If a vulnerability is found, it creates a report with a link to the CVE entry. It is a command-line tool that can easily be integrated into any nightly build process. For further information, consult the National Vulnerability Database (NVD) (https://nvd.nist.gov). The following source is worth a look: Jeff Williams and Arshan Dabirsiaghi, "The Unfortunate Reality of Insecure Libraries" (https://owasp.org/www-pdf-archive/ASDC12-The_Unfortunate_Reality_of_Insecure_Libraries.pdf).
Compliance recommendations (GoBD)
All companies using electronic data processing for legally or tax-relevant documents have to comply with the "Principles for the proper management and storage of books, records and documents in electronic form and for data access" (GoBD, BMF letter of November 28, 2019).
In addition to the proper use of the arveo and third-party services, we recommend implementing the following measures when using arveo as a compliant repository for legally compliant storage of records and documents.
Indexing and retrieval
To allow users and third-party applications to identify and find objects in arveo, you should define a unique and immutable identifier property (Data Modelling). The property must be @Unique to ensure that a user or business application can clearly identify the item. The unique identifier should follow the taxonomy of your business processes and contain all information needed to clearly recognize the document. Make the property @Readonly to ensure that the identifier is always set and immutable.
This minimizes the risk of incorrect indexing and of undetectable documents, because the index is immutable, duplicate identifiers are rejected, and the compliant taxonomy ensures that every user can find documents easily and quickly. We strongly recommend building a documented, simple but clear taxonomy.
Your business application or the user must set the value when the object is created (@Mandatory annotation), or you can let arveo create a unique value by adding counter annotations. Add the @Autoincrement annotation if a simple sequential Long ID meets your requirements.
If you need a more sophisticated unique identifier, you can use the @FormattedCounter annotation, which allows you to create, for example, String identifiers like <year>-<sequence> (Unique Identifier Example).
List data types allow you to store more than one String or Long value for a property. You can search for each value using the array search operation of the arveo query language (Data Types).
Enumeration data types allow you to set one or more values from a fixed set of values.
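To make the <year>-<sequence> pattern concrete, the following standalone sketch generates identifiers of the kind @FormattedCounter produces, e.g. "2024-000001". arveo generates such values server-side; the class below, its names and the exact padding format are only illustrative.

```java
import java.time.Year;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the identifier format a @FormattedCounter might produce.
// arveo does this server-side; this class only illustrates the shape.
public class FormattedCounter {
    private final AtomicLong sequence = new AtomicLong();

    public String next(int year) {
        // zero-padded sequence keeps the identifiers sortable as strings
        return String.format("%d-%06d", year, sequence.incrementAndGet());
    }

    public String next() {
        return next(Year.now().getValue());
    }
}
```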
Retention periods
Ensure that the statutory retention periods are assigned to the records, cases and document types (Retention Periods, Retention Rules) and that the storage containers are configured correctly (Retention Container).
Check whether the technically assigned retention periods correspond to the statutory retention periods. Monitor the audit logs to ensure that the retention period is set and correct. Monitoring can be automated or can be a random check by an employee.
The operating team must ensure that a storage container contains only documents with the same retention period. Do not use the same bucket in different storage profiles, and do not assign a storage profile containing content with retention to different document types.
Grant the deletion right for your storage containers to arveo. If arveo cannot delete the containers, your operating team is in charge of this task, and you must set the option "delete rows only".
Configuring storage containers in arveo-service.yaml and in your content storage is an ongoing task for your operating team. arveo will try to create the buckets or subdirectories on your storage system but can also use existing ones.
It must be ensured that the system time cannot be manipulated (e.g. by using an NTP server). Take suitable measures to ensure that a change in the system time is detected promptly.
Because data privacy legislation (GDPR/DSGVO) can make it necessary to erase data even before the expected retention period has expired, take care that the content storage has no default hardware retention activated.
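A retention check ultimately reduces to comparing a computed deadline against the current date. The sketch below derives a deletion-eligibility date with `java.time`; the 10-year period matches the German commercial/tax retention mentioned in this chapter, but the method names are illustrative and not part of the arveo API.

```java
import java.time.LocalDate;
import java.time.Period;

// Sketch: deriving a deletion-eligibility date from a statutory retention period.
public class RetentionCheck {
    static LocalDate retentionEnd(LocalDate storedOn, Period retention) {
        return storedOn.plus(retention);
    }

    // Deletion is allowed on or after the retention-end date.
    static boolean mayDelete(LocalDate storedOn, Period retention, LocalDate today) {
        return !today.isBefore(retentionEnd(storedOn, retention));
    }
}
```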
Audit log
Enable the audit option for all types containing legally compliant content (Audit Log). If the platform is operated securely (Platform Security), users and applications can write content and metadata exclusively via the arveo REST API. arveo logs all user or application update operations on content and metadata to the audit table.
All changes to content or metadata are persisted as a traceable and immutable version (Versioning) on your storage system, and an audit entry containing the author and the timestamp of the change is written to the audit log table (Audit Log). If a document is updated, the version number is incremented and saved. Although all versions are traceable and accessible via the API, we recommend making the version number system property visible in the application so that copies of the original can be identified easily.
Ensure that the @Overwrite option is not set for legally compliant document types. If overwrite is turned on, it is possible to manipulate the originally saved content and compromise the document without creating a versioned copy.
The audit logs are subject to the retention periods of commercial and tax law. Ensure that the audit logs are kept for the legal retention period (10 years). We recommend that the operator of the platform exports and clears the audit tables using database tools after 2 years. Save the dumps as arveo documents with a 10-year retention period. If you need access to older audit logs, you can easily download the dumps and upload them to the database.
The audit tables must be protected against unauthorized access by users. Do not allow write-access to the audit tables to anyone but the arveo services. Only data protection officers are allowed to have controlled read access to the audit data.
Check the audit logs regularly to find unauthorized user activities.
Download and migration
All documents in arveo that are subject to retention are available via the REST API and can be downloaded. The integrity and availability of the content are the responsibility of the provider and operator of the platform. The provider must ensure that failures of the storage systems for database and content are identified at an early stage, and must take appropriate countermeasures. See the chapter Fail Safety for technical and organizational measures for high availability of the arveo platform.
In the event that data has to be migrated, arveo offers an extensive export API that enables content and metadata to be exported. arveo saves in the database the hash value (https://en.wikipedia.org/wiki/Cryptographic_hash_function) that was determined when the content was first uploaded (Upload Data). This hash value can be used as a checksum to detect accidental or intentional corruption of data. If the hash value of the content after the migration is identical to the original hash, the migration report proves the correctness of the migration process. To prove the completeness of the migration process, the arveo API allows you to export a list of all records, cases and documents in a document type.
Legally compliant migration
- Prerequisites for the migration
  - Use the verify option and the strongest hash check available in your solution when uploading content to arveo.
- During the migration
  - Download content and metadata (including the original hash and retention period).
  - Upload metadata and content to the migrated platform and set the retention period to the exact same value.
  - Calculate the hash on the migrated platform by downloading the content.
- After the migration
  - Correctness: compare hash, metadata and retention period for each original and migrated record, case and document.
  - Completeness: check that each migrated document can be found using its unique identifier.
  - Traceability: create a report for each document type. Report the content hash evidence and the metadata for all migrated objects.
Upload the migration report to the migrated platform and set the retention period to the retention date of the document with the longest retention period within the report.
Depending on your retention policy, you can create separate reports per retention period range (e.g. by year).
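The correctness check described above can be sketched in a few lines. This is an illustrative example, not the arveo export API: the record structure (`hash`, `retention_period`, `metadata` keys) is an assumption made for the sketch; only the SHA-256 checksum principle comes from the source.

```python
import hashlib

def sha256_hex(content: bytes) -> str:
    """SHA-256 checksum, as stored by arveo when the content was first uploaded."""
    return hashlib.sha256(content).hexdigest()

def verify_migration(original: dict, migrated: dict, migrated_content: bytes) -> list:
    """Compare hash, metadata and retention period of one migrated document.
    Returns a list of discrepancies; an empty list means the check passed."""
    errors = []
    if sha256_hex(migrated_content) != original["hash"]:
        errors.append("content hash mismatch")
    if migrated["retention_period"] != original["retention_period"]:
        errors.append("retention period mismatch")
    if migrated["metadata"] != original["metadata"]:
        errors.append("metadata mismatch")
    return errors

# A document whose migrated copy is byte-identical passes with no findings.
doc = {"hash": sha256_hex(b"invoice content"),
       "retention_period": "2035-12-31",
       "metadata": {"type": "Invoice", "number": "4711"}}
assert verify_migration(doc, doc, b"invoice content") == []
```

Collecting the per-document results of such a check into one report per document type yields the migration report mentioned above.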
Data integrity
arveo guarantees high availability, reliability and high performance at all times. The system must be protected from manipulation attempts by proven, well-thought-out concepts. The data that is stored and managed in the system is protected via the API. Access and editing rights are managed via ACLs; user rights are based on the concepts for roles, groups and ACLs. More detailed information is provided in the relevant chapters of this manual.
Access to all data (documents, metadata) takes place exclusively via the API, with the corresponding protection mechanisms so that the security of the data is guaranteed at all times.
Content storage
The operator must take appropriate technical or organizational measures to ensure that the data is stored in the storage in such a way that it cannot be changed within the legally prescribed retention period.
Enable the verify option for all clients and integrations. The upload API can optionally verify the uploaded content: the content service downloads the just-uploaded stream from the content storage and compares its hash once again with the expected value (Upload content). arveo stores the hash value in a system property and persists it in the document type's metadata table.
In case of very sensitive data you can enable transparent encryption (Encryption) to follow the data protection rules and prevent your administrators from accessing document content.
Databases
For the supported database, PostgreSQL 12, you can select between different data replication strategies:
-
Asynchronous replication (backup or mirror): Enables an asynchronous disaster recovery. Your database is periodically mirrored.
-
Synchronous database cluster: Transactions are synchronously replicated on more than one master node. The provider of the PostgreSQL 12 cluster must guarantee that data is stored redundantly to reduce potential data loss.
The provider of the Apache Solr 8.6 cluster must guarantee that data is replicated between the nodes and that the backup strategy prevents data loss.
Fail safety
The system operator is responsible for data security and recovery. The operator must ensure that the backups of the data are checked regularly and that recovery is reliably possible in the event of a failure. The IT processes that ensure the secure, redundant and highly available storage of arveo data in databases and object or file system storage systems are particularly decisive for the proper operation of the platform. These are the responsibility of the operator of the platform, who must implement the availability and security of the systems in accordance with legal and organizational requirements.
We strongly recommend using a redundant file system or object storage system. If you do not at least back up your data periodically, data loss is likely. For high availability with almost zero data loss, your storage system should replicate the written content and data synchronously. The operating team of the platform must ensure that appropriate replication is set up and monitored.
Object storages with REST APIs are designed for the cloud. If you decide to use storage from the cloud (public or private), we recommend using object storage via the S3 API. Object storages provide a high level of redundancy (even geo-redundancy) and fail safety. The S3 REST API is very tolerant of network and infrastructure failures.
Ensure technically and organizationally that there is sufficient space for storing the data.
For best high availability, the provider of your storage system must protect the stored data against accidental, malicious, or disaster-induced loss. The better your data replication, the better your availability in case of a failure.
To achieve high availability for arveo, the provider must guarantee that all required content services run as a cluster.
Security
Operators
The provider of the arveo services should ensure that only authorized data protection officers and administrators have data write (INSERT, UPDATE, DELETE) permissions for the database and the content repository.
An administrator can only illegally manipulate content if he can access both the database and the content storage, because the control hash value of the content is stored in the database. Ensure that none of your administrators has exclusive and unattended access to both the content storage and the database.
Distributed management roles of the storage systems and the arveo transparent encryption feature make your system more forgery-proof!
The activities of administrators with extensive rights must be logged by the operator. The logs are subject to the retention periods of tax law and must be checked regularly.
Platform
To prevent unauthorized access to the arveo platform the provider must:
-
ensure that HTTPS communication is enabled for all clients, applications, 3rd party components and services (Services).
-
enable OAuth 2.0 or X.509 certificate authentication and authorization for all arveo services (OAuth2.0). All arveo services require authentication, ensuring that only arveo services or authenticated and authorized users can use the API. We recommend using a state-of-the-art authentication service like Keycloak and enabling SSO with at least 2-factor authentication.
-
take suitable technical or organizational actions against unauthorized changes to the data, such as firewalls, VPNs, or transparent encryption with arveo or at hardware level.
-
provide adequate protection of passwords by using a state-of-the-art IDP such as Keycloak or MS Active Directory and increasing the password complexity accordingly.
-
take actions against denial of service attacks.
arveo Content Services
The administrators of the arveo platform must:
-
make sure that only authorized persons receive an account that grants access to arveo documents;
-
ensure that objects are protected against unauthorized access using ACLs. We recommend defining a separation of functions and implementing this via ACLs. To achieve the best data security assign ACLs to all records, cases and documents. Make sure that for all used ACLs the assignment of access rights to users and groups is carried out regularly (e.g. Invoice document type, accounting: write, employees: read);
-
ensure that the activities of managers who can change ACLs are logged via arveo audit and checked at regular intervals;
-
organizationally ensure that the passwords of the arveo administration users are changed regularly.
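The ACL-based separation of functions recommended above boils down to mapping groups to rights per object. The following sketch is illustrative only — arveo's actual ACL model is richer than a plain group-to-rights dictionary — but it shows the core check behind the invoice example (accounting: write, employees: read):

```python
# Hypothetical ACL for an Invoice document type; names are taken from the
# example in the text, the data structure is an assumption for this sketch.
ACL_INVOICES = {"accounting": {"read", "write"}, "employees": {"read"}}

def has_right(acl: dict, user_groups: set, right: str) -> bool:
    """Grant the right if any of the user's groups carries it in the ACL."""
    return any(right in acl.get(group, set()) for group in user_groups)

assert has_right(ACL_INVOICES, {"accounting"}, "write")
assert has_right(ACL_INVOICES, {"employees"}, "read")
assert not has_right(ACL_INVOICES, {"employees"}, "write")
```

Reviewing which groups appear in such mappings, and who is a member of those groups, is exactly the regular assignment check the list above calls for.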
Data Store
Persistence architecture
arveo guarantees forgery-proof long term availability of your content and metadata.
All revisions of content or metadata are stored as traceable and immutable versions (Versioning) on the storage systems. The content service checks the integrity of uploaded content by computing SHA-256 hashes on client and server side. Additionally, an audit entry is written to the audit log table (Audit Log). arveo provides role-based access control at object level and allows you to prevent unauthorized access to content and metadata.
arveo protects content and metadata by software design. arveo only allows access to content and metadata via the arveo REST API. As only arveo and highly authorized administrators have data write rights for the database and the storage, content cannot be deleted or manipulated by unauthorized persons.
Together with arveo's capabilities to manage the retention periods of documents and records (Retention Periods) arveo guarantees a GDPR and/or DSGVO compliant data protection and data privacy.
arveo meets the requirements of a revision-proof long-term archive and is a cornerstone for the legal compliance of your IT systems.
Because the new legal data privacy and protection acts make it necessary to erase data even before the expected retention period has expired, arveo does not use hardware retention features.
If needed, you can add verifiable evidence records to the documents (signatures, timestamps) to prove the integrity and authenticity of content and author. The creation of the evidence is not a feature of arveo; it only stores the record together with the content.
In this chapter you will find all the information on how to set up a secure and legally compliant content service platform with arveo.
Data types
arveo distinguishes three kinds of data and stores each to the most suitable storage system.
- Content: arveo stores unstructured content like documents, audio, video and images on either a cloud object storage or a file system storage. Most cloud providers and storage vendors (AWS S3, NetApp ONTAP, EMC Elastic Cloud, etc.) provide file system or object storage systems. Object storages are organized in buckets and allow you to store an almost unlimited number of objects in a bucket. arveo accesses the content via the standard S3 REST API. For optimized and fast access to frequently used content objects, arveo can integrate a NoSQL key-value cache DB like Redis.
- Structured system properties: contain all primary keys and technical information about documents, containers and folders. The data has a fixed data model and requires the highest performance, consistency and transaction support. arveo saves this data in a relational database.
- Customer-specific metadata: The data model is different for each document, container or folder type. This metadata is semi-structured, and new properties might be added during the life cycle of the application.
  - Eventually consistent customer information: Sometimes the consistency of the data is not critical, but high performance and facet support must be guaranteed when filtering by any value, without the risk of a full table scan. arveo saves this customer metadata in the NoSQL document DB Apache Solr 8.6, which is highly efficient for inserting and searching and offers automatic completion and facets.
  - Consistent customer keys: These properties require the highest performance, consistency and transaction support. arveo saves this data in a relational database.
High availability
The high availability (HA) of arveo depends highly on the HA of the storage systems for all kinds of data. Each of the storage systems, and as a result the arveo services, is subject to the CAP (Consistency, Availability and Partition Tolerance) theorem, which relates the availability and fail safety of a distributed system to:
-
Consistency: All clients see the same content and metadata.
-
Availability: All clients can read and write.
-
Partition Tolerance: the system is fail safe when one or more nodes fail.
The CAP theorem, in a nutshell, states that a distributed system cannot provide all three properties at once, but only two of them.
As arveo is an ECM cloud platform, consistency and availability (read/write) of content and metadata are most important. arveo tolerates that a network or message failure of either the primary content storage or database node can cause exceptions in the client application. The arveo services do not store data within their containers and focus on scalability and partition tolerance.
The arveo microservices should be deployed as containers in your cloud environment (e.g. Kubernetes) and auto-scaling should be implemented.
Data integrity
arveo ensures the immutability and integrity of all your digital content and evidence records by an automated hash check each time content is up- or downloaded.
Upload
Hash check: When you use the upload content API, the client side and the content service compute a SHA-256 hash over the streamed data. The upload succeeds only if both values are identical. The upload API also allows you to pass an expected SHA-256 value; the API will only return OK if the server-side hash matches the expected hash.
Verify: The upload API can optionally verify the uploaded content. The content service downloads the just-uploaded stream from the content storage and compares its hash once again with the expected value (Upload Content). arveo stores the hash value in a system property and persists it in the document type's metadata table.
The verify option of the upload API may slow down your system when uploading large amounts of data.
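The client-side half of the hash check can be sketched as follows. This is not the arveo SDK — the chunked streaming function and the simulated "server side" are illustrative — but it shows the principle: both sides hash the same stream and the upload is accepted only if the values match.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # hash in chunks so large uploads are not loaded into RAM

def stream_hash(stream) -> str:
    """Compute the SHA-256 hash of an upload stream, chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()

# The server computes the same hash over the received stream; the upload is
# accepted only if both values (and an optional expected hash) are identical.
content = b"%PDF-1.7 example content"
client_hash = stream_hash(io.BytesIO(content))
server_hash = stream_hash(io.BytesIO(content))
assert client_hash == server_hash == hashlib.sha256(content).hexdigest()
```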
Transactions
The arveo REST API is stateless; there is no session. All REST API calls are atomic, and all database commands of one call are executed within a single transaction. arveo guarantees the atomicity of transactions: to avoid inconsistent states, aborted transactions are rolled back, and hanging transactions are removed and rolled back to avoid database locks.
The database provider should configure the transaction deadlock timeout on your database to avoid locks that can decrease the performance of your UPDATE and DELETE calls.
Download
When you use the download API (Download Content), the client SDK computes the SHA-256 hash of the downloaded stream and compares it to the hash value in the system property of the document type. If the hash does not match the upload hash value in the database, the download fails with a data integrity exception telling the caller that the data on the storage was most likely manipulated.
An administrator can only illegally manipulate content if he can access both the database and the content storage, because the control hash value of the content is stored in the database. Ensure that none of your administrators has exclusive and unattended access to both the content storage and the database.
Distributed management roles for the storage systems and the arveo transparent encryption feature can make your system forgery-proof!
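The download-side check can be sketched like this. The exception name and function are illustrative, not the arveo SDK API; the behavior — fail the download when the computed hash differs from the stored upload hash — is what the text describes.

```python
import hashlib

class DataIntegrityError(Exception):
    """Raised when downloaded content does not match the stored upload hash."""

def verify_download(content: bytes, stored_hash: str) -> bytes:
    """Compare the hash of the downloaded stream with the hash stored at upload."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != stored_hash:
        raise DataIntegrityError("content on storage was most likely manipulated")
    return content

stored = hashlib.sha256(b"original content").hexdigest()
assert verify_download(b"original content", stored) == b"original content"

try:
    verify_download(b"tampered content", stored)
except DataIntegrityError:
    pass  # mismatch detected, as expected
```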
Content storage
arveo supports evidence-preserving long-term storage of your content and metadata by storing the content in a legally secure manner on either an S3 object storage or a file system. The storage must be redundant. Object storage systems like AWS S3, NetApp or EMC Elastic Cloud Storage guarantee the long-term availability and integrity of your content.
All changes to content or metadata are persisted as traceable and immutable versions (Versioning) on your storage system, and an audit entry is written to the audit log table (Audit Log). Each time the metadata (including comments and annotations) or the content of a document is changed via the API, arveo creates a new version and a new entry in the version management table containing the author and the timestamp of the change. The Update API allows you to add a comment to each version. The Version Management API provides access to all version information as well as the metadata and content of previous versions.
To ensure that the content is immutable, only arveo should have write access to the storage system; apart from that, only authorized data protection officers and administrators should have write access. In case of very sensitive data you can enable encryption (Encryption) to follow the data protection rules and prevent your administrators from accessing document content.
For best high availability, the provider of your storage system must protect the stored data against accidental, malicious, or disaster-induced loss. The better your data replication, the better your availability in case of a failure.
Data replication (redundancy)
For both supported storages (S3, file system) you can select between different data replication strategies:
-
Backup or mirror: enables asynchronous disaster recovery. Your content data is periodically mirrored;
-
Synchronous replication;
-
Asynchronous replication.
Fail Safety (Consistency, Availability)
As arveo stores each version of the content as an immutable object, clients cannot receive outdated data. If the replication is asynchronous, the worst case is that clients get a read error.
If the storage is offline, arveo is not available and the system has an outage. If the storage allows only read access, arveo can download content but upload operations fail.
If the storage node has a long-term outage, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since then.
We strongly recommend using a redundant file system or object storage system. If you do not at least back up your data periodically, data loss is likely. For high availability with almost zero data loss, your storage system should replicate the written content and data synchronously. The operating team of the platform must ensure that appropriate replication is set up and monitored.
You can configure different storage locations (cloud storage or on-premises) for your content and document types (Storage Configuration). Reduce costs by storing data that is not compliance-relevant or legally required, such as PDF/A renditions of documents, on storage systems with lower availability and performance SLAs.
Object storages with REST APIs are designed for the cloud. If you decide to use storage from the cloud (public or private), we recommend using object storage via the S3 API. Object storages provide a high level of redundancy (even geo-redundancy) and fail safety. The S3 REST API is very tolerant of network and infrastructure failures.
Consistent metadata storage (relational database)
The relational database PostgreSQL 12 is responsible for the fully consistent processing of structured metadata and transactions.
Data replication (redundancy)
For the supported database, PostgreSQL 12, you can select between different data replication strategies:
-
Asynchronous replication (backup or mirror): Enables an asynchronous disaster recovery. Your database is periodically mirrored.
-
Synchronous database cluster: Transactions are synchronously replicated on more than one master node.
The provider of the PostgreSQL 12 cluster must guarantee that data is stored redundantly to reduce potential data loss.
Fail safety (consistency, availability)
If the database cluster is down or allows only read access, arveo is not available (denial of service). If the database has a long-term outage and the data files are affected, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since then.
Eventually consistent metadata storage (NoSQL document database Apache Solr 8.6)
arveo uses modern NoSQL storage technologies to guarantee high search performance and horizontal scalability at all times. We store semi-structured or dynamic document metadata in the NoSQL document database Apache Solr 8.6.
Solr is an open source search platform that has been partially integrated into arveo.
Based on the type definitions that are created in arveo, arveo automatically creates a schema that Solr uses. In addition, for each client that is created in arveo, a new collection is also created in Solr, so that there is also a separation of data there.
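The derivation above — type definition in arveo, schema and per-client collection in Solr — can be illustrated with a small sketch. The field-type mapping and the collection naming scheme here are assumptions for illustration, not arveo's internal implementation; they only show the shape of the payloads Solr's Schema API would receive.

```python
# Hypothetical mapping from arveo attribute types to Solr field types.
TYPE_TO_SOLR = {"STRING": "string", "LONG": "plong", "DATE": "pdate"}

def solr_schema_fields(type_definition: dict) -> list:
    """Build the 'add-field' payloads for Solr's Schema API from a type definition."""
    return [
        {"name": name, "type": TYPE_TO_SOLR[t], "indexed": True, "stored": True}
        for name, t in type_definition["properties"].items()
    ]

def collection_name(client: str, type_name: str) -> str:
    """One collection per arveo client keeps tenant data separated in Solr."""
    return f"{client}_{type_name}"

invoice = {"name": "Invoice",
           "properties": {"invoiceNo": "STRING", "amount": "LONG"}}
assert collection_name("tenant1", "invoice") == "tenant1_invoice"
assert solr_schema_fields(invoice)[0] == {
    "name": "invoiceNo", "type": "string", "indexed": True, "stored": True}
```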
Data Replication (Redundancy)
Set up a cluster of replicated nodes for Apache Solr 8.6. Refer to the Apache Solr 8.6 documentation on how to set up a redundant cluster.
Fail safety (availability, partition tolerance)
If the Solr cluster is down, arveo is still available but free customer searches fail. If one node is down or the cluster is read-only, arveo is still available but searches may return outdated results. If the cluster has a long-term outage and the data files are affected, the potential data loss is limited by the time that has passed since the last replication and the number of objects stored since then.
The provider of the Apache Solr 8.6 cluster must guarantee that data is replicated between the nodes and that the backup strategy prevents data loss.
Clustering
Each arveo service can be configured as a service cluster to achieve HA. Depending on the deployment, you can either set up an application server cluster (WAR deployment) or run the containerized applications on a cloud platform like OpenStack with Kubernetes.
Fail safety (consistency, availability)
| Service | Failure risks | Recommended |
|---|---|---|
| User Management Service | No login possible, system outage | Cluster 2 |
| Config Service | Configuration not available to all nodes, system outage | Cluster 2 |
| Registry Service | Service registry not available, system outage | Cluster 2 |
| Document Service | Store, edit and version documents and metadata not available, system outage | Cluster 2-n, automatic scale up/down by load |
| SAP Archive Link Service | SAP archive link not available, SAP outage | Cluster 2-n, automatic scale up/down by load |
| Document Conversion Service | Conversion to PDF/A not available | Cluster 2-n, automatic scale up/down by load |
| Enterprise Integration Service | Job execution paused and integration with external systems not available | Cluster 2-n |
| Federation Service | Access to external repositories (Documentum, Saperion) not available | Cluster 2-n, automatic scale up/down by load |
| Access Control Service | Access to objects with access control lists fails, partial system outage | Cluster 2 |
Required 3rd party services
To operate arveo successfully with high availability, the operator of the platform must provide the following services as clusters.
| Service | Failure risks | Recommended |
|---|---|---|
| Active MQ | Asynchronous operations are not triggered | Cluster 2 |
| PostgreSQL 12 | Access to metadata not available, system outage | Cluster 2-n depending on load and configuration of the PostgreSQL 12 cluster |
| Apache Solr 8.6 | Enterprise search not available | Cluster 2-n depending on load and configuration of the Apache Solr 8.6 cluster |
| Content Storage | Content access not available, system outage | Storage cluster depending on provider |
| Authentication Service (optional) | Login not available via OAuth 2.0, system outage | Cluster 2 |
| Monitoring (optional) | ELK (Elasticsearch, Logstash, and Kibana) | Cluster 2 |
To achieve high availability for arveo, the provider must guarantee that all required content services run as a cluster.
Data deletion
By default, documents of a specific document type stored in arveo keep their metadata in the configured database and their content in the object storage. When a version is created, the content or metadata is stored as a traceable and immutable version (Versioning) in the database and on the storage system. That means there are separate content objects and database entries for each version. Each document can have a retention period that ensures that the document cannot be deleted before the period expires.
You can delete or purge any object with the arveo Delete API if you have the DELETE right for the document type and the object's ACL, and the retention period has not expired.
The delete method deletes all entities including all versions of the object in the database, but it does not delete the content objects or files. The delete operation cannot be undone; the data is permanently deleted.
The purge method additionally erases the content objects or files from the content storage.
If you delete objects only in the database, the content objects are orphaned: it is impossible to restore them and almost impossible to delete them later, because no relation is left in the database. The content objects remain as data trash in the system and cannot be accessed via the API.
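The difference between delete and purge can be sketched with in-memory stores standing in for the database and the content storage. The function and variable names are illustrative, not the arveo API; the semantics match the text: delete removes the database entities and leaves the content behind, purge erases both.

```python
# Toy stand-ins for the metadata database and the content storage.
database = {"doc-1": {"versions": [1, 2], "content_key": "obj-1"}}
content_storage = {"obj-1": b"binary content"}

def delete(doc_id: str) -> None:
    """Remove all database entities; content objects stay behind (orphaned)."""
    del database[doc_id]

def purge(doc_id: str) -> None:
    """Remove database entities AND erase the content objects from the storage."""
    content_key = database[doc_id]["content_key"]  # look up before deleting
    delete(doc_id)
    del content_storage[content_key]

purge("doc-1")
assert "doc-1" not in database
assert "obj-1" not in content_storage  # no orphaned data trash left behind
```

Had we called `delete("doc-1")` instead, `content_storage` would still hold `obj-1` with no database relation pointing at it — exactly the orphaned-content situation warned about above.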
Recycle bin
Any document, container or folder type can use the optional recycle bin feature. If it is enabled, entities in the type definition can be moved to and restored from the recycle bin.
The recycle bin is implemented as a boolean database system property DELETED. Entities in the recycle bin will be filtered from normal queries by default, but a client can compose search expressions that override this behavior (see Recycle Bin).
If you delete or purge an object in the recycle bin, it is deleted like a document without the recycle bin feature and cannot be restored.
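The DELETED system property and its default filtering can be illustrated with a small SQLite sketch. The table and column names are illustrative, not arveo's actual schema; the point is that normal queries exclude recycle bin entries while an explicit search expression can include them.

```python
import sqlite3

# Toy stand-in for a document type table with the boolean DELETED system property.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoice (id TEXT, deleted INTEGER DEFAULT 0)")
db.executemany("INSERT INTO invoice VALUES (?, ?)",
               [("doc-1", 0), ("doc-2", 1)])  # doc-2 is in the recycle bin

# Normal queries filter out recycle bin entries by default.
normal = db.execute("SELECT id FROM invoice WHERE deleted = 0").fetchall()
# A client can compose a search expression that overrides this behavior.
override = db.execute("SELECT id FROM invoice").fetchall()

assert normal == [("doc-1",)]
assert len(override) == 2  # includes the entry in the recycle bin
```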
For compliance reasons, the audit entries in the database are not deleted by the Delete API; the delete operation is written to the audit log. The operator of the platform must clean up the audit table after the legal retention period has expired. We recommend backing up the audit logs to meet the legal requirements of data protection and to ensure that the backups can be restored within the legal retention period.
Automated recycle bin emptying
It is possible to empty your recycle bin with an automated job scheduled in the Enterprise Integration Service of arveo. You can activate the predefined empty-recycle-bin job and change the age from the default value of 6 months to an age of your choice. The job permanently deletes all entries that have been in the recycle bin for longer than the set age.
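The age check the job performs can be sketched in a few lines; the function name and the approximation of 6 months as 182 days are assumptions for this illustration.

```python
from datetime import datetime, timedelta, timezone

def expired(deleted_at: datetime, now: datetime,
            age: timedelta = timedelta(days=182)) -> bool:
    """True if an entry has been in the recycle bin longer than the set age
    (default roughly the 6-month default mentioned in the text)."""
    return now - deleted_at > age

now = datetime(2024, 12, 1, tzinfo=timezone.utc)
assert expired(datetime(2024, 1, 1, tzinfo=timezone.utc), now)       # ~11 months
assert not expired(datetime(2024, 11, 1, tzinfo=timezone.utc), now)  # 1 month
```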
Recovery log
In addition to the recycle bin feature, arveo offers an additional safety layer to recover permanently deleted entities. By annotating a type definition with @Recovery, it is possible to define a time period in which permanently deleted entities are kept in a system-wide recovery table before they are removed completely. An entity in such a type definition that is deleted (or purged) is removed from the type definition's table (and its version table). A copy of each version of the entity is stored in the recovery table, making it possible to restore it manually. If the entity is a document, its content is not deleted from the storage until the entity is removed from the recovery table.
There is no API to restore data from the recovery table. This feature is only intended as a last backup in order to make accidentally deleted data available to the business through an administrator. The admin can copy the content file from the storage together with the JSON metadata and send it to the business department.
Recovery log emptying
The system management API provides a method to remove expired entities from the recovery table. An entity is considered expired when its keep-until timestamp is in the past at the moment the method is invoked. A user who calls this method needs the ECR_PURGE_RECOVERY_TABLE authority (see Access Rights).
The recovery of deleted entities is a manual process. The recovery table contains a JSONB column containing a JSON representation of the entire entity including attributes, content information and modification information. Each version of an entity is contained in the recovery table as a separate row.
It is possible to empty the recovery log with an automated custom job scheduled in the Enterprise Integration Service of arveo. The job must execute the Management API method to empty the recovery table.
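The recovery table layout described above — one row per entity version, each carrying a JSON snapshot and a keep-until timestamp — can be sketched with SQLite. Column names are illustrative, not arveo's actual schema.

```python
import json
import sqlite3

# Toy stand-in for the system-wide recovery table: one row per entity version.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE recovery
              (entity_id TEXT, version INTEGER, keep_until TEXT, snapshot TEXT)""")

def log_deletion(entity_id: str, versions: list, keep_until: str) -> None:
    """Store a JSON snapshot of every version of a permanently deleted entity."""
    for v in versions:
        db.execute("INSERT INTO recovery VALUES (?, ?, ?, ?)",
                   (entity_id, v["version"], keep_until, json.dumps(v)))

log_deletion("doc-1",
             [{"version": 1, "title": "draft"}, {"version": 2, "title": "final"}],
             "2025-06-30")

# Manual recovery: an administrator reads the JSON snapshots back out.
rows = db.execute("SELECT snapshot FROM recovery WHERE entity_id = ?",
                  ("doc-1",)).fetchall()
assert [json.loads(r[0])["title"] for r in rows] == ["draft", "final"]
```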
Installation
Deployment options
The lightweight and stateless services are delivered as containers for all platforms and allow arveo to scale horizontally automatically. Customers have the choice between an on-premises, cloud or hybrid installation.
The deployment may be done as:

- Docker images (for the arveo services): A Docker image is a template that contains a set of instructions for creating a container. Several containers can be started from one image;
- an executable JAR: integrate the content services in your Java application and run them on any platform that provides a JVM;
- a .war file: deploy the services as web applications in an application server like Tomcat;
- a Spring Boot application: deployed as a self-running service using an embedded Undertow servlet container;
- a Debian package: Debian packages are used for software installation on Debian-based operating systems;
- Kubernetes Helm charts: deploy the content services as containerized applications in your Kubernetes environment with flexible Helm charts. This enables load-dependent, automated service provision.
System requirements
This chapter describes the system requirements for an on-premises installation. The configuration and deployment of all required artefacts is performed by Eitco or a partner using the automated deployment tool Puppet.
General prerequisites
Firewall
Some firewall permissions are required. The IP addresses and the ports are customer-specific. In order to notify the provider of this, the customer must fill out the form customer-specific information.
Network Access
SSH access from the Eitco network to all customer-specific systems (including server) is required so that the installation can be carried out. Access to the official Ubuntu package sources is required. This is done either in the form of direct access via the Internet or by providing a local copy of the corresponding repository.
SMTP Mail
In addition, SMTP server access is required for sending mail, as well as access to the Eitco Puppet Master via VPN. The following parameters must be provided by the customer so that any error messages from the HL7 Integration Service can be sent by email:
- SMTP_SERVER
- SMTP_PORT
- SMTP_STARTTLS = true / false
- SMTP_USER
- SMTP_PASSWORD
- MAIL_TO
- MAIL_FROM
MAIL_TO is the address to which the mails are sent and MAIL_FROM is the sender address.
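How the parameters fit together can be sketched with Python's standard mail modules. Server coordinates and credentials below are placeholders, and the actual send is commented out because it needs a reachable SMTP server; only the message construction runs.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values for the customer-provided parameters listed above.
SMTP_SERVER, SMTP_PORT, SMTP_STARTTLS = "mail.example.com", 587, True
SMTP_USER, SMTP_PASSWORD = "svc-hl7", "secret"
MAIL_FROM, MAIL_TO = "hl7@example.com", "ops@example.com"

msg = EmailMessage()
msg["From"] = MAIL_FROM
msg["To"] = MAIL_TO
msg["Subject"] = "HL7 Integration Service error"
msg.set_content("Job X failed: ...")

# The send itself (requires a reachable SMTP server):
# with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as smtp:
#     if SMTP_STARTTLS:
#         smtp.starttls()
#     smtp.login(SMTP_USER, SMTP_PASSWORD)
#     smtp.send_message(msg)

assert msg["To"] == MAIL_TO and msg["From"] == MAIL_FROM
```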
Reference Integration System
A reference system (in the form of a VM or similar) is required to test the system. It must have the same setup as the customer's client systems (i.e. the same web browser, with the same settings, etc.). In addition, terminal/RDP access must be provided so that Eitco can test the client installation.
Web browser
For the administration user interfaces the following web browsers are supported: Safari, Google Chrome, Microsoft Edge, Mozilla Firefox, each in the current version.
Containerized Applications
For the installation of the product, certain requirements regarding the hardware, software and infrastructure to be provided must be met. In a typical cloud environment each arveo service is deployed as a containerized application and is hosted and scaled by a cloud operating system. However, a different setup can be used, depending on the customer infrastructure and the load of the system (see Deployment Options).
The following chapter describes the minimum CPU and RAM requirements of each arveo service in a production environment.
| Service | CPU | RAM |
|---|---|---|
| Document Service | 4x > 2 GHz | >= 32 GB |
| User Management Service | 1x > 2 GHz | >= 2 GB |
| Registry Service | 1x > 2 GHz | >= 512 MB |
| Config Service | 1x > 2 GHz | >= 512 MB |
| Access Control Service | 1x > 2 GHz | >= 2 GB |
| Audit Service | 1x > 2 GHz | >= 512 MB |
| SAP Archive Link Service (optional) | 1x > 2 GHz | >= 1 GB |
| Document Conversion Service (optional) | 1x > 2 GHz | >= 2 GB |
| Enterprise User Management Service (optional) | 1x > 2 GHz | >= 1 GB |
| Enterprise Integration Service (optional) | 1x > 2 GHz | >= 1 GB |
| Federation Service (optional) | 1x > 2 GHz | >= 2 GB |
The number of started services in each service group and the assigned CPU and RAM depend very much on the load and on the number of documents and objects in the database. You should always monitor the system and scale up or down on demand. Services like the Document Conversion Service or the Enterprise Integration Service in particular can produce heavy load and require many containers consuming RAM and CPU.
For a test or development system the requirements are lower and each service requires: < 1 CPU, 256 MB for all services. |
Typical Non-Containerized Installation
Assuming that the installation is performed as Spring Boot services, we recommend setting up a minimum of 3 machines. The database and the Document Service carry the highest load and should be deployed on separate machines. All other services and 3rd-party services can run on one OS instance. Some services, like Archive Link or Document Conversion, may consume a lot of CPU and RAM, which can make it necessary to move them to separate machines.
-
System machine 1 - database. The PostgreSQL database is installed here.
Component | Recommendation | Note |
---|---|---|
CPU | 4x (> 2 GHz) | |
RAM | At least 16 GB | Depending on the size of the database |
DB Storage | Proportional to the number and kind of the entities | Recommendation: should be stored on separate storage |
Log files | Depending on the volume of changes to the database | Recommendation: should be stored on separate storage |
OS | Ubuntu 18.04/20.04 | The operating system recommendation is optional; any system satisfying the requirements of the PostgreSQL database may be used |
-
System machine 2 - Document Service is installed here.
Component | Recommendation | Note |
---|---|---|
CPU | 4x (> 2 GHz) | |
RAM | 32 GB | |
Storage | Proportional to the size of the content objects | These storages are supported: |
OS | Ubuntu 18.04/20.04 | The tests are performed on a Debian machine, hence it is recommended to install a Debian-based distribution, for example a current LTS version of Ubuntu |
The storage is meant for storing the arveo content objects of type Document, meaning binary content. All metadata and system properties are stored in the database, see System machine 1 above. |
-
System machine 3 - Here all other services of arveo are installed: see Content Services, 3rd party services
Component | Recommendation | Note |
---|---|---|
CPU | 4x (> 2 GHz) | |
RAM | 16 GB | |
OS | Ubuntu 18.04/20.04 | The operating system should be Debian-based |
The importance of testing should not be underestimated: there should always be a way to test specific cases without trying them out on a production system. For this reason, it is important to create a test system that has the same specification and a similar data set as the production system. |
For the arveo services, JDK 11 or 16 is required. All the other recommendations listed above are non-binding, but they have proven to work well. In some cases, other recommendations can be made according to your individual project setup and the requirements of the project.
Installation
General concept
These instructions describe the installation procedure, the installation content and the items required for commissioning the product. We recommend controlling the rollout of the arveo services by a continuous integration process that provides all artefacts required for the deployment of the required content services and of your web solution and integrations.
Depending on the underlying platform, deployment takes place via binary service artifacts that are deployed on pre-installed VMs or via containerized applications that are made available in the host cloud system.
On-Premises Installation by Eitco
This chapter describes the compliant on-premises installation provided by Eitco. The configuration and deployment of all required artefacts is performed by Eitco or a partner using the automated deployment tool Puppet.
The customer provides several virtual machines that are configured by Eitco with the automated deployment tool Puppet (Puppet Deployment) in order to ensure a problem-free software rollout in the customer system.
Depending on the service level agreement Eitco can guarantee high availability, reliability and high performance at all times. The system has to be protected from manipulation attempts by technical or organizational measures. The data that is stored and managed in the system is protected via the API. The access and editing rights are managed via ACLs. User rights are based on the concepts for roles, groups and ACLs. More detailed information on this is provided in the relevant chapters of this manual.
All changes to the system and the data are logged via the API, and the changes are traceable via the audit log. If auditing is activated, every database change is logged. In order to guarantee the atomicity of the transactions and to avoid inconsistent states, all aborted transactions are removed and rolled back.
Access to all data (documents, metadata) is exclusively provided via the API, with the corresponding protection mechanisms so that the security of the data is guaranteed at all times.
Puppet
Puppet is open-source software developed by Puppet Labs and is used for the automated configuration and deployment of software deliveries. It provides network-based configuration management for servers running Unix-like operating systems as well as Windows. The admin tool allows the automated configuration of computers and servers as well as of the services installed on them. The arveo services are installed and configured with Puppet. After the server has been provided (see System Requirements), the Puppet Agent is installed on it, which then takes care of setting up the environment and the actual application. The duration of the installation process can vary and requires an adequate internet connection. The individual installation components are installed in the form of .deb packages. The installation is completely automated and carried out remotely.
Installed services
-
PostgreSQL 12 Database
-
Apache Solr 8.6 Document Database (full text)
-
JDK 11 or 16
-
Keycloak, Active Directory Authentication Service
-
ActiveMQ Message Service Hub
-
Tomcat 9 Application Server
-
Document Service
-
Registry Service
-
Configuration Service
-
User Management Service
-
Access Control Service
-
Audit Service (optional)
-
Document Conversion Service (optional)
-
Enterprise Integration Service (optional)
-
Enterprise User Management Service (optional)
-
Enterprise Federation Service (optional)
Customer Applications & Services
-
Eitco or customer application and integration services (typically a web client and Apache Camel integration endpoints)
Order of services
Below you find the order in which the services must be started. The content services may not work before the services they depend on are running.
All commands should be executed as root. When running as a non-root user, prefix the systemctl commands with sudo. |
The services are initially started by Puppet. After the installation of arveo has been successfully completed, the customer applications can be started. Additional information on registration, user management and the use of the web client can be found in the user and admin manual.
-
PostgreSQL 12: systemctl start/stop postgresql
-
Apache Solr 8.6: systemctl start/stop solr.service, systemctl start/stop zookeeper.service
-
Config Service: systemctl start/stop common_config_service.service
-
Registry Service: systemctl start/stop common_registry_service.service
-
User Management Service: systemctl start/stop common_user_management.service
-
ACL Service: systemctl start/stop common_access_control.service
-
Enterprise User Management Service: systemctl start/stop common_enterprise_user_management.service (optional)
-
Federation Service: systemctl start/stop cr_federation.service (optional)
-
Audit Service: systemctl start/stop common_audit.service (optional)
-
Document Service: systemctl start/stop ecr_repository_service.service
-
Document Conversion Service: systemctl start/stop common_document_conversion.service (optional)
-
Enterprise Integration Service: systemctl start/stop common_enterprise_integration.service (optional)
The current status of a service can be determined with systemctl status <service>. |
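The start order above can be captured in a small helper script. The following is a sketch only: the unit names are taken from the list above, the optional services are omitted, and ZooKeeper is started before Solr because SolrCloud depends on it. The script prints the commands as a dry run; pipe its output to sh (as root) to actually start the services.

```shell
# Required start order for the core services (optional services omitted).
SERVICES=(
  postgresql
  zookeeper.service
  solr.service
  common_config_service.service
  common_registry_service.service
  common_user_management.service
  common_access_control.service
  ecr_repository_service.service
)

# Dry run: print each start command in order instead of executing it.
for service in "${SERVICES[@]}"; do
  echo "systemctl start $service"
done
```

To shut the system down, issue the corresponding `systemctl stop` commands in reverse order.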
SSL Certificates
If all connections between the services are to be encrypted, SSL certificates are required. The following requirements apply: an X.509 certificate with an associated private key is required for each server. The certificate should be signed by an official CA or the company’s own CA; self-signed certificates can also be used. Note the following requirement: the X.509 extension “Subject Alternative Name” must contain all DNS names and IP addresses via which the respective systems are accessed.
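For test setups, a self-signed certificate satisfying the Subject Alternative Name requirement can be generated with OpenSSL (version 1.1.1 or newer for the -addext option). This is a sketch: the hostname and IP address are placeholders and must be replaced with the names actually used to reach the server.

```shell
# Generate a private key and a self-signed X.509 certificate whose
# Subject Alternative Name lists every DNS name and IP address used
# to access the server (placeholder values below).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout server.key -out server.crt \
  -subj "/CN=arveo.example.com" \
  -addext "subjectAltName=DNS:arveo.example.com,IP:192.0.2.10"

# Inspect the SAN extension to verify all names are present:
openssl x509 -in server.crt -noout -ext subjectAltName
```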
Licensing
The client software uses several 3rd-party licenses. The list of licenses can be accessed via the following link: https://<customername>.eitco.de/3rdpartylicenses.txt.
Backups
The database backup script is located at /var/lib/postgresql/backup.sh and can also be started manually at any time. The script is controlled by cron and is started automatically at 10 p.m. every day; its logs are written to /var/log/postgresql/backup.log on the database server. Before a manual run there must not be a folder with the current date under /backup/full/; if such a folder exists, it must be moved beforehand.
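The schedule described above corresponds to a cron entry like the following sketch; the actual entry installed on the server may differ.

```
# /etc/cron.d entry (sketch): run the backup script daily at 22:00
# as the postgres user, appending output to the backup log.
0 22 * * * postgres /var/lib/postgresql/backup.sh >> /var/log/postgresql/backup.log 2>&1
```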
Getting Started
In this guide you will create a simple application that implements a basic project file scenario. It will consist of a document type that represents documents used in a project.
Prerequisites
To complete the steps in this guide, you need the following tools installed on your machine:
-
JDK 11 or newer (https://adoptium.net/). Please use only LTS versions!
-
Apache Maven (https://maven.apache.org/)
-
An IDE of your choice (we recommend IntelliJ)
Maven configuration
To be able to access the maven artifacts of arveo, you need access to the EITCO Nexus repository.
Internal
When you are inside the company network or the VPN, you can use the internal Nexus that does not require authentication. The following maven settings.xml file shows how to configure the required repositories. The settings.xml file can be found in the .m2 directory in your user home directory.
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<pluginGroups>
</pluginGroups>
<proxies>
</proxies>
<servers>
</servers>
<mirrors>
</mirrors>
<profiles>
<profile>
<id>repos-default</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<properties>
</properties>
<repositories>
<repository> (1)
<id>nexus</id>
<url>https://nexus-intern.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository> (2)
<id>nexus</id>
<url>https://nexus-intern.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
</profiles>
</settings>
1 | The maven repository that contains maven artifacts of arveo |
2 | The plugin repository that contains maven plugins used when building the demo project |
External
When you are outside the company network and the VPN, you need to use the public Nexus repository that requires authentication. To do so, maven requires credentials. For security reasons, the credentials should be encrypted. Follow the instructions in the Maven documentation to configure a master password and to create an encrypted password.
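The Maven CLI provides two options for this. Run them and paste the generated values into the settings-security.xml and settings.xml files described in this section; the commands prompt for the respective plain-text password.

```shell
# Create the encrypted master password for ~/.m2/settings-security.xml:
mvn --encrypt-master-password

# Then create the encrypted server password for settings.xml:
mvn --encrypt-password
```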
You should now have created a settings-security.xml file in the .m2 directory like the one shown below:
<settingsSecurity>
<master>{encrypted-master-password}</master>
</settingsSecurity>
Then you have to adapt your maven settings.xml as follows:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<pluginGroups>
</pluginGroups>
<proxies>
</proxies>
<servers>
<server>
<id>nexus</id> (1)
<username>username</username> (2)
<password>{your-encrypted-password}</password> (3)
</server>
</servers>
<mirrors>
</mirrors>
<profiles>
<profile>
<id>repos-default</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<properties>
<solr.repositoryUrl>https://nexus.eitco.de/repository/raw-public</solr.repositoryUrl> (4)
</properties>
<repositories>
<repository>
<id>nexus</id> (5)
<url>https://nexus.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>nexus</id>
<url>https://nexus.eitco.de/repository/maven-private/</url>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<updatePolicy>never</updatePolicy>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
</profiles>
</settings>
1 | The server id is used to tie credentials to repositories |
2 | The username you use to logon to nexus |
3 | The password encrypted by maven using the master password |
4 | Sets the repository to use for the SOLR plugin used in the system tests |
5 | Tells maven to use the credentials for the server with id 'nexus' |
Make sure to use only https repositories when using credentials. Current maven versions already block the usage of unencrypted repository connections.
Step 1 - Type definitions
In the first step you will define the data model of your application. In arveo, this is done by creating Java (or Kotlin) interfaces which contain getters and setters for the fields that will be available on each individual entity type. There is a maven archetype to create a project that will contain those type definition interfaces and integration tests to try out the created types. More information about the archetype can be found here.
First, create a directory that will contain the project files for the demo application. Open a command line in this directory and perform the following operation.
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion=<arveo-version>
The archetype version is the arveo version that you are working with. The current version is 13.0.4.
Maven will start by downloading a couple of required artifacts. After that, the archetype plugin will be started in interactive mode. It will query for several settings required for the generated project. Some of the settings have default values that can be used.
-
class-name-prefix: A prefix that will be used for the generated classes. Use Demo for this guide.
-
groupId: The group-id of the artifact that will contain the types. Use de.eitco.demo.
-
artifactId: The artifact-id of the artifact that will contain the type. Use demo-types.
-
version: The version of the artifact. You can use the default value.
-
package: The package that will contain the types. You can use the default value.
In the last step, the archetype plugin shows the selected property values and asks for confirmation. After the settings are confirmed, the project will be generated in a folder called demo-types.
The archetype documentation contains a description of the generated project. For this guide, the files in implementation/types are the most important ones:
-
DemoModel.java: This file contains the type definition interface. The generated example is a simple document type with a name, a number and some system properties. The annotations used to define the type are documented here.
-
DemoTypeRegistration.java: A spring component that automatically registers your type(s) in the arveo service. Only types that have been registered can be used in your application.
-
spring.factories: This file tells spring to autoconfigure the DemoRegistration component.
The archetype has generated integration tests for the generated type definition, too. You can find them in the directory test/system-test. The file DemoClientIT.java contains some tests that show how to perform basic CRUD operations on the generated document type.
You will notice an additional class called DemoModelId. This class demonstrates how to create a typed ID for a specific model class. It is not required for the system to be able to use the DemoModel type. If you do not require typed IDs, you can remove the class.
Running the tests
The tests are run automatically in a full maven build. The system-test module is configured to automatically start a complete arveo system including all required services and a database. If you want to run the tests manually from the IDE, you can still use maven to start the arveo system. Open a command line in the system-test directory and run mvn -Denv. Maven will start the following processes:
-
A PostgreSQL database server
-
An ActiveMQ message broker
-
A SOLR server
-
The Service Registry
-
The Configuration Service
-
The User management Service
-
The Audit Service
-
The Access Control Service
-
The arveo Service
The services will be kept alive until you press enter in the command line.
This will only work if a complete build has been performed at least once (which can be done through mvn install).
The system set up by maven in the system-test module is already configured to contain the type definitions that were defined in this project. To use those definitions in another system, you have to add the jar containing the definitions to the classpath of the arveo service instances. This can be done by copying the jar to a lib directory and adding the following command line option when starting the arveo service instances:
-Dloader.path=path/to/libs
Adapt the model
Now you can adapt the generated type definition so that it fits the requirements for our project scenario. In this scenario, documents are organized in a two-level folder structure. For example, the project could contain a folder called "invoices" which again contains two folders named "inbound" and "outbound". Each document is contained in exactly one folder and belongs to exactly one project. The document type will contain the following meta data fields:
-
projectName: The name of the project the document belongs to
-
type: The type of document, e.g. whether it is an invoice, a contract or something else
-
structureLevel1: This field is used to represent the first level of the folder structure
-
structureLevel2: This field is used to represent the second level of the folder structure
-
status: Represents the current status of the document
-
customerName: The name of the customer associated to the project
-
contactPerson: The contact person for the document
-
assignedTo: The employee currently assigned to work on the document
-
fileSystemCreationDate: The timestamp at which the file was created in the file system (not the time it was imported to arveo - see system fields)
In addition to these custom fields, the document will contain some system fields like content metadata (filename, size, mimetype…) and versioning information like creation and update timestamps. The two metadata fields name and number that are already contained in DemoModel.java can be removed.
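To make the two structure levels concrete, the following sketch builds an example directory tree matching this scenario. All names are illustrative; the INVOICE/CONTRACT file-name prefixes match the DemoModelType enum constants used for type detection in step 2.

```shell
# One project ("website-relaunch") with a two-level folder hierarchy:
# level 1 = invoices/contracts, level 2 = inbound/outbound.
mkdir -p projects/website-relaunch/invoices/inbound
mkdir -p projects/website-relaunch/invoices/outbound
mkdir -p projects/website-relaunch/contracts

# Example files whose prefixes drive the document type detection:
touch projects/website-relaunch/invoices/inbound/INVOICE_2024-001.pdf
touch projects/website-relaunch/contracts/CONTRACT_hosting.pdf

# Show the resulting structure:
find projects -type d | sort
```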
Adding getters for system fields
Complete listings for the steps below can be found at the end of this chapter.
Let’s first add some getters for system fields. Those will provide access to system information that is generated automatically when an entity is created or updated. The generated DemoModel class already contains getters for the ID- and ACL- system properties. Add the following lines to DemoModel.java:
@SystemProperty(SystemPropertyName.CONTENT)
Map<String, ContentInformation> getContentInformation();
@SystemProperty(SystemPropertyName.VERSION_INFO)
VersionInformation getVersionInformation();
@SystemProperty(SystemPropertyName.MODIFICATION_INFO)
ModificationInformation getModificationInformation();
The JavaDoc for the SystemPropertyName enum constants contains information about each field. The data type for the contentInformation field is a map because each document can contain multiple content elements. For example, a document could contain a TIFF image and a PDF rendition of the TIFF.
Adding getters and setters for custom fields
Now we can add the getters and setters for the custom metadata fields:
@Mandatory
String getProjectName();
void setProjectName(String projectName);
@Mandatory
String getStructureLevel1();
void setStructureLevel1(String structureLevel1);
@Optional
String getStructureLevel2();
void setStructureLevel2(String structureLevel2);
@Optional
String getCustomerName();
void setCustomerName(String customerName);
@Optional
String getContactPerson();
void setContactPerson(String contactPerson);
@Mandatory
ZonedDateTime getFileSystemCreationDate();
void setFileSystemCreationDate(ZonedDateTime fileSystemCreationDate);
@Optional
Long getAssignedTo();
void setAssignedTo(Long assignedTo);
The annotations @Mandatory
and @Optional
can be used to control which fields have to be set by the client and which can be left empty.
The annotations for the arveo type definitions always have to be added to the getters. You can find an overview of the supported data types here. |
For the type field we want to limit the possible values that can be set. This can be done by defining an enumeration. Create the following enumeration type:
package de.eitco.demo.types;
import de.eitco.ecr.type.definition.annotations.Enumeration;
@Enumeration
public enum DemoModelType {
INVOICE,
CONTRACT,
OTHER
}
This enum class will be mapped to an enumeration type in the database. It needs to be registered in the type registration just like the DemoModel type. Adapt the class DemoTypeRegistration as follows:
@Component
@Register(DemoModel.class)
@Register(DemoModelType.class)
@Register(DemoModelStatus.class)
public class DemoTypeRegistration implements TypeDefinitionRegistration {
}
We will do the same for the status field. Add and register the following enum class:
@Enumeration
public enum DemoModelStatus {
IN_PROGRESS,
DONE
}
Don’t forget to register it in the DemoTypeRegistration class.
Now you can add the getters and setters for the two fields in the DemoModel class:
@Mandatory
DemoModelType getType();
void setType(DemoModelType type);
@Optional
DemoModelStatus getStatus();
void setStatus(DemoModelStatus status);
Your DemoModel class should now look like this:
package de.eitco.demo.types;
import de.eitco.commons.asdl.annotation.AsdlIgnore;
import de.eitco.commons.asdl.annotation.Model;
import de.eitco.commons.user.management.common.model.ModificationInformation;
import de.eitco.ecr.common.ContentInformation;
import de.eitco.ecr.common.VersionInformation;
import de.eitco.ecr.common.document.DocumentId;
import de.eitco.ecr.type.definition.annotations.ObjectType;
import de.eitco.ecr.type.definition.annotations.Type;
import de.eitco.ecr.type.definition.annotations.constraint.Mandatory;
import de.eitco.ecr.type.definition.annotations.constraint.Optional;
import de.eitco.ecr.type.definition.annotations.system.SystemProperty;
import de.eitco.ecr.type.definition.annotations.system.SystemPropertyName;
import java.time.ZonedDateTime;
import java.util.Map;
@Model
@Type(ObjectType.DOCUMENT)
public interface DemoModel {
@SystemProperty(SystemPropertyName.ID)
DocumentId getDocumentId();
@AsdlIgnore
default DemoModelId id() {
return DemoModelId.of(getDocumentId());
}
@SystemProperty(SystemPropertyName.ACL_ID)
Long getAclId();
void setAclId(Long aclId);
@SystemProperty(SystemPropertyName.CONTENT)
Map<String, ContentInformation> getContentInformation();
@SystemProperty(SystemPropertyName.VERSION_INFO)
VersionInformation getVersionInformation();
@SystemProperty(SystemPropertyName.MODIFICATION_INFO)
ModificationInformation getModificationInformation();
@Mandatory
String getProjectName();
void setProjectName(String projectName);
@Mandatory
String getStructureLevel1();
void setStructureLevel1(String structureLevel1);
@Optional
String getStructureLevel2();
void setStructureLevel2(String structureLevel2);
@Optional
String getCustomerName();
void setCustomerName(String customerName);
@Optional
String getContactPerson();
void setContactPerson(String contactPerson);
@Mandatory
ZonedDateTime getFileSystemCreationDate();
void setFileSystemCreationDate(ZonedDateTime fileSystemCreationDate);
@Mandatory
DemoModelType getType();
void setType(DemoModelType type);
@Optional
DemoModelStatus getStatus();
void setStatus(DemoModelStatus status);
@Optional
Long getAssignedTo();
void setAssignedTo(Long assignedTo);
}
Before you can build and use the adapted type, you have to adapt the generated integration tests.
Step 2 - Command line tool
In the second step you will implement a simple command line application that uses the model defined in step 1. We will use the Spring Initializr to generate a maven project with the required dependencies for a Spring command line application.
Generating the project
-
Go to https://start.spring.io/
-
Under "Project", select "Maven Project"
-
Under "Language", select "Java"
-
Select Spring Boot version 2.7.10. If your required version is not available, select the most compatible one in terms of major.minor.patch.
-
Define project metadata. For example, use Group = de.eitco.demo, Artifact = demo-tool, Name = demo-tool, Package name = de.eitco.demo.tool
-
Select "Jar" Packaging
-
Select Java version 11 or newer
-
Add a dependency to "Picocli"
Click Generate and download the zip file containing the generated project. Unzip the file to a directory of your choice and open the project in your IDE. Delete the 'test' directory.
Adding arveo dependencies
Open the generated pom.xml file and add the following dependencies:
<dependency>
<groupId>de.eitco.ecr</groupId> (1)
<artifactId>ecr-sdk-http</artifactId>
<version>13.0.4</version>
<exclusions>
<exclusion> (2)
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency> (3)
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client-non-web</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>de.eitco.demo</groupId> (4)
<artifactId>demo-types-types</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
1 | This dependency contains a spring boot starter for the arveo SDK |
2 | We have to exclude the OAuth2 client for web applications because the tool will be a console application |
3 | This dependency contains the OAuth2 client for non-web applications |
4 | The data model that was defined in step 1 |
You have to set the version of arveo that was used in the project containing the data model.
Additionally, you have to define a dependency management for the EITCO Commons Spring Security library:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-commons-spring-security</artifactId>
<version>7.0.10</version>
</dependency>
</dependencies>
</dependencyManagement>
Implementing the tool
The tool will use the picocli library to make it easy to write a command line application with features like usage help and simple parameter binding. You can read more about picocli here.
Go to the de.eitco.demo.tool package and create a new class called "ArveoCommand". This will contain the business logic behind the commands available in the command line application. In picocli, those commands need to implement Runnable, so we have to implement this interface:
@Component (1)
@CommandLine.Command( (2)
mixinStandardHelpOptions = true,
version = "1.0-SNAPSHOT",
description = "arveo demo tool")
public class ArveoCommand implements Runnable {
private static final Logger LOGGER = Logger.getLogger(ArveoCommand.class); (3)
@Override
public void run() {
}
}
1 | Defines ArveoCommand as an injectable spring component |
2 | Activate picocli features like usage help |
3 | We will use the de.eitco.commons.lang.Logger to log exception messages |
To be able to access arveo, the command line tool will have to authenticate to the arveo service. We will use a simple username/password authentication, so the user must be able to enter credentials. With picocli, we can implement this with some annotated fields in ArveoCommand:
@CommandLine.Option(names = {"-u", "--username"}, required = true, interactive = true,
description = "The username used to log on to arveo")
private String username;
@CommandLine.Option(names = {"-p", "--password"}, required = true, interactive = true,
description = "The password used to log on to arveo")
private String password;
@CommandLine.Option(names = {"-t", "--tenant"}, required = true,
description = "The tenant used to log on to arveo")
private String tenant;
With the interactive=true option, the user will be prompted to enter the username and password while the program is running.
The ArveoCommand class will need to know the directory to import from and (optionally) a customer name. To be able to access the arveo API, we have to get an instance of the TypeDefinitionServiceClient. This can be done using dependency injection:
@CommandLine.Option(names = {"-d", "--directory"}, required = true,
description = "The base directory to import from")
private File baseDirectory;
@CommandLine.Option(names = {"-c", "--customer"}, description = "The name of the customer")
private String customer;
@Autowired
private TypeDefinitionServiceClient typeDefinitionServiceClient;
Now it is time to implement the import. Add the following methods to the ArveoCommand class:
private void importProject(File root) {
String projectName = root.getName();
Arrays.stream(root.listFiles()).forEach(file -> {
if (file.isFile()) {
LOGGER.warn(() -> "Ignored file " + file);
} else {
importLevel1(projectName, file);
}
});
}
The importProject method will be used to import a project located in the provided root directory. The scenario does not support files located directly in the root of the project, so we will log a warning when we encounter such a file.
private void importLevel1(String projectName, File level1) {
String level1Value = level1.getName();
Arrays.stream(level1.listFiles()).forEach(file -> {
if (file.isDirectory()) {
importLevel2(projectName, level1Value, file);
} else {
importFile(projectName, level1Value, null, file);
}
});
}
The importLevel1 method will collect all files and directories located in the first level of the project structure. Files will be imported directly, directories will be passed to the next importer method.
private void importLevel2(String projectName, String level1Value, File level2) {
String level2Value = level2.getName();
Arrays.stream(level2.listFiles()).forEach(file -> {
if (file.isDirectory()) {
LOGGER.warn(() -> "Ignoring directory " + file);
} else {
importFile(projectName, level1Value, level2Value, file);
}
});
}
This method collects all files located in the second level of the project structure. We do not support deeper structures, so we log a warning when we encounter a directory below level 2.
The method used to actually import data into arveo is shown below:
private void importFile(String projectName, String level1, String level2, File file) {
AuthenticationHelper.runAsUser(username, password, tenant, () -> { (1)
TypedDocumentServiceClient<DemoModel> serviceClient = (2)
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DemoModel.class);
DemoModel model = serviceClient.createTypeInstance(); (3)
model.setProjectName(projectName);
model.setStructureLevel1(level1);
model.setStructureLevel2(level2);
model.setCustomerName(customer);
model.setFileSystemCreationDate(ZonedDateTime.ofInstant(
Instant.ofEpochMilli(file.lastModified()),
ZoneId.systemDefault())
);
DemoModelType type = DemoModelType.OTHER; (4)
String fileName = file.getName();
if (fileName.startsWith(DemoModelType.CONTRACT.name())) {
type = DemoModelType.CONTRACT;
} else if (fileName.startsWith(DemoModelType.INVOICE.name())) {
type = DemoModelType.INVOICE;
}
model.setType(type);
try (InputStream stream = Files.newInputStream(file.toPath())) {
ContentUpload contentUpload = new ContentUpload(fileName, stream);
Map<String, ContentUpload> contentElements = Map.of("content", contentUpload); (5)
serviceClient.create(new TypedDocumentInput<>(contentElements, model)); (6)
System.out.println("Imported file " + fileName + " belonging to project " + projectName);
} catch (IOException e) {
LOGGER.exception(e);
}
});
}
1 | The AuthenticationHelper takes care of populating Spring’s security context with the required credentials. The OAuth2 client will use the provided username and password to retrieve an access token from the authentication service to authenticate the requests to the arveo service. |
2 | We use the injected TypeDefinitionServiceClient to get a service client for the type definition of our model class. |
3 | The service client can provide an instance of the interface defining the model. This instance is then populated with the metadata. |
4 | We will use a simple file name prefix to determine the type of the document. |
5 | Here we define the content elements of the new document. |
6 | Finally, we send the create request to the arveo service. |
We can now implement the run() method of the ArveoCommand class:
@Override
public void run() {
if (!baseDirectory.isDirectory()) {
throw new IllegalArgumentException("Base directory option must point to a directory.");
}
Arrays.stream(baseDirectory.listFiles(File::isDirectory)).forEach(this::importProject); (1)
}
1 | We use a filter to ignore files in the base directory, as they obviously do not belong to any project. |
Now we have to adapt the application class that was generated by the Spring Initializr. Spring provides a CommandLineRunner interface for command line applications. Adapt the DemoToolApplication class as shown below:
@SpringBootApplication
public class DemoToolApplication implements CommandLineRunner {
@Autowired
private ArveoCommand arveoCommand;
public static void main(String[] args) {
new SpringApplicationBuilder(DemoToolApplication.class)
.web(WebApplicationType.NONE) (1)
.run(args);
}
@Override
public void run(String... args) {
int exitCode = new CommandLine(arveoCommand).execute(args); (2)
System.exit(exitCode);
}
}
1 | Turns off spring boot web features that are not required in a command line application |
2 | Initialize the picocli command line and execute our command with the options from the command line |
In the last step, we have to set some configuration properties for our command line tool. Rename the generated application.properties file in src/main/resources to application.yaml and add the following settings:
spring:
security:
oauth2:
client:
registration:
cmn-user-service-client: (1)
provider: user-service
client-id: "test-client"
client-secret: "my-secret"
authorization-grant-type: "password"
scope: "arveo"
provider:
user-service: (2)
authorization-uri: "http://localhost:39004/oauth/auth"
token-uri: "http://localhost:39004/oauth/token"
eureka:
client:
registerWithEureka: false (3)
logging: (4)
file:
name: "demo-tool.log"
level:
root: ERROR
1 | Configures an OAuth2 client that uses the resource owner password grant type. Client-id and secret are configured in the test system provided by the system test module of the project created in step 1. |
2 | Tells the OAuth2 client where to get a token from |
3 | The command line tool should not register itself in the service registry |
4 | Log only errors to a file |
Building and running the tool
Now we can build and run the command line tool. You can either use the IDE or run mvn clean install in a command line for the project containing the demo tool. After the build has finished, you have to start the test system: open a command line in the system-test module of the type definition project and execute the command mvn -Denv (see Running the tests). Now we can use another command line in the target directory of the command line tool project to run the tool. Running java -jar .\demo-tool-0.0.1-SNAPSHOT.jar without arguments prints out usage help for the tool:
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v2.5.8)

Missing required options: '--username', '--password', '--directory=<baseDirectory>'
Usage: <main class> [-hV] -p -u [-c=<customer>] -d=<baseDirectory> [-t=<tenant>]
arveo demo tool
  -c, --customer=<customer>   The name of the customer
  -d, --directory=<baseDirectory>
                              The base directory to import from
  -h, --help                  Show this help message and exit.
  -p, --password              The password used to log on to arveo
  -t, --tenant=<tenant>       The tenant used to log on to arveo
  -u, --username              The username used to log on to arveo
  -V, --version               Print version information and exit.
The test system already contains a user that can be used for testing. The user’s credentials are:
- username: ecr-user
- password: password
The following example shows how to use the tool to import projects from a folder:
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar -p -u -t=integrationtest -c=Customer1 "-d=C:\test-data\"

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v2.5.8)

Enter value for --password (The password used to log on to arveo):
Enter value for --username (The username used to log on to arveo):
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by de.eitco.commons.reflection.MethodLookup to constructor java.lang.invoke.MethodHandles$Lookup(java.lang.Class,int)
WARNING: Please consider reporting this to the maintainers of de.eitco.commons.reflection.MethodLookup
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Imported file INVOICE_invoice1.txt belonging to project TestProject1
Imported file INVOICE_invoice2.txt belonging to project TestProject1
Imported file customer meeting 1.txt belonging to project TestProject1
Imported file standup 1.txt belonging to project TestProject1
Imported file standup 2.txt belonging to project TestProject1
Imported file standup 3.txt belonging to project TestProject1
Imported file uncategorized meeting 1.txt belonging to project TestProject1
The warning message can be ignored. The reflective access operation will be replaced in a future version.
Finally, here is a complete listing of the ArveoCommand class for copy&paste:
package de.eitco.demo.tool;
import de.eitco.commons.lang.Logger;
import de.eitco.commons.spring.security.AuthenticationHelper;
import de.eitco.demo.types.DemoModel;
import de.eitco.demo.types.DemoModelType;
import de.eitco.ecr.common.ContentUpload;
import de.eitco.ecr.sdk.TypeDefinitionServiceClient;
import de.eitco.ecr.sdk.document.TypedDocumentInput;
import de.eitco.ecr.sdk.document.TypedDocumentServiceClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import picocli.CommandLine;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.Map;
@Component
@CommandLine.Command(
mixinStandardHelpOptions = true,
version = "1.0-SNAPSHOT",
description = "arveo demo tool")
public class ArveoCommand implements Runnable {
private static final Logger LOGGER = Logger.getLogger(ArveoCommand.class);
@CommandLine.Option(names = {"-u", "--username"}, required = true, interactive = true,
description = "The username used to log on to arveo")
private String username;
@CommandLine.Option(names = {"-p", "--password"}, required = true, interactive = true,
description = "The password used to log on to arveo")
private String password;
@CommandLine.Option(names = {"-t", "--tenant"}, required = true,
description = "The tenant used to log on to arveo")
private String tenant;
@CommandLine.Option(names = {"-d", "--directory"}, required = true,
description = "The base directory to import from")
private File baseDirectory;
@CommandLine.Option(names = {"-c", "--customer"}, description = "The name of the customer")
private String customer;
@Autowired
private TypeDefinitionServiceClient typeDefinitionServiceClient;
@Override
public void run() {
if (!baseDirectory.isDirectory()) {
throw new IllegalArgumentException("Base directory option must point to a directory.");
}
Arrays.stream(baseDirectory.listFiles(File::isDirectory)).forEach(this::importProject);
}
private void importProject(File root) {
String projectName = root.getName();
Arrays.stream(root.listFiles()).forEach(file -> {
if (file.isFile()) {
LOGGER.warn(() -> "Ignored file " + file);
} else {
importLevel1(projectName, file);
}
});
}
private void importLevel1(String projectName, File level1) {
String level1Value = level1.getName();
Arrays.stream(level1.listFiles()).forEach(file -> {
if (file.isDirectory()) {
importLevel2(projectName, level1Value, file);
} else {
importFile(projectName, level1Value, null, file);
}
});
}
private void importLevel2(String projectName, String level1Value, File level2) {
String level2Value = level2.getName();
Arrays.stream(level2.listFiles()).forEach(file -> {
if (file.isDirectory()) {
LOGGER.warn(() -> "Ignoring directory " + file);
} else {
importFile(projectName, level1Value, level2Value, file);
}
});
}
private void importFile(String projectName, String level1, String level2, File file) {
AuthenticationHelper.runAsUser(username, password, tenant, () -> {
TypedDocumentServiceClient<DemoModel> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DemoModel.class);
DemoModel model = serviceClient.createTypeInstance();
model.setProjectName(projectName);
model.setStructureLevel1(level1);
model.setStructureLevel2(level2);
model.setCustomerName(customer);
model.setFileSystemCreationDate(ZonedDateTime.ofInstant(
Instant.ofEpochMilli(file.lastModified()),
ZoneId.systemDefault())
);
DemoModelType type = DemoModelType.OTHER;
String fileName = file.getName();
if (fileName.startsWith(DemoModelType.CONTRACT.name())) {
type = DemoModelType.CONTRACT;
} else if (fileName.startsWith(DemoModelType.INVOICE.name())) {
type = DemoModelType.INVOICE;
}
model.setType(type);
try (InputStream stream = Files.newInputStream(file.toPath())) {
ContentUpload contentUpload = new ContentUpload(fileName, stream);
Map<String, ContentUpload> contentElements = Map.of("content", contentUpload);
serviceClient.create(new TypedDocumentInput<>(contentElements, model));
System.out.println("Imported file " + fileName + " belonging to project " + projectName);
} catch (IOException e) {
LOGGER.exception(e);
}
});
}
}
Step 3 - Perform a task with arveo
In the third step you will perform a specific task using arveo.
Creating a standardized project structure
Your project folder must follow a certain structure to be successfully imported and/or archived in arveo.
Projects
|_ Project 1
   |_ Element 1.1
   |_ Element 1.2
|_ Project 2
   |_ Element 2.1
Here is an example of implementing this structure:
Projects
|_ Webclient
   |_ Orders
   |_ Invoices
|_ Server_maintenance
   |_ Invoices
   |_ Email_correspondence
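To try the import locally, a structure like the one above can be created with a few lines of standard Java. This is just a convenience sketch; all directory and file names are examples, and the INVOICE_ prefix matches the file name based type detection used by the demo tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Creates a minimal example of the two-level project structure described above.
// The names used here are examples; any project and element names work.
public final class CreateTestData {

    public static void main(String[] args) {
        Path base = Path.of("test-data"); // the base directory passed to the tool via -d
        try {
            Path invoices = base.resolve("Webclient").resolve("Invoices"); // project / element
            Path orders = base.resolve("Webclient").resolve("Orders");
            Files.createDirectories(invoices);
            Files.createDirectories(orders);
            // The INVOICE_ prefix lets the demo tool classify the document as an invoice.
            Files.writeString(invoices.resolve("INVOICE_invoice1.txt"), "example invoice");
            Files.writeString(orders.resolve("order1.txt"), "example order");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Running this class once creates a test-data folder that can be imported with -d=test-data.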
In the last step of the tutorial you executed the command
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar -p -u -t=integrationtest -c=Customer1 "-d=C:\test-data\"
Now you can replace the last element with your actual project folder:
> java -jar .\demo-tool-0.0.1-SNAPSHOT.jar -p -u -t=integrationtest -c=MedicalInsuranceAG "-d=C:\Projects"
After this command has been executed, you will see the report about imported files and folders:
Enter value for --password (The password used to log on to arveo): Enter value for --username (The username used to log on to arveo): ... Imported file Orders - Received_invoices.png belonging to project Webclient Imported file Incoming_invoice.png belonging to project Webclient ...
The Types archetype
This archetype creates a rather small project. It consists of an arveo scenario and tests for it.
The maven coordinates of this archetype are:
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-types-archetype</artifactId>
<version>{project-technical-version}</version>
To create an arveo scenario project use the maven archetype plugin:
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion={project-technical-version}
Here, the variable {project-technical-version} must be replaced with the actual version, e.g. 5.0.1.
Note that this command generates the project structure into a project folder. Before you run it, make sure you have created the folder in which your project structure should live and have switched to this folder on your command line.
This will start a process that will ask for some parameters and then generate a maven project according to the parameters. The following parameters will be asked for:

Parameter | Description
---|---
groupId | The maven groupId of the new project
artifactId | The maven artifactId of the new project
version | The maven version of the new project
class-name-prefix | A prefix for the names of the generated classes
scm-locator | The location in the eitco bitbucket server where the sources are (or will be). For a project located in https://git.eitco.de/scm/<project>/<repository>.git, this would be <project>/<repository>.git
Some or all of these parameters can also be given on the command line via -D options. The process will not ask for parameters given on the command line. So the command
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-types-archetype -DarchetypeVersion={project-technical-version} -DgroupId=my.group.id -DartifactId=my-artifact-id -Dversion=0.0.1-SNAPSHOT -Dclass-name-prefix=My -Dscm-locator=prj/repo.git
would not ask for any parameters and just create the project.
Overview of the generated project
The project generated by the archetype will consist of two modules:
implementation\types
This module contains your arveo scenario. An example type will be created with the name <class-name-prefix>Model. You can define more types here, but you will need to register them in <class-name-prefix>TypeRegistration. The chapter arveo type definitions describes how to define types.
test\system-test
This module contains tests for your scenario. These tests will be executed during the build. For that, a complete arveo environment will be created, so you can add tests that simply connect to arveo via the http client and can assume that your scenario is deployed.
This module can also be used to set up an arveo environment with your scenario on which you can then run tests manually. In the module run
mvn -Denv
to set up the environment. It will be torn down when you press <enter> in the console.
The full-featured archetype
This archetype creates a more complex project. It is based on the eitco commons archetype. It will contain a simple web service with an automatically generated client layer, based on eitco commons. The maven coordinates of this archetype are:
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-service-archetype</artifactId>
<version>{project-technical-version}</version>
To create an arveo based service project use the maven archetype plugin:
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-service-archetype -DarchetypeVersion={project-technical-version}
This will start a process that will ask for some parameters and then generate a maven project according to the parameters. The following parameters will be asked for:

Parameter | Description
---|---
groupId | The maven groupId of the new project
artifactId | The maven artifactId of the new project
version | The maven version of the new project
class-name-prefix | A prefix for the names of the generated classes
scm-locator | The location in the eitco bitbucket server where the sources are (or will be). For a project located in https://git.eitco.de/scm/<project>/<repository>.git, this would be <project>/<repository>.git
disable-optional-features | When set to false, a somewhat more complex project is created, including the audit service, the user-management enterprise service and jmeter samplers. If set to true (the default value), these features will be disabled, but they can be activated by uncommenting certain source locations.
Some or all of these parameters can also be given on the command line via -D options. The process will not ask for parameters given on the command line. So the command
mvn archetype:generate -DarchetypeGroupId=de.eitco.ecr -DarchetypeArtifactId=ecr-service-archetype -DarchetypeVersion={project-technical-version} -DgroupId=my.group.id -DartifactId=my-artifact-id -Dversion=0.0.1-SNAPSHOT -Dclass-name-prefix=My -Dscm-locator=prj/repo.git -Ddisable-optional-features=false
would not ask for any parameters and just create the project.
Overview of the generated project
The project generated by the archetype will consist of four modules:
- documentation
- implementation
- packaging
- test
The documentation module
This module holds a frame for an asciidoc based documentation of your project.
The implementation module
This module contains the actual source code. It is separated into five submodules.
- common: This submodule contains classes that are available on the server side as well as the client side.
- generated: This submodule contains modules that are automatically generated. Normally, developers will not add code in these modules; they are, however, relevant for building the project. The following submodules exist:
  - serialization: This submodule contains automatically generated serialization meta information.
  - client: This submodule contains a few submodules itself, holding client side APIs for:
    - a java spring based http client api,
    - a java spring based embedded client api,
    - a typescript http client api.
  - jmeter-sampler: This submodule generates jmeter samplers for the service’s api, usable in load tests.
- server: This submodule contains the server side implementation.
- types: This submodule contains the arveo based model. The generated interface named <class-name-prefix>Model describes an arveo type definition, as will every interface you register in <class-name-prefix>TypeRegistration. The jar compiled by this module will be available on the server side and client side. Additionally, it needs to be in the class path of your arveo instance. For the system tests (see below) this is already taken care of.
The packaging module
This module contains delivery artifacts to deliver the service to or with different runtimes. This includes:
- a stand-alone jar
- a java web archive (war)
- a helm chart for deployment in a kubernetes cluster
The test module
This module contains a system test module. When building this module, maven will start a complete arveo system (containing all required services) with the newly generated service in the pre-integration-test phase, so that tests written here (like the generated example <class-name-prefix>ClientIT) may simply call the new service via the generated http-client (see above).
Working on the generated project
Most implementation work will be done in the implementation\server module, since it contains the server side code. Your api and model will be defined in the implementation\common and implementation\types modules. The latter is only used for classes that are part of your arveo model and need to be in the classpath of arveo.
When testing your code, the test\system-test module comes in handy. As mentioned above, it will start a complete arveo system so that your tests can simply use the generated http client api to test your functionality. However, you can use this to manually test and debug your service, too. In case you simply need to start up the environment, in the test\system-test directory call:
mvn -Denv
If you want to debug your service, call
mvn -Denv -Dservice.skip
This will start the environment except for your service. You can then start your service in debug mode from your IDE.
In both cases you can now start tests manually or call the service api directly to test your code.
Administration
Configure Database access
arveo uses the default spring datasource configuration for the JDBC datasource. The datasource must be configured as shown in the following example:
spring:
datasource:
url: "jdbc:postgresql://localhost:5432/postgres?currentSchema=arveo&ApplicationName=${spring.application.name}"
driver-class-name: org.postgresql.Driver
username: username
password: password
Specifying the ApplicationName property is optional but can be helpful when analyzing database issues. The name of the Spring application will then be visible in Postgres query analytics.
The username and password should not be stored in the configuration files. Instead, they should be stored in Vault.
Advanced configuration properties can be found in the Spring boot documentation. To configure the connection pool, use the spring.datasource.hikari properties.
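For example, the connection pool size and timeout can be tuned via the standard Spring Boot HikariCP properties; the values below are purely illustrative, not recommendations:

```yaml
spring:
  datasource:
    hikari:
      maximum-pool-size: 20    # maximum number of pooled connections
      minimum-idle: 5          # connections kept open when idle
      connection-timeout: 30000 # milliseconds to wait for a free connection
```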
Configure Storage Locations
Content and type definitions
Only Documents can contain content elements. A Document in the repository can contain several content elements. For example, a document could contain a content element with the original content (like a TIFF image or a Word document) and a PDF rendition. Each content element has a contentName and some more properties like the media type. The contentName is a label that uniquely identifies a single content element contained in a Document. For example, a Document might contain two content elements that are identified by the contentNames 'content' and 'rendition'.
The contentNames are not only relevant for uniquely identifying a content element contained in a document; they also serve as references for further customization of the repository. The repository accepts configuration options that are directly related to contentNames, and the Document type definitions define restrictions regarding the allowed contentNames.
Type definitions define which contentNames can be contained in the entities stored in the definition.
Each content element is stored in a storage profile, which defines the place where the actual content will be stored. The contentType parameter can be used to define what kind of content a content element can contain. When the media type is set to application/octet-stream, any kind of content can be used.
The name of a content element must start with a letter and can consist only of letters (upper- and lower-case), numbers and the _ character. More formally, the name must match the regular expression [a-zA-Z][a-zA-Z0-9_]*.
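The naming rule above can be checked with a one-line regular expression; the helper class and method names here are my own, not part of the arveo API:

```java
import java.util.regex.Pattern;

// Checks a content element name against the rule quoted above:
// a letter, followed by letters, digits or underscores.
public final class ContentNameCheck {

    private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");

    public static boolean isValidContentName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }
}
```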
Types of content elements
The type definition specifies which content elements entities of this type may have.
Usually, the content elements of the entities are stored in a JSON field in the database which contains the storage-ID and additional metadata like size, media type and a hash. The actual content data is not stored in the JSON field. If required, a content element can also be stored in a separate field of type text. The separate field will contain only the storage-ID but no additional metadata. Additional metadata for content elements using separate fields have to be handled by the client application, for example by storing them in a custom metadata attribute.
The following example is an object of type Document for which two content elements are defined: "content" and "LARGE_CONTENT". In this example, "separateField = true" means that a separate column is used in the database; otherwise the content information is written to the corresponding JSON field of the database. The name of a separate column in the database is derived from the name of the content element.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content", separateField = true)
@ContentElement(name = "LARGE_CONTENT", separateField = true)
public interface TwoContentsDocument {
@SystemProperty(SystemPropertyName.ID)
DocumentId getId();
@SystemProperty(SystemPropertyName.CONTENT)
Map<String, ContentInformation> getContentInformation();
String getName();
void setName(String name);
}
Using the contentType attribute of the @ContentElement annotation, one can define the required content type for a content element. The content type application/octet-stream is used as a wildcard type for any type of content. For example, if the value of the contentType attribute is set to application/pdf, only PDF files can be stored in the content element.
It is possible to define the content type of a new content element when it is uploaded. The server will trust this information, so the client is responsible for sending the correct content type. If the client does not define the content type, the server will automatically detect the content type of the uploaded binary data.
The default content element
If a type definition of type DOCUMENT does not contain any @ContentElement
annotations, the server will automatically
assign a content element with the name content
to it. This content element’s metadata will be stored in the JSON
field of the type definition and it accepts any kind of content type.
The ContentElement annotation
The following table contains an overview of the available attributes of the @ContentElement annotation.

Attribute | Default value | Explanation
---|---|---
name | | The name of the content element. This attribute is mandatory.
profile | | The name of the storage profile used to store the content element. This attribute is optional.
contentType | application/octet-stream | The type of content supported by the content element.
separateField | false | Whether to store only the content ID and no additional metadata in a separate database field.
fulltextExtraction | false | If true, the fulltext content of the content element will be extracted and stored in the NoSQL database.
Storage profiles
A StorageProfile defines on which storage the content elements are saved. Access to the storage backends (like filesystem or S3) is handled by storage plugins.
A StoragePlugin is defined in the StorageProfile; it is used to access the connected storage. The same plugin can be used in several StorageProfiles, and each StorageProfile can have a different set of parameters (access data, URLs, …) for the plugin.
ecr:
server:
storage:
profiles:
fileSystemProfile: (1)
defaultProfile: true (4)
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin (2)
pluginSettings: (3)
storagePath: /storage
s3Profile: (1)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin (2)
pluginSettings: (3)
pathStyleAccessEnabled: true
serviceEndpoint: "http://localhost:49999"
region: us-west-2
accessKey: myaccesskey
secretAccessKey: mysecretaccesskey
bucket: testbucket
1 | profile name |
2 | class name of the plugin |
3 | plugin specific configuration data like the path for the filesystem plugin or the bucket for the S3 plugin |
4 | defines this profile as the default profile (see Mapping content elements to storage profiles) |
Each profile is identified by its name and defines the storage plugin to use. Plugin-specific settings can be configured in the pluginSettings map: the fully qualified plugin class name determines the storage technology, and the plugin settings (arbitrary name-value pairs) configure it.
When a content element is saved, the plugin defined in the profile returns a contentID with which the stored data can be retrieved later. This ID, usually of type String, is saved with the document. Its exact format is up to the storage plugin; usually it is a UUID, but it may be any text string.
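Conceptually, a storage plugin fulfills a store/retrieve contract built around the contentID. The following in-memory class is an illustration of that contract only; it is NOT the real arveo StoragePlugin SPI, and all names in it are made up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the contract described above: storing content yields
// a contentID (here a UUID string) that later retrieves the stored data.
// This is NOT the arveo StoragePlugin interface.
public class InMemoryStoragePluginSketch {

    private final Map<String, byte[]> blobs = new HashMap<>();

    public String store(byte[] content) {
        // The plugin decides the ID format; a UUID is typical, but any string works.
        String contentId = UUID.randomUUID().toString();
        blobs.put(contentId, content.clone());
        return contentId;
    }

    public byte[] retrieve(String contentId) {
        return blobs.get(contentId);
    }
}
```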
Using aliases for storage profiles
It is possible to assign aliases to storage profile names. This might be required when storage profiles are mapped to content elements by configuration, as described below. Assigning aliases can be done in the configuration by defining alias: profileName entries as shown below:
ecr:
server:
storage:
profile-aliases:
alias1: encryptedProfile
another_alias: encryptedProfile
It is possible to define more than one alias for a storage profile.
Aliases are resolved before a content element is saved. The resulting ContentId will contain the resolved profile name, not the alias.
The bucket selector plugin does not support aliases when selection rules are evaluated.
Mapping content elements to storage profiles
There are two ways to map a specific content element to a storage profile.
Mapping by code
To define the mapping of the content elements to storage profiles in the application code, the storage profile name can be set in the @ContentElement annotation using the profile attribute.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content", profile = "fileSystemProfile")
public interface MyDocument {
}
The example above shows a document type with a single named content element that will be stored in a storage profile called fileSystemProfile.
Mapping by configuration
If the mapping should be controlled by the configuration instead of being defined in the code, storage profiles with auto-matchable names must be used. The matching is based on the name of the type definition (in snake case) and the name of the content element, separated by -.
The following type definition is used as an example in the following explanations. It uses one content element:
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "rendition")
public interface MyDocument {
}
A matching profile for the content element named rendition of the interface MyDocument would be selected using the following steps:

1. Check if there is a profile called my_document-rendition. If so, use it.
2. If not, check if there is a profile called my_document. If so, use it.
3. If not, check if there is a default storage profile. If so, use it.
4. If none of the steps above succeeded, an exception is thrown.
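The selection steps above can be sketched in plain Java. The class, method and parameter names here are illustrative and not part of the arveo API; the sketch only assumes a lookup of configured profile names and an optional default profile:

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the storage profile matching order described above. Not arveo code.
public final class ProfileResolutionSketch {

    static Optional<String> resolveProfile(Map<String, ?> profiles,
                                           String defaultProfileName,
                                           String typeNameSnakeCase,
                                           String contentElementName) {
        // Step 1: <type_name>-<content_element_name>, e.g. my_document-rendition
        String specific = typeNameSnakeCase + "-" + contentElementName;
        if (profiles.containsKey(specific)) {
            return Optional.of(specific);
        }
        // Step 2: the type name alone, e.g. my_document
        if (profiles.containsKey(typeNameSnakeCase)) {
            return Optional.of(typeNameSnakeCase);
        }
        // Step 3: the default storage profile, if one is configured
        if (defaultProfileName != null && profiles.containsKey(defaultProfileName)) {
            return Optional.of(defaultProfileName);
        }
        // Step 4: no match - arveo throws an exception in this case
        return Optional.empty();
    }
}
```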
Examples
The following example shows the simplest possible configuration. The type definition does not contain any content element. It implicitly uses the default content element named content. The content element will be stored in a storage profile called my_document or, if no such profile exists, in the default storage profile.
@Type(ObjectType.DOCUMENT)
public interface MyDocument {
}
The next example shows the same type definition, but with an annotation that defines which storage profile to use.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = ContentElement.CONTENT, profile="fileSystemProfile")
public interface MyDocument {
}
The next example shows a type definition that contains two content elements. The "rendition" content element will support only PDF documents. The PDFs contained in the rendition content element will be stored in an S3 storage. The content in the other element will either be stored in a profile called my_document-content, in a profile called my_document or, if neither of those profiles exists, in the default profile.
@Type(ObjectType.DOCUMENT)
@ContentElement(name="content")
@ContentElement(name="rendition", contentType="application/pdf", profile="s3Profile")
public interface MyDocument {
}
Plugin configuration
The service uses a plugin interface to connect to the specific storage provider. The following plugins are currently available:
File system
Class name: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin.
The FileSystemPlugin offers storage of the data as files in the file system.
Parameter | Meaning
---|---
storagePath | Path to the directory that is used to store the files
AWS, NetApp or EMC Elastic Cloud Storage
Class name: de.eitco.ecr.storage.plugin.s3.S3Plugin.
The S3 plug-in stores data in an Amazon S3 compatible storage.
If arveo has no permissions to create buckets, then the administrator has to create the buckets manually. |
Parameter | Meaning | Default value
---|---|---
pathStyleAccessEnabled | Configures the client to use path-style access for all requests. Amazon S3 supports virtual-hosted-style and path-style access in all regions. The path-style syntax, however, requires that you use the region-specific endpoint when attempting to access a bucket. | false
serviceEndpoint | The URL of the S3 endpoint to be used by the plugin | 
region | The region for access to AWS | 
accessKey | AWS Access Key | 
secretAccessKey | AWS Secret Access Key | 
bucket | The name of the S3 bucket to be created by the plugin. The name can only contain lowercase letters. | 
signer | Sets the name of the signature algorithm to use for signing requests made by this client. If not set, the default configuration of the Amazon S3 SDK will be used. | 
proxyhost | The optional proxy host used by the client when connecting to the S3 storage. | 
proxyprotocol | The protocol (HTTP or HTTPS) used to connect to the proxy. | 
proxyport | The port used by the client to connect to the proxy. | 
streambuffersize | Size of the send and receive buffers in bytes. | 32768
uploadpresignedurl | If set to true, the client will use pre-signed URL requests to communicate with the S3 storage. | false
acceleratemode | Configures the client to use the S3 accelerate endpoint for all requests. | false
maxconnection | The maximum number of allowed open HTTP connections. | -1 (no limit)
maxErrorRetries | The maximum number of retries for failed requests. | -1 (no retries)
baseDelay | The base delay in milliseconds for the retry policy. | -1 (no delay)
maxBackoffTime | The maximum backoff time in milliseconds for the retry policy. | -1 (no maximum backoff time)
backoffStrategy | The backoff strategy used by the retry policy. | 
retentionEnabled | Enables the use of S3 object locks for object retention. | false
retentionMode | Specifies the protection level of retention object locks. Can be GOVERNANCE or COMPLIANCE. | GOVERNANCE
Configuring the retry policy of the S3 plugin
The Amazon S3 SDK used to connect to an S3-compatible storage supports different ways to retry failed requests. By default, a retry policy using jitter and 3 retries is used. To configure a custom retry policy, all three parameters baseDelay, maxBackoffTime and backoffStrategy have to be configured. The backoffStrategy parameter must be set to one of the following values:
- FULL_JITTER
- EQUAL_JITTER
- EXPONENTIAL
The Amazon documentation contains an explanation of the different strategies.
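As a rough illustration of how these strategies differ (a sketch based on the commonly documented AWS formulas, not the SDK's internal implementation; delays are in milliseconds):

```java
import java.util.concurrent.ThreadLocalRandom;

public class Backoff {

    // Plain exponential backoff: baseDelay doubled per retry, capped at maxBackoff.
    static long exponential(long baseDelay, long maxBackoff, int retry) {
        return Math.min(baseDelay * (1L << retry), maxBackoff);
    }

    // FULL_JITTER: a random delay between 0 and the exponential delay.
    static long fullJitter(long baseDelay, long maxBackoff, int retry) {
        return ThreadLocalRandom.current().nextLong(exponential(baseDelay, maxBackoff, retry) + 1);
    }

    // EQUAL_JITTER: half the exponential delay plus a random half.
    static long equalJitter(long baseDelay, long maxBackoff, int retry) {
        long exp = exponential(baseDelay, maxBackoff, retry);
        return exp / 2 + ThreadLocalRandom.current().nextLong(exp / 2 + 1);
    }
}
```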
Retention
The S3 plugin supports the usage of S3 object locks to set a retention time and litigation hold status on content elements stored in the S3-compatible storage. To enable the feature, set the parameter retentionEnabled to true.
When the retention support is enabled, the bucket used by the storage profile must be created manually. The S3 Object Locks option must be enabled for the bucket. |
The S3 plugin uses the governance retention mode by default, which means that retention-protected objects can be deleted or overwritten by any user of the AWS account with the required privileges. When the compliance retention mode is used, no user (not even the root administrator of the S3 account) is able to delete or overwrite retention-protected objects. To configure this behavior, set the property retentionMode to GOVERNANCE or COMPLIANCE. More information about object locks can be found in the AWS documentation.
When the COMPLIANCE retention mode is used, it is impossible to delete objects from the S3 storage account before the end of the retention interval is reached.
|
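Putting the retention parameters together, an S3 profile with compliance-mode retention might be configured as in the following sketch (profile name, endpoint and credentials are placeholders):

```yaml
storage:
  profiles:
    complianceProfile:              # hypothetical profile name
      pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
      pluginSettings:
        serviceEndpoint: "<cloudstorage url>"
        region: eu
        accessKey: <myaccesskey>
        secretAccessKey: <mysecret>
        bucket: compliancebucket    # must be created manually with S3 Object Locks enabled
        retentionEnabled: true
        retentionMode: COMPLIANCE
```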
Azure blob storage
Class name: de.eitco.ecr.storage.plugin.azureblob.AzureBlobStoragePlugin
The Azure blob storage plugin can be used to connect to a storage account in Microsoft Azure.
Parameter | Meaning | Default value
---|---|---
connectionString | The connection string used to connect to the storage account. The connection string can be obtained from the Azure portal. | 
containerName | The name of the container in the storage account that will contain the data of the storage profile. | 
timeoutMillis | The timeout in milliseconds for requests to Azure. | 5000
retentionSupport | Enables usage of the immutability policy feature of Azure. | false
policyMode | Sets the protection level of the immutability policies. Can be LOCKED or UNLOCKED. | UNLOCKED
Additional parameters contained in the plugin configuration will be passed on to the Configuration used for the Azure SDK.
Retention
The Azure blob storage plugin supports the immutability policy feature of Azure blob storage. Using this feature enables an additional security level for retention protected content elements. If a content element is retention protected or in a litigation hold, it will not be possible to delete it using the Azure management interface or the Azure SDK.
To enable the retention support, the parameter retentionSupport must be set to true.
When the retention support is enabled, the container used by the storage profile must be created manually in Azure. The setting version-level immutability support must be enabled when the container is created. To be able to enable the version-level immutability support, the storage account must support versioning for blobs. More information can be found in the Azure documentation. |
The plugin creates unlocked immutability policies by default. Unlocked policies can be altered by Azure users with the required privileges. Locked immutability policies can neither be deleted nor can the expiry time be shortened. Prolonging the expiry time (and with it, the retention period) is still possible. Note that even the administrator of the storage account is not able to delete objects with a locked immutability policy. To configure the policy mode, set the parameter policyMode to LOCKED or UNLOCKED.
When the policyMode is set to LOCKED, it is not possible to delete retention-protected objects from the storage account before the end of the retention interval is reached.
|
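An Azure profile with locked immutability policies might be configured as in the following sketch (profile and container names are placeholders):

```yaml
storage:
  profiles:
    azureProfile:                   # hypothetical profile name
      pluginClassName: de.eitco.ecr.storage.plugin.azureblob.AzureBlobStoragePlugin
      pluginSettings:
        connectionString: "<connection string from the Azure portal>"
        containerName: mycontainer  # must be created manually with version-level immutability support
        retentionSupport: true
        policyMode: LOCKED
```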
BucketOrganizer
Class name: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin.
The BucketOrganizer is not specific to a particular storage technology or storage interface but delegates storage requests to other storage plugins. The selection of the target plugin depends on the retention information of the document that contains the content element to be stored. The selection criteria used to select the target plugin can be configured as a list of bucket selection rules.
The relevant retention information of the document is defined by the values of the system fields RETENTION_DATE and LITIGATION_HOLD. This value pair is matched against the bucket selection rules. The matching process starts with the first rule and continues to the next rule if the rule does not match the value pair. The matching process ends at the first rule that matches the value pair. The storage profile named in this rule will be used to store the content. Each bucket selection rule consists of three parts that are separated by the pipe (|) symbol.
1. retention date match expression
The retention date match expression is usually a time interval that begins at some calendar day and extends to some later calendar day. The notation for the interval is inspired by ISO 8601 and may read like this: 2021-01-01+01:00--2022-01-01+01:00. The general format is begin_date--end_date, that is, both dates are separated by "--". A retention date matches the expression if begin date <= retention date < end date.
The begin and end dates are specified as YYYY-MM-DD followed by a time zone offset as +hh:mm or -hh:mm. It is possible to define open intervals by specifying one of the boundary dates as UNBOUNDED.
Retention dates may be NULL if the retention date has not (yet) been set on the document. A NULL retention date will not match any interval specified in a match rule. For this reason, the retention date match expression may be specified as NULL to match NULL retention dates.
A retention date match expression can also be specified as * if the rule should always match.
2. litigation hold match expression
The litigation hold match expression can be one of these literals: true, false, *.
While the literal * will always match, the other literals will match the denoted value only.
3. target storage profile name
The name of the target storage profile to be used if both expressions match the corresponding system field values.
Configuration parameters
Parameter | Meaning
---|---
bucketSelectionRules | A list of bucket selection rules
storage:
profiles:
bucketProfile: (1)
pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin (2)
pluginSettings:
bucketSelectionRules: (3)
- "*|true|fsProfileLitigationHold" (4) (5)
- "NULL|false|fsProfileForever" (4)
- "2021-01-01+01:00--2022-01-01+01:00|false|fsProfile2021" (4)
- "2022-01-01+01:00--2023-01-01+01:00|false|fsProfile2022" (4)
- "2023-01-01+01:00--2024-01-01+01:00|false|fsProfile2023" (4)
- "2024-01-01+01:00--2025-01-01+01:00|false|fsProfile2024" (4)
- "2025-01-01+01:00--2026-01-01+01:00|false|fsProfile2025" (4)
- "2026-01-01+01:00--2027-01-01+01:00|false|fsProfile2026" (4)
- "2027-01-01+01:00--2028-01-01+01:00|false|fsProfile2027" (4)
- "2028-01-01+01:00--2029-01-01+01:00|false|fsProfile2028" (4)
- "2029-01-01+01:00--2030-01-01+01:00|false|fsProfile2029" (4)
- "2030-01-01+01:00--2031-01-01+01:00|false|fsProfile2030" (4)
- "*|*|fsProfileAnotherEra" (4)
fsProfileLitigationHold: (5)
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/litigationHold
fsProfileForever:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/forever
fsProfile2021:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/2021
#...
fsProfile2030:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/2030
fsProfileAnotherEra:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${project.build.directory}/storage/anotherEra
1 | The profile name. |
2 | The plugin type, i.e. its class name. |
3 | The list of rules. |
4 | A bucket selection rule, consisting of a retention date match expression, a litigation hold match expression and a target storage profile name. |
5 | The referenced profile name. |
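The matching of a single rule against the (RETENTION_DATE, LITIGATION_HOLD) value pair can be sketched as follows. This is a simplified illustration of the documented rule semantics, not arveo's actual implementation; UNBOUNDED intervals and error handling are omitted.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class BucketRule {

    // Matches one rule "dateExpression|litigationExpression|profileName" against
    // the value pair. Returns the profile name on a match, null otherwise.
    static String matchRule(String rule, OffsetDateTime retentionDate, boolean litigationHold) {
        String[] parts = rule.split("\\|");
        String dateExpr = parts[0];
        String holdExpr = parts[1];
        String profile = parts[2];

        if (!holdExpr.equals("*") && Boolean.parseBoolean(holdExpr) != litigationHold) {
            return null;                     // litigation hold expression does not match
        }
        if (dateExpr.equals("*")) {
            return profile;                  // wildcard matches any retention date
        }
        if (dateExpr.equals("NULL")) {
            return retentionDate == null ? profile : null;
        }
        if (retentionDate == null) {
            return null;                     // NULL dates never match an interval
        }
        String[] interval = dateExpr.split("--");
        OffsetDateTime begin = parseBoundary(interval[0]);
        OffsetDateTime end = parseBoundary(interval[1]);
        // begin <= retentionDate < end
        return !retentionDate.isBefore(begin) && retentionDate.isBefore(end) ? profile : null;
    }

    // Boundary format: YYYY-MM-DD followed by a zone offset, e.g. 2021-01-01+01:00
    static OffsetDateTime parseBoundary(String value) {
        LocalDate date = LocalDate.parse(value.substring(0, 10));
        ZoneOffset offset = ZoneOffset.of(value.substring(10));
        return date.atStartOfDay().atOffset(offset);
    }
}
```

In arveo, the rules in the configured list would be tried in order and the first non-null result would decide the target profile.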
Storage profile templates
To reduce the number of required entries in the list of bucket selection rules, storage profile templates can be used. A storage profile template consists of a name template with placeholders, a specific time range and the regular configuration parameters like the class name of the storage profile. The <year> placeholder can be used as a variable for the current year.
Storage profile templates are configured in a separate section as shown below:
ecr:
server:
storage:
profiles:
bucketProfile:
pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin
pluginSettings:
bucketSelectionRules:
- "*|true|fsProfileLitigationHold"
- "NULL|false|fsProfileForever"
- "<year>-01-01+01:00|false|fsProfile<year>|2021--2030" (1)
profile-templates:
- nameTemplate: "fsProfile<year>" (2)
genericTimeRange: "2021--2029" (3)
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/<year> (4)
1 | A bucket selection rule using a profile template with the year placeholder for the years between 2021 and 2030. |
2 | A name template that will create profiles for the years 2021 to 2029. |
3 | Defines the time range used to create profiles based on the template |
4 | The year placeholder can be used in the configuration properties of the plugin. |
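Conceptually, the template expansion replaces the <year> placeholder once per year in the configured range. A minimal sketch of that idea (illustrative only; `expand` is a hypothetical helper, not part of arveo):

```java
import java.util.ArrayList;
import java.util.List;

public class ProfileTemplate {

    // Expands a name template like "fsProfile<year>" over a generic time range
    // like "2021--2029" into concrete profile names.
    static List<String> expand(String nameTemplate, String genericTimeRange) {
        String[] range = genericTimeRange.split("--");
        int from = Integer.parseInt(range[0]);
        int to = Integer.parseInt(range[1]);
        List<String> names = new ArrayList<>();
        for (int year = from; year <= to; year++) {
            names.add(nameTemplate.replace("<year>", String.valueOf(year)));
        }
        return names;
    }
}
```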
Writing a custom storage plugin
As mentioned above, arveo uses a plugin interface for the connection to the storage backends. This section describes how to write a new storage plugin.
All classes and interfaces required to implement a custom plugin are contained in the dependency
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-server</artifactId>
<version>13.0.4</version>
<scope>provided</scope>
</dependency>
A storage plugin must implement the interface de.eitco.ecr.server.storage.StoragePlugin. There are two abstract implementations that can be extended to simplify the implementation:
- AbstractStoragePlugin: Provides several methods that make it easier to get plugin configuration settings.
- AbstractSimplifiedStoragePlugin: The superclass of all plugins that do not need retention support.
In addition to the interface to implement, there are some guidelines to respect when writing a custom storage plugin:
- The plugin must provide a default no-argument constructor because it will be instantiated using reflection.
- The plugin can use dependency injection, but because of the need for a default constructor, only field injection using @Autowired is possible.
- There will be one instance of the plugin for each storage profile configured to use the plugin, so the plugin must be thread-safe.
Configuration settings
The StoragePlugin
interface contains a method called configure
, which will be called once for each plugin instance.
It is used to process the generic parameter values that might be required to configure the plugin. For example, the
parameters might contain a path to a file system directory or credentials for a remote storage system. Because storage
plugins can be configured in profile templates, it might be necessary to replace
placeholders configured in the template. The class AbstractStoragePlugin
already contains helper methods like
getMandatoryProperty
that take care of these replacements. The configure
method is expected to return the actual
configuration with all replacements that is used by this plugin instance. The returned configuration settings are used
by the health checks.
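To make the placeholder handling concrete, the following sketch shows what such a helper might do. This is an illustration only; the actual signatures of the helper methods in AbstractStoragePlugin may differ.

```java
import java.util.Map;

public class ConfigHelper {

    // Illustrative sketch: reads a mandatory plugin setting and replaces
    // template placeholders such as <year> with their concrete values.
    static String getMandatoryProperty(Map<String, String> settings, String key,
                                       Map<String, String> placeholders) {
        String value = settings.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing mandatory plugin setting: " + key);
        }
        for (Map.Entry<String, String> placeholder : placeholders.entrySet()) {
            value = value.replace("<" + placeholder.getKey() + ">", placeholder.getValue());
        }
        return value;
    }
}
```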
Using the custom storage plugin
To use the custom plugin, it is enough to add its classes to the classpath of the repository service. The plugin can then
be used for a storage profile by specifying its qualified class name in the pluginClassName
parameter. To add the
plugin’s class to the classpath, use the -Dloader.path=<path>
argument to start the service. The argument must point
to a directory containing the required jar files.
Renditions
Renditions of content elements, for example a PDF rendition of an image, can be created automatically. To create a
rendition, the @Rendition
annotation can be used as shown in the following example.
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "original", separateField = true) (1)
@Rendition(name = "rendition", sourceElement = "original", contentType = MediaType.APPLICATION_PDF_VALUE, separateField = true) (2)
@OverwriteAllowed
public interface DocumentWithRendition {
String getName();
void setName(String name);
@ContentType(contentElement = "original") (3)
String getContentType();
void setContentType(String contentType);
@SystemProperty(SystemPropertyName.RENDITION_STATUS) (4)
Map<String, RenditionStatusInformation> getRenditionStatus();
}
1 | The content element containing the original content |
2 | The rendition content element to create automatically |
3 | The content type of the original content |
4 | A getter for the current status of the renditions of the document |
The above example shows a document type with one content element and one rendition. Both the original and the rendition
element are stored in separate fields. This is possible but not required for renditions. When the original content
is stored in a separate field, meta information like the content type is not stored in the database. As the content
type of the original content is required to create a rendition, it is recommended to define an attribute of type 'String'
that contains the original content’s type. The attribute must be annotated with @ContentType
to bind its value to
the original content element and return a valid mime type string like "image/jpeg". If no such attribute is present,
the system will try to detect the content type automatically.
The current status of the renditions can be retrieved as shown in the example above. The returned map contains a
RenditionStatusInformation
instance for each rendition content element of the document. The status information contains
a status value and the number of times the system tried to create the rendition, if available. The status of a rendition can
be one of the following values:
parameter | meaning |
---|---|
AVAILABLE |
The rendition was created successfully (or was uploaded by a client) and is available. |
PENDING |
The rendition is not yet available but is expected to be available in the future. |
FAILED |
Creating the rendition has failed permanently. |
EMPTY |
The rendition is not available because the source content element does not exist. |
RESET |
Creating the rendition has failed and the status was manually reset (see error handling). |
The @Rendition
annotation accepts the following parameters:
parameter | meaning |
---|---|
name |
The name of the rendition content element |
sourceElement |
The name of the content element to create a rendition of |
contentType |
The type of the rendition to create (a mime type string like "application/pdf") |
profile |
The name of the profile used to store the rendition content element (optional) |
separateField |
(optional) whether to store the rendition’s meta data in a separate field or in the JSON content field |
Renditions are created asynchronously. When a document is created or updated, a message will be posted to a queue in ActiveMQ. The messages are processed by event listeners in the repository service. Depending on the current load it might take some time until the rendition is available.
The system will not try to create a rendition when the rendition content element is written by the client. |
The actual rendering will be done by the Document Conversion Service. Which conversions are supported, depends on the plugins available on the classpath of the service.
Error handling
When the creation of a rendition fails, the system will re-try to create the rendition. The number of re-tries can be configured, the default is three (see configuration properties). When all retries have failed, the rendition message will be added to a dead letter queue and the status field of the rendition will be set to -1 (FAILED). For this to work, the message queue in ActiveMQ must be configured to use an individual dead letter queue as described in the ActiveMQ documentation.
<policyEntry queue="ecr-queue-create-renditions">
<deadLetterStrategy>
<individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
</deadLetterStrategy>
</policyEntry>
Reset status of failed renditions
The status of failed renditions can be set to RESET (-2) either by using the API method
de.eitco.ecr.sdk.document.TypedDocumentServiceClient.resetFailedRenditionStatus
or simply by setting the value in the
database directly. A system job polls the database and will enqueue new rendition messages in ActiveMQ to re-try to
create the renditions. The interval in which the job polls the database can be configured using the parameter
retry-renditions.cron-expression
(see configuration properties).
Dynamically skipping renditions
There are cases where the decision whether to create a rendition for a content element can only be made at run-time. For cases like this a type can provide a method implementing that decision. This method is marked by the annotation @RenditionCreationCondition. Only one method of a type may have this annotation. The method
- must have the return type boolean, java.lang.Boolean or kotlin.Boolean
  - in case it is java.lang.Boolean, it may not return null
- must not be abstract
  - should the defining class be an interface, this means that it is either a static or a default method
  - note that, should the type be defined in Kotlin and the method not be static, it has to be compiled with -Xjvm-default=all or -Xjvm-default=all-compatibility
- can have up to two parameters of type RenditionInfo
  - the first representing the source to render
  - the second representing the target to render to
  - if only one parameter is given, it is assumed to be the source
If such a method exists, arveo evaluates it before posting rendition messages. If the method returns false, the message is not posted. Such a method may be present on types that are not documents, but it will not have any effect there. This might be helpful in scenarios with complex inheritance structures.
Example 1
Let’s assume a scenario where we have a document with a content element "content" that can have an arbitrary type. It is supposed to be a multi-page document, so in most cases it is a PDF file. However, there are cases where a document is created with the content element being an MS Word document, and in some cases it is just a single-page image. Even multi-page TIFFs are possible, and in some seldom cases the content type is unclear and simply "application/octet-stream".
In this scenario there is a web viewer that is supposed to show the document’s content. For the viewer, PDF files are no problem whatsoever. It is also capable of viewing the images, except multi-page TIFF files, which pose a problem. It is unable to view MS Office files, and for "application/octet-stream" it can only provide a download link.
Thus it is decided that the backend needs to create a PDF rendition for MS Office formats and TIFF files. This could be implemented with the following class:
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content") (1)
@Rendition(name = "rendition", sourceElement = "content", contentType = MediaType.APPLICATION_PDF_VALUE) (2)
public interface DocumentWithDynamicRenditionDecision {
String getName();
void setName(String name);
@RenditionCreationCondition
default boolean decideRendition( (3)
RenditionInfo source,
RenditionInfo target (4)
) {
if (source.getMediaType().equals("application/msword")) { (5)
return true;
}
if (source.getMediaType().equals("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
return true;
}
if (source.getMediaType().equals("image/tiff")) {
return true;
}
return false;
}
}
1 | A content element with the name "content" is defined. |
2 | A pdf rendition of that element is defined with the name "rendition". |
3 | A default method "decideRendition" is created and marked with @RenditionCreationCondition. |
4 | Note that the second parameter is unused. It could be omitted. |
5 | The implementation of the method is pretty simple. It checks whether the mime type of the source element is one that we want to create a rendition for: MS Word files (old and new format) or TIFF. If so, it returns true, indicating that arveo should create a rendition for the element. Otherwise, it returns false so that no rendition is created. |
Example 2
Assume the application described in example 1. Assume further that at some point it becomes necessary to migrate some older documents to this application. An importer is written; however, most of the imports fail. This is due to the fact that many of the documents are in an older MS Word format that the current render engine is incapable of transforming into PDF. So it is decided not to create a rendition for those elements and simply provide a download link in the application’s client.
This poses a problem in the decideRendition() method: MS Word documents that are created regularly (not imported from the old source) should still have a rendition created for them. Thus, it is not possible to decide whether to render from the source type alone. A simple solution could be to add a new property create_rendition to the type. This nullable Boolean could be set on creation to indicate whether to create a rendition for the content element. A value of null would activate the behaviour already implemented:
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content")
@Rendition(name = "rendition", sourceElement = "content", contentType = MediaType.APPLICATION_PDF_VALUE)
public interface DynamicRenditionExample2 {
String getName();
void setName(String name);
(1)
Boolean getCreateRendition();
void setCreateRendition(Boolean value);
@RenditionCreationCondition
default boolean decideRendition(
RenditionInfo source (2)
) {
if (getCreateRendition() != null) { (3)
return getCreateRendition();
}
(4)
if (source.getMediaType().equals("application/msword")) {
return true;
}
if (source.getMediaType().equals("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
return true;
}
if (source.getMediaType().equals("image/tiff")) {
return true;
}
return false;
}
}
1 | The property create_rendition is defined. Note that with the type java.lang.Boolean it is nullable. |
2 | Note that in this case the unused parameter is omitted. This is the only change in the method signature. |
3 | At the start of the method, it is checked, whether the new property is set, simply by calling the getter. If so the value is returned. |
4 | Otherwise, the code from example 1 is executed. |
Text renditions
The rendition feature can be used to store extracted fulltext data as content elements of a document. To achieve this, simply add a rendition content element with the content type text/plain.
The content types of the source content element that can be used for text-extraction depend on the available extraction plugins of the Document Conversion Service. |
To be able to use the extracted fulltext data for searches, use the fulltext extraction feature for the SOLR integration. See SOLR for details. The service will automatically use available text renditions when the document is transferred to SOLR. If no text rendition is available, the text extraction will be performed on the fly. |
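For example, following the rendition example shown earlier, a text rendition could be declared like this (a sketch; "fulltext" is a hypothetical element name):

```java
@Type(ObjectType.DOCUMENT)
@ContentElement(name = "content")
// the text/plain content type marks "fulltext" as a text rendition of "content"
@Rendition(name = "fulltext", sourceElement = "content", contentType = "text/plain")
public interface DocumentWithTextRendition {
    String getName();
    void setName(String name);
}
```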
Configure retention container
Configure storage containers for yearly retention periods
Once you have deployed your new data type with retention enabled, all your data is stored in your default storage profile and has a default retention period of 10 years. The following example defines separate buckets, each containing all objects with a retention period within one year. Configure the buckets in the ecr-service.yaml of your config service in the section arveo:storage:profiles:. You can configure a new storage profile with an unlimited number of data buckets for your content.
Mandatory properties of your new bucket profile:
Property | Description
---|---
pluginClassName | must always be "de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin"
pluginSettings:bucketSelectionRules | array of rules of the form filter (string)|litigationHold (boolean)|storageProfile (string)
- filter (string): must be * to match all objects, NULL to match unset retention dates, or a valid zoned date time range like 2031-01-01+01:00--2032-01-01+01:00; the bucket selection is based on the document type property RETENTION_DATE.
- litigationHold (boolean): true = the litigation hold bucket, false for all other regular retention buckets.
- storageProfile (string): a valid storage profile name (arveo:storage:profiles:).
Find more details about selection rules in Retention Bucket Selection Rules.
If the configuration is not correct, you will find more information in the startup log and will most likely find a MissingConfigurationException.
Defining storage containers in arveo-service.yaml and your storage system is an ongoing task for your operating team. Eitco will try to create the buckets or subdirectories on your storage system but can also use already existing ones. |
ecr-service.yaml example snippet for content definitions and storages. Adapt your ecr-service.yaml and replace rules, profile names and cloud storage url, etc. with your values.
arveo:
server:
content:
default-definition:
mediaType: "application/octet-stream"
storageProfile: bucketProfile (1)
definitions:
content:
mediaType: "application/octet-stream"
storageProfile: bucketProfile (1)
rendition:
mediaType: "application/octet-stream"
storageProfile: bucketProfile (1)
documentTypeA: (2)
mediaType: "application/octet-stream"
storageProfile: storageProfileDocumentTypeA
documentTypeB: (2)
mediaType: "application/octet-stream"
storageProfile: storageProfileDocumentTypeB
1 | Assign your bucket storage profile to the content types with a retention period. |
2 | The example provides two more storage profiles for other document types (storageProfileDocumentTypeA, storageProfileDocumentTypeB). To write all content of a document type to a storage profile you must assign this content type to the document type. The upload API will only accept content of this type for the document type. |
storage:
profiles:
bucketProfile:
pluginClassName: de.eitco.ecr.server.storage.plugins.BucketOrganizerPlugin
pluginSettings:
bucketSelectionRules:
- "*|true|storageProfileRetentionLitigationHold"
- "NULL|false|storageProfileRetentionNone"
- "2031-01-01+01:00--2032-01-01+01:00|false|storageProfileRetention2031"
- "2032-01-01+01:00--2033-01-01+01:00|false|storageProfileRetention2032"
- "2033-01-01+01:00--2034-01-01+01:00|false|storageProfileRetention2033"
- "2034-01-01+01:00--2035-01-01+01:00|false|storageProfileRetention2034"
- "2035-01-01+01:00--2036-01-01+01:00|false|storageProfileRetention2035"
- "2036-01-01+01:00--2037-01-01+01:00|false|storageProfileRetention2036"
- "2037-01-01+01:00--2038-01-01+01:00|false|storageProfileRetention2037"
- "2038-01-01+01:00--2039-01-01+01:00|false|storageProfileRetention2038"
- "2039-01-01+01:00--2040-01-01+01:00|false|storageProfileRetention2039"
- "2040-01-01+01:00--2041-01-01+01:00|false|storageProfileRetention2040"
- "2041-01-01+01:00--2042-01-01+01:00|false|storageProfileRetention2041"
- "*|*|storageProfileRetention2042Plus"
storageProfileRetentionLitigationHold: (1)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: LitigationHold
storageProfileRetentionNone: (2)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: NoRetention
storageProfileRetention2042Plus: (3)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: RetentionPeriod2042Plus
storageProfileRetention2031: (4)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: RetentionPeriod2031
storageProfileRetention2032:
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: RetentionPeriod2032
... (5)
storageProfileDocumentTypeA: (6)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>"
region: eu
accessKey: <myaccesskey>
secretAccessKey: <mysecret>
bucket: DocumentTypeA
storageProfileDocumentTypeB: (6)
pluginClassName: de.eitco.ecr.storage.plugin.s3.S3Plugin
pluginSettings:
pathStyleAccessEnabled: true
serviceEndpoint: "<cloudstorage url>" (7)
region: eu (7)
accessKey: <myaccesskey> (7)
secretAccessKey: <mysecret> (7)
bucket: DocumentTypeB
1 | always configure a litigation hold bucket |
2 | also configure a bucket for data that has no retention, just in case |
3 | fallback bucket for all content with a retention period past 2041. If you leave out this bucket, you get an exception when storing content that cannot be assigned to a bucket |
4 | one bucket for each year |
5 | configure as many buckets as needed for your content |
6 | two more storage profiles for other document types without retention; see the content types without retention above (arveo:server:content:DocumentTypeA/B) |
7 | replace the placeholders with your S3 URL, region, access key and access secret |
For more details on storage profiles and content types, see Content types.
If you want to use directories instead of buckets, you can configure file system storage profiles and assign a sub-directory (see File system storage profile configuration).
storageProfileLitigationHold:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/litigationHold
storageProfileRetentionNone:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/retentionNone
storageProfile2031:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: ${storage.base.directory}/storage/2031
Configure Encryption
arveo provides transparent encryption for data stored in the profiles. The encryption can be configured individually for each storage profile.
Overview
Encrypting and decrypting is performed by configurable encryption providers. Each provider is identified by a unique name. The available providers are described below.
The following table gives an overview of the encryption settings for a storage profile:
Parameter | Description | Default value |
---|---|---|
enabled | enables or disables the encryption | false |
providerName | name of the encryption provider to use | commons-aes |
To make sure all content of a specific type definition is encrypted, limit the content types supported by the type definition to types that use an encrypting storage profile.
When the BucketOrganizerPlugin is used, the encryption settings must be configured for each plugin referenced by the bucket selection rules. Configuring the encryption for the BucketOrganizerPlugin itself is not supported.
Commons AES provider
The commons-aes provider supports AES encryption with 256-bit keys. When a new content element is created in an encrypted profile, the provider generates a random cipher key for the element. The key is encrypted using a master password that is configured in the profile’s encryption settings. It is then stored in the database, which creates an identifier for the key. The keys are stored in individual tables for each profile called ecr_keys_<profileName>. After that, the content is encrypted and stored using the profile’s storage plugin. The key ID is stored in a header together with the encrypted data. When the data is read, the cipher key is loaded from the database using the key ID read from the header. The key is decrypted using the master password and used to decrypt the data read by the profile’s storage plugin.
If the database table containing the keys or the master password is lost, it is impossible to restore the data stored in the profile. When the master password for a profile is changed, all stored keys for the profile must be re-encrypted.
A mechanism to re-encrypt keys is planned but has not been implemented yet.
There is a second database table for each profile called ecr_keys_assoc_<profileName>. This table contains mappings of key IDs to content element IDs and is intended for system administration purposes. The encryption feature is configured as shown in the following example:
storage:
profiles:
encryptedProfile:
pluginClassName: "de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin"
pluginSettings:
storagePath: "/storage/encrypted"
encryptionSettings:
enabled: true
providerName: "commons-aes"
providerSettings:
password: "changeme"
The following table gives an overview of the encryption settings for the commons-aes provider:
Parameter | Description | Default value |
---|---|---|
password | the master password used to encrypt the cipher keys | |
rngAlgorithm | the algorithm used to generate secure random data | platform specific; see the docs for SecureRandom.getInstanceStrong(). If not specified, the most secure algorithm available is used |
Vault AES provider
The vault-aes encryption provider uses the transit secrets engine of Hashicorp Vault to encrypt and decrypt a generated random cipher key. The cipher key is generated using a configurable random data generation algorithm and then used to encrypt the content with AES as described below. The cipher key is then encrypted by Vault and stored in a header together with the encrypted content data. When the data is decrypted, the encrypted cipher key is read from the header, decrypted using Vault and then used to decrypt the content. The advantage in comparison to the commons-aes provider is that no master key and no encryption keys stored in the database are required. The keys required to decrypt the cipher keys (and thus the data, too) are securely stored in Vault and are never known to arveo.
If the Vault instance containing the keyring used to encrypt the random cipher keys is lost, it is impossible to decrypt the content data!
The following table gives an overview of the encryption settings for the vault-aes provider:
Parameter | Description | Default value |
---|---|---|
keyring | name of the key ring contained in Vault’s transit secrets engine used to encrypt the cipher keys | |
transitEnginePath | (optional) path of the transit engine. If null, the default path is used. | |
rngAlgorithm | the algorithm used to generate secure random data | platform specific; see the docs for SecureRandom.getInstanceStrong(). If not specified, the most secure algorithm available is used |
The following example shows a storage profile configuration using the vault-aes encryption provider:
vaultEncryptedProfile:
pluginClassName: de.eitco.ecr.storage.plugin.filesystem.FileSystemPlugin
pluginSettings:
storagePath: /storage/vault-encrypted
encryptionSettings:
enabled: true
providerName: vault-aes
providerSettings:
keyring: arveo
Implementation details of AES encryption
This chapter describes the implementation details of the AES encryption used by arveo.
Header
The encryption library is designed to encrypt data in such a way that it can be stored permanently in encrypted form and possibly only decrypted after a long time. In order to guarantee decryption, all data required for this (except the key, of course) are stored in a header together with the encrypted data. Using the data from the header, the library can thus obtain, for example, the algorithm used and the data for key derivation, and only needs the password or the derived key for decryption.
AES
The library uses AES according to the recommendation of the Federal Office for Information Security of March 2020:
-
Operating mode: Galois/Counter-Mode
-
Hash function for key derivation: Argon2
The library allows the configuration of different parameters, but offers default values according to the recommendation of the BSI:
-
Key length: 256 bit
-
Length of GCM checksums: 128 bit
-
Length of the initialisation vector: 96 bit
-
Length of the salt for the key derivation: 32 bit
-
Parallelism for Argon2: 1
-
Memory cost for Argon2: 4096 KB
-
Iterations for Argon2: 3
The initialisation vector is randomly generated each time the encryption methods are called by using SecureRandom. The salt for the key derivation is generated in the same way each time the password derivation method is called. The fact that the initialisation vector is always regenerated ensures that the same combination of initialisation vector and key can never be used more than once. For both the AES algorithm and the Argon2 hash function, the implementations of the BouncyCastle library are used. For performance and compatibility reasons, the BouncyCastle implementations are used directly and not via the JCA:
GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
Argon2BytesGenerator generator = new Argon2BytesGenerator();
Since the default implementation of the CipherInputStream from javax.crypto is not suitable for block ciphers with data authentication, the implementations for CipherInputStream and CipherOutputStream from the BouncyCastle library are used. To generate the random data for the initialisation vector and the salt, a SecureRandom instance created with SecureRandom.getInstanceStrong() is used by default. However, the library allows you to specify a different RNG algorithm (see Note on Linux below).
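The recommended parameter set above (256-bit key, 96-bit initialisation vector, 128-bit GCM tag) can also be exercised with the JDK's built-in AES/GCM implementation. The following sketch is independent of the BouncyCastle-based classes arveo actually uses and only illustrates the parameter choices:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// 256-bit AES key, as recommended by the BSI
KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
keyGenerator.init(256);
SecretKey key = keyGenerator.generateKey();

// 96-bit initialisation vector, freshly generated for every encryption.
// new SecureRandom() is used here instead of getInstanceStrong() to avoid
// blocking on /dev/random (see the note on Linux below).
byte[] iv = new byte[12];
new SecureRandom().nextBytes(iv);

// Galois/Counter mode with a 128-bit authentication tag
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
byte[] ciphertext = cipher.doFinal("hello arveo".getBytes(StandardCharsets.UTF_8));

cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
byte[] plaintext = cipher.doFinal(ciphertext);
```

Note that the GCM ciphertext is the encrypted data followed by the 16-byte authentication tag, which is why it is 16 bytes longer than the plaintext.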
Header Format
The header begins with a string to identify data encrypted with the library followed by the length of the payload data in the header. The header is divided into blocks and can be read serially.
++>~ENC~<++|97|AES_GCM_ARGON2|1|256|128|10|4096|1|aWFtYW5pbml0aWFsaXphdGlvbnZlY3Rvcg==|aWFtYXNhbHQ=|bXlLZXlJZA==

Marker|length|method|header version|key length|checksum length|iteration|storage cost|parallelism|initialisation vector|salt|key ID
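Because the header is pipe-separated and its binary fields are Base64-encoded, reading it back is straightforward. A minimal sketch that decodes the example header above (the field indices follow the legend; this is not arveo's actual parser):

```java
import java.util.Base64;

String header = "++>~ENC~<++|97|AES_GCM_ARGON2|1|256|128|10|4096|1"
        + "|aWFtYW5pbml0aWFsaXphdGlvbnZlY3Rvcg==|aWFtYXNhbHQ=|bXlLZXlJZA==";

String[] fields = header.split("\\|");                 // 12 fields, see the legend
String method = fields[2];                             // AES_GCM_ARGON2
byte[] initialisationVector = Base64.getDecoder().decode(fields[9]);
byte[] salt = Base64.getDecoder().decode(fields[10]);  // salt for key derivation
String keyId = new String(Base64.getDecoder().decode(fields[11]));
```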
Key
The keys used for encryption are either generated using random data or derived from any password using the Argon2 hash function. Since deriving keys can be very computationally intensive depending on the configuration, a key ID can be stored in the header. This makes it possible to store a key once it has been derived and to reuse it for decryption, which avoids having to derive the key from the password again. The library is not responsible for the secure storage of the key. Generating keys using random data is a much faster operation compared to key derivation. The disadvantage is that it is not possible to derive the key from a master password in case it was lost. When generated keys are used, it is crucial to store those keys in a secure location. In this case, the header will not contain a salt but only the ID of the stored key. When an external system like Vault is used to encrypt generated keys, the encrypted generated key is stored in the header instead.
Usage
Instantiation of the AesEncryptorAndDecryptor with default values:
AesEncryptorAndDecryptor encryptorAndDecryptor = new AesEncryptorAndDecryptor.Builder().build();
Instantiation with custom parameters:
AesEncryptorAndDecryptor encryptorAndDecryptor = new AesEncryptorAndDecryptor.Builder()
.with128BitKeys()
.withInitializationVectorLength(128)
.withTagLength(128)
.withIterations(5)
.withMemoryCost(1024)
.withParallelism(3)
.withSaltLength(64)
.withRngAlgorithm("SHA1PRNG")
.build();
Examples of usage can be found in the test class de.eitco.commons.crypto.AesEncryptionTest.
Note on Linux
On Linux, Java uses the NativePRNG algorithm by default for generating random data with SecureRandom.getInstanceStrong(). This implementation uses /dev/random and may block if there is not enough data available there. This can lead to very long waiting times for key derivation and encryption. You can then either use a weaker RNG algorithm or make sure that /dev/random always contains enough data. This can be achieved with the haveged daemon, for example:
apt-get install haveged
update-rc.d haveged defaults
service haveged start
Configure Active MQ
arveo uses Apache ActiveMQ to queue asynchronous tasks. Access to the message broker is configured in the YAML file of the arveo service using the default configuration properties of the Spring ActiveMQ integration:
spring:
activemq:
broker-url: "tcp://127.0.0.1:61616"
user: "system"
password: "manager"
ActiveMQ’s OpenWire protocol is used to connect to the broker. The queues and topics used by arveo can be identified by the arveo- name prefix. arveo uses text messages containing JSON data to make it possible to consume messages in components not implemented in Java. The JSON data uses the same serialization mechanism as the REST API.
arveo uses ActiveMQ’s scheduler support for features like automated deletion of entities in the recycle bin after a configurable time. Therefore it is required to enable the scheduler in ActiveMQ by setting schedulerSupport="true" in the broker tag in activemq.xml.
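For example, the attribute is added to the existing broker element in activemq.xml (fragment; the brokerName attribute is just an example, the rest of the broker configuration is omitted):

```xml
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="localhost"
        schedulerSupport="true">
    <!-- existing broker configuration -->
</broker>
```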
Some features like the automatic creation of renditions or the removal of stored data for data protection compliance require dead letter queues in ActiveMQ. See renditions for details. The queue-specific dead letter queues must be activated by adding the following policy entries to activemq.xml:
<policyEntry queue="ecr-queue-create-renditions">
<deadLetterStrategy>
<individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
</deadLetterStrategy>
</policyEntry>
<policyEntry queue="ecr-queue-delete-audit-entries">
<deadLetterStrategy>
<individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true"/>
</deadLetterStrategy>
</policyEntry>
Configure arveo user management as Authentication Service
The User Management Service can also be used as an OAuth2.0 Authorization Server. The service can issue JSON web tokens that can be used to log in to services that are also secured with OAuth2.
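A JSON web token consists of three Base64URL-encoded sections separated by dots, with the claims as plain JSON in the middle section. A minimal sketch of that anatomy (the claim values are hypothetical examples and the signature is omitted; real tokens are built and signed by the Authorization Server):

```java
import java.util.Base64;

Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

// hypothetical header and claims, only to illustrate the token structure
String headerPart  = encoder.encodeToString("{\"alg\":\"RS256\",\"typ\":\"JWT\"}".getBytes());
String payloadPart = encoder.encodeToString(
        "{\"tenant\":\"master\",\"authorities\":[\"USER_MANAGEMENT_SERVICE_USER\"]}".getBytes());
String token = headerPart + "." + payloadPart + ".signature-omitted";

// a resource server splits the token on '.' and decodes the middle section
String claims = new String(Base64.getUrlDecoder().decode(token.split("\\.")[1]));
```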
Configuration of the Authorization Server
To enable the Authorization Server, the user-service.authorization-server.enabled setting must be enabled and a keystore must be configured. The keystore must contain an RSA keypair under the specified alias:
user-service:
authorization-server:
enabled: true
keystore:
file: "path/to/keystore/keystore.jks"
password: test
alias: test
OAuth clients
To obtain a token, a client application must authenticate against a specific client configured in the Authorization Server. Clients can be created both by API and by configuration. At least one client must be configured to be able to log in via OAuth. Clients are always stored in the master tenant. In the configuration, clients can be specified as follows:
user-service:
config-data:
tenants:
- tenant-id: master
oauth2-clients:
- clientId: test-client
resourceIds:
- user-management-service
clientSecret: my-secret
authorizedGrantTypes:
- password
- client_credentials
- refresh_token
authorities:
- USER_MANAGEMENT_SERVICE_USER
accessTokenValiditySeconds: 300
refreshTokenValiditySeconds: 600
In the above example, a client with the ID "test-client" is configured to have access to the arveo User Management Service (resourceIds and authorities) and to offer the authorization grants password, client_credentials and refresh_token. The grants are the same as those defined in the OAuth2.0 standard.
By default, the client’s configured authorities are included in the issued tokens. In addition, the user’s authorities (= privileges) configured in the user service are entered in the tokens. To prevent the client’s authorities from being included in the tokens, the user-service.authorization-server.inherit-authorities setting can be set to false.
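For example, to keep the client's authorities out of the issued tokens:

```yaml
user-service:
  authorization-server:
    inherit-authorities: false
```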
The clients are always stored in the master tenant. For systems with multiple tenants, care must be taken to specify the master tenant in the configuration.
Refreshing tokens
When a new token is issued, a refresh token is also generated (except for the client_credentials grant). This refresh token can be used to renew an expiring token without requiring the user to log in again. By default, when a token refresh request is made, the user also receives a new refresh token whose validity is still that of the first refresh token. This ensures that the service cannot issue new access tokens to a user indefinitely. If this behavior is not desired and the refresh tokens should each have an extended validity, the user-service.authorization-server.reuse-refresh-tokens parameter can be set to false.
Client login
To get a token, the client application must send the respective client ID and the client secret as an HTTP Basic Auth header in the token request. The remaining parameters are sent as form data via POST to the endpoint https://user-management-service/oauth/token.
Configure authentication/SSO with Keycloak
The arveo content services support OAuth2.0 with OpenID Connect to authenticate users and services. You can install Keycloak as your identity management system and use it as the OAuth2.0 authentication service instead of the arveo User Management Service. This also allows you to enable single sign-on for your web clients.
The content services take either the role of a "resource server", or the roles of both a "resource server" and a "client" if they use resources of other services.
In principle, any authentication server that supports OAuth2.0 and OpenID Connect can be used. Currently, Keycloak and Active Directory are approved for use with arveo.
Install Keycloak
-
Download and install Keycloak: https://www.keycloak.org/downloads.html
-
Start the server. With standalone.bat -Djboss.socket.binding.port-offset=100 the used ports can be adjusted.
-
Call the configuration interface (e.g. localhost:8180) and define an administrator user for the first login.
-
Refer to Keycloak documentation to configure Keycloak on your system.
Create Keycloak realm
-
Create your own realm (e.g. "arveo") with the Keycloak configuration interface.
-
Copy the public RSA key from the realm keys tab. We will use it later for the configuration of the content services.
Create Keycloak clients
-
Next, the Keycloak clients for the arveo services are set up. Clients may access resources; resources validate access to themselves. There is also a mixed form (confidential) that accesses resources but can also be a resource itself.
All clients have Client Protocol=openid-connect set.
Client Authenticator must be Client Id and Secret to allow a secure OAuth2.0 flow.
-
Create a client arveo-service for the secure communication between the arveo services,
This client behaves like a technical user for service/service calls
access-type=confidential -
Create client for your applications e.g. arveo-webclient which is public and accesses all arveo services
This client is used for the users of your application that have logged in with credentials.
Client-Protocol=public
Valid Redirect URIs=<URI of your web client>
Client Protocol=openid-connect
-
Implicit flow is no longer recommended. The standard flow should be used. Furthermore, the extension PKCE (Proof Key for Code Exchange) should be used (Authorization Code Flow with PKCE).
Configure a client
-
To allow the client to access the arveo services, add the role arveo-service-user to the client.
-
Add token mappers to allow arveo to get information from the token
-
Tenant, the tenant in arveo. This is used to assign the user to a tenant.
Name=Tenant
Mapper-Type=User Attribute
User Attribute=tenant
Token Claim Name=tenant
Claim JSON Type=String
Multi Valued = Off
Add To Id Token=On
Add To Access Token=On
Add To User Info=On -
Audience for the repository service. For arveo to accept access tokens issued to the web client at all, the client must also be contained in the token as an audience.
Name=Audience for arveo services
Mapper-Type=Audience
Included Client Audience=arveo-webclient
Add To ID Token=Off
Add To Access Token=On -
GUUID, important for authentication via LDAP. In the access token, the user_name attribute is set to the GUUID from the LDAP. This is largely stable, in contrast to the Keycloak internal user ID.
Name=GUUID
Mapper-Type=User Attribute
User Attribute=LDAP_ID
Token Claim Name=user_name
Claim JSON Type=String
Multi Valued = Off
Add To Id Token=On
Add To Access Token=On
Add To User Info=On -
Client ID, the Keycloak ClientID.
Name=Client ID
Mapper-Type=User Session Note
User Attribute=clientid
Token Claim Name=clientid
Claim JSON Type=String
Add To Id Token=On
Add To Access Token=On -
Client roles, required for the arveo services. The client needs the authority arveo-user-role to access the service. All roles from the client with the ClientID arveo-client are added to the claim authorities.
Name=client roles
Mapper-Type=User Client Role
User Attribute=LDAP_ID
Multi Valued = On
Token Claim Name=authorities
Claim JSON Type=String
Add To Id Token=Off
Add To Access Token=On
Add To UserInfo = Off -
Service user, required to identify the user of the access token as a service user.
Name=Service user
Mapper-Type=Script Mapper
Script=exports=user.getUserName.startsWith("service-account");
Multi Valued = Off
Token Claim Name=technical-user
Claim JSON Type=boolean
Add To Id Token=On
Add To Access Token=On
Add To UserInfo = On
-
Configure Keycloak for SSO with Kerberos
-
Configure Keycloak user federation for SSO with Active Directory using Kerberos
-
Two additional LDAP mappers have to be added:
-
Adding the tenant to the user attributes, since the tenant does not come from the AD.
Name: add-arveo-tenant
Mapper Type: hardcoded-attribute-mapper
User Model Attribute Name: tenant
Attribute Value: master -
The role so that the user is authorized to access the arveo services.
Name: add-arveo-user
Mapper Type: hardcoded-ldap-role-mapper
Role: arveo-service-user
-
As soon as a user logs on to the arveo web client application for the first time, the user is imported from the LDAP into Keycloak. Users who are not in the LDAP can be created locally in Keycloak.
Configure Keycloak:
-
Install package freeipa-client (Ubuntu)
-
Setup /etc/krb5.conf
[libdefaults]
    default_realm = <your realm>

    # The following krb5.conf variables are only for MIT Kerberos.
    kdc_timesync = 1
    ccache_type = 4
    forwardable = true
    proxiable = true

    # The following encryption type specification will be used by MIT Kerberos
    # if uncommented. In general, the defaults in the MIT Kerberos code are
    # correct and overriding these specifications only serves to disable new
    # encryption types as they are added, creating interoperability problems.
    #
    # The only time when you might need to uncomment these lines and change
    # the enctypes is if you have local software that will break on ticket
    # caches containing ticket encryption types it doesn't know about (such as
    # old versions of Sun Java).
    # default_tgs_enctypes = des3-hmac-sha1
    # default_tkt_enctypes = des3-hmac-sha1
    # permitted_enctypes = des3-hmac-sha1

    # The following libdefaults parameters are only for Heimdal Kerberos.
    fcc-mit-ticketflags = true

[realms]
    YOURDOMAIN.COM = {
        kdc = yourdomaincontroller:port
    }

[domain_realm]
    yourdomain.com = YOURDOMAIN.COM
    .yourdomain.com = YOURDOMAIN.COM
-
chown the file to arveo:arveo and chmod 600
-
Import the CA certificate to your Java truststore
e.g. %javahome%/keytool -import -alias YourDomain.com -keystore truststore.jks -file ~/ca.pem -
Activate Kerberos Single Sign On:
To allow SSO, set the requirement for all flows to ALTERNATIVE
-
Add a non-LDAP test user in manage users
Details:
Name=TestUser
User Enabled=On
Attributes:
LDAP_ID=<new UUID>
Tenant=master
Role Mappings=<add arveo-service-user>
Configure authentication between Content Services
All arveo content services use Spring Security for user authentication and authorization. Spring Security supports several standardized protocols as well as custom implementations. The basic configuration is independent of the protocol used.
When configuring the service, it is important to consider the role that the service plays in the overall system. Some services are only used by different clients and do not communicate with other services. These services only take the role of a "resource server". Other services, such as the repository service, communicate with other services themselves and assume the role of a "resource server" and a "client" at the same time.
Resource and client configuration
The following configuration can be used to make a service an OAuth2.0 resource and/or an OAuth2.0 client in the service’s application.yaml:
Service | Client | Resource |
---|---|---|
Document Service | yes | yes |
User Management Service | no | yes |
Access Control Service | yes | no |
Audit Service | yes | no |
SAP Archive Link Service (optional) | yes | no |
Document Conversion Service (optional) | no | yes |
Enterprise User Management Service (optional) | no | yes |
Enterprise Integration Service (optional) | yes | no |
Federation Service (optional) | no | yes |
Configure resource
Configure the respective application.yaml of the service like this:
security:
general:
secured-ant-matchers: "/api/**"
open-ant-matchers: "/actuator/health,/actuator/info"
role-for-secured-access: "<service - name>"
cors-configuration:
allowed-origins: "*"
allowed-headers: "*"
allowed-methods: "GET,POST,PUT,PATCH,DELETE,OPTIONS"
max-age: 3600
spring:
security:
oauth2:
resourceserver:
jwt:
# public key for user management service
# public-key-location: "http://localhost:39002/oauth/public_key"
# public key location for keycloak
jwk-set-uri: "http://localhost:8080/auth/realms/ecr/protocol/openid-connect/certs"
(1) Generally, these parameters shouldn’t be changed.
(2) CORS defines a way in which a browser and server can interact to determine whether it is safe to allow the cross-origin request.
Configure the client
Configure the respective application.yaml of the service like this:
spring:
security:
oauth2:
client:
registration:
cmn-user-service-client-credentials:
provider: user-service
client-id: "arveo-service"
client-secret: "my-secret"
authorization-grant-type: "client_credentials"
scope: "arveo"
provider:
user-service:
authorization-uri: "http://localhost:39002/oauth/auth"
token-uri: "http://localhost:39002/oauth/token"
keycloak:
authorization-uri: "http://localhost:8080/auth/realms/arveo/protocol/openid-connect/auth"
token-uri: "http://localhost:8080/auth/realms/arveo/protocol/openid-connect/token"
Parameter | Description |
---|---|
oauth2.resourceserver.jwt.public-key-location | Validation key of the authentication service used to validate the token, e.g. a PEM or RSA public key. For Keycloak see Realm Settings → Keys; for the User Management Service see the documentation on user-management/securing rest endpoints |
security.general.role-for-secured-access | unique identifier of the service; see the names of the services in the table above |
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.client-id | Client ID configured in your authentication service. In our Keycloak example: arveo-service; for the User Management Service see the documentation on user management/client context/client ID |
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.client-secret | The client secret of the authentication service. For Keycloak see Client Secret; for the User Management Service see the documentation on user management/client context/client secret |
spring.security.oauth2.client.provider.user-service.authorization-uri | endpoint for user authorization |
spring.security.oauth2.client.provider.user-service.token-uri | endpoint to get an access token |
spring.security.oauth2.client.registration.cmn-user-service-client-credentials.scope | The scope is always arveo |
-
The public-key-location defines a path to a resource containing the public key of the service that issued the signed tokens. If the issuing service supports JSON Web Keys, the URL to the JWK endpoint can be set using jwk-set-uri.
-
To enable the user impersonation feature, add the following to the application.yaml configuration:
commons:
security:
oauth2:
impersonation-enabled: true
-
It is possible to disable the auto configuration of the server components by setting spring.security.oauth2.resourceserver.enabled to false.
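As a YAML fragment, this corresponds to:

```yaml
spring:
  security:
    oauth2:
      resourceserver:
        enabled: false
```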
Troubleshooting
Some common error messages and ways to fix them:
-
Principal cannot be null from OAuth2AuthorizeRequest: There probably was no Authentication in the application’s SecurityContext. Check if the application sets an Authentication.
-
Startup fails because no bean of type ClientRegistrationRepository was found: Check the configuration. This usually happens when the values in spring.security.oauth2.client are either missing or invalid. Check indentation!
OAuth2.0 Authentication
All arveo services require authentication, ensuring that only another arveo service or an authenticated user can use the REST API. Authentication of a user is done either by an authentication service like Keycloak or by the arveo User Management Service. The user context is passed on to invoked services. Single sign-on is supported with OAuth2.0 and OpenID Connect.
The arveo User Management Service can also be used as an OAuth2.0 Authorization Server. The service can issue JSON web tokens that can be used to log in to services that are also secured with OAuth2.
This chapter describes
-
how arveo's content services act as an OAuth2.0 resource server for applications using the arveo REST API
-
how the arveo services use OAuth2.0 to authenticate to other services as a technical user.
All content services use Spring Security for user authentication and authorization. The services support OAuth2.0 with OpenID Connect.
Spring Security enables both the OAuth2.0 support for the service’s web resources and the OAuth2.0 client support. The content services retrieve a new OAuth2.0 token from the configured OAuth2.0 authorization service when authentication is required. This OAuth2.0 authorization service can be the arveo User Management Service, Keycloak or Active Directory.
OAuth2.0 Flows (Grant types)
OAuth2.0 defines four flows to get an access token. These flows are called grant types. arveo supports the following flows for user authentication and service authentication.
-
Client Credentials Flow: used for machine-to-machine content services communication.
-
Authorization Code Flow with the Proof Key for Code Exchange (PKCE) technique: used by arveo web applications and also used by mobile apps.
-
Resource Owner Password Flow: can be used by highly-trusted web apps.
In the following paragraphs we describe the flows. The authentication service can either be the arveo User Management Service or Keycloak. If you use Keycloak, you must turn off the arveo-internal authentication service.
Client Credentials Flow
Our machine-to-machine content services authenticate and authorize the app, not a user. For this scenario, typical authentication schemes like username + password or social logins don’t make sense. Instead, the services use the Client Credentials Flow, in which they pass along their Client ID and Client Secret to authenticate themselves and get a token.
-
arveo authenticates with Authorization Service using its Client ID and Client Secret (/oauth/token endpoint).
-
The Authorization Service validates the Client ID and Client Secret.
-
The Authorization Service responds with an Access Token.
-
arveo can use the Access Token to call an API on behalf of itself.
-
The Service API responds with requested data.
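The token request in step 1 can be sketched with the JDK's HttpClient classes. The host, client ID and secret below are placeholders taken from the configuration examples in this chapter, and only the request construction is shown (nothing is sent):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

String clientId = "arveo-service";   // placeholder: a client configured in the authorization service
String clientSecret = "my-secret";   // placeholder

// client ID and secret are sent as an HTTP Basic Auth header
String basicAuth = Base64.getEncoder()
        .encodeToString((clientId + ":" + clientSecret).getBytes());

// grant type and scope are sent as form data
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://user-management-service/oauth/token"))
        .header("Authorization", "Basic " + basicAuth)
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString("grant_type=client_credentials&scope=arveo"))
        .build();
```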
Authorization Code Flow (PKCE)
When the arveo single-page web applications request Access Tokens, additional security concerns are posed that are not mitigated by the Authorization Code Flow alone. This is because:
-
Native apps cannot securely store a Client Secret. Decompiling the app will reveal the Client Secret, which is bound to the app and is the same for all users and devices.
-
Single-page apps cannot securely store a Client Secret because their entire source is available to the browser.
Given these situations, OAuth 2.0 provides a version of the Authorization Code Flow which makes use of a Proof Key for Code Exchange (PKCE).
The PKCE-enhanced Authorization Code Flow introduces a secret created by the calling application that can be verified by the authorization server; this secret is called the Code Verifier. Additionally, the calling app creates a transform value of the Code Verifier called the Code Challenge and sends this value over HTTPS to retrieve an Authorization Code. This way, a malicious attacker can only intercept the Authorization Code, and they cannot exchange it for a token without the Code Verifier.
-
The user clicks Login within the application.
-
The OAuth2 JavaScript SDK creates a cryptographically random code_verifier and from this generates a code_challenge.
-
The OAuth2 JavaScript SDK redirects the user to the authorization service (/authorize endpoint) along with the code_challenge.
-
The authorization service redirects the user to the login and authorization prompt.
-
The user authenticates using one of the configured login options and may see a consent page listing the permissions the authorization service will grant to the application.
-
The authorization service stores the code_challenge and redirects the user back to the application with an authorization code, which is good for one use.
-
The OAuth2 JavaScript SDK sends this code and the code_verifier (created in step 2) to the authorization service (/oauth/token endpoint).
-
The authorization service verifies the code_challenge and code_verifier.
-
The authorization server responds with an ID Token and Access Token (and optionally, a Refresh Token).
-
The arveo web application can use the Access Token to call an API to access information about the user.
-
The API responds with requested data.
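Step 2 of the flow, the creation of the code_verifier and the code_challenge, can be illustrated in plain Java. The transformation shown is the standard S256 method from RFC 7636 (the base64url-encoded SHA-256 hash of the verifier); the class and method names are only for illustration and are not part of the arveo SDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class Pkce {

    // Creates a cryptographically random code_verifier (43 base64url characters).
    static String createCodeVerifier() {
        byte[] bytes = new byte[32];
        new SecureRandom().nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Derives the code_challenge using the S256 method:
    // BASE64URL(SHA-256(ASCII(code_verifier)))
    static String createCodeChallenge(String codeVerifier) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(codeVerifier.getBytes(StandardCharsets.US_ASCII));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String verifier = createCodeVerifier();
        System.out.println("code_verifier:  " + verifier);
        System.out.println("code_challenge: " + createCodeChallenge(verifier));
    }
}
```

The code_challenge is sent with the /authorize request, while the code_verifier stays in the application and is only revealed when exchanging the authorization code for a token.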
Maintenance mode for the database schema
This chapter documents the arveo parameters used to start the service in maintenance mode, alter the database schema, and stop the service.
arveo can be started in a special mode that ensures that only this instance changes the schema and prevents other instances from being started while the change is in progress. If the database schema change fails, the instance terminates in a way that can easily be evaluated by the administrator, who can then react to the error.
The service does not start if the registry query returns other running instances. The service terminates after the Liquibase script has been executed. The following two parameters are set:
system:
terminateAfterCreation: true
updateSchema: true
The maintenance mode can therefore be used to update the database schema. When maintenance mode is enabled, arveo starts, performs the necessary schema updates, and terminates once the schema has been updated. Requests from clients are not processed while the system is in maintenance mode; clients will receive an HTTP 503 response code. Schema updates must be performed by one single arveo instance to avoid race conditions. The recommended procedure for a schema update is as follows:
-
Shut down all arveo instances
-
If required: Update to a newer arveo version
-
Enable maintenance mode by setting system.maintenanceMode: true in the configuration
-
Start one single arveo instance and wait for it to shut down after the schema was updated
-
Disable maintenance mode in the configuration
-
Start all arveo instances.
The database schema of an existing system can be changed by adapting the type definition classes and restarting the repository service with the setting arveo.server.system.maintenance-mode=true. The service will update the database schema and shut down once the update is finished. It will not accept requests while the schema is updated.
Supported schema changes
The following list contains the supported schema changes. Note that some changes, like removing an attribute or adding constraints, might not be possible when existing data or existing constraints would be violated by the change.
-
Adding a new attribute.
-
Removing an existing attribute. Note that the column will be dropped from the schema.
-
Adding and removing indexes as well as changing index properties.
-
Changing the primary key (only for META types).
-
Adding and removing of foreign keys.
-
Adding new content elements (only for DOCUMENT types).
-
Adding and removing unique constraints.
-
Adding and removing not-null constraints.
It is also possible to enable certain features on existing type definitions. Disabling the features is not supported.
-
Enabling ACL support.
-
Enabling document filing.
-
Enabling optimistic locking.
-
Enabling the recycle bin.
-
Enabling retention support.
Checking for schema changes
By setting the properties arveo.server.system.maintenanceMode and arveo.server.system.logSchemaChanges to true, the system will start up, check for required schema changes, write them to a special log file, and shut down again. The database schema will not be changed. This makes it possible to check for unsupported changes to the schema before performing the actual schema update.
The directory used to store the schema update log can be specified using the property arveo.server.system.schemaChangeLogDirectory. The default value is logs. The system will create one logfile for each tenant. The contents of the file will look like the following example:
Supported changes for attributes of type definition my_document:
- document_name: IS_UNIQUE
- container_id: FOREIGN_KEY, IS_UNIQUE
Unsupported changes for attributes of type definition my_document:
- document_name: none
- container_id: none
In this example, there are three supported changes for the type definition named my_document. A unique constraint will be added to or removed from the attributes container_id and document_name and a foreign key will be added to or removed from the attribute container_id. There are no unsupported changes, so the actual schema update should succeed.
Please note that some advanced schema checks can only be done correctly when the types are actually stored in the database. For example, the checks for the correctness of parent and child types of a relation type are not possible when the schema update itself is skipped.
Configure audit
A @Type may be declared to be audited. This means that any write access, i.e. any create, update, or delete operation on any entity of this type, will be logged into another table. This is done with the annotation @Audit:
@Type(ObjectType.CONTAINER)
@Audit(AuditLocation.TYPE_SPECIFIC) (1)
public interface AuditedContainer {
@Optional
String getName();
void setName(String name);
@Optional
Integer getInteger();
void setInteger(Integer integer);
}
1 | The annotation @Audit activates auditing on a type |
The name of the table to be audited to is derived from the table name of the given type, following the form <table-name>_log. You can choose to specify one audit table per entity table, or alternatively to audit to one global table:
@Type(ObjectType.DOCUMENT)
@Audit(
value = AuditLocation.GLOBAL, (1)
indexOn = {AuditJsonField.CURRENT} (2)
)
public interface AuditedDocument {
@Optional
String getName();
void setName(String name);
@Optional
Integer getInteger();
void setInteger(Integer integer);
}
1 | Note a different AuditLocation |
2 | With indexOn it is possible to specify on which JSON fields of the audit table indices should be set |
In this case, the table to be audited to will be the audit service’s default audit table, default_audit_log.
Access audit
To access the audit, the audit service provides a REST API. Access is restricted depending on the type: If ACLs are activated on a @Type, only users that have read access to an entity are allowed to audit this entity. If ACLs are deactivated on a @Type, only users with the authority AUDITOR are allowed to audit the entities of this type.
Solr
In order to use Solr in connection with arveo, an installation of a Solr service is required. The current versions of Solr can be downloaded from the following link: https://solr.apache.org/downloads.html.
The Solr service must be configured in arveo within the application.yaml. See Configuration properties for details.
When a type definition is annotated with @NOSql, the entities stored in this type definition will be stored in Solr, too. See NOSQL Example for how to enable this feature. The system will create a special queue table in the relational database for such a type definition. The queue table will contain the entities that have to be stored in Solr. A system job is used to process the entries in the queue table.
Solr deployment
A small tutorial for setting up Solr for arveo.
arveo yaml configuration:
In the following example you can see a minimal arveo yaml configuration for Solr:
ecr:
server:
solr:
defaultConfigName: "ecr-config"
host: "http://localhost:38983/solr"
username: "ecr-solr-user"
password: "password"
Collections will be created automatically by arveo. Every tenant gets its own collection. |
More information can be found in the chapter ecr.server.solr.
Solr security
By default, Solr has no security settings set. It is very important to add the security settings for Solr! |
For more information on setting up the security settings for Solr, see the chapter solr security configuration.
If you use the arveo ACL functionality, you can set up the solr-acl-plugin. More information about the solr-acl-plugin can be found here.
Solr Configurations
If you want to use Solr with arveo, you must upload an ecr-config to the Solr ZooKeeper. The ecr-config can be found as a zip file at the following URL: nexus.eitco.de.
The zip file can be uploaded to ZooKeeper with the following command on the command line interface.
Linux:
[path to zookeeper in solr]/zkcli.sh -cmd upconfig -confdir [path to solr-config from nexus.eitco.de]/solr-config -confname ecr-config -z [zookeeper host]:[zookeeper port]
Windows:
[path to zookeeper in solr]\zkcli.bat -cmd upconfig -confdir [path to solr-config from nexus.eitco.de]\solr-config -confname ecr-config -z [zookeeper host]:[zookeeper port]
Type definitions
You can annotate a type definition class with @NOSql. If you do that, every attribute in the type definition will be created as a field in the Solr managed-schema.xml. If you want to exclude an attribute of a type definition from Solr, you can do the following:
If you have a type definition which uses the annotation @NOSql, you must set up Solr. Otherwise, arveo will not start! |
@NOSql
public interface PersonSimple {
String getFirstName();
void setFirstName(String value);
@NOSql(value = false)
String getLastName();
void setLastName(String value);
}
In this example, only the firstName attribute will be automatically created in the managed-schema.xml of Solr.
Fulltext Extraction
arveo automatically extracts the content of the type definitions which are annotated with @Type(ObjectType.DOCUMENT) and @NOSql. You must set up the document conversion service for this functionality. Below, we describe the process of the automatic extraction for Solr:
-
arveo saves the documents which are annotated with @NOSql in a queue table named type_definition_name + _nq;
-
A job then fetches all entries in the queue table;
-
For each entry whose type definition is annotated with @Type(ObjectType.DOCUMENT), arveo calls the document conversion service to extract the fulltext from the content;
-
At the end, all information is saved to Solr.
More general information about the document service fulltext extraction can be found here and here.
More general information about extracting text with arveo and Solr can be found in the chapter Fulltext extraction.
Solr security configuration
By default, Solr does not use any kind of authorization or authentication. In productive systems, Solr must be secured by enabling transport encryption, authentication and authorization.
The transport encryption can be enabled by enabling HTTPS in Solr. See the Solr documentation for instructions. When SSL is enabled, the URL in the configuration property ecr.server.solr.host must use the https scheme.
arveo can use basic authentication to authenticate requests sent to Solr. Solr provides a basic authentication plugin that must be enabled as described in the documentation. Enabling authentication and authorization in Solr requires uploading a security.json file to Zookeeper. The following example shows a security.json file that enables the basic auth plugin and a rule based authorization plugin.
There is a GitHub project containing a tool to generate the values for the salt and password used in the credentials property of the basic auth plugin.
{
"authentication": {
"blockUnknown": true,
"class": "solr.BasicAuthPlugin",
"credentials": {
"ecr-solr-user": "qkxp6hmEeGTaqnEvSmH7f+qytLWd/JcwaUyqpdjt5rg= NERXZefDt7lXYvdZfB0hT3ZCgNFSqI4nJ7kGgbhaTWs="
},
"realm": "My Solr users",
"forwardCredentials": false
},
"authorization": {
"class": "solr.RuleBasedAuthorizationPlugin",
"permissions": [
{
"name": "schema-edit",
"role": "admin"
},
{
"name": "update",
"role": "admin"
},
{
"name": "read",
"role": [
"user",
"admin"
]
}
],
"user-role": {
"ecr-solr-user": [
"user",
"admin"
]
}
}
}
To enable basic auth support for the Solr client used by arveo, you have to set the following parameters in the configuration for arveo:
ecr:
server:
solr:
username: "ecr-solr-user"
password: "password"
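With these settings, every request the Solr client sends carries an HTTP Basic Authorization header. As a sketch of what that header looks like, the following helper derives the header value from the configured username and password. The class and method names are illustrative, not part of arveo or Solr.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SolrBasicAuth {

    // Builds the value of the HTTP "Authorization" header for basic auth:
    // the word "Basic" followed by base64("username:password").
    static String authorizationHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(authorizationHeader("ecr-solr-user", "password"));
    }
}
```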
ACL filter plugin
When searching for ACL-protected entities in Solr, the search result is filtered by ACL rights. This is achieved by using a custom Solr plugin, which must be installed manually by following the steps below:
-
Download the cmn-user-management-access-control-solr-plugin (version 4.1.0) from the nexus repository.
-
Install the plugin in Solr as described in the Solr documentation.
-
Configure the plugin in solrconfig.xml as shown below:
<queryParser name="aclright" class="de.eitco.commons.user.management.access.control.solr.AclRightParserPlugin">
<str name="solrAclPlugin.jdbcUrl">jdbc:postgresql://localhost:5432/mydatabase?currentSchema=mytenant</str>
<str name="solrAclPlugin.jdbcUser">myuser</str>
<str name="solrAclPlugin.jdbcPassword">mypassword</str>
</queryParser>
The plugin requires a JDBC connection to the relational database to load ACL rights. Do not change the name of the query parser. Solr queries generated by arveo will contain a filter query using the aclright prefix to perform the actual filtering.
The Access Control Solr Plugin does not yet support multiple tenants! |
Alternatively, the configuration parameters for the ACL filter plugin can be set as Java system properties for the Solr server, as environment variables, or they can be stored as secrets in Vault. To use Vault, set the following parameters either in the solrconfig.xml file, as Java system properties, or as environment variables:
-
solrAclPlugin.vaultEnabled=true
-
solrAclPlugin.vaultAddress=https://myvaultserver:port (optional, the default value is "http://127.0.0.1:8200")
-
solrAclPlugin.vaultToken=token (optional, can be a token value or the path to a token file. By default, the plugin will try to load a token from ~/.vault-token)
-
solrAclPlugin.vaultSecretEnginePath=path (optional, the default is "secret")
The order in which the plugin loads configuration properties is:
-
Vault
-
System properties
-
Environment variables
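The lookup order above can be sketched as a simple resolver. This is an illustrative sketch, not the plugin's actual implementation: the vaultLookup function stands in for the real Vault client, and the class and method names are hypothetical.

```java
import java.util.function.Function;

public class PluginConfigResolver {

    // Resolves a configuration property using the precedence described above:
    // Vault first, then Java system properties, then environment variables.
    static String resolve(String key, Function<String, String> vaultLookup) {
        String value = vaultLookup.apply(key);
        if (value == null) {
            value = System.getProperty(key);
        }
        if (value == null) {
            value = System.getenv(key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Vault provides a value here, so system properties and the
        // environment are never consulted for this key.
        System.out.println(resolve("solrAclPlugin.jdbcUser", key -> "myuser"));
    }
}
```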
Storing entities in Solr
To be able to use Solr for advanced queries, the entities must be stored in Solr, too. To enable this, simply add the @NOSql annotation to the getters of the attributes to be stored in Solr in your type definitions. The annotation can also be added to the class to store all attributes in Solr. arveo will automatically create the required fields in the Solr schema. All entities of one tenant will be stored in a single collection in Solr. To avoid name collisions, the names of the fields in Solr will consist of the name of the type definition and the name of the attribute, separated by a dot. For example, an attribute called "name" in a type definition called "invoice" would be named "invoice.name".
Fulltext extraction
It is also possible to store fulltext data of the content of documents in Solr. For this, set the fulltextExtraction attribute of the @ContentElement annotation to true (see Content Elements). The fulltext data of the content elements will be stored in a field called <typeDefinitionName>.<contentElementName>.fulltext in Solr. This field can be used in queries just like any other field.
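The two naming conventions, attribute fields and fulltext fields, can be captured in a small helper. The class below is hypothetical and only restates the conventions described above; in arveo, the auto-generated name constant classes provide comparable helper methods.

```java
public class SolrFieldNames {

    // Attribute fields follow <typeDefinitionName>.<attributeName>,
    // e.g. "invoice.name" for attribute "name" of type definition "invoice".
    static String attributeField(String typeDefinitionName, String attributeName) {
        return typeDefinitionName + "." + attributeName;
    }

    // Fulltext fields follow <typeDefinitionName>.<contentElementName>.fulltext,
    // e.g. "invoice.pdf.fulltext" for content element "pdf".
    static String fulltextField(String typeDefinitionName, String contentElementName) {
        return typeDefinitionName + "." + contentElementName + ".fulltext";
    }

    public static void main(String[] args) {
        System.out.println(attributeField("invoice", "name"));
        System.out.println(fulltextField("invoice", "pdf"));
    }
}
```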
The fulltext extraction is performed by the document conversion service. Which types of content are supported for fulltext extraction depends on the active plugins of the document conversion service. The open source plugins contained in the document conversion service project support fulltext extraction for PDFs with text content and Microsoft Office documents. The mime type of a content element is required for the fulltext extraction. By default, the mime type is contained in the metadata of a content element. When the content element is stored in a separate field on the database, the mime type is not available in the metadata, but it can be defined in the @ContentElement annotation. If the mime type is set to the default value (application/octet-stream), the service will try to auto-detect the mime type.
Searching in Solr
The arveo API provides methods to perform queries in Solr. The methods use the EQL just like the regular search methods that perform queries on the relational database. There are some EQL features that are not supported when searching in Solr:
-
joins and unions
-
subselects
-
exists expressions
-
toLower
-
less than
-
greater than
-
is null
The behavior of the supported query expressions depends on the configuration of the field in Solr. For example, a text field with a tokenizer that splits text by whitespace will deliver different results for equality expressions than a text field without such a tokenizer.
To perform a query in Solr for one specific type definition, use the getNoSqlSearchService method of the service client for the type definition. In the following example, this method is used to find an entity by its ID in Solr:
Optional<TypedNOSqlSearchHit<FieldTypeContainer>> optional = serviceClient.getNoSqlSearchService()
.where().id().equalTo().value(identifier).holds().uniqueResult();
The next example shows how to search in the fulltext data stored in Solr.
list = serviceClient.getNoSqlSearchService().where().contextReference(SimpleInvoiceNames.CONTENT_FULLTEXT)
.contains().value("Gubergren accumsan takimata").holds().unpaged();
To search for values in a specific field, the correct field name must be used in the context reference. The auto-generated name constant classes for the type definition interfaces contain a helper method to compute the name. The following example shows how to use this method to search in a specific field of an entity.
list = serviceClient.getNoSqlSearchService().where()
.contextReference(FieldTypeContainerNames.noSqlName(FieldTypeContainerNames.INTEGER_FIELD))
.equalTo().value(7).holds().unpaged();
When using the getNoSqlSearchService method, the query performed in Solr will automatically be limited to entities belonging to one single type definition. To search for entities in multiple type definitions, the method de.eitco.ecr.sdk.SearchClient.searchServiceForNoSql can be used.
Combined search
arveo creates a special multi-valued field called ecr_attributes in Solr. When a getter (or an entire interface) is annotated with @NOSql(combinedSearch = true), the attribute of the getter (or all attributes of the interface) will be copied into this field. The ecr_attributes field is of type string. The values of the attributes copied into this field will be converted automatically by Solr.
The ecr_attributes field can be used for a combined search of all attribute values. This makes it possible to provide a search method where the user does not need to know the name of the attribute to search for:
Optional<TypedNOSqlSearchHit<SimpleInvoice>> hit =
serviceClient1.getNoSqlSearchService().where().noSqlCombinedField().equalTo()
.value(invoiceNumber).holds().uniqueResult();
Optional<TypedNOSqlSearchHit<SimpleInvoice>> hit =
serviceClient1.getNoSqlSearchService().where().noSqlCombinedField().equalTo()
.value(invoiceNumber).and().noSqlCombinedField().equalTo().value("29.99").holds().uniqueResult();
Optional<TypedNOSqlSearchHit<SimpleInvoice>> hit =
serviceClient1.getNoSqlSearchService().where().noSqlCombinedField().like()
.value("Kasd invidunt stet dolor")
.and().contextReference(EcrQueryLanguage.COMBINED_SEARCH_FIELD).equalTo().value(invoiceNumber)
.holds().uniqueResult();
It is possible to copy the extracted fulltext data of content elements to the combined ecr_attributes field, too. To do so, configure the respective content element as shown below:
@ContentElement(name = "pdf", contentType = "application/pdf", fulltextExtraction = true, textCombinedSearch = true, textCombinedSearchLimit = 200)
The relevant settings are textCombinedSearch=true, which enables the copying of the extracted fulltext data, and textCombinedSearchLimit, which limits the number of characters to copy. This makes it possible to reduce the size of the Solr index. A value of 0 means no limit.
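The semantics of textCombinedSearchLimit can be sketched as follows. This is an illustrative sketch of the rule stated above, not arveo's actual implementation: a positive limit truncates the extracted text before it is copied into ecr_attributes, while a limit of 0 copies it unchanged.

```java
public class CombinedSearchCopy {

    // Applies the textCombinedSearchLimit semantics: a limit greater than 0
    // truncates the fulltext to that many characters, 0 means no limit.
    static String applyLimit(String fulltext, int limit) {
        if (limit <= 0 || fulltext.length() <= limit) {
            return fulltext;
        }
        return fulltext.substring(0, limit);
    }

    public static void main(String[] args) {
        System.out.println(applyLimit("Kasd invidunt stet dolor", 4));
        System.out.println(applyLimit("Kasd invidunt stet dolor", 0));
    }
}
```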
Relations in Solr
Relations between entities can be described using relation type definitions. While those relations can easily be resolved in the relational database, NoSQL databases like Solr are not designed to support a relational data model like the one shown in the following diagram:
+------------+          +--------------+          +------------+
|   Parent   |  source  |   Relation   |  target  |   Child    |
|------------|<---------|--------------|--------->|------------|
|            |          |              |          |            |
+------------+          +--------------+          +------------+
It is still possible to search for Parent entities by the IDs of the related Child entities in Solr using arveo. To make this possible, arveo stores the IDs of the Child entities related to a Parent entity in a multi-value field in Solr. To enable this feature, the type definition interface of the Parent type must be annotated with @NOSqlResolvedRelations. The annotation requires a parameter containing the type definition classes of the relations to store in Solr.
Only relations from the current version of the Parent entity to the current version of the Child entity can be stored in Solr. |
The following example shows how to search for entities by related child IDs:
List<TypedNOSqlSearchHit<SimpleInvoice>> list = documentServiceClient.getNoSqlSearchService().where() (1)
.contextReference(noSqlSearchHelper.getRelationChildIdsField(SimpleInvoice.class, InvoicePersonRelation.class)) (2)
.contains().value(containerClient.getIdentifier()).holds().unpaged(); (3)
1 | Obtain a NoSql search service from the service client for the Parent entity type definition |
2 | Using an (injectable) instance of NoSqlSearchHelper it is possible to get the name of the field containing the child IDs in Solr |
3 | Get the ID of the child entity to search for from the entity client of the relation child entity |
Rebuilding the Solr index
Currently, there is no automated way to rebuild the Solr index. If the data in Solr was lost or corrupted, it can be rebuilt using the NOSql queue tables of the affected type definitions.
When an entity is added or updated in a type definition that is annotated with @NOSql, an entry for this entity is added to a queue table by a trigger on the database. This queue table is named like the main table for the type definition with a _nq suffix. So for example, if the main table is called my_type_definition, the queue table would be called my_type_definition_nq. It contains a column for each attribute that is supposed to be contained in Solr and the following system fields:
Field | Explanation
---|---
nosql_queue_id | The ID of the entry in the queue. This value is assigned automatically by the database.
nosql_processing_counter | A counter that is incremented each time the system has tried to store the entry in Solr. The initial value is supposed to be 0.
nosql_trigger_operation | The name of the operation that caused the entry to be added to the queue table. Could be INSERT, UPDATE or DELETE.
A system job periodically reads the entries contained in the queue table and stores them in Solr. When the entry was successfully stored in Solr, it is deleted from the queue. If the entry could not be stored in Solr, the processing counter is incremented and the job will try again until the maximum number of tries has been reached.
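The behavior of this queue job can be sketched as follows. The types and callbacks are placeholders, not the actual arveo job implementation: storeInSolr, deleteFromQueue, and incrementCounter stand in for the real persistence operations.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class NoSqlQueueJobSketch {

    // A queue table row: its ID and how often storing it in Solr has been tried.
    record QueueEntry(long queueId, int processingCounter) {}

    // Processes one batch: successfully stored entries are deleted from the
    // queue, failed ones get their counter incremented; entries that already
    // reached the maximum number of attempts are skipped.
    static int process(List<QueueEntry> batch, int maxTries,
                       Predicate<QueueEntry> storeInSolr,
                       Consumer<QueueEntry> deleteFromQueue,
                       Consumer<QueueEntry> incrementCounter) {
        int stored = 0;
        for (QueueEntry entry : batch) {
            if (entry.processingCounter() >= maxTries) {
                continue; // maximum number of attempts reached
            }
            if (storeInSolr.test(entry)) {
                deleteFromQueue.accept(entry);
                stored++;
            } else {
                incrementCounter.accept(entry);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        List<QueueEntry> batch = List.of(new QueueEntry(1, 0), new QueueEntry(2, 5));
        int stored = process(batch, 5, e -> true, e -> {}, e -> {});
        System.out.println(stored + " entries stored in Solr");
    }
}
```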
To rebuild the Solr index, the data to be stored in Solr has to be added to the queue table by copying the data contained in the main table of the type definition. By definition, only the most recent version of each entity is contained in Solr, so the data contained in the version table of the type definition is of no interest here. The value of the nosql_processing_counter field must be set to 0. To treat the entity as a new object in Solr, set the value of the nosql_trigger_operation field to INSERT. To perform an update on an already existing object in Solr, set the value to UPDATE.
The code of the trigger function used to populate the queue table might be useful when creating the SQL script that copies the entries. The trigger function is named like the main table of the type definition with a _ntf suffix.
When the type definition makes use of the @NOSqlResolvedRelations annotation, an additional field containing the resolved IDs of the child entities of the relations on each entity will be contained in the queue table. This field will be called relation_<type-id>_child_ids, where type-id is the numeric ID of the relation type definition. This ID can be found in the ecr_types table. To store the resolved child IDs in Solr, additional entries for each entity containing the entity ID, the resolved child IDs, the update counter and the trigger operation are added to the queue table. Note that the value for the trigger operation field must be set to UPDATE in this case.
The resolved child IDs can be copied from the table containing the relation. The parent_id field in this table will contain the ID of the entity that is to be stored in Solr. The child_id field contains the IDs of the related child entities.
The code of the trigger function used to populate the queue with the entries containing the resolved child IDs might be useful when writing the SQL script to copy these values. The trigger function is named like the main table of the relation type definition with a _nrtf suffix.
Example scripts
The following examples show how to copy data to the queue table. The main table of the type definition in the example is called test_simple_invoice.
insert
into
"test_simple_invoice_nq"(
"version_number",
"latest_version_id",
"version_comment",
"initial_creation_date",
"creator_user_id",
"retention_date",
"modification_user_id",
"creation_date",
"update_counter",
"last_delete_restore_date",
"litigation_hold",
"deleted",
"parent_id",
"modification_date",
"id",
"acl_id",
"amount",
"invoice_number",
"nosql_processing_counter",
"nosql_trigger_operation"
)
select
"version_number",
"latest_version_id",
"version_comment",
"initial_creation_date",
"creator_user_id",
"retention_date",
"modification_user_id",
"creation_date",
"update_counter",
"last_delete_restore_date",
"litigation_hold",
"deleted",
"parent_id",
"modification_date",
"id",
"acl_id",
"amount",
"invoice_number",
0,
'INSERT'
from "test_simple_invoice";
insert
into
"test_simple_invoice_nq"(
"id",
"relation_32877_child_ids",
"nosql_processing_counter",
"nosql_trigger_operation"
)
select distinct
a."parent_id",
array(select b."child_id" from "test_invoice_person_relation" b
where b."parent_id" = a."parent_id" and b."parent_version_id" is null),
0,
'UPDATE'
from "test_invoice_person_relation" a;
System jobs
The arveo system uses several background jobs to perform essential functions. These jobs are managed by a clustered Quartz scheduler running inside the repository service and/or in a dedicated job service. The scheduler instances are synchronized using the database. The repository service creates the jobs and initial trigger configurations when the system is started for the first time. Afterwards, it is possible to modify the scheduled jobs manually.
By default, the scheduler embedded in the repository service is used to create and to execute the jobs. Dedicated job service instances configured to use the same database as the repository service can be used to execute the jobs as well. It is also possible to start the scheduler embedded in the repository service in standby mode. In standby mode, the repository service will create the jobs (if required), but it will not execute them.
The available configuration parameters for the scheduler are listed here: Job service
The available configuration parameters for the jobs are listed here: Job configuration
It is required to configure the user and the password to be used for the jobs. The user has to own the required authorities to execute the jobs: ECR_PURGE_RECOVERY_TABLE and the authority configured in security.general.role-for-secured-access (by default ECR_SERVICE_USER). |
Clean recovery table job
The expired entries in the recovery table (see Recovery) are deleted by the clean recovery table job. By default, the job is triggered every day at 3 a.m. The user configured to execute the system jobs needs the ECR_PURGE_RECOVERY_TABLE authority to be able to perform this operation.
NOSQL queue job
Data that is supposed to be stored in the Solr NOSQL database is stored in dedicated queue tables in the relational database used by arveo. The NOSQL queue job is used to read the data from the queue tables and to write it to Solr. By default, the job is scheduled once per second individually for every queue table in the system. It is possible to configure the number of entries to process in one run of the job as well as the maximum number of attempts to write the data to Solr.
When the NOSQL feature is disabled for a type definition, the queue job and the triggers for the queue job have to be disabled or removed manually from the scheduler. |
Using external Job Service instances
It is possible to use one or more external Job Service instances to execute the scheduled system jobs. The service must be configured to use the same tenants as the repository service. To be able to execute the system jobs, the job implementations must be present in each of the Job Service’s class paths. The jobs are available as a ZIP file (ecr-packaging-jobs-external<version>.zip) that contains all required libraries. Simply extract the contents of the ZIP file to a directory (e.g. libs) and start the Job Service with the following parameter: -Dloader.path=libs.
The configuration parameters for the jobs are already configured in the database. No further configuration parameters for the jobs are required in the Job Service’s configuration. However, the service must be able to authenticate to the repository service. As the jobs use a username and password to obtain an access token, the service needs OAuth client registrations both for the client_credentials and for the password grant types. The following example shows how to configure two client registrations for the service:
spring:
security:
oauth2:
resourceserver:
jwt:
public-key-location: "http://localhost:39004/oauth/public_key"
client:
registration:
cmn-user-service-client-credentials:
provider: user-service
client-id: "tech-client"
client-secret: "tech-secret"
authorization-grant-type: "client_credentials"
scope: "oauth2"
cmn-user-service-password:
provider: user-service
client-id: "test-client"
client-secret: "my-secret"
authorization-grant-type: "password"
scope: "oauth2"
provider:
user-service:
authorization-uri: "http://localhost:39004/oauth/auth"
token-uri: "http://localhost:39004/oauth/token"
By default, the scheduler included in the repository service instances will be used to execute the scheduled jobs, too. When the scheduler in the repository service is started in standby mode, only the external Job Service instances will execute the scheduled jobs. The following configuration can be used to start the repository service with a scheduler in standby mode:
job-service:
  standbyOnlyScheduler: true
Using Hashicorp Vault
Hashicorp Vault can be used to store sensitive configuration parameters like database passwords or encryption master keys. Each arveo service tries to load configuration data from a Vault instance at startup. To configure the location and access method for Vault, the following application arguments can be used:
- spring.cloud.vault.host: Defines the host name of the Vault host.
- spring.cloud.vault.port: Sets the port used to connect to Vault.
- spring.cloud.vault.scheme: Either https or http.
- spring.cloud.vault.authentication: Sets the authentication mechanism to use.
These properties cannot be configured using the Configuration Service. Configuration data from the Configuration Service
is loaded after the connection to Vault has been established. Instead, these properties must be set as application
parameters. Example: java -jar service.jar --spring.cloud.vault.port=8200
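Taken together, a startup with all four Vault parameters could look like the following (host name, service jar name, and authentication method are placeholders to be adapted to the actual environment):

```shell
java -jar ecr-service.jar \
  --spring.cloud.vault.host=vault.example.com \
  --spring.cloud.vault.port=8200 \
  --spring.cloud.vault.scheme=https \
  --spring.cloud.vault.authentication=TOKEN
```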
It is possible to disable the Vault integration by setting spring.cloud.vault.enabled=false.
Additional information about the configuration parameters, especially the possible authentication mechanisms, can be found in the documentation of the Spring Cloud Vault project.
Defining secrets
Vault features several ways to provide secrets to applications. Configuration properties for the arveo services must be stored in the key-value secrets engine. Each secret consists of a path and several key-value pairs. The path defines the scope of the property: it can either be set to application to store a secret for all services, or to the name of the service, just like the names of the configuration files in the Configuration Service. For example, to configure the password of the JDBC datasource used by all services, a key-value pair of spring.datasource.password=password would be stored under the path application. A property for the repository service (ecr-service) would be stored in a key-value pair property=value under the path ecr-service. The following table contains the application names of the different services.
Service | Application name |
---|---|
Repository Service | ecr-service |
User Management Service | cmn-user-management-service |
User Management Access Control Service | cmn-user-management-access-control-service |
Enterprise User Management Service | cmn-user-management-enterprise-service |
Document Conversion Service | document-conversion-service |
Administration Service | cmn-administration-service |
Audit Service | cmn-audit-service |
Integration Service | cmn-integration-service |
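Assuming the key-value secrets engine is mounted at secret (the Vault default), the secrets described above could be created using the Vault CLI like this (mount path and values are environment-specific):

```shell
# Store a property for all services under the path "application"
vault kv put secret/application spring.datasource.password=password

# Store a property only for the repository service
vault kv put secret/ecr-service property=value
```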
Configuration Properties
ecr.server.caching
Property | Type | Description | Default value |
---|---|---|---|
content-access-tokens.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
content-access-tokens.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
default-acls.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
default-acls.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
enums.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
enums.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
type-definition-access.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
type-definition-access.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
type-definitions.expire-after |
java.time.Duration |
The time after which an entity in the cache will be expired. |
15m |
type-definitions.size |
java.lang.Long |
The maximum number of entities in the cache. |
500 |
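The property keys in the tables combine with the group prefix shown in the heading. For example, the cache settings above could be adjusted in the configuration of the repository service as follows (the values are illustrative):

```yaml
ecr:
  server:
    caching:
      type-definitions:
        expire-after: 30m
        size: 1000
```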
ecr.server.content-access-tokens
Property | Type | Description | Default value |
---|---|---|---|
alias |
java.lang.String |
The alias of the certificate used to sign the tokens. |
|
key-password |
java.lang.String |
The password for the alias. |
|
key-store-password |
java.lang.String |
The password for the keystore. |
|
key-store-path |
java.lang.String |
The absolute path to the keystore that contains the certificate used to sign the tokens. |
|
key-store-type |
java.lang.String |
The type of the keystore (e.g. PKCS12, JKS…) |
|
max-token-lifetime |
java.time.Duration |
The maximum allowed lifetime of the generated tokens. |
1d |
ecr.server.http
Property | Type | Description | Default value |
---|---|---|---|
file.directory |
java.io.File |
The directory used to store the temporary files. |
|
file.prefix |
java.lang.String |
The prefix to use for the names of the temporary files. |
temp |
file.suffix |
java.lang.String |
The suffix to use for the names of the temporary files. |
.dat |
file.threshold |
java.lang.Integer |
The size of the file in bytes from which on a temporary file will be used for buffering. |
131072 |
ecr.server.jobs
Property | Type | Description | Default value |
---|---|---|---|
clean-recovery-table.cron-expression |
java.lang.String |
Defines the CRON expression used to schedule the job. |
0 0 3 * * ? |
clean-recovery-table.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
true |
jms-statistics.cron-expression |
java.lang.String |
Sets the CRON expression used to schedule the job. |
*/15 * * * * ? |
jms-statistics.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
false |
jms-statistics.jms-receive-timeout |
java.time.Duration |
Defines the time the job will wait for a reply from the message broker. |
1s |
no-sql-queue.batch-size |
java.lang.Integer |
Sets the number of entries to load from the queue table in one batch. |
100 |
no-sql-queue.cron-expression |
java.lang.String |
Sets the CRON expression used to schedule the job. |
*/1 * * * * ? |
no-sql-queue.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
true |
no-sql-queue.retries |
java.lang.Integer |
Sets the maximum number of attempts to write an entry in the queue to solr. |
3 |
password |
java.lang.String |
Defines the password of the user used to run the jobs. |
|
retention-cleanup.global-settings.asynchronous |
java.lang.Boolean |
If true, content of documents will be removed from the storage asynchronously using a message queue. The database entries will not be removed but marked as deleted using the COMPLIANCE_DELETED field. |
false |
retention-cleanup.global-settings.batch-size |
java.lang.Integer |
Defines the size of a single batch of entities processed by the job. |
1000 |
retention-cleanup.global-settings.max-message-queue-size |
java.lang.Integer |
The maximum acceptable size of the message queues used by the job. Checked when the job is started. If the size of one of the queues exceeds the limit, the job is cancelled. |
100000 |
retention-cleanup.global-settings.max-runtime |
java.time.Duration |
Maximum acceptable runtime for the job. When the time is exceeded, the job is cancelled. |
|
retention-cleanup.global-settings.purge-content |
java.lang.Boolean |
If true, all content elements of a document and all its versions will be deleted. |
true |
retry-renditions.batch-size |
java.lang.Integer |
Defines the number of document versions to select in one run of the job. |
1000 |
retry-renditions.cron-expression |
java.lang.String |
Defines the CRON expression used to schedule the job. |
0 0 3 * * ? |
retry-renditions.enabled |
java.lang.Boolean |
If set to false, the job will not be scheduled. |
true |
username |
java.lang.String |
Defines the name of the user used to run the jobs. |
ecr.server.jobs.retention-cleanup
Property | Type | Description | Default value |
---|---|---|---|
global-settings.asynchronous |
java.lang.Boolean |
If true, content of documents will be removed from the storage asynchronously using a message queue. The database entries will not be removed but marked as deleted using the COMPLIANCE_DELETED field. |
false |
global-settings.batch-size |
java.lang.Integer |
Defines the size of a single batch of entities processed by the job. |
1000 |
global-settings.max-message-queue-size |
java.lang.Integer |
The maximum acceptable size of the message queues used by the job. Checked when the job is started. If the size of one of the queues exceeds the limit, the job is cancelled. |
100000 |
global-settings.max-runtime |
java.time.Duration |
Maximum acceptable runtime for the job. When the time is exceeded, the job is cancelled. |
|
global-settings.purge-content |
java.lang.Boolean |
If true, all content elements of a document and all its versions will be deleted. |
true |
ecr.server.liquibase
Property | Type | Description | Default value |
---|---|---|---|
auto-change-log |
java.lang.String |
Defines the location used to store the auto generated changelog. |
changeLog/auto.xml |
changelog-directory |
java.lang.String |
The directory used when generated changelogs are kept. This setting is only relevant when keepChangelogs is set to true. |
changelog |
custom-change-log |
java.lang.String |
Defines the location of a custom liquibase changelog to execute on startup after the database schema was initialized. Changelogs can be loaded from the classpath by adding the 'classpath:' prefix. Files must be identified by an absolute path using the prefix 'file:/'. |
|
keep-changelogs |
java.lang.Boolean |
If set to true, generated changelogs will be kept in separate files in the configured directory. |
false |
pre-initialization-change-log |
java.lang.String |
Defines the location of a custom liquibase changelog to execute on startup before the database schema was initialized. Changelogs can be loaded from the classpath by adding the 'classpath:' prefix. Files must be identified by an absolute path using the prefix 'file:/'. |
ecr.server.memory
Property | Type | Description | Default value |
---|---|---|---|
buffer-size |
java.lang.Integer |
Defines how many bytes of data to keep in memory when working with streams before switching to a temporary file. |
1024000 |
ecr.server.messaging
Property | Type | Description | Default value |
---|---|---|---|
json-messages |
java.lang.Boolean |
If enabled, the payload of JMS messages will be a JSON string. |
true |
queue-listener-concurrency |
java.lang.String |
Specify the number of threads used for listeners for queues (NOT topics!) via a "lower-upper" String, e.g. "5-10", or a simple upper limit String, e.g. "10" (the lower limit will be 1 in this case). |
1-10 |
redelivery.back-off-multiplier |
java.lang.Integer |
The number to multiply the redelivery delay with for every redelivery attempt. |
5 |
redelivery.initial-redelivery-delay |
java.lang.Long |
The time in milliseconds to wait until a failed message will be redelivered. |
1000 |
redelivery.maximum-redeliveries |
java.lang.Integer |
The maximum number of redelivery attempts for failed messages. |
3 |
redelivery.use-exponential-back-off |
java.lang.Boolean |
If true, the time between redeliveries of a failed message will be multiplied with the backOffMultiplier for each redelivery. |
true |
ecr.server.query
Property | Type | Description | Default value |
---|---|---|---|
in-condition-optimization-limit |
java.lang.Integer |
Sets the number of entries in an IN clause from which on the optimized query is used. -1 disables this feature. |
-1 |
no-sql-query-time-warning-millis |
java.lang.Integer |
Sets the maximum duration in milliseconds for the execution time of queries on the noSql database after which a warning will be logged. |
5000 |
statement-execution-time-warning-millis |
java.lang.Integer |
Sets the maximum duration in milliseconds for the execution time of a database statement after which a warning will be logged. This setting applies to the relational database. |
5000 |
ecr.server.security
Property | Type | Description | Default value |
---|---|---|---|
type-definition-access-checks-enabled |
java.lang.Boolean |
Defines whether type definition specific access checks are enabled or not. |
true |
ecr.server.solr
Property | Type | Description | Default value |
---|---|---|---|
collection-name |
java.lang.String |
The name of the collection to use. |
ecr |
collection-replicas |
java.lang.Integer |
The default number of replicas for a new collection created by the repository service. |
1 |
collection-shards |
java.lang.Integer |
The default number of shards for a new collection created by the repository service. |
1 |
commit-within-millis |
java.lang.Integer |
Defines the maximum time in milliseconds after which the solr client will perform a commit. |
1000 |
default-config-name |
java.lang.String |
Defines the default SolrConfig. |
solr-plugin-config |
host |
java.lang.String |
Defines the host for the connection of the Solr Client. |
|
http-client-connection-timeout |
java.lang.Integer |
Defines the connection timeout for the Solr HTTP client in milliseconds. |
10000 |
password |
java.lang.String |
The password used for basic authorization to SOLR. |
|
schema-name |
java.lang.String |
The name of the schema. If null, the collection name is used. |
|
ssl-key-store |
java.lang.String |
The path to the keystore to use for SSL communication. |
|
ssl-key-store-password |
java.lang.String |
The password for the SSL keystore. |
|
ssl-truststore |
java.lang.String |
The truststore to use for SSL communication. |
|
ssl-truststore-password |
java.lang.String |
The password for the SSL truststore. |
|
use-ssl-client-auth |
java.lang.Boolean |
Whether to use SSL client authentication. |
false |
username |
java.lang.String |
The username used for basic authorization to SOLR. |
ecr.server.storage
Property | Type | Description | Default value |
---|---|---|---|
profile-aliases |
java.util.Map<java.lang.String,java.lang.String> |
A mapping of alias names to storage profile names. |
|
profile-templates |
java.util.List<de.eitco.ecr.server.config.StorageProfileTemplate> |
A list of profile templates used by the bucket selector plugin. |
|
profiles |
java.util.Map<java.lang.String,de.eitco.ecr.server.config.StorageProfileSettings> |
A map containing all configured storage profiles. |
ecr.server.system
Property | Type | Description | Default value |
---|---|---|---|
batch-update-statement-cache-enabled |
java.lang.Boolean |
Enables or disables the cache for generated batch update SQL statements. |
true |
create-solr-changes |
java.lang.Boolean |
If set to false, Solr changes will not be executed when initializing the type schema. |
true |
event-listeners-enabled |
java.lang.Boolean |
Enables or disables the JMS event listeners used to process system events like recycle bin cleanup and the creation of renditions. |
true |
fetch-jms-statistics |
java.lang.Boolean |
Enables or disables the regular fetching of JMS statistics from the ecr_jms_statistics table. |
true |
initialize-empty-database |
java.lang.Boolean |
If set to true, the system will create the schema when the table ecr_types is empty, even if the service is not in maintenance mode. |
true |
log-schema-changes |
java.lang.Boolean |
If set to true together with maintenanceMode, the system will only log required changes to the database schema and shut down after the log was written. |
false |
maintenance-mode |
java.lang.Boolean |
If true, the server will update the database schema at startup and shut down after the update was finished. This is actually a combination of updateSchema = true and terminateAfterCreation = true. |
false |
schema-change-log-directory |
java.lang.String |
The location of the logfile used when checkForSchemaChanges is set to true. |
logs |
terminate-after-creation |
java.lang.Boolean |
If true, the server will terminate after the database schema was created. |
false |
update-schema |
java.lang.Boolean |
Whether to update the database schema at startup or not. |
false |
ecr.server.upload
Property | Type | Description | Default value |
---|---|---|---|
maximum-file-size |
java.lang.Long |
Defines the maximum size of a single file in one multipart upload in bytes. -1 means no limit. |
-1 |
maximum-in-memory-size |
java.lang.Integer |
Defines the maximum size of data to keep in memory before using a temporary file (in bytes). |
1048576 |
maximum-total-size |
java.lang.Long |
Defines the maximum total size of all files in one multipart upload in bytes. -1 means no limit. |
-1 |
job-service
Property | Type | Description | Default value |
---|---|---|---|
standby-only-scheduler |
java.lang.Boolean |
If true, the scheduler used by the job service will be in standby mode. It will not process any jobs. |
false |
wait-for-event |
java.lang.Boolean |
If true, the scheduler will not start to process events until the de.eitco.commons.job.service.common.StartSchedulerEvent event is sent. |
false |
Monitoring
arveo uses Spring Boot Actuator to expose a monitoring REST API that can be consumed by monitoring systems like Prometheus or the Administration Service. The Actuator documentation linked above contains information about the available monitoring data, how to enable or disable specific endpoints and how to configure security.
The overview of all actuator endpoints is available at /actuator. Health information is available at /actuator/health.
Custom health indicators
In addition to the default health indicators, arveo provides the following additional health indicators:
- storagePlugins: Checks if at least one storage profile is configured and if all storage plugins configured in the storage profiles are able to store data.
  - FileSystemPlugin: Checks if the configured storage directory exists and whether the database sequence used to generate storage IDs is available.
  - S3Plugin: Checks if the configured bucket exists. When the last storage operation has failed, the endpoint checks if the S3 service is available.
  - SwiftV2Plugin, SwiftV3Plugin: Checks if the configured container exists. When the last storage operation has failed, the endpoint checks if the Swift service is available.
- typeDefinitions: Checks if there is at least one registered type definition.
The custom health indicators can be disabled like any other health indicator by setting the configuration property management.health.key.enabled (where key is the name of the indicator) to false.
Custom endpoints
In addition to the default actuator endpoints, arveo provides the following custom actuator endpoints.
- storageProfiles: Provides a list of all storage profiles and the storage plugin used by each profile.
- typeDefinitions: Provides a list of all type definitions.
The custom endpoints can be disabled like any other actuator endpoint by setting the property management.endpoint.key.enabled (where key is the name of the endpoint) to false.
Custom metrics
In addition to the default metrics, arveo provides additional metrics that can be used to monitor the performance of the system.
Storage
For each storage profile a metric is available that records the following statistics:
Metric | Description |
---|---|
|
Number of read operations |
|
Number of write operations |
|
Total amount of bytes read |
|
Total amount of bytes written |
|
Number of read errors |
|
Number of write errors |
|
Read times |
|
Write times |
Each metric contains a tag named profile with a value for each configured storage profile.
These metrics are reset each time the repository service instance is restarted.
Profiles that use the BucketOrganizerPlugin are not included in the metrics. Instead, a separate metric for each of the referenced profiles used by the bucket organizer profile is available.
It is possible to disable the recording of these metrics by setting the following parameter to false. This does not only hide the metrics but disables the entire recording mechanism.
management:
  metrics:
    enable:
      ecr:
        storage: false
Relational database
The following metrics are collected for operations on the relational database:
Metric | Description |
---|---|
|
Maximum and total execution time as well as the number of executed database statements |
|
Number of database errors |
|
Number of statements that took longer than the configured threshold to execute |
Each of these metrics contains a tag for the current tenant and the type of statement that was executed. The threshold time after which an execution time warning is logged and the counter is incremented can be configured using the setting ecr.server.query.statementExecutionTimeWarningMillis (in milliseconds).
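For example, to lower the threshold to two seconds:

```yaml
ecr:
  server:
    query:
      statement-execution-time-warning-millis: 2000
```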
The recording of these metrics can be disabled using the following configuration parameter:
management:
  metrics:
    enable:
      ecr:
        rdb: false
Solr
The following metrics are collected for the Solr object database:
Metric | Description |
---|---|
|
Maximum and total time as well as a counter for add operations |
|
Maximum and total time as well as a counter for query operations |
|
Maximum and total time as well as a counter for delete-by-id operations |
|
A counter for all Solr exceptions |
|
Number of queries that took longer than the configured threshold to execute. |
All of the above metrics contain a tag for the current tenant. The error counter contains a tag for the type of operation that has failed. The threshold time after which an execution time warning is logged and the counter is incremented can be configured using the setting ecr.server.query.noSqlQueryTimeWarningMillis (in milliseconds).
It is possible to disable the collection of these metrics using the following configuration parameter:
management:
  metrics:
    enable:
      ecr:
        solr: false
JMS queues
arveo provides metrics to monitor the state of the JMS queues used for asynchronous operations. These
metrics rely on the statistics plugin of ActiveMQ to retrieve statistics.
The plugin must be activated in activemq.xml as shown below:
<broker ...>
  <plugins>
    <statisticsBrokerPlugin/>
  </plugins>
</broker>
The statistics collected by the statistics plugin are polled periodically by a job running in the repository service. The job is disabled by default. To activate it, set the following parameters in the configuration of the repository service:
ecr:
  server:
    jobs:
      jms-statistics:
        cron-expression: "*/15 * * * * ?"
        enabled: true
In the example configuration above, the job is triggered every 15 seconds.
Unlike other system jobs, this job always runs within the repository service. It cannot be offloaded to a separate job service instance.
The following metrics will be available once the job has been activated:
Metric | Description |
---|---|
|
The number of messages currently contained in the queue. |
|
The total number of messages that have been enqueued in the queue. |
|
The total number of messages that have been dequeued from the queue. |
|
The average time a message was enqueued before it was dequeued. |
Each metric contains a tag called queue containing the name of the queue. The following queues are currently used by the repository service:
Queue | Description |
---|---|
|
Contains messages containing IDs of content elements that have to be purged from the storage. |
|
Contains messages with a delivery delay that will cause an entity to be deleted from the recycle bin once its recycle delay has expired. |
|
Contains messages of renditions that have to be created for new content elements. |
|
The dead letter queue for the ecr-queue-create-renditions queue. This is used to set the rendition availability status to failed once all retries for the creation of a rendition have failed. |
Type definitions
arveo provides metrics for several operations for each type definition. The following metrics are available:
Metric | Description |
---|---|
|
Counter and time measurements for read operations. |
|
Counter for read operation errors caused by the client. |
|
Counter for read operation errors caused by the server. |
|
Counter and time measurements for delete operations. |
|
Counter for delete operation errors caused by the client. |
|
Counter for delete operation errors caused by the server. |
|
Counter and time measurements for create operations. |
|
Counter for create operation errors caused by the client. |
|
Counter for create operation errors caused by the server. |
|
Counter and time measurements for update operations. |
|
Counter for update operation errors caused by the client. |
|
Counter for update operation errors caused by the server. |
|
Counter and time measurements for recycle operations. |
|
Counter for recycle operation errors caused by the client. |
|
Counter for recycle operation errors caused by the server. |
|
Counter and time measurements for restore operations. |
|
Counter for restore operation errors caused by the client. |
|
Counter for restore operation errors caused by the server. |
|
Counter and time measurements for find operations. |
|
Counter for find operation errors caused by the client. |
|
Counter for find operation errors caused by the server. |
|
Counter and time measurements for batchupdate operations. |
|
Counter for batch update operation errors caused by the client. |
|
Counter for batch update operation errors caused by the server. |
Each of these metrics has a tag called type-definition containing the name of the type definition the measurement was taken for.
Prometheus
arveo provides an actuator endpoint that can be used to collect metrics data using Prometheus.
Prometheus collects data by periodically calling configured sources ("scrapes"). The following example shows an entry in the prometheus.yml file for a scrape configuration that collects data from the prometheus actuator endpoint every 15 seconds:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'arveo'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:39001']
The metrics support of arveo is based on Micrometer. To support monitoring systems like Prometheus, Micrometer remembers the last maximum value of time-based metrics for a configurable amount of time. This time should be close to the scrape interval of Prometheus and can be configured in the configuration properties of arveo as shown in the example below:
management:
  metrics:
    export:
      prometheus:
        step: 15s
The data collected by Prometheus can be visualized using Grafana.
Other monitoring systems
Support for monitoring systems other than Prometheus can be enabled by adding the required library to the classpath. The Spring Boot documentation contains a list of the supported monitoring systems and further information about how to configure them.
Attributes in MDC
arveo adds the following additional attributes to the mapped diagnostic context (MDC) of the logging framework to make it easier to analyze the system’s behavior:
- ecr.user-id: The ID of the user performing the current request, if available.
- ecr.tenant: The tenant for which the current request is executed.
These attributes are added using a HandlerInterceptor for the REST endpoints using Spring WebMVC. Note that these attributes will not be added when arveo is used embedded. In this case, the application using the embedded arveo instance is responsible for adding the required information to the MDC.
Depending on the logger appender in use, it is possible to add these attributes to log messages. See the documentation of logback for details.
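With logback, for example, the MDC attributes can be included in a pattern layout via the %X conversion word (the appender and pattern below are illustrative):

```xml
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
  <encoder>
    <!-- %X{key} inserts the MDC value for the given key into each log line -->
    <pattern>%d{HH:mm:ss.SSS} %-5level [%X{ecr.tenant}/%X{ecr.user-id}] %logger{36} - %msg%n</pattern>
  </encoder>
</appender>
```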
Open Telemetry
arveo supports using Open Telemetry to monitor the system’s behavior. Most notably, it is possible to view traces of requests across the different services using a tracing backend like Jaeger or Zipkin. The outermost span of a trace that was started by a user’s request will contain the ecr.user-id and ecr.tenant attributes containing the user’s ID and the current tenant. This is done by the same mechanism as described above for the MDC.
Because arveo is based on several widely used open source libraries, the automatic instrumentation mode of Open Telemetry can be used to record traces. This is done by the Open Telemetry java agent as described in the Open Telemetry documentation.
The following example shows the required parameters to use Open Telemetry with Jaeger for the repository service.
-Dotel.traces.exporter=jaeger
-Dotel.metrics.exporter=none
-Dotel.service.name=repository-service
-javaagent:<path>/opentelemetry-javaagent.jar
The metrics export is disabled in the above example. As described in the sections above, metrics can be collected using the actuator endpoints.
Access Control
Access rights
The REST API has the following user-rights (authorities) for different endpoints:
- ECR_SERVICE_USER (configurable): Required authority for all API endpoints. Must always be present.
- ECR_ADMIN: Allows editing type and attribute definitions as well as other administrative operations.
- ECR_DSGVO_ADMIN: Allows a user to change the litigation hold and retention settings of entities contained in type definitions using the retention feature.
- ECR_DSGVO_PRIVILEGED_DELETE: An addition to ECR_DSGVO_ADMIN that allows a user to delete an entity that is still within its retention period. Organisational precautions must be put in place to ensure DSGVO compliance when making use of this authority.
- ECR_ALL_TYPES_READ: Allows read access to all type definitions that use type level access restrictions.
- ECR_ALL_TYPES_WRITE: Allows write access to all type definitions that use type level access restrictions.
- ECR_PURGE_RECOVERY_TABLE: Allows a user to trigger the removal of expired entries in the recovery table.
Access control lists
arveo makes use of access control lists (ACLs) to protect entities. Each entity can be protected by one ACL. The handling of ACLs is performed by the Access Control Service. The documentation of the Access Control Service contains more information about the general concept of the ACLs used by arveo.
Mapping of the Access Control List values
Although the module user-management-access-control defines the concepts and functionality of the ACLs, the actual mapping of the values is implemented in arveo. The class de.eitco.ecr.acl.AclRight implements the following permissions using the numeric values shown below.
/**
* The user is allowed to see the object's meta data but not the content.
*/
BROWSE(4000),
/**
* The user is allowed to see the meta data and content of the object.
*/
READ(8000),
/**
* The user is allowed to add annotations to the object.
*/
COMMENT(12000),
/**
* The user is allowed to change meta data and content of the object creating a new version.
*/
WRITE(16000),
/**
* The user is allowed to overwrite an existing version of the object.
*/
OVERWRITE(20000),
/**
* The user is allowed to delete the object.
*/
DELETE(24000),
/**
* The user is allowed to change the ACL of the object.
*/
CHANGE_ACL((Short.MAX_VALUE - 1));
To illustrate the information above, here are some examples:
- The permission COMMENT is assigned to a group or a job position. In this case, the assignee is by default also granted the permissions BROWSE and READ.
- The prohibition WRITE is assigned to a group or a job position. In this case, the assignee is by default also prohibited all higher rights, i.e. OVERWRITE, DELETE and CHANGE_ACL.
- A job position J1 with the permission COMMENT acts as a substitute for a job position J2 with the permission OVERWRITE. The job position J1 is then assigned the permission OVERWRITE for the time of the substitution.
- A job position J1 with the prohibition WRITE acts as a substitute for a job position J2 with the prohibition READ. The job position J1 is still assigned the prohibition WRITE for the time of the substitution. This way it is guaranteed that J1 is still able to perform its tasks (which would become impossible if it were assigned the stronger prohibition of J2).
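The ordering logic behind these examples can be sketched as follows. AclRightSketch is a hypothetical illustration, not part of the arveo API: the numeric values define a total order, so a granted right implies every right with a smaller value, while a prohibition implies every right with a larger value.

```java
// Illustrative sketch of the right ordering (not the actual arveo class).
public class AclRightSketch {

    public enum Right {
        BROWSE(4000),
        READ(8000),
        COMMENT(12000),
        WRITE(16000),
        OVERWRITE(20000),
        DELETE(24000),
        CHANGE_ACL(Short.MAX_VALUE - 1);

        private final int value;

        Right(int value) {
            this.value = value;
        }

        /** A granted right implicitly grants every right with a smaller numeric value. */
        public boolean grantImplies(Right other) {
            return other.value <= this.value;
        }

        /** A prohibited right implicitly prohibits every right with a larger numeric value. */
        public boolean denyImplies(Right other) {
            return other.value >= this.value;
        }
    }
}
```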
Batch updates for ACLs
It is possible to change the values of multiple ACLs at once: all ACLs that satisfy a certain condition can be processed in one call. For every ACL that fulfills the given condition, the following modifications can be specified:
- addgroupright (adds a given right to a given group in every ACL that fulfills the condition)
- adduserright (adds a given right to a given user in every ACL that fulfills the condition)
- keepgroupright (keeps the current right of a given group in every ACL that fulfills the condition)
- keepuserright (keeps the current right of a given user in every ACL that fulfills the condition)
All other entries in the ACLs that fulfill the condition are removed.
In the Client SDK, ACL batch updates are performed in the module 'common' by the class AclServiceClient. Its method updateAclsWhere() takes two parameters: an Expression of type Boolean (the condition mentioned above) and a List of AccessControlListModification objects (the four modifications mentioned above) to apply to every matching ACL. The method updateAclsWhere() executes the given ACL batch updates.
There is also a more convenient overload of updateAclsWhere() that returns a ConditionBuilder, which can be used in searches (see Search Service).
Example of usage
Consider the following snippet from a test class as an example of the batch update functionality for the ACLs. Pay attention to the method setRightsTo(), which is called to modify the current rights.
aclServiceClient.updateAclsWhere().contextReference("id").in().values(
acl1.getIdentifier().getValue(),
acl2.getIdentifier().getValue()
).holds()
.setRightsTo(GrantAndDeny.grant(AclRight.READ)).of(umAdmin.getIdentifier())
.execute();
Attribute Based Access Control (ABAC)
arveo allows entity access to be restricted based on attributes of the entity itself.
This can be specified per entity type by a static method annotated with @Security.
The method must return an EQL expression that resolves to a boolean, i.e. a condition.
arveo calls this method whenever entities of the given type are accessed, in order to retrieve an additional filter for the access.
Operations will only affect entities for which the condition evaluates to true.
Every operation on entities of the type will execute the method and add the resulting expression to the filter of the operation:
-
Searches will add the expression to the filter of the search request.
-
Batch operations will add the expression to their filter.
-
Calls that operate on a specific id will fail if the expression yields false.
The simplest case would look like this:
@Type(ObjectType.DOCUMENT)
public interface UnsecuredDocuments {
@Security
static Expression<Boolean> calculateAccess() {
return Eql.alwaysTrue();
}
}
This would add the filter 'true' to every operation on the entity, which would allow anyone to access entities.
In most cases, one would want to compare attributes of the entity with properties of the user requesting the current operation. Referencing entity attributes can be accomplished with the EQL:
@Type(ObjectType.CONTAINER)
public interface ThresholdContainer {
int getThreshold();
void setThreshold(int threshold);
@Security
static Expression<Boolean> calculateAccess(Alias alias) {
return EcrQueryLanguage.condition()
.alias(alias).field("threshold")
.greaterThan().value(300).holds();
}
}
Users may only access entities of the type above where the field 'threshold' is greater than 300.
In order to check the user requesting an operation, one can define a parameter of the type AuthenticationContext on the method. Other information may be accessed this way, too.
The method can have up to four parameters of the following types:
-
AuthenticationContext: this class holds information about the user requesting the operation.
-
AclRight: the right needed to perform the operation.
-
Alias: identifies the part of the query that holds the entity.
-
DSLContext: an entry point to the jOOQ API, bound to the database and schema in which the table containing the entities is located.
Parameters
AuthenticationContext: who requests the operation?
The AuthenticationContext holds information about the user requesting the operation.
This parameter will most likely be used in every such method, except for the most basic cases.
Take a case where access to a document is specified by a field named access_token.
It holds the name of a user-management authority that every user with access to the document must have.
If it is null, every user has access to the document:
@Type(ObjectType.DOCUMENT)
@OverwriteAllowed
public interface DocumentWithAccessToken {
@Mandatory(false)
String getAccessToken(); (1)
void setAccessToken(String accessToken);
@Security
static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext, DSLContext dslContext) { (4)
return EcrQueryLanguage.condition()
.alias(alias).field("access_token").isNull() (2)
.or()
.value(authenticationContext.getAuthorities())
.contains().alias(alias).field("access_token")
.holds();
}
// ...
// more attributes (3)
}
1 | The type defines the attribute that specifies access. |
2 | The generated query uses this attribute. |
3 | Other elements of the type are omitted for the sake of readability. |
4 | Note that the third parameter is unused. In such a case it could be omitted. |
AclRight: what will the operation do?
The AclRight parameter holds the right necessary to perform the requested operation.
This is a hint for the method about what the operation will actually do.
It allows differentiating between read and write access:
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface ContainerAccessedByUserId {
long getOwner(); (1)
void setOwner(long owner);
List<Long> getAudience(); (2)
void setAudience(List<Long> audience);
@Security
static Expression<Boolean> checkAccess(Alias alias, AuthenticationContext authenticationContext, AclRight right) {
long userId = authenticationContext.getUser().getIdentifier().getValue();
if (AclRight.READ.getValue() < right.getValue()) { (3)
return EcrQueryLanguage.condition().alias(alias).field("owner").equalTo().value(userId).holds();
}
return EcrQueryLanguage.condition() (4)
.alias(alias).field("audience").contains().value(userId)
.or().alias(alias).field("owner").equalTo().value(userId)
.holds();
}
// ...
// more attributes
}
1 | This type defines an attribute owner holding the user id of the responsible user.
The owner of an entity is the only user allowed to modify it. |
2 | The type also defines a list of user ids, audience, holding the ids of users that may read the entity.
Users that are neither owner nor part of the audience have no access to the entity. |
3 | Thus, in cases where a right greater than READ is requested, the method returns an expression that checks whether the current user is the owner of the document. |
4 | In every other case, i.e. when the requested access right is READ or below, an expression is returned that checks whether the current user is the owner or part of the audience. |
Alias
The alias identifies the part of the executed query that contains the entity and should be used to reference its members.
Always use the alias as given in the examples. Other ways to reference the entity might work in most cases, but only using the alias ensures that referencing entity attributes works in every case. |
The full class name is de.eitco.ecr.common.search.Alias. Take care not to confuse it with other classes named Alias. |
DSLContext
In some cases, using expressions on the entity itself may become cumbersome or slow. For these cases, one can use the DSLContext parameter. It allows access via jOOQ to any table in the same schema as the table of the requested entity and can be used to obtain specific data directly.
Since this accesses the database directly, there are no further access checks on queries using the DSLContext.
Depending on the operation requested, the method may even be able to execute INSERT or UPDATE statements.
It is the responsibility of the security method's author to make sure changes do not create an inconsistent or otherwise corrupted state of the database.
The simplest way to ensure this is to use the DSLContext only to read data. |
Examples
Subselect
There might be cases where the attribute defining access is not part of the entity itself, but part of another entity referred to by a foreign key or a relation.
In such cases, a subselect comes in handy.
Assume two entity types: documents, access to which is restricted by an attribute named owner_group, which is part of the second entity, a container. An owner group must be given. Documents are linked to their container with a foreign key named contained_in:
@Type(ObjectType.CONTAINER)
public interface OwnedContainer {
long getOwnerGroup(); (1)
void setOwnerGroup(long ownerGroup);
// ...
// more attributes
}
@Type(ObjectType.DOCUMENT)
public interface OwnedDocument {
@ForeignKey(target = OwnedContainer.class, targetProperty = "id")
ContainerId getContainedIn(); (2)
void setContainedIn(ContainerId container);
@Security
static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext) {
List<Long> groupIds = authenticationContext.getAllGroups().stream().map(group -> group.getIdentifier().getValue()).collect(Collectors.toList()); (3)
return EcrQueryLanguage.condition().alias(alias).field("contained_in").in() (4)
.select("id").from("owned_container").as("container").where().
contextReference("container", "owner_group").in().values(groupIds).holds().holds();
}
// ...
// more attributes
}
1 | The entity OwnedContainer holds the attribute that specifies access. |
2 | The entity OwnedDocument is linked with a container by its attribute contained_in . |
3 | The AuthenticationContext is used to obtain the ids of every group the current user is a member of. |
4 | The group ids are used to create a check whether the entity is contained in a container whose owner_group is one of the user's groups. |
Interface inheritance
Since attribute-based security is, by definition, based on attributes, it must be specified per type. However, in some cases a more general solution is desired. In these cases, Java interface inheritance comes in handy.
Assume the class DocumentWithAccessToken from above. Assume further that there are other types (ContainerWithAccessToken and FolderWithAccessToken) that should be secured by their access token as well. In this case, it is good practice to combine the access method and the field in a common superinterface:
(2)
public interface WithAccessToken {
@Mandatory(false)
String getAccessToken();
void setAccessToken(String accessToken);
@Security
static Expression<Boolean> calculateAccess(Alias alias, AuthenticationContext authenticationContext) {
return EcrQueryLanguage.condition() (1)
.alias(alias).field("access_token").isNull()
.or()
.alias(alias).field("access_token").in()
.values(new ArrayList<>(authenticationContext.getAuthorities()))
.holds();
}
}
1 | The check for the access token is defined here. |
2 | Note that this interface does not specify an entity type by itself, since it lacks a @Type annotation. |
Then the types themselves can simply inherit this feature:
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface ContainerWithAccessToken extends WithAccessToken {
}
@Type(ObjectType.FOLDER)
@OverwriteAllowed
public interface FolderWithAccessToken extends WithAccessToken {
}
Complex scenario: a Hospital
Here we take a look at a more complex example: a hospital. The hospital manages documents concerning cases. A case belongs to a patient. Users of the system are hospital employees and may access data about documents, cases and patients. These users are part of one or several wards. For every ward, there is a group in the system containing the users that are part of this ward. Cases have a list of wards where the patient was treated for that case; this list may change over time. Access is specified as follows:
-
A user may only access cases whose ward list contains at least one ward the user is a member of.
-
A user may only access patients whose cases they may access.
-
A user may only access documents whose cases they may access.
Cases could be modeled as follows:
@Type(ObjectType.CONTAINER)
public interface MedicalRecordCase {
@Mandatory
@ForeignKey(target = MedicalRecordPatient.class, targetProperty = "id")
ContainerId getPatient(); (1)
void setPatient(ContainerId containerId);
@Mandatory
List<String> getWards(); (2)
void setWards(List<String> wards);
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
List<String> groupNames = authenticationContext.getAllGroups() (3)
.stream().map(group -> group.getEntityName().getValue()).collect(Collectors.toList());
Expression<Boolean> result = null;
for (String groupName : groupNames) { (4)
Expression<Boolean> wardCondition = EcrQueryLanguage.condition() (5)
.alias(alias).field("wards").contains().value(groupName)
.holds();
if (result == null) {
result = wardCondition;
} else {
result = Eql.or(result, wardCondition); (6)
}
}
if (result == null) {
return Eql.alwaysFalse(); (7)
}
return result;
}
// case attributes ... (8)
}
1 | A case holds a foreign key to a patient. Since a case must have a patient, this attribute is mandatory. |
2 | A case has a list of wards, where it was treated. This attribute is also mandatory. |
3 | When computing access, the groups - and thus the wards - of the current user are obtained from the AuthenticationContext. |
4 | Since it is necessary to check whether the intersection between the wards of the case and the groups of the user is non-empty, the method iterates over all the groups of the user. |
5 | A condition is created that checks whether the entity's wards contain the current group. |
6 | Access is granted when one of the created conditions yields true. |
7 | If the user is in no group whatsoever, they may not access any case at all. |
8 | Further attributes are omitted for the sake of readability. |
Now Patients specify their security as follows:
@Type(ObjectType.CONTAINER)
public interface MedicalRecordPatient {
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
Alias caseAlias = Alias.byName("case"); (2)
Expression<Boolean> caseAccessCondition = MedicalRecordCase.access(caseAlias, authenticationContext); (1)
return EcrQueryLanguage.condition().exists()
.select("id").from(MedicalRecordCase.class).as(caseAlias.getValue()) (3)
.where()
.alias(caseAlias).field("patient").equalTo().alias(alias).id() (4)
.and(caseAccessCondition).holds().holds(); (5)
}
// patient attributes ...
}
1 | Access to a patient depends on access to cases.
So, the MedicalRecordCase.access() is called (see above). |
2 | In order to do that, a custom alias is specified that is used both for the method call and in the query below. |
3 | Using a subselect, it is checked whether there is a case … |
4 | … that is assigned to the patient whose access is being checked and … |
5 | … that the current user may access. |
Documents may specify their security method in a very similar way; only the document-to-case link points in the other direction:
@Type(ObjectType.DOCUMENT)
public interface MedicalRecordDocument {
@ForeignKey(target = MedicalRecordCase.class, targetProperty = "id")
@Mandatory
ContainerId getCase(); (1)
void setCase(ContainerId containerId);
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
Alias caseAlias = Alias.byName("case");
Expression<Boolean> caseAccessCondition = MedicalRecordCase.access(caseAlias, authenticationContext); (2)
return EcrQueryLanguage.condition()
.exists().select("id").from(MedicalRecordCase.class).as(caseAlias.getValue())
.where()
.alias(caseAlias).id().equalTo().alias(alias).field("case") (3)
.and(caseAccessCondition).holds().holds();
}
// document attributes ...
}
1 | A document is assigned to a case. This is mandatory. |
2 | As for patients, the access check for documents depends on the access check for cases. |
3 | A subselect similar to the one above is created; however, here the outer select holds the link to the inner one. |
Revision history and ABAC
In the example above access to the entities is defined by one attribute: the wards of a case. It is assumed that a case may be treated in several wards - one after another - and every employee belonging to those wards needs access to the case, its patients data and its documents. Visiting the wards one after another will result in several updates on the case - each adding another ward - and thus in a revision history where the list of wards will build up over time.
This has an interesting consequence in the scenario above: access to an older version of the case is granted only to users who were allowed to access it at the time that version was created.
For instance, if a case started in pulmonology, it would have the following revision list:
revision | ward(s)
---|---
1 | pulmonology
If it was moved to intensive care after that, it would result in the following revision list:
revision | ward(s)
---|---
1 | pulmonology
2 | pulmonology, intensive care
Employees working in intensive care would be unable to access data of revision 1 of this case. Depending on the scenario this might or might not be desired.
If this is not desired, it can be fixed with a simple annotation on the case interface:
@Mandatory
@Versioned(value = false)
List<String> getWards();
By simply specifying the wards attribute as not versioned, changes to the attribute will affect every revision of the case.
If a case started in pulmonology, it would at first have the same revision history as above:
revision | ward(s)
---|---
1 | pulmonology
However, if it was moved to intensive care now, the revision list would look like this:
revision | ward(s)
---|---
1 | pulmonology, intensive care
2 | pulmonology, intensive care
Now all employees in pulmonology and intensive care have access to every revision of this case.
This solution can be used generally. When access control to entities depends on attributes, deciding whether those attributes are versioned or not is an important detail.
Accessing external tables
Assume that in the hospital from the example above, the information which employee belongs to which ward is kept in a separate table named 'employee_to_ward'. This table is managed by an external application.
Using direct database access
As stated earlier, it is possible to add a parameter of the type org.jooq.DSLContext to a security method in order to gain direct access to the database.
This can be used to access the 'employee_to_ward' table:
@Security
static Expression<Boolean> access(
Alias alias,
AuthenticationContext authenticationContext,
DSLContext context (1)
) {
long userId = authenticationContext.getUser().getIdentifier().getValue(); (2)
final List<String> wards = context.selectFrom("test_employee_to_ward")
.where(DSL.field(DSL.name("employee")).eq(DSL.value(userId))) (3)
.fetch(DSL.field("ward", String.class));
Expression<Boolean> result = null;
for (String ward : wards) {
// ... (as above) (4)
1 | The DSLContext is defined as another parameter. |
2 | The AuthenticationContext is only used to get the current user's id. |
3 | The wards of the user are obtained using the jooq-api to directly access the database. Depending on the scenario, it might improve performance to cache the result of this query. |
4 | After that, the same code as above is executed. |
Using a Metadata type
Alternatively, an arveo custom @Type could be used to access the external table:
@View (1)
@Name("employee_to_ward") (2)
@Type(ObjectType.META)
public interface UserToWard {
long getEmployee();
void setEmployee(long employee);
String getWard();
void setWard(String ward);
}
1 | The @View annotation marks the type as external.
This means arveo will not create the corresponding table. |
2 | The @Name annotation specifies the name of the table the type's entities are stored in. |
Now, in the security method this type can be accessed with a subselect:
@Security
static Expression<Boolean> access(Alias alias, AuthenticationContext authenticationContext) {
final Alias userWard = Alias.byName("user_ward"); (1)
return EcrQueryLanguage.condition()
.exists().select("ward").from(UserToWard.class).as(userWard.getValue()) (2)
.where()
.alias(userWard).field("employee").equalTo()
.value(authenticationContext.getUser().getIdentifier().getValue()) (3)
.and()
.alias(alias).field("wards").contains() (4)
.alias(userWard).field("ward").holds()
.holds();
}
1 | First, an alias is declared for the subselect. |
2 | Then, a query is created that checks whether there is a ward, that … |
3 | … the current user is assigned to and … |
4 | … that is contained in the current entity's wards attribute. |
Data Modelling
Entity types
The following chapter defines the entity types and type definitions used in arveo.
To be able to store objects in the database, a class is defined for each entity definition. An entity thus represents a type of data structure used in arveo. There are five supported entity types:
-
Document: an entity that can contain metadata and content. Documents are the only objects that can have content; the content may be binary. Documents can be contained in folders (Document).
-
Container: a simple folder-like object that is not organized in a tree structure but can have relations to other objects. A container contains only metadata and cannot be contained in a folder (Container).
-
Relation: an entity that represents a relation between two other entities. A relation can contain metadata (Relation).
-
Folder: an entity that contains metadata and is organized in a tree structure, like in a file system (Folder).
-
Meta: an entity that contains only metadata. Unlike containers, metadata entities do not support system attributes like ID and creation date (Metadata).
Each type definition is represented by one (or more) tables in the database.
Each entity is referred to by its system-wide unique id, which consists of a tenant id and its type definition id, followed by the sequential database id of this entity:
[12bit Tenant id][14bit Type Definition id][38bit Entity id].
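The 64-bit layout described above can be illustrated with simple bit arithmetic. The following sketch is an illustrative reconstruction of the documented layout, not arveo code:

```java
// Illustrative sketch of the entity id layout described above:
// [12 bit tenant id][14 bit type definition id][38 bit entity id] = 64 bits.
public class EntityIdLayout {

    static final int TYPE_BITS = 14;
    static final int ENTITY_BITS = 38;

    // Pack the three components into a single 64-bit id.
    static long compose(long tenantId, long typeDefinitionId, long entityId) {
        return (tenantId << (TYPE_BITS + ENTITY_BITS))
                | (typeDefinitionId << ENTITY_BITS)
                | entityId;
    }

    static long tenantId(long id) {
        return id >>> (TYPE_BITS + ENTITY_BITS);
    }

    static long typeDefinitionId(long id) {
        return (id >>> ENTITY_BITS) & ((1L << TYPE_BITS) - 1);
    }

    static long entityId(long id) {
        return id & ((1L << ENTITY_BITS) - 1);
    }

    public static void main(String[] args) {
        long id = compose(5, 17, 123456789L);
        System.out.println(tenantId(id));         // 5
        System.out.println(typeDefinitionId(id)); // 17
        System.out.println(entityId(id));         // 123456789
    }
}
```

This layout allows up to 4096 tenants, 16384 type definitions per tenant, and roughly 2.7 * 10^11 entities per type definition.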
Versioned entities
All entities listed above (except for meta) are versioned by default. This means that they store version and modification information. The class VersionInformation combines information about a version, including the version id, version number and version comment. The version modification object stores a modification stamp, consisting of a user id and a ZonedDateTime object, both for the creation and for the last modification of the entity. The version information is stored in a separate table for each typed entity.
When specifying a type definition, you can decide which attributes of this type definition are versioned.
If none of the attributes are versioned, the entire object is not versioned. For the type Document, content changes are always versioned. |
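As a rough sketch of the version bookkeeping described above (the record names and fields below are assumptions for illustration, not the actual arveo classes):

```java
import java.time.ZonedDateTime;

// Illustrative sketch of the version bookkeeping described above.
// Shapes and names are assumptions, not the actual arveo API.
public class VersioningSketch {

    // A modification stamp: which user acted, and when.
    record ModificationStamp(long userId, ZonedDateTime timestamp) {}

    // Combines version id, version number and version comment with the
    // stamps for creation and last modification of the entity.
    record VersionInformation(long versionId, int versionNumber, String versionComment,
                              ModificationStamp created, ModificationStamp lastModified) {}

    public static void main(String[] args) {
        ModificationStamp stamp = new ModificationStamp(42L, ZonedDateTime.now());
        VersionInformation info =
                new VersionInformation(1L, 1, "initial import", stamp, stamp);
        System.out.println(info.versionNumber()); // 1
    }
}
```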
Custom types
You can make your class a type and add features by annotating your classes. You can define the custom metadata schema with simple getter and setter methods.
When you start a project, you have to create your own types. Simply annotate the class with the @Type annotation and define your schema with type-safe getter/setter methods (Example). |
You can find the arveo-specific annotations in the module type-definition-annotations. The goal is to create a type and specify its properties, so the annotations precisely define the behavior of the type definitions. When a type is defined by annotating it with @Type, a corresponding database table is created. There is one exception: when annotating with @View or @Partial_View, no database table is created.
There are two types of annotations:
-
annotations on types (interfaces): @Target({ElementType.TYPE, ElementType.ANNOTATION_TYPE})
-
annotations on properties (getter-methods): @Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})
Some annotations can be used both on interfaces and on getter methods. The target ElementType.ANNOTATION_TYPE is used for inherited annotations. The following annotation groups are used in arveo:
-
constraint: contains annotations that define specific properties or behaviour of attributes;
-
defaults: contains annotations that define default values of attributes;
-
index: contains annotations that define indexes on type definitions;
-
naming: contains annotations that specify names for tables, attribute definitions, type definitions, enumeration types and enumeration values;
-
reference: contains annotations that specify references between types or attributes;
-
system: contains annotations that concern system properties;
-
view: contains annotations that mark an interface as a view;
-
other: contains annotations like @Type, @EcrIgnore and others, which stand out and cannot be classified into a group.
You can use the five entity classes to create custom entity types that serve the needs of your system. The customized entity types reflect the structure of your project or organization and can be created in a flexible way by extending the five entity types of the arveo system.
To create your first project using arveo you may want to review the following examples and follow the pattern.
Inherited annotations
Certain annotation properties are used widely throughout the code, so it is more convenient to define such an annotation once for frequent reuse.
The following listing shows the annotation definition @CustomAnnotation, which defines itself as the system property version id. If you mark a getter method with this annotation, there is no need to specify the system property name again.
@Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})
@SystemProperty(SystemPropertyName.VERSION_ID)
public @interface CustomAnnotation {
}
To take advantage of this annotation, we annotate getter methods with it, as shown in the listing below:
public interface InterfaceInheritanceExample {
@SystemProperty(SystemPropertyName.ID)
DocumentId getId();
@CustomAnnotation
VersionId getVersionId();
}
Examples
Enumeration example
Define an enum class and use it in another object type (Example).
@Enumeration(typeName = "my_enum")
public enum MyEnum {
ENUM1, ENUM2, ENUM3, ENUM4
}
Document type example
@Type(ObjectType.DOCUMENT) (1)
@RetentionProtected
@ContentElement(defaultDefinition = true, separateField = true)
@OverwriteAllowed
@RecycleBin
@Audit
public interface Resume {
// Immutable identifier documentid of the resume document: unique and readonly
@Unique
@ReadOnly
// alternatively: use autoincrement instead of unique and readonly to let the service create a unique sequence
//@Autoincrement
long getDocumentId(); (2)
void setDocumentId(long value);
// title of the resume document
String getTitle(); (2)
void setTitle(String value);
// relation to Person by person.id()
@ForeignKey (target = Person.class, targetProperty = "id") (3)
String getPersonId();
void setPersonId(String value);
// Multi value with former employers
List<String> getEmployers();
void setEmployers(List<String> employers);
MyEnum getEnum();
void setEnum(MyEnum myEnum);
}
1 | Definition of the object type as DOCUMENT, which allows the entity to carry content. |
2 | A database column is created for this property with the default name document_id. With @Unique and @ReadOnly, the id must be set on creation and is immutable from that moment on. Alternatively, with @Autoincrement, the database creates a sequence of integer values and assigns the id automatically. Either way, the value is unique, which allows users and third-party applications to identify and find the object. |
3 | This annotation specifies a foreign key to the class Person. |
Container type example
The following example class is marked as type Container. To use an entity type, we annotate the class using the @Type annotation.
@Type(ObjectType.CONTAINER) (1)
public interface Person {
String getFirstName(); (2)
void setFirstName(String value);
@Name("last_name") (3)
String getSurname();
void setSurname(String value);
@Unique (4)
String getVatNumber();
void setVatNumber(String value);
}
1 | Definition of the object type to be Container |
2 | A database column is created for this property with a default name first_name |
3 | This annotation specifies the name of the database column, which is different from the default |
4 | This annotation specifies a unique column, in this case vat_number. |
Referencing attributes by name
The system creates a column for each attribute of a type definition in the type definition's database table. The name of the column will be a snake case representation of the camel case name of the attribute's getter method. For example, the getter getInvoiceNumber will be mapped to an attribute (and a column) named invoice_number. To make it easy to reference these names in a compile-safe manner, classes with string constants for all type definitions are generated automatically. For example, for a type definition class called SimpleInvoice, a class named SimpleInvoiceNames will be generated in the same package as SimpleInvoice.
The classes containing the constants are generated using an annotation processor that is contained in the library containing the type annotations. The processor is picked up by the compiler automatically. |
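The naming rule can be sketched as a simple camel case to snake case conversion. This is an illustrative reimplementation of the documented mapping, not the actual generator:

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch of the naming rule described above: a getter like
// getInvoiceNumber maps to a column named invoice_number.
public class ColumnNames {

    // Matches the position between a lowercase letter or digit and an uppercase letter.
    private static final Pattern CAMEL_BOUNDARY =
            Pattern.compile("(?<=[a-z0-9])(?=[A-Z])");

    static String columnNameFor(String getterName) {
        // Strip the "get" prefix, then insert underscores at camel-case
        // boundaries and lowercase the result.
        String property = getterName.startsWith("get")
                ? getterName.substring(3)
                : getterName;
        return CAMEL_BOUNDARY.matcher(property)
                .replaceAll("_")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(columnNameFor("getInvoiceNumber")); // invoice_number
        System.out.println(columnNameFor("getAmount"));        // amount
    }
}
```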
The following example shows how these constants can be used to perform a search referencing two different attributes.
EcrSearchService<SimpleInvoice> searchService = serviceClient.asEntitySearchService(); (1)
List<SimpleInvoice> list = searchService.where() (2)
.entity().field(SimpleInvoiceNames.INVOICE_NUMBER).like().value("2021-08-*")
.and()
.entity().field(SimpleInvoiceNames.AMOUNT).greaterThan().value(90D)
.holds()
.unpaged();
1 | serviceClient is a TypedDocumentServiceClient obtained using the TypeDefinitionServiceClient |
2 | A query is formulated with the fluent API of the EQL, referencing the attributes invoice_number and amount. |
Type annotations
Annotation | Parameter | Description |
---|---|---|
@Type |
ObjectType |
Define the entity type of your class by setting a valid ObjectType: DOCUMENT, FOLDER, RELATION, CONTAINER, META: Example |
@AccessChecks |
boolean |
This annotation specifies whether type-based access-checking will be enabled on a type. Default = false. See permissions on type definitions for details. |
@AclDisabled |
boolean |
Support for ACLs is enabled by default but can be disabled by annotating the type class with @AclDisabled. Additionally, annotating a getter for the ACL-Id system property with @Mandatory enforces the assignment of an ACL to every entity. Meta types do not support ACLs. |
@FilingEnabled |
boolean |
The filing feature makes it possible to assign a document to a folder. The feature is disabled by default and can be activated on type classes of type DOCUMENT by annotating the class with @FilingEnabled. |
@RetentionProtected |
boolean |
The retention and litigation hold feature is disabled by default and can be enabled by annotating a type class with @RetentionProtected. Meta types do not support retention. Example |
@OptimisticLocking |
boolean |
The optimistic locking feature makes it possible for clients to ensure that updates do not accidentally overwrite changes made by other clients. The feature is disabled by default and can be enabled by annotating a type class with @OptimisticLocking. |
@RecycleBin |
boolean |
The recycle bin feature makes it possible to move entities to the recycle bin and restore them again if required. The feature is disabled by default but can be enabled by annotating a type class with @RecycleBin. Recycle Bin |
@Recovery |
boolean |
Enables the recovery log. Content objects or files are deleted after a configurable time: Recovery Log |
@ContentElement |
String |
Define the allowed content types: Example |
@Audit |
boolean |
This annotation enables auditing of create-, update- and delete-operations on the type definition. |
@Versioned |
boolean |
This annotation defines if all properties of a type are versioned or not. If the annotation is present on a type and on a getter in the type, the annotation on the getter wins. |
@OverwriteAllowed |
boolean |
By default, arveo creates a new version if the content object of a document is changed. You can always read and restore all older versions of a content element. If overwrite is allowed you can replace a content element and overwrite it on the content store. The old version is lost. |
@View |
boolean |
The metadata type is a database view: Example |
@Tablename |
String |
Sets the actual database table name for the type. Database names are snake case, not camel case: Example |
@SourceType |
boolean |
This annotation specifies the class being the source (parent) of a relation. |
@TargetType |
boolean |
This annotation specifies the class being the target of a foreign key or relation. |
@InheritedProperty |
boolean |
This annotation marks a property as an inherited property. |
@Enumeration |
String |
This annotation can be used to configure a registered enumeration type. You must pass the database snake-case name of the enumeration type: Example |
@EcrIgnore |
boolean |
This annotation marks a method to be ignored as a property, or a class to be ignored as a type. The property is not stored in the database table. |
@NOSql |
boolean |
This annotation enables full-text support for all columns of the annotated type. By default, full-text support is disabled: Example. |
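The type annotations above can be combined on a single type definition. The following sketch is illustrative only (the interface and attribute names are hypothetical), assuming the annotation API described in this table:

```java
// Hypothetical document type combining several type annotations:
// retention protection, auditing and optimistic locking.
@Type(ObjectType.DOCUMENT)   // entity type DOCUMENT
@RetentionProtected          // enable retention and litigation hold
@Audit                       // audit create-, update- and delete-operations
@OptimisticLocking           // protect against accidental concurrent overwrites
public interface ContractDocument {
    @Mandatory
    String getContractNumber();
    void setContractNumber(String contractNumber);
}
```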
Property annotations
Annotation | Parameter | Description |
---|---|---|
@AutoIncrement |
boolean |
The annotation AutoIncrement indicates that the value of an attribute will be auto-incremented by the database. |
@Indexed |
String |
The annotation ensures that an index will be created for one or more properties. You must pass the index name as a parameter. When several attributes are annotated with the same index name, a multi-column index will be created for these columns. Use {@link Index} to configure additional properties of the index. |
@Unique |
boolean |
Defines a unique column. If you try to create an entity with a duplicate value, a unique constraint violation is raised. arveo creates a unique index or a unique constraint on the database and ensures the integrity of the documents. Example |
@Mandatory |
boolean |
Defines a mandatory column (default = false). The create operation fails with an exception if the property is not set. Example |
@Readonly |
boolean |
The property must be set when the entity is created (like @Mandatory) and cannot be changed afterwards. If a column has the annotations @Readonly and @Unique, you have an immutable index value that can be used as a business primary key. This ensures that users and third-party systems can clearly identify and find a document. Example |
@Versioned |
boolean |
This annotation defines if an attribute of a type is versioned or not (when placed on a getter). If the annotation is present on a type and on a getter in the type, the annotation on the getter wins. |
@Length |
Long |
This annotation specifies the length of a string or binary attribute |
@Precision |
Long |
This annotation specifies the precision of a decimal; the parameter defines the digits before and after the decimal point |
@Casesensitive |
boolean |
This annotation marks a field of type String as case-sensitive. This affects how searches on this field will be performed. The value itself will always be stored preserving the case. |
@DefaultValue |
String |
It is possible to specify default values for properties. Pass the database name of the property (camel vs. snake case!) and define a function returning the required type. If an instance of a type with a field that has a default value specified is created, and a value for that field is not defined, the default value will be used instead. However, if the field is explicitly set to null, then null will be used instead. Example |
@DefaultSystemPropertyValue |
String |
It is possible to calculate the initial value of the retention period and set it as a default value for RETENTION_DATE system column. Pass the database column name "retention_date" and a function returning a ZonedDateTime value. Example |
@PrimaryKey |
boolean |
This annotation marks a custom property as part of the element's primary key. The primary key will be composed of every custom property annotated with this annotation and the system property id. The property will be mandatory. |
@SecondaryKey |
boolean |
This annotation marks a property as a secondary key. The property becomes mandatory and unique. |
@ForeignKey |
String |
Defines a foreign key. You must pass the class name and the column for the foreign key. arveo creates the foreign key on the database and ensures the data integrity of your entities. Example |
@CascadeDelete |
boolean |
It is possible to define foreign keys that cascade a delete operation to the referencing entity. Example |
@SystemProperty |
SystemPropertyName |
To access system properties you can use the annotation @SystemProperty and pass one of the following names (Example). |
@FormattedCounter |
String |
This annotation marks an attribute of type String as a formatted counter. Formatted counters can be used to generate string valued attributes with a counter backed by a sequence as well as a prefix and a suffix. The name of the sequence can be user defined, or it can be auto-generated by the system. Prefix, suffix, and the name of the sequence can contain placeholders. Currently, the only supported placeholder is $date(<format>). |
@RelationCounter |
Class |
This annotation marks a property of type int as a counter for a specific relation type identified by the type definition class. |
@EcrIgnore |
boolean |
This annotation marks a method to be ignored as a property, or a class to be ignored as a type. The property is not stored in the database table. |
@NOSql |
boolean |
This annotation enables or disables full-text support for this property. |
Please note the following tips regarding unique identifiers:
To allow users and third-party applications to identify and find objects in arveo, you should define a unique and immutable property. The property must be @Unique to ensure that an application can identify the item. Make the property @Readonly to ensure that the identifier is always set and immutable.
Your business application or the user must set the value when the object is created. Use the @AutoIncrement annotation instead of @Unique and @Readonly if a simple sequential Long id meets your requirements. If you need a more sophisticated unique identifier, you can use the annotation @FormattedCounter, which allows you to create e.g. String identifiers like <year>-<sequence> (Example).
If overwrite is turned on, it is possible to manipulate the originally saved content and compromise the document without creating a versioned copy. Ensure that the @OverwriteAllowed annotation is not present on legally compliant document types. |
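As a sketch of the recommended business key pattern (hypothetical names, based on the annotations described above):

```java
// Hypothetical document type with an immutable business primary key.
@Type(ObjectType.DOCUMENT)
public interface Invoice {
    @Unique      // no two invoices may share a number
    @Readonly    // must be set on creation, cannot be changed afterwards
    String getInvoiceNumber();
    void setInvoiceNumber(String invoiceNumber);
}
```

The application sets the invoice number once when the document is created; afterwards users and third-party systems can rely on it to identify the document.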
Examples
Default values
@Type(ObjectType.CONTAINER)
public interface ContainerWithSimpleDefaultProperty {
String DEFAULT_STRING = "default string"; (2)
@Mandatory
String getMyStringField();
void setMyStringField(String myStringField);
@DefaultValue("my_string_field") (1)
default String defaultStringField() {
return DEFAULT_STRING;
}
// ...
// your custom attribute definitions
// ...
}
1 | With @DefaultValue("my_string_field") the method defaultStringField is defined to return the default value of my_string_field. Note that the reference in the annotation is in snake case while the actual property getMyStringField is camel case. |
2 | In a simple case like this it is considered good practice to declare the default value as a public constant. However, the default method does not need to return a constant. For example, date-time fields could use ZonedDateTime.now() to specify the timestamp of the creation as default value. |
Index example
As an example of annotation usage, let us define an interface BookIndex with two properties, page and chapter. These properties have to be indexed.
Meta
@Type(ObjectType.META)
@Index("book-chapter-page-index")
public interface BookIndex {
@PrimaryKey
@AutoIncrement
int getId();
@Indexed("book-chapter-page-index")
int getChapter();
void setChapter(int chapter);
@Indexed("book-chapter-page-index")
int getPage();
void setPage(int page);
}
The above-mentioned properties are thus marked with the annotation @Indexed, which ensures that an index will be created for these attributes. Here, the annotation @Index on the type is an example of an annotation on a type, as described above.
Formatted counters example
Using the @FormattedCounter annotation it is possible to define counters with prefix and suffix that are backed by a sequence on the database. There are several properties that can be defined in the annotation:
Property | Description |
---|---|
prefix |
The prefix used for the counter values. Can contain placeholders. |
suffix |
The suffix used by the counter values. Can contain placeholders. |
digits |
The number of digits for the counter. Shorter numbers will be padded with zero. |
sequenceName |
The name of the sequence to use. Can contain placeholders. |
autoGenerateSequences |
The number of sequences to auto-generate when the system is started in maintenance mode. |
startValue |
The start value of the generated sequence(s). |
The parameters prefix, suffix and sequenceName support placeholders. Currently, the system supports a placeholder for dates in the form $date(<format>) where format is a Java date format string supported by java.time.format.DateTimeFormatter#ofPattern(String). |
The autoGenerateSequences property can only be used when the sequenceName contains the placeholder $date(uuuu). It must not contain any other placeholders. |
The following example shows a formatted counter attribute used as an invoice number that will produce counter values in the form 2021#0103. It will be backed by a sequence called inv_no_seq_2021. The system will create the next 10 sequences automatically (inv_no_seq_2021 to inv_no_seq_2030). The start value of each sequence will be 100. The sequence to use will be determined automatically because of the date placeholder in the sequenceName property. So on January 1st 2022, the generated counter values will use another prefix and the counter will start over at 100 (2022#0100). Each time the system is started in maintenance mode, it will make sure that sequences for the next 10 years will be present.
@FormattedCounter(prefix = "$date(uuuu)#", digits = 4, sequenceName = "inv_no_seq_$date(uuuu)", autoGenerateNextSequences = 10, startValue = 100)
String getInvoiceNumber();
Foreign keys with ON DELETE CASCADE example
Add the @CascadeDelete annotation to the getter for the foreign key attribute. For relation types it is possible to add the cascade delete option to the foreign keys to the parent and child of the relation. To do that, add a system property for the parent- and/or child-id and annotate it with @CascadeDelete.
// simple foreign key
@CascadeDelete
@Mandatory(false)
@ForeignKey(target = BookIndex.class, targetProperty = "id")
Integer getReferencedIndex();
// parent- and child-id of a relation
@CascadeDelete
@SystemProperty(SystemPropertyName.PARENT_ID)
short getParentId();
@CascadeDelete
@SystemProperty(SystemPropertyName.CHILD_ID)
short getChildId();
The cascade delete option is supported only for entities that are not versioned (hence it cannot be used on Document types) and do not support retention or inheritance. It is also not possible to inherit attribute values from a type definition that has a foreign key with the cascade delete option. |
Property-like system fields
If a getter for a system field is defined, it is also possible to define a setter, provided the system field is property-like. The following fields are property-like:
-
acl_id;
-
retention_date.
The following listing shows the definition of a getter and a setter method on a property-like field.
public interface Secured {
@SystemProperty(SystemPropertyName.ACL_ID)
AccessControlListId getAclId();
void setAclId(AccessControlListId aclId);
}
Define a view
To define your type as a view or a partial view, you have to annotate your type with @View or @PartialView. The @View annotation specifies that the defined type is a view, i.e. it controls whether tables are created for it. The @PartialView annotation marks a class to be a partial view of the type definition created by another class via the @Type annotation. Partial views can be used for updates and selects with limited select clauses. No tables will be created for classes annotated this way. The interfaces that are to be defined as views of an object type have to be registered on the interface representing this object type. For instance, if an interface NamedFile inherits from the interface NamedEntity, and NamedEntity is a partial view of NamedFile, it has to be registered on the object from which it inherits:
@PartialView(NamedFile.class)
public interface NamedEntity {
//...
}
Note: An interface may also be a partial view of more than one type definition.
External views
It is possible to expose tables that are under the control of other applications to arveo and include them in its type system. This assumes that the given tables are in the same database schema as the tables of arveo. Also, one needs to know the names of these tables as well as the types of their columns. In this case one can define a meta type annotated with @View.
External views will only be read from ecr. It will never write to an external view. |
For example, the access-control-service defines several tables, one of them named usrv_acl. In this table there are, amongst others, two fields: id (a bigint) and name (a varchar). With this knowledge one can define the following external view:
@View (1)
@Type(ObjectType.META) (2)
@TableName("usrv_acl") (3)
public interface AclView {
@Unique
String getName(); (4)
@PrimaryKey
long getId(); (5)
}
1 | We annotate the class with @View to declare it as an external view. |
2 | Specifying the type as a META type is good practice, since every other type would expect specific system fields. |
3 | Specifying the table name is good practice here, since an external table most likely follows its own naming convention. However, it would be possible to omit the @TableName annotation here and instead name the class UsrvAcl. |
4 | Since we know that the table usrv_acl has a field name of type varchar we can define the property name of type String . |
5 | We know that the table usrv_acl has a field id of type bigint , so we specify a java property accordingly. |
NoSQL example
Annotate a type definition with @NOSql if it should also be created in the Solr schema; the whole class is then created there with all its fields.
@Type(ObjectType.CONTAINER)
@NOSql
public interface PersonSimple {
String getFirstName();
void setFirstName(String value);
String getLastName();
void setLastName(String value);
}
If you don’t want to create a field, you can disable it with the annotation @NOSql(value = false).
@Type(ObjectType.CONTAINER)
@NOSql
public interface PersonSimple {
String getFirstName();
void setFirstName(String value);
@NOSql(value = false)
String getLastName();
void setLastName(String value);
}
@SystemProperty annotation
To access system properties you can use the annotation @SystemProperty and pass one of the following names (Retention Information Getter)
general system fields:
-
ID: The unique identifier of the entity. Use on EcrId properties (or subclasses as applicable). Can be used on any entity.
-
CREATION_DATE: The date and time the relation was created. Use on ZonedDateTime properties. Can only be used on relations.
-
CREATOR_USER_ID: The id of the user that created this relation. Use on UserId properties. Can only be used on relations.
-
ACL_ID: The id of the ACL currently assigned to the entity. Might be null. This is not supported for metadata entities.
-
ACL_RIGHT: The resolved right based on the ACL currently assigned to the entity and the current user. This is not supported for metadata entities.
-
RETENTION_INFO: Information about the retention properties of the entity. It contains the RETENTION_DATE and the LITIGATION_HOLD flag described below. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.
-
RETENTION_DATE: The retention date defines the minimum storage date, i.e. the related object cannot be deleted until this date has passed. The storage period may be extended but never shortened. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.
-
LITIGATION_HOLD: A flag that indicates whether a document is related to a litigation. If the flag is set, the document must never be deleted, even if the retention date has passed. Can only be used on potentially versioned entities, i.e. Folders, Documents, Relations and Containers that are declared to be retention protected.
versioned system fields:
-
VERSION_NUMBER: The number of the version of the versioned entity. Use on int/Integer properties. Can only be used on versioned entities.
-
VERSION_ID: The unique identifier of the version of the entity. Use on VersionId properties. Can only be used on versioned entities
-
UPDATE_COUNTER: A counter that is incremented each time an entity is updated. It is used for the optimistic locking feature and therefore is only available on type definitions that use optimistic locking.
-
IS_CURRENT_VERSION: A boolean that indicates whether the entity was the current version at the time it was loaded from the backend. Can only be used on versioned entities.
-
MODIFICATION_INFO: Information about the date and time as well as the user of the first and last modification of the entity. Use on ModificationInformation properties. Can only be used on potentially versioned entities i.e. Folders, Documents, Relations.
document system fields:
-
CONTENT: Information about the content of the document. Use on Map<String, ContentInformation> properties. Can only be used on documents.
-
CONTAINING_FOLDER: The id of the folder containing the document (if any). Use on FolderId properties. Can only be used on documents.
folder system fields:
-
FOLDER_NAME: The name of the folder. Use on String properties. Can only be used on folders.
-
PARENT_FOLDER: The id of this folder's parent. Use on FolderId properties. Can only be used on folders.
relation system fields:
-
PARENT_ID: The id of the parent of this relation. Use on TypedId properties (or applicable subclasses). Can only be used on relations.
-
PARENT_VERSION_ID: The version-id of the parent of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.
-
CHILD_ID: The id of the child of this relation. Use on TypedId properties (or applicable subclasses). Can only be used on relations.
-
CHILD_VERSION_ID: The version-id of the child of this relation. Use on VersionId properties. Can only be used on relation types that support relations to or from versions.
Data Types
Java Type | Database Type | Description |
---|---|---|
String |
text |
Unlimited unicode text. Limit the length with the @Length annotation |
Integer or int |
int |
32 bit integer value, Integer = null is allowed |
Long or long |
bigint |
64 bit long value, Long = null is allowed |
Double or double |
double |
double value, Double = null is allowed |
Boolean or boolean |
boolean |
Boolean value; a Boolean allows three states (true, false, null) |
Decimal or decimal |
decimal(precision) |
Decimal value, Decimal = null is allowed, add @Precision annotation |
UUID |
uuid |
uuid type |
byte[ length ] |
bytea |
Binary data with a length specified by a Java int (max. 4 GB). |
String |
text |
String based ID with a non-null length. |
EnumerationType |
EnumerationType |
arveo creates an enumeration object on PostgreSQL 12. |
ZonedDateTime |
datetime |
arveo stores a GMT based date time value in PostgreSQL 12 |
LocalDate |
datetime |
arveo stores a date time value in PostgreSQL 12, but only the date is relevant |
LocalTime |
datetime |
arveo stores a date time value in PostgreSQL 12, but only the time is relevant |
List<String> |
array(text) |
arveo stores multiple text values in an array column of PostgreSQL 12. |
List<Long> |
array(bigint) |
arveo stores multiple bigint values in an array column of PostgreSQL 12. |
By default, PostgreSQL 12 does not limit the length of String values. Typically, it is not necessary to define a length using the @Length annotation because PostgreSQL 12 handles strings of any length very well. Your strings should have a length of up to 4 kByte. Even larger strings are allowed, but you should take care that you do not inadvertently consume too much storage space if you store very large strings. |
List data types allow you to store more than one String or Long value for a property. You can search for each value using the array search operation of the arveo query language. |
Enumeration data types allow you to set one or more values from a fixed set of values. |
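To illustrate, a hedged sketch of a container type using some of the data types listed above (the interface and attribute names are hypothetical):

```java
// Hypothetical container demonstrating several supported data types.
@Type(ObjectType.CONTAINER)
public interface CustomerRecord {
    @Length(255)                       // limit the otherwise unlimited text column
    String getCustomerName();
    void setCustomerName(String customerName);

    List<String> getTags();            // persisted as array(text)
    void setTags(List<String> tags);

    ZonedDateTime getContractStart();  // persisted as a GMT based datetime
    void setContractStart(ZonedDateTime contractStart);
}
```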
System Properties
The following chapter describes types of system properties in arveo.
There are different types of system properties:
-
General system properties: system properties that are available on all types of entity (except for meta data entities).
-
Versioned entity system properties: system properties that are only available on entities that can be versioned (Containers, Documents, Folders, Relations). Those properties are contained in the main table of a type definition.
-
Document system properties: system properties that are only available on documents.
-
Folder system properties: system properties that are only available on folders.
-
Relation system properties: system properties that are only available on relations.
-
Version system properties: system properties that are only available on versions of entities. Those properties are contained in the version table of a type definition.
System Property Names
All system columns in the database are snake case, not camel case; e.g. the Java RetentionDate property is persisted as "retention_date". |
Name | Database Type | Description |
---|---|---|
id |
bigint |
The unique identifier of the entity. Use EcrId properties (or subclasses as applicable). Can be used on any entity and is applied by arveo for all types but metadata. |
acl_id |
bigint |
The id of the ACL currently assigned to the entity. Might be null. This is not supported for metadata entities or types with disabled ACLs. |
creation_date |
datetime |
GMT timestamp when the entity or version was created, precision (1/1000 second) |
creator_user_id |
bigint |
The ID of the user who created the entity or version (User Management) |
deleted |
boolean |
Optional flag that indicates that an entity is currently contained in the recycle bin. |
last_delete_restore_date |
datetime |
Optional GMT timestamp of when the entity was last moved in or out of the recycle bin. |
retention_date |
datetime |
The GMT based retention timestamp defines the minimum storage date, i.e. the related object cannot be deleted until this date has passed. Cannot be used on meta data entities and is only available on entity types that are declared to be retention protected (Retention) |
litigation_hold |
boolean |
The boolean indicates whether a document is related to a litigation. If the flag is set, the document must never be deleted, even if the retention date has passed. Cannot be used on meta data entities and is only available on entity types that are declared to be retention protected (Retention) |
update_counter |
int |
Optional counter for the number of updates on an entity used for optimistic locking. |
Name | Database Type | Description |
---|---|---|
version_number |
bigint |
The sequential number of the latest version of the versioned entity. |
latest_version_id |
bigint |
The unique identifier of the latest version of the entity. |
version_comment |
string |
A comment set by the client when a new version is created. |
modification_date |
datetime |
GMT timestamp when the version was created or changed, precision (1/1000 second) |
modification_user_id |
bigint |
The ID of the user who created or changed the version |
initial_creation_date |
datetime |
GMT timestamp of when the first version of an entity was created. |
Name | Database Type | Description |
---|---|---|
content |
json |
JSON containing content properties: |
parent_id |
bigint |
Optional field that contains the ID of the folder the document is contained in. |
Name | Database Type | Description |
---|---|---|
folder_name |
String |
The name of the folder. |
parent_id |
bigint |
The ID of the parent of the folder in the folder tree. |
Name | Database Type | Description |
---|---|---|
parent_id |
bigint |
The id of the parent of this relation. |
parent_version_id |
bigint |
The version-id of the parent of this relation. Can only be used on relation types that support relations to or from versions. |
child_id |
bigint |
The id of the child of this relation. |
child_version_id |
bigint |
The version-id of the child of this relation. Can only be used on relation types that support relations to or from versions. |
Name | Database Type | Description |
---|---|---|
version_number |
bigint |
The sequential number of the version. |
version_id |
bigint |
The unique identifier of the version. |
version_comment |
string |
A comment set by the client when a new version is created. |
entity_id |
bigint |
The ID of the entity the version belongs to. |
Timestamps
All timestamp system properties (creation_date, initial_creation_date, modification_date) are stored in the database using the GMT timezone and a precision of 1 millisecond. When using the Java API, the values will be returned as ZonedDateTime instances.
The initial_creation_date field will contain the timestamp of when the very first version of an entity was created. This field is never updated. The creation_date field on the other hand will contain the time a specific version of an entity was created. Thus, the creation_date field in the main table will be updated when a new version is created because the main table will always contain the latest version of an entity. The modification timestamp field (modification_date) will contain the timestamp of when a version was created or overwritten. This field, too, will be updated in the main table each time a new version is created. It will be updated in the main table and in the version table when a version gets overwritten.
Document type
The following chapter provides a more detailed overview of the type Document.
A Document is one of five entity types supported by the arveo system. Unlike the other entity types, documents are always versioned to keep track of changes to the binary content.
A Document consists of the following components:
-
Technical metadata, which is filled by arveo and cannot be changed, see System properties
-
Typed metadata as defined in the annotated interface (the type definition)
-
0-n content objects: A content object has a content type that is freely configured in the system. A maximum of one element can be inserted per content type. Examples of content types are: original object, rendition, full text, text notes, XML properties, etc.
-
content metadata like content size, mime-type and hash
-
0-n annotations per content object: Only for image objects (TIFF, JPEG, PNG, BMP, PDF/A) annotations can be created in a layer independent of the document.
Any number of versions can be created for a Document. All the versions are traceable in the repository and can be referenced via independent system-wide unique IDs.
Container type
The following chapter provides a more detailed overview of the type Container.
A Container is an object without content. It supports all system managed metadata attributes and custom attributes defined by the type definition. It is called 'Container' because its primary use case is to serve as an entity that contains custom metadata and that is related to other entities like a document via foreign keys or relations.
Use container objects to build records and cases that contain documents. You can map the relationship between file, case and documents either as a foreign key (@ForeignKey annotation) or using the relation type objects (Relation Type). |
If you use Foreign keys to create the relationship between objects you can inherit values from the parent to its children (Inheritance) |
Containers can be versioned. A Container consists of the following components:
-
Technical meta information, which is filled by arveo and cannot be changed, see System properties
-
Typed container type metadata according to the type definition of the container type.
Any number of versions can be created for a Container. All the versions are traceable in the repository and can be referenced via independent IDs.
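A minimal sketch of this pattern (hypothetical names): a case-file container that documents reference via a foreign key, using the annotations described earlier:

```java
// Hypothetical case-file container without content.
@Type(ObjectType.CONTAINER)
public interface CaseFile {
    @Unique
    @Readonly
    String getCaseNumber();            // immutable business key of the case
    void setCaseNumber(String caseNumber);
}

// Hypothetical document type referencing the container via a foreign key.
@Type(ObjectType.DOCUMENT)
public interface CaseDocument {
    @ForeignKey(target = CaseFile.class, targetProperty = "id")
    Long getCaseId();                  // id of the containing case file
    void setCaseId(Long caseId);
}
```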
Relation type
The following chapter provides a more detailed overview of the type Relation.
A Relation represents a connection between two entities (document, container, folder or meta). It is directed, having a parent and a child, and it can contain custom metadata attributes. A Relation type must specify the types of the parent and child entities. Any number of versions can be created for a Relation. All the versions are traceable in the repository and can be referenced via independent IDs.
Changes of the child-id or parent-id are not tracked in the version table. |
Relation
+------------+          +--------------+          +------------+
|   Parent   |  source  |   Relation   |  target  |   Child    |
|------------|<---------|--------------|--------->|------------|
| attributes |          |  attributes  |          | attributes |
|            |          |              |          |            |
+------------+          +--------------+          +------------+
Relation type definition
@Type(ObjectType.RELATION) (1)
@SourceType(Customer.class) (2)
@TargetType(Invoice.class) (3)
public interface CustomerInvoiceRelation {
@SystemProperty(SystemPropertyName.CHILD_ID) (4)
@InputProperty(InputPropertyName.RELATION_CHILD) (5)
DocumentId getChildId();
void setChildId(DocumentId childId);
@SystemProperty(SystemPropertyName.PARENT_ID) (6)
@InputProperty(InputPropertyName.RELATION_PARENT) (7)
ContainerId getParentId();
void setParentId(ContainerId parentId);
String getStatus();
void setStatus(String status);
}
1 | Specifies that the type definition is used for relations |
2 | Defines the type of the source or parent of the relation |
3 | Defines the type of the target or child of the relation |
4 | Marks an attribute to return the value of the childId property of the relation |
5 | Marks an attribute to set the value of the childId property of the relation |
6 | Marks an attribute to return the value of the parentId property of the relation |
7 | Marks an attribute to set the value of the parentId property of the relation |
Relations vs. foreign keys
Instead of using relations, it is possible to model a dependency between two entities using foreign keys. The key difference between the two approaches is that a relation can carry its own metadata attributes, which a foreign key cannot. This possibility requires an additional database table (or two, in the case of versioned relations) for a relation, which might have a negative impact on performance. If the dependency between the two entities does not require its own metadata attributes (and is not a many-to-many relation), it is recommended to use foreign keys instead of relations.
Foreign keys can be defined by adding the @ForeignKey annotation to an attribute in a type definition. The targetProperty attribute of the annotation must point to the ID or to a custom metadata attribute with a unique constraint of the target type. The type of the annotated attribute must match the type of the target property of the foreign key. The chapter Foreign Keys contains a more detailed overview of the foreign key feature.
@ForeignKey(name = "fk_invoice_customer", target = Customer.class, targetProperty = "id")
long getCustomerNumber();
+------------+                 +------------+
|   Parent   |   foreign key   |   Child    |
|------------|---------------->|------------|
| attributes |                 | attributes |
|            |                 |            |
+------------+                 +------------+
Relations to versions
By default, a relation can point to the current version or to a specific version of its parent or child, when the parent- or child-type supports versions. This behavior can be controlled by the supportedNodeVersion property of the @Source and @Target annotations used for relation type definitions. The attribute supports three different values (defined in de.eitco.ecr.type.definition.annotations.reference.SupportedNodeVersion):
Value | Meaning |
---|---|
CURRENT_VERSION | The relation must point to the current version of the node, identified by the node's ID (NOT the VersionId of the current version). |
SPECIFIC_VERSION | The relation must point to a specific version of the node, identified by its VersionId. |
CURRENT_OR_SPECIFIC_VERSION | The relation can point to either the current version or a specific version of the node. This is the default. |
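As a sketch of how this could look in a relation type definition (assumption: @Source and @Target accept a supportedNodeVersion attribute as described above; the Customer and Invoice types are those from the relation example at the top of this chapter):

```java
// Sketch only: this relation always follows the current version of the customer (parent),
// but is pinned to one specific, immutable version of the invoice (child).
@Type(ObjectType.RELATION)
@Source(value = Customer.class, supportedNodeVersion = SupportedNodeVersion.CURRENT_VERSION)
@Target(value = Invoice.class, supportedNodeVersion = SupportedNodeVersion.SPECIFIC_VERSION)
public interface PinnedInvoiceRelation {
    // relation attributes as in the CustomerInvoiceRelation example above
}
```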
Unique relations
A single relation always has exactly one parent and one child. However, by default a single entity can be the parent or child of multiple relations (many-to-many). By adding unique constraints to the parentId and/or childId system properties of the relation type, it is possible to define one-to-many, many-to-one or one-to-one relations.
@SystemProperty(SystemPropertyName.CHILD_ID)
@Unique(constraintName = "uccr_parent_child_uc")
ContainerId getChildId();
Relation counters
By using the @RelationCounter annotation it is possible to create counters on the parent and child entities for both incoming and outgoing relations. The counters are persisted in the database and are updated automatically when relations are added or removed.
The @RelationCounter annotation contains two attributes: the relationType attribute defines the type of relation to count, and the direction attribute defines whether to count incoming relations (the entity is the child or target of the relation) or outgoing relations (the entity is the parent or source of the relation). By annotating the relation counter attribute with @Versioned it is possible to control whether the counter attribute is stored in the version table for each version or in the main table for all versions. When the counter is stored in the version table, it contains the count for a single version of the entity. If it is stored in the main table, it contains the count for all versions of the entity. The following example shows how to define relation counter attributes. The @Name annotation is used because the attribute name is too long for a database column name.
@RelationCounter(relationType = TypedContainerContainerRelation.class, direction = RelationCounterDirection.INCOMING)
@Versioned(false)
int getIncomingRelationCounter();
@RelationCounter(relationType = TypedContainerContainerRelation.class, direction = RelationCounterDirection.INCOMING)
@Versioned
@Name("v_in_relation_counter")
int getVersionedIncomingRelationCounter();
Working with relations
The arveo API provides several methods that can be used to create, modify and resolve relations. Relations themselves are treated just like any other entity type. Entities that can be the parent or child of a relation (containers, folders, documents and metadata entities) provide additional relation-specific methods in the client API. The available methods are defined in the interface de.eitco.ecr.sdk.TypedBaseRelationNodeEntityClient, which is a super interface of the clients used in the API for documents, folders, containers and metadata entities. The injectable de.eitco.ecr.sdk.SearchClient offers additional methods to search for relations using filters on the relation, the parent or the child.
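As a rough, hypothetical sketch of what creating a relation could look like: the createTypeInstance()/createEntity() pattern mirrors the container and document examples later in this document, but the getRelationServiceClient() accessor and the client type name are illustrative assumptions, not confirmed API:

```java
// Hypothetical sketch: creating a relation between an existing customer and invoice.
TypedRelationServiceClient<CustomerInvoiceRelation> relationServiceClient =
        typeDefinitionServiceClient.getRelationServiceClient().byClass(CustomerInvoiceRelation.class);

CustomerInvoiceRelation relation = relationServiceClient.createTypeInstance();
relation.setParentId(customerId);   // the customer acting as parent/source
relation.setChildId(invoiceId);     // the invoice acting as child/target
relation.setStatus("open");         // relations can carry their own metadata

relationServiceClient.createEntity(relation);
```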
Folder type
The following chapter provides a more detailed overview of the type Folder.
A Folder is an entity that is organized in a file-system-like tree structure. A Folder can contain custom metadata attributes. Documents can be filed in a Folder.
A Folder consists of the following components:
-
Technical meta information, which is filled by arveo and cannot be changed, see System properties.
-
Typed folder metadata according to a schema defined for the folder type.
Any number of versions can be created for a Folder. All versions are traceable in the repository and can be referenced via independent IDs.
Only documents can be filed in a Folder. To enable the filing feature, add the @FilingEnabled annotation to your document type. |
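A minimal sketch of a folder type together with a document type that can be filed in it. Only the @FilingEnabled annotation is taken from the note above; ObjectType.FOLDER is assumed by analogy with the other object types, and the attribute names are illustrative:

```java
// Sketch: folder type with one custom metadata attribute.
@Type(ObjectType.FOLDER)
public interface ProjectFolder {
    String getProjectName();
    void setProjectName(String projectName);
}

// Sketch: document type that may be filed in folders, enabled by @FilingEnabled.
@Type(ObjectType.DOCUMENT)
@FilingEnabled
public interface ProjectDocument {
    String getName();
    void setName(String name);
}
```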
Metadata type
Metadata types are used, for example, to connect external tables. They do not contain any specific system fields and no typed ID as a primary key. The database table can be created by arveo, or an existing table can be used.
Use the @View annotation to mark a metadata type as a view for which the system should not create a table, and use the @TableName annotation to define the name of the table of the external system. |
Metadata types do not support versioning and retention protection.
You can use the @PrimaryKey annotation to define one or more properties of a Metadata type as the primary key. |
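A minimal sketch combining these annotations, following the same pattern as the AclView example later in this document. The external table ext_customer and its columns are hypothetical:

```java
// Sketch: metadata type mapped onto an existing external table.
// Because of @View, arveo will not create this table itself.
@View
@Type(ObjectType.META)
@TableName("ext_customer")
public interface ExternalCustomer {
    @PrimaryKey
    long getCustomerNumber();

    String getCustomerName();
}
```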
Inheritance
Simple direct inheritance
The following chapter describes the inheritance scheme used in arveo. The object to be inherited from and its initial state are shown in the following table.
Company | Create | Initial state |
---|---|---|
ID (Company) | - | 888 |
Name | CTuX | CTuX |
CountryCode | DE | DE |
PhoneNumber | - | [NULL] |
The following table describes direct inheritance (i.e. with no intermediate objects). Here, Invoice is an object that inherits from Company. The table shows its initial state and the state after four different updates.
Invoice | Create | Initial state | Update 1 | After Update 1 | Update 2 | After Update 2 | Update 3 | After Update 3 | Update 4 | After Update 4 |
---|---|---|---|---|---|---|---|---|---|---|
ID (Invoice) | - | 931 | - | 931 | - | 931 | - | 931 | - | 931 |
InvoiceNumber | EIT-53 | EIT-53 | - | EIT-53 | - | EIT-53 | - | - | - | EIT-53 |
companyID | - | [NULL] | 888 | 888 | [NULL] | [NULL] | [NULL] | [NULL] | - | [NULL] |
companyName | - | [NULL] | SAP | CTuX | Eitco | Eitco | - | [NULL] | - | [NULL] |
companyCountryCode | - | [NULL] | - | DE | - | [NULL] | - | [NULL] | - | [NULL] |
companyPhone | - | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | +49 (30) 408191-425 | - | [NULL] | +41 123456 | +41 123456 |
Error: no change! |
Not possible: faulty update parameters! |
Note the following principles:
After Update 2: All inherited fields are set to NULL if the inheritance key is set to NULL, unless values are explicitly specified. After Update 3: All inherited fields are set to NULL if the inheritance key is set to NULL, unless values are explicitly specified - even if the inheritance key was already NULL before. |
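The tables above can be expressed as type definitions roughly like this. This is a sketch based on the @ForeignKey and @InheritedProperty annotations described elsewhere in this document; the attribute and column names are illustrative:

```java
// Sketch: the object inherited from.
@Type(ObjectType.CONTAINER)
public interface Company {
    String getName();
    void setName(String name);

    String getCountryCode();
    void setCountryCode(String countryCode);

    String getPhoneNumber();
    void setPhoneNumber(String phoneNumber);
}

// Sketch: the inheriting object.
@Type(ObjectType.DOCUMENT)
public interface Invoice {
    String getInvoiceNumber();
    void setInvoiceNumber(String invoiceNumber);

    // The inheritance key: setting it to NULL clears all inherited fields.
    @ForeignKey(target = Company.class, targetProperty = "id")
    ContainerId getCompanyId();
    void setCompanyId(ContainerId companyId);

    // Inherited from Company via the foreign key (column names in snake_case).
    @InheritedProperty(foreignKeyPropertyName = "company_id", sourcePropertyName = "name")
    String getCompanyName();
}
```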
Multilevel inheritance
This inheritance form has an object to be inherited from, just like direct inheritance. A second object inherits from it, and a third object then inherits from the second. The initial object is the same as before; its initial state is described in the table above.
In the following table, the second object Creditor, which inherits from the first object, is described.
Creditor | Create | Initial state |
---|---|---|
ID (Creditor) | - | 999 |
CreditorNumber | 471147114711 | 471147114711 |
CompanyID | 888 | 888 |
companyName | - | CTuX |
companyCountryCode | - | DE |
companyPhone | - | [NULL] |
In the table above, the object Creditor inherited the following properties through the companyID: companyName, companyCountryCode, companyPhone. |
The results of multilevel inheritance through an intermediate object are shown in the table below:
Invoice | Create | Initial state | Update 1 | After Update 1 | Update 2 | After Update 2 |
---|---|---|---|---|---|---|
ID (Invoice) | - | 931 | - | 931 | - | 931 |
InvoiceNumber | EIT-11 | EIT-11 | - | EIT-11 | - | EIT-11 |
creditorID | - | [NULL] | 999 | 999 | [NULL] | [NULL] |
companyName | - | [NULL] | SAP | CTuX | Eitco | EITCO |
companyCountryCode | - | [NULL] | - | DE | - | [NULL] |
companyPhone | - | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | +49 (30) 408191-425 |
Indirect inheritance
The third form of inheritance is indirect inheritance. It is much like multilevel inheritance, except that the inheriting object holds the IDs of both objects it inherits from. In the example below, the object Invoice holds both the creditorID and the companyID.
In the following table, the object Creditor is described.
Creditor | Create | Initial state |
---|---|---|
ID (Creditor) | - | 999 |
CreditorNumber | 471147114711 | 471147114711 |
CompanyID | 888 | 888 |
The table below describes the mechanism of indirect inheritance.
Invoice | Create | Initial state | Update 1 | After Update 1 | Update 2 | After Update 2 | Update 2a | After Update 2a |
---|---|---|---|---|---|---|---|---|
ID (Invoice) | - | 931 | - | 931 | - | 931 | - | 931 |
InvoiceNumber | EIT-11 | EIT-11 | - | EIT-11 | - | EIT-11 | - | EIT-11 |
creditorID | - | [NULL] | 999 | 999 | [NULL] | [NULL] | [NULL] | [NULL] |
companyID | - | [NULL] | - | 888 | - | 888 | [NULL] | [NULL] |
companyName | - | [NULL] | SAP | CTuX | Eitco | CTuX | Eitco | EITCO |
companyCountryCode | - | [NULL] | - | DE | - | DE | - | DE |
companyPhone | - | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | [NULL] | +49 (30) 408191-425 | +49 (30) 408191-425 |
This form of inheritance is currently not needed and therefore not supported by ECR. |
Inheritance of ACLs
The acl_id system field is property-like, thus a type can define it as inherited.
This permits scenarios where one main entity provides the access definition and several entities are linked to it. If the ACL of the main entity changes, the ACLs of the linked entities change as well:
@Type(ObjectType.CONTAINER)
public interface MainEntity {
@Mandatory (2)
@SystemProperty(SystemPropertyName.ACL_ID) (1)
AccessControlListId getMainAcl(); (7)
void setMainAcl(AccessControlListId id);
// ...
// your custom attribute definitions
// ...
}
@Type(ObjectType.DOCUMENT)
public interface ChildEntity {
@Mandatory (4)
@ForeignKey(target = MainEntity.class, targetProperty = "id") (3)
ContainerId getCurrentMainEntity(); (6)
void setCurrentMainEntity(ContainerId mainEntity);
@SystemProperty(SystemPropertyName.ACL_ID)
@InheritedProperty(foreignKeyPropertyName = "current_main_entity", sourcePropertyName = "acl_id") (5)
AccessControlListId getAcl();
// ...
// your custom attribute definitions
// ...
}
1 | The main entity defines a property that accesses the ACL system property. |
2 | This property is defined mandatory - thus the main entity will always have an ACL. |
3 | The child entity defines a foreign key to the main entity. |
4 | By specifying the foreign key property as mandatory, every child entity will be linked to a main entity |
5 | Now we can specify an ACL property being inherited. |
6 | Note that the value of foreignKeyPropertyName is written in snake_case ("current_main_entity"), while the actual property getter is written in camel case (getCurrentMainEntity). |
7 | Note further that, while the referenced property is actually defined by the getter getMainAcl, sourcePropertyName is set to the name of the underlying system field "acl_id" to derive the property. |
Let’s see this behaviour in action.
Assume that we have a TypeDefinitionServiceClient named typeDefinitionServiceClient and also the IDs of two ACLs (firstAclId and differentAclId).
First we can create service clients for the two types defined above:
TypedContainerServiceClient<MainEntity> mainEntityServiceClient =
typeDefinitionServiceClient.getContainerServiceClient().byClass(MainEntity.class);
TypedDocumentServiceClient<ChildEntity> childEntityServiceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(ChildEntity.class);
With these service clients we can now create several entity instances of MainEntity
and ChildEntity
:
MainEntity mainEntity = mainEntityServiceClient.createTypeInstance();
mainEntity.setMainAcl(firstAclId);
TypedContainerClient<MainEntity> mainEntityClient = mainEntityServiceClient.createEntity(mainEntity);
ChildEntity childEntity1 = childEntityServiceClient.createTypeInstance();
childEntity1.setCurrentMainEntity(mainEntityClient.getIdentifier());
TypedDocumentClient<ChildEntity> childEntityClient1 = childEntityServiceClient.createEntity(childEntity1);
// ...
ChildEntity childEntityN = childEntityServiceClient.createTypeInstance();
childEntityN.setCurrentMainEntity(mainEntityClient.getIdentifier());
TypedDocumentClient<ChildEntity> childEntityClientN = childEntityServiceClient.createEntity(childEntityN);
The instances of ChildEntity
will automatically have the same ACL as mainEntity
:
Assert.assertEquals(childEntityClient1.getEntity().getAcl(), firstAclId);
// ...
Assert.assertEquals(childEntityClientN.getEntity().getAcl(), firstAclId);
If the ACL of the parent is updated…
mainEntity.setMainAcl(differentAclId);
mainEntityClient.updateAttributes(mainEntity);
…then the ACLs of the instances of ChildEntity change as well:
childEntityClient1 = childEntityClient1.reload();
// ...
childEntityClientN = childEntityClientN.reload();
Assert.assertEquals(childEntityClient1.getEntity().getAcl(), differentAclId);
// ...
Assert.assertEquals(childEntityClientN.getEntity().getAcl(), differentAclId);
Default ACLs and Inheritance
In many cases it will be desirable to specify a default ACL for a given type. But the naive approach to defining a default ACL proves cumbersome:
@Type(ObjectType.CONTAINER)
public interface ContainerWithDefaultAcl extends WithData {
@Mandatory
@SystemProperty(SystemPropertyName.ACL_ID)
long getAclId();
void setAclId(long aclId);
@DefaultValue("acl_id") (1)
default long defaultAcl() {
return ?? (2)
}
}
1 | Of course one can define the ACL system property with a default value. |
2 | However, when specifying the default value one faces a problem. The id of an ACL is set by the Access Control Service automatically and will vary from deployment to deployment, even between test and production environments. |
However, the concepts presented so far can be used for a better solution. The main idea is to specify the ACL by its name instead of its id. For that we will need access to a table containing ACL names and their respective ids. Here external views can be used. We have already seen an external view exposing the ACL table to arveo:
@View
@Type(ObjectType.META)
@TableName("usrv_acl")
public interface AclView {
@Unique
String getName();
@PrimaryKey
long getId();
}
Since this exposes ACLs as arveo type instances, ACLs can be used for inheritance. And since ACL names are unique, they can be used as the target of a foreign key, in particular one defining inheritance. That way the actual ACL id can be inherited via a key that is an ACL name, for which we can easily define a default value that is stable across all environments:
@Type(ObjectType.CONTAINER)
public interface ContainerWithDefaultAcl extends WithData {
String DEFAULT_ACL_NAME = "default-container-acl"; (6)
@Id
ContainerId getId();
@Optional (7)
@ForeignKey(target = AclView.class, targetProperty = "name") (2)
String getAcl(); (1)
void setAcl(String acl);
@DefaultValue("acl")
default String defaultAcl() { (3)
return DEFAULT_ACL_NAME;
}
@Mandatory (7)
@InheritedProperty(foreignKeyPropertyName = "acl", sourcePropertyName = "id") (5)
@SystemProperty(SystemPropertyName.ACL_ID)
long getAclId(); (4)
void setAclId(long aclId);
}
1 | In our type we define a property ACL, that holds the name of the ACL. |
2 | This property is a foreign key that targets the name field of the table usrv_acl. |
3 | For this property we can easily specify a default value. |
4 | Now we specify the ACL property. |
5 | It is simply defined to be inherited by the foreign key to the ACL table. |
6 | It is good practice to store constant default values in constants. |
7 | Marking the ACL id as @Mandatory enforces that every instance of the entity must have an ACL. However, this does not need to be an inherited one (since the ACL name is marked @Optional), so the more cumbersome way - setting the ACL by its id - is still possible. Marking the ACL name property as @Mandatory would forbid this. |
Retention
Annotations @RetentionProtected
An object may be annotated as @RetentionProtected. This enables all further retention annotations listed below. Every retention-enabled object extends the data model by
-
Datetime Retention_Date: contains the fixed retention date in ZonedDateTime format
-
Boolean LitigationHold: stores the litigation hold property
The convenience class 'Retention_Info' contains both values and can be used to read the retention information with one call. |
Annotations @DefaultSystemPropertyValue(RETENTION_DATE)
It is possible to define a default value for the RETENTION_DATE system column (see Default Values).
If a retention date is not explicitly set, a default value for the retention period is calculated using the default value function implemented by the document type.
@DefaultSystemPropertyValue(SystemPropertyName.RETENTION_DATE)
default ZonedDateTime defaultDatum() {
return ZonedDateTime.now().plusYears(10);
}
The @RetentionProtected annotation is required if you want to set a default for RETENTION_DATE. |
If you have defined foreign keys, you can inherit the retention date from container or folder objects. This is very helpful if you have records in your data model (see Defaults and Inheritance). |
Examples
Document Type: 10 year retention period
The following example shows how to set the default retention date to the creation date + 10 years. It also shows how to set a default value for the property warrantyEnd based on the receiptDate + 3 years.
It is still possible to set the retention date and warrantyEnd when you upload the document, overwriting the default values. |
/*
* Copyright (c) 2020 EITCO GmbH
* All rights reserved.
*
* Created on 02.10.2020
*
*/
package de.eitco.ecr.system.test.types.defaultvalues;
import de.eitco.ecr.common.RetentionInformation;
import de.eitco.ecr.type.definition.annotations.ContentElement;
import de.eitco.ecr.type.definition.annotations.ObjectType;
import de.eitco.ecr.type.definition.annotations.OverwriteAllowed;
import de.eitco.ecr.type.definition.annotations.Type;
import de.eitco.ecr.type.definition.annotations.constraint.Mandatory;
import de.eitco.ecr.type.definition.annotations.constraint.SecondaryKey;
import de.eitco.ecr.type.definition.annotations.defaults.DefaultSystemPropertyValue;
import de.eitco.ecr.type.definition.annotations.defaults.DefaultValue;
import de.eitco.ecr.type.definition.annotations.system.Id;
import de.eitco.ecr.type.definition.annotations.system.RetentionProtected;
import de.eitco.ecr.type.definition.annotations.system.SystemProperty;
import de.eitco.ecr.type.definition.annotations.system.SystemPropertyName;
import org.springframework.http.MediaType;
import java.time.ZoneId;
import java.time.ZonedDateTime;
@Type(ObjectType.DOCUMENT)
@RetentionProtected
@ContentElement(name = "content", separateField = true)
@OverwriteAllowed
public interface DocumentWithDefaultRetention {
@Id
Object identifier();
@SystemProperty(value = SystemPropertyName.RETENTION_INFO)
RetentionInformation getRetentionInformation();
@SystemProperty(value = SystemPropertyName.RETENTION_DATE)
ZonedDateTime getRetentionDate();
void setRetentionDate(ZonedDateTime retentionDate);
@SystemProperty(value = SystemPropertyName.LITIGATION_HOLD)
Boolean getLitigationHold();
@SecondaryKey
String getName();
void setName(String name);
@Mandatory
ZonedDateTime getReceiptDate();
void setReceiptDate(ZonedDateTime receiptDate);
@Mandatory
ZonedDateTime getWarrantyEnd();
void setWarrantyEnd(ZonedDateTime warrantyEnd);
@Mandatory
String getMimeType();
void setMimeType(String value);
// Helper constants for the snake_case DB column names derived from the camelCase getter/setter names.
// Attention: you MUST use the snake_case DB column names in default value annotations! If a name is wrong, you will get a model exception during startup.
String DB_COL_WARRANTYEND = "warranty_end"; (1)
String DB_COL_MIMETYPE = "mime_type";
String DB_COL_RECEIPTDATE = "receipt_date";
String DB_COL_NAME = "name";
String DB_COL_RETENTIONDATE = "retention_date";
ZoneId ZoneIdEuropeBerlin = ZoneId.of("Europe/Berlin");
// set default values
@DefaultValue(DB_COL_WARRANTYEND)
default ZonedDateTime defaultWarrantyEnd() {
return getReceiptDate().withZoneSameInstant(ZoneIdEuropeBerlin).plusYears(3);
}
@DefaultSystemPropertyValue(SystemPropertyName.RETENTION_DATE)
default ZonedDateTime defaultRetentionDate() {
return ZonedDateTime.now(ZoneIdEuropeBerlin).plusYears(10);
}
@DefaultValue(DB_COL_MIMETYPE)
default String defaultMimeType() {
return MediaType.APPLICATION_OCTET_STREAM_VALUE;
}
}
(1) The annotation @DefaultValue() only accepts the database column name as a static string parameter. As the document type properties are camelCase and the database column names are snake_case, you must convert your property names, e.g. MyCamelCaseProperty = my_camel_case_property. In the example above, the constants are defined in the type. |
The retention annotations also work for the entity types container, folder and relation. |
Tenant separation
Objects can be separated by tenant. If this is activated on a type definition, the corresponding table will contain a system field tenant_id, which stores the id of the tenant an entity resides in.
The field is assigned the tenant id of the creating user and never changes. When entities are queried, a filter is added that matches only entities whose tenant_id field has the value of the querying user's tenant id.
Fallback-tenant
Should the creating user be in the fallback-tenant, the tenant_id field will be set to NULL. Should a querying user reside in the fallback-tenant, the query will not be filtered by tenant. This means that the fallback-tenant behaves like a view on the data that combines all tenants. If this is unwanted, deactivate the fallback-tenant using the multi-tenancy.mode configuration property.
Usage
The behaviour is primarily defined by the multi-tenancy.mode configuration property.
It can be controlled in a more fine-grained way by the annotation @TenantSeparation; however, this mostly makes sense in environments where multi-tenancy.mode is set to allowed.
@Type(ObjectType.CONTAINER)
@TenantSeparation(true)
public interface Person {
// property definitions ...
}
There are aliases specified: @TenantSeparated for @TenantSeparation(true) and @TenantAgnostic for @TenantSeparation(false).
The default behaviour depends on the multi-tenancy.mode configuration property:
If it is enforced, types are separated by tenant by default. Any type specified as not separated by tenant will cause arveo to fail at startup.
If it is disabled, types are not separated by tenant by default. Any type specified as separated by tenant will cause arveo to fail at startup.
Advanced db schema changes
Simple changes of the database schema like adding a new attribute are performed automatically by the system in maintenance mode. In some cases it might be required to perform more complex schema changes, which cannot be handled by the system automatically. The following changes cannot be performed automatically on tables that already contain data:
-
setting NOT NULL for an existing column;
-
type changes especially to non-string columns;
-
foreign keys;
-
making a column UNIQUE.
For example, changing the data type of an attribute is not supported because it usually requires project specific migration steps. Advanced changes like this can be performed by custom liquibase scripts.
To perform custom database schema migrations, arveo offers several ways to define custom liquibase migration scripts:
-
A global script that will be executed before the first type definition will be created or updated. This script can be configured using the property
ecr.server.liquibase.preInitializationChangeLog
. -
A global script that will be executed after the last type definition was created or updated. This script can be configured using the property
ecr.server.liquibase.customChangeLog
. -
A script for a specific type definition that will be executed before the type definition is created or updated. This script can be configured using the annotation
@PreSchemaInitialization
on the class representing the type definition. -
A script for a specific type definition that will be executed after the type definition was created or updated. This script can be configured using the annotation
@PostSchemaInitialization
on the class representing the type definition.
The values of the configuration properties for the global scripts and the annotations must be valid URIs pointing to a liquibase changelog script. The URIs can point to a filesystem resource (using file:/) or a classpath resource (using classpath:). Each script will be executed in every configured tenant.
Schema initialization steps
For a better understanding of how the schema initialization works, the following list shows the steps performed by the system at startup, for each tenant:
1. Create or update the system tables
2. Execute the custom pre-initialization changelog, if configured
3. For each registered type definition class:
   a. Execute the custom class-specific pre schema initialization script, if configured
   b. Create or update the type definition table(s)
   c. Execute the custom class-specific post schema initialization script, if configured
4. Execute the custom liquibase changelog, if configured
Note that the actions performed by the automatic schema initialization in step 3.b. can be influenced by the changes that were already performed by the custom scripts executed before. For example, the system will not try to create a new attribute if the custom script has already performed the required schema changes.
Example
The following example shows a type definition class that defines a custom script that will be executed before the type definition is updated. The script expects that the type definition table already exists on the database and is used to change the data type of the attribute postal_code from Long to String. Note that for the sake of simplicity, the script does not perform an actual data migration but simply drops and re-creates the database column for the attribute.
@Type(ObjectType.CONTAINER)
@Index(value = "my_container_name_index", onVersionTable = true)
@PreSchemaInitialization("classpath:liquibase/my-container-changelog.xml")
public interface MyContainer {
<?xml version="1.1" encoding="UTF-8"?>
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd"
logicalFilePath="my-container-changelog.xml">
<changeSet id="update-my-container-1" author="root">
<dropColumn tableName="my_container" columnName="postal_code"/>
<addColumn tableName="my_container">
<column name="postal_code" type="text"/>
</addColumn>
<dropColumn tableName="my_container_ver" columnName="postal_code"/>
<addColumn tableName="my_container_ver">
<column name="postal_code" type="text"/>
</addColumn>
</changeSet>
</databaseChangeLog>
Note that the script in the above example first updates the content of the type definition system tables to reflect the changed data type of the attribute postal_code of the type my_container. Doing this causes the automatic migration performed afterwards to ignore the change. Other changes in the type class would still be performed automatically, if possible. The script then simply drops and re-creates the column for the attribute. In a real-life scenario, this is the place where the actual data migration would happen.
Changes not checked during startup
The following changes in the type system will not be checked for:
-
Inheritance: Changing the source key or the source property of an inherited property is allowed. The system will accept it (and not even check it). This can have subtle consequences: the data of an entity created before such a change will remain as before. However, the next time the entity is updated, the inheritance will be computed anew and the data will change according to the new inheritance rule.
-
Formatted counter sequence names: Changing the name of the sequence of a formatted counter will take effect. This can have an impact on your application: it will result in the creation of a new sequence and effectively reset the counter's value. This might be the desired effect - it could also be the result of an oversight in the type changes. To protect oneself from accidental changes, it is deemed good practice to mark formatted counter fields with @Unique.
-
Indexes prefixed with ecr_mnl_: Indexes defined on tables belonging to ecr types will be created and deleted according to changes in the types. However, indexes whose names start with the prefix ecr_mnl_ are excluded from this. This enables admins to quickly react to slow systems without a system update interfering with such a patch. This has two consequences:
-
Admins that manually add an index should pick a name for the index that starts with ecr_mnl_.
-
Developers that add an index to an arveo type should pick a name that does not start with ecr_mnl_.
In a case where an index prefixed with ecr_mnl_ is in use, it will be beneficial in the long run to add the index to the type. In this case the prefix ecr_mnl_ must be omitted when defining the index on the type. |
Document Service
The Document Service is responsible for handling the various repository entities, such as documents and folders. The following entity types are supported: document, folder, container, relation and metadata.
The service stores the binary data belonging to the documents and delivers it on request. Various plugins are available for connecting storage devices and services. A plugin is assigned to a profile and configured there. When saving data, the client has to specify the profile to be used and thereby decides where the data will be stored.
Upload data
Content, annotations (see below) and metadata can be uploaded as a coherent document. Zero or more content elements of different content types are possible, and each content element is named. As a result, you get a globally unique ID (DocumentID), which can be used to reference the content, annotations and/or just the metadata of the latest version of the document. It is possible to clone content elements from one document to another, creating a copy of the content on the storage. For that, a ContentReference can be supplied when the document is created.
TypedDocumentServiceClient<SingleContentDocument> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(SingleContentDocument.class); (1)
SingleContentDocument document = serviceClient.createTypeInstance();
document.setName("some name");
TypedDocumentClient<SingleContentDocument> client = serviceClient.create(
new TypedDocumentInput<>(Map.of("content", (2)
new ContentUpload(inputStream)), document)); (3)
1 | The typeDefinitionServiceClient is an instance of TypeDefinitionServiceClient, that can be injected. |
2 | The type definition SingleContentDocument uses only the default content definition, hence the default name is used. |
3 | The actual content is passed as an InputStream. |
TypedDocumentServiceClient<TypedTargetDocument> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(TypedTargetDocument.class); (1)
TypedTargetDocument document = serviceClient.createTypeInstance();
DocumentContentReference reference = new DocumentContentReference(documentId, "content"); (2)
TypedDocumentClient<TypedTargetDocument> client =
serviceClient.create(new TypedDocumentInput<>(document, Map.of("content", reference)));
1 | The typeDefinitionServiceClient is an instance of TypeDefinitionServiceClient, that can be injected. |
2 | Here the documentId is the ID of an already existing document that uses the default content definition. |
Base64EncodedData data = new Base64EncodedData(base64Data); (1)
Map<String, ContentUpload> contentElements = Map.of("content", new ContentUpload(data)); (2)
serviceClient.create(new TypedDocumentInput<>(contentElements, document)); (3)
1 | Wrap the Base64 encoded data in a new Base64EncodedData instance |
2 | Create a content upload with the created Base64EncodedData |
3 | Upload the document |
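The Base64EncodedData wrapper above expects an already encoded string. As a self-contained sketch using only the JDK (class name is illustrative; this is not part of the arveo SDK), this shows how such a payload could be produced and how the receiving side restores the original bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64UploadSketch {
    public static void main(String[] args) {
        // Encode raw bytes to the Base64 string that would be passed to Base64EncodedData
        byte[] raw = "lorem ipsum".getBytes(StandardCharsets.UTF_8);
        String base64Data = Base64.getEncoder().encodeToString(raw);
        System.out.println(base64Data);

        // Decoding restores the original bytes
        byte[] decoded = Base64.getDecoder().decode(base64Data);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // lorem ipsum
    }
}
```

Note that Base64 inflates the payload by roughly a third, so for large files a plain InputStream upload (as in the first example) is preferable.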
Validating uploaded content
There are several different ways to validate the content of an uploaded document. The method to use depends on the requirements of the client application. Some applications might already have computed a hash of the content while others might offload this to the server.
Validating content on the client side
When content is uploaded to a type definition that supports content metadata, the server computes an SHA-256 hash for the received data and returns it in the result of the upload request. The client can use this hash value to compare the data received by the server with the original data. The following example shows how to compare the hash values:
ContentTest entity = client.create(input).getEntity(); (1)
Hash hash = entity.getContent().get("content").getHash(); (2)
Hash expectedHash = Hash.sha256Hash(inputStream, 1000000, tempFile); (3)
Assert.assertEquals(expectedHash, hash);
1 | The document is uploaded using a type definition service client |
2 | Get the hash returned from the server. getContent is a getter for the system property SystemPropertyName.CONTENT . |
3 | Use de.eitco.ecr.common.Hash to compute the expected hash |
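For orientation, the comparison above boils down to a plain SHA-256 hex digest, which the JDK can compute directly with java.security.MessageDigest. A minimal, self-contained sketch (class name and buffer size are illustrative; de.eitco.ecr.common.Hash is the SDK's own wrapper for this):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class HashSketch {

    // Stream the input through a SHA-256 digest and return the lowercase hex string
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A SHA-256 hex digest is always 64 characters long
        String hash = sha256Hex(new ByteArrayInputStream("abcde".getBytes()));
        System.out.println(hash.length()); // 64
    }
}
```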
The TypedDocumentServiceClient offers an additional method to validate uploaded content. The createAndValidate method automatically computes a hash of the uploaded data and compares it with the hash value returned from the server. If the two hashes do not match, a HashValidationException is thrown and the created document is purged.
TypedDocumentServiceClient<ContentTest> client = typeDefinitionServiceClient
.getDocumentServiceClient().byClass(ContentTest.class);
ContentUpload contentUpload = new ContentUpload(data);
Map<String, ContentUpload> content = Map.of("content", contentUpload);
ContentTest instance = client.createTypeInstance();
TypedDocumentInput<ContentTest> input = new TypedDocumentInput<>(content, instance);
client.createAndValidate(input);
Validating content on the server side
It is also possible to pass a hex representation of an SHA-256 hash code of the uploaded content to the server. If such a hash is present, the server will compare the computed hash value with the one specified by the client. If the values do not match, the upload fails and the uploaded file will not be stored.
Hash hash = Hash.sha256Hash(inputStream, 1000000, tempFile); (1)
ContentUpload contentUpload = new ContentUpload(
"lorem_ipsum.txt", (2)
null, (3)
null, (4)
data,
hash
);
Map<String, ContentUpload> content = Map.of("content", contentUpload);
ContentTest document = client.createTypeInstance();
TypedDocumentInput<ContentTest> input = new TypedDocumentInput<>(content, document);
client.create(input);
1 | Use de.eitco.ecr.common.Hash to compute the hash |
2 | The filename |
3 | null for the length, will be computed by the server |
4 | null for the content type, will be computed by the server |
Validating the content of an existing document
The TypedDocumentServiceClient
provides a method called hashMatches
that can be used to check if the content of an existing document is valid. The client has to provide the expected hash, the document’s ID and the name of the content element to check. An additional parameter called loadContent
defines if the server should use the hash value stored in the database or if it should load the content from the storage and compute a new hash value to compare. It is possible to check the content of a specific version of a document, too.
Hash hash = Hash.sha256Hash(inputStream, 1000000, tempFile);
boolean hashMatches = documentServiceClient.hashMatches(documentId, "content", hash, false);
Download data
Content, annotations and metadata of a document can be downloaded via API. It is possible to load the entire document as a multipart or a structure of the document that includes all metadata, annotations and a list of content elements with their IDs, types and identifiers. Each content element can then be loaded using the document ID / content ID or the document ID / content type. Access to individual content elements without a document ID is not possible for reasons of access control. Access control based on the document ID is ensured with every access.
Update metadata without a version
The meta information of a document can be changed, and the changes can be persisted in the database without creating a version. This makes it possible to maintain frequently changing information on a document quickly, without the overhead of creating a version. However, in the event of an audit, such changes are not traceable.
Delete an object
Documents contain one or more content elements which are not stored in the database but in the storage system. When a document is deleted using one of the delete-calls, the content elements will remain on the storage. To delete both the database entries and all content elements (including those referenced from older versions), a client can use the purge methods provided by the document clients.
A type definition can use the optional recycle bin feature. If it is enabled, entities of the type definition can be moved to and restored from the recycle bin. The Delete-API provides the following methods:
-
MoveToRecycleBin(): moves an object to the recycle bin. The DELETE property of the latest version is set to 1; content and older versions are not affected.
-
Delete(): all versions of the object are deleted from the database.
-
Purge(): all versions of the object are deleted from the database and the content objects or files are erased.
-
RestoreFromRecycleBin(): restores an object from the recycle bin; the DELETE property is set to 0.
If an object has relations to other objects or is referenced by other objects, the delete or purge method will fail with a foreign key exception. The Relation API provides methods to delete the relations (Remove Relations). |
Filter recycle bin
Entities in the recycle bin will be filtered from normal queries by default, but a client can compose search expressions that override this behavior. To do that it is sufficient to include a reference to the deleted system field in the expression. The following example shows a part of a query that will show only deleted entities:
....and().systemField(SystemFieldList.GeneralSystemField.Deleted.INSTANCE).equalTo().value(true)
Note that the deleted system field can contain null values, which have the same meaning as false. When a client uses one of the delete calls to delete one or more entities, all database entries for those entities will be deleted (including all versions).
There is no option to restore entities once they have been deleted. |
If there are relations between entities that are to be deleted, the relations are not deleted. Instead, a ForeignKeyException is thrown - and has to be handled by the caller.
Removing all relations of an entity
To delete all relations that originate from a certain entity, the method removeAllRelations() has to be used. The method returns the deleted relations:
List<Relation> removed = sourceContainerClient.removeAllRelations();
You can also delete all relations that point to a specific entity. For this, there is the method removeAllIncomingRelations(). This also returns the deleted relations:
List<Relation> removed = targetContainerClient.removeAllIncomingRelations();
Once all relations have been removed, the entity can also be deleted.
Locking
If your applications update objects from different processes at the same time, you must decide whether to use no locking or optimistic locking. No locking means that the latest update wins and overwrites the concurrent update. Depending on the database configuration, it can happen that one update becomes a deadlock victim and an exception is thrown. If optimistic locking is enabled for the document type, the API ensures that updates do not accidentally overwrite changes made by other clients. The feature is disabled by default and can be enabled by annotating a type class with @OptimisticLocking.
Example: two processes A and B load the same object, including content and versions, at the same time and get the same version of the document. Both processes then modify the document, change some metadata and add additional content. A is faster than B. With no locking, B overwrites the changes made by A. With optimistic locking, B cannot save its changes and receives a locking exception. Process B has to load the changes made by A and retry the operation.
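The load-modify-retry cycle that optimistic locking forces on process B can be simulated without the SDK. The following self-contained sketch (a hypothetical in-memory store, not arveo's implementation) uses a compare-and-set to mimic the version check: A's update succeeds, B's stale update is rejected, and B succeeds only after reloading:

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticLockingSketch {

    // A versioned entity snapshot; each successful update produces a new snapshot
    record Versioned(int versionNumber, String name) {}

    static final AtomicReference<Versioned> store = new AtomicReference<>(new Versioned(1, "initial"));

    /** Apply the update only if 'base' is still the current snapshot; otherwise reject it. */
    static boolean update(Versioned base, String newName) {
        return store.compareAndSet(base, new Versioned(base.versionNumber() + 1, newName));
    }

    public static void main(String[] args) {
        Versioned seenByA = store.get();
        Versioned seenByB = store.get(); // B loaded the same version as A

        boolean aSucceeded = update(seenByA, "changed by A");
        boolean bSucceeded = update(seenByB, "changed by B"); // stale base -> rejected

        System.out.println(aSucceeded + " " + bSucceeded); // true false

        // B reloads the current state and retries, as the text describes
        boolean retry = update(store.get(), "changed by B");
        System.out.println(retry + " " + store.get().name());
    }
}
```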
Download links for external users
You can create download links for content elements that can be used by external users who do not exist in the user management service. Such a link has an expiration date and can be used to download a single content element. The links are digitally signed using a configurable certificate, so that the receiver cannot alter the referenced content element or the expiration date of the link. The arveo service provides a special HTTP endpoint to process download links. This endpoint does not require authentication. Instead, the download link contains credentials that allow the user to access the referenced content element.
This feature must be activated by configuring a keystore containing an RSA keypair that will be used to sign the links. |
The configuration options are listed here.
Download links (or content access tokens) can be created using the Java SDK as shown in the following example:
@Autowired
private ContentAccessTokenResourceClient contentAccessTokenResourceClient;
ContentAccessTokenInput input = new ContentAccessTokenInput(
documentId,(1)
"content",(2)
ZonedDateTime.now().plusHours(3)(3)
);
String token = contentAccessTokenResourceClient.createToken(input);
1 | The ID of the document containing the content element to download |
2 | The name of the content element |
3 | The expiration date of the link (can be omitted) |
The returned token can then be used to download the content by performing a GET request to the following endpoint:
GET http://my-arveo-instance/api/streaming/<token>
Download links can only be created for content elements that are accessible to the user creating the link. When the client does not define an expiration date when the content access token is created, the configured maximum lifetime is used.
The creation of new tokens will fail when the client specifies an expiration date that would exceed the configured maximum lifetime. |
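The lifetime rules described above amount to a simple fallback-or-reject check. A self-contained sketch of this logic (the 24-hour maximum and the method names are assumptions, not arveo's actual configuration or API):

```java
import java.time.Duration;
import java.time.ZonedDateTime;

public class TokenExpirationSketch {

    // Assumed configuration value - arveo reads the maximum lifetime from its configuration
    static final Duration MAX_LIFETIME = Duration.ofHours(24);

    static ZonedDateTime effectiveExpiration(ZonedDateTime now, ZonedDateTime requested) {
        ZonedDateTime latestAllowed = now.plus(MAX_LIFETIME);
        if (requested == null) {
            return latestAllowed; // no expiration given: fall back to the configured maximum
        }
        if (requested.isAfter(latestAllowed)) {
            throw new IllegalArgumentException("expiration exceeds configured maximum lifetime");
        }
        return requested;
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now();
        // A 3-hour expiration is within the maximum and is used as given
        System.out.println(effectiveExpiration(now, now.plusHours(3)).equals(now.plusHours(3)));
        // A missing expiration falls back to the configured maximum lifetime
        System.out.println(effectiveExpiration(now, null).equals(now.plus(MAX_LIFETIME)));
    }
}
```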
Versioning
The goal of using the concept of versioning is to create and work with version-safe archives and track the history of each change in the system.
Versioning basics
All entity types in arveo can be versioned; versioning is optional. The attributes of an entity type specify in their definition whether they are versioned. If an entity type has at least one versioned attribute, a version table is created. The version number of an existing entity is created automatically and can be retrieved via the system property version_number.
The version table lists the version changes to the metadata as well as the changes to one or more content elements. Optionally, a Unicode version comment can be specified. Each version gets a version ID, which is unique within this bundle of version tables. The version ID allows a developer to retrieve content and metadata of exactly this version of the entity. Using the API, a developer can query all versions, including their metadata and content elements, for each entity ID or version ID. It is ensured that the existing content of a version is not changed or deleted by a new version; however, there is an exception to this rule that does allow overwriting a version change.
There is a function that allows you to make a change without having to note it in the version table. And there is a way to forbid this for a certain entity type. |
Implementation of versioning
The concept of versioning is implemented using the annotation @Versioned, which is defined by the interface Versioned. This annotation defines if an attribute of a type is versioned or not (when placed on a getter) or if all attributes of a type are versioned or not (when placed on a type). When the annotation is present on a type and on a getter in the type, the annotation on the getter wins.
The following example of an object of type Container contains an attribute "name", which is a versioned attribute. The other attribute "counter" in this example is marked as not versioned.
Example:
@Type(ObjectType.CONTAINER)
@OverwriteAllowed
public interface TypedSourceContainer {
@Name("counter")
@Versioned(false)
int getCounter();
@Name("name")
@Versioned
String getName();
}
Data model for versioning
The actual search table only contains the current state of metadata and system fields. The version table, however, lists all entities and their versions including the metadata. Only versioned attributes are included in the version table. An internal version counter (1, 2, …, n) is maintained in the system column version_number.
During versioning, the service increments the internal version counter by increasing the value of the system column version_number by 1. The value is stored in the version table.
Changes to non-versioned fields cannot be tracked because they are not written to the version table. To prevent accidental overwriting of such fields, optimistic locking can be activated. In this case, a version property lets the system detect that a client's copy of an entity is outdated.
Optimistic locking
Activating optimistic locking prevents overwriting of versioned fields. When two users edit an entity simultaneously and one tries to overwrite changes already saved by the other, an error is thrown; overwriting is thus not possible. Hence, by activating optimistic locking on an entity type definition (using the annotation @OptimisticLocking), you prevent data corruption.
Optimistic locking is used only for single updates, not for batch updates.
Structure of the version system table
The version system table consists of the following columns (this is not a complete list):
column | db data type | java data type | nullable? |
---|---|---|---|
version_id | bigserial | long | no |
entity_id | int8 | long | yes |
version_acl_id | int8 | long | yes |
modification_date | timestamp | ZonedDateTime | no |
modification_user_id | int8 | long | no |
version_comment | text | String | yes |
version_number | int4 | int | no |
In this table, version_id is the primary key. The foreign key entity_id references the corresponding entity table.
Version ID
The version ID has the following structure:
[12bit Tenant id][14bit Type Definition id][38bit Version id]
Here the tenant may be for instance a database scheme or a customer. It is followed by a type definition, for instance Container. The third part is the version id in the database. The composed version id is unique in arveo system.
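Assuming the 12/14/38-bit layout stated above, composing and decomposing such an ID is plain bit arithmetic. A self-contained sketch (class and method names are illustrative, not the arveo SDK):

```java
public class VersionIdSketch {

    // [12-bit tenant][14-bit type definition][38-bit version] packed into one 64-bit long
    static long compose(long tenantId, long typeDefinitionId, long versionId) {
        if (tenantId >= (1L << 12) || typeDefinitionId >= (1L << 14) || versionId >= (1L << 38)) {
            throw new IllegalArgumentException("component out of range");
        }
        return (tenantId << 52) | (typeDefinitionId << 38) | versionId;
    }

    static long tenantOf(long composed)         { return composed >>> 52; }
    static long typeDefinitionOf(long composed) { return (composed >>> 38) & ((1L << 14) - 1); }
    static long versionOf(long composed)        { return composed & ((1L << 38) - 1); }

    public static void main(String[] args) {
        long id = compose(3, 17, 123456789L);
        System.out.println(tenantOf(id) + " " + typeDefinitionOf(id) + " " + versionOf(id)); // 3 17 123456789
    }
}
```

Because the three components share one long, the composed version ID is unique across tenants and type definitions, as the text states.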
Search language
Concept
Any client application that needs a search function can use the Search Service with a suitable type parameter. An example of such an implementation is the class DocumentServiceClient in the Client API. The search queries are always formulated in the same way; what differs is the search result, which is always typed. In arveo the type is Entity.
Technical implementation
Search Service is part of the module 'commons'. It was created to enable more convenient searching. The Search Service works on the basis of EQL (Eitco Query Language). This query language is also used for some other services, like Access Control Service. The main interface is SearchService. It is a functional interface, providing just one method to be implemented: search(). However, this functional interface has a variety of convenience methods, enabling faster and more convenient search, like firstResult(), uniqueResult(), count(), stream() and others.
Page<EntityType> search(@NotNull SearchRequest searchRequest);
The method accepts a search request as its only parameter and returns a Page of results. A Page has a page definition, a completeCount and a parameterized list of results. The Search Service also provides a method where() with a condition builder that filters results based on a specific condition.
SearchServiceFactory is a server class, which builds search queries. It has methods for creating an instance of search service for Documents (searchServiceForDocument()), but also for all the other entities, including Metadata. The result of the search is transformed into a Document (or respectively another entity) by the DocumentMapper.
The class SearchResourceImplementation provides an API for searches that are not bound to one and only one type definition.
The interface SearchService is implemented by the class EcrSearchService.
The search client creates different search services, which can be used to search for corresponding entities, for instance a folder search service, a document search service and so on. And there is also a GenericUnionSearchService, that can be used to create any joins on search statements.
Usage
The following example demonstrates the usage of the Search Service to retrieve an object page.
SearchService<Object> searchService = <a valid instance>;
Page<Object> objectPage = searchService.where()
.contextReference("field").equalTo().value(7).or()
.contextReference("other_field").greaterEqual().contextReference("another_field")
.holds()
.order().descendingBy("field").from(5).pageSize(7);
It is possible to check the type of object searched for:
searchService.where() (1)
.entity().typeId() (2)
.equalTo()
.typeId(NamedFile.class) (3)
.or()
.entity().typeName() (4)
.in().expressions(x -> x
.typeName(NamedTextFile.class) (5)
.typeName(NamedFolder.class)
).and()
.entity().typeId().notEqual().typeId("named_relation") (6)
1 | The variable searchService is an EcrSearchService . |
2 | The id of the type of given entity is referenced by the method typeId() . |
3 | The type id is checked to be the id of the type defined by the class NamedFile (which is obtained by the method typeId() ). |
4 | Here the type name is referenced instead of the type id. |
5 | As with the type id, the name of the type defined by the class NamedTextFile is obtained. |
6 | The type id can also be obtained if only the type name is given. |
Search endpoints
Using the ecr sdk you will be able to obtain a SearchClient by Spring injection.
@Autowired
private SearchClient searchClient;
A search client has several methods to search in different ways or different contexts.
Aggregation searches
In some situations one needs to accumulate values that are listed in a database. In SQL this is done using aggregate functions and the group by clause. For example, in an invoice archive one might be interested in the number of invoices per customer, or the sum of their totals (per customer).
Queries like this can be executed using the aggregated search. As opposed to the other search methods, the result entity type of this search method is Map<String, Object>, since aggregating properties will potentially result in a different type - one that might not be specified. Thus, a more general return type is used.
To start an aggregated search query, you will need to build a search service for your aggregated search first. We will build a service for the example above: querying the number and total sum of invoices per user.
Assume that we have a type customer defined by the class Customer and a type invoice defined by the class Invoice:
@Type(ObjectType.DOCUMENT)
@FilingEnabled
public interface Customer {
@SystemProperty(SystemPropertyName.ID)
DocumentId id();
@Unique
String getName();
void setName(String name);
}
@Type(ObjectType.DOCUMENT)
@FilingEnabled
public interface Invoice {
@ForeignKey(name = "fk_invoice_customer", target = Customer.class, targetProperty = "id")
long getCustomerNumber();
void setCustomerNumber(long number);
@Optional
String getCustomerName();
void setCustomerName(String customerName);
@Optional
String getName();
void setName(String name);
@Optional
Integer getTotal();
void setTotal(Integer total);
@Optional
Boolean getOpen();
void setOpen(Boolean open);
}
As you can see, the invoice references the customer with the property customer_number defined by the method getCustomerNumber(). Now we can build a search service as follows:
final EcrSearchService<Map<String, Object>> aggregationSearchService = searchClient.aggregate() (1)
.count("i", "id").as("invoice_count") (2)
.sum("i", "total").as("invoice_total")
.groupedBy("c", "name").as("customer")
.from().type(Invoice.class).as("i").join().type(Customer.class).as("c") (3)
.on().alias("i").field("customer_number").equalTo().alias("c").id() (4)
.holds().build();
1 | Calling SearchClient.aggregate() is the entry point to the fluent api to build a search service for aggregation search requests. |
2 | At first, we need to specify what to aggregate and what to group by: In our case we want to get the count (of the invoice ids) and the sum of the invoice totals grouped by customer name (which is unique). Every field that is grouped by will also be part of the result. |
3 | Now we need to specify from where the data to aggregate comes from. We join the type Invoice with the type Customer . Note that we specify aliases for the types "i" and "c" , which we used in the step before to reference the types fields. |
4 | Now we specify the condition for the join. The condition is that the invoices customer_number must equal the customers' id - as the foreign key fk_invoice_customer above specifies. |
Now we can query for the aggregated data:
final List<Map<String, Object>> all = aggregationSearchService.where().alwaysTrue().holds().unpaged();
This will result in a list of maps - one map per customer - where every map contains the keys "customer", "invoice_count" and "invoice_total", holding the customer's name, the number of their invoices and their total sum, respectively.
Additionally, we can query specific customers and invoices using the same search service. In this scenario, for example, we could query every customer's "invoice_count" and "invoice_total" of invoices that are open, i.e. that they haven’t paid yet:
final List<Map<String, Object>> open = aggregationSearchService.where()
.alias("i").field("open").equalTo().value(true)
.holds().unpaged();
This will again result in a list of maps - one map per customer - with the keys "customer", "invoice_count" and "invoice_total", holding the customer's name, the number of their invoices and their total sum, respectively - this time counting only open invoices.
Note that we can reference the invoice field open by using the alias i we provided earlier, even though it is not part of the result.
Enterprise Search
NoSQL Document Database Apache Solr 8.6
Apache Solr is a search server and is used as an independent full-text search server for ECR Healthcare. Solr uses the Apache Lucene search library as the core for full-text indexing and search.
Retention periods
arveo supports a range of retention management features:
-
Full support of document life cycle;
-
Supports prolongation and litigation hold for data retention managers;
-
Privileged delete before retention expires;
-
Privileges for data protection officers (delete) and data protection managers (litigation);
-
Flexible storage container definition (e.g. months, years) for documents with identical retention period (S3 buckets or file system folders);
-
Fast erasure of storage container by asynchronous delete jobs.
Concept
arveo is able to store content with a fixed retention date to ensure that the legal or tax relevant retention period of a document is taken into account and the content is protected from deletion. You can configure retention rules for arveo document types and automatically apply the appropriate retention period to uploaded documents.
If some of your documents could be required in a legal proceeding but the retention period expires before the end of dispute you can set a litigation hold or prolong the retention period to protect the data until the dispute has finished.
Let us describe why the storage container concept is used by arveo. Most storage systems can create objects much faster than they can delete them. Once the retention has expired it is much faster to remove a bucket (cloud storage) or partition/directory (file system). You can setup retention rules to define which documents are stored to the containers. All documents within a certain retention range (e.g. 1 year or 3 months) will be stored to one storage container (S3 bucket or directory). arveo allows you to delete millions of content objects in a very short time by simply removing the entire storage container.
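The mapping from retention requirements to containers can be pictured as a simple naming rule. The following self-contained sketch (the container names and the one-container-per-year granularity are assumptions; real mappings are configured via Bucket Organizer rules) illustrates the idea:

```java
import java.time.ZonedDateTime;

public class RetentionContainerSketch {

    // Map a document's retention requirements to a container name (one container per year here)
    static String containerFor(ZonedDateTime retentionDate, boolean litigationHold) {
        if (litigationHold) {
            return "litigation-hold"; // kept until the hold is removed, never bulk-erased
        }
        return "retention-" + retentionDate.getYear();
    }

    public static void main(String[] args) {
        ZonedDateTime retention = ZonedDateTime.parse("2031-06-30T00:00:00Z");
        System.out.println(containerFor(retention, false)); // retention-2031
        System.out.println(containerFor(retention, true));  // litigation-hold
    }
}
```

Once the year 2031 has passed, the entire "retention-2031" container (bucket or directory) can be dropped in one step instead of deleting millions of objects individually.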
If a document needs to be deleted, e.g. for data privacy reasons, arveo also provides an API call to erase single objects by their ID. To delete an object before its retention period has expired, the user needs the dataprivacy_admin privilege in addition to the delete_right.
Because the new legal data privacy / protection act makes it necessary to erase data even before the expected retention period has expired, arveo does not use hardware retention features, which protect data from erasure on the hardware level. arveo protects the content by software design: it stores the retention information in the database and only allows access to the content and metadata through the arveo REST API. The REST API prevents any delete operation before the retention period has expired. As only arveo and highly authorized administrators have write access to the database and the storage, content cannot be deleted or manipulated before the retention period expires.
The operator must take appropriate technical or organizational measures to ensure that the data is stored in the storage in such a way that it cannot be changed within the legally prescribed retention period. The provider of the arveo services should ensure that only authorized data protection officers and administrators have data write (INSERT, UPDATE, DELETE) permissions for the database and the content repository. |
Storage container and document life cycle
Since deleting large amounts of documents is a performance critical task, the arveo repository service provides special support for mass deletion of documents whose retention period has expired.
The basic idea is to define separate storage locations, which are exclusively used to store documents with similar retention requirements. The deletion of documents with specific retention requirements is then a matter of deleting all contents of a specific storage location in one step. Storage locations containing documents with the same retention period will be called storage container for the rest of this section.
arveo allows you to store data with the same retention in one storage container and is able to create storage containers automatically.
The storage containers are either folders (file system storage) or buckets (S3 object storage). The actual selection of the storage container for a document with specific retention requirements can be configured by rules, that select the storage container based on the retention period and litigation hold status of the uploaded document.
When the litigation hold is set, the object is moved to the litigation hold directory or bucket and will not be deleted when the initial retention period expires. When the litigation hold ends, the document is deleted the next time a delete job runs. The number of objects under litigation hold is typically small and does not affect the overall erasure performance.
When a litigation hold is removed, the objects are moved to another storage container that does not have a litigation hold on it.
The following diagram shows the life cycle of a document with a fixed retention period set on upload, a legal dispute and automatic erasure at the end of the document’s life cycle:
Each storage container in fact corresponds to a separate storage profile that is used to store the contents of that storage container. The rules that are used to map the retention requirements of documents to storage container are defined as rules for the Bucket Organizer Plugin, see Bucketorganizer.
Litigation hold
arveo provides a system property LITIGATION_HOLD that allows you to prolong the retention until you remove the litigation hold property.
This function requires the DATAPRIVACY_ADMIN privilege. |
Prolongation
You can prolong the retention period but not shorten it. You can use the API call to set the initial retention period if the retention is null. When the retention is prolonged, arveo moves the object to the appropriate storage container.
This function requires the DATAPRIVACY_ADMIN privilege. |
Erase a document
The arveo delete API deletes the respective objects, just as it does for all other objects without a retention period. See also Deletion of objects and Recovery table.
After the retention period has expired, the function requires the DELETE privilege, but before the retention period has expired, DATAPRIVACY_PRIVILEGED_DELETE privilege is required. |
This API should not be used for operations like deleting the objects of a certain year. This should be done using the erasure storage container API. |
Erase storage container
If you have used the storage container feature to speed up the deletion of documents at the end of their life cycle, you can delete all documents within a retention period range with one API REST call 'EraseStorageContainer'.
You can either erase the storage containers (buckets, folders) manually through your operating team or with an automated arveo job. You can set up a scheduled job in the arveo integration service: use the erasure storage container template job and adapt it to your needs. The erasure job deletes all entities of a document type within the given retention period range where litigation hold is not set. The job writes an entry for each erased object into the corresponding audit log table. For a more detailed explanation, see the erasure job template example.
Mass deletion of documents under retention requires the SUPER_USER privilege. |
Enable the audit log feature for all document types and dependent document types if you need a report of the erased objects. Audit Log |
Grant the deletion right for your storage containers to arveo. If arveo cannot delete the containers, your operating team is in charge of this task and you must set the option delete rows only. |
Privileges & roles
Privilege | DATAPRIVACY_ADMIN (Data Protection Manager) | DATAPRIVACY_PRIVILEGED_DELETE (Data Protection Officer) | SUPER_USER (Data Protection Administrator) |
---|---|---|---|
Prolongation | yes | no | no |
Litigation Hold | yes | no | no |
Delete before retention | no | yes | no |
Mass Delete | no | no | yes |
Examples
Create document with retention and set litigation hold
public void createDocumentWithRetention() throws IOException {
final String TEST_IDENTIFIER = "SetLitigationHold test timestamp in ms=";
final String TEST_DATA = "abcde";
final String TEST_DATA_MIMETYPE = MediaType.APPLICATION_OCTET_STREAM_VALUE;
TypedDocumentServiceClient<DocumentWithRetention> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DocumentWithRetention.class);
ZonedDateTime now = ZonedDateTime.now(ZoneOffset.UTC);
DocumentWithRetention newDocument = serviceClient.createTypeInstance();
newDocument.setName(TEST_IDENTIFIER + System.currentTimeMillis());
newDocument.setReceiptDate(now);
newDocument.setMimeType(TEST_DATA_MIMETYPE);
newDocument.setRetentionDate(now);
ByteArrayInputStream data = new ByteArrayInputStream(TEST_DATA.getBytes());
Map<String, ContentUpload> content = Map.of("content", new ContentUpload(data));
TypedDocumentClient<DocumentWithRetention> newClient = serviceClient.create(new TypedDocumentInput<>(content, newDocument));
Assert.assertEquals(IOUtils.toByteArray(newClient.readContent("content")), TEST_DATA.getBytes());
DocumentWithRetention loadedDocument = newClient.getEntity();
Assert.assertNotNull(loadedDocument);
Assert.assertTrue(loadedDocument.getName().startsWith(TEST_IDENTIFIER));
Assert.assertEquals(loadedDocument.getMimeType(), TEST_DATA_MIMETYPE);
assertDateEquals(loadedDocument.getReceiptDate(), now);
assertDateEquals(loadedDocument.getRetentionInformation().getRetentionDate(), now);
Assert.assertFalse(loadedDocument.getRetentionInformation().isLitigationHold());
// set LitigationHold = true
newClient.updateLitigationHold(true);
newClient = newClient.reload();
DocumentWithRetention litigationOnDocument = newClient.getEntity();
Assert.assertTrue(litigationOnDocument.getRetentionInformation().isLitigationHold());
// set LitigationHold = false
newClient.updateLitigationHold(false);
newClient = newClient.reload();
DocumentWithRetention litigationOffDocument = newClient.getEntity();
Assert.assertFalse(litigationOffDocument.getRetentionInformation().isLitigationHold());
}
Set retention / prolong retention
public void createDocumentWithoutRetention() throws IOException {
final String TEST_IDENTIFIER = "SetRetention test timestamp in ms=";
final String TEST_DATA = "abcde";
final String TEST_DATA_MIMETYPE = MediaType.APPLICATION_OCTET_STREAM_VALUE;
TypedDocumentServiceClient<DocumentWithRetention> serviceClient =
typeDefinitionServiceClient.getDocumentServiceClient().byClass(DocumentWithRetention.class);
// store document without retention
DocumentWithRetention newDocument = serviceClient.createTypeInstance();
newDocument.setName(TEST_IDENTIFIER + System.currentTimeMillis());
newDocument.setReceiptDate(ZonedDateTime.now());
newDocument.setMimeType(TEST_DATA_MIMETYPE);
ByteArrayInputStream data = new ByteArrayInputStream(TEST_DATA.getBytes());
Map<String, ContentUpload> content = Map.of("content", new ContentUpload(data));
TypedDocumentClient<DocumentWithRetention> newClient = serviceClient.create(new TypedDocumentInput<>(content, newDocument));
Assert.assertEquals(IOUtils.toByteArray(newClient.readContent("content")), TEST_DATA.getBytes());
DocumentWithRetention emptyRetentionDocument = newClient.getEntity();
RetentionInformation retentionInformation = emptyRetentionDocument.getRetentionInformation();
Assert.assertNotNull(retentionInformation);
Assert.assertNull(retentionInformation.getRetentionDate());
Assert.assertFalse(retentionInformation.isLitigationHold());
// set initial retention
ZonedDateTime initialRetentionDate = ZonedDateTime.now();
emptyRetentionDocument.setRetentionDate(initialRetentionDate);
TypedDocumentClient<DocumentWithRetention> initialRetentionClient = newClient.updateAttributes(emptyRetentionDocument);
DocumentWithRetention initialRetentionDocument = initialRetentionClient.getEntity();
assertDateEquals(initialRetentionDocument.getRetentionInformation().getRetentionDate(), initialRetentionDate);
// prolong retention
ZonedDateTime prolongedRetentionDate = ZonedDateTime.of(2050, 1, 1, 0, 0, 0, 0, ZoneId.of("Europe/Berlin"));
initialRetentionDocument.setRetentionDate(prolongedRetentionDate);
TypedDocumentClient<DocumentWithRetention> prolongedRetentionClient = initialRetentionClient.updateAttributes(initialRetentionDocument);
DocumentWithRetention prolongedRetentionDocument = prolongedRetentionClient.getEntity();
assertDateEquals(prolongedRetentionDocument.getRetentionInformation().getRetentionDate(), prolongedRetentionDate);
}
Retention cleanup job
The retention cleanup job can be used to remove entities with an expired retention period that are not currently in litigation hold status. The job can be triggered to run in the internal job scheduler of the repository service or in a separate instance of the job service. It expects two configuration parameters to be present in the job context of the triggered execution:
-
type-definition-name
: The name of the type definition that contains the entities to remove. -
retention-cleanup-retention-end-time
: The point in time at which the retention period must have expired. All entities whose retention period expired before the specified time will be removed. The specified time must be in the past.
On systems supporting multiple tenants, the tenant to run the job for must be configured using the tenant property.
|
The following optional properties can be set in the context of the triggered execution:
-
retention-cleanup-asynchronous
: Enables the asynchronous mode of the job. The asynchronous mode is described below. -
retention-cleanup-batch-size
: The size of a single batch of entities to process. The default is 1000 and the maximum is 10000. -
retention-cleanup-filter
: An optional filter in the form of an EQLExpression<Boolean>
to apply to the query used to find entities with an expired retention period. -
retention-cleanup-maximum-queue-size
: The maximum acceptable size of the message queue used by the job in asynchronous mode. If the queue size exceeds the configured maximum, the job will stop. The job will check the queue size once a minute. -
retention-cleanup-duration
: An optional maximum duration of the job’s runtime. If the duration is exceeded, the job will stop. The default is null (no limit). The value must be a java.time.Duration
. -
retention-cleanup-max-entity-count
: The maximum number of entities to process. If this number is reached, the job will stop. The default is -1 (no limit). -
retention-cleanup-protocol-file
: Optional property that can contain a fully qualified path to a file that will contain a list of all deleted entity IDs.
The retention-cleanup-maximum-queue-size limitation mechanism relies on statistics data for the JMS queues that is collected by a separate system job. This job is not enabled by default; to use this feature, enable it by setting the property ecr.server.jobs.jms-statistics.enabled=true.
|
All properties except the type definition name, the retention end time and the protocol file can be configured in the configuration file of the service, either globally or for each type definition. See the configuration reference for details.
Asynchronous mode
In the asynchronous mode, content is not deleted from the storage immediately. Instead, a message queue is used to delete
the content asynchronously. The database entries for the documents are not removed but marked as deleted using the
COMPLIANCE_DELETED
field.
The entries that were marked as deleted are automatically excluded from query results. It is not possible to read those entries using the arveo client API.
The arveo client API can be used to delete all entities that were marked as deleted as shown in the following example:
Expression<Boolean> expression = EcrQueryLanguage.condition()
.entity().systemField(SystemFieldList.GeneralSystemField.ComplianceDeleted.INSTANCE)
.equalTo().value(true).holds();
serviceClient.delete(expression, 1000); (1)
1 | Set a limit to reduce database load |
Triggering the job
Both the repository service and the job service offer an API that provides methods to create triggers for the job. The following example shows how to use this API to create a simple trigger that will fire once at a specified time. The API requires administrator privileges.
EcrSchedulerResourceClient schedulerClient = systemManagementClient.getSchedulerClient();
JobKeyModel jobKey = new JobKeyModel(SystemJobIdentities.ECR_JOBS_GROUP, SystemJobIdentities.RETENTION_CLEANUP);
TriggerKeyModel triggerKey = new TriggerKeyModel(SystemJobIdentities.ECR_JOBS_GROUP, "test-trigger-retention-cleanup");
SimpleTriggerModel trigger = new SimpleTriggerModel(triggerKey, jobKey);
trigger.setNextFireTime(ZonedDateTime.now());
trigger.setJobDataMap(Map.of(
SystemJobDataKeys.TYPE_DEFINITION_NAME, SimpleInvoiceNames.getTypeDefinitionName(),
SystemJobDataKeys.RETENTION_CLEANUP_RETENTION_END_TIME, ZonedDateTime.now(),
SystemJobDataKeys.TENANT, "master",
SystemJobDataKeys.RETENTION_CLEANUP_PROTOCOL_FILE, getTargetDir() + File.separator + "retention-cleanup-job.log"
));
schedulerClient.scheduleSimpleTrigger(trigger);
The API provides additional methods to create cron expression based triggers and to unschedule a job.
REST API
Client SDKs
The client SDKs provide APIs for applications using arveo. SDKs exist for both Java and TypeScript. Client applications should not use the REST API of arveo directly but instead use one of the provided SDKs.
JSON serialization
arveo uses a custom serialization for the JSON data in the REST API to support advanced features like polymorphism. Additionally, the custom serialization allows the arveo server and the client SDKs to pass type information. This way it is, for example, possible to distinguish between number types like short, int and long. The client SDKs take care of the serialization, and direct usage of the REST API is discouraged.
If it is necessary to (de-)serialize the custom JSON data, use the preconfigured Jackson ObjectMapper that is used by the server and the SDKs. This ObjectMapper is equipped with mixin types that contain information about how to (de-)serialize the custom JSON content. The internal ObjectMapper can be obtained by injecting an instance of de.eitco.commons.spring.web.json.AsdlObjectMapperHolder.
The service offers an overview page containing the REST resources and details about the models. It can generate examples for the models, too. The overview page is located at the root URL of the service.
Type information
Each object contains a type identifier in a JSON property called @type
. The required value is listed in the API
overview page for each model class. Example:
"identifier": {
"@type": "container-id",
"identifier": {
"@long": "1"
}
}
Type information for data types
There are some special type identifiers used to identify the type of JSON fields.
The following table lists types and their corresponding identifiers.
Type (Java) | Identifier |
---|---|
Byte | @byte |
Short | @short |
Long | @long |
BigInteger | @big-int |
Float | @float |
Instant | @utc-date-time |
ZonedDateTime | @zoned-date-time |
Class<?> | @type-reference |
UUID | @uuid |
byte[] | @binary |
LocalDate | @date |
LocalTime | @time |
Other data types do not require specific type identifiers.
The following example shows a special type identifier:
"retentionDate": {
"@zoned-date-time": "2020-12-15T15:52:21.5193002+01:00[Europe/Berlin]"
}
Collections
To distinguish between different types of collections (lists and sets) there are type identifiers for collection types.
Type (Java) | Identifier |
---|---|
List | @list |
Set | @set |
The following is an example of the type List:
"list": {
"@list": []
}
Java SDK
The SDK contains the general API for accessing arveo. The SDK can be used both to access arveo via HTTP and to use arveo as an embedded library.
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-sdk-http</artifactId>
<version>${ecr.version}</version>
</dependency>
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-embedded</artifactId>
<version>${ecr.version}</version>
</dependency>
The SDK offers both a generic API, where attributes of objects are mapped as a generic map, and a typed API. The typed API uses project-defined classes that represent the objects and their attributes. The main entry point for the API is the class de.eitco.ecr.sdk.TypeDefinitionServiceClient. An instance of this class can be obtained using Spring dependency injection. With the methods
-
getDocumentServiceClient()
-
getContainerServiceClient()
-
getFolderServiceClient()
-
getRelationServiceClient()
-
getMetaDataServiceClient()
you obtain a client factory that can be used to create a service client for a specific type definition. This service client can then be used to create new objects or load existing ones. For created or loaded objects, you in turn receive an entity client that offers methods for accessing the object. Special version clients are also available for concrete versions of entities.
Using the SDK in a non-web application
The SDK can be used both in applications that provide web functionality like REST endpoints and in applications that do not contain any web functionality. For non-web applications, some differences need to be considered.
Dependencies
By default, the SDK contains an OAuth2 client implementation that relies on some web-related spring beans. For non-web applications, a different OAuth2 client implementation is available. The default implementation needs to be excluded from the SDK dependency and replaced by the non-web implementation as shown in the following example:
<dependency>
<groupId>de.eitco.ecr</groupId>
<artifactId>ecr-sdk-http</artifactId>
<version>${ecr.version}</version>
<exclusions>
<exclusion>
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>de.eitco.commons</groupId>
<artifactId>cmn-spring-security5-oauth2-client-non-web</artifactId>
<version>${commons-oauth2-version}</version>
</dependency>
The current version of the OAuth2 client can be found in the Nexus.
Application initialization
The SDK contains some dependencies that cause Spring to initialize some web functionality automatically. This can cause problems like missing Spring Security configuration errors. Non-web applications can simply turn off all of Spring's web functionality by using the SpringApplicationBuilder class as shown in the following example:
@SpringBootApplication
public class MyApplication {
public static void main(String[] args) {
new SpringApplicationBuilder(MyApplication.class)
.web(WebApplicationType.NONE)
.run(args);
}
}
Batch Operations
The SDK provides various methods for batch operations. For example, several objects can be created or updated at once.
Create, update or delete multiple objects of the same type
All service clients provide methods for creating, updating and deleting multiple objects. Since a service client is bound to a specific type definition, only objects of the same type can be created, updated or deleted in this way. The objects to be updated or deleted are identified by an arbitrary selector. For updates, there are methods that return the updated objects and methods that return only the number of updated objects. Especially when a large number of objects is updated at once, only the latter methods should be used. With these methods, all objects receive the same changes; if each object is to be updated individually, the methods from the BatchOperationServiceClient (see below) must be used.
Create or update several objects of different types
The BatchOperationServiceClient class provides methods to create or update multiple objects of different types.
Create several interdependent objects
To create multiple objects of different types, special BatchCreateInput input objects are used that bundle the type of the object and its properties. The order in which the objects are created corresponds to the order in which the input objects are passed. Each of these input objects contains a virtual ID that identifies it within the batch operation. In this way, for example, a relation as well as its source and target can be created in a batch operation. The relation only has to be created with the virtual IDs of source and target.
If the relation between the objects consists not only of the ID but also of a foreign key to an arbitrary attribute, a reference to the corresponding attribute of the referenced object must be passed to the dependent object. For this purpose, the class BatchAttributeReference is available, which bundles the name of the foreign key attribute, the referenced attribute and the virtual ID of the other object in the batch operation. Code examples can be found in the class de.eitco.ecr.system.test.BatchCreationIT.
Update multiple objects of different types
The BatchOperationServiceClient also provides methods to update several different objects of different types in a batch operation. A separate input object is passed for each object to be updated, which contains the ID of the object and the properties to be updated. This means that individual changes can also be made to each object with these methods. The BatchUpdateUtility class provides auxiliary methods with which the respective input objects can be created. Code examples can be found in the class de.eitco.ecr.system.test.BatchUpdateIT.
Automatic update in case of collision
The BatchCreateInput objects used to create various types make it possible to automatically update the existing object in the event of a collision. To do this, the BatchCreateInput only has to be made aware of the field on which the collision could occur:
TypedContainerBatchCreateInput<Person> containerBatchCreateInput =
new TypedContainerBatchCreateInput<>(new TypedContainerInput<>(person), List.of());
containerBatchCreateInput.setCollisionCheckAttribute("first_name");
In the above example, a container is to be created in a batch where a collision could possibly occur on the attribute
first_name
.
The attribute that is to be used to detect the collisions must be provided with a unique constraint.
Create or update (upsert) operations
The SDK provides methods to perform create or update (upsert) operations on entities. The entity to update, if it exists, is identified by an EQL selector. If a matching entity is found, it is updated using the provided data. If no matching entity is found, the provided data is used to create a new entity. The selector must match exactly one or zero existing entities; if it matches more than one entity, an exception is thrown. The following example shows how to perform an upsert operation.
TypedContainerServiceClient<Person> serviceClient =
typeDefinitionServiceClient.getContainerServiceClient().byClass(Person.class);
LocalDate birthday = LocalDate.of(1995, Month.SEPTEMBER, 16);
Person person = serviceClient.createTypeInstance();
person.setBirthday(birthday);
person.setSurname("Smith");
person.setFirstName("John");
person.setBreakTime(LocalTime.NOON);
person.setProcedureDate(ZonedDateTime.now());
TypedContainerClient<Person> client = serviceClient.createOrUpdate(
EcrQueryLanguage.condition().entity().field(PersonNames.FIRST_NAME).equalTo().value("John").holds(), (1)
person
);
1 | The selector that uniquely identifies the entity to update |
Generic batch operations
The generic batch operation API can be used to perform different operations like create, update or delete in one transaction. The batch operations use the same input types as the other batch functions described above, which makes it possible to use the result of one operation in a subsequent operation. The following operation types are available:
Read operations
-
TypedContainerBatchReadOperation
-
TypedDocumentBatchReadOperation
-
TypedFolderBatchReadOperation
-
TypedMetaDataBatchReadOperation
-
TypedRelationDataBatchReadOperation
The purpose of read operations is to provide input data for other operations. For example, a read operation could be used to read an entity of which only the ID is known, and then use the entity’s attribute values as input for a create operation. When the entity cannot be read, the entire batch of operations fails and the transaction is rolled back.
Delete operations
-
TypedContainerBatchDeleteOperation
-
TypedDocumentBatchDeleteOperation
-
TypedFolderBatchDeleteOperation
-
TypedMetaDataBatchDeleteOperation
-
TypedRelationBatchDeleteOperation
Delete operations are used to delete a single entity. Unlike the other operations, it is not possible to reference a delete operation. When the entity cannot be deleted, the entire batch of operations fails and the transaction is rolled back.
Update operations
-
TypedContainerBatchUpdateOperation
-
TypedDocumentBatchUpdateOperation
-
TypedFolderBatchUpdateOperation
-
TypedMetaDataBatchUpdateOperation
-
TypedRelationBatchUpdateOperation
Update operations are used to update a single entity. When the entity cannot be updated, the entire batch of operations fails and the transaction is rolled back.
Create or update operations
-
TypedContainerBatchCreateOrUpdateOperation
-
TypedDocumentBatchCreateOrUpdateOperation
-
TypedFolderBatchCreateOrUpdateOperation
-
TypedMetaDataBatchCreateOrUpdateOperation
-
TypedRelationBatchCreateOrUpdateOperation
Create or update operations perform an upsert as described in Create or update (upsert) operations. When the operation cannot update or create the entity, the entire batch of operations fails and the transaction is rolled back.
Create operations
-
TypedContainerBatchCreateOperation
-
TypedDocumentBatchCreateOperation
-
TypedFolderBatchCreateOperation
-
TypedMetaDataBatchCreateOperation
-
TypedRelationBatchCreateOperation
Create operations are used to create a new entity. When the entity cannot be created, the entire batch of operations fails and the transaction is rolled back.
Examples
The first example implements a solution for the following problem: An invoice was archived with a relation to an invalid customer. The customer must be replaced with a new customer and the reference in the invoice must be updated.
Customer customer = customerServiceClient.createTypeInstance();
customer.setName(UUID.randomUUID().toString());
TypedDocumentBatchCreateOperation<Customer> createCustomerOperation = (1)
new TypedDocumentBatchCreateOperation<>(customer);
Invoice invoice = invoiceServiceClient.createTypeInstance();
BatchAttributeReference reference = new BatchAttributeReference( (2)
InvoiceNames.CUSTOMER_NUMBER,
SystemFieldList.GeneralSystemField.Id.INSTANCE.getName(),
createCustomerOperation.getVirtualId().getUuid()
);
TypedDocumentBatchUpdateInput<Invoice> invoiceInput =
new TypedDocumentBatchUpdateInput<>(invoiceId, new TypedDocumentInput<>(invoice), List.of(reference));
TypedDocumentBatchUpdateOperation<Invoice> updateInvoiceOperation = (3)
new TypedDocumentBatchUpdateOperation<>(invoiceInput);
TypedDocumentBatchDeleteOperation deleteCustomerOperation = new TypedDocumentBatchDeleteOperation(customerId); (4)
List<EcrId> ids = batchOperationServiceClient.performTypedBatchOperations( (5)
createCustomerOperation, updateInvoiceOperation, deleteCustomerOperation);
1 | The operation to create the new customer |
2 | A reference to the ID of the new customer to be used for the customer_number field of the updated invoice |
3 | The operation to update the existing invoice |
4 | The operation to delete the invalid customer |
5 | An injected instance of de.eitco.ecr.sdk.BatchOperationServiceClient |
The second example shows how to use a read operation.
TypedDocumentBatchReadOperation readCustomerOperation = new TypedDocumentBatchReadOperation(customerId);
Invoice invoice = invoiceServiceClient.createTypeInstance();
invoice.setCustomerNumber(customerId.getIdentifier());
BatchAttributeReference attributeReference = new BatchAttributeReference( (1)
InvoiceNames.CUSTOMER_NAME,
CustomerNames.NAME,
readCustomerOperation.getVirtualId().getUuid()
);
TypedDocumentBatchCreateInput<Invoice> invoiceInput =
new TypedDocumentBatchCreateInput<>(new TypedDocumentInput<>(invoice), List.of(attributeReference));
TypedDocumentBatchCreateOperation<Invoice> createInvoiceOperation =
new TypedDocumentBatchCreateOperation<>(invoiceInput);
List<EcrId> ids = batchOperationServiceClient.performTypedBatchOperations(readCustomerOperation, createInvoiceOperation); (2)
1 | A reference to the name attribute of the customer read by the read operation used for the customer_name attribute of the invoice |
2 | An injected instance of de.eitco.ecr.sdk.BatchOperationServiceClient |
System tables
This section contains information about the system tables used by arveo.
Tables for type definitions
The system stores some information like the ID of type definitions in the database. For this, the following tables are used:
-
ecr_types
: Contains an entry for each type definition -
ecr_types_content_elements
: Contains 1:n mappings of content elements to type definitions.
+---------------+         +----------------------------+
|   ecr_types   |         | ecr_types_content_elements |
+---------------+         +----------------------------+
| id            |<----+   | ce_name                    |
|---------------|     +---| ce_type_id                 |
| creation_date |         |----------------------------|
| ecr_version   |         | ce_content_type            |
| object_type   |         | ce_profile                 |
| type_name     |         | ce_store_json              |
+---------------+         +----------------------------+
Column | Type | Description |
---|---|---|
id | int4 | ID of the type definition |
creation_date | timestamp | Creation date and time |
object_type | text | Type of the objects in the type definition |
type_name | text | The name of the type definition |

Column | Type | Description |
---|---|---|
ce_name | text | The name of the content element |
ce_type_id | int4 | The ID of the type definition containing the content element |
ce_content_type | text | The allowed content type of the content element |
ce_profile | text | The name of the storage profile used by the content element |
ce_store_json | boolean | Whether the content element uses the JSON field or not |
Folder structure tables
The object type FOLDER is used to create tree-like structures with parent and child relationships. The structure is stored in the ecr_folder_structure table. The table ecr_folder_structure_closure contains the transitive closure of the parent and child relationships to allow fast database queries in the tree.
+----------------------+   +------------------------------+
| ecr_folder_structure |   | ecr_folder_structure_closure |
+----------------------+   +------------------------------+
| child_id             |   | id                           |
|----------------------|   |------------------------------|
| child_name           |   | child_id                     |
| child_type_id        |   | child_type_id                |
| parent_id            |   | depth                        |
| parent_type_id       |   | parent_id                    |
+----------------------+   | parent_type_id               |
                           +------------------------------+
Column | Type | Description |
---|---|---|
child_id | int8 | The ID of the child folder |
child_name | varchar(128) | The name of the child folder |
parent_id | int8 | The ID of the parent folder |
parent_type_id | int4 | The ID of the parent type definition |

Column | Type | Description |
---|---|---|
id | uuid | The ID of the entry in the closure table |
child_id | int8 | The ID of the child folder |
child_type_id | int4 | The ID of the child type definition |
depth | int4 | The distance between the child and the parent on the direct path in the tree |
parent_id | int8 | The ID of the parent folder |
parent_type_id | int4 | The ID of the parent type definition |
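The closure table makes subtree queries cheap: it stores one row per ancestor/descendant pair together with the depth between them, so all descendants of a folder can be found with a single indexed lookup instead of a recursive query. The following self-contained sketch illustrates the pattern in memory; class and method names are illustrative and not arveo internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClosureTableSketch {

    // One row of the closure table: parent is an ancestor of child at the given depth.
    static final class ClosureRow {
        final long parentId;
        final long childId;
        final int depth;

        ClosureRow(long parentId, long childId, int depth) {
            this.parentId = parentId;
            this.childId = childId;
            this.depth = depth;
        }
    }

    // Inserting child under parent copies all of the parent's ancestor rows
    // with depth + 1 and adds the direct edge with depth 1.
    static void insert(List<ClosureRow> closure, long parentId, long childId) {
        List<ClosureRow> newRows = new ArrayList<>();
        for (ClosureRow row : closure) {
            if (row.childId == parentId) {
                newRows.add(new ClosureRow(row.parentId, childId, row.depth + 1));
            }
        }
        newRows.add(new ClosureRow(parentId, childId, 1));
        closure.addAll(newRows);
    }

    // All descendants of a folder at any depth: a linear scan here,
    // a single indexed SELECT on the closure table in the database.
    static Set<Long> descendants(List<ClosureRow> closure, long parentId) {
        Set<Long> result = new TreeSet<>();
        for (ClosureRow row : closure) {
            if (row.parentId == parentId) {
                result.add(row.childId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ClosureRow> closure = new ArrayList<>();
        insert(closure, 1, 2); // folder 2 under folder 1
        insert(closure, 2, 3); // folder 3 under folder 2
        System.out.println(descendants(closure, 1)); // prints [2, 3]
    }
}
```

Note that the real table additionally stores type IDs for multi-type trees, and many closure-table implementations also keep a depth-0 self row per node; the sketch omits both for brevity.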
Recovery table
The recovery table ecr_recovery is used for the recovery feature.
+----------------+
|  ecr_recovery  |
+----------------+
| deleted_date   |
| entity         |
| entity_id      |
| keep_until     |
| type_id        |
| version_id     |
+----------------+
Column | Type | Description |
---|---|---|
deleted_date | timestamp | Date and time at which the entity was deleted |
entity | jsonb | A JSON representation of the deleted entity |
entity_id | int8 | The ID of the entity |
keep_until | timestamp | The date and time until which to keep the entity in the recovery table |
type_id | int4 | The ID of the type definition |
version_id | int8 | The version ID of the deleted entity |
Keystore tables
When the encryption feature is enabled for a storage profile, the generated keys are stored in profile-specific database tables. For each encrypted profile, a table called ecr_keys_<profile> and a table called ecr_keys_assoc_<profile> are created. The ecr_keys_<profile> table contains the generated keys, and the ecr_keys_assoc_<profile> table contains the associations between content elements and keys.
+--------------------+       +------------------------+
|  ecr_keys_profile  |       | ecr_keys_assoc_profile |
+--------------------+       +------------------------+
| id                 |<--+   | content_id             |
|--------------------|   +---| id                     |
| key                |       +------------------------+
+--------------------+
Column | Type | Description |
---|---|---|
id | int8 | The ID of the key |
key | bytea | The encryption key |

Column | Type | Description |
---|---|---|
content_id | text | The ID of the content element |
id | int8 | The ID of the key |
Compatibility list
To operate arveo successfully, the operator of the platform must provide and manage the following services.
The following table lists the third-party services used by arveo.
Service | Supported Version | Comment |
---|---|---|
JDK | Java 11 | Integration tests run on AdoptOpenJDK 11, but all JDKs are supported |
ActiveMQ | ActiveMQ 5.15, 5.16 | |
PostgreSQL | PostgreSQL 12, 13 | |
Apache Solr | Apache Solr 8.6 | |
S3 Storage | Ceph 15, 16 | Retention is not supported yet, even if provided by the vendor |
File System | NFS | |
Linux OS | Ubuntu 18.04, 20.04 | |
Application Server | Tomcat 9, 10 | |
Kubernetes | 1.19 | If Helm deployment is used |
Docker | 20.10.8 | If Helm deployment is used |
OAuth | OAuth 2.0 | Grant flows: |
Authentication Services | Keycloak 15 | |
LDAP Server | MS Active Directory | |
MS Graph | Document Conversion with Microsoft 365, requires M365 account | |
SSO | Kerberos | The Kerberos authentication service is MS Active Directory |
Important Terminology
- ECR
-
Short for Enterprise Content Services; this is the collection of the arveo content services providing all document and record features.
- EQL
-
Eitco Query Language.
Used for search operations.
- Entity
-
Object that represents a type of data structure used in arveo.
- Document
-
An entity that can contain metadata and content.
- Folder
-
An entity that contains metadata and is organized in a tree structure like in a file system.
- Relation
-
An entity that represents a relation between two other entities.
- Container
-
Simple folder-like object not organized in a tree structure but with relations to other objects.
- Meta
-
An entity that contains only metadata.
- Content type
-
A meta specification that classifies the data.
Examples of content types are: original object, rendition, full text, text notes, XML properties, etc.
- Retention
-
Continuous audit-proof storage of all company data for compliance or own business purposes.
- Litigation hold
-
A flag that indicates whether a document is related to a litigation.
If the flag is set, the document must never be deleted - even if the retention date has passed.
- Bucket
-
Object storage.
- Encryption
-
Translating data into an unreadable form by means of electronic or digital codes or keys.
A specific key, in the form of a procedure or an algorithm, is required for the reverse transformation, after which the legitimate user can access the original data.
- Annotation
-
A construct used on interfaces or getter-methods to specify their properties.
- Storage profile
-
Storage profiles define on which storage the content elements are saved.
- Storage Container
-
Folders or buckets on the content storage that contain documents with the same retention period (e.g. Jan-Dec 2031).