About Document Conversion Service

The Document Conversion Service provides a REST API to convert binary data from one format to another. For example, a JPEG image could be converted to a PDF file. Another use case is text extraction from different document types. The actual content conversion is not implemented by the Conversion Service. The service offers an interface for plugins that perform the actual conversion.

Configuration

Several plugins may be able to fulfill a given request, so you can configure which plugin to use for which kind of request. By default, the service selects the first available plugin that claims to support the requested source and target mimetypes.

To configure this, specify a list of use cases in your application.yaml under the key rendering (for render plugins) or under the key extraction (for fulltext plugins).

Use cases for render plugins

Each use case consists of

  • sourceType

    • a regular expression matching the mimetype to be converted

  • targetType

    • a regular expression matching the mimetype to be converted to

  • plugin

    • the name of the plugin to use (by default the fully qualified class name, but can be specified by each plugin)

For a given request, the service searches the list in the given order and uses the first plugin whose configured sourceType and targetType match the request.

Example of a rendering configuration
rendering:
  - sourceType: application/pdf
    targetType: image/jpe?g
    plugin: my-pdf-to-image-plugin
  - sourceType: .*
    targetType: .*
    plugin: my-fallback-plugin

The configuration above instructs the service to use the plugin named my-pdf-to-image-plugin for requests to render PDFs to the target mimetypes image/jpg or image/jpeg. In any other case it uses the plugin my-fallback-plugin. Note that if my-fallback-plugin were listed before the other one, every request would be handled by my-fallback-plugin, because its patterns match every mimetype.
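The first-match behaviour described above can be sketched in a few lines of Java. This is only an illustration, not the service's implementation: the names RenderUseCase and UseCaseSelector are hypothetical, and the sketch assumes that each regular expression has to match the complete mimetype.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical model of one entry of the rendering use case list. */
record RenderUseCase(String sourceType, String targetType, String plugin) {

    /** Assumption: each regular expression must match the entire mimetype. */
    boolean matches(String source, String target) {
        return Pattern.matches(sourceType, source) && Pattern.matches(targetType, target);
    }
}

final class UseCaseSelector {

    /** Returns the plugin of the first use case (in list order) matching both mimetypes. */
    static Optional<String> selectPlugin(List<RenderUseCase> useCases, String source, String target) {
        return useCases.stream()
                .filter(useCase -> useCase.matches(source, target))
                .map(RenderUseCase::plugin)
                .findFirst();
    }

    public static void main(String[] args) {
        List<RenderUseCase> useCases = List.of(
                new RenderUseCase("application/pdf", "image/jpe?g", "my-pdf-to-image-plugin"),
                new RenderUseCase(".*", ".*", "my-fallback-plugin"));

        // The specific entry is listed first, so it wins for PDF to JPEG requests.
        System.out.println(selectPlugin(useCases, "application/pdf", "image/jpeg").orElse("none"));
        // Everything else falls through to the catch-all entry.
        System.out.println(selectPlugin(useCases, "text/plain", "application/pdf").orElse("none"));
    }
}
```

Because the list is evaluated in order, swapping the two entries would make the catch-all entry win for every request.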

Use cases for extraction plugins

Each use case consists of

  • sourceType

    • a regular expression matching the mimetype to be converted

  • plugin

    • the name of the plugin to use (by default the fully qualified class name, but can be specified by each plugin)

For a given request, the service searches the list in the given order and uses the first plugin whose configured sourceType matches the request.

Example of an extraction configuration
extraction:
  useCases:
    - sourceType: application/pdf
      plugin: de.eitco.commons.conversion.plugins.oss.TikaFulltextOcrExtractionPlugin

The configuration above instructs the service to use the plugin de.eitco.commons.conversion.plugins.oss.TikaFulltextOcrExtractionPlugin for requests to extract text from PDF files.

General configuration settings

Limiting the size of uploaded data

To limit the size of the uploaded data to be rendered, the following configuration properties can be used:

server:
  undertow:
    max-http-post-size: 30MB
spring:
  servlet:
    multipart:
      max-file-size: 30MB
      max-request-size: 30MB
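A client can avoid a rejected upload by checking the content size before sending it. The following is a minimal sketch under stated assumptions: the class UploadLimit is hypothetical, and the size string is interpreted with binary units (1MB = 1024 * 1024 bytes), which is how Spring's DataSize treats such values.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical client-side helper, not part of the conversion service. */
final class UploadLimit {

    private static final Pattern SIZE = Pattern.compile("(\\d+)\\s*(B|KB|MB|GB)");

    /** Parses values such as "30MB" into bytes, assuming binary units. */
    static long parseBytes(String value) {
        Matcher matcher = SIZE.matcher(value.trim().toUpperCase(Locale.ROOT));
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported size: " + value);
        }
        long number = Long.parseLong(matcher.group(1));
        return switch (matcher.group(2)) {
            case "B" -> number;
            case "KB" -> number * 1024L;
            case "MB" -> number * 1024L * 1024L;
            default -> number * 1024L * 1024L * 1024L; // "GB"
        };
    }

    /** Returns true if content of the given length stays within the configured limit. */
    static boolean fitsLimit(long contentLength, String configuredLimit) {
        return contentLength <= parseBytes(configuredLimit);
    }
}
```

For example, UploadLimit.fitsLimit(file.length(), "30MB") could be evaluated before calling the client API.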

Plugins Overview

The service provides two plugin interfaces: RenderPlugin and FulltextPlugin. A RenderPlugin converts from one document type to another. It defines a list of supported source mimetypes and a list of supported target mimetypes. A FulltextPlugin can extract text from a document. It defines a list of supported source mimetypes. The target mimetype is always text/plain.
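To illustrate the default selection (the first available plugin that claims to support the requested mimetypes), here is a simplified sketch. The interface SketchRenderPlugin and its method names are hypothetical stand-ins invented for this example; the signatures of the service's real RenderPlugin interface are not shown in this document.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified stand-in for a render plugin; not the service's real interface. */
interface SketchRenderPlugin {
    String name();
    List<String> supportedSourceTypes();
    List<String> supportedTargetTypes();
}

/** Trivial implementation used only for the illustration. */
record SimplePlugin(String name, List<String> supportedSourceTypes, List<String> supportedTargetTypes)
        implements SketchRenderPlugin {
}

final class DefaultPluginSelection {

    /** Returns the first plugin that lists both the source and the target mimetype as supported. */
    static Optional<SketchRenderPlugin> firstSupporting(
            List<? extends SketchRenderPlugin> plugins, String source, String target) {
        for (SketchRenderPlugin plugin : plugins) {
            if (plugin.supportedSourceTypes().contains(source)
                    && plugin.supportedTargetTypes().contains(target)) {
                return Optional.of(plugin);
            }
        }
        return Optional.empty();
    }
}
```

A fulltext plugin would be selected the same way, except that only the source mimetype is checked because the target is always text/plain.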

Open source plugins

The document-conversion-plugins-oss library provides several plugins based on open source libraries. To use the plugins, the jar file of the library must be added to the classpath of the service. The plugins will be registered automatically.

A ZIP containing the library and additional dependencies can be downloaded from nexus using the following maven coordinates:

<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>document-conversion-plugins-oss</artifactId>
    <type>zip</type>
    <classifier>zip</classifier>
    <version>6.0.1-SNAPSHOT</version>
</dependency>

Plugins contained in document-conversion-plugins-oss

de.eitco.commons.conversion.plugins.oss.OpenPdfRenderPlugin

Renders plain text to PDF using OpenPDF.

Source media types

  • text/plain

Target media types

  • application/pdf

de.eitco.commons.conversion.plugins.oss.TiffToPdfRenderPlugin

Renders TIFF to PDF using iText.

Source media types

  • image/tiff

Target media types

  • application/pdf

de.eitco.commons.conversion.plugins.oss.PdfMergingContainerPlugin

Merges PDF files.

Source media types

  • application/pdf

Target media types

  • application/pdf

de.eitco.commons.conversion.plugins.oss.OpenHtmlRenderPlugin

Renders from XHTML to PDF by using com.openhtmltopdf.

Source media types

  • application/xhtml+xml

Target media types

  • application/pdf

de.eitco.commons.conversion.plugins.oss.ImagesToPdfRenderPlugin

Renders JPG, PNG, GIF and BMP images to PDF using Apache PDFBox.

Source media types

  • image/jpeg

  • image/png

  • image/gif

  • image/bmp

Target media types

  • application/pdf

de.eitco.commons.conversion.plugins.oss.PdfToImagesRenderPlugin

Renders PDF to JPG, PNG, GIF or TIFF images using Apache PDFBox.

Source media types

  • application/pdf

Target media types

  • image/jpeg

  • image/png

  • image/gif

  • image/tiff

de.eitco.commons.conversion.plugins.oss.TikaFulltextOcrExtractionPlugin

Extracts text by using Apache Tika and Tesseract OCR.

Source media types

  • application/pdf

  • application/xml

  • text/html

  • application/msword

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document

  • application/vnd.ms-excel

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-powerpoint

  • application/vnd.openxmlformats-officedocument.presentationml.presentation

  • application/epub+zip

  • application/vnd.ms-outlook

  • application/rtf

  • application/vnd.oasis.opendocument.presentation

  • application/vnd.oasis.opendocument.spreadsheet

  • application/vnd.oasis.opendocument.text

  • text/plain

Target media types

  • text/plain

de.eitco.commons.conversion.plugins.oss.PdfToMultiPageTiffRenderPlugin

Renders PDF to multi-page TIFF using Apache PDFBox.

Source media types

  • application/pdf

Target media types

  • image/tiff

de.eitco.commons.conversion.plugins.oss.TikaFulltextExtractionPlugin

Extracts text by using Apache Tika.

Source media types

  • application/pdf

  • application/xml

  • text/html

  • application/msword

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document

  • application/vnd.ms-excel

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-powerpoint

  • application/vnd.openxmlformats-officedocument.presentationml.presentation

  • application/epub+zip

  • application/vnd.ms-outlook

  • application/rtf

  • application/vnd.oasis.opendocument.presentation

  • application/vnd.oasis.opendocument.spreadsheet

  • application/vnd.oasis.opendocument.text

  • text/plain

Target media types

  • text/plain

de.eitco.commons.conversion.plugins.oss.MsgAttachmentsExtractionPlugin

Extracts attachments from Outlook Message files and renders them to a single PDF.

Source media types

  • application/vnd.ms-outlook

Target media types

  • application/pdf

The OpenHtmlRenderPlugin cannot process XML structures that contain unclosed tags. Such input causes an exception.

Fulltext plugins

The following fulltext extraction plugins exist:

  • de.eitco.commons.conversion.plugins.oss.TikaFulltextExtractionPlugin

    • extracts text from pdf, doc(x), xls(x), epub, html, msg, odp, ods, odt, pptx, rtf and xml files using open source java solutions (namely apache tika)

  • de.eitco.commons.conversion.plugins.oss.TikaFulltextOcrExtractionPlugin

    • extracts text with ocr from pdf, doc(x), xls(x), epub, html, msg, odp, ods, odt, pptx, rtf and xml files using open source java solutions (namely apache tika and tesseract for ocr)

Enable tesseract ocr:

If you want to use the TikaFulltextOcrExtractionPlugin to extract text from images, you need to install tesseract. In the following steps the installation will be explained.

  1. Download and install tesseract from this page https://github.com/tesseract-ocr/tessdoc/blob/main/Downloads.md.

    • For Ubuntu for example: sudo apt install tesseract-ocr

  2. Add Tesseract to the PATH environment variable. On Windows, for example, you must add the following entries to the path:

    • {path to tesseract}\Tesseract-OCR

    • {path to tesseract}\Tesseract-OCR\tessdata

  3. Add new language packages to tesseract tessdata directory. Download the packages from the following site https://ocrmypdf.readthedocs.io/en/latest/languages.html

    • The default language is english (tesseract shortname = eng)

    • If you add a new language, you must also add it to the YAML configuration

    • You can also change the DPI used for Tesseract image extraction

Example of adding English and German to Tesseract and changing the DPI
extraction:
  tikaOcrLanguage: "eng+deu"
  tikaOcrDpi: 300
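Before starting the service it can be useful to verify that every configured language is actually present in the tessdata directory. The sketch below is a hypothetical helper (TessdataCheck is not part of the service). It relies on Tesseract's convention that the language pack for a code such as deu is stored in a file named deu.traineddata.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for checking a tikaOcrLanguage value against a tessdata directory. */
final class TessdataCheck {

    /** Splits a value such as "eng+deu" into its language codes. */
    static List<String> languages(String tikaOcrLanguage) {
        return Arrays.stream(tikaOcrLanguage.split("\\+"))
                .map(String::trim)
                .filter(code -> !code.isEmpty())
                .toList();
    }

    /** Returns the configured languages that have no <code>.traineddata file in the directory. */
    static List<String> missingLanguages(Path tessdataDir, String tikaOcrLanguage) {
        return languages(tikaOcrLanguage).stream()
                .filter(code -> !Files.isRegularFile(tessdataDir.resolve(code + ".traineddata")))
                .toList();
    }
}
```

An empty result of missingLanguages means that all language packs named in the configuration were found.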

Levigo Jadice plugins

The document-conversion-plugins-jadice library provides several plugins based on Levigo Jadice libraries. To use the plugins, the jar file of the library must be added to the classpath of the service. The plugins will be registered automatically.

A ZIP containing the library and additional dependencies can be downloaded from Nexus using the following maven coordinates:

<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>document-conversion-plugins-jadice</artifactId>
    <type>zip</type>
    <classifier>zip</classifier>
    <version>6.0.1-SNAPSHOT</version>
</dependency>

Plugins contained in document-conversion-plugins-jadice

de.eitco.commons.conversion.plugins.jadice.JadiceToPdfPlugin

Renders various formats to PDF using Jadice.

Source media types

  • application/msword

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document

  • application/vnd.ms-excel

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-powerpoint

  • application/vnd.openxmlformats-officedocument.presentationml.presentation

  • text/plain

  • message/rfc822

  • application/xhtml+xml

  • text/html

  • image/jpeg

  • image/gif

  • image/png

  • image/bmp

Target media types

  • application/pdf

Please note that Microsoft Office documents (PowerPoint and Word) are rendered by this plugin without formatting-related issues.

Configuration considerations of Levigo Jadice usage

A RenditionUseCase describes the source and target mimetype of a conversion supported by a specific plugin. The configuration below shows how the conversion from the source mimetype 'application/msword' to the target mimetype 'application/pdf' is assigned to the JadiceToPdfPlugin.

rendering:
  useCases:
    - plugin: de.eitco.commons.conversion.plugins.jadice.JadiceToPdfPlugin
      sourceType: "application/msword"
      targetType: "application/pdf"
  containerUseCases:
    - targetType: "application/pdf"
      plugin: de.eitco.commons.conversion.plugins.jadice.JadiceToPdfPlugin

Microsoft Graph plugins

The document-conversion-plugins-ms-graph library provides several plugins based on Microsoft Graph libraries. To use the plugins, the jar file of the library must be added to the classpath of the service. The plugins will be registered automatically.

If you want to use this plugin, you need a Microsoft 365 account that can use SharePoint and Azure.

A ZIP containing the library and additional dependencies can be downloaded from nexus using the following maven coordinates:

<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>document-conversion-plugins-ms-graph</artifactId>
    <type>zip</type>
    <classifier>zip</classifier>
    <version>6.0.1-SNAPSHOT</version>
</dependency>

Plugins contained in document-conversion-plugins-ms-graph

de.eitco.commons.conversion.plugins.msgraph.GraphRenderPlugin

Renders Microsoft Office Documents and some other formats to PDF using the Microsoft Graph API

Source media types

  • application/msword

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document

  • application/vnd.ms-excel

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-powerpoint

  • application/vnd.openxmlformats-officedocument.presentationml.presentation

  • application/vnd.ms-outlook

  • application/rtf

  • text/html

  • application/xhtml+xml

  • application/vnd.oasis.opendocument.presentation

  • application/vnd.oasis.opendocument.spreadsheet

  • application/vnd.oasis.opendocument.text

  • message/rfc822

  • image/tiff

Target media types

  • application/pdf

Setup microsoft graph render plugins

To use the Graph render plugins you need a technical user and a SharePoint drive. In Azure, a technical user is called an "app registration". The following sections explain how to create an Azure app registration and how to obtain the SharePoint drive ID.

App Registration

Log in to https://portal.azure.com/ and search for App Registration. The following window appears, in which you can create a new app registration:

app-registration-01
Figure 1. Azure - App-Registration

If you go into your new app registration you will find in the overview the client-id and the tenant-id.

app-registration-02
Figure 2. App-Registration details

Next, go to the API permissions. The minimum permission required for the Microsoft Graph render plugin is "Files.ReadWrite.All".

app-registration-03
Figure 3. App-Registration api authorization

You can also configure the authentication settings of the app registration. You can allow the app registration to work in every Azure tenant or only in your own tenant.

app-registration-04
Figure 4. App-Registration authentication

The last thing you need is a client secret, which can be generated in the following window.

app-registration-05
Figure 5. App-Registration secrets

SharePoint

First you need to create a SharePoint site and a drive. Once they exist, you need the SharePoint drive ID. The following steps explain how to obtain it.

Start by creating a SharePoint team site. After creating the site, you can optionally create a new document library.

The Microsoft Graph Explorer is a helpful tool for the following requests.

To use this tool, you must be logged in with a user account and grant the required permissions under "Modify permissions".

Response:

{
    "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#sites/$entity",
    "id": "xxxxxxxx.sharepoint.com,0bbbfad6-xxxx-xxxx-xxxx-xxxxxxxxxxxx,3c5f2d82-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "name": "xxxxxxxx",
    "displayName": "arveo"
}

Response:

{
    "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#drives",
    "value": [
        {
            "id": "b!XXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
            "name": "Conversion"
        }
    ]
}

YAML configuration

In the yaml of the document conversion service you must set the following to use the graph render plugin:

ms-graph:
  azure:
    credentialType: CLIENT_SECRET
    clientId: <azure-client-id>
    clientSecret: <azure-client-secret>
    tenantGuid: <azure-tenant-guid>
  sharepoint:
    driveId: <drive-id>

Alternatively, you can use username and password credentials to login to your azure account:

ms-graph:
  azure:
    credentialType: USERNAME_PASSWORD
    clientId: <azure-client-id>
    username: <username>
    password: <password>
  sharepoint:
    driveId: <drive-id>

Microsoft Azure Plugins

The document-conversion-plugins-ms-azure library provides several plugins based on Microsoft Azure libraries. To use the plugins, the jar file of the library must be added to the classpath of the service. The plugins will be registered automatically.

A ZIP containing the library and additional dependencies can be downloaded from nexus using the following maven coordinates:

<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>document-conversion-plugins-ms-azure</artifactId>
    <type>zip</type>
    <classifier>zip</classifier>
    <version>6.0.1-SNAPSHOT</version>
</dependency>

Plugins contained in document-conversion-plugins-ms-azure

de.eitco.commons.conversion.plugins.msazure.AzureCognitiveOcrExtractionPlugin

Extracts text from images and PDFs using Azure Cognitive Services.

Source media types

  • image/jpeg

  • image/png

  • image/bmp

  • application/pdf

  • image/tiff

Target media types

  • text/plain

Setup Microsoft Azure fulltext plugin

If you want to use the azure fulltext plugins you will need a cognitive service in your azure portal.

Setup cognitive service

Log in to https://portal.azure.com/ and search for Cognitive Services (Computer Vision). The following window appears, in which you can create a new computer vision service:

computer-vision-01
Figure 6. Azure - Create computer vision service

After creating a computer vision service, you need its endpoint. You can find the endpoint in the overview of the service.

computer-vision-02
Figure 7. Computer vision service overview

Now you need an access key of your computer vision service.

computer-vision-03
Figure 8. App-Registration api authorization

YAML configuration

In the yaml of the document conversion service you must set the following to use the azure fulltext plugin:

azure:
  cognitive:
    key: ""
    endpoint: ""
    ocrDetectionLanguage: "de"
    modelVersion: "latest"

The default value of "ocrDetectionLanguage" is "de" (German). The only other language you can choose is English. To use English, set the value to "en" in your YAML.

The available Azure computer vision model versions are listed in the Azure documentation.

Amazon AWS plugins

The document-conversion-plugins-amazon-aws library provides several plugins based on Amazon AWS. To use the plugins, the jar file of the library must be added to the classpath of the service. The plugins will be registered automatically.

If you want to use this plugin you need an Amazon AWS account (also named IAM account).

A ZIP containing the library and additional dependencies can be downloaded from nexus using the following maven coordinates:

<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>document-conversion-plugins-amazon-aws</artifactId>
    <type>zip</type>
    <classifier>zip</classifier>
    <version>6.0.1-SNAPSHOT</version>
</dependency>

Plugins contained in document-conversion-plugins-amazon-aws

de.eitco.commons.conversion.plugins.aws.AwsTextractPdfToSearchablePdfPlugin

Generates a searchable PDF from a scanned PDF using AWS Textract.

Source media types

  • application/pdf

Target media types

  • application/pdf

The AwsTextractPdfToSearchablePdfPlugin has a dependency on the PdfToImagesRenderPlugin (de.eitco.commons.conversion.plugins.oss)

Setup IAM account and credentials

If you do not want to create a separate technical user for this plugin, you can continue with step 2, Add Permission.

1 Create User

Login into AWS Console and search for AWS IAM. Here you can create a new user with the Add users button.

aws-iam-1
Figure 9. AWS IAM create user

2 Add Permission

After you create a new user you must add permissions. To do that, go to the newly created user account.

aws-iam-2
Figure 10. AWS IAM add permissions

Click on Add permission, then on Attach existing policies and add the AmazonTextractFullAccess permission to the user.

aws-iam-3
Figure 11. AWS IAM add permissions for textract

3 Create Access Key

Now you need to create an access key. Switch to the tab Security credentials.

aws-iam-4
Figure 12. AWS IAM security credentials

Here you can find the access keys. Create a new access key.

aws-iam-5
Figure 13. AWS IAM add create access key

YAML configuration

The list of available AWS regions can be found in the AWS documentation.

In the YAML of the Document Conversion Service you must set the following to use the AWS Textract render plugin:

aws:
    accessKey: "XXXXXXXXXXXXXXXXXXXX"
    secretKey: "X1XX2XXXXXXXXXXX3XXXXXXXX4XXXXXXX567XXX8"
    region: "EU_WEST_1"

e-iceblue plugins

If you want to use the e-iceblue plugins, you must have an e-iceblue license.

The document-conversion-plugins-e-iceblue library provides several plugins based on e-iceblue libraries. To use the plugins, the jar file of the library must be added to the classpath of the service. The plugins will be registered automatically.

A ZIP containing the library and additional dependencies can be downloaded from Nexus using the following maven coordinates:

<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>document-conversion-plugins-e-iceblue</artifactId>
    <type>zip</type>
    <classifier>zip</classifier>
    <version>6.0.1-SNAPSHOT</version>
</dependency>

Plugins contained in document-conversion-plugins-e-iceblue

de.eitco.commons.conversion.plugins.eiceblue.EIcebluePdfToPdfAPlugin

Generates a PDF/A subtype from a PDF using e-iceblue.

Source media types

  • application/pdf

Target media types

  • application/pdf

YAML configuration

In the YAML of the document conversion service you must set the following to use the e-iceblue plugin:

e-iceblue:
  pdfType: "PdfA1A"
  license: ""

iText Plugins

If you want to use the iText Plugins, you must have an iText license.

The document-conversion-plugins-itext library provides several plugins based on iText libraries. To use the plugins, the jar file of the library must be added to the classpath of the service. The plugins will be registered automatically.

A ZIP containing the library and additional dependencies can be downloaded from Nexus using the following maven coordinates:

<dependency>
    <groupId>de.eitco.commons</groupId>
    <artifactId>document-conversion-plugins-itext</artifactId>
    <type>zip</type>
    <classifier>zip</classifier>
    <version>6.0.1-SNAPSHOT</version>
</dependency>

Plugins contained in document-conversion-plugins-itext

de.eitco.commons.conversion.plugins.itext.ITextPdfToSearchablePdfaPlugin

Generates a searchable PDF/A from a scanned PDF using iText.

Source media types

  • application/pdf

Target media types

  • application/pdf

The ITextPdfToSearchablePdfaPlugin has a dependency on the PdfToImagesRenderPlugin (de.eitco.commons.conversion.plugins.oss).

Installing Tesseract

If you want to use the ITextPdfToSearchablePdfaPlugin, you need to install Tesseract on your operating system.

  1. Download and install tesseract from this page https://github.com/tesseract-ocr/tessdoc/blob/main/Downloads.md.

    • For Ubuntu for example: sudo apt install tesseract-ocr

  2. Add Tesseract to the PATH environment variable. On Windows, for example, you must add the following entry to the path:

    • {path to tesseract}\Tesseract-OCR\tessdata

YAML configuration

In the yaml of the document conversion service you must set the following:

itext:
  pathToTessData: "{path to tesseract}/Tesseract-OCR/tessdata"
  pdfLang: "deu"
  iTextLicensePath: "{path to itext license}/itextkey.json"

For the possible values of the pdfLang option, see the Tesseract model list.

Usage

The following examples show how to perform different conversions using the client API of the conversion service. The following dependency is required to get access to the http client API:

<dependency>
  <groupId>de.eitco.commons</groupId>
  <artifactId>document-conversion-http-client-spring-boot-starter</artifactId>
  <version>${project.version}</version>
</dependency>

The client API instances can be obtained using injectable factory classes.

@Autowired
private FulltextResourceClientFactory fulltextClientFactory;
@Autowired
private DocumentConversionResourceClientFactory conversionClientFactory;

The utility class de.eitco.commons.io.ContentAnalyzer contained in cmn-commons-io can be used to determine the mime type of a file.
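The API of ContentAnalyzer is not shown in this document. As a rough illustration of what mime type detection involves, the following standalone sketch recognizes a few formats by their leading signature bytes. MimeSniffer is a hypothetical class written for this example and covers only four well-known signatures.

```java
import java.util.Optional;

/** Hypothetical, minimal signature-based detector; not the ContentAnalyzer implementation. */
final class MimeSniffer {

    /** Inspects the first bytes of a file and returns the mime type if the signature is known. */
    static Optional<String> sniff(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return Optional.of("image/jpeg");
        if (startsWith(head, 0x89, 'P', 'N', 'G')) return Optional.of("image/png");
        if (startsWith(head, 'G', 'I', 'F', '8')) return Optional.of("image/gif");
        if (startsWith(head, '%', 'P', 'D', 'F')) return Optional.of("application/pdf");
        return Optional.empty();
    }

    /** Compares the unsigned values of the leading bytes with the expected prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Reading only the first few bytes of a file is enough for this check, so the whole content does not have to be loaded into memory.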

Converting an image to PDF

This example shows how to convert an image of type JPEG to PDF.

DocumentConversionResourceClient client = conversionClientFactory.newClient(); // (1)
File jpeg = new File("src/test/data/source/source_inputstreamlist_images/jpgSample.jpg");

// try-with-resources closes the source stream, the rendition and the output file
try (FileInputStream inputStream = new FileInputStream(jpeg);
     InputStream rendition = client.render(MediaType.IMAGE_JPEG_VALUE, MediaType.APPLICATION_PDF_VALUE, inputStream); // (2)
     FileOutputStream out = new FileOutputStream("target/rendition.pdf")) {
    IOUtils.copy(rendition, out); // (3)
}
1 Creates a new client instance using the DocumentConversionResourceClientFactory
2 Sends a request for the rendition to the service. The source- and target-mimetypes are strings and can be obtained from any utility class containing standard mime type strings.
3 Saves the rendition to a file using Apache Commons IO IOUtils

Extracting text from a PDF file

This example shows how the API can be used to extract text content from a PDF file.

FulltextResourceClient client = fulltextClientFactory.newClient(); // (1)
File pdf = new File("src/test/fulltext-data/source/test.pdf");

try (FileInputStream inputStream = new FileInputStream(pdf)) {

    String text = client.extractText(MediaType.APPLICATION_PDF_VALUE, inputStream); // (2)
}
1 Creates a new client instance using the FulltextResourceClientFactory
2 Sends a request for the extraction to the service. The source-mimetype is a string and can be obtained from any utility class containing standard mime type strings.

Combining multiple images to one PDF

This example shows how to combine multiple images into a single PDF file.

DocumentConversionResourceClient client = conversionClientFactory.newClient(); // (1)

File image1 = new File("src/test/data/source/source_inputstreamlist_images/jpgSample.jpg");
File image2 = new File("src/test/data/source/source_inputstreamlist_images/pngSample2.png");

List<ConversionMultipartBodyElement> inputStreamAndMediaTypeList = new ArrayList<>();

// try-with-resources closes both streams, even if opening the second one fails
try (FileInputStream fileInputStream1 = new FileInputStream(image1);
     FileInputStream fileInputStream2 = new FileInputStream(image2)) {

    inputStreamAndMediaTypeList.add(new ConversionMultipartBodyElement(MediaType.IMAGE_JPEG_VALUE, fileInputStream1)); // (2)
    inputStreamAndMediaTypeList.add(new ConversionMultipartBodyElement(MediaType.IMAGE_PNG_VALUE, fileInputStream2));

    InputStream combinedPdf = client.combineToPdf(inputStreamAndMediaTypeList); // (3)
}
1 Creates a new client instance using the DocumentConversionResourceClientFactory
2 Create the body elements for the multipart request that will be sent to the service
3 Sends the request to the service
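The example above stops once the combined PDF has been received as an InputStream. One way to store such a stream without additional libraries is sketched below. RenditionFiles is a hypothetical helper name; the sketch uses only the Java standard library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for writing a returned stream to disk. */
final class RenditionFiles {

    /**
     * Copies the stream to the target path, replacing an existing file,
     * and closes the stream afterwards. Returns the number of bytes written.
     */
    static long save(InputStream rendition, Path target) throws IOException {
        try (InputStream in = rendition) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

In the example above this could be called as RenditionFiles.save(combinedPdf, Path.of("target/combined.pdf")), where the target path is only an illustration.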

Monitoring

Metrics

The Document Conversion Service provides metrics that can be used to monitor the performance. The following metrics are available:

  • dcs.render.time: Records the time it took to perform one rendering request as well as a counter for the number of performed requests.

  • dcs.render.errors: A counter for errors that occurred while rendering.

  • dcs.extract.time: Records the time it took to perform one fulltext extraction request as well as a counter for the number of performed requests.

  • dcs.extract.errors: A counter for errors that occurred while performing fulltext extractions.

Each of these metrics contains tags for the source and (if applicable) the target mime type in standard string representation (for example: image/jpeg) and for the size of the original content. The size metric uses size ranges: 0-1MB, 1-10MB, 10-100MB, 100-1000MB and >1000MB.
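How a content size maps to one of these ranges can be sketched as follows. The class name SizeRange is hypothetical, and two details are assumptions because the text above does not specify them: 1MB is taken as 1024 * 1024 bytes, and a size exactly on a boundary is counted in the lower range.

```java
/** Hypothetical sketch of mapping a content size to the size range tag value. */
final class SizeRange {

    // Assumption: binary megabytes.
    private static final long MB = 1024L * 1024L;

    /** Returns the size range label for the given content size in bytes. */
    static String of(long bytes) {
        // Assumption: upper bounds are inclusive.
        if (bytes <= 1 * MB) return "0-1MB";
        if (bytes <= 10 * MB) return "1-10MB";
        if (bytes <= 100 * MB) return "10-100MB";
        if (bytes <= 1000 * MB) return "100-1000MB";
        return ">1000MB";
    }
}
```

Using a small, fixed set of ranges instead of the exact size keeps the number of distinct tag values low, which matters for most metrics backends.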

The recording of those metrics can be disabled by setting the parameters management.metrics.enable.dcs.render or management.metrics.enable.dcs.extract to false.

OpenTelemetry

The Document Conversion Service supports OpenTelemetry. Spans are created for the methods of each render plugin. The outermost span contains the ID and the tenant (if available) of the user who performed the request.