Setting up a workflow

When predictions are missing or have low confidence, a human in the loop may be needed. In this guide we will cover the following topics:

Creating a workflow

First we will learn how to deploy and configure a default workflow for a model.

Starting a workflow execution

Then we will show how the workflow can be executed using Cradl's CLI and SDKs.

Exporting documents

Finally, we will show how documents can be exported by configuring a webhook.

Prerequisites

In order to follow this guide, make sure that you have:

  1. Installed the command-line interface (CLI)
  2. Created and downloaded API credentials
  3. Created a model

Only available for paid plans

You must be on one of the paid plans in order to use the workflow functionality.

Creating a workflow

Now we will set up a default workflow for a model. Before we can do that, we need the ID of the model we want to set up a workflow for. The model ID can be copied from the Overview tab of your model, or you can look it up using the CLI or SDKs:

$ las models list 
{
"models": [
{
"modelId": "las:model:<my-model-id>",
"name": "My invoice model",
"description": "A brand new model for reading invoices",
...
},
]
}
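
If you have several models, picking the right ID out of the list by hand is error-prone. The following is a minimal Python sketch, assuming the output of las models list has been captured as a string and has the shape shown above (a "models" array whose entries have "modelId" and "name"). The helper name find_model_id is hypothetical and not part of the CLI or the SDKs.

```python
import json


def find_model_id(models_output: str, name: str) -> str:
    """Return the modelId of the single model called `name`."""
    models = json.loads(models_output)["models"]
    matches = [m["modelId"] for m in models if m.get("name") == name]
    if len(matches) != 1:
        # Refuse to guess when the name is missing or ambiguous.
        raise LookupError(f"expected exactly one model named {name!r}, found {len(matches)}")
    return matches[0]


# Example input, trimmed to the fields shown in the CLI output above.
example_output = """
{
  "models": [
    {"modelId": "las:model:<my-model-id>", "name": "My invoice model"}
  ]
}
"""
model_id = find_model_id(example_output, "My invoice model")
print(model_id)
```

Raising an error on zero or multiple matches is a design choice of this sketch: it avoids silently creating a workflow for the wrong model.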

Now that we have our model ID, let's create a workflow:

$ las workflows create-default --from-model-id las:model:<my-model-id> 

Creating secrets ... Done.
Creating assets ... Done.
Creating datasets ... Done.
Creating transitions ... Done.
Creating workflow ... Done.

{
"workflowId": "las:workflow:<my-workflow-id>",
"createdTime": "foobar",
...
}

In the next section, we'll take a closer look at what our auto-generated workflow does.

Auto-generated workflows are currently only supported by the CLI.

Workflows, transitions and executions

A workflow is defined by a series of transitions which mutate the state of your workflow execution. There are two types of transitions: manual transitions, which mutate the state based on input from a user, and docker transitions, which mutate the state programmatically in a Docker container. At each step in the workflow, the current state is given as input to the transition, and the output of the transition becomes the new state. When a new workflow execution is created, an initial state is provided.
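
The state-passing model described above can be illustrated with a small Python sketch. This is a conceptual model only, not how Cradl implements workflows: the function run_workflow, the two example transitions, and the fields they add to the state are all hypothetical, and the sketch runs every transition in a fixed order, whereas a real workflow may skip or branch between transitions.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Transition = Callable[[State], State]


def run_workflow(transitions: List[Transition], initial_state: State) -> State:
    """Feed the state through each transition in order and return the final state."""
    state = dict(initial_state)  # Copy so the caller's initial state is not modified.
    for transition in transitions:
        # The output of one transition is the input of the next.
        state = transition(state)
    return state


# Hypothetical transitions, for illustration only.
def preprocess(state: State) -> State:
    prediction = {"label": "total_amount", "value": "100.00", "confidence": 0.42}
    return {**state, "predictions": [prediction]}


def postprocess(state: State) -> State:
    return {**state, "exported": True}


final_state = run_workflow(
    [preprocess, postprocess],
    {"documentId": "las:document:<document-id>"},
)
print(final_state)
```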

Starting a workflow execution

Let's test our new workflow by creating a workflow execution. The initial state of the workflow execution is provided as a JSON object, and the workflow we just generated assumes that the initial state is of the form {"documentId": "las:document:<document-id>"}.

$ las documents create mydocument.pdf > input.json

$ las workflows execute las:workflow:<my-workflow-id> input.json
{
"workflowId": "las:workflow:<my-workflow-id>",
"executionId": "las:workflow-execution:<my-execution-id>",
...
}
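
A quick sanity check of input.json can catch a malformed initial state before an execution is started. The sketch below assumes only what this guide states, namely that the initial state is a JSON object with a "documentId" of the form las:document:<document-id>. The regular expression and the helper name load_initial_state are assumptions made for illustration.

```python
import json
import re

# Assumed ID pattern, inferred from the placeholder "las:document:<document-id>".
DOCUMENT_ID_PATTERN = re.compile(r"^las:document:.+$")


def load_initial_state(raw: str) -> dict:
    """Parse the initial state and check that it carries a document ID."""
    state = json.loads(raw)
    if not isinstance(state, dict):
        raise ValueError("initial state must be a JSON object")
    document_id = state.get("documentId")
    if not isinstance(document_id, str) or not DOCUMENT_ID_PATTERN.match(document_id):
        raise ValueError("initial state must contain a documentId of the form las:document:<document-id>")
    return state


state = load_initial_state('{"documentId": "las:document:<document-id>"}')
print(state["documentId"])
```

To check a real file, pass the contents of input.json to load_initial_state instead of the inline string.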

Now that we have created a workflow execution, the following steps are executed:

  1. The initial state will be provided as input to the first transition (Preprocess). The transition will create a Prediction on the provided document.
  2. If the confidence of any predicted field is below its threshold, a manual transition will be invoked so that an end user can validate that the predictions are correct.
  3. In the last transition (Postprocess), the ground truth of the document is updated and the document is assigned to a Dataset so that it can be used for training. This transition is also responsible for exporting the document.
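
The decision in step 2 can be sketched in Python as follows. This is an illustration of the idea, not the logic the generated workflow actually runs: the threshold values, the default threshold, and the prediction shape ("label", "value", "confidence") are all assumptions.

```python
from typing import Dict, List

# Hypothetical per-field thresholds; real values come from the workflow's configuration.
THRESHOLDS: Dict[str, float] = {"total_amount": 0.9, "due_date": 0.8}
DEFAULT_THRESHOLD = 0.9  # Assumed fallback for fields without an explicit threshold.


def fields_needing_review(predictions: List[dict]) -> List[str]:
    """Return the labels whose confidence is below the threshold for that field."""
    return [
        p["label"]
        for p in predictions
        if p["confidence"] < THRESHOLDS.get(p["label"], DEFAULT_THRESHOLD)
    ]


def needs_manual_transition(predictions: List[dict]) -> bool:
    """A manual transition is needed as soon as one field falls below its threshold."""
    return bool(fields_needing_review(predictions))


predictions = [
    {"label": "total_amount", "value": "100.00", "confidence": 0.97},
    {"label": "due_date", "value": "2021-05-01", "confidence": 0.55},
]
print(fields_needing_review(predictions))
print(needs_manual_transition(predictions))
```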

Exporting documents

To customize the exporting functionality, we have two options:

  1. Use one of the default export options (webhooks or file export)
  2. Write a custom Docker image

In this section, we will cover how to use the default export options. Before we get started, make sure to find the ID of the Postprocess-transition:

$ las transitions list
[
{
"name": "Postprocess transition for workflow [..]",
"transitionId": "las:transition:<transition-id>",
...
}
]

Alternative 1: Configuring a webhook

In order to configure the Postprocess-transition to use a webhook, we need to set the environment variable WEBHOOK_URI on our Postprocess-transition. Updating environment variables will overwrite any existing environment variables that are set, so make sure to include the existing ones in the update as well:

$ las transitions get las:transition:<transition-id>
{
"transitionId": "las:transition:<transition-id>",
"name": "Postprocess transition for workflow [...]",
"transitionType": "docker",
...
"parameters": {
"environment": {
"DATASET_ID": "las:dataset:<dataset-id>",
"FIELD_CONFIG_ASSET_ID": "las:asset:<asset-id>",
"MODEL_ID": "las:model:<model-id>"
},
...
},
...
}

Copy the old environment variables plus our new environment variable to a new file called env.json:

$ echo '{
"DATASET_ID": "las:dataset:<dataset-id>",
"FIELD_CONFIG_ASSET_ID": "las:asset:<asset-id>",
"MODEL_ID": "las:model:<model-id>",
"WEBHOOK_URI": "https://my.webhook.com/a"
}' > env.json
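
Copying the variables by hand makes it easy to drop one. As an alternative, here is a minimal Python sketch that merges WEBHOOK_URI into the existing environment. It assumes the transition JSON has the shape shown in the las transitions get output above ("parameters" containing "environment"); the helper name build_environment is hypothetical.

```python
import json


def build_environment(transition: dict, webhook_uri: str) -> dict:
    """Return the transition's existing environment variables plus WEBHOOK_URI."""
    existing = transition.get("parameters", {}).get("environment", {})
    # Build a new dict so the original transition data is left untouched.
    return {**existing, "WEBHOOK_URI": webhook_uri}


# Example transition, trimmed to the fields shown in the CLI output above.
transition = {
    "transitionId": "las:transition:<transition-id>",
    "parameters": {
        "environment": {
            "DATASET_ID": "las:dataset:<dataset-id>",
            "FIELD_CONFIG_ASSET_ID": "las:asset:<asset-id>",
            "MODEL_ID": "las:model:<model-id>",
        }
    },
}

environment = build_environment(transition, "https://my.webhook.com/a")
print(json.dumps(environment, indent=2))
```

The printed JSON can be saved as env.json.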

Now we update the environment variables for the Postprocess-transition:

$ las transitions update las:transition:<transition-id> --environment env.json

The next time the Postprocess-transition is run, the result of the workflow execution will be posted to the specified URL.
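
To try the webhook locally, a small receiver is enough. The sketch below uses only the Python standard library and assumes that the result is sent as an HTTP POST with a JSON body. This guide does not specify the payload format, so the handler stores whatever JSON it receives without interpreting it. The names WebhookHandler, make_server, and received are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # Payloads collected by the handler, newest last.


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)  # Assumption: the body is JSON.
        except ValueError:
            # Reject bodies that are not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        self.send_response(204)  # Acknowledge without a response body.
        self.end_headers()

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging.


def make_server(port: int = 8000) -> HTTPServer:
    """Create a receiver bound to localhost; pass port 0 to pick a free port."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling make_server().serve_forever() starts the receiver. For the Postprocess-transition to reach it, WEBHOOK_URI would have to point to a publicly reachable address that forwards to this server.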

Alternative 2: File export using SSH/SCP

Coming soon.
