Inference

Description

Use the teklia-qwen predict command to apply a QWEN model on a set of images.

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| `--model-name` | Path to the QWEN model to use for inference. Either a local path or a name from HuggingFace. | `str` | |
| `--adapter-name` | Path to the adapter model to use for inference. Either a local path or a name from HuggingFace. | `str` | `None` |
| `--images-dir` | Path to the folder where the images to predict are stored. | `pathlib.Path` | |
| `--output-json` | Path to save prediction results in JSON format. | `pathlib.Path` | `Path("results.json")` |
| `--query-path` | Path to the file containing the instruction prompt. | `pathlib.Path` | `Path("query.txt")` |
| `--system-prompt-path` | Path to the file containing a custom system prompt. If not set, the default system prompt is used. | `Optional[pathlib.Path]` | `None` |
| `--max-new-tokens` | The maximum number of tokens to generate, ignoring the number of tokens in the prompt. | `int` | `2048` |
| `--temperature` | The value used to modulate the next-token probabilities. | `float` | `1.0` |
| `--delimiter` | The delimiter used to parse the model output in "csv" mode. | `str` | `","` |
| `--post-process` | The post-processing method to apply to the predictions. One of "default", "json", "markdown", "csv" or "xml". | `Mode` | `"default"` |
| `--attention-map` | Whether to plot the attention GIF. | `bool` | `False` |
| `--font-path` | Path to the font used to write the text in the attention GIF. | `pathlib.Path` | `Path("fonts/LinuxLibertine.ttf")` |
| `--labels` | Path to the JSONL file with the labels, used to generate features to train a confidence-score model. | `pathlib.Path` | |
| `--confidence-model` | Path to the external confidence model. | `pathlib.Path` | |
| `--stop-strings` | A list of strings that trigger the end of the generation. | `List[str]` | `None` |

Requirements

  • Images should be resized so that their largest dimension does not exceed 2000 pixels.

  • Inference can run on a single GPU. Here are some GPUs tested and supported by the 7B model:

    • 1 x NVIDIA GeForce RTX 3090 Ti GPU

    • 1 x NVIDIA A100 GPU

    • 1 x NVIDIA V100 GPU
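The resize requirement above can be sketched as a small helper (illustrative only; `fit_within` is not part of teklia-qwen) that computes dimensions whose longest side is capped at 2000 pixels while preserving the aspect ratio:

```python
def fit_within(width: int, height: int, max_side: int = 2000) -> tuple[int, int]:
    """Return (width, height) scaled so the longest side is at most max_side.

    Images already within the limit are left unchanged.
    """
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)
```

The returned size can then be passed to your image library's resize call (e.g. Pillow's `Image.resize`).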

Nested entities support

Nested entities are only partially supported at inference. This specific parsing is only available in XML mode. Nesting can be arbitrarily deep.

Below are two prediction examples with nested entities:

  • This is fully supported:

    <root>
      <Person>
        <Firstname>John</Firstname>
        <Lastname>Doe</Lastname>
        <Nickname>dit Anonymous</Nickname>
      </Person>
    </root>
  • This is partially supported:

    <root>
      <Person>
        <Firstname>John</Firstname>
        <Lastname>Doe</Lastname>
        dit <!-- This is the important detail, i.e unnested text -->
        <Nickname>Anonymous</Nickname>
      </Person>
    </root>

    ⚠️ All entities will be properly parsed except dit, which is not wrapped in a nested entity at the same level as John, Doe and Anonymous; it will simply be removed from the transcription. No warning will be raised.
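The behaviour above can be reproduced with a minimal sketch (not teklia-qwen's actual parser): in an ElementTree-style traversal, text between sibling elements lives in the preceding element's `tail`, so an extractor that only keeps entity element text silently drops it.

```python
import xml.etree.ElementTree as ET

def collect_entities(element, entities=None):
    """Recursively collect (tag, text) pairs from entity elements."""
    if entities is None:
        entities = []
    for child in element:
        if child.text and child.text.strip():
            entities.append((child.tag, child.text.strip()))
        collect_entities(child, entities)
        # child.tail would hold unnested text such as "dit "; it is not
        # attached to any entity, so it never reaches the entities list.
    return entities

doc = """<root><Person><Firstname>John</Firstname><Lastname>Doe</Lastname>
dit <Nickname>Anonymous</Nickname></Person></root>"""
# "dit" is lost: only Firstname, Lastname and Nickname are collected.
print(collect_entities(ET.fromstring(doc)))
```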

Examples

Predict using the base model

You can run a basic inference using this model from HuggingFace.

  • Content of my_query.txt:

    Extract the firstnames and surnames from this document. Format your answer in a Markdown table.
  • Command to run:

    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt
  • Output:

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | Firstname | Surname |\n|------|-------|\n| alain | dalmatien |\n| marie | montagne |",
        "confidence": {
          "raw": 0.87,
          "content": null,
          "structure": null
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": null,
        "entities": [],
        "attention": null
      }
    ...
    }
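The JSON written to `--output-json` maps each image identifier to a prediction record, so it is straightforward to post-process. A small sketch (the `load_confident` helper is hypothetical, not part of teklia-qwen) that keeps only predictions whose raw confidence meets a threshold:

```python
import json
from pathlib import Path

def load_confident(results_path: Path, threshold: float = 0.8) -> dict:
    """Return {image_id: raw_output} for predictions above the threshold."""
    results = json.loads(Path(results_path).read_text())
    return {
        image_id: prediction["raw_output"]
        for image_id, prediction in results.items()
        if (prediction["confidence"]["raw"] or 0.0) >= threshold
    }
```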

Predict using an adapter

You can run a basic inference using this model from HuggingFace and a fine-tuned adapter.

  • Content of my_query.txt:

    Extract the firstnames and surnames from this document. Format your answer in a Markdown table.
  • Command to run:

    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
      --adapter-name path/to/adapter/folder/ \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt
  • Output:

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | Firstname | Surname |\n|------|-------|\n| alain | dalmatien |\n| marie | montagne |",
        "confidence": {
          "raw": 0.87,
          "content": null,
          "structure": null
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": null,
        "entities": [],
        "attention": null
      }
    ...
    }

Predict with a custom system prompt

You can run a more advanced inference using a custom system prompt. This can be useful when predicting with a fine-tuned model, as the system prompt used during training is overwritten during the export.

  • Content of my_system_prompt.txt

You need to extract information from these French documents. Each image contains a table, and each row of the table contains information about an individual.

Here is the information you need to extract for each person:

* Firstname (should be capitalized)
* Surname (should be capitalized)

If the information is missing, put '`N/A`'.
  • Command to run:

teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
    --images-dir images/ \
    --output-json results.json \
    --query-path my_query.txt \
    --system-prompt-path my_system_prompt.txt
  • Output:

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | firstname | surname |\n|------|-------|\n| Alain | Damasio |\n| Marion | Montaigne |",
        "confidence": {
          "raw": 0.97,
          "content": null,
          "structure": null
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": null,
        "entities": [],
        "attention": null
      }
    ...
    }

Predict with constraints to stop the generation

Since QWEN can occasionally hallucinate, there are two ways to control the generation:

  • Limit the number of generated tokens. Use the --max-new-tokens option to ensure Qwen only generates up to n new tokens.

  • Stop when specific strings are predicted. Use the --stop-strings option to stop generation once any of the specified strings appears in the output.

You can combine both options: the model will stop as soon as either condition is met.

  • Command to run:

teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
    --images-dir images/ \
    --output-json results.json \
    --query-path my_query.txt \
    --system-prompt-path my_system_prompt.txt \
    --max-new-tokens 200 \
    --stop-strings "\n" "</root>"

In this example:

  • The model stops after generating 200 tokens

  • Or stops earlier as soon as it predicts a newline (\n) or the closing XML tag (</root>).
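The stop-string behaviour can be sketched post hoc as follows (illustrative only; `truncate_at_stop` is a hypothetical helper, and whether the CLI keeps or drops the stop string itself is not specified here — this sketch drops it):

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep only text before the earliest stop
    return text[:cut]
```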

Predict with post-processing

Markdown mode

You can also use the --post-process markdown option to parse the predicted Markdown table into a dictionary.

  • Command to run

    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --post-process markdown
  • Output

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | firstname | surname |\n|------|-------|\n| Alain | Damasio |\n| Marion | Montaigne |",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne",
        "entities": [
          {
            "type": "firstname",
            "offset": 0,
            "length": 5,
            "confidence": 1.0
          },
          {
            "type": "surname",
            "offset": 6,
            "length": 7,
            "confidence": 1.0
          },
          {
            "type": "firstname",
            "offset": 14,
            "length": 6,
            "confidence": 1.0
          },
          {
            "type": "surname",
            "offset": 21,
            "length": 9,
            "confidence": 1.0
          }
        ],
        "attention": null
      }
    ...
    }
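A minimal sketch of the Markdown parsing step (not teklia-qwen's actual implementation; entity confidence fields are omitted for brevity): each table cell becomes an entity typed by its column header, with character offsets into the flattened text.

```python
def parse_markdown_table(raw_output: str):
    """Parse an 'Answer: | col | col |' Markdown table into plain text plus
    entities carrying character offsets into that text."""
    text = raw_output.split("Answer:", 1)[-1].strip()
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    lines, entities, offset = [], [], 0
    for row in body:
        for column, value in zip(header, row):
            entities.append({"type": column, "offset": offset, "length": len(value)})
            offset += len(value) + 1  # +1: space within a row, newline between rows
        lines.append(" ".join(row))
    return "\n".join(lines), entities
```

Applied to the raw output above, this yields `"Alain Damasio\nMarion Montaigne"` and the same offsets as the example JSON.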

CSV mode

You can also use the --post-process csv option to parse the predicted CSV string into a dictionary.

  • Command to run

    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --post-process csv \
      --delimiter ;
  • Output

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "firstname ; surname\n Alain ; Damasio \nMarion;Montaigne",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne",
        "entities": [
          {
              "type": "firstname",
              "offset": 0,
              "length": 5,
              "confidence": 1.0
          },
          {
              "type": "surname",
              "offset": 6,
              "length": 7,
              "confidence": 1.0
          },
          {
              "type": "firstname",
              "offset": 14,
              "length": 6,
              "confidence": 1.0
          },
          {
              "type": "surname",
              "offset": 21,
              "length": 9,
              "confidence": 1.0
          }
        ],
        "attention": null
      }
    ...
    }
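The CSV parsing step can be sketched with the standard `csv` module (illustrative only, not the library's actual code); note how `--delimiter ;` maps to the reader's `delimiter` argument and how whitespace around cells is stripped:

```python
import csv
import io

def parse_csv_output(raw_output: str, delimiter: str = ","):
    """Split a delimited model output into a header and flattened body text."""
    reader = csv.reader(io.StringIO(raw_output), delimiter=delimiter)
    rows = [[cell.strip() for cell in row] for row in reader]
    header, body = rows[0], rows[1:]
    parsed = "\n".join(" ".join(row) for row in body)
    return header, parsed
```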

XML mode

You can also use the --post-process xml option to parse the predicted XML content into a dictionary.

  • Command to run

    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --post-process xml
  • Output

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "<root><firstname>Alain</firstname> <surname>Damasio</surname></root>",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio",
        "entities": [
          {
            "type": "firstname",
            "offset": 0,
            "length": 5,
            "confidence": 1.0
          },
          {
            "type": "surname",
            "offset": 6,
            "length": 7,
            "confidence": 1.0
          }
        ],
        "attention": null
      }
    ...
    }

JSON mode

You can also use the --post-process json option to parse the predicted JSON object into a dictionary.

  • Command to run

    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --post-process json
  • Output

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "[{\"firstname\": \"Alain\", \"surname\": \"Damasio\"}, {\"firstname\": \"Marion\", \"surname\": \"Montaigne\"}]",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne",
        "entities": [
          {
            "type": "firstname",
            "offset": 0,
            "length": 5,
            "confidence": 1.0
          },
          {
            "type": "surname",
            "offset": 6,
            "length": 7,
            "confidence": 1.0
          },
          {
              "type": "firstname",
              "offset": 14,
              "length": 6,
              "confidence": 1.0
          },
          {
              "type": "surname",
              "offset": 21,
              "length": 9,
              "confidence": 1.0
          }
        ],
        "attention": null
      }
    ...
    }

Predict with temperature scaling

You can also use the --temperature option to modulate the confidence score.

  • Command to run

    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --post-process markdown \
      --temperature 2.0
  • Output

    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | firstname | surname |\n|------|-------|\n| Alain | Damasio |\n| Marion | Montaigne |",
        "confidence": {
          "raw": 0.76,
          "content": 0.71,
          "structure": 0.81
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne"
        "entities": [
          {
            "type": "firstname",
            "offset": 0,
            "length": 5,
            "confidence": 1.0
          },
          {
            "type": "surname",
            "offset": 6,
            "length": 7,
            "confidence": 1.0
          },
          {
            "type": "firstname",
            "offset": 14,
            "length": 6,
            "confidence": 1.0
          },
          {
            "type": "surname",
            "offset": 21,
            "length": 9,
            "confidence": 1.0
          }
        ],
        "attention": null
      }
    ...
    }
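The exact computation inside teklia-qwen is not shown here, but the standard temperature-scaling mechanism explains the lower scores in this example: dividing the logits by `T > 1` flattens the next-token distribution, so the probability assigned to the chosen token (the confidence) drops.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T).

    T > 1 flattens the distribution (lower confidence); T < 1 sharpens it.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```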

Predict and visualize attention

You can use the --attention-map option to visualize QWEN’s attention map.

  • Command to run

    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --max-new-tokens 100 \
      --attention-map
  • Output

    {
      "rimes": {
        "raw_output": "BONNAUD Richard\n6 Rue des Mar\u00e9chaux\n57 380 FAULQUEMONT\nT\u00e9l: 03.61.98.54.36\n\nFaulquemont, le 8/3/2007\n\nNAIF Assurances\nAU BOURG\n40300 LABATUT\n\nObjet : R\u00e9siliation d assurance habitation\nR\u00e9f\u00e9rence client : DPUET3",
        "confidence": {
          "raw": 0.92,
          "content": null,
          "structure": null
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": null,
        "entities": [],
        "attention": "images/rimes.gif"
      }
    }

The GIF will be located in the same directory as the input image.

Predict and generate features

You can use the --labels option to extract features describing both each image and Qwen’s prediction. This argument should point to a JSONL file holding the ground-truth annotations of the images to process.

  • Command to run

    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --max-new-tokens 100 \
      --labels labels.jsonl
  • Output

One CSV file, one column per feature.

image_h,image_w,aspect_ratio,image_mean_pixel,image_std_pixel,output_mean_softmax,output_mean_top2,output_std_softmax,output_std_top2,output_length,target
500.0,750.0,0.6666666666666666,129.5420151111111,36.71494801935272,0.6521164721856683,0.507029977881603,0.269365231353945,0.35904884921690255,248.0,0.8

The CSV will be saved in the same directory as the JSON output file, under the same name.
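The feature file is a plain CSV, so it can be loaded for confidence-model training with the standard library (a sketch; `load_features` is a hypothetical helper, not part of teklia-qwen):

```python
import csv
from pathlib import Path

def load_features(csv_path: Path) -> list[dict]:
    """Load the generated feature CSV as a list of numeric row dictionaries."""
    with Path(csv_path).open(newline="") as handle:
        return [
            {name: float(value) for name, value in row.items()}
            for row in csv.DictReader(handle)
        ]
```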

Predict with a confidence-estimation model

You can use the --confidence-model option to load a trained confidence-estimation model. This argument should point to a folder holding the files of the model to use.

  • Command to run

    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
      --images-dir images/ \
      --output-json results.json \
      --query-path my_query.txt \
      --system-prompt-path my_system_prompt.txt \
      --max-new-tokens 100 \
      --confidence-model ./my_confidence_model