# Inference

## Description

Use the `teklia-qwen predict` command to apply a Qwen model on a set of images.
| Parameter | Description | Type | Default |
|---|---|---|---|
| `--model-name` | Path to the Qwen model to use for inference. Can be either a full model or an adapter. Should be either a local path or a name from HuggingFace. | | |
| `--images-dirs` | Path(s) to the folder(s) where the images to predict are stored. | | |
| `--output-json` | Path to save prediction results in JSON format. | | |
| `--query-path` | Path to the file containing the instruction prompt. | | |
| `--system-prompt-path` | Path to the file containing the custom system prompt. If not set, the default system prompt will be used. | | |
| `--max-new-tokens` | The maximum number of tokens to generate, ignoring the number of tokens in the prompt. | | |
| `--temperature` | The value used to modulate the next token probabilities. | | |
| `--delimiter` | The delimiter used to parse the model output in CSV mode. | | |
| `--post-process` | The post-processing method to apply to the predictions. Should be either `markdown`, `csv`, `xml` or `json`. | | |
| `--labels` | Path to the JSONL file with the labels. This will be used to generate features to train a confidence-score model. | | |
| `--confidence-model` | Path to the external confidence model. | | |
| `--stop-strings` | A list of strings that will trigger the end of the generation. | | |
| `--no-load-in-4bit` | Disable 4-bit quantization. Quantization is enabled by default. | | True |
| | Enable thinking mode. Disabled by default. | | False |
## Requirements

- Images should be resized so that their largest dimension does not exceed 2000 pixels (a resizing sketch is given after this list).

- Inference can run on a single GPU. Here are some GPUs tested and supported by the 7B model:

    - 1 x NVIDIA GeForce RTX 3090 Ti GPU
    - 1 x NVIDIA A100 GPU
    - 1 x NVIDIA V100 GPU

- Nested entities support

    Nested entities are only partially supported at inference. This specific parsing is only available in XML mode. Nesting can be arbitrarily deep.

    Below are two prediction examples with nested entities:

    - This is fully supported:

        ```xml
        <root>
          <Person>
            <Firstname>John</Firstname>
            <Lastname>Doe</Lastname>
            <Nickname>dit Anonymous</Nickname>
          </Person>
        </root>
        ```

    - This is partially supported:

        ```xml
        <root>
          <Person>
            <Firstname>John</Firstname>
            <Lastname>Doe</Lastname>
            dit <!-- This is the important detail, i.e. unnested text -->
            <Nickname>Anonymous</Nickname>
          </Person>
        </root>
        ```

        ⚠️ All entities will be properly parsed except `dit`: since it is not inside a nested entity at the same level as `John`, `Doe` and `Anonymous`, it will simply be removed from the transcription. No warning will be raised.
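If your source images are larger than 2000 pixels, the snippet below is a minimal resizing sketch using Pillow. The `raw_images/` and `images/` folder names are placeholders for this example, not options of `teklia-qwen`.

```python
from pathlib import Path

from PIL import Image  # pip install Pillow

MAX_SIDE = 2000  # largest allowed dimension, matching the requirement above

source_dir = Path("raw_images")  # placeholder folder with the original images
target_dir = Path("images")      # folder later passed to --images-dirs
target_dir.mkdir(exist_ok=True)

for image_path in source_dir.iterdir():
    image = Image.open(image_path)
    # thumbnail() resizes in place, preserves the aspect ratio and never upscales
    image.thumbnail((MAX_SIDE, MAX_SIDE))
    image.save(target_dir / image_path.name)
```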
## Examples

### Predict using a full model or an adapter

You can run a basic inference using this model from HuggingFace.

- Content of `my_query.txt`:

    ```text
    Extract the firstnames and surnames from this document. Format your answer in a Markdown table.
    ```

- Command to run:

    Both full models (`--model-name Qwen/Qwen3-VL-8B-Instruct`) and adapters (`--model-name my_adapter/`) are supported.

    ```shell
    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt
    ```

- Output:

    ```json
    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | Firstname | Surname |\n|------|-------|\n| alain | dalmatien |\n| marie | montagne |",
        "confidence": {
          "raw": 0.87,
          "content": null,
          "structure": null
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": null,
        "entities": []
      },
      ...
    }
    ```
### Predict with a custom system prompt

You can run a more advanced inference using a custom system prompt. This can be useful when predicting with a fine-tuned model, as the system prompt used during training is overwritten during the export.

- Content of `my_system_prompt.txt`:

    ```text
    You need to extract information from these French documents. Each image contains a table, and each row of the table contains information about an individual. Here is the information you need to extract for each person:
    * Firstname (should be capitalized)
    * Surname (should be capitalized)
    If the information is missing, put 'N/A'.
    ```

- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt
    ```

- Output:

    ```json
    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | firstname | surname |\n|------|-------|\n| Alain | Damasio |\n| Marion | Montaigne |",
        "confidence": {
          "raw": 0.97,
          "content": null,
          "structure": null
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": null,
        "entities": []
      },
      ...
    }
    ```
### Predict with/without a quantized model

By default, the model will be loaded in 4-bit by unsloth. To disable quantization, use `--no-load-in-4bit`.

- Command to run:

    Quantization halves VRAM usage: Qwen3-VL-8B-Instruct requires ~9.4 GB with quantization, and ~19.5 GB without quantization.

    ```shell
    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --no-load-in-4bit
    ```
### Predict with structured output

- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --schema my_schema.yaml
    ```
#### Schema format

A schema is a YAML file with the following top-level keys:

| Key | Type | Description |
|---|---|---|
| `name` | `str` | Name of the schema. Used as the Pydantic model name. |
| `as_list` | `bool` | If `true`, the model output is expected to be a list of objects following the schema instead of a single object. |
| `fields` | `dict` | A mapping of field names to their definitions. Set to `{}` to allow any valid JSON output without constraints. |
#### Field definition

Each field under `fields` supports the following keys:

| Key | Type | Required | Description |
|---|---|---|---|
| `type` | `str` | Yes | Type of the field. One of `str`, `int`, `float`, `bool` or `enum`. |
| `required` | `bool` | No | Whether the field is required. |
| `description` | `str` | No | Description of the field, used as a hint for the model. |
| `pattern` | `str` | No | Only for `str` fields. A regular expression that the value must match. |
| `values` | `list` | Yes | Only for `enum` fields. The list of allowed values. |
#### Examples

- To ensure a valid JSON output without any constraint, use the following schema:

    ```yaml
    name: Generic
    as_list: false
    fields: {}
    ```

- To return a list of objects, set `as_list: true`:

    ```yaml
    name: Generic
    as_list: true
    fields: {}
    ```

- To define a custom schema, use this template as an example (a rough Pydantic equivalent is sketched after this list). Set `as_list: true` to return a list of `Person`:

    ```yaml
    name: Person
    as_list: false
    fields:
      name:
        type: str
        required: true
      age:
        type: int
        required: false
      occupation:
        type: enum
        values: ["boulanger", "instituteur", "agent de mairie"]
      has_children:
        type: bool
        description: "Cette personne a-t-elle des enfants ?"
      salary:
        type: float
        required: false
        description: "Salaire mensuel net en euros"
      date_birth:
        type: str
        pattern: ^\d{2}/\d{2}/\d{4}$
        description: "Date de naissance en format DD/MM/YYYY"
    ```
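For reference, the `Person` schema above expresses roughly the same constraints as the Pydantic model sketched below (Pydantic v2 syntax). This is only an illustration, not the code generated by `teklia-qwen`, and it assumes that `required: false` maps to an optional field.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class Occupation(str, Enum):
    BOULANGER = "boulanger"
    INSTITUTEUR = "instituteur"
    AGENT_DE_MAIRIE = "agent de mairie"


class Person(BaseModel):
    name: str
    age: Optional[int] = None  # required: false
    occupation: Occupation     # restricted to the enum values
    has_children: bool = Field(description="Cette personne a-t-elle des enfants ?")
    salary: Optional[float] = Field(default=None, description="Salaire mensuel net en euros")
    date_birth: str = Field(
        pattern=r"^\d{2}/\d{2}/\d{4}$",
        description="Date de naissance en format DD/MM/YYYY",
    )
```

With `as_list: true`, the expected output would instead be a list of such objects.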
### Predict with constraints to stop the generation

Since Qwen can occasionally hallucinate, there are two ways to control the generation:

- Limit the number of generated tokens. Use the `--max-new-tokens` option to ensure Qwen only generates up to `n` new tokens.
- Stop when specific strings are predicted. Use the `--stop-strings` option to stop generation once any of the specified strings appears in the output.

You can combine both options: the model will stop as soon as either condition is met.

- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --max-new-tokens 200 \
        --stop-strings "\n" "</root>"
    ```

In this example:

- the model stops after generating 200 tokens,
- or stops earlier, as soon as it predicts a newline (`\n`) or the closing XML tag (`</root>`).
### Predict with post-processing

#### Markdown mode

You can also use the `--post-process markdown` option to parse the predicted Markdown table into a dictionary (an illustrative parsing sketch follows the example below).

- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --post-process markdown
    ```

- Output:

    ```json
    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | firstname | surname |\n|------|-------|\n| Alain | Damasio |\n| Marion | Montaigne |",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne",
        "entities": [
          { "type": "firstname", "offset": 0, "length": 5, "confidence": 1.0 },
          { "type": "surname", "offset": 6, "length": 7, "confidence": 1.0 },
          { "type": "firstname", "offset": 14, "length": 6, "confidence": 1.0 },
          { "type": "surname", "offset": 21, "length": 9, "confidence": 1.0 }
        ]
      },
      ...
    }
    ```
#### CSV mode

You can also use the `--post-process csv` option to parse the predicted CSV string into a dictionary.

- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --post-process csv \
        --delimiter ";"
    ```

- Output:

    ```json
    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "firstname ; surname\n Alain ; Damasio \nMarion;Montaigne",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne",
        "entities": [
          { "type": "firstname", "offset": 0, "length": 5, "confidence": 1.0 },
          { "type": "surname", "offset": 6, "length": 7, "confidence": 1.0 },
          { "type": "firstname", "offset": 14, "length": 6, "confidence": 1.0 },
          { "type": "surname", "offset": 21, "length": 9, "confidence": 1.0 }
        ]
      },
      ...
    }
    ```
#### XML mode

You can also use the `--post-process xml` option to parse the predicted XML content into a dictionary.

- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --post-process xml
    ```

- Output:

    ```json
    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "<root><firstname>Alain</firstname> <surname>Damasio</surname></root>",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio",
        "entities": [
          { "type": "firstname", "offset": 0, "length": 5, "confidence": 1.0 },
          { "type": "surname", "offset": 6, "length": 7, "confidence": 1.0 }
        ]
      },
      ...
    }
    ```
#### JSON mode

You can also use the `--post-process json` option to parse the predicted JSON object into a dictionary.

- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --post-process json
    ```

- Output:

    ```json
    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "[{\"firstname\": \"Alain\", \"surname\": \"Damasio\"}, {\"firstname\": \"Marion\", \"surname\": \"Montaigne\"}]",
        "confidence": {
          "raw": 0.97,
          "content": 0.96,
          "structure": 1.0
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne",
        "entities": [
          { "type": "firstname", "offset": 0, "length": 5, "confidence": 1.0 },
          { "type": "surname", "offset": 6, "length": 7, "confidence": 1.0 },
          { "type": "firstname", "offset": 14, "length": 6, "confidence": 1.0 },
          { "type": "surname", "offset": 21, "length": 9, "confidence": 1.0 }
        ]
      },
      ...
    }
    ```
### Predict with temperature scaling

You can also use the `--temperature` option to modulate the model's temperature.

For OCR/IE, it is recommended to set a low temperature (between 0 and 0.1).
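As a reminder of what the option does (standard temperature-scaled sampling, not something specific to this tool), the temperature $T$ divides the logits $z_i$ before the softmax:

$$
p(\text{token}_i) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
$$

As $T$ approaches 0, the distribution concentrates on the most likely token (near-greedy decoding), which is why low values are recommended for deterministic tasks such as OCR/IE.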
- Command to run:

    ```shell
    teklia-qwen predict --model-name my_local_models/qwen_finetuned/ \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --post-process markdown \
        --temperature 0
    ```

- Output:

    ```json
    {
      "006bb5fa-84eb-4cb9-ae43-a12694e8d99b": {
        "raw_output": "Answer: | firstname | surname |\n|------|-------|\n| Alain | Damasio |\n| Marion | Montaigne |",
        "confidence": {
          "raw": 0.76,
          "content": 0.71,
          "structure": 0.81
        },
        "estimated_confidence": null,
        "parsing_failed": false,
        "parsed_output": "Alain Damasio\nMarion Montaigne",
        "entities": [
          { "type": "firstname", "offset": 0, "length": 5, "confidence": 1.0 },
          { "type": "surname", "offset": 6, "length": 7, "confidence": 1.0 },
          { "type": "firstname", "offset": 14, "length": 6, "confidence": 1.0 },
          { "type": "surname", "offset": 21, "length": 9, "confidence": 1.0 }
        ]
      },
      ...
    }
    ```
### Predict and generate features

You can use the `--labels` option to extract features about both each image and Qwen's prediction. This argument should point to a JSONL file holding the ground-truth annotations of the images to process.

- Command to run:

    ```shell
    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --max-new-tokens 100 \
        --labels labels.jsonl
    ```

- Output:

    One CSV file, with one column per feature:

    ```csv
    image_h,image_w,aspect_ratio,image_mean_pixel,image_std_pixel,output_mean_softmax,output_mean_top2,output_std_softmax,output_mean_top2,output_std_top2,output_length,target
    500.0,750.0,0.6666666666666666,129.5420151111111,36.71494801935272,0.6521164721856683,0.507029977881603,0.269365231353945,0.35904884921690255,248.0,0.8
    ```

    The CSV will be saved in the same directory as the JSON output file, under the same name.
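As an illustration of how these features can be consumed, the sketch below loads the CSV with pandas and separates the feature columns from the `target` column. The `results.csv` file name is an assumption based on the note above, and the choice of downstream confidence-score model is left open.

```python
import pandas as pd  # pip install pandas

# Assumption: the features CSV sits next to results.json, with the same base name.
features = pd.read_csv("results.csv")

# `target` holds the score to predict (derived from labels.jsonl);
# the remaining columns are the input features for a confidence-score model.
X = features.drop(columns=["target"])
y = features["target"]

print(X.shape)
print(y.describe())
```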
### Predict with a confidence-estimation model

You can use the `--confidence-model` option to load a trained confidence-estimation model. This argument should point to a folder holding the files of the model to use.

- Command to run:

    ```shell
    teklia-qwen predict --model-name Qwen/Qwen3-VL-8B-Instruct \
        --images-dirs images/ \
        --output-json results.json \
        --query-path my_query.txt \
        --system-prompt-path my_system_prompt.txt \
        --max-new-tokens 100 \
        --confidence-model ./my_confidence_model
    ```