Utilities

All of these functions are available from the arkindex_worker.utils module.

pluralize

Pluralize a noun, if necessary, using simplified rules of English pluralization and a list of exceptions.

  • str singular A singular noun describing an object

  • int count The object count, to determine whether to pluralize or not

  • ➡️ str The noun in its singular or plural form
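
The behaviour described above can be approximated as follows. This is a minimal sketch, not the library's implementation: the name `pluralize_sketch` and the contents of the exception table are assumptions chosen for illustration.

```python
# Hypothetical exception table; the real list of exceptions may differ.
IRREGULAR_PLURALS = {"child": "children", "class": "classes"}


def pluralize_sketch(singular: str, count: int) -> str:
    """Return the noun unchanged for a count of 1, otherwise its plural."""
    if count == 1:
        return singular
    # Irregular nouns are looked up first, then the simplified rule applies.
    return IRREGULAR_PLURALS.get(singular, singular + "s")


print(pluralize_sketch("byte", 1))
print(pluralize_sketch("byte", 22))
print(pluralize_sketch("child", 3))
```

A lookup table keeps the irregular cases explicit, so the fallback rule can stay as simple as appending an "s".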

>>> pluralize("byte", 1)
'byte'
>>> pluralize("byte", 22)
'bytes'

parse_source_id

Parse a UUID argument (Worker Version, Worker Run, …) to use it directly in the API.

Arkindex API filters generally expect False to filter manual sources.
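
The sketch below shows how a parsed value might feed an API filter, assuming the literal string "manual" marks a manual source. `parse_source_id_sketch`, `build_filters` and the `worker_version` filter name are hypothetical stand-ins, and mapping an empty value to `None` is an assumption as well.

```python
def parse_source_id_sketch(value: str):
    """Map the "manual" keyword to False; pass any other ID through."""
    if value == "manual":
        return False
    # Assumption: an empty value means that no source was given at all.
    return value or None


def build_filters(source: str) -> dict:
    """Build a query-parameter dict, leaving the filter out when unset."""
    parsed = parse_source_id_sketch(source)
    # None means "no filter"; False is a meaningful value and must be kept.
    return {} if parsed is None else {"worker_version": parsed}


print(build_filters("manual"))
print(build_filters("caf9f53a-9fb6-4e1e-bf3b-9929feb9486e"))
```

Testing against `None` rather than relying on truthiness matters here, because `False` is exactly the value that selects manual sources.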

>>> parse_source_id("manual")
False
>>> parse_source_id("caf9f53a-9fb6-4e1e-bf3b-9929feb9486e")
'caf9f53a-9fb6-4e1e-bf3b-9929feb9486e'

decompress_zst_archive

Decompress a ZST-compressed tar archive in the data directory. The tar archive itself is not extracted. This returns the file descriptor and the path to the uncompressed tar archive.

Make sure to close the file descriptor explicitly, otherwise the main process will keep the memory held even after the file is deleted.

  • Path compressed_archive Path to the target ZST-compressed archive

  • ➡️ tuple[int, Path] File descriptor and path to the uncompressed tar archive

>>> fd, tar_path = decompress_zst_archive(Path("model.tar.zst"))
>>> tar_path
PosixPath('/tmp/tmpXXXXXX.tar')
>>> close_delete_file(fd, tar_path)

extract_tar_archive

Extract the tar archive’s content to a specific destination.

  • Path archive_path Path to the archive

  • Path destination Path where the archive’s data will be extracted

>>> extract_tar_archive(Path("archive.tar"), Path("/data/output"))
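
For readers who want to see the operation without the library, here is a round trip using only the standard `tarfile` module. It is a sketch under the assumption that extraction amounts to unpacking every member into the destination; `extract_tar_sketch` is a hypothetical name.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_tar_sketch(archive_path: Path, destination: Path) -> None:
    """Extract every member of a tar archive into destination."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(destination)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "source"
    source.mkdir()
    (source / "hello.txt").write_text("hello")

    # Build a small archive so that there is something to extract.
    archive = root / "archive.tar"
    with tarfile.open(archive, "w") as tar:
        tar.add(source / "hello.txt", arcname="hello.txt")

    output = root / "output"
    extract_tar_sketch(archive, output)
    extracted_text = (output / "hello.txt").read_text()
```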

extract_tar_zst_archive

Extract a ZST-compressed tar archive’s content to a specific destination.

  • Path compressed_archive Path to the target ZST-compressed archive

  • Path destination Path where the archive’s data will be extracted

  • ➡️ tuple[int, Path] File descriptor and path to the uncompressed tar archive

>>> fd, tar_path = extract_tar_zst_archive(Path("model.tar.zst"), Path("/data/output"))
>>> close_delete_file(fd, tar_path)

close_delete_file

Close the file descriptor of the file and delete the file.

  • int file_descriptor File descriptor of the archive

  • Path file_path Path to the archive

>>> fd, tar_path = decompress_zst_archive(Path("archive.tar.zst"))
>>> close_delete_file(fd, tar_path)
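
The cleanup step can be pictured with the standard library alone. The sketch below assumes the helper does nothing more than close the descriptor and unlink the file; `close_delete_sketch` is a hypothetical name, and `tempfile.mkstemp` stands in for the functions that return a descriptor and a path.

```python
import os
import tempfile
from pathlib import Path


def close_delete_sketch(file_descriptor: int, file_path: Path) -> None:
    """Close the descriptor first, then remove the file from disk."""
    os.close(file_descriptor)
    file_path.unlink()


# mkstemp returns an open descriptor together with the file name.
fd, name = tempfile.mkstemp(suffix=".tar")
tmp_path = Path(name)
existed_before = tmp_path.exists()
close_delete_sketch(fd, tmp_path)
exists_after = tmp_path.exists()
```

Closing before deleting releases the descriptor held by the process, and it is also the order required on platforms that refuse to delete an open file.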

zstd_compress

Compress a file using the Zstandard compression algorithm.

  • Path source Path to the file to compress.

  • Path destination Optional path for the created ZSTD archive. A tempfile will be created if this is omitted.

  • ➡️ tuple[int | None, Path, str] The file descriptor (if one was created), the path to the compressed file, and the hash of its content.

>>> fd, path, checksum = zstd_compress(Path("model.bin"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.tar.zst')
>>> # provide a fixed destination to skip the tempfile
>>> _, path, checksum = zstd_compress(Path("model.bin"), destination=Path("model.bin.zst"))

create_tar_archive

Create a tar archive using the content at the specified location.

  • Path path Path to the file to archive

  • Path destination Optional path for the created TAR archive. A tempfile will be created if this is omitted.

  • ➡️ tuple[int | None, Path] The file descriptor (if one was created) and path to the TAR archive.

>>> fd, path = create_tar_archive(Path("my_folder/"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.tar')
>>> # provide a fixed destination to skip the tempfile
>>> _, path = create_tar_archive(Path("my_folder/"), destination=Path("archive.tar"))
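
The "file descriptor (if one was created)" part of the return value is easier to see in code. The following is a sketch built on the standard `tarfile` and `tempfile` modules, not the library's implementation; `create_tar_sketch` is a hypothetical name.

```python
import os
import tarfile
import tempfile
from pathlib import Path
from typing import Optional


def create_tar_sketch(path: Path, destination: Optional[Path] = None):
    """Archive `path`; return (file descriptor or None, archive path)."""
    file_descriptor = None
    if destination is None:
        # No destination given: create a tempfile and keep its descriptor.
        file_descriptor, name = tempfile.mkstemp(suffix=".tar")
        destination = Path(name)
    with tarfile.open(destination, "w") as tar:
        tar.add(path, arcname=path.name)
    return file_descriptor, destination


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "my_folder"
    folder.mkdir()
    (folder / "a.txt").write_text("a")

    fd_fixed, fixed_path = create_tar_sketch(folder, Path(tmp) / "archive.tar")
    fd_temp, temp_path = create_tar_sketch(folder)
    with tarfile.open(fixed_path) as tar:
        member_names = sorted(tar.getnames())

    # The caller owns the tempfile and must close and delete it.
    os.close(fd_temp)
    temp_path.unlink()
```

Returning `None` for the descriptor when a destination is supplied tells the caller that there is nothing to close afterwards.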

create_tar_zst_archive

Helper to create a TAR+ZST archive from a source folder.

  • Path source Path to the folder whose content should be archived.

  • Path destination Path to the created archive, defaults to None. If unspecified, a temporary file will be created.

  • ➡️ tuple[int | None, Path, str] The file descriptor of the created tempfile (if one was created), the path to the archive, and its hash.

>>> fd, path, checksum = create_tar_zst_archive(Path("my_folder/"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.tar.zst')

create_zip_archive

Helper to create a ZIP archive from a source folder.

  • Path source Path to the folder whose content should be archived.

  • Path destination Path to the created archive, defaults to None. If unspecified, a temporary file will be created.

  • ➡️ tuple[int | None, Path] The file descriptor of the created tempfile (if one was created) and the path to the archive.

>>> fd, path = create_zip_archive(Path("my_folder/"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.zip')

batch_publication

Decorator for functions that should raise an error when the value passed through the batch_size parameter is not a strictly positive integer.

  • Callable func The function to wrap with the batch_size check

  • ➡️ Callable The function passing the batch_size check

>>> @batch_publication
... def create_elements(self, elements, batch_size=50):
...     ...
>>> create_elements(worker, elements, batch_size=0)
Traceback (most recent call last):
    ...
AssertionError: batch_size shouldn't be null and should be a strictly positive integer
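
One way such a decorator could be written is sketched below. It assumes the check is done by binding the call to the function signature so that default values are validated too; `batch_publication_sketch` and `DemoWorker` are hypothetical names, and the real decorator may differ in its details.

```python
import inspect
from functools import wraps


def batch_publication_sketch(func):
    """Reject calls whose batch_size is not a strictly positive integer."""
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Bind the call so that a default batch_size is checked as well.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        batch_size = bound.arguments.get("batch_size")
        # bool is excluded explicitly because it is a subclass of int.
        assert (
            isinstance(batch_size, int)
            and not isinstance(batch_size, bool)
            and batch_size > 0
        ), "batch_size shouldn't be null and should be a strictly positive integer"
        return func(*args, **kwargs)

    return wrapper


class DemoWorker:
    @batch_publication_sketch
    def create_elements(self, elements, batch_size=50):
        return len(elements), batch_size
```

Binding through `inspect.signature` means the check works whether `batch_size` is passed positionally, by keyword, or left to its default.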

make_batches

Split an object list into successive batches of at most batch_size items.

  • list objects The object list to divide into batches of at most batch_size items

  • str singular_name The singular form of the noun associated with the object list

  • int batch_size The maximum size of each batch to split the object list

  • ➡️ Generator A generator of successive batches, each containing at most batch_size items from objects

>>> items = [1, 2, 3, 4, 5]
>>> for batch in make_batches(items, "item", batch_size=2):
...     print(batch)
[1, 2]
[3, 4]
[5]
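
The batching itself can be sketched with `itertools.islice`. This is an illustration rather than the library's code: `make_batches_sketch` is a hypothetical name, and it leaves out the `singular_name` parameter because its exact use is not described here.

```python
from itertools import islice


def make_batches_sketch(objects: list, batch_size: int):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(objects)
    # islice takes up to batch_size items per pass; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


batches = list(make_batches_sketch([1, 2, 3, 4, 5], batch_size=2))
print(batches)  # [[1, 2], [3, 4], [5]]
```

Because the function is a generator, only one batch is materialized at a time, and the last batch simply holds whatever items remain.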