Utilities
All these functions are available from the arkindex_worker.utils module.
pluralize
Pluralize a noun, if necessary, using simplified rules of English pluralization and a list of exceptions.
- str singular: A singular noun describing an object
- int count: The object count, to determine whether to pluralize or not
➡️ str: The noun in its singular or plural form
>>> pluralize("byte", 1)
"byte"
>>> pluralize("byte", 22)
"bytes"
parse_source_id
Parse a UUID argument (Worker Version, Worker Run, …) to use it directly in the API.
Arkindex API filters generally expect False to filter manual sources.
>>> parse_source_id("manual")
False
>>> parse_source_id("caf9f53a-9fb6-4e1e-bf3b-9929feb9486e")
"caf9f53a-9fb6-4e1e-bf3b-9929feb9486e"
decompress_zst_archive
Decompress a ZST-compressed tar archive in the data directory. The tar archive is not extracted. This returns the file descriptor and the path to the decompressed archive.
Make sure to close the file descriptor explicitly, otherwise the main process will keep the memory held even if the file is deleted.
- Path compressed_archive: Path to the target ZST-compressed archive
➡️ tuple[int, Path]: File descriptor and path to the uncompressed tar archive
>>> fd, tar_path = decompress_zst_archive(Path("model.tar.zst"))
>>> tar_path
PosixPath('/tmp/tmpXXXXXX.tar')
>>> close_delete_file(fd, tar_path)
extract_tar_archive
Extract the tar archive’s content to a specific destination.
- Path archive_path: Path to the archive
- Path destination: Path where the archive’s data will be extracted
>>> extract_tar_archive(Path("archive.tar"), Path("/data/output"))
extract_tar_zst_archive
Extract a ZST-compressed tar archive’s content to a specific destination.
- Path compressed_archive: Path to the target ZST-compressed archive
- Path destination: Path where the archive’s data will be extracted
➡️ tuple[int, Path]: File descriptor and path to the uncompressed tar archive
>>> fd, tar_path = extract_tar_zst_archive(Path("model.tar.zst"), Path("/data/output"))
>>> close_delete_file(fd, tar_path)
close_delete_file
Close the file descriptor of the file and delete the file.
- int file_descriptor: File descriptor of the archive
- Path file_path: Path to the archive
>>> fd, tar_path = decompress_zst_archive(Path("archive.tar.zst"))
>>> close_delete_file(fd, tar_path)
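The warning about file descriptors applies to any file created through tempfile.mkstemp, which the temporary paths in these examples suggest is used internally (an assumption, not confirmed by this page). Below is a minimal standard-library sketch of the close-then-delete pattern. The helper close_delete_file_sketch is hypothetical and only stands in for the real function.

```python
import os
import tempfile
from pathlib import Path


def close_delete_file_sketch(file_descriptor: int, file_path: Path) -> None:
    """Close the OS-level descriptor first, then remove the file from disk."""
    os.close(file_descriptor)
    file_path.unlink()


# mkstemp returns an already-open descriptor and the file's path as a string
fd, raw_path = tempfile.mkstemp(suffix=".tar")
tmp_path = Path(raw_path)
try:
    os.write(fd, b"placeholder bytes")
finally:
    # Runs even if the write fails, so the descriptor is never leaked
    close_delete_file_sketch(fd, tmp_path)

print(tmp_path.exists())  # False
```

Wrapping the work in try/finally is one way to guarantee the descriptor is released on every code path.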
zstd_compress
Compress a file using the Zstandard compression algorithm.
- Path source: Path to the file to compress.
- Path destination: Optional path for the created ZSTD archive. A tempfile will be created if this is omitted.
➡️ tuple[int | None, Path, str]: The file descriptor (if one was created), the path to the compressed file, and the hash of its content.
>>> fd, path, checksum = zstd_compress(Path("model.bin"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.tar.zst')
>>> # provide a fixed destination to skip the tempfile
>>> _, path, checksum = zstd_compress(Path("model.bin"), destination=Path("model.bin.zst"))
create_tar_archive
Create a tar archive using the content at the specified location.
- Path path: Path to the file to archive
- Path destination: Optional path for the created TAR archive. A tempfile will be created if this is omitted.
➡️ tuple[int | None, Path]: The file descriptor (if one was created) and path to the TAR archive.
>>> fd, path = create_tar_archive(Path("my_folder/"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.tar')
>>> # provide a fixed destination to skip the tempfile
>>> _, path = create_tar_archive(Path("my_folder/"), destination=Path("archive.tar"))
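The "file descriptor (if one was created)" return value follows from the optional destination: a descriptor only exists when a tempfile had to be created. The sketch below reproduces that behaviour with the standard tarfile and tempfile modules. It is an assumption about the general approach, not the library's code, and create_tar_archive_sketch is a hypothetical name.

```python
import os
import tarfile
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def create_tar_archive_sketch(
    path: Path, destination: Optional[Path] = None
) -> Tuple[Optional[int], Path]:
    """Archive the files under `path`, creating a tempfile if needed."""
    file_d = None
    if destination is None:
        # Only in this branch is there a descriptor for the caller to close
        file_d, raw_path = tempfile.mkstemp(prefix="teklia-", suffix=".tar")
        destination = Path(raw_path)
    with tarfile.open(destination, "w") as tar:
        for item in sorted(path.rglob("*")):
            if item.is_file():
                # Store paths relative to the source folder
                tar.add(item, arcname=str(item.relative_to(path)))
    return file_d, destination


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "my_folder"
    source.mkdir()
    (source / "a.txt").write_text("hello")

    # Fixed destination: no descriptor is returned
    no_fd, archive = create_tar_archive_sketch(source, Path(tmp) / "archive.tar")
    with tarfile.open(archive) as tar:
        names = tar.getnames()

    # No destination: a tempfile is created and must be cleaned up
    temp_fd, temp_archive = create_tar_archive_sketch(source)
    os.close(temp_fd)
    temp_archive.unlink()

print(no_fd, names)  # None ['a.txt']
```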
create_tar_zst_archive
Helper to create a TAR+ZST archive from a source folder.
- Path source: Path to the folder whose content should be archived.
- Path destination: Path to the created archive, defaults to None. If unspecified, a temporary file will be created.
➡️ tuple[int | None, Path, str]: The file descriptor of the created tempfile (if one was created), the path to the archive and its hash.
>>> fd, path, checksum = create_tar_zst_archive(Path("my_folder/"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.tar.zst')
create_zip_archive
Helper to create a ZIP archive from a source folder.
- Path source: Path to the folder whose content should be archived.
- Path destination: Path to the created archive, defaults to None. If unspecified, a temporary file will be created.
➡️ tuple[int | None, Path]: The file descriptor of the created tempfile (if one was created) and the path to the archive.
>>> fd, path = create_zip_archive(Path("my_folder/"))
>>> path
PosixPath('/tmp/teklia-XXXXXX.zip')
batch_publication
Decorator for functions that should raise an error when the value passed through the batch_size parameter is not a strictly positive integer.
- Callable func: The function to wrap with the batch_size check
➡️ Callable: The function passing the batch_size check
>>> @batch_publication
... def create_elements(self, elements, batch_size=50):
...     ...
>>> create_elements(worker, elements, batch_size=0)
AssertionError: batch_size shouldn't be null and should be a strictly positive integer
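A decorator of this kind can be sketched with inspect.signature, which lets the check cover a default batch_size as well as one passed explicitly. This is an illustrative sketch, not the library's code: batch_publication_sketch and publish are hypothetical names, and the exact validation performed by the real decorator is assumed from the description above.

```python
import inspect
from functools import wraps


def batch_publication_sketch(func):
    """Reject calls whose batch_size is not a strictly positive integer."""
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Bind the arguments so that a default batch_size is checked too
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        batch_size = bound.arguments.get("batch_size")
        # `type(...) is int` also rejects booleans, which subclass int
        assert (
            batch_size is not None and type(batch_size) is int and batch_size > 0
        ), "batch_size shouldn't be null and should be a strictly positive integer"
        return func(*args, **kwargs)

    return wrapper


@batch_publication_sketch
def publish(items, batch_size=50):
    return f"{len(items)} items in batches of {batch_size}"


print(publish([1, 2, 3], batch_size=2))  # 3 items in batches of 2
try:
    publish([1, 2, 3], batch_size=0)
except AssertionError as error:
    print(error)
```

Keep in mind that assert statements are skipped when Python runs with the -O flag, so a sketch like this only validates in normal mode.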
make_batches
Split an object list in successive batches of maximum size batch_size.
- list objects: The object list to divide in batches of batch_size size
- str singular_name: The singular form of the noun associated with the object list
- int batch_size: The maximum size of each batch to split the object list
➡️ Generator: A generator of successive batches containing up to batch_size items from objects
>>> items = [1, 2, 3, 4, 5]
>>> for batch in make_batches(items, "item", batch_size=2):
...     print(batch)
[1, 2]
[3, 4]
[5]
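The batching itself can be expressed with itertools.islice. The sketch below shows one possible approach under that assumption. make_batches_sketch is a hypothetical name, and it drops the singular_name parameter, whose use inside the real function is not described on this page.

```python
from itertools import islice


def make_batches_sketch(objects, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    iterator = iter(objects)
    # islice consumes the shared iterator; an empty list ends the loop
    while batch := list(islice(iterator, batch_size)):
        yield batch


batches = list(make_batches_sketch([1, 2, 3, 4, 5], batch_size=2))
print(batches)  # [[1, 2], [3, 4], [5]]
```

Because the function is a generator, batches are built lazily, one at a time, as the caller iterates.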