Utils Module
AnnDataConverter
Utility class for converting datasets into AnnData or multimodal AnnData dictionaries.
Source code in src/autoencodix/utils/adata_converter.py
dataset_to_adata(datasetcontainer, split='train')
staticmethod
Convert a DatasetContainer split to an AnnData or multimodal AnnData dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasetcontainer | DatasetContainer | Container holding train/valid/test datasets. | required |
| split | Literal['train', 'valid', 'test'] | The dataset split to convert. Defaults to "train". | 'train' |
Returns:
| Type | Description |
|---|---|
| Optional[Dict[str, AnnData]] | A single AnnData object (for NumericDataset) or a dictionary of AnnData objects (for MultiModalDataset). |
Raises:
| Type | Description |
|---|---|
| ValueError | If the specified split does not exist in the DatasetContainer. |
| NotImplementedError | If the dataset type is not supported. |
Source code in src/autoencodix/utils/adata_converter.py
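The ValueError case documented above can be pictured as a lookup on a container with three optional splits. The following is a minimal sketch, not the library's implementation: `ContainerSketch` and `get_split` are hypothetical names, and treating a split that is set to `None` as "does not exist" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional

VALID_SPLITS = ("train", "valid", "test")


@dataclass
class ContainerSketch:
    """Hypothetical stand-in for a container with three optional splits."""

    train: Optional[Any] = None
    valid: Optional[Any] = None
    test: Optional[Any] = None


def get_split(container: ContainerSketch, split: str = "train") -> Any:
    """Return the requested split, or raise ValueError if it is unavailable."""
    if split not in VALID_SPLITS:
        raise ValueError(f"Unknown split: {split!r}")
    dataset = getattr(container, split)
    if dataset is None:  # assumption: an unset split counts as missing
        raise ValueError(f"Split {split!r} does not exist in the container")
    return dataset
```

Validating the split name before the attribute lookup keeps a typo such as "validation" from surfacing as an AttributeError.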
BulkDataReader
Reads bulk data from files based on configuration.
Supports both paired and unpaired data reading strategies.
Attributes:
| Name | Type | Description |
|---|---|---|
| config | | Configuration object. |
Source code in src/autoencodix/utils/_bulkreader.py
__init__(config)
Initialize the BulkDataReader with a configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | DefaultConfig | Configuration object containing data paths and specifications. | required |
Source code in src/autoencodix/utils/_bulkreader.py
read_data()
Read all data according to the configuration.
Returns:
| Type | Description |
|---|---|
| Tuple[Dict[str, DataFrame], Dict[str, DataFrame]] | A tuple containing (bulk_dataframes, annotation_dataframes). |
Source code in src/autoencodix/utils/_bulkreader.py
read_paired_data()
Read paired numeric data.
Returns:
| Type | Description |
|---|---|
| Tuple[Dict[str, DataFrame], Dict[str, DataFrame]] | A tuple of two dicts: (1) the name of each data modality as key and a pandas DataFrame as value; (2) the string 'paired' as key and the common annotation/metadata DataFrame as value. |
Source code in src/autoencodix/utils/_bulkreader.py
read_unpaired_data()
Read data without enforcing sample alignment across modalities.
Returns:
| Type | Description |
|---|---|
| Tuple[Dict[str, DataFrame], Dict[str, DataFrame]] | A tuple containing (bulk_dataframes, annotation_dataframes). |
Source code in src/autoencodix/utils/_bulkreader.py
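The difference between the paired and unpaired strategies can be pictured with plain dictionaries. The sketch below assumes that "paired" means keeping only the samples present in every modality; the reference does not spell this out, so treat it as one plausible reading. `align_samples` and the toy data are hypothetical, and nested dicts stand in for DataFrames so the example runs without pandas.

```python
from typing import Dict, List

# Toy stand-in: modality name -> {sample_id: feature values}.
Modalities = Dict[str, Dict[str, List[float]]]


def align_samples(modalities: Modalities) -> Modalities:
    """Keep only sample IDs shared by every modality, in sorted order."""
    if not modalities:
        return {}
    shared = set.intersection(*(set(rows) for rows in modalities.values()))
    return {
        name: {sid: rows[sid] for sid in sorted(shared)}
        for name, rows in modalities.items()
    }


data = {
    "rna": {"s1": [1.0], "s2": [2.0], "s3": [3.0]},
    "meth": {"s2": [0.2], "s3": [0.3], "s4": [0.4]},
}
paired = align_samples(data)  # both modalities reduced to the shared samples
unpaired = data               # each modality keeps its own samples
```

In the unpaired case no intersection is taken, so modalities may differ in both the number and the identity of their samples.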
ImageDataReader
Reads and processes image data.
Reads all images from the specified directory, processes them, and returns a list of ImgData objects.
Source code in src/autoencodix/utils/_imgreader.py
parse_image_to_tensor(image_path, to_h=None, to_w=None)
Reads an image from the given path, optionally resizes it, and converts it to a tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_path | Union[str, Path] | The path to the image file. | required |
| to_h | Optional[int] | The desired height of the output tensor, by default None. | None |
| to_w | Optional[int] | The desired width of the output tensor, by default None. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | The processed image as a tensor. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the image path is invalid or the image cannot be read. |
| ImageProcessingError | If the image format is unsupported or an unexpected error occurs during processing. |
Source code in src/autoencodix/utils/_imgreader.py
read_all_images_from_dir(img_dir, to_h, to_w, annotation_df, is_paired=None)
Reads all images from a specified directory, processes them, and returns a list of ImgData objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| img_dir | str | The directory containing the images. | required |
| to_h | Optional[int] | The desired height of the output tensors. | required |
| to_w | Optional[int] | The desired width of the output tensors. | required |
| annotation_df | DataFrame | DataFrame containing image annotations. | required |
| is_paired | Union[bool, None] | Whether the images are paired with annotations. | None |
Returns:
| Type | Description |
|---|---|
| List[ImgData] | List of processed image data objects. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the annotation DataFrame is missing required columns. |
Source code in src/autoencodix/utils/_imgreader.py
read_annotation_file(data_info)
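Reads the annotation file and returns its contents as a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_info | | The part of the configuration object that describes the input data. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with annotation data. |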
Source code in src/autoencodix/utils/_imgreader.py
read_data(config)
Read image data from the specified directory based on configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | DefaultConfig | The configuration object containing the data configuration. | required |
Returns:
| Type | Description |
|---|---|
| Tuple[Dict[str, List[ImgData]], Dict[str, DataFrame]] | A tuple of two dicts: the first maps string keys to lists of ImgData objects, the second maps string keys to DataFrames. |
Raises:
| Type | Description |
|---|---|
| Exception | If no image data is found in the configuration or other validation errors occur. |
Source code in src/autoencodix/utils/_imgreader.py
validate_image_path(image_path)
Checks whether the file extension is allowed. Allowed extensions (independent of capitalization):
- jpg
- jpeg
- png
- tif
- tiff
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_path | Union[str, Path] | Path of the image to read, given as a str or Path. | required |
Source code in src/autoencodix/utils/_imgreader.py
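A case-insensitive extension check of the kind described above can be sketched with `pathlib`. This is an illustration only: `has_allowed_extension` is a hypothetical helper, and returning a bool is an assumption, since the reference does not say whether `validate_image_path` returns a value or raises.

```python
from pathlib import Path
from typing import Union

# Extensions listed in the reference; matching ignores capitalization.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "tif", "tiff"}


def has_allowed_extension(image_path: Union[str, Path]) -> bool:
    """Hypothetical helper: True if the file extension is in the allowed set."""
    suffix = Path(image_path).suffix  # e.g. ".PNG", or "" when there is none
    return suffix.lower().lstrip(".") in ALLOWED_EXTENSIONS
```

Lowercasing the suffix before the lookup is what makes `scan.PNG` and `scan.png` equivalent.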
ModelOutput
dataclass
A structured output dataclass for autoencoder models.
This class is used to encapsulate the outputs of autoencoder models in a consistent format, allowing for flexibility in the type of outputs returned by different architectures.
Attributes:
| Name | Type | Description |
|---|---|---|
| reconstruction | Tensor | The reconstructed input data. |
| latent_mean | Optional[Tensor] | The mean of the latent space distribution, applicable for models like VAEs. |
| latent_logvar | Optional[Tensor] | The log variance of the latent space distribution, applicable for models like VAEs. |
| additional_info | Optional[dict] | A dictionary to store any additional information or intermediate outputs. |
Source code in src/autoencodix/utils/_model_output.py
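The attribute list above maps naturally onto a dataclass with one required field and three optional ones. The sketch below mirrors that shape without importing the library or PyTorch: `ModelOutputSketch` is a hypothetical name, nested lists stand in for `torch.Tensor`, and the `None` defaults are an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stand-in for torch.Tensor so the sketch runs without PyTorch.
Tensor = List[List[float]]


@dataclass
class ModelOutputSketch:
    """Hypothetical mirror of the attributes documented above."""

    reconstruction: Tensor
    latent_mean: Optional[Tensor] = None    # assumed default
    latent_logvar: Optional[Tensor] = None  # assumed default
    additional_info: Optional[dict] = None  # assumed default


# A plain autoencoder would fill only the reconstruction ...
ae_out = ModelOutputSketch(reconstruction=[[0.1, 0.2]])
# ... while a VAE-style model would also report its latent distribution.
vae_out = ModelOutputSketch(
    reconstruction=[[0.1, 0.2]],
    latent_mean=[[0.0]],
    latent_logvar=[[-1.0]],
)
```

Keeping the VAE-specific fields optional lets different architectures share one return type.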
Result
dataclass
A dataclass to store results from the pipeline with predefined keys.
Attributes:
| Name | Type | Description |
|---|---|---|
| latentspaces | TrainingDynamics | TrainingDynamics object storing latent space representations for 'train', 'valid', and 'test' splits. |
| sample_ids | TrainingDynamics | TrainingDynamics object storing sample identifiers for 'train', 'valid', and 'test' splits. |
| reconstructions | TrainingDynamics | TrainingDynamics object storing reconstructed outputs for 'train', 'valid', and 'test' splits. |
| mus | TrainingDynamics | TrainingDynamics object storing mean values of latent distributions for 'train', 'valid', and 'test' splits. |
| sigmas | TrainingDynamics | TrainingDynamics object storing standard deviations of latent distributions for 'train', 'valid', and 'test' splits. |
| losses | TrainingDynamics | TrainingDynamics object storing the total loss for different epochs and splits ('train', 'valid', 'test'). |
| sub_losses | LossRegistry | LossRegistry object (extendable) for all sublosses. |
| preprocessed_data | Tensor | torch.Tensor containing data after preprocessing. |
| model | Union[Dict[str, Module], Module] | Final trained torch.nn.Module model. |
| model_checkpoints | TrainingDynamics | TrainingDynamics object storing model state at each checkpoint. |
| datasets | Optional[DatasetContainer] | Optional[DatasetContainer] containing train, valid, and test datasets. |
| new_datasets | Optional[DatasetContainer] | Optional[DatasetContainer] containing new train, valid, and test datasets. |
| adata_latent | Optional[AnnData] | Optional[AnnData] containing latent representations as AnnData. |
| final_reconstruction | Optional[Union[DataPackage, MuData]] | Optional[Union[DataPackage, MuData]] containing final reconstruction results. |
| sub_results | Optional[Dict[str, Any]] | Optional[Dict[str, Any]] containing sub-results for multi-task or multi-modal models. |
| sub_reconstructions | Optional[Dict[str, Any]] | Optional[Dict[str, Any]] containing sub-reconstructions for multi-task or multi-modal models. |
| embedding_evaluation | DataFrame | pd.DataFrame containing embedding evaluation results. |
Source code in src/autoencodix/utils/_result.py
__getitem__(key)
Retrieve the value associated with a specific key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | The name of the attribute to retrieve. | required |
Returns:
| Type | Description |
|---|---|
| | The value of the specified attribute. |
Raises:
| Type | Description |
|---|---|
| KeyError | If the key is not a valid attribute of the Result class. |
Source code in src/autoencodix/utils/_result.py
__repr__()
Return the same representation as str for consistency.
Source code in src/autoencodix/utils/_result.py
__setitem__(key, value)
Assign a value to a specific attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | The name of the attribute to set. | required |
| value | Any | The value to assign to the attribute. | required |
Raises:
| Type | Description |
|---|---|
| KeyError | If the key is not a valid attribute of the Result class. |
Source code in src/autoencodix/utils/_result.py
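Dictionary-style access restricted to declared attributes, as described for `__getitem__` and `__setitem__`, can be sketched on a small dataclass. `MiniResult` is a hypothetical stand-in with only two fields, and checking keys against `dataclasses.fields` is one possible way to validate them, not necessarily how the library does it.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class MiniResult:
    """Hypothetical stand-in with two fields; not the library's Result."""

    losses: Optional[dict] = None
    model: Any = None

    def _check_key(self, key: str) -> None:
        # Only declared dataclass fields count as valid keys.
        if key not in {f.name for f in fields(self)}:
            raise KeyError(f"Invalid key: '{key}'")

    def __getitem__(self, key: str) -> Any:
        self._check_key(key)
        return getattr(self, key)

    def __setitem__(self, key: str, value: Any) -> None:
        self._check_key(key)
        setattr(self, key, value)


result = MiniResult()
result["losses"] = {0: 1.5}  # same effect as result.losses = {0: 1.5}
```

Rejecting unknown keys in `__setitem__` prevents a misspelled name from silently creating a new attribute.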
__str__()
Provide a readable string representation of the Result object's public attributes.
Returns:
| Type | Description |
|---|---|
| str | Formatted string showing all public attributes and their values. |
Source code in src/autoencodix/utils/_result.py
get_latent_df(epoch, split, modality=None)
Return latent representations as a DataFrame.
Retrieves latent vectors and their corresponding sample IDs for a given epoch and data split. If a specific modality is provided, the results are restricted to that modality. Column names are inferred from model ontologies if available; otherwise, generic latent dimension labels are used.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| epoch | int | The epoch number to retrieve latents from. | required |
| split | str | The dataset split to query (e.g., "train", "valid", "test"). | required |
| modality | Optional[str] | Optional modality name to filter the latents and sample IDs. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame where rows correspond to samples, columns represent latent dimensions, and the index contains sample IDs. |
Source code in src/autoencodix/utils/_result.py
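The column-naming fallback described for `get_latent_df` can be illustrated without pandas. In this sketch, `label_latents` is a hypothetical helper, a dict keyed by sample ID stands in for the DataFrame index, and the generic label pattern `latent_<i>` is an assumption; the reference does not state the actual label format.

```python
from typing import Dict, Optional, Sequence


def label_latents(
    latents: Sequence[Sequence[float]],
    sample_ids: Sequence[str],
    ontology_names: Optional[Sequence[str]] = None,
) -> Dict[str, Dict[str, float]]:
    """Map each sample ID to {column label: latent value}."""
    n_dims = len(latents[0]) if latents else 0
    if ontology_names is not None and len(ontology_names) == n_dims:
        columns = list(ontology_names)
    else:
        # Generic fallback labels; the exact naming is an assumption.
        columns = [f"latent_{i}" for i in range(n_dims)]
    return {
        sid: dict(zip(columns, row)) for sid, row in zip(sample_ids, latents)
    }


generic = label_latents([[0.1, 0.2]], ["s1"])
named = label_latents([[0.1, 0.2]], ["s1"], ["pathway_a", "pathway_b"])
```

The sketch also falls back to generic labels when the number of names does not match the latent dimension, which is a design choice of this example rather than documented behavior.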
get_reconstructions_df(epoch, split, modality=None)
Return reconstructions as a DataFrame.
Retrieves reconstructed features and their corresponding sample IDs for a given epoch and data split. If a specific modality is provided, the results are restricted to that modality. Column names are based on the dataset's feature identifiers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| epoch | int | The epoch number to retrieve reconstructions from. | required |
| split | str | The dataset split to query (e.g., "train", "valid", "test"). | required |
| modality | Optional[str] | Optional modality name to filter the reconstructions and sample IDs. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame where rows correspond to samples, columns represent reconstructed features, and the index contains sample IDs. |
Source code in src/autoencodix/utils/_result.py
update(other)
Update the current Result object with values from another Result object.
For TrainingDynamics attributes, the data is merged across epochs and splits, and entries that already exist are overwritten by the values from the other object. For all other attributes, the current value is replaced with the other object's value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Result | The Result object to update from. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | If the input object is not a Result instance. |
Source code in src/autoencodix/utils/_result.py
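The merge rule for TrainingDynamics attributes (merge across epochs and splits, with the other object winning on conflicts) can be sketched with nested dictionaries. The `{epoch: {split: value}}` layout and the name `merge_dynamics` are assumptions made for illustration; the real TrainingDynamics class may store its data differently.

```python
from typing import Any, Dict

# Assumed layout for this sketch: {epoch: {split: value}}.
Dynamics = Dict[int, Dict[str, Any]]


def merge_dynamics(current: Dynamics, other: Dynamics) -> Dynamics:
    """Merge `other` into a copy of `current`; `other` wins on conflicts."""
    merged = {epoch: dict(splits) for epoch, splits in current.items()}
    for epoch, splits in other.items():
        merged.setdefault(epoch, {}).update(splits)
    return merged


current = {0: {"train": 0.9, "valid": 1.1}}
other = {0: {"valid": 1.0}, 1: {"train": 0.7}}
merged = merge_dynamics(current, other)
```

Here the epoch 0 "valid" entry is overwritten, the epoch 0 "train" entry is kept, and epoch 1 is added. Unlike the documented `update`, this sketch returns a new mapping and leaves `current` unchanged.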
SingleCellDataReader
Reader for multi-modal single-cell data.
Source code in src/autoencodix/utils/_screader.py
read_data(config)
staticmethod
Read multiple single-cell modalities into MuData object(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | | Configuration object containing data paths and parameters. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, MuData] | For non-paired translation: a dict of dicts, with "multi_sc" as the outer key and an inner dict that maps modality keys to MuData objects. For paired translation and non-translation cases: a dict with "multi_sc" as key and a MuData object as value. |
Source code in src/autoencodix/utils/_screader.py