fsml.learn.data_management package

Submodules

fsml.learn.data_management.dataloader module

class fsml.learn.data_management.dataloader.FSMLDataLoader(dataset: FSMLMeanStdDataset | FSMLOneMeanStdDataset, batch_size: int = 1, shuffle: bool = True, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, **kwargs)

Bases: DataLoader

A custom data loader for the FSML datasets.

fsml.learn.data_management.dataloader.collate(_batch: Generic[T]) → Tuple[Tensor, Tensor]

A function that replaces the default collate function of the DataLoader. It takes a batch of any size and creates two tensors by vertically stacking (vstack) all the input data and all the output data.

Parameters:

_batch – The current batch given as input by the dataloader

Returns:

(Input data, Output data)
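
The stacking described above can be sketched as follows. This is not the package's implementation: plain Python lists stand in for tensors (with PyTorch, torch.vstack would do the stacking), and the name collate_sketch and the assumption that each batch item is an (input, output) pair are illustrative only.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def collate_sketch(batch: Sequence[Tuple[Row, Row]]) -> Tuple[List[Row], List[Row]]:
    """Stack the inputs and the outputs of a batch into two row-wise matrices.

    Hypothetical stand-in for a collate function: each batch item is assumed
    to be an (input, output) pair, and lists of rows replace tensors.
    """
    inputs = [list(x) for x, _ in batch]    # one row per sample input
    outputs = [list(y) for _, y in batch]   # one row per sample output
    return inputs, outputs


# A batch of two samples, each with two inputs and two outputs.
batch = [([0.1, 0.2], [1.0, 0.5]), ([0.3, 0.4], [2.0, 0.7])]
inputs, outputs = collate_sketch(batch)
```

Both results keep one row per sample, so their first dimension equals the batch size.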

fsml.learn.data_management.dataset module

class fsml.learn.data_management.dataset.FSMLMeanStdDataset(data_path: str, mixup: bool = False, train: float = 0.8, test: float = 0.2)

Bases: Dataset

A class for the FSMLMeanStdDataset

This dataset represents a mix of different datasets, where each single dataset is given by a CSV file contained in the input data_path. It can be used to train several models, one per dataset, or a single model on the entire dataset, which merges all N contained datasets. The mixing is controlled by the mixup parameter: when mixup is True, the __getitem__ method returns a mixture of samples.

Attributes

files : List[str]

A list with all the files

num_files : int

The total number of files in the folder

count_per_file : Dict[str, int]

The total number of elements in each file

train_datasets : List[FSMLOneMeanStdDataset]

A list with all the datasets for training

test_datasets : List[FSMLOneMeanStdDataset]

A list with all the datasets for testing

max_parameters : int

The maximum number of parameters over all the datasets

max_outputs : int

The maximum output size over all the datasets

total_data : int

The total amount of data across all the datasets

train_size : int

The total amount of training data

test_size : int

The total amount of test data

is_train : bool

If True, use the training data; if False, use the test data

test() → None

Set the dataset for testing instead of training
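
The documentation does not say how samples from several files are addressed when they are mixed. One possible scheme, shown purely as an assumption, maps a global index onto a file and a local index using per-file counts like those stored in count_per_file. The function name locate and the file names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List, Tuple


def locate(index: int, count_per_file: Dict[str, int]) -> Tuple[str, int]:
    """Map a global sample index to (file name, index within that file).

    Assumed scheme, not taken from the package: files are laid out one after
    another in dictionary order.
    """
    names: List[str] = list(count_per_file)
    # Cumulative sizes: ends[k] is the first global index past file k.
    ends = list(accumulate(count_per_file[n] for n in names))
    if not ends or not 0 <= index < ends[-1]:
        raise IndexError(index)
    pos = bisect_right(ends, index)          # first file whose end exceeds index
    start = ends[pos - 1] if pos else 0      # global index where that file starts
    return names[pos], index - start


counts = {"a.csv": 3, "b.csv": 2}            # hypothetical per-file counts
first_of_b = locate(3, counts)
```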

class fsml.learn.data_management.dataset.FSMLOneMeanStdDataset(csv_file: str, train: float = 0.8, test: float = 0.2)

Bases: Dataset

This class represents an FSMLOneMeanStdDataset.

A class representing the dataset of all the simulations. The dataset is structured so that, on indexing or on iteration, it returns a pair of elements: the first is the input of the neural network, and the second is the ground truth against which the output of the network is compared to compute the loss and perform the optimizer update.

The dataset is initialized with just the fully qualified path that points to the folder containing the results of all the previously performed simulations. That folder holds CSV files, each containing one row per simulation of the same model, lowercase-named columns representing the parameters of the model, and uppercase-named columns representing the mean and the standard deviation of each species.

On indexing, i.e. calling the __getitem__() method with an index i, the dataset returns a tuple of two elements: the first is the input of the neural network, composed of the parameters of the model, and the second is the ground truth, composed of the means and the standard deviations.
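
The column convention described above can be sketched with the standard csv module. This is a minimal illustration, not the class's actual loading code: the column names (k1, k2, A_MEAN, A_STD), the in-memory CSV text and the helper load_pairs are all hypothetical.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical simulation results: lowercase columns are model parameters,
# uppercase columns are the mean and standard deviation of a species.
CSV_TEXT = """k1,k2,A_MEAN,A_STD
0.1,0.5,10.0,1.5
0.2,0.6,12.0,1.8
"""


def load_pairs(text: str) -> List[Tuple[List[float], List[float]]]:
    """Split every CSV row into (parameters, ground truth) by column-name case."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    params = [c for c in columns if c.islower()]    # network inputs
    outputs = [c for c in columns if c.isupper()]   # means and standard deviations
    pairs = []
    for row in reader:
        x = [float(row[c]) for c in params]
        y = [float(row[c]) for c in outputs]
        pairs.append((x, y))
    return pairs


pairs = load_pairs(CSV_TEXT)
first_input, first_target = pairs[0]   # what indexing with i = 0 would yield
```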

Attributes

csv_file : str

The input path of the CSV file with the data

input_data : List[List[float]]

A list with all the input data (parameters)

input_size : int

The size of the input data (i.e. the number of points in a single tensor)

output_data : List[List[float]]

A list with all the output data (mean and std for each species)

output_size : int

The size of the output data (i.e. the number of points in a single tensor)

num_data : int

The total number of samples

parameters : List[str]

A list with all the parameter names

outputs : List[str]

A list with all the output names

train_data : Tuple[List[List[float]], List[float]]

The training data

test_data : Tuple[List[List[float]], List[float]]

The test data

test() → None

Set the dataset for testing
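
How the train and test fractions and the test() switch could interact is sketched below. Everything here is an assumption made for illustration: the class SplitSketch, the sequential (unshuffled) split, the is_train flag on this class, and the storage of samples as a list of pairs rather than the tuple layout listed in the attributes.

```python
from typing import List, Tuple

Pair = Tuple[List[float], List[float]]


class SplitSketch:
    """Hypothetical dataset that serves train samples until test() is called."""

    def __init__(self, pairs: List[Pair], train: float = 0.8) -> None:
        cut = int(len(pairs) * train)      # number of training samples
        self.train_data = pairs[:cut]      # assumed: first part is for training
        self.test_data = pairs[cut:]       # assumed: the remainder is for testing
        self.is_train = True

    def test(self) -> None:
        """Switch the dataset to serve the test samples."""
        self.is_train = False

    def _active(self) -> List[Pair]:
        return self.train_data if self.is_train else self.test_data

    def __len__(self) -> int:
        return len(self._active())

    def __getitem__(self, i: int) -> Pair:
        return self._active()[i]


pairs = [([float(i)], [float(2 * i)]) for i in range(10)]
ds = SplitSketch(pairs)
train_len = len(ds)    # size while serving training data
ds.test()
test_len = len(ds)     # size after switching to test data
```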

fsml.learn.data_management.dataset.get_dataset_by_indices(src_dataset: FSMLOneMeanStdDataset, train_ids: List[int], test_ids: List[int]) → FSMLOneMeanStdDataset

Create a new dataset of the same type from an existing FSMLOneMeanStdDataset by copying it and keeping only a portion of its train and test sets. The portion is identified by the given train and test indexes.

Parameters:
  • src_dataset – The existing source dataset

  • train_ids – The indexes for the new train set

  • test_ids – The indexes for the new test set

Returns:

A new dataset with “filtered” train and test sets
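
The copy-then-filter behaviour described for get_dataset_by_indices can be sketched on a small stand-in class. TinyDataset and select_by_indices are hypothetical names, and storing the samples as lists of pairs is an assumption; the real function operates on FSMLOneMeanStdDataset.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Tuple

Pair = Tuple[List[float], List[float]]


@dataclass
class TinyDataset:
    """Hypothetical stand-in holding only a train and a test list."""
    train_data: List[Pair] = field(default_factory=list)
    test_data: List[Pair] = field(default_factory=list)


def select_by_indices(src: TinyDataset, train_ids: List[int],
                      test_ids: List[int]) -> TinyDataset:
    """Copy src, then keep only the requested train and test samples."""
    new = copy.deepcopy(src)                                   # src stays untouched
    new.train_data = [new.train_data[i] for i in train_ids]    # filter the copy
    new.test_data = [new.test_data[i] for i in test_ids]
    return new


src = TinyDataset(
    train_data=[([1.0], [10.0]), ([2.0], [20.0]), ([3.0], [30.0])],
    test_data=[([4.0], [40.0]), ([5.0], [50.0])],
)
sub = select_by_indices(src, train_ids=[0, 2], test_ids=[1])
```

Filtering the deep copy rather than the source keeps the two datasets independent, which is one way such index-based subsets can be reused across folds.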

Module contents