fsml.learn.data_management package

Submodules

fsml.learn.data_management.dataloader module

class fsml.learn.data_management.dataloader.FSMLDataLoader(dataset: FSMLMeanStdDataset | FSMLOneMeanStdDataset, batch_size: int = 1, shuffle: bool = True, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, **kwargs)

Bases: DataLoader

A custom data loader for the FSML datasets.

fsml.learn.data_management.dataloader.collate(_batch: Generic[T]) → Tuple[Tensor, Tensor]

A function that replaces the default collate function of the DataLoader. It takes a batch of any size and creates two tensors by vertically stacking all the input data and all the output data.

Parameters: _batch – The current batch given as input by the data loader
Returns: (Input data, Output data)
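
The stacking step described above can be sketched without PyTorch. This is a minimal illustration, not the package's implementation: plain nested lists stand in for tensors, and the name collate_rows and the sample values are hypothetical. With real tensors the two results would typically be built with a call such as torch.vstack.

```python
from typing import List, Sequence, Tuple

Row = List[float]
Sample = Tuple[Row, Row]  # (input parameters, output mean/std values)


def collate_rows(batch: Sequence[Sample]) -> Tuple[List[Row], List[Row]]:
    """Stack per-sample rows into one input matrix and one output matrix."""
    # One row per sample, in batch order; lists stand in for tensors here.
    inputs = [list(x) for x, _ in batch]
    outputs = [list(y) for _, y in batch]
    return inputs, outputs


# Hypothetical batch of two samples with two parameters and two outputs each.
batch = [([0.1, 0.2], [1.0, 0.5]), ([0.3, 0.4], [2.0, 0.7])]
inputs, outputs = collate_rows(batch)
```

Both results have one row per sample, so a batch of size B yields a B-row input matrix and a B-row output matrix.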

fsml.learn.data_management.dataset module

class fsml.learn.data_management.dataset.FSMLMeanStdDataset(data_path: str, mixup: bool = False, train: float = 0.8, test: float = 0.2)

Bases: Dataset

A class for the FSMLMeanStdDataset

This dataset represents a mix of different datasets, each given by a CSV file contained in the input data_path. It can be used to train one model per dataset, or a single model on the entire dataset, i.e. the merge of all N contained datasets. The mixing is controlled by the mixup parameter: when mixup is true, the __getitem__ method returns a mixture of samples.

Attributes

files (List[str]) – A list with all the files
num_files (int) – The total number of files in the folder
count_per_file (Dict[str, int]) – The total number of elements for each file
train_datasets (List[FSMLOneMeanStdDataset]) – A list with all the datasets for training
test_datasets (List[FSMLOneMeanStdDataset]) – A list with all the datasets for testing
max_parameters (int) – The maximum number of parameters across the different datasets
max_outputs (int) – The maximum output size across the different datasets
total_data (int) – The total amount of data when mixing all the datasets
train_size (int) – The total amount of training data
test_size (int) – The total amount of test data
is_train (bool) – If True, use the training data; if False, use the test data

test() → None: Set the dataset for testing instead of training
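
The source does not say how a mixed dataset resolves an index to a sample. As one possible sketch, assuming the files are simply concatenated in order, a global index can be mapped to a file and a local index using per-file counts like those in count_per_file. The function name locate and the file names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List, Tuple


def locate(index: int, count_per_file: Dict[str, int]) -> Tuple[str, int]:
    """Map a global sample index to (file name, index inside that file)."""
    names: List[str] = list(count_per_file)
    # Cumulative end offsets, e.g. counts 3 and 2 give [3, 5].
    ends = list(accumulate(count_per_file[n] for n in names))
    total = ends[-1] if ends else 0
    if not 0 <= index < total:
        raise IndexError(index)
    # First file whose end offset is strictly greater than the index.
    pos = bisect_right(ends, index)
    start = ends[pos - 1] if pos else 0
    return names[pos], index - start


# Hypothetical counts: three samples in a.csv, two in b.csv.
counts = {"a.csv": 3, "b.csv": 2}
first_of_b = locate(3, counts)
```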

class fsml.learn.data_management.dataset.FSMLOneMeanStdDataset(csv_file: str, train: float = 0.8, test: float = 0.2)

Bases: Dataset

This class represents an FSMLOneMeanStdDataset.

A class to represent the dataset of all the simulations. The dataset is structured so that, on request (i.e. indexing) or on iteration, it returns a pair of elements: the first is the input of the neural network, and the second is the ground truth against which the output of the network is compared in order to compute the loss and update the optimizer.

The dataset is initialized with just the fully qualified path that points to the folder containing the results of all the simulations performed previously. That folder holds a number of CSV files. Each file contains one row per simulation of the same model, a set of columns with lowercase names representing the parameters of the model, and a set of columns with uppercase names representing the mean and the standard deviation of each species.

On indexing, i.e. when the __getitem__() method is called with an index i, the dataset returns a tuple with two elements: the first is the input of the neural network and is composed of the parameters of the model, while the second is the ground truth and is composed of the means and the standard deviations.
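
The column convention above (lowercase names for parameters, uppercase names for means and standard deviations) can be illustrated with a small standard-library sketch. It is not the package's loader: the function load_rows, the column names, and the values are hypothetical, and plain lists stand in for tensors.

```python
import csv
import io
from typing import List, Tuple

Sample = Tuple[List[float], List[float]]  # (parameters, mean/std values)


def load_rows(text: str) -> Tuple[List[str], List[str], List[Sample]]:
    """Split CSV columns by name case and build (input, output) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    parameters = [c for c in columns if c.islower()]  # model parameters
    outputs = [c for c in columns if c.isupper()]     # mean and std columns
    samples = [
        ([float(row[c]) for c in parameters], [float(row[c]) for c in outputs])
        for row in reader
    ]
    return parameters, outputs, samples


# Hypothetical file content: two simulations of the same model.
CSV_TEXT = "k1,k2,A_MEAN,A_STD\n0.1,0.2,10.0,1.5\n0.3,0.4,12.0,2.0\n"
parameters, outputs, samples = load_rows(CSV_TEXT)
```

Indexing samples[i] then yields the same kind of pair that the text describes for __getitem__(): the parameters first, the ground-truth statistics second.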

Attributes

csv_file (str) – The input path of the CSV file with the data
input_data (List[List[float]]) – A list with all the input data (parameters)
input_size (int) – The size of the input data (i.e. the number of points in a single tensor)
output_data (List[List[float]]) – A list with all the output data (mean and std for each species)
output_size (int) – The size of the output data (i.e. the number of points in a single tensor)
num_data (int) – The total number of samples
parameters (List[str]) – A list with all the parameter names
outputs (List[str]) – A list with all the output names
train_data (Tuple[List[List[float]], List[float]]) – A list with all the train data
test_data (Tuple[List[List[float]], List[float]]) – A list with all the test data

test() → None: Set the dataset for testing

fsml.learn.data_management.dataset.get_dataset_by_indices(src_dataset: FSMLOneMeanStdDataset, train_ids: List[int], test_ids: List[int]) → FSMLOneMeanStdDataset

Creates a new dataset of the same type from an existing FSMLOneMeanStdDataset by copying it and keeping only a portion of its train and test sets. The portion is identified by the given train and test indices.

Parameters:

src_dataset – The existing source dataset
train_ids – The indices for the new train set
test_ids – The indices for the new test set

Returns:

A new dataset with “filtered” train and test sets
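
The copy-then-filter behaviour described above can be sketched as follows. This is an illustration under assumptions, not the package's code: TinyDataset is a hypothetical stand-in for FSMLOneMeanStdDataset that keeps only the two split attributes, and each split is assumed to be a pair of row lists (inputs, outputs).

```python
import copy
from dataclasses import dataclass, field
from typing import List, Tuple

Split = Tuple[List[List[float]], List[List[float]]]  # (inputs, outputs)


@dataclass
class TinyDataset:
    """Hypothetical stand-in for FSMLOneMeanStdDataset."""
    train_data: Split = field(default_factory=lambda: ([], []))
    test_data: Split = field(default_factory=lambda: ([], []))


def _select(split: Split, ids: List[int]) -> Split:
    """Keep only the rows at the given positions, in the given order."""
    inputs, outputs = split
    return [inputs[i] for i in ids], [outputs[i] for i in ids]


def subset_by_indices(
    src: TinyDataset, train_ids: List[int], test_ids: List[int]
) -> TinyDataset:
    new = copy.deepcopy(src)  # the source dataset is left untouched
    new.train_data = _select(src.train_data, train_ids)
    new.test_data = _select(src.test_data, test_ids)
    return new


src = TinyDataset(
    train_data=([[1.0], [2.0], [3.0]], [[10.0], [20.0], [30.0]]),
    test_data=([[4.0], [5.0]], [[40.0], [50.0]]),
)
sub = subset_by_indices(src, train_ids=[0, 2], test_ids=[1])
```

A helper of this shape is convenient for cross-validation, where each fold supplies its own train and test index lists over the same source dataset.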

Module contents