Apheris Statistics Reference🔗
apheris_stats.simple_stats🔗
corr(datasets, session, column_names, global_means=None, group_by=None, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Computes the federated Pearson correlation matrix for a given set of columns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | Datasets that the computation shall be run on. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `Iterable[str]` | Set of columns to compute the correlation matrix over. | required |
| `global_means` | `Dict[Union[str, Tuple], Union[int, float, Number]]` | Means over all datasets for the given column names. If `global_means` is None, they are determined automatically in a separate pre-run. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result as a pandas DataFrame with the correlation matrix of the specified columns. |
Example

```python
corr_matrix = simple_stats.corr(
    datasets=[transformations_dataset_essex, transformations_dataset_norfolk],
    column_names=['age', 'length of covid infection'],
    global_means={'age': 50, 'length of covid infection': 10},
    session=session
)
```
cov(datasets, session, column_names, global_means=None, group_by=None, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Computes the federated covariance matrix for a given set of columns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | Datasets that the computation shall be run on. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `Iterable[str]` | Set of columns to compute the covariance matrix over. | required |
| `global_means` | `Dict[Union[str, Tuple], Union[int, float, Number]]` | Means over all datasets for the given column names. If `global_means` is None, they are determined automatically in a separate pre-run. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result as a pandas DataFrame with the covariance matrix of the specified columns. |
Example

```python
cov_matrix = simple_stats.cov(
    datasets=[transformations_dataset_essex, transformations_dataset_norfolk],
    column_names=['age', 'length of covid infection'],
    global_means={'age': 50, 'length of covid infection': 10},
    session=session
)
```
count_column_value(datasets, session, value, *, column_names=None, column_name=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns how often value appears in a certain column of the datasets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names over which the count values shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column over which the count values shall be calculated. | `None` |
| `value` | `Union[str, int, float, bool]` | The value to be counted. | required |
| `aggregation` | `bool` | Defines whether the counts should be aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Union[int64, dict]` | Statistical result. |
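For illustration, the per-dataset step corresponds to the following local pandas operation. This is only a sketch: the dataset and column names are hypothetical, and in a real run the counting happens remotely on `FederatedDataFrame`s.

```python
import pandas as pd

# Hypothetical per-gateway datasets.
df_essex = pd.DataFrame({"smoker": ["yes", "no", "yes"]})
df_norfolk = pd.DataFrame({"smoker": ["yes", "no", "no"]})

# Per-dataset count of one value in one column ...
counts = [int((df["smoker"] == "yes").sum()) for df in (df_essex, df_norfolk)]

# ... with aggregation=True corresponding to the sum over all datasets.
total = sum(counts)  # 3
```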
count_group_by(datasets, session, *, column_names=None, column_name=None, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Function that counts categorical values of a table column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names over which the count group by shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `Union[ColumnIdentifier, List[ColumnIdentifier]] \| None` | (deprecated) Name of the column over which the count group by values shall be calculated. | `None` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. Contains a pandas DataFrame with the counts summed over the datasets. |
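The "counts summed over the datasets" behavior can be sketched locally with pandas `value_counts`. Dataset and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical per-gateway datasets with a categorical column.
df_essex = pd.DataFrame({"blood_type": ["A", "B", "A"]})
df_norfolk = pd.DataFrame({"blood_type": ["A", "0", "0"]})

# Per-dataset categorical counts, summed element-wise over the datasets.
summed = (
    df_essex["blood_type"].value_counts()
    .add(df_norfolk["blood_type"].value_counts(), fill_value=0)
    .astype(int)
)
```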
count_null(datasets, session, *, column_names=None, column_name=None, group_by=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns the number of occurrences of NA values (such as None or numpy.NaN) and the number of non-NA values in the datasets. NAs are counted based on pandas' isna() and notna() functions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names over which the NA values shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column over which the NA values shall be calculated. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | (optional) Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `aggregation` | `bool` | Defines whether the counts should be aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
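The per-dataset counting uses the same semantics as pandas' `isna()`/`notna()`, which can be sketched locally (column name hypothetical):

```python
import numpy as np
import pandas as pd

# A column with two NA values (None and numpy.nan) and two non-NA values.
df = pd.DataFrame({"age": [34.0, None, 51.0, np.nan]})

na_count = int(df["age"].isna().sum())       # 2
non_na_count = int(df["age"].notna().sum())  # 2
```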
describe(datasets, session, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Create a description of a dataset
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Dict[Any, Dict[str, DataFrame]]` | Statistical description of the datasets. |
histogram(datasets, session, column_name, bins, group_by=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns a histogram for the given datasets
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_name` | `str` | Name of the column for which the histogram shall be generated. | required |
| `bins` | `Union[int, Iterable[float]]` | Int or sequence of scalars. If `bins` is an int, it defines the number of bins with equal width. If it is a sequence, its content defines the bin edges. | required |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `aggregation` | `bool` | If True, the histogram is aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |
Returns: statistical result
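The two forms of the `bins` parameter behave like `numpy.histogram`, which can be sketched locally (the data below is hypothetical; in a federated run, per-dataset counts over shared edges would be summed when `aggregation=True`):

```python
import numpy as np

ages = np.array([22, 35, 41, 58, 63])

# bins as an int: equal-width bins spanning the data range
counts_int, edges = np.histogram(ages, bins=4)

# bins as a sequence of scalars: explicit bin edges
counts_seq, _ = np.histogram(ages, bins=[20, 40, 60, 80])  # → [2, 2, 1]
```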
iqr_column(datasets, session, global_min_max, *, column_names=None, column_name=None, group_by=None, n_bins=100, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Function to approximate the interquartile range (IQR) over multiple datasets. Internally, first a histogram with a user-defined number of bins and user-defined upper and lower bounds is created over all datasets. Based on this histogram the IQR is approximated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names to compute the interquartile range (IQR) over. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column to compute the interquartile range (IQR) over. | `None` |
| `global_min_max` | `Iterable[float]` | A list that contains the global minimum and maximum values of the combined datasets. This needs to be computed separately, for example with the functions `min_column` and `max_column`. | required |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `n_bins` | `int` | Number of bins for the internal histogram. | `100` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
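The histogram-based approximation described above can be sketched locally with numpy. This is an illustrative sketch of the idea, not the exact federated implementation: a histogram with shared global bounds is built, and the quartiles are read off its empirical CDF. In the federated setting, per-dataset counts over the same edges would be summed before the CDF step.

```python
import numpy as np

def approx_iqr(values, global_min, global_max, n_bins=100):
    """Approximate the IQR from a histogram with shared global bounds."""
    counts, edges = np.histogram(values, bins=n_bins, range=(global_min, global_max))
    cdf = np.cumsum(counts) / counts.sum()
    # Read the quartiles off the bin edges where the CDF first reaches them.
    q1 = edges[np.searchsorted(cdf, 0.25)]
    q3 = edges[np.searchsorted(cdf, 0.75)]
    return q3 - q1
```

The approximation error shrinks as `n_bins` grows relative to the spread of the data.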
kaplan_meier(datasets, session, duration_column_name, event_column_name, group_by=None, plot=False, stepsize=1, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Create a Kaplan-Meier survival statistic.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `duration_column_name` | `str` | Duration column for the survival function. | required |
| `event_column_name` | `str` | Event column, indicating death. | required |
| `group_by` | `str` | Grouping column. | `None` |
| `plot` | `bool` | If True, results will be displayed using pd.DataFrame.plot(). | `False` |
| `stepsize` | `Union[int, Dict[str, int]]` | Histogram bin size; can be an integer or a dictionary mapping group names (i.e. elements that are found in the `group_by` column) to bin sizes. | `1` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
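For intuition, the underlying Kaplan-Meier product-limit estimator can be sketched in plain numpy (a local, non-federated sketch; the durations and events below are hypothetical):

```python
import numpy as np

def km_survival(durations, events):
    """Product-limit estimate S(t) at each distinct event time."""
    durations = np.asarray(durations)
    events = np.asarray(events, dtype=bool)
    surv, out = 1.0, {}
    for t in np.unique(durations[events]):
        at_risk = int((durations >= t).sum())      # subjects still at risk at t
        deaths = int(((durations == t) & events).sum())
        surv *= 1.0 - deaths / at_risk             # multiply survival fractions
        out[float(t)] = surv
    return out
```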
max_column(datasets, session, *, column_names=None, column_name=None, group_by=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns the max over a specified column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names over which the max shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column over which the max shall be calculated. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | (optional) Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `aggregation` | `bool` | Defines whether the max should be aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
mean_column(datasets, session, *, column_names=None, column_name=None, group_by=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns the mean over a specified column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of columns over which the mean shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column over which the mean shall be calculated. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | (optional) Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `aggregation` | `bool` | Defines whether the mean should be aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
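For intuition: a federated mean is aggregated from per-dataset (sum, count) pairs rather than by averaging per-dataset means, so that datasets of different sizes are weighted correctly. The partial results below are hypothetical:

```python
# Hypothetical per-gateway partial results: (sum of column, row count).
sums_counts = [(150.0, 3), (100.0, 1)]

total_sum = sum(s for s, _ in sums_counts)     # 250.0
total_count = sum(c for _, c in sums_counts)   # 4
federated_mean = total_sum / total_count       # 62.5

# Averaging the per-dataset means instead would be biased:
naive_mean_of_means = (150.0 / 3 + 100.0 / 1) / 2   # 75.0
```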
median_with_confidence_intervals_column(datasets, session, global_min_max, *, column_names=None, column_name=None, group_by=None, n_bins=100, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Function to approximate the median and the 95% confidence interval over multiple datasets. Internally, first a histogram with a user-defined number of bins and user-defined upper and lower bounds is created over all datasets. Based on this histogram the median and the confidence interval are approximated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names to compute the median over. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column to compute the median over. | `None` |
| `global_min_max` | `List[float] \| Dict[ColumnIdentifier, List[float]]` | A list that contains the global minimum and maximum values of the combined datasets, or a dictionary mapping column names to their global minimum and maximum values. This needs to be computed separately, for example with the functions `min_column` and `max_column`. | required |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `n_bins` | `int` | Number of bins for the internal histogram. | `100` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result with the approximated median and the 95% confidence interval. |
median_with_quartiles(datasets, session, global_min_max, *, column_names=None, column_name=None, group_by=None, n_bins=100, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Function to approximate the median and the 1st and 3rd quartile over multiple datasets. Internally, first a histogram with a user-defined number of bins and user-defined upper and lower bounds is created over all datasets. Based on this histogram above-mentioned values are approximated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names to compute the median over. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column to compute the median over. | `None` |
| `global_min_max` | `List[float] \| Dict[ColumnIdentifier, List[float]]` | A list that contains the global minimum and maximum values of the combined datasets, or a dictionary mapping column names to their global minimum and maximum values. This needs to be computed separately, for example with the functions `min_column` and `max_column`. | required |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `n_bins` | `int` | Number of bins for the internal histogram. | `100` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result; contains a tuple with the 1st quartile, the median, and the 3rd quartile. |
min_column(datasets, session, *, column_names=None, column_name=None, group_by=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns the min over a specified column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of columns over which the min shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column over which the min shall be calculated. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | (optional) Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `aggregation` | `bool` | Defines whether the min should be aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
pca_transformation(datasets, session, column_names, n_components, handle_outliers=PrivacyHandlingMethod.RAISE.value)
🔗
Computes the principal components transformation matrix of given list of datasets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | | Datasets that the computation shall be run on. | required |
| `session` | | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | | Set of columns. | required |
| `n_components` | | Number of components to keep. | required |
| `handle_outliers` | | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| | Transformation matrix as a pandas DataFrame. |
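For intuition, a principal-components transformation matrix can be sketched locally via SVD of the centered data. This is a non-federated sketch with hypothetical data; the federated version aggregates the required statistics across datasets instead of pooling raw rows:

```python
import numpy as np

# Hypothetical data: 4 rows, 2 features; variance is largest along the x axis.
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])

Xc = X - X.mean(axis=0)                    # center the columns
_, _, vt = np.linalg.svd(Xc, full_matrices=False)

n_components = 1
transform = vt[:n_components].T            # shape: (n_features, n_components)
projected = Xc @ transform                 # project data onto the components
```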
shape(datasets, session, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns the shape of the datasets
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
squared_errors_by_column(datasets, session, *, column_names=None, column_name=None, global_mean=0.0, group_by=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns the sum over the squared difference from global_mean over a specified
column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names over which the squared errors computation shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column over which the squared errors computation shall be calculated. | `None` |
| `global_mean` | `Union[float, Dict[ColumnIdentifier, float], Dict[ColumnIdentifier, Dict]]` | The deviation of each element from this value is squared and then added up. The mean can be computed via apheris.simple_stats.mean_column. | `0.0` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | (optional) Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `aggregation` | `bool` | Defines whether the operation should be aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
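Combining `mean_column`, `squared_errors_by_column`, and the total row count yields a federated (population) variance. The sketch below uses hypothetical partial results, not output of a real run:

```python
global_mean = 4.0                      # e.g. obtained from mean_column
per_dataset_sq_err = [10.0, 6.0]       # e.g. per-dataset squared_errors_by_column
total_count = 8                        # total number of rows over all datasets

# Population variance: sum of squared deviations over the total count.
variance = sum(per_dataset_sq_err) / total_count   # 2.0
std_dev = variance ** 0.5
```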
sum_column(datasets, session, *, column_names=None, column_name=None, group_by=None, aggregation=True, handle_outliers=PrivacyHandlingMethod.RAISE)
🔗
Returns the sum over a specified column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `column_names` | `List[ColumnIdentifier] \| None` | List of column names over which the sum shall be calculated. Can only be None if the deprecated `column_name` is used. | `None` |
| `column_name` | `ColumnIdentifier \| None` | (deprecated) Name of the column over which the sum shall be calculated. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | (optional) Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `aggregation` | `bool` | Defines whether the sum should be aggregated over all datasets. | `True` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result. |
tableone(datasets, session, numerical_columns=None, numerical_nonnormal_columns=None, categorical_columns=None, group_by=None, n_bins=100, handle_outliers=PrivacyHandlingMethod.RAISE, tolerate_client_failures=False)
🔗
Create an overview statistic
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `Union[Iterable[FederatedDataFrame], FederatedDataFrame]` | List of `FederatedDataFrame`s that define the pre-processing of the individual datasets. | required |
| `session` | `Union[SimpleStatsSession, LocalDummySimpleStatsSession, LocalDebugSimpleStatsSession]` | For remote runs, use a `SimpleStatsSession` that refers to a cluster. | required |
| `numerical_columns` | `Iterable[str]` | Names of columns for which mean and standard deviation shall be calculated. | `None` |
| `numerical_nonnormal_columns` | `Iterable[str]` | Names of columns for which the median, as well as the 1st and 3rd quartile, shall be calculated. These values are approximated via a histogram. | `None` |
| `categorical_columns` | `Iterable[str]` | Names of categorical columns whose value counts shall be counted. | `None` |
| `group_by` | `Union[Hashable, Iterable[Hashable]]` | Mapping, label, or list of labels, used to group before aggregation. | `None` |
| `n_bins` | `int` | Number of bins of the histogram that is used to approximate the median and the 1st and 3rd quartile of columns in `numerical_nonnormal_columns`. | `100` |
| `handle_outliers` | `Union[PrivacyHandlingMethod, str]` | Specifies the handling method in case of bounded privacy violations: `PrivacyHandlingMethod.FILTER` filters out all groups that violate the privacy bound, `PrivacyHandlingMethod.FILTER_DATASET` removes the entire dataset from the federated computation, and `PrivacyHandlingMethod.RAISE` raises a `PrivacyException`. Default is `PrivacyHandlingMethod.RAISE`. | `RAISE` |
| `tolerate_client_failures` | `bool` | If True, the computation will continue even if some clients fail. If False, the computation will raise an exception if any client fails. | `False` |

Returns:

| Type | Description |
|---|---|
| `Any` | Statistical result; contains a pandas DataFrame with the tableone statistics over the datasets. |
apheris_stats.simple_stats.exceptions🔗
ObjectNotFound
🔗
Bases: ApherisException
Raised when trying to access an object that does not exist.
InsufficientPermissions
🔗
Bases: Exception
Raised when an operation does not have sufficient permissions to be performed.
PrivacyException
🔗
Bases: Exception
Raised when a privacy mechanism required by the data provider(s) fails to be applied, is violated, or is incompatible with the user-chosen settings.
RestrictedPreprocessingViolation
🔗
Bases: PrivacyException
Raised when a prohibited command is requested to be executed due to restricted preprocessing.
apheris_stats.simple_stats.util🔗
LocalDebugDataset
🔗
__init__(dataset_id, gateway_id, dataset_fpath, permissions=None, policy=None)
🔗
Dataset class for LocalDebugSimpleStatsSessions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_id` | `str` | Name of the dataset. Allowed characters: letters, numbers, "_", "-", "." | required |
| `gateway_id` | `str` | Name of a hypothetical gateway that this dataset resides on. Datasets with the same `gateway_id` will be launched into the same client. Allowed characters: letters, numbers, "_", "-", "." | required |
| `dataset_fpath` | `str` | Absolute filepath to data. | required |
| `policy` | `dict` | Policy dict. If not provided, we use empty policies. | `None` |
| `permissions` | `dict` | Permissions dict. If not provided, we allow all operations. | `None` |
LocalDebugSimpleStatsSession
🔗
Bases: LocalSimpleStatsSession
For debugging Apheris Statistics computations locally on your machine. You can work
with local files and custom policies and custom permissions. Inject the
LocalDebugSimpleStatsSession into a simple-stats computation.
To use the PDB debugger, it is necessary to set max_threads=1.
__init__(datasets, workspace=None, max_threads=None, verbose=False)
🔗
Inits a LocalDebugSimpleStatsSession.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `datasets` | `List[LocalDebugDataset]` | A list of `LocalDebugDataset` objects. | required |
| `workspace` | `Union[str, Path]` | Path to use as workspace. If not provided, a temporary directory is used as workspace, and information is lost after a statistical query is finished. | `None` |
| `max_threads` | `Optional[int]` | The maximum number of parallel threads to use for the Flare simulator. This should be between 1 and the number of gateways used by the session. Note that debugging may fail for max_threads > 1. Default=1. | `None` |
| `verbose` | `bool` | If True, the simulator will print logs to the console. If False, the simulator will not print logs to the console, but they can be retrieved from the workspace after the simulation has finished. | `False` |
LocalDummySimpleStatsSession
🔗
Bases: LocalSimpleStatsSession
__init__(dataset_ids=None, workspace=None, policies=None, permissions=None, max_threads=None, verbose=False)
🔗
Inits a LocalDummySimpleStatsSession. When you use the session, DummyData,
policies and permissions are downloaded to your machine. Then a simulator runs on
your local machine. You can step into the code with a Debugger to investigate
problems.
Instead of using the original policies and permissions, you can use custom
ones. This might be necessary if the DummyData datasets are too small to fulfill
the privacy constraints of your query. This comes with the downside that your
simulation deviates from a "real" execution.
To use the PDB debugger, it is necessary to set max_threads=1.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_ids` | `List[str]` | List of dataset IDs. For each dataset ID, a client will be spun up that uses the dataset's DummyData as its data. We automatically apply the privacy policies and permissions of the specified datasets. | `None` |
| `workspace` | `Union[str, Path]` | Path to use as workspace. If not provided, a temporary directory is used as workspace, and information is lost after a statistical query is finished. | `None` |
| `policies` | `Optional[Dict[str, dict]]` | Dictionary that defines an asset policy (value) per dataset ID (key) in `dataset_ids`. | `None` |
| `permissions` | `Optional[Dict[str, dict]]` | Dictionary that defines permissions (value) per dataset ID (key) in `dataset_ids`. | `None` |
| `max_threads` | `Optional[int]` | The maximum number of parallel threads to use for the Flare simulator. This should be between 1 and the number of gateways used by the session. Note that debugging may fail for max_threads > 1. Default=1. | `None` |
| `verbose` | `bool` | If True, the simulator will print logs to the console. If False, the simulator will not print logs to the console, but they can be retrieved from the workspace after the simulation has finished. | `False` |
provision(dataset_ids, client_n_cpu=0.5, client_memory=1000, server_n_cpu=0.5, server_memory=1000)
🔗
Create and activate a cluster of Compute Clients and a Compute Aggregator.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_ids` | `List[str]` | List of dataset IDs. For each dataset ID, a Compute Client will be spun up. | required |
| `client_n_cpu` | `float` | Number of vCPUs of Compute Clients. | `0.5` |
| `client_memory` | `int` | Memory of Compute Clients [MByte]. | `1000` |
| `server_n_cpu` | `float` | Number of vCPUs of Compute Aggregators. | `0.5` |
| `server_memory` | `int` | Memory of Compute Aggregators [MByte]. | `1000` |

Returns:
SimpleStatsSession - use this session with simple statistics functions like
apheris_stats.simple_stats.tableone.
PrivacyHandlingMethod
🔗
Bases: Enum
Defines the handling method when bounded privacy is violated.
Attributes:

| Name | Type | Description |
|---|---|---|
| `FILTER` | | Filters out all groups that violate the privacy bound. |
| `FILTER_DATASET` | | Removes the entire dataset from the federated computation in case of privacy violations. |
| `ROUND` | | Only valid for counts; rounds to the privacy bound or 0. |
| `RAISE` | | Raises a PrivacyException if the privacy bound was violated. |
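The ROUND option can be sketched as follows. This is an illustrative assumption about the semantics (in particular the tie-breaking between the bound and 0), not the actual implementation:

```python
def round_count(count, privacy_bound):
    """Sketch of ROUND semantics for counts: a count below the privacy bound
    is rounded to the bound or to 0 (nearest of the two is an assumption)."""
    if count >= privacy_bound:
        return count          # counts at or above the bound pass through
    return privacy_bound if count >= privacy_bound / 2 else 0
```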
ResultsNotFound
🔗
Bases: Exception
SimpleStatsSession
🔗
Bases: StatsSession
__init__(compute_spec_id)
🔗
Inits a SimpleStatsSession that connects to a running cluster of Compute Clients
and an Aggregator. If you have no provisioned/activated cluster yet, use
apheris_stats.simple_stats.util.provision.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `compute_spec_id` | `UUID \| str` | Compute spec ID that corresponds to a running cluster of Compute Clients and an Aggregator. (If you have no provisioned/activated cluster yet, use `apheris_stats.simple_stats.util.provision`.) | required |
get_module_functions(module)
🔗
Return a list of functions in module.