Regression Models Reference
The Apheris Regression Models codebase is a toolkit for running regressions on Apheris.
This reference covers the Cox regression and logistic regression clients; future versions of Apheris may add further regression models to this package.
regression.logistic_regression.api_client
fit_lr(datasets, session, feature_cols, target_col, validation_set_col=None, validation_split=None, feature_selector_direction=None, num_rounds=5, num_steps_per_round=2)
Trains a logistic regression model using the specified datasets and session.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | Union[Iterable[FederatedDataFrame], FederatedDataFrame] | The datasets to be used for training. | required |
session | Union[SupervisedMLSession, LocalDebugMLSession] | The session object that defines compute_spec and dataset ids. | required |
num_rounds | int | The number of training rounds to perform. | 5 |
feature_cols | List[Union[int, float, str]] | Columns to be used as features in the model. | required |
target_col | Union[int, float, str] | Column to be used as the target variable. | required |
validation_set_col | Optional[Union[int, float, str]] | Column to be used for the validation set. Defaults to None. | None |
validation_split | float | Fraction of the data to be used for validation. Defaults to 0.2. | None |
feature_selector_direction | Optional[str] | Direction for feature selection ('forward' or 'backward'). Defaults to None. | None |
num_steps_per_round | int | Number of steps to perform per round. Defaults to 2. | 2 |
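The two validation parameters above describe two ways of choosing validation rows: an indicator column or a held-out fraction. The following is a minimal local sketch of that idea, not the package's implementation. The helper name `split_rows`, the `seed` argument, and the rule that the indicator column takes precedence over the fraction are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def split_rows(
    rows: Sequence[Row],
    validation_set_col: Optional[str] = None,
    validation_split: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Row], List[Row]]:
    """Return (training_rows, validation_rows). Hypothetical helper."""
    if validation_set_col is not None:
        # Assumption: an explicit indicator column decides membership row by row.
        train = [r for r in rows if not r[validation_set_col]]
        valid = [r for r in rows if r[validation_set_col]]
        return train, valid
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must lie strictly between 0 and 1")
    # Otherwise hold out a random fraction of the rows.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_split)
    return shuffled[n_valid:], shuffled[:n_valid]


# Ten toy rows; every fifth row is flagged as validation data.
rows = [{"age": 30 + i, "is_val": i % 5 == 0} for i in range(10)]
by_column = split_rows(rows, validation_set_col="is_val")
by_fraction = split_rows(rows, validation_split=0.2)
```

Both calls hold out two of the ten rows here; they differ only in which rows are chosen.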
Returns:
Name | Type | Description |
---|---|---|
results | dict | Dictionary containing the model parameters as results of the training process. |
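Because a federated training job can take a while to fail, it may help to check the arguments locally before calling fit_lr. The sketch below assembles the keyword arguments from the signature above and applies a few sanity checks. The helper `build_fit_lr_kwargs` and the checks themselves are assumptions for illustration; the package may enforce different or additional rules.

```python
from typing import Any, Dict, List, Optional, Union

Column = Union[int, float, str]


def build_fit_lr_kwargs(
    feature_cols: List[Column],
    target_col: Column,
    validation_set_col: Optional[Column] = None,
    validation_split: Optional[float] = None,
    feature_selector_direction: Optional[str] = None,
    num_rounds: int = 5,
    num_steps_per_round: int = 2,
) -> Dict[str, Any]:
    """Check arguments locally and return them as keyword arguments.

    Hypothetical helper; the checks are assumptions, not library behavior.
    """
    if not feature_cols:
        raise ValueError("feature_cols must name at least one column")
    if target_col in feature_cols:
        raise ValueError("target_col must not also be a feature column")
    if feature_selector_direction not in (None, "forward", "backward"):
        raise ValueError(
            "feature_selector_direction must be 'forward', 'backward' or None"
        )
    if validation_split is not None and not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must lie strictly between 0 and 1")
    if num_rounds < 1 or num_steps_per_round < 1:
        raise ValueError("num_rounds and num_steps_per_round must be positive")
    return {
        "feature_cols": feature_cols,
        "target_col": target_col,
        "validation_set_col": validation_set_col,
        "validation_split": validation_split,
        "feature_selector_direction": feature_selector_direction,
        "num_rounds": num_rounds,
        "num_steps_per_round": num_steps_per_round,
    }


kwargs = build_fit_lr_kwargs(
    ["age", "bmi"], "outcome", feature_selector_direction="forward"
)
# With a real session and FederatedDataFrame objects, the call would then be:
# results = fit_lr(datasets=datasets, session=session, **kwargs)
```

The column names `age`, `bmi`, and `outcome` are placeholders.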
validate_lr(datasets, session, feature_cols, ground_truth_col, modelparameter)
Validates the model based on the specified validation dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | FederatedDataFrame | The dataset to be used for prediction. | required |
session | Union[SupervisedMLSession, LocalDebugMLSession] | The session object that defines compute_spec and dataset ids. | required |
feature_cols | List[Union[int, float, str]] | Columns to be used as features in the model. | required |
ground_truth_col | Union[int, float, str] | Column to be used as the ground truth. | required |
modelparameter | dict | The model parameters to be used for prediction, as a dictionary. This can be the output of the fit_lr function. | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict | Dictionary containing the predicted values. |
regression.cox.api_client
fit_coxph(datasets, session, time_col, target_col, validation_set_col=None, max_time=-1, num_rounds=5, num_steps_per_round=2)
Runs the training job that fits a Cox regression model to the given federated datasets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | Union[Iterable[FederatedDataFrame], FederatedDataFrame] | A single FederatedDataFrame or a list of FederatedDataFrames pointing to the datasets to be used for training. | required |
session | Union[SupervisedMLSession, LocalDebugMLSession] | Session object that defines compute_spec and dataset ids. | required |
num_rounds | int | Number of training rounds. | 5 |
time_col | Union[int, float, str] | Name of the integer-valued time column. | required |
target_col | Union[int, float, str] | Name of the ground-truth column. | required |
validation_set_col | Optional[Union[int, float, str]] | Name of the boolean-valued validation set indicator column. If None, a validation set is obtained by a train/test split of 20%. | None |
max_time | int | Maximum time over all datasets. If not given, it is computed in a preliminary max computation. | -1 |
num_steps_per_round | int | Number of steps per federated round. | 2 |
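The max_time default can be pictured as a small federated reduction: each site reports the largest value of its own time column and the aggregator takes the maximum of those. The sketch below simulates that with plain lists. The function names and the rule that any negative max_time means "not given" are assumptions; the package's actual preliminary computation is not shown on this page.

```python
from typing import Iterable, List


def local_max(times: Iterable[int]) -> int:
    """Largest time value at one site (only this scalar leaves the site)."""
    return max(times)


def resolve_max_time(site_times: List[List[int]], max_time: int = -1) -> int:
    """Return the supplied max_time, or compute it across all sites.

    Assumption: a negative value (the default is -1) means "not given".
    """
    if max_time >= 0:
        return max_time
    # Aggregate the per-site maxima into a global maximum.
    return max(local_max(times) for times in site_times)


# Toy time columns for three simulated sites.
site_times = [[3, 12, 7], [5, 20], [1, 9, 14]]
computed = resolve_max_time(site_times)
supplied = resolve_max_time(site_times, max_time=30)
```

Passing max_time explicitly skips the extra pass over the data, which is presumably why the parameter exists.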
Returns:
Type | Description |
---|---|
dict | A dictionary with three keys. |
Raises:
Type | Description |
---|---|
RuntimeError | If the job cannot be created. |
TimeoutError | If the job takes longer than the supplied timeout. |
ResultsNotFound | If the job did not complete due to an error. In this case, please check the supplied logs for more details. |
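The three exceptions listed under Raises can be handled separately so that a failed job yields a clear status. The following sketch shows one possible pattern. It defines a local stand-in for ResultsNotFound so that it runs without the package installed, and the wrapper `run_job` and its status strings are hypothetical.

```python
from typing import Any, Callable, Dict


class ResultsNotFound(Exception):
    """Local stand-in for the ResultsNotFound exception listed under misc."""


def run_job(job: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run a training callable and map each documented error to a status."""
    try:
        return {"status": "ok", "results": job()}
    except TimeoutError as err:
        # The job took longer than the supplied timeout.
        return {"status": "timeout", "detail": str(err)}
    except ResultsNotFound as err:
        # The job ran but ended with an error; the logs hold the details.
        return {"status": "failed", "detail": str(err)}
    except RuntimeError as err:
        # The job could not be created at all.
        return {"status": "not_created", "detail": str(err)}


def failing_job() -> Dict[str, Any]:
    raise ResultsNotFound("job ended with an error")


failed = run_job(failing_job)
succeeded = run_job(lambda: {"example": 1})
# In real use the callable would wrap the training call, for example:
# run_job(lambda: fit_coxph(datasets, session, time_col, target_col))
```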
validate_cox(datasets, session, time_col, ground_truth_col, modelparameter)
Validates the model based on the specified validation datasets and the ground_truth_col column. Validation uses the default scoring function of the lifelines CoxPH model, which is the average partial log-likelihood.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | FederatedDataFrame | The dataset to be used for prediction. | required |
session | Union[SupervisedMLSession, LocalDebugMLSession] | The session object that defines compute_spec and dataset ids. | required |
time_col | Union[int, float, str] | Column to be used as the time column for the Cox inference. The time column should be integer valued. | required |
ground_truth_col | Union[int, float, str] | Column to be used as the ground truth. | required |
modelparameter | dict | The model parameters to be used for prediction, as a dictionary. This can be the output of the fit_coxph function. | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict | Dictionary containing the predicted values. |
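To make the scoring function more concrete, here is a minimal pure-Python sketch of an average partial log-likelihood for a Cox model. It is an illustration under stated assumptions, not the code used by lifelines or by this package: it uses the plain (Breslow-style) risk-set sum, which is exact only without tied event times, and it averages over all observations. The function name and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def average_partial_log_likelihood(
    times: Sequence[int],
    events: Sequence[bool],
    covariates: Sequence[Sequence[float]],
    beta: Sequence[float],
) -> float:
    """Cox partial log-likelihood divided by the number of observations."""
    n = len(times)
    # Linear predictor x_i . beta for every observation.
    eta: List[float] = [
        sum(x * b for x, b in zip(row, beta)) for row in covariates
    ]
    total = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored observations contribute only via risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [eta[j] for j in range(n) if times[j] >= times[i]]
        total += eta[i] - math.log(sum(math.exp(v) for v in risk))
    return total / n


# Toy data: three subjects, one covariate, the last subject is censored.
times = [1, 2, 3]
events = [True, True, False]
covariates = [[0.0], [1.0], [0.0]]
score = average_partial_log_likelihood(times, events, covariates, beta=[0.0])
```

With all coefficients at zero every subject has equal risk, so the score reduces to -log(6)/3 for this toy data: risk sets of size 3 and 2, averaged over three observations.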
regression.session
LocalDebugMLSession
Bases: LocalDebugSimpleStatsSession
Local session object that connects the regression models with the NVFlare simulator and supports running a simulation of a SupervisedMLJobDefinition.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | List[LocalDebugDataset] | A list of LocalDebugDatasets that should be included in the session. Each dataset will be assigned to its own simulated Compute Gateway. | required |
workspace | Optional[Union[str, Path]] | The path to the workspace where the results will be stored. | None |
max_threads | Optional[int] | The maximum number of threads to be used in the computation. By default this is set to the number of Compute Gateways in the computation, but can be set to 1 for improved debugging with PDB. | None |
SupervisedMLSession
Bases: SimpleStatsSession
Session object that connects the regression models with the job API and supports running a SupervisedMLJobDefinition.
Can be instantiated manually for a running compute spec, but will typically be created using the provision function from regression.session.provision.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
compute_spec_id | UUID | The UUID of the compute spec running the Regression Models. | required |
__init__(compute_spec_id)
Create SupervisedMLSession for a given compute_spec_id.
regression.session.provision
provision(dataset_ids, client_n_cpu=0.5, client_memory=1000, server_n_cpu=0.5, server_memory=1000, modelversion=None)
Create and activate a compute spec to run remote regression models on Apheris.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_ids | List[str] | A list of Apheris dataset IDs. | required |
client_n_cpu | float | The fractional number of CPUs to request in the compute spec for the Compute Gateways. Consider increasing this if your computation takes too long. | 0.5 |
client_memory | int | The amount of memory to request in the compute spec for the Compute Gateways. Consider increasing this if your computation runs out of memory. | 1000 |
server_n_cpu | float | The fractional number of CPUs to request in the compute spec for the Orchestrator. Consider increasing this if your computation takes too long during aggregation. | 0.5 |
server_memory | int | The amount of memory to request in the compute spec for the Orchestrator. Consider increasing this if your computation runs out of memory in the Orchestrator. | 1000 |
modelversion | Optional[str] | The version of regression models to use for this session. Defaults to the latest available version. | None |
|
Returns:
Type | Description |
---|---|
SupervisedMLSession | A SupervisedMLSession for the provisioned compute spec. |
misc
ResultsNotFound
Bases: Exception