syft.frameworks.torch.linalg.lr

Module Contents

class syft.frameworks.torch.linalg.lr.EncryptedLinearRegression(crypto_provider: BaseWorker, hbc_worker: BaseWorker, precision_fractional: int = 6, fit_intercept: bool = True)

Multi-party linear regressor based on Jonathan Bloom’s algorithm. It performs linear regression using Secure Multi-Party Computation (SMPC). Although training is performed in SMPC, the final regression coefficients are made public, and predictions are computed in the clear on local or pointer tensors.

Reference: Section 2 of https://arxiv.org/abs/1901.09531

Parameters
  • crypto_provider – a BaseWorker providing crypto elements for AdditiveSharingTensors (used for SMPC) such as Beaver triples

  • hbc_worker – the “Honest but Curious” BaseWorker. SMPC operations in PySyft use SecureNN protocols, which are based on 3-party computation. To apply them with more than 3 parties, an “Honest but Curious” worker is needed. To perform the encrypted linear regression, the algorithm randomly chooses one of the workers in the pool and secret-shares all tensors with the chosen worker, the crypto provider and the “Honest but Curious” worker. The main role of this worker is to prevent the collusion that would be possible between two workers in the pool if the algorithm instead secret-shared the tensors with two randomly chosen workers and the crypto provider. An “Honest but Curious” worker is a legitimate participant in a communication protocol who does not deviate from the defined protocol but attempts to learn all possible information from legitimately received messages.

  • precision_fractional – precision chosen for FixedPrecisionTensors

  • fit_intercept – whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered)
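To make the roles of precision_fractional and the share-holding workers more concrete, here is a minimal pure-Python sketch of fixed-point encoding followed by additive secret sharing between two holders. The ring size, the base 10 and all helper names are assumptions made for illustration; this is not PySyft's implementation, it involves no real workers, and the crypto provider (which would supply Beaver triples for multiplications) is not modelled.

```python
import random

# Assumed parameters for illustration only (not taken from PySyft).
RING_SIZE = 2 ** 62          # shares live in the integers modulo RING_SIZE
PRECISION_FRACTIONAL = 6     # mirrors the default of the constructor argument


def encode(value, precision_fractional=PRECISION_FRACTIONAL):
    """Encode a float as a fixed-point integer in the ring."""
    return round(value * 10 ** precision_fractional) % RING_SIZE


def decode(encoded, precision_fractional=PRECISION_FRACTIONAL):
    """Decode a ring element to a float; the upper half of the ring is negative."""
    if encoded > RING_SIZE // 2:
        encoded -= RING_SIZE
    return encoded / 10 ** precision_fractional


def share(encoded, n_holders=2):
    """Split a ring element into additive shares, one per holder."""
    shares = [random.randrange(RING_SIZE) for _ in range(n_holders - 1)]
    shares.append((encoded - sum(shares)) % RING_SIZE)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original ring element."""
    return sum(shares) % RING_SIZE


# Hypothetical holders: the chosen pool worker and the hbc worker.
a_shares = share(encode(1.25))
b_shares = share(encode(-0.75))

# Addition needs no communication: each holder adds its own shares.
sum_shares = [(a + b) % RING_SIZE for a, b in zip(a_shares, b_shares)]
result = decode(reconstruct(sum_shares))
```

Each individual share is a uniformly random ring element, so a single holder learns nothing about the encoded value; only the sum of all shares reveals it.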

coef

torch.Tensor of shape (n_features, ). Estimated coefficients for the linear regression problem.

intercept

torch.Tensor of shape (1, ) if fit_intercept is set to True, None otherwise. Estimated intercept for the linear regression.

pvalue_coef

numpy.array of shape (n_features, ). Two-sided p-value for a hypothesis test whose null hypothesis is that each coefficient is zero.

pvalue_intercept

numpy.array of shape (1, ) if fit_intercept is set to True, None otherwise. Two-sided p-value for a hypothesis test whose null hypothesis is that the intercept is zero.

fit(self, X_ptrs: List[torch.Tensor], y_ptrs: List[torch.Tensor])

Fits the linear model using Secure Multi-Party Linear Regression. The final results (i.e. coefficients and p-values) will be public.

predict(self, X: torch.Tensor)

Performs prediction with the linear model on X, which can be a local torch.Tensor or a wrapped PointerTensor. The result will be either a local torch.Tensor or a wrapped PointerTensor, depending on the nature of X.
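Because the coefficients are public after training, prediction reduces to an ordinary linear map. The sketch below shows that computation on plain Python lists; the function name and signature are hypothetical and stand in for the tensor operations the class would perform.

```python
def predict_linear(X, coef, intercept=None):
    """Return X @ coef (+ intercept) for X given as a list of rows."""
    bias = 0.0 if intercept is None else intercept
    return [sum(x * c for x, c in zip(row, coef)) + bias for row in X]


# Hypothetical fitted model y = 1 + 2 * x applied to two samples.
predictions = predict_linear([[0.0], [3.0]], coef=[2.0], intercept=1.0)
```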

summarize(self)

Prints a summary of the coefficients and their statistics. This method should be called only after the model has been trained.

_check_ptrs(self, X_ptrs, y_ptrs)

Checks that the lists of pointers corresponding to the explanatory and explained variables contain the expected elements. In the same pass, it also computes some of the regressor’s attributes, such as the number of features and the total sample size.

static _add_intercept(X_ptrs)

Adds a column vector of 1’s at the beginning of each tensor in X_ptrs.
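The effect of this step can be sketched on a local matrix represented as a list of rows. In the class the operation is applied to remote tensors through their pointers; the helper below is a hypothetical local stand-in.

```python
def add_intercept_column(X):
    """Return a copy of X (list of rows) with a leading column of ones."""
    return [[1.0] + list(row) for row in X]


X = [[2.0, 3.0], [4.0, 5.0]]
X_with_ones = add_intercept_column(X)
```

With the column of ones in first position, the first fitted coefficient plays the role of the intercept.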

static _get_workers(ptrs)

Returns the pool of workers as a tuple.

static _remote_dot_products(X_ptrs, y_ptrs)

This method computes the aggregated dot-products remotely. It corresponds to the Compression stage (or Compression within) of Bloom’s algorithm
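The idea behind the compression stage can be sketched as follows: each worker reduces its local data to the small matrices X^T X and X^T y, and only these aggregates are combined to solve for the coefficients. The sketch runs entirely in the clear on Python lists, whereas the class performs the combination on secret-shared values. All function names and the toy data are assumptions for illustration.

```python
def compress(X, y):
    """Compression: one worker's local X^T X and X^T y."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return XtX, Xty


def combine(parts):
    """Combination: element-wise sum of the per-worker statistics."""
    d = len(parts[0][1])
    XtX = [[sum(p[0][i][j] for p in parts) for j in range(d)] for i in range(d)]
    Xty = [sum(p[1][i] for p in parts) for i in range(d)]
    return XtX, Xty


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


# Two hypothetical workers holding rows generated by y = 1 + 2 * x.
# The leading column of ones corresponds to the intercept.
worker_a = ([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
worker_b = ([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])

parts = [compress(X, y) for X, y in (worker_a, worker_b)]
XtX, Xty = combine(parts)
coefficients = solve(XtX, Xty)  # approximately [1.0, 2.0]
```

The size of the aggregates depends only on the number of features, not on the number of samples, so the part of the computation that has to run under SMPC stays small.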

_share_ptrs(self, ptrs, worker_idx)

Secret-shares a list of remote tensors between a worker of the pool and the ‘honest but curious’ worker, using a crypto_provider worker.

_compute_pvalues(self)

Compute p-values of coefficients (and intercept if fit_intercept==True)
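As a rough sketch of what a two-sided p-value means here, the snippet below turns a coefficient and its standard error into a p-value using a large-sample normal approximation. This is an assumption made to keep the example within the standard library: an exact test would use Student's t distribution with the residual degrees of freedom, and the function name is hypothetical.

```python
import math


def two_sided_p_value(coef, std_err):
    """Two-sided p-value for H0: coefficient == 0 (normal approximation)."""
    z = coef / std_err
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


p_null = two_sided_p_value(0.0, 1.0)      # statistic of zero: no evidence against H0
p_border = two_sided_p_value(1.96, 1.0)   # close to the conventional 0.05 threshold
```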

class syft.frameworks.torch.linalg.lr.DASH(crypto_provider: BaseWorker, hbc_worker: BaseWorker, precision_fractional: int = 6)

Distributed Association Scan Hammer (DASH) algorithm based on Jonathan Bloom’s algorithm. It uses Secure Multi-Party Computation in the combine phase. Although training is performed in SMPC, the final regression coefficients are made public.

Reference: Section 2 of https://arxiv.org/abs/1901.09531

Parameters
  • crypto_provider – a BaseWorker providing crypto elements for AdditiveSharingTensors (ASTs), such as Beaver triples

  • hbc_worker – the “Honest but Curious” BaseWorker. SMPC operations in PySyft use SecureNN protocols, which are based on 3-party computation. To apply them with more than 3 parties, an “Honest but Curious” worker is needed. To perform the DASH algorithm, one of the workers in the pool is chosen at random and all tensors are secret-shared with the chosen worker, the crypto provider and the “Honest but Curious” worker. The main role of this worker is to prevent the collusion that would be possible between two workers in the pool if the algorithm instead secret-shared the tensors with two randomly chosen workers and the crypto provider. An “Honest but Curious” worker is a legitimate participant in a communication protocol who does not deviate from the defined protocol but attempts to learn all possible information from legitimately received messages.

  • precision_fractional – precision chosen for FixedPrecisionTensors

coef

torch.Tensor of shape (n_features, ). Estimated coefficients for the DASH algorithm.

pvalue

numpy.array of shape (n_features, ). Two-sided p-value for a hypothesis test whose null hypothesis is that each coefficient is zero.

fit(self, X_ptrs: List[torch.Tensor], C_ptrs: List[torch.Tensor], y_ptrs: List[torch.Tensor])

get_coeff(self)

get_standard_errors(self)

get_p_values(self)

_check_ptrs(self, X_ptrs, C_ptrs, y_ptrs)

Checks that the lists of pointers corresponding to the response vector, transient covariate vectors and independent permanent covariate vectors contain the expected elements. In the same pass, it also computes some of the regressor’s attributes, such as the degrees of freedom and the total sample size.

static _get_workers(ptrs)

Returns the pool of workers as a tuple.

static _remote_dot_products(X_ptrs, C_ptrs, y_ptrs)

This method computes the aggregated dot-products remotely. It corresponds to the Compression stage (or Compression within) of the DASH algorithm.

static _remote_qr(C_ptrs)

Performs the QR decompositions of the permanent covariate matrices remotely. It returns a list of the upper triangular matrices (the R factors), one located on each worker.
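To illustrate why only the R factor needs to leave a worker, the sketch below computes R with modified Gram-Schmidt on a local matrix and checks numerically that R^T R equals C^T C. The choice of Gram-Schmidt, the function names and the toy matrix are assumptions for illustration, and the input is assumed to have full column rank; the class may rely on a different QR routine.

```python
import math


def qr_r_factor(C):
    """Return the upper triangular R of a QR decomposition of C.

    Uses modified Gram-Schmidt and assumes C has full column rank.
    """
    n_rows, n_cols = len(C), len(C[0])
    cols = [[C[i][j] for i in range(n_rows)] for j in range(n_cols)]
    R = [[0.0] * n_cols for _ in range(n_cols)]
    for j in range(n_cols):
        R[j][j] = math.sqrt(sum(v * v for v in cols[j]))
        q = [v / R[j][j] for v in cols[j]]
        # Remove the component along q from every remaining column.
        for k in range(j + 1, n_cols):
            R[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - R[j][k] * a for a, b in zip(q, cols[k])]
    return R


def gram(M):
    """Return M^T M for M given as a list of rows."""
    d = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(d)] for i in range(d)]


C = [[3.0, 1.0], [4.0, 2.0]]
R = qr_r_factor(C)
```

Since the Gram matrix is preserved, a small n_features by n_features matrix R carries the same second-moment information as the full covariate matrix C.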

static _inv_upper(R)

Performs the inversion of an upper triangular matrix (2-dim tensor) in MPC by solving the linear equation R * R_inv = I with back substitution.
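The back substitution described above can be sketched in the clear on a list-of-rows matrix. Each column of the inverse is obtained by solving R x = e_col from the diagonal entry upwards. The function name is hypothetical, and the MPC aspects (secret-shared entries, fixed-precision division) are not modelled here.

```python
def inv_upper_triangular(R):
    """Invert an upper triangular matrix by back substitution, column by column."""
    n = len(R)
    R_inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Solve R x = e_col. Entries of x below row `col` stay zero because
        # the inverse of an upper triangular matrix is upper triangular.
        for i in reversed(range(col + 1)):
            rhs = 1.0 if i == col else 0.0
            acc = sum(R[i][k] * R_inv[k][col] for k in range(i + 1, n))
            R_inv[i][col] = (rhs - acc) / R[i][i]
    return R_inv


R = [[2.0, 1.0], [0.0, 4.0]]
R_inv = inv_upper_triangular(R)
```

Back substitution uses only additions, multiplications and one division per entry of the inverse.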

_share_ptrs(self, ptrs, worker_idx)

Secret-shares a list of remote tensors between a worker of the pool and the ‘honest but curious’ worker, using a crypto_provider worker.

_compute_pvalues(self)

Compute p-values of coefficients