
Shared Machine Learning: Ant Financial’s Solution For Data Privacy

Growing public awareness and concerns over data privacy are pushing tech companies to explore new ways to advance machine learning models without centralizing people’s personal data. One of the world’s largest financial companies, Alibaba affiliate Ant Financial, recently introduced Shared Machine Learning (SML) as its solution for data privacy.

Ant Financial has spent two years on the research and development of SML as a learning paradigm that can aggregate and process multiparty information while protecting the privacy of individual participants.

Current data protection technologies are based on either Trusted Execution Environment (TEE) or Multiparty Computation (MPC) systems. A TEE is a secure area of a processor that protects the confidentiality and integrity of the code and data loaded inside it. Examples include SGX from Intel, SEV from AMD, and TrustZone from ARM.

MPC, on the other hand, is a subfield of cryptography in which multiple parties jointly compute a function over their combined data while each party’s input stays private. A recent example is federated learning, a distributed method Google introduced in 2017 to train models on a large corpus of decentralized data.
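To make the idea of jointly computing a function over private inputs concrete, here is a minimal sketch of additive secret sharing, one of the standard building blocks of MPC. It is an illustration only and is not taken from Ant Financial's framework: the function names, the modulus `PRIME`, and the single-process simulation of several parties are all assumptions made for the example.

```python
import secrets

# Assumed modulus for the example; inputs and their sum must be
# non-negative integers smaller than this value.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; any proper subset alone looks uniformly random."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their inputs.

    Party i splits its input and sends share j to party j. Each party
    adds the shares it received and publishes only that partial sum.
    """
    n = len(private_inputs)
    distributed = [share(x, n) for x in private_inputs]
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME
        for j in range(n)
    ]
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Three hypothetical institutions, each holding one private number.
    print(secure_sum([120, 45, 300]))
```

No party ever sees another party's raw input, only random-looking shares and the published partial sums. A real deployment would run the parties on separate machines over authenticated channels and would need further protocols for multiplication and comparison.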

Ant Financial’s SML solution combines TEE and MPC for applications in banking, insurance, and commerce.

TEE-based SML

SML uses Intel’s SGX technology in its foundation layer and is compatible with other TEE implementations. The SGX-based SML method supports both online prediction and offline training.

Online prediction models have a higher requirement for stability in load balancing, failover, and dynamic capacity expansion. One of the key technologies for improving stability is clustering, but conventional clustering solutions are not applicable on SGX. This prompted Ant Financial to design a new distributed online service framework as shown below.

Unlike conventional clustering methods, in the SML framework each service registers with the ClusterManager (CM) and maintains a heartbeat connection to it.

This framework is able to:

The framework supports a variety of commonly used prediction algorithms, including logistic regression (LR), gradient-boosted decision trees (GBDT), and XGBoost, and enables prediction on encrypted data from multiple parties.
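The register-and-heartbeat arrangement with the ClusterManager described earlier can be sketched as follows. This is a generic illustration of the pattern, not Ant Financial's implementation: the method names (`register`, `heartbeat`, `live_services`), the timeout value, and the in-memory registry are hypothetical, and nothing here is specific to SGX.

```python
import time


class ClusterManager:
    """Hypothetical registry that tracks services by their last heartbeat."""

    def __init__(self, timeout_seconds=3.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the sketch can be tested
        self._last_seen = {}         # service id -> time of last heartbeat

    def register(self, service_id):
        """Add a service to the cluster and start its liveness timer."""
        self._last_seen[service_id] = self._clock()

    def heartbeat(self, service_id):
        """Refresh the liveness timer of an already registered service."""
        if service_id not in self._last_seen:
            raise KeyError(f"unknown service: {service_id}")
        self._last_seen[service_id] = self._clock()

    def live_services(self):
        """Return the services whose last heartbeat is within the timeout."""
        now = self._clock()
        return sorted(
            service_id
            for service_id, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )


if __name__ == "__main__":
    cm = ClusterManager(timeout_seconds=3.0)
    cm.register("predictor-1")
    cm.register("predictor-2")
    print(cm.live_services())
```

A load balancer built on such a registry would route requests only to the services returned by `live_services()`, which is one simple way to get the failover and capacity expansion behavior the article mentions.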

In offline training, the SGX-based SML framework is compatible with XGBoost, using the library OS (LibOS) Occlum and a home-grown distributed networking system to support data fusion and distributed training. Ant Financial is also currently using this solution to port TensorFlow.

TEE-based shared machine learning on multiparty data works as follows:

MPC-based SML

Ant Financial’s MPC-based SML framework has three layers:

The MPC-based SML framework supports popular algorithms including LR, GBDT, and graph neural networks (GNN). Below is the training process:

The specific architecture of the training engine is shown below:

Federated learning vs. shared machine learning

Ant Financial also identified a couple of major differences between federated learning and shared machine learning:

More information is available (in Mandarin) in this Ant Financial tech post.


Journalist: Tony Peng | Editor: Michael Sarazen
