
Privacy-Preserving Data Analysis and Secure Data Infrastructure for Real Applications


Objectives

Our goal is to advance research in system software (data infrastructure, operating systems), data analysis (differential privacy, federated learning), and real-world applications (medical data, trajectory data) in order to alleviate concerns such as data leakage caused by unauthorized access, innocent but leaky data extraction by legitimate users, and the (intentional or unintentional) identification of individuals from analysis results. We aim to contribute to the realization of Society 5.0, in which data can be utilized safely and proactively.

This project has been adopted under the JST CREST research area “Creation of System Software for Society 5.0 by Integrating Fundamental Theories and System Platform Technologies”.

Team

Research Topics


  1. Secure File System that Does Not Rely on Trustworthy Administrators
  2. System Mechanisms for Enforced and Traceable Privacy Protection
  3. Flexible Privacy-Preserving Data Analysis and Machine Learning
  4. Demonstrations in Medical and Trajectory Data Applications

Secure File System that Does Not Rely on Trustworthy Administrators

System Mechanisms for Enforced and Traceable Privacy Protection

Flexible Privacy-Preserving Data Analysis and Machine Learning

In this sub-theme, we research the following topics:

Olive (VLDB 2023)

Combining federated learning (FL) with trusted execution environments (TEEs) is a promising approach to privacy-preserving FL. Implementing the server side in a TEE allows FL to proceed without exposing per-round gradient information to the server, which particularly benefits utility in FL with local differential privacy. However, the vulnerability of server-side TEEs in the context of FL has not been sufficiently researched. In this sub-topic, we design systems and algorithms to analyze the vulnerability of TEEs in FL, propose rigorous defense methods that harden the TEE, and develop systems that exploit TEEs for private, high-utility federated learning.
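The server-side aggregation setting described above can be sketched as follows. This is an illustrative toy, not Olive's actual design: the clipping bound, noise scale, and function names (`client_update`, `tee_aggregate`) are assumptions, and the "enclave" is just an ordinary Python function standing in for computation inside a TEE.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip(v, c):
    """L2-clip a vector to norm at most c."""
    n = np.linalg.norm(v)
    return v if n <= c else v * (c / n)

def client_update(grad, c=1.0, sigma=0.5):
    """Client side: clip the local gradient and perturb it before upload
    (a simple Gaussian-noise stand-in for local privacy protection)."""
    g = clip(grad, c)
    return g + rng.normal(0.0, sigma * c, size=g.shape)

def tee_aggregate(updates):
    """Stand-in for the enclave: average the uploaded updates so that only
    the aggregate, never an individual update, leaves the 'TEE'."""
    return np.mean(updates, axis=0)

# One toy round with 100 clients and 4-dimensional gradients.
grads = [rng.normal(size=4) for _ in range(100)]
noisy = [client_update(g) for g in grads]
global_update = tee_aggregate(noisy)
print(global_update)
```

Because the per-client noise is mean-zero, averaging inside the (simulated) enclave lets the noise partially cancel, which is the utility benefit the paragraph above attributes to server-side TEEs under local differential privacy.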

SecSV (VLDB 2023)

Evaluating the contribution of each client’s data is crucial in federated learning for applications such as data markets, explainable AI, and malicious-client detection. The Shapley value (SV) is a commonly used metric for contribution evaluation. However, existing methods for calculating the SV in FL do not consider privacy: they assume that the server can access both unencrypted FL models and unprotected client data. In this topic, we therefore research privacy-preserving SV computation. We propose SecSV, a protocol for efficient and private SV calculation in cross-silo FL. SecSV uses a hybrid privacy-protection scheme that avoids ciphertext-ciphertext multiplications between test data and models. Experimental results show that SecSV is 5.9 to 18.0 times faster than a baseline based purely on homomorphic encryption.
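For readers unfamiliar with the metric, here is a minimal sketch of exact Shapley value computation over client coalitions. Everything is in the clear and the utility function `v(S)` is a toy stand-in for, say, the test accuracy of a model trained on coalition `S`; SecSV's contribution is computing such values privately and efficiently, which this sketch does not attempt.

```python
from itertools import combinations
from math import factorial

def shapley(clients, v):
    """Exact Shapley value by enumerating all coalitions
    (exponential in the number of clients; fine for a toy)."""
    n = len(clients)
    phi = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for k in range(n):
            for S in combinations(others, k):
                # Standard Shapley weight |S|! (n-|S|-1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[c] += w * (v(frozenset(S) | {c}) - v(frozenset(S)))
    return phi

# Toy additive utility: each client's data adds a fixed accuracy gain,
# so each client's Shapley value equals its own contribution.
data_value = {"A": 0.3, "B": 0.2, "C": 0.1}
v = lambda S: sum(data_value[c] for c in S)

phi = shapley(list(data_value), v)
print(phi)
```

Real FL settings need approximation (sampling permutations) and, as the paragraph above notes, privacy protection for both the models and the test data used to evaluate `v`.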

Demonstrations in Medical and Trajectory Data Applications

  1. Medical Sub-Theme

In the medical sub-theme, we primarily conduct three demonstrations:

    • Development, deployment, and practical use of privacy-protection systems based on differential privacy in medical settings
    • Collaboration with Kenjiro Taura
    • Collaboration with Toshihiro Hanawa
    • Collaboration with Yang Cao

Applying the Programming Infrastructure for Enforcing Personal Information Protection to Medicine

Here, the main goal is to establish AI training methods that protect the patient information used for training, building on the privacy-enforcement technology developed by Kenjiro Taura. Training is carried out within the framework of differentially private stochastic gradient descent (DPSGD) to minimize the risk of patient-information leakage.
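As a rough illustration of the DPSGD framework mentioned above (per-example gradient clipping plus Gaussian noise), here is a minimal sketch on a toy linear-regression task. All hyperparameters are illustrative assumptions, and this is not the project's actual training code.

```python
import numpy as np

rng = np.random.default_rng(42)

def dpsgd_step(w, X, y, lr=0.1, clip_norm=1.0, sigma=1.0):
    """One DP-SGD step: clip each per-example gradient to bound its
    sensitivity, sum, add Gaussian noise calibrated to the clip norm,
    then average and descend."""
    grads = []
    for xi, yi in zip(X, y):
        g = 2 * (xi @ w - yi) * xi          # per-example squared-loss gradient
        norm = np.linalg.norm(g)
        if norm > clip_norm:
            g = g * (clip_norm / norm)       # L2 clipping
        grads.append(g)
    noise = rng.normal(0.0, sigma * clip_norm, size=w.shape)
    g_avg = (np.sum(grads, axis=0) + noise) / len(X)
    return w - lr * g_avg

# Toy data: 32 examples, true weights [1.0, -2.0, 0.5].
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w = np.zeros(3)
for _ in range(200):
    w = dpsgd_step(w, X, y)
print(w)
```

In practice the noise scale `sigma` is chosen from a target (ε, δ) privacy budget via a privacy accountant; this sketch omits that accounting entirely.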

Secure Medical AI in Real Clinical Practice

In this collaboration, the focus is on the deployment side: how to safely and scalably execute pre-trained AI models and deliver their results to clinicians and patients using secure computation and supercomputers.

Publications

  1. Shang Liu, Yang Cao, Takao Murakami, Masatoshi Yoshikawa. “A Crypto-Assisted Approach for Publishing Graph Statistics with Node Local Differential Privacy.” 2022 IEEE International Conference on Big Data (Big Data). 2022.
  2. Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa, Huizhong Li, Yong Qiang. “FL-Market: Trading Private Models in Federated Learning.” 2022 IEEE International Conference on Big Data (Big Data). 2022.
  3. Ryota Hiraishi, Masatoshi Yoshikawa, Shun Takagi, Yang Cao, Sumio Fujita, Hidehito Gomi. “Mitigating Privacy Vulnerability Caused by Map Asymmetry.” Springer eBooks. 2022.
  4. Shun Takagi, Fumiharu Kato, Yang Cao, Masatoshi Yoshikawa. “Asymmetric Differential Privacy.” 2022 IEEE International Conference on Big Data (Big Data). 2022.
  5. 空閑 洋平, 中村 遼. “Wide-Area Network Quality Measurement Using Measurement Data from a Remote Conference System.” Proceedings of the Internet and Operation Technology Symposium (IOTS). 2022.
  6. Ruixuan Cao, Fumiharu Kato, Yang Cao, Masatoshi Yoshikawa. “An Accurate, Flexible and Private Trajectory-Based Contact Tracing System on Untrusted Servers.” Lecture Notes in Computer Science. 2022.
  7. Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa. “Secure Shapley Value for Cross-Silo Federated Learning.” Proceedings of the VLDB Endowment. 2023.
  8. Cao Xiao, Yang Cao, Primal Pappachan, Atsuyoshi Nakamura, Masatoshi Yoshikawa. “Differentially Private Streaming Data Release Under Temporal Correlations via Post-processing.” Lecture Notes in Computer Science. 2023.
  9. Ryota Hiraishi, Masatoshi Yoshikawa, Yang Cao, Sumio Fujita, Hidehito Gomi. “Mechanisms to Address Different Privacy Requirements for Users and Locations.” IEICE Transactions on Information and Systems. 2023.
  10. Xiaoyu Li, Yang Cao, Masatoshi Yoshikawa. “Locally Private Streaming Data Release with Shuffling and Subsampling.” 2023 IEEE 39th International Conference on Data Engineering Workshops (ICDEW). 2023.
  11. Shumpei Shiina, Kenjiro Taura. “Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ‘23). 2023.
  12. Chao Tan, Yang Cao, Sheng Li, Masatoshi Yoshikawa. “General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2023.
  13. Shun Takagi, Yang Cao, Yasuhito Asano, Masatoshi Yoshikawa. “Geo-Graph-Indistinguishability: Location Privacy on Road Networks with Differential Privacy.” IEICE Transactions on Information and Systems. 2023.
  14. Ruixuan Liu, Yang Cao, Yanlin Wang, Lingjuan Lyu, Yun Chen, Hong Chen. “PrivateRec: Differentially Private Model Training and Online Serving for Federated News Recommendation.” KDD ‘23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023.
  15. Shun Takagi, Fumiharu Kato, Yang Cao, Masatoshi Yoshikawa. “From Bounded to Unbounded: Privacy Amplification via Shuffling with Dummies.” 2023 IEEE 36th Computer Security Foundations Symposium (CSF). 2023.
  16. Fumiharu Kato, Yang Cao, Masatoshi Yoshikawa. “Olive: Oblivious Federated Learning on Trusted Execution Environment against the Risk of Sparsification.” Proceedings of the VLDB Endowment. 2023.
  17. Hisaichi Shibata, Shouhei Hanaoka, Yang Cao, Masatoshi Yoshikawa, Tomomi Takenaga, Yukihiro Nomura, Naoto Hayashi, Osamu Abe. “Local Differential Privacy Image Generation Using Flow-Based Deep Generative Models.” Applied sciences. 2023.