Learning processes and tests at schools and universities are typically highly personal, so in the past these data have not been shared. Now, researchers at the Technical University of Denmark have found a way to share these data without infringing on students’ privacy. It is called ‘differentially private machine learning’.
A typical scene at a university today: a class of students each answers a set of tasks. By conventional means, the teacher estimates each student’s performance and releases this information in private to that student. If the teacher also estimates the difficulty of each task based on all the performances, she cannot be sure whether she is breaching the other students’ privacy.
A future scene: The students get a list of their own answers and the correct answers. The school develops and shares an algorithm based on all students’ answers and on a difficulty score, and no one can see which individual data were used for this algorithm. Each student can then match his or her own data up against this algorithm to get much more precise feedback and a detailed estimate of his or her own ability scores and probabilities of passing a subject. The student can decide to publish his or her own results without compromising others who don’t want to publish theirs. No one can use the algorithm to reveal other individual students’ data.
The future scene is possible today, according to research from the Department of Applied Mathematics and Computer Science at the Technical University of Denmark: ‘A differential privacy workflow for inference of parameters in the Rasch model’.
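For readers unfamiliar with the Rasch model named in the title, here is a minimal sketch in Python. In its standard form, the model gives the probability of a correct answer as a logistic function of the student's ability minus the task's difficulty. The function name and the example numbers are made up for illustration and are not taken from the paper or its code.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability that a student answers a task correctly under the
    standard Rasch model: logistic(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A student whose ability equals the task difficulty has a 50% chance.
print(rasch_probability(0.0, 0.0))  # 0.5

# Hypothetical example: a stronger student on the same task.
print(rasch_probability(1.5, 0.0) > 0.5)  # True
```

Fitting the model means estimating one ability per student and one difficulty per task from the whole class's answers, which is why the estimates depend on everyone's data.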
The paper shows that privacy-enhanced algorithms can provide better feedback than the individual human teacher. With differentially private machine learning, the algorithm can be trained on inherently private data. The method uses noise to reduce the probability of a privacy breach. The key idea is to ensure that the randomized output does not depend in any significant way on any single data subject’s data.
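To give a feel for how added noise protects individuals, here is a small sketch of the Laplace mechanism, a textbook building block of differential privacy. It is not necessarily the mechanism used in the paper. The example releases a noisy count of how many students solved a task; the function names, the answer list and the privacy budget epsilon are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two independent
    exponential variables that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_correct_count(answers, epsilon: float, rng=None) -> float:
    """Release the number of correct answers (1 = correct, 0 = wrong)
    with epsilon-differential privacy.

    Changing one student's answer changes the count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(answers)
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical class of eight students answering one task.
answers = [1, 0, 1, 1, 0, 1, 1, 0]
print(private_correct_count(answers, epsilon=1.0))
```

A smaller epsilon means more noise and stronger privacy, while a larger epsilon gives a more accurate count. Because the released number is randomized, it reveals little about whether any single student answered correctly.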
In the paper, the authors, Teresa Anna Steiner, David Enslev Nyrnberg, and Lars Kai Hansen, conclude:
“Our experiments based on simulated data suggest that the workflow provides estimates of similar quality as the non-private for medium sized classes and industry standard privacy budgets. These findings were confirmed in two real data sets.”
All code is here: https://github.com/DavidEnslevNyrnberg/PrivRaschPuplish
The paper can be found here: http://kdd.di.unito.it/pap2018/papers/PAP_2018_paper_1.pdf