Data generated by the healthcare industry is voluminous and federated across a variety of stakeholders. Important sources of healthcare data have traditionally included electronic medical records, insurance claims databases, and clinical trial study results. Increasingly, these conventional sources are being complemented by non-traditional sources, including location data captured from mobile phones and activity data captured from consumer wearable devices.

In addition, new biomarkers gathered through genetic testing promise to aid understanding of the genetic basis of disease. Data linkage across these data sources is key to conducting holistic studies of patient outcomes. However, the nature and size of these data sources make it infeasible to obtain consent from individuals for the use of their data. Special care must therefore be taken when linking separately held healthcare datasets.

Our algorithms are designed to scale to datasets of millions of records while maintaining mathematically provable security guarantees for contributed data. We intend to build a market network in which healthcare stakeholders can both request data linkage studies and contribute their datasets for linkage in return for monetary compensation.