The acquisition, transfer, and aggregation of data on a massive scale for data mining and predictive analysis raises questions that simply are not answered by the paradigms that have dominated privacy law to date. This chapter develops a taxonomy of current U.S. privacy law. It then uses that taxonomy to elucidate the mismatch between current law and big data privacy concerns and makes five suggestions to reduce the mismatch. Of these, probably the most important is to recognize that there is no functioning market for assessing citizens' preferences, and that there is a critical need for measuring both the privacy impact of data acquisition and the potential benefit of data use; much can be learned from the experience of environmental regulation.
Big data involves practices that have radically disrupted entrenched information flows. From modes of acquiring to aggregation, analysis, and application, these disruptions affect actors, information types, and transmission principles. Privacy and big data are simply incompatible and the time has come to reconfigure choices that we made decades ago to enforce certain constraints. It is time for the background of rights, obligations, and legitimate expectations to be explored and enriched so that notice and consent can do the work for which it is best suited.
Chapter 3: The Economics and Behavioral Economics of Privacy
This chapter surveys the growing body of theoretical and empirical research on the economics and behavioral economics of privacy, and discusses how these streams of research can be applied to the investigation of the implications of consumer data mining and business analytics. An important insight is that personal information, when shared, can become a public good whose analysis reduces inefficiencies and increases economic welfare; when abused, it can lead to transfer of economic wealth from data subjects to data holders. The interesting economic question then becomes, who will bear the costs if privacy-enhancing technologies become more popular in the age of big data: data subjects (whose benefits from business analytics and big data would shrink with the amount of information they share), data holders (who may face increasing costs associated with collecting and handling consumer data), or both?
How do information privacy laws regulate the use of big data techniques, if at all? Do these laws strike an appropriate balance between allowing the benefits of big data and protecting individual privacy? If not, how might we amend or extend laws to better strike this balance? Most information privacy law focuses on collection or disclosure and not use. Once data has been legitimately obtained, few laws dictate what may be done with the information. The chapter proposes five general approaches for change.
Chapter 5: Enabling Reproducibility in Big Data Research: Balancing Confidentiality and Scientific Transparency
This chapter begins by motivating the scientific rationale for access to data and computational methods to enable the verification and validation of published research findings. It describes the legal landscape in the context of big data research and suggests two guiding principles to facilitate reproducibility and reuse of research data and code within and beyond the scientific context.
II. Practical Framework
Chapter 6: The Value of Big Data for Urban Science
This chapter addresses the motivations for the new urban science, and the value for cities – particularly with respect to analysis of the infrastructure, the environment, and the people. It discusses the key technical issues necessary to build a data infrastructure for curation, analytics, visualization, machine learning, data mining, as well as modeling and simulation to keep up with the volume and speed of data.
Chapter 7: Data for the Public Good: Challenges and Barriers in the Context of Cities
This chapter uses an example of the creation of a data warehouse which links data on multiple services provided by the public sector to individuals and families as a way to highlight both the barriers to and opportunities for cities to use data. It identifies the key issues that need to be addressed – what data to develop and access from counties, states, the federal government, and private sources; how to develop the capacity to use data; how to present data and be transparent; and how best to keep data secure so that individuals and organizations are protected – as well as the key barriers.
Chapter 8: A European Perspective on Research and Big Data Access
Many of the legal and ethical issues associated with big data have wider relevance; this chapter discusses them from a European perspective. The first part gives an historical overview of the progress that has been made across Europe to develop a harmonised approach to legislation designed to provide individuals and organisations with what has become known as the ‘right to privacy’. The second part examines the impact that these legislative developments have had and are continuing to have on cross-border access to microdata for research purposes.
Chapter 9: The New Deal on Data: A Framework for Institutional Controls
This chapter explores the emergence of the Big Data society, arguing that the ‘personal data sector’ of the economy needs productive collaboration between the government, the private sector, and the citizen to create new markets – just as the automobile and oil industries did in prior centuries. It envisions data access to be governed by ‘living informed consent’, where the user is entitled to know what data is being collected about her by which entities, empowered to understand the implications of data sharing, and finally put in charge of the sharing authorizations. It discusses the establishment of a New Deal on Data, grounded in principles such as the opt-in nature of data provision, the boundaries of data usage, and the parties accessing the data.
Chapter 10: Engineered Controls for Dealing with Big Data
Regardless of what data policies have been agreed to, access must be allowed through controls engineered into the data infrastructure. Without sound technical enforcement, incidents of abuse, misuse, theft of data, and even invalid scientific conclusions based on undetectably altered data can be expected. This chapter discusses what features those access controls might have – delineating the characteristics of subjects, objects, and access modes. Although fundamental computing concepts for engineered controls on access to data and on information flows are reasonably well developed, they are perhaps not so widely deployed as they might be. Areas of research that could change the picture in the future include advances in practical cryptographic solutions to computing on encrypted data, which could reduce the need to trust hardware and system software. Advances in methods for building systems in which information flow, rather than access control, is the basis for policy enforcement could also open the door for better enforcement of comprehensible policies.
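The vocabulary of subjects, objects, and access modes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the chapter: the class names, the role and sensitivity labels, and the `AGGREGATE` mode are hypothetical, and a real deployment would enforce such rules inside the data infrastructure rather than in application code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    """Access modes; AGGREGATE is a hypothetical mode for statistics-only access."""
    READ = "read"
    WRITE = "write"
    AGGREGATE = "aggregate"


@dataclass(frozen=True)
class Subject:
    """An actor requesting access (hypothetical attributes)."""
    name: str
    role: str


@dataclass(frozen=True)
class DataObject:
    """A dataset under control (hypothetical attributes)."""
    name: str
    sensitivity: str


@dataclass
class AccessPolicy:
    """Default-deny policy: a request succeeds only if explicitly granted."""
    grants: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def grant(self, role: str, sensitivity: str, mode: Mode) -> None:
        self.grants.add((role, sensitivity, mode))

    def check(self, subject: Subject, obj: DataObject, mode: Mode) -> bool:
        allowed = (subject.role, obj.sensitivity, mode) in self.grants
        # Every decision is recorded, allowed or not, so misuse can be traced later.
        self.audit_log.append((subject.name, obj.name, mode.value, allowed))
        return allowed


policy = AccessPolicy()
policy.grant("researcher", "deidentified", Mode.READ)
policy.grant("researcher", "confidential", Mode.AGGREGATE)

alice = Subject(name="alice", role="researcher")
survey = DataObject(name="survey", sensitivity="confidential")

print(policy.check(alice, survey, Mode.AGGREGATE))
print(policy.check(alice, survey, Mode.READ))
```

The sketch is an access-control check only; it says nothing about where data flows after a permitted read, which is the gap that information-flow approaches aim to close.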
Chapter 11: Portable Approaches to Informed Consent and Open Data
What frameworks are available to permit data reuse? How can legal and technical systems be structured to allow people to donate their data to science? What are appropriate methods to repurpose traditional consent forms so that user-donated data can be gathered, de-identified, and syndicated for use in computational research environments? This chapter examines how traditional frameworks to permit data reuse have been left behind by the mix of advanced techniques for re-identification and cheap technologies for the creation of data about individuals. It discusses the approaches developed in technological and organizational systems to ‘create’ privacy where it has been eroded while allowing data reuse. These approaches draw on encryption and boundary organizations to manage privacy on behalf of individuals. It also discusses a new approach of ‘radical honesty’ towards data contribution and the development of ‘portable’ approaches to informed consent that could potentially support a broad range of research without the unintended fragmentation of data created by traditional consent systems.
III. Statistical Framework
Chapter 12: Extracting Information from Big Data: A Privacy and Confidentiality Perspective
This chapter discusses the new statistical challenges associated with inference in the context of big data. It pays particular attention to the importance of providing access to researchers, both to develop new statistical approaches that address the issues of coverage and nonresponse and to develop new models based, for example, on examining large numbers of outliers.
This chapter explores the interactions between data dissemination, big data, and statistical inference. It identifies a set of lessons that stewards of big data can learn from statistical agencies’ experiences about the measurement of disclosure risk and data utility. It discusses how the sheer scale and potential use of big data will require that analysis be taken to the data rather than the data to the analyst or the analyst to the data. It suggests that a viable way forward for big data access is an integrated system including (i) unrestricted access to highly redacted data, most likely some version of synthetic data, followed by (ii) means for approved researchers to access the confidential data via remote access solutions, glued together by (iii) verification servers that allow users to assess the quality of their inferences with the redacted data so as to be more efficient with their use (if necessary) of the remote data access.
Chapter 14: Differential Privacy: A Cryptographic Approach to Private Data Analysis
This chapter shows how differential privacy provides a mathematically rigorous theory of privacy, a theory amenable to measuring (and minimizing) cumulative privacy loss, as data are analyzed and re-analyzed, shared and linked. There are trade-offs – differential privacy requires a new way of interacting with data, in which the analyst only accesses data through a privacy mechanism, and in which accuracy and privacy are improved by minimizing the viewing of intermediate results. But the approach provides a measure that captures cumulative privacy loss over multiple releases; it offers the possibility that data usage and release could be accompanied by publication of privacy loss.
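The idea of an analyst who reaches the data only through a privacy mechanism, with cumulative privacy loss tracked across releases, can be sketched as follows. This is an illustration under stated assumptions, not the chapter's own construction: it uses the standard Laplace mechanism for counting queries and basic sequential composition (the epsilon values of successive releases add up), and the names `PrivacyAccountant` and `private_count` are hypothetical.

```python
import random


class PrivacyAccountant:
    """Tracks cumulative privacy loss under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, accountant, rng):
    """Release a noisy count; the analyst never sees the exact value."""
    accountant.charge(epsilon)  # refuse the query if the budget would be exceeded
    true_count = sum(1 for r in records if predicate(r))
    # A counting query changes by at most 1 when one record is added or
    # removed (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: ages of survey respondents.
ages = [23, 35, 41, 52, 29, 67, 38, 44]
accountant = PrivacyAccountant(total_epsilon=1.0)
rng = random.Random()

noisy_over_40 = private_count(ages, lambda a: a > 40, 0.5, accountant, rng)
print(noisy_over_40, accountant.spent)
```

Each release is charged against a fixed budget, so the figure in `accountant.spent` is the kind of cumulative privacy loss that could be published alongside the results. Smaller epsilon values give stronger privacy but noisier answers.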