Best Practices for Working Creatively with Personal Data

Introduction

Marilène Oliver, Scott Smallwood, Stephen Moore and J.R. Carpenter. Screen capture of My Data Body, 2021, VR artwork made as part of the Know Thyself as a Virtual Reality project. Image courtesy of the artists.

My Data Body and Your Data Body are partner virtual reality (VR) projects created by a collaborative and interdisciplinary team based at the University of Alberta, Canada, as part of a larger project called Know Thyself as a Virtual Reality (KTVR). My Data Body seeks to make visible and manipulable all the data humans now endlessly generate and are responsible for, while Your Data Body troubles how we interact with, and are equally responsible for, the data of others.

My Data Body has at its centre a high-resolution, volume-rendered, full-body MR scan dataset that viewers can enter and explore. Embedded into this semi-transparent virtual body are other data corpuses downloaded from Facebook and Google. These textual data corpuses are plotted into cross sections of the body. In the horizontal (axial) plane, Mac terminal data is plotted into bone, Google data into muscle, and Facebook data into fat. Data usage agreements are plotted into the vertical plane, and theoretical texts about virtuality and privacy in the digital age into the depth plane. The viewer can pull out these cross sections and read them; once they let go, the cross sections float away but ultimately and uncontrollably return to the scanned body. Passwords and logins flow back and forth through veins and arteries, and hashtags pool in organs. Certain organs can be pulled out of the body and “drawn with”: the heart leaves a trail of emojis and the brain a trail of login pop-up windows demanding usernames and passwords. The medically scanned, passive, obedient, semi-transparent body becomes a data processing site that can be pulled apart and (dis)organized. The whole body/data processing site finds itself at the centre of a data cloud generated from social media data.

Your Data Body is a partner project to My Data Body, made using a combination of open-source and donated datasets. This project focuses on issues of data privacy and ownership, playing on the etymology of the word data, meaning “given.” The scan datasets, which are stored in a series of pods that the viewer can teleport between, can be picked up and moved around, resized and recoloured, inviting a playful stacking of the body parts to make a whole Frankenstein-like figure. Audio is attached to each body part, triggered as the viewer holds and manipulates it. Anonymized open-source datasets are accompanied by an automated voice that reads the study data and usage permissions published alongside the dataset, whereas donated datasets have a recording of the data subject reflecting on their relationship to their data. The project also includes two widely used open-source datasets: the Visible Human Project, from the American National Library of Medicine; and Melanix, which comes with the radiology software OsiriX. Both have an AI-generated chatbot attached to them with whom the user can “discuss” different issues relating to data ownership, privacy, and virtuality.

The making of these artworks has raised many practical and ethical questions about the use of sensitive personal data (such as medical scans) as artistic material and subject matter. Such questions range from those of access to data through to ownership of the data once it has been transformed into an artwork, and to what extent data can be manipulated and re-presented in the name of affective, socio-political, artistic research. These projects have highlighted the complexities of conducting creative research using personal data, especially with regard to the intersection of research ethics, data privacy, and rapidly emerging technology from an interdisciplinary perspective.

As the KTVR project is supported by public grants, seated in an academic institution, and involves medical data, it has benefitted from going through multiple review processes that would not normally be available to independent artists and creative researchers. We have found, however, that while ethics review processes prompt rigorous consideration of how consent, data privacy, and potential harms should be addressed during the research project, they tend to view ethics narrowly and are unable to adequately address the ethical issues particular to research-creation, especially those that involve emerging and complex technologies (Oliver 2021). Most ethics boards are not equipped to evaluate artists’ proposals because they lack familiarity with research-creation processes (Cox et al. 2014) and their values and interests differ from those of the artistic community (Bolt 2016). Current policies use ethical guidelines that are meant for other disciplines and are ill-fitted or even antithetical to artistic practice (Bolt 2016). There is a need for ethical guidelines tailored specifically to creative work that will prompt artists to reflect on research ethics throughout the design, research, and practice stages of their projects and provide review boards with a resource to consult when assessing a project.

The guidelines proposed here aim to provide guidance specific to the use of sensitive personal data in artistic practices, and they are intended to be used by the artistic and creative research community at large as well as by ethics boards. Although the guidelines focus primarily on medical scan datasets (which are technically a form of “sensitive personal data”), they are applicable to all personal datasets, such as biometric data and data scraped from social media platforms. The medical scan dataset, with its indexical link to the interior, intimate privacy of the body, is both an example and a metaphor with which to think through other forms of personal data.

These guidelines begin with a short review of existing visual research ethics, current data protection regulations, and legal frameworks. The guidelines then address a series of reflexive questions specific to the ethical and informed use of medical datasets with regard to anonymization, provenance, consent, different cultural understandings of data, authorship, dissemination, data sustainability, and finally how AI (specifically machine learning) is applied to data. We use examples of artworks throughout as a way to contemplate these complex, nuanced ethical questions, which differ according to situated circumstances.