Privacy and data protection aspects
Through intense interaction with local governments and NGOs, the UniteEurope consortium has encountered challenges of a legal and ethical nature that can serve as examples for related projects or similar endeavours. These are partly due to the sensitive policy field in which UniteEurope operates, but, to a larger part, also to the weak legal footing of SMA and the peculiarities of governmental end users.
As Wetzstein and Leitner [34] elaborate, public bodies making use of social media analyses do not “focus on people as customers or consumers, but as citizens (…)” and often act in fields of “great societal relevance and political interest”. Therefore, SMAT designed for governmental purposes require more elaborate consideration of legal and ethical aspects for a variety of reasons.
In the first place, social media have changed our very notion of privacy. Users seem to care little about sharing personal information about themselves, their friends or their networks in digital environments. Often it is difficult for the user to distinguish between what is public and what is private [4]. “The space for private, unidentified, or unauthenticated activity is rapidly shrinking. (…) nearly every human transaction is subject to tracking, monitoring, and the possibility of authentication and identification” [35]. The blurring of the very concept of privacy is further exacerbated by the absence of clear privacy regulations in the field of information and communication technologies in general and social media in particular.
While privacy concerns seem to be of minor importance to social media users in general, empirical evidence suggests that such concerns increase when users interact directly with governmental agencies, e.g. via e-government services. According to the World Bank, “[c]itizens often express concern about the security of their private and confidential information, possible surveillance, and anonymity” [33]. The authors suggest that “(w)ithout strong protection or the quick resolution of any breach, citizens will be wary of sharing their information with the government, and efforts to connect and interact would quickly be undermined” [33].
As a consequence, citizens’ acceptance of governments making use of social media, a sphere where the legal framework is weak and users tend to feel unobserved, requires legitimacy. Governments therefore need to make sure they comply with all existing legal standards to ensure “trustworthiness, traceability, security and privacy of citizens’ data” [26].
In order to keep privacy impacts low, governments shall restrict themselves to using publicly available data only. This means that the respective SMAT must not collect information which individuals post on their private accounts, but limit its access to posts that are explicitly marked “public”. However, this is not sufficient as a safeguarding measure, as the European Court of Justice (ECJ) held in 2008: “A general derogation from the application of the directive in respect of published information would largely deprive the directive of its effect. It would be sufficient for the Member States to publish data in order for those data to cease to enjoy the protection afforded by the directive” (C-73/07 Satakunnan Markkinapörssi and Satamedia [2008] ECR I-9831, § 48). According to the standards of legitimate data processing imposed by the European Data Protection Directive (Directive 95/46/EC) as well as by the relevant national acts that transpose the Data Protection Directive (DPD) in the EU member states, the decisive point is the question of the “data subject”: the author of a posting is not necessarily the only “data subject” dealt with in that posting, because this individual can publish information about a third “data subject”. Where an author publishes “sensitive data” of a third person, this constitutes illegitimately published information. The “processor” (i.e. the SMAT provider) of such illegitimately published “sensitive data” of a “data subject” carries out an activity that is relevant in terms of data protection principles. Thus, the SMAT provider is responsible even if the purpose of the tool is not to collect personal data but to inform future policy [36]. It goes without saying that data protection regulations do not hold businesses less responsible than governments; however, it can be observed that governments are more concerned with this legally vague situation and have a greater interest in striving for transparency and social acceptability.
The UniteEurope consortium recommends a careful selection of social media sources that comply with European and national data protection legislation, but also rendering social media authors anonymous by hiding their names and nicknames from the end users of the tool. Thereby, all provided information is restricted to the text of the posting, omitting the author’s point of reference, location or any other personal information. Additionally, end users need to be made aware of the legal situation and of the safeguarding measures that are being taken. SMAT providers should furthermore consult with and register at the relevant national Data Protection Commission (DPC) and observe legal developments in the field (further remarks in [36]).
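As an illustration of these safeguards, the minimal sketch below shows how an SMAT ingestion step might restrict itself to explicitly public postings and strip all author-identifying fields before results reach governmental end users. The posting schema and the function names are hypothetical assumptions made for illustration only; they are not part of the UniteEurope tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawPosting:
    """A posting as delivered by a social media source (hypothetical schema)."""
    text: str
    visibility: str            # e.g. "public" or "private", as flagged by the platform
    author_name: Optional[str] = None
    author_nickname: Optional[str] = None
    location: Optional[str] = None


@dataclass
class AnonymisedPosting:
    """What governmental end users get to see: the text of the posting only."""
    text: str


def collect_public_postings(postings: List[RawPosting]) -> List[RawPosting]:
    """Keep only postings that their authors have explicitly marked as public."""
    return [p for p in postings if p.visibility == "public"]


def anonymise_posting(posting: RawPosting) -> AnonymisedPosting:
    """Drop names, nicknames, location and any other personal point of reference."""
    return AnonymisedPosting(text=posting.text)


def prepare_for_end_users(postings: List[RawPosting]) -> List[AnonymisedPosting]:
    """Pipeline sketch: public-only collection followed by anonymisation."""
    return [anonymise_posting(p) for p in collect_public_postings(postings)]
```

As the Satakunnan Markkinapörssi and Satamedia ruling cited above makes clear, restricting collection to public postings does not by itself exempt the provider from data protection obligations; such a filter is a necessary but not a sufficient safeguard.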
We can preliminarily conclude that SMAT providing governments with data that inform forward-looking policies stand on a weak legal footing, whilst data protection legislation and case law fall short of grasping the potential consequences of these new technologies and their rapid progress. Despite a certain protectiveness towards privacy rights shown in the current legal debate, SMA tools remain in large parts unregulated [36].
Ethical issues and methodological consequences
Furthermore, a range of ethical aspects come into play when governments make use of SMAT. These depend, to a great extent, on the very purpose of the application itself. In the case of the UniteEurope project, social media analysis is carried out to support integration policy making, a highly value-laden field that demands sensitive precautions for protecting particularly vulnerable individuals, but also for avoiding political misuse.
In this regard, Omand, Bartlett et al. [4] mention the issue of interpretation, which concerns the fast-changing language used in social media, but also aspects such as irony or consciously spread rumors that generally cannot be identified by computers and can lead to misleading results:
“There are new forms of online behavior, norms and language that make analysis and verification difficult. Translating often unprecedentedly large, complex and conflicting bodies of information into actionable, robust insight is a significant challenge that has not been overcome.” [4]
The aspect of rumors on social media is increasingly coming into the focus of current research, notably in the field of SMA use for crisis mitigation [37]. Public bodies that intend to use information retrieved from social media for policy making need to be aware of these deficiencies in order to interpret and evaluate the information appropriately; however, no significant progress has yet been made in overcoming them. In this regard, it is crucial to inform end users about the limits of the tool with regard to interpretation.
A more severe issue in terms of (research) ethics is the lack of “informed consent” in SMA, comprehensively dealt with in Krieger, Grubmüller et al. [36].
“Being in compliance with the law is one step to diminish ethical concerns, but must be considered a minimum standard only for coming up to ethical requirements concerning data protection. In this regard, the lack of ‘informed consent’ is an issue that requires precautions in order to protect the authors of postings who might not be aware of the public availability of their contents, let alone of their deployment for research purposes.” [36]
Whereas the issue of informed consent is also encountered in conventional research methods (e.g. unobtrusive observation), the authors consider it particularly delicate in SMAT “due to the very nature of ‘digital reality’ that allows fast and easy detection of data” [36]. Especially younger users are often not aware of the consequences of their public postings [35], let alone that these might be used as an information source for governments. Therefore, measures to ensure anonymity are important not only from a legal, but also from an ethical viewpoint. More generally speaking, “(g)overnments will need to exercise care in securing their systems and software to avoid any perception of surveillance” [33].
A further concern is the selection of social media sources that a SMAT uses for content gathering. As the “core” of the tool, they determine the quantity, the quality and the explanatory power of the yielded results. They decide which groups, which comments and which opinions are reflected in the results and, in the longer run, considered for policy making. As Grubmüller, Krieger et al. [36] state, the selection of sources needs to be based on aspects such as “(…) ‘Who is active on social media?’, which brings about issues of ‘digital divide’ (exclusion of certain groups of people depending on variables such as age, computer literacy, gender, etc.), the strong presence of populist and extremist positions in social networks and, in contrast, the weak presence of (certain groups of) migrants” [36]. A corresponding methodology has been developed within the frame of the UniteEurope project.
This leads to the question of representation, which refers not only to the input to the tool, but also to the output the tool produces. As Krieger, Grubmüller et al. [36] recommend, quantitative results (e.g. frequencies of names/keywords, number of references by users, etc.), which are very useful for SMAT in a commercial context, should be accompanied by qualitative data and additional context information (such as the indication of sources, the number of sources, extracts from the postings, links to the original pages, etc.). Otherwise, as they argue, results based on frequencies alone “(…) can be misleading in the sense that individual sources and/or individual users can produce above-average amounts of partial contents” [36]. Sentiment analyses (i.e. the categorisation of content entities as positive, negative or neutral), which tend to be widespread among commercial SMAT, can likewise be problematic and are often not applicable to governmental SMAT that deal with value-laden subjects such as migrant integration.
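To make this recommendation tangible, the following minimal sketch pairs a keyword frequency with the kind of context information suggested above (distinct sources and users, extracts, links to the original pages), so that end users can see whether a high count reflects broad discussion or a few prolific sources. The structure and field names are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeywordResult:
    """A quantitative result enriched with qualitative context (illustrative only)."""
    keyword: str
    frequency: int                                          # number of matching postings
    distinct_sources: int                                   # different platforms/pages contributing
    distinct_users: int                                     # different authors contributing
    extracts: List[str] = field(default_factory=list)       # short anonymised excerpts
    source_links: List[str] = field(default_factory=list)   # links to the original pages

    def is_concentrated(self, threshold: float = 0.5) -> bool:
        """Flag results where few users account for many postings, since such
        concentration can make the raw frequency misleading."""
        return self.distinct_users / max(self.frequency, 1) < threshold


# Hypothetical usage: a frequency of 120 produced by only 12 authors triggers the flag.
result = KeywordResult(keyword="housing", frequency=120,
                       distinct_sources=3, distinct_users=12)
if result.is_concentrated():
    print("Few users produce most of these postings; interpret the count with care.")
```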
At the same time, public administrators using SMAT must be trained in diversity awareness and social media literacy [38–41]. Awareness-raising measures for end users shall inform them of both the opportunities and the limitations that these new technologies hold for governments. This also helps to prevent potential (unintended) misuse of such tools. Thus, Krieger, Grubmüller et al. [36] recommend that SMAT providers commit to “providing manuals and training materials that contain sensitizing information with regards to how these data are being gathered as well as both the significance and limits which the results bear”.