
Ilse Ras reports on her research on British newspapers

My role in the project is to collect and analyse a corpus of British newspaper articles, published between 2000 and 2016, on the topic of human trafficking.
A corpus is nothing more than a collection of texts, and corpus linguistics is the field that uses corpora (the plural of corpus) to draw conclusions about language use.

Collecting corpora is not always a straightforward endeavour, in particular when the topic under investigation is sometimes misunderstood by the public, the media, and even legislators. For instance, human trafficking may also be known as ‘modern slavery’, and may encompass crimes such as organ harvesting, forced labour, and domestic servitude. Furthermore, there is an ongoing debate about whether sex work should always be considered a form of exploitation and trafficking. Finally, it may not always be clear whether someone has been trafficked (i.e. moved using coercion or deception for the purposes of exploitation) or smuggled (i.e. voluntarily but irregularly moved across borders), in particular as those who volunteer, or more accurately pay, to be smuggled across borders are also often deceived and exploited, both along the way and at the end point.

I used the Lexis Nexis database, mainly for practical reasons: it holds a great number of British newspaper articles, I’ve used it before, and its output is compatible with a Python script that Chris Norton (University of Leeds) wrote for me a few years ago, which separates the downloaded articles and organises them into a useful folder structure.
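In outline, such a script might look something like the sketch below. The delimiter, file layout, and folder scheme are assumptions made purely for illustration; they are not a reproduction of Chris’s actual script.

import re
from pathlib import Path

# Hypothetical article delimiter; the real script's parsing rules may differ.
DOC_DELIMITER = re.compile(r"^\s*\d+ of \d+ DOCUMENTS\s*$", re.MULTILINE)

def split_bulk_download(bulk_file: str, out_dir: str) -> None:
    """Split a bulk download into one file per article, filed into a
    folder per (assumed) source newspaper."""
    text = Path(bulk_file).read_text(encoding="utf-8", errors="ignore")
    articles = [a.strip() for a in DOC_DELIMITER.split(text) if a.strip()]
    for i, article in enumerate(articles, start=1):
        # Assume the first non-empty line of each article names the newspaper.
        first_line = next((l for l in article.splitlines() if l.strip()), "unknown")
        folder = Path(out_dir) / re.sub(r"\W+", "_", first_line.strip())
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"article_{i:05d}.txt").write_text(article, encoding="utf-8")

# Usage: split_bulk_download("lexisnexis_download.txt", "corpus")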

One way of collecting articles is to go into Lexis Nexis, input some very broad but relevant search terms, and manually select all articles that actually discuss the topic under investigation. This is what I did for my PhD research. However, this is an extremely time-consuming method, as it requires wading through several million articles in order to select maybe 50–90 thousand.
I only have three months in which to complete my part of the project, and it would be a waste of time to spend all three months just collecting data.
But that’s not necessary. Gabrielatos, then at the University of Lancaster, published a paper in 2007 outlining the data collection method used to create a corpus for the RASIM project (more information here: http://ucrel.lancs.ac.uk/projects/rasim/). Gabrielatos’ (2007) method entails collecting a sample corpus using two ‘core’ search terms; generating a list of possible additional search terms from this sample corpus; testing these possible additional search terms; and then using those that pass the test (as well as the core search terms) to collect the full corpus. It’s a far less time-consuming method, and although there is a slightly increased risk of collecting articles that aren’t strictly relevant, it is also more systematic, and therefore more replicable, than manually selecting articles.

So that’s what I did. I first created three sample corpora:
1. One at the start of the period (1/1/00-30/9/00)
2. One at the end of the period (1/1/16-30/9/16)
3. One in the middle of the period (1/1/08-30/9/08)
I used a handful of core search terms, rather than two. These included ‘slavery’, ‘forced labour’, ‘sexual exploitation’, and ‘human trafficking’.

Of these search terms, ‘slavery’ produced the highest number of articles that did not also mention any of the other search terms, presumably because historical slavery is often still considered distinct from modern slavery (which is in itself worthy of examination). This search term therefore set the threshold against which all other potential search terms were tested.
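As a rough illustration of the counting behind this threshold test, assuming the sample corpus is held as a list of plain-text articles and matching is naive lower-cased substring search (the actual pass/fail criterion is the one set out in Gabrielatos 2007, not this simplification):

def unique_hits(articles, term, other_terms):
    """Count articles that mention `term` but none of `other_terms`.
    Naive lower-cased substring matching, purely for illustration."""
    term = term.lower()
    others = [t.lower() for t in other_terms]
    return sum(
        1 for text in (a.lower() for a in articles)
        if term in text and not any(t in text for t in others)
    )

core_terms = ["slavery", "forced labour", "sexual exploitation", "human trafficking"]
sample_articles: list[str] = []  # load the sample corpus here

# 'slavery' sets the threshold: its count of articles mentioning no other core term.
threshold = unique_hits(sample_articles, "slavery",
                        [t for t in core_terms if t != "slavery"])

# Each candidate term is then compared against this threshold; whether it
# 'passes' follows the criterion described in Gabrielatos (2007).
for candidate in ["servitude", "sweatshop"]:  # hypothetical candidate terms
    print(candidate, unique_hits(sample_articles, candidate, core_terms), threshold)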

I used these three sample corpora to create key word lists. A key word list shows which words are used much more often in the sample corpus than in another corpus, the reference corpus. I used rank 60 as the cut-off point for selecting additional search terms from these key word lists, so only the top 60 words of every list were tested against the threshold set by ‘slavery’. I also asked my co-investigators to send me lists of words that they thought could be useful search terms, and tested those, too.
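The key word calculation itself can be sketched as follows: each word receives a log-likelihood (keyness) score comparing its frequency in the sample corpus with its frequency in the reference corpus, and the 60 highest-scoring words are kept. The tool and reference corpus actually used are not named above, so this is only an illustrative sketch, not the project’s exact procedure.

import math
from collections import Counter

def keyword_list(study_tokens, reference_tokens, top_n=60):
    """Rank words by log-likelihood keyness (study corpus vs reference corpus).
    A bare-bones sketch: real keyword tools add frequency floors,
    significance cut-offs, and dispersion checks."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    n_study, n_ref = sum(study.values()), sum(ref.values())
    scores = {}
    for word, a in study.items():
        b = ref.get(word, 0)
        # Expected frequencies under the hypothesis of equal relative use.
        e1 = n_study * (a + b) / (n_study + n_ref)
        e2 = n_ref * (a + b) / (n_study + n_ref)
        g2 = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0))
        # Keep only positive key words (over-used in the study corpus).
        if a * n_ref > b * n_study:
            scores[word] = g2
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Usage: keyword_list(sample_corpus_tokens, reference_corpus_tokens)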

Eventually, I used the core search terms and the additional search terms that passed the threshold test to collect the full corpus, which currently consists of slightly over 80 thousand articles. Chris’s Python script initially only recognised articles published by seven of the major British newspapers, which is what it was originally intended to do, so I adapted it to recognise the other news sources that we wanted to include in this project.
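One simple way to make such a script recognise additional news sources is to widen a set of accepted source names; the sketch below illustrates the idea, with titles that are examples rather than the project’s actual configuration.

# Illustrative only: the titles below are examples, not the project's
# actual source list or the script's real configuration.
RECOGNISED_SOURCES = {
    "The Guardian", "The Times", "The Daily Telegraph", "Daily Mail",
    "The Sun", "Daily Mirror", "The Independent",
    # Titles added when widening the set (examples):
    "Daily Express", "Daily Star", "The Observer",
}

def is_recognised(source_line: str) -> bool:
    """Return True if an article's source line matches a recognised newspaper."""
    line = source_line.lower()
    return any(name.lower() in line for name in RECOGNISED_SOURCES)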

The next step is to actually conduct analyses of this corpus, and I will be back to update you on that in December.
- Ilse Ras

References:
Gabrielatos, C. 2007. Selecting query terms to build a specialised corpus from a restricted-access database. ICAME Journal 31, pp. 5-43. Available at: http://clu.uni.no/icame/ij31/ij31-page5-44.pdf
