The art of translation: Developing multilingual digital resources using artificial intelligence to support translation in bioinformatics

Applications are invited for an Open-Oxford-Cambridge AHRC DTP-funded Collaborative Doctoral Award at The Open University in partnership with the European Molecular Biology Laboratory - European Bioinformatics Institute (EBI). This fully-funded studentship is available from October 2022 on a full- or part-time basis. Further details about the value of an Open-Oxford-Cambridge AHRC DTP award are available on our Studentships page.

Closing date: midday (UK time) 11 January 2022

Project overview

Societies flourish with open and free communication between their members, while communication barriers can lead to social breakdown. By promoting communication across social groupings, the field of translation studies provides the means for removing barriers and promoting engagement across entire languages, cultures, and societies. Scientific fields, indeed, are striving for more collaboration and open science (demonstrated most notably in the worldwide sharing of Covid-19 data this past year). However, bioinformatics, the computational analysis of big biological data, is dominated by wealthier nations for which English is the official language. As a result, bioinformatics materials are almost universally written in English, which is a key barrier to engagement with what is already a technical, jargon-heavy field. Some key questions we would like to answer in this project are:

  • What bioinformatics language exists currently in non-English languages, and how consistent is the meaning?
  • How do you translate jargon, particularly when the translated terms often do not exist or are not used in non-English languages?
  • How can you do this at scale, in a fast-moving field where information is often difficult to access due to the primary site of work being a laboratory environment? 
  • How can technical language meaning be translated effectively and sustainably for accessibility and democratization of science? 

In this project, the student will first and foremost develop a detailed linguistic description of the technical language used in bioinformatics, and from this description, develop computational tools for automating the translation of bioinformatics text. They will work in an interdisciplinary team of researchers in professional translation studies, bioinformatics, and artificial intelligence-mediated translation tools to answer these questions and create a means for bioinformatics language translation at scale.

In this project, the student will:  

  1. Survey current work on linguistic descriptions of scientific language.
  2. Carry out a complete linguistic description of bioinformatics text (using appropriate methodologies, e.g. corpus linguistics, formal linguistic description, etc.).
  3. Perform a detailed comparative linguistics study, based on the different languages involved in the project (for example: plant biology research is common in Latin America, and therefore Spanish and Portuguese are more likely to contain plant-related bioinformatics terms).
  4. Survey the currently available machine translation tools for bioinformatics to identify current gaps in such technology.  
  5. Based on the results of (1-4), develop automated, artificial intelligence-driven tools.
  6. Evaluate the tools developed in (5), which will include cycles of survey and discussion with domain experts to inform and test the resources, and  
  7. Pilot these tools in the Galaxy Training Network, a free, open, globally available bioinformatics training suite.

Supervisory team

The Open University | Language & Literacies Research Area 

  • Wendi Bacon | Lecturer in Health Sciences 
  • Andrew Gargett | Lecturer in Artificial Intelligence 
  • Severine Hubscher-Davidson | Senior Lecturer in French & Translation & Research Director for Languages & Applied Linguistics 

EMBL-EBI | Training Team 

Training & professional experiences in the role 

  • Developing and documenting linguistic description
  • Data processing skills
  • Coding & programming skills 
  • Artificial intelligence & automated translation (Andrew Gargett) 
  • Markdown programming & bioinformatics jargon (Wendi Bacon, EBI) 
  • International Society for Computational Biology (ISCB)
  • EBI and ISCB have flagged resource translation as a key priority; presentation at ISCB conferences and collaboration with heads of bioinformatics across the world will therefore be part of the role
  • The Galaxy Project 
  • Presentation and work with the Galaxy Community Conference & international training courses (over 1200 attended the last one) to promote open science and democratised data analysis 

Partner: European Molecular Biology Laboratory - European Bioinformatics Institute (EBI) Training Team 

The EMBL-European Bioinformatics Institute’s Training Team provides bioinformatics resources for scientists all over the world and has built a strong international network of scientists. In their mission statement, they aim

“To enable EMBL to deliver world-leading training in bioinformatics and scientific service provision to the research community, empowering scientists at all career stages to make the most of biological data, and strengthening bioinformatics capacity across the globe.” 

In line with this, they created and directed the Cabana project to increase bioinformatics capacity in Latin America and are involved in similar projects in Africa and Australasia. EMBL-EBI offers access to an international community of bioinformaticians speaking a wide variety of languages. They are also a conduit to reach the target audience at every step of this project, to ensure feedback, voice and agency for the non-English speakers that this project is aimed at, and to ensure usage and impact of the results.

How to apply

Applicant Experience:

  • BSc in a second language, with linguistic/translation courses/modules
  • Coding/programming skills (desired) 
  • MA in translation or linguistics (desired) 
  • Experience with automated translation (desired) 

Potential applicants are encouraged to contact Wendi Bacon (wendi.bacon@open.ac.uk) with questions and for any guidance before submitting their application. 

You should apply to a PhD Programme in the Languages and Applied Linguistics Research Area by midday (UK time) 11 January 2022, indicate your interest in being considered for an Open-Oxford-Cambridge AHRC DTP studentship and submit a completed copy of the OOC DTP Application Form at the same time. Further details on how to apply for OOC DTP studentship funding are available on our How to Apply page.