News Archives

Announcing the #AI4D Africa Innovation 2019 Winners

The AI for Development (AI4D) Initiative is pleased to announce the winners of the AI4D-Africa Innovation Call for Proposals 2019.

Sign up and join us to celebrate the winners at the Deep Learning Indaba 2019, at the #AI4D Network of Excellence Innovation Grant Award Ceremony:

    • Tuesday, 27th August 2019 at 7 PM (Nairobi Time)
    • (LOCATION UPDATE) Interaction Hall – KUCC, Kenyatta University, Nairobi, Kenya.

The first named individual is the Principal Investigator. Funding for these innovation seed grants is made available with the support of Canada’s International Development Research Centre. To learn more about our Network of Excellence in Artificial Intelligence for Development in Sub-Saharan Africa, click here.

Congratulations to all recipients. Follow us at @AI4Dev. 


Dr. Abdelhak Mahmoudi  
Mohammed V University of Rabat, Morocco
Arabic Speech-to-MSL Translator: ‘Learning for Deaf’
To develop an Arabic text to Moroccan Sign Language (MSL) translation product by building two corpora of Arabic text for translation into MSL. The collected corpora will be used to train deep learning models that analyze and map Arabic words and sentences to MSL encodings.


Dr. Adewale Akinfaderin, Olamilekan Wahab and Olubayo Adekanmbi
Data Duality Lab, Data Science Nigeria, MTN Nigeria, Nigeria
Using Artificial Intelligence to Digitize Parliamentary Bills in Sub-Saharan Africa 
To improve the categorization of parliamentary bills in Nigeria using Optical Character Recognition (OCR), document embeddings, and recurrent neural networks, and to expand the approach to three other African countries: Kenya, Ghana, and South Africa.


Dr. Amelia Taylor, Eva Mfutso-Bengo and Binart Kachule
University of Malawi and the Polytechnic, University of Malawi, Malawi
A Semi-Automatic Tool for Meta-Data Extraction from Malawi Court Judgments 
To develop a methodology for a semi-automatic classification of judgments disseminated by the High Court Library of the Malawi Judiciary with the purpose of enabling ‘intelligent searching’ within this body of knowledge.


Dr. Aminata Zerbo Sabane, Dr. Tegawendé Bissyande, and T. Idriss Tinto 
L’université Joseph Ki-Zerbo and La Communauté Afrique Francophone des Données Ouvertes, Burkina Faso
Preservation of Indigenous Languages 
To initiate a research roadmap for the preservation of indigenous languages by collecting, categorizing, and archiving language data, and by applying translation and voice synthesis to perform automatic translation between official and indigenous languages.


Denis Pastory Rubanga, Dr. Zekaya Never, Dr. Machuve Dina, Lilian Mkonyi, Loyani K. Loyani, Richard Mgaya.
Tokyo University of Agriculture, The Nelson Mandela African Institution of Science and Technology, and Sokoine University of Agriculture, Tanzania
A Computer Vision Tomato Pest Assessment and Prediction Tool    
To monitor pests using a data-driven computer vision technique that directs extension officers’ support services across sub-Saharan Africa through a real-time pest damage assessment and recommendation support system for small-scale tomato farmers.


Martha Shaka, Nyamos Waigama, Emilian Ngatunga, Halidi Maneno, Said Said, Said Mmaka, Frederick Apina, Simon Chaula, Emani Sulutya, Merikiadi Mashaka
University of Dodoma and Benjamin Mkapa Hospital, Tanzania
Effective Creation of Ground Truth Data-Set for Malaria Diagnosis Using Deep Learning 
To create an automatic data annotation tool and ground truth dataset for malaria diagnosis using deep learning. The ground truth dataset and the tool will streamline the development of AI tools for pathology diagnosis.


Dr. Moes Thiga and Dr. Pamela Kimeto
Kabarak University, Kenya
Early Detection of Pre-Eclampsia Using Wearable Devices and Long Short Term Memory Networks
To determine the effectiveness of a Long Short-Term Memory network in predicting which pregnant mothers are at high risk of developing pre-eclampsia, and the effectiveness of pre-eclampsia prophylaxis.


Ronald Ojino and Khushal Brahmbhatt
Cooperative University of Kenya, Kenya
A Public Dataset on Poaching Trends in Kenya and a Study on the Predictive Modeling of Poaching Attacks
To test the feasibility of deploying Unmanned Ground Vehicles (UGVs) for automated intelligent patrol, detection, wildlife monitoring, and identification across the national parks and reserves in Kenya.


Steven Edward, Edward James, and Deo Shao
Nelson Mandela African Institution of Science and Technology, Tanzania
Improving the Pharmacovigilance System Using Natural Language Processing on Electronic Medical Records
To improve the pharmacovigilance system by proposing a novel algorithm for the automatic extraction of adverse drug reaction cases from Electronic Medical Records, reducing the time taken and introducing confidentiality in reporting.


Dr. Tegawendé F. Bissyande, Dr. Aminata Zerbo Sabane, and T. Idriss Tinto 
Université Joseph Ki-Zerbo and La Communauté Afrique Francophone des Données Ouvertes, Burkina Faso
Building a Medicinal Plant Database for Preserving Ethnopharmacological Knowledge in the Sahel 
To initiate the collection and construction of a medicinal plant database, on top of which a search engine and AI-based image recognition for plants will be built to enable scalable search of the preserved knowledge.

A roadmap for artificial intelligence for development in Africa

Based on the workshop “Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa”, April 3–5, 2019, Nairobi, Kenya.


Artificial intelligence is poised to enhance productivity and innovation and to help countries across sub-Saharan Africa achieve the Sustainable Development Goals, improving outcomes from health care to agriculture to education. Yet as with any technology, the transformative potential benefits come with challenges that need to be managed and moderated. Though countries across sub-Saharan Africa are on the threshold of harnessing AI technologies to support genuine human development and bolster economic and political progress, there are likely to be ramifications for already precarious livelihoods, labour markets, and fragile governing institutions. Addressing these opportunities and challenges is key to the roadmap for AI for development.

The AI Readiness Index 2018 (forthcoming)

Infrastructural challenges, data gaps, the need for highly skilled labour, and poor regulatory environments still inhibit people’s ability to harness AI across Africa. A global review of the secondary data available on AI suggests that the continent needs better infrastructure, filled data gaps, and investment in education and capacity building to ensure there are enough highly skilled researchers able to implement AI solutions. Moreover, a critical dimension is the legal, ethical, and human rights-based frameworks needed for countries to make the best uses of AI: frameworks that will protect citizens’ data and build trust and legitimacy for AI and its multitude of applications. These are still largely absent.

In the next five years, what would happen if we had more than 30 African countries developing their own AI policies and strategies? Imagine if there were more than 400 PhDs in AI and machine learning from across the continent. And what if universities, the private sector, and other public interest institutions invested a billion dollars in collaborations on AI to support the African sustainable development agenda?

These proposals are now emerging from the African AI community after the recent workshop in Nairobi on Artificial Intelligence for Development in Sub-Saharan Africa. The meeting gathered sixty African and international experts together to demonstrate and discuss how people across the continent are developing AI at a rapid pace. There is a vibrant AI ecosystem emerging across the continent with initiatives like Deep Learning Indaba, the African Institute for Mathematical Sciences (AIMS), Data Science Africa, and Women+ in Machine Learning and Data Science (WIMLDS), all of which aim to strengthen machine learning and artificial intelligence in Africa.

To support the growing momentum in Africa’s AI ecosystem, this emerging ‘network of excellence’ of machine learning and AI practitioners and researchers is building a collaborative roadmap for AI for Development in Africa. The three-day workshop focused on three critical areas: 1) policy and regulations, 2) skills and capacity building, and 3) the application of AI in Africa. The slides and the analysis below synthesize the participatory discussions during the workshop on these three critical areas.

Artificial Intelligence Network of Excellence in Sub-Saharan Africa – Nairobi Workshop 2019

Policy and regulations

“Thirty African countries should develop AI specific policies and strategies over the next five years”

Currently, out of the 46 sub-Saharan African states, only Kenya has an AI taskforce working towards a national strategy. There is a clear need to build policy capacity as well as to help design AI regulatory frameworks fit for the African context. An area of convergence in this discussion was the need to channel investments toward prioritized solutions for development, but also the need to reflect on and better understand what “development” means. The pan-African development blueprint, the African Union’s Agenda 2063, could offer such a shared vision. Other ideas emerged during the workshop, including collaboration within the network to inform and grow AI-specific policies in at least 30 African countries over the next five years, and increasing the adoption of the AU’s Convention on Cyber Security and Personal Data Protection. There is also critical recognition that copy-and-paste of made-in-the-North AI policy frameworks will not serve the needs of Africans. No one country has all the solutions. So while policy options and best practices from other parts of the world can provide great sources of inspiration and guidance, the priorities and directions must be determined locally in a way that facilitates innovation, curbs harms, and upholds human rights.

Skills and infrastructure

“Create a pipeline of 400 African PhDs in AI, data science, and other interdisciplinary fields.”

Building an inclusive AI future in Africa means we must build AI strength through community. Skills and capacity received a lot of attention during the three-day workshop. One of the participants at the workshop shared that there is a need to “better formalize AI training and the recruitment of AI talent in Africa” and to create a pipeline of 400 African PhDs in AI, data science, and other interdisciplinary fields over the next five years. This means having the appropriate infrastructure to build this capacity.

Furthermore, there is a need to enhance the strength and agility of educational systems to meet new digital opportunities, and to support investments in learning outcomes in the areas of machine learning, artificial intelligence, and data science in Africa. As one participant said, “Let’s think beyond PhDs. We should also target and invest in youth at the primary school level, offer mentorship programs for emerging leaders, including for women and girls to build the next generation of AI practitioners, and create opportunities to bring more of the public into the AI conversation.” Moving forward, inclusion must remain a core principle guiding the design of a responsive and equitable AI skills and capacity-building roadmap in Africa.

The hope is that it will be possible to establish an AI Centre of Excellence in every African country by 2030 that will help incubate ideas and foster AI communities of practice in an interdisciplinary and inclusive manner.


“A collective investment of USD 1 billion in collaborative innovation and research prioritizing solution areas for sustainable development in Africa.”

The application of AI must be connected to people and their needs on the ground, and the potential benefits of AI translated into real impact for people. The ground-up approach of this collaborative workshop was an excellent way to start drawing fresh ideas and practical use cases on how AI could help increase access to healthcare and education for the most vulnerable in rural areas, and improve the movement of people, goods, and ideas especially within and between rapidly growing megacities across Africa.

A key idea discussed among participants was how to mobilize a network of African companies, universities, research centres, and public institutions to collaborate on advancing the AI research for development agenda. Participants suggested a collective investment of USD 1 billion in collaborative innovation and research prioritizing solution areas for sustainable development in Africa. These partnerships could mean that when Africans design and deploy AI applications, societal goals and human rights commitments like decent work conditions and gender equality are integrated into projects from the beginning.

Next steps

It is very clear from the diverse range of voices at the workshop that we cannot forget that within Africa’s emerging and vibrant AI ecosystem, the inclusion and diversity of voices from traditionally marginalized communities must be prioritized. And there is a need to expand the AI Network of Excellence conversation to Francophone and Portuguese-speaking countries on the continent as well. It is also critical to understand that contexts differ in Kenya, Nigeria, and South Africa and even more so in places where AI is still unknown, so strategies and engagement must be tailored to local situations.

The enriching and collaborative dialogue over the three days of the AI4D workshop has built a solid foundation for the research and innovation agenda in Africa. Progress on this agenda will depend on the collective action of many actors to hone and support this vision, and we will continue to engage with partners and stakeholders in shaping an inclusive AI ecosystem in Africa.  



We would like to thank all the participants who contributed their voices and ideas during this event. This three-day dialogue was organized by the International Development Research Centre, the Swedish International Development Agency, the Knowledge for All Foundation, and Strathmore University in Nairobi, Kenya. Follow us on Twitter at @AI4Dev


Artificial Intelligence and Development in Latin America: foundations for a regional initiative

The growth of computing power, the availability of large quantities of open data, and progress in algorithms have made Artificial Intelligence (AI) one of the most promising and challenging technologies of the 21st century. However, several countries in the Latin America and Caribbean (LAC) region are only beginning to build awareness of the importance of these developments, and few have explored the development of national policies applicable to artificial intelligence. In this process, there is a significant disconnect between the AI community of practice and the development communities.

Given this context, Centro Latam Digital and the International Development Research Centre (IDRC) jointly organized an invitation-only workshop to establish, collectively with key actors, a regional Artificial Intelligence for Development project.

The workshop took place on 25–26 February 2019 at the Centro de Investigación y Docencia Económicas (CIDE) in Mexico City. It unfolded as an open exchange, with the collaboration of the Latin American Open Data Initiative (ILDA), through plenary sessions and working-group discussions with more than 30 experts from nine countries in the region. Participants were selected on the basis of their backgrounds and expertise on the topic at hand.

The objectives of the workshop were to:

  • Bring together key actors from the various communities involved in the development or production, research, regulation, and use of AI, to share knowledge and identify key priority areas of research and training for the region.
  • Provide guidelines on the structure and potential objectives of an initiative to support the development of sound national AI policies in the region.
  • Establish the basis of a regional community of policymakers, researchers, companies and private-sector institutions, and non-governmental organizations oriented toward the development and implementation of AI.

This blog summarizes the most important ideas from the workshop, together with the next steps toward building a network of experts and organizations that respond to recent developments in information and communication technologies (ICTs) in Latin America and promote AI as a vehicle for the region’s social and economic development.


The focus of the first day was the challenges of artificial intelligence in the Latin American region. Participants agreed that governments do not recognize AI as a potential solution to the urgent social and economic development issues of their countries. It is necessary to build greater understanding of the potential impact of adopting AI as a means of solving priority problems on national development agendas. The Latin American context faces a lack of awareness about AI in the public sector; work is reactive, top-down, siloed, and lacking convergence.

Although there is a growing body of reports and resources analyzing the importance of national digital agendas, these rarely translate into action by governments, owing to a lack of the capacity, legal structure, and authority needed to carry out implementation in a specific local context with limited resources.

Participants agreed that government is an essential actor in driving efficient public policies that harness AI as an enabler of long-term development. However, governments need ethical data-governance frameworks, guidelines for data protection and information privacy, a better distribution of resources, and mechanisms to give continuity to a digital agenda and the use of AI as an enabler.

The following points were raised by participants.

  • It is key to define a common agenda among actors in the region that frames AI as a cross-cutting capability, not a product, for driving sustainable development.
  • To define an actionable agenda, it is important to promote multi-stakeholder platforms for dialogue, to get to know the actors at the regional level and understand their capacities in different sectors, and to carry out a diagnostic to understand what tools already exist.
  • It is necessary not only to strengthen research capacity in AI, but also to build links between academics and companies so as to complement and balance capacities. The more we complement capacities, the more strength we have to create alliances and develop initiatives around the critical problems of the region’s countries.
  • It is key to generate citizen demand for a national AI agenda.
  • It is extremely important to diagnose the stage of AI maturity in the region, in order to understand the opportunity to strengthen capacities, from digital literacy to AI project management, articulation in public policy, technical capacity, and access to and management of infrastructure across the various sectors.

Paths forward

Participants sought to define possible strategies around three points of intervention: designing and influencing public policy, skills development, and AI applications in the region. Following the idea presented by IDRC representatives, the interventions would be designed and executed by a regional network of specialists and organizations leading work on AI-related topics.

Public policy

To influence public policy, it was agreed that it is important to map initiatives and policies in the region, in order to obtain a clear picture of the gaps in knowledge, capacities, adoption levels, and resources assigned around AI. Two or three points of intervention would also need to be identified, possibly within the framework of digital government or concrete sectoral problems requiring short-term solutions, to demonstrate the potential impact of AI adoption on the effectiveness and efficiency of government programs. To monitor and report on AI strategies and policies, an observatory could be created that would also develop a metric of national and subnational adoption maturity in the region. Finally, it is important to offer in-person and online AI training activities for governments and to define incentive schemes for their participation.

Skills development

In the area of skills development, the proposal is to create a high-quality research centre that structures programs to grow and retain talent, contributing to applied AI that addresses the region’s development challenges. This centre would connect to universities, government, and industry. It would have an interdisciplinary research agenda and an entrepreneurship arm to contribute to the design of solutions for development. It was noted that having a centre for applied AI based in the region could have a secondary effect on education around digital literacy and STEM, and generate enthusiasm in the education system.


Areas for the application of AI (such as education, agriculture, violence, and health) should be defined using a high-level framework such as the Sustainable Development Goals, which allows adaptation to local contexts. Gender should be a cross-cutting theme integrated into all areas where AI interventions would be applied. To define these areas, it is key to map the actors who would participate in selecting topics and help identify the initiatives to be chosen through a set of criteria. Best practices in AI should also be identified to facilitate the replication of initiatives, by standardizing and providing toolkits for implementation. The idea was proposed of having 3–4 case studies in the first two years of the project and, within five years, more tangible results with impact metrics and indicators such as levels of well-being, capacities, awareness, and perception of the potential of artificial intelligence, among others.


The participants in this workshop showed great interest in identifying opportunities for collaboration, network building, and policy engagement around artificial intelligence for development in the region. It is necessary to continue the discussions and strengthen ties among the participating institutions to give continuity to the agendas discussed. Participants were encouraged to identify actionable next steps.

Following the workshop’s conclusions, a formal network will be developed to sustain this community of specialists. This network will have a brand, a defined strategy and objectives, and a process for integrating new actors in order to expand at the country and regional level at a manageable scale. Through this network, strategies will be defined for sharing milestones, events, legislative initiatives, contacts, and funding opportunities using tools such as Google Drive, Slack, a website, and social media. A critical path for the network will be developed along with a communication plan to ensure follow-up on the conversations about this project. Once IDRC’s timeline and budget are confirmed, subsequent working meetings will be organized.

Centro Latam Digital and ILDA will facilitate ongoing conversations among participants through an AI4D channel on Slack, to give continuity to the formation of the network of experts, legislators, researchers, and technologists who have contributed to this dialogue on artificial intelligence for the development of Latin America.

Organized by:

Philip Apodo Oyier from Jomo Kenyatta University of Agriculture and Technology on AI and common problems in Africa

Philip Apodo Oyier, Jomo Kenyatta University of Agriculture and Technology, at the workshop “Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa”, Nairobi, Kenya, April 2019

What are you working on at the moment?

My name is Philip Apodo Oyier, from the Jomo Kenyatta University of Agriculture and Technology, School of Computing and Information Technology. I teach computer science and I run a coordination centre called the Kenyan Centre for Data Analytics, where we run a master’s programme in data analytics. The idea is that as soon as students are done with their theses, they can solve industry-based problems in applied Artificial Intelligence.

How do you perceive development and Artificial Intelligence?

The workshop in Nairobi has been interesting because we have interacted with people from different fields, but with a common theme of AI and its applications. So far so good, because every learning experience improves what you know. For me, what has been most interesting is the use cases across Africa, because we have common problems and AI has the potential to provide solutions. Of course, AI contributes a lot to development, because AI techniques can be applied to make more decisions that are automated and intelligent.

That relieves us of some of the tasks that we do, and the decisions can then be used to improve our environment and our problem-solving capabilities.

What would be your blue sky project in Africa?

Given the limited resources of the centre that I’m managing, I’d wish to get more industry players to give us problem sets or data sets that students can use for their thesis work.

Because that’s currently where the challenge is: you find that students are done with their master’s and their theses, but finding practical problems from industry that they can solve becomes a challenge. So if we get the chance, that would really be my concentration: get industry players, link them with the students, and then have the students solve the problems that come from industry.

Improving Pharmacovigilance Systems using Natural Language Processing on Electronic Medical Records

This research focuses on enhancing pharmacovigilance systems using Natural Language Processing on Electronic Medical Records (EMRs). Our major task was to develop an NLP model for extracting Adverse Drug Reaction (ADR) cases from EMRs. The team was required to collect data from two hospitals using EMR systems: the University of Dodoma (UDOM) Hospital and the Benjamin Mkapa (BM) Hospital. During data collection and analysis, we worked with health professionals from the two hospitals in Dodoma. We also used the public dataset from the MIMIC-III database. These datasets came in different formats: CSV for UDOM Hospital and MIMIC-III, and PDF for BM Hospital, as shown in the attached file.

Team during an interview with a pharmacologist at BM Hospital

In most cases, pharmacovigilance practices depend on analyzing clinical trials, biomedical literature, observational studies, Electronic Health Records (EHRs), Spontaneous Reporting (SR), and social media (Harpaz et al., 2014). In our context, we considered EMRs to be more informative than the other sources, as suggested by Luo et al. (2017). We studied the EMR schemas from the two hospitals. We collected inpatients’ data, since outpatients’ data would have given incomplete patient histories. Moreover, our health information systems are not integrated, which makes it difficult to track a patient’s full history unless the patient was admitted to a particular hospital for a while. Across all the data sources used, there was a pattern of information we were looking for: clinical history, prior patient history, symptoms developed, allergies/ADRs discovered during medication, and the patient’s discharge summary.

Much as we worked on the UDOM and BM hospitals’ data, we encountered several challenges that made the team focus on the MIMIC-III dataset while searching for an alternative approach for our local data. The challenges noted were:

  • The reports had no clear identification of ADR cases.
  • In most cases, the doctor did not record the reason for changing a patient’s medicine, which made it hard to tell whether the medication did not work well for that patient or whether there was another reason, such as an adverse reaction.
  • The justification for ADR cases was vague.
  • There was a mismatch of information between patients and doctors.
  • Patients sometimes describe their condition in ways that doctors cannot interpret.
  • There is a considerable gap between health workers and regulatory authorities (many do not know that they are expected to report ADR cases).
  • The issue of ADRs is complex, since there is a lot to take into account, such as drug-drug, drug-food, and drug-herbal interactions.
  • There was no common, consistent reporting style among doctors.
  • The language used in reports is hard for a non-specialist to understand.
  • Some fields were left entirely empty, which led to incomplete medical histories.
  • The annotation process was prolonged because we had only one pharmacologist for the work.

After noting all these challenges, the team carefully studied the MIMIC-III database to assess the availability of data with ADR cases, which would help us come up with a baseline model for the problem. We discovered that the NoteEvent table has enough information about patient history, with clear indications in the text of both ADR and non-ADR cases.

To start with, we queried 100,000 records from the database with many attributes, but we used the text column found in the NoteEvent table, which contains the entire patient history (prior history, medication, dosage, examination, changes noted during medication, symptoms, etc.). We started the annotation of the first group by filtering the records to keep only the rows of interest, using the following keywords in the search: adverse, reaction, adverse events, adverse reaction, and reactions. We discovered that only 3,446 rows contained words that guided the team in the labelling process. The records were then annotated with the labels 1 and 0 for ADR and non-ADR cases respectively, as indicated in the filtration notebook.
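The keyword filtering and labelling step can be sketched as follows. This is a minimal illustration, not the project’s actual code: the `TEXT` column name follows MIMIC-III’s NoteEvent table, the sample notes are invented, and in practice the labels come from a pharmacologist’s annotation.

```python
import pandas as pd

# Hypothetical sample standing in for rows queried from the NoteEvent table.
notes = pd.DataFrame({
    "TEXT": [
        "Patient developed an adverse reaction to penicillin.",
        "Routine follow-up, no complaints.",
        "Adverse events noted after the second dose.",
    ]
})

# Keywords used to narrow down candidate rows for annotation.
keywords = ["adverse", "reaction", "adverse events", "adverse reaction", "reactions"]
pattern = "|".join(keywords)

# Keep only rows whose free text mentions any keyword (case-insensitive).
candidates = notes[notes["TEXT"].str.contains(pattern, case=False)].copy()

# Annotators then assign 1 (ADR) or 0 (non-ADR) to each candidate row.
candidates["label"] = [1, 1]  # illustrative labels only

print(len(candidates))  # 2 of the 3 sample notes contain a keyword
```

On the real data this filter is what reduced the 100,000 queried records to the 3,446 rows that went forward for manual labelling.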

In analysing the data, we found that there were far more non-ADR cases than ADR cases: 3,199 non-ADR cases, 228 ADR cases, and 19 rows left unannotated. Due to the high class imbalance, we reduced the non-ADR cases to 1,000 and applied sampling techniques (upsampling the ADR cases to 800) to balance the classes and minimize bias during modelling.
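The rebalancing described above could be implemented with scikit-learn’s `resample` utility. This is a sketch with toy counts standing in for the real 3,199/228 split:

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced dataset: many non-ADR (0) rows, few ADR (1) rows.
df = pd.DataFrame({
    "text": [f"note {i}" for i in range(40)],
    "label": [0] * 32 + [1] * 8,
})

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Downsample the majority class without replacement, then upsample the
# minority class with replacement, mirroring the 3199 -> 1000 and
# 228 -> 800 steps described above (at toy scale).
majority_down = resample(majority, replace=False, n_samples=10, random_state=42)
minority_up = resample(minority, replace=True, n_samples=8, random_state=42)

balanced = pd.concat([majority_down, minority_up])
print(balanced["label"].value_counts().to_dict())  # {0: 10, 1: 8}
```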

After annotation and a simple analysis, we used NLTK to apply basic preprocessing techniques to the text corpus as follows:

  1. Converting the raw sentences of the corpus to lower case, which helps in later processing steps such as parsing.
  2. Sentence tokenization: because the text is in paragraphs, we applied sentence boundary detection to segment it to the sentence level by identifying each sentence’s start and end points.

Then we worked with contextual regular expressions to extract information of interest from the documents, removing unnecessary characters and replacing some with more easily understandable statements or characters, following professional guidelines.

We removed affixes to reduce tokens to their root form, removed common words (stopwords), and applied lemmatization using part-of-speech information from the raw text. After preprocessing, we used Term Frequency–Inverse Document Frequency (TF-IDF) from scikit-learn to vectorize the corpus, which also surfaces the most characteristic keywords in the corpus.

In modelling, to create a baseline, we worked with classification algorithms from scikit-learn. We trained six different models: Support Vector Machines, eXtreme Gradient Boosting, Adaptive Gradient Boosting, Decision Trees, Multilayer Perceptron and Random Forest. We then selected the three models that performed best on validation (Support Vector Machine, Multilayer Perceptron and Random Forest) for further evaluation. In the next phase of the project we will also use a deep learning approach, aiming at more promising results for a model that can be deployed and used in practice. Here is the link to colab for data pre-processing and modelling.
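A sketch of the model comparison, using synthetic features in place of the TF-IDF vectors and showing only the three retained model families (hyperparameters here are defaults, not the project's):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the TF-IDF feature matrix and ADR labels
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}

# Fit each model and record its validation accuracy for comparison
scores = {name: m.fit(X_tr, y_tr).score(X_val, y_val) for name, m in models.items()}
```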

From the UDOM database, we collected a total of 41,847 patient records in chunks of 16,185, 18,400, and 7,262 for 2017 to 2019 respectively. The dataset has the following attributes (Date, Admission number, Patient Age, Sex, Height(Kg), Allergy status, Examination, Registration ID, Patient History, Diagnosis, and Medication); we downsized it to 12,708 records by removing missing columns and uninformative rows. We used contextual regular expressions to extract information of interest from the documents, following professional guidelines. Cleaning the data, converting its formats, and analysing and preparing it for machine learning are elaborated in this Colab link.
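The row-cleaning step might be sketched like this (column names taken from the attribute list above, data invented):

```python
import pandas as pd
import numpy as np

# Hypothetical excerpt of the UDOM record attributes
records = pd.DataFrame({
    "Patient Age": [34, np.nan, 51],
    "Sex": ["F", "M", None],
    "Diagnosis": ["malaria", "typhoid", "malaria"],
    "Medication": ["ALu", None, "ALu"],
})

# Drop rows with any missing value, i.e. the uninformative rows
clean = records.dropna()
```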

At BM hospital, the PDF files extracted from the EMS contain patient records with the following information:

  1. Discharge reports
  2. Medical notes
  3. Patients history
  4. Lab notes

Health professionals at the respective hospitals manually annotated the labels for each document; this task took most of our time in this phase of the project. We are still collecting and interpreting more data from these hospitals.

The team organized and extracted information from the BM hospital PDF files by converting data formats, then analysing and preparing the data for machine learning. We experimented with OCR processing to extract data from the PDF files, but the results were not promising, as much of the information appeared to be missing. We therefore had to programmatically extract content from individual files and align it with the labels provided by the professionals.

The big lesson that we have learned up to now is that most of the data stored in our local systems are not informative. Policymakers must set standards to guide system developers during development and health practitioners when using the system.

Last but not least, we want to thank our stakeholders, mentors and funders for their involvement in our research activities. It is because of such partnership that we are able to achieve our main goal.

Building a Data Pipeline for a Real World Machine Learning Application

We set out with a novel idea: to develop an application that would (i) collect an individual’s Blood Pressure (BP) and activity data, and (ii) make future BP predictions for the individual from this data.

Key requirements for this study were therefore:

  1. The ability to get the BP data from an individual.
  2. The ability to get a corresponding record of their activities for the BP readings.
  3. The identification of a suitable Machine Learning (ML) Algorithm for predicting future BP.

Pre-test the idea – Pre-testing the idea was a critical first step before we could proceed to collect the actual data. The data collection process would require the procurement of suitable smart watches and the development of a mobile application, both time-consuming and costly activities. At this point we learnt our first lessons: (i) there was no precedent for what we were attempting, and consequently (ii) there were no publicly available BP datasets for pre-testing our ideas.

Simulate the test data – The implication was that we had to simulate data based on the variables identified for our study: the systolic and diastolic BP readings, activity, and a timestamp. This was done using a spreadsheet, and the data was saved as a comma-separated values (CSV) file, a common format for storing data in ML.
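A small stand-in for the spreadsheet step, generating a simulated CSV with the four study variables (values and activity names are random and purely illustrative):

```python
import csv
import random
from datetime import datetime, timedelta

random.seed(0)
activities = ["sleeping", "walking", "working", "exercising"]

# Simulate a day of readings, one every 30 minutes
start = datetime(2020, 1, 1, 8, 0)
rows = []
for i in range(48):
    rows.append({
        "timestamp": (start + timedelta(minutes=30 * i)).isoformat(),
        "systolic": random.randint(100, 140),
        "diastolic": random.randint(60, 90),
        "activity": random.choice(activities),
    })

# Save as CSV, the format used for the study's test data
with open("simulated_bp.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["timestamp", "systolic", "diastolic", "activity"])
    writer.writeheader()
    writer.writerows(rows)
```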

Identify a suitable ML model – Both the simulated data and the data in the final study were time series. The need to predict both systolic and diastolic BP from previous readings, activity and timestamps meant that we were handling multivariate time series data. We therefore tested and settled on an LSTM model for multivariate time series forecasting, based on a guide by Dr Jason Brownlee.
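Before an LSTM can be fitted, the multivariate series has to be reframed into supervised input/output samples, a step Brownlee's guides walk through; a sketch with invented BP values might look like this (the LSTM itself would then be trained on `X` and `y`):

```python
import numpy as np

def make_windows(series, n_steps):
    """Slice a (time, features) array into (samples, n_steps, features)
    inputs X and next-step targets y for sequence forecasting."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i : i + n_steps])
        y.append(series[i + n_steps, :2])  # predict systolic and diastolic
    return np.array(X), np.array(y)

# Hypothetical columns: systolic, diastolic, encoded activity
data = np.column_stack([
    np.linspace(110, 130, 50),   # systolic trend
    np.linspace(70, 85, 50),     # diastolic trend
    np.tile([0, 1], 25),         # alternating activity code
])
X, y = make_windows(data, n_steps=5)
```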

Develop the data collection infrastructure – Since there was no pre-existing data, we had to collect our own. The unique nature of our study, collecting BP and activity data from individuals, called for an innovative approach to the process.

  • BP data collection – for this aspect of the study we established that the best approach would be smart watches with BP data collection and transmission capabilities. Beyond BP collection, another key consideration in device selection was affordability. This was occasioned both by the circumstances of the study (limited resources) and, more importantly, by the context of use of a probable final solution: the watch would have to be affordable to allow for wide adoption of the solution.

The watch identified was the F1 Wristband Heart and Heart Rate Monitor.

  • Activity data collection – for this aspect of the study a mobile application was identified as the method of choice. The application was developed to be able to receive BP readings from the smart watch and to also collect activity data from the user.

Test the data collection – The smart watch – mobile app data collection was tested and a number of key observations were made.

  • Smart watch challenges – Although the watch identified is affordable, it does not work well for dark-skinned persons. This is a major challenge given that the majority of people in Kenya, the location of the study and of eventual system use, are dark-skinned. As a result we are examining other options that may work universally.
  • Mobile app connectivity challenges – The app initially would not connect to the smart watch but this was resolved and the data collection is now possible.

Next Steps

  • Pilot the data collection – We are now working on piloting the solution with at least 10 people over a period of 2–3 weeks. This will give us an idea of how the final study will be carried out with respect to:
  1. How the respondents use the solution,
  2. The kind of data we will actually be able to get from the respondents, and
  3. The suitability of the data for the machine learning exercise.
  • Develop and Deploy the LSTM Model – We shall then develop the LSTM model and deploy it on the mobile device to examine the practicality of our proposed approach to BP prediction.

Extracting meta-data from Malawi Court Judgments

We set ourselves the task of developing semi-automatic methods for extracting key information from criminal cases issued by courts in Malawi. Our body of court judgments came partly from the MalawiLii platform and partly from the High Court Library in Blantyre, Malawi. We focussed our first analysis on cases from 2010 to 2019.

Amelia Taylor, University of Malawi | UNIMA · Information Technology and Computing

Here is an example of a case for which a PDF is available on MalawiLii, and here is an example of a case for which only a scanned image of a PDF is available. We used OCR on more than 90% of the data to extract the text for our corpus (see below for a description of the corpus).

Please open these files to familiarise yourself with the content of a criminal court judgment. What kind of information did we want to extract? For each case we wanted:

  1. Name of the Case
  2. Number of the Case
  3. Year in which the case was filed
  4. Year in which the judgment was given, and the court which issued it
  5. Names of Judges
  6. Names of parties involved (appellants and respondents, but you can take this further and extract names of principal witnesses, and names of victims)
  7. References to other Cases
  8. References to Laws/Statutes and Codes, and
  9. Legal keywords which can help us classify the cases according to the ICCS classification.

This project has taught us so much about working with text, preparing data for a corpus, exchange formats for the corpus data, analysing the corpus using lexical tools, and machine learning algorithms for annotating and extracting information from legal text.

Along the way we also experimented with batch OCR processing and with different annotation formats, such as IOB tagging[1] and the XML TEI[2] standard, for sharing and storing the corpus data, but also with a view to using these annotations in sequence-labelling algorithms.

Each has advantages and disadvantages: IOB tagging does not allow nesting (multiple labels for the same element), while an XML notation allows this but is more challenging to use in algorithms. We also learned how to build a corpus, and experimented with existing lexical tools for analysing it and comparing it to other legal corpora.
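For illustration, an IOB encoding of a case citation might look like this (the CASE tag name is hypothetical, following the usual B-/I-/O convention); note that each token carries exactly one tag, which is the nesting limitation mentioned above:

```python
# Token-level IOB tags for "... in Republic v Phiri [1997] 2 MLR 68 the court ..."
tokens = ["in", "Republic", "v", "Phiri", "[1997]", "2", "MLR", "68", "the", "court"]
tags = ["O", "B-CASE", "I-CASE", "I-CASE", "I-CASE", "I-CASE", "I-CASE", "I-CASE", "O", "O"]

# One tag per token; a token cannot simultaneously belong to a second,
# overlapping annotation in this scheme
assert len(tokens) == len(tags)
```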

We learned how to use POS annotations and contextual regular expressions to extract some of our annotations for laws and case citations and we generated more than 3000 different annotations. Another interesting thing we learned is that preparing annotated training data is not easy, for example, most algorithms require training examples to be of the same size and the training set needs to be a good representation of the data.

We also experimented with classification algorithms and topic detection using scikit-learn, spaCy, Weka and MATLAB. The hardest task was to prepare the data in the right format and to anticipate how it would lead to the outputs we saw. We felt that time spent organising and annotating well is not lost, but will pay off in the second stage of the project when we focus on algorithms.

Most algorithms split the text into tokens, and for us, multi-word tokens (or sequences) are those we want to find and annotate. This means a focus on sequence-labelling algorithms. The added complications which are peculiar to legal text is that most of our key terms belong logically to more than one label, and the context of a term can span multiple chunks (e.g., sentences).

When using LDA (Latent Dirichlet Allocation) to detect topics in our judgments, it became clear that one needs to use a somewhat ‘summarised’ version in which sequences of words are collapsed into their annotations (this is because LDA uses a term-frequency-based measure of keyword relevance, whereas in our text the most relevant words may appear much less frequently than others).
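A minimal LDA sketch with scikit-learn (toy judgment snippets and settings, not the project's actual corpus or parameters):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented judgment snippets loosely suggesting two crime topics
docs = [
    "theft of a motor vehicle contrary to the penal code",
    "murder with malice aforethought sentence of imprisonment",
    "stolen goods recovered theft conviction confirmed",
    "deceased victim murder trial evidence of malice",
]

# LDA operates on raw term counts, hence CountVectorizer rather than TF-IDF
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)

# Each row of topic_dist is a document's distribution over the 2 topics
topic_dist = lda.fit_transform(counts)
```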

Our work has highlighted to us the benefits and importance of multi-disciplinary cooperation. Legal text has its peculiarities and complexities so having an expert lawyer in the team really helped!

Finding references to laws and cases is made slightly more complicated by the variety of forms in which these references may appear, and by the use of “hereinafter”[3], e.g., Mwase Banda (“hereinafter” referred to as the deceased). This can also happen with references to laws or cases, as the following example shows:

Section 346 (3) of the Criminal Procedure and Evidence Code Cap 8:01 (hereinafter called “the Code”) which Wesbon J  was faced with in the case of  DPP V Shire Trading CO. Ltd (supra) is different from the wording of Section 346 (3) of the Code  as it stands now.

Compare extracting the reference to law from “Section 151(1) of the Criminal Procedure and Evidence Code” to extracting from “Our own Criminal Procedure and Evidence Code lends support to this practice in Sections 128(d) and (f)”. We have identified a reasonably large number of different references to laws and cases used in our text!  The situation is very similar for case citations. Consider the following variants:

  • Republic v Shautti , Confirmation case No. 175 of 1975 (unreported)
  • Republic v Phiri [ 1997] 2 MLR 68
  • Republic v Francis Kotamu , High Court PR Confirmation case no. 180 of 2012 ( unreported )
  • Woolmington v DPP [1935] A.C. 462
  • Chiwaya v Republic  4 ALR Mal. 64
  • Republic v Hara 16 (2) MLR 725
  • Republic v Bitoni Allan and Latifi Faiti
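A sketch of a contextual regular expression covering the party-name core shared by the variants above (the pattern is illustrative, not the project's actual rule set; year and report references vary too much to capture in one simple pattern):

```python
import re

citations = [
    "Republic v Phiri [ 1997] 2 MLR 68",
    "Republic v Shautti , Confirmation case No. 175 of 1975 (unreported)",
    "Woolmington v DPP [1935] A.C. 462",
    "Republic v Hara 16 (2) MLR 725",
]

# Two capitalised party names joined by " v "; additional name words
# before the "v" are allowed by the optional group
pattern = re.compile(r"[A-Z][\w.]*(?:\s+\w+)*\s+v\s+[A-Z][\w.]*")

matches = [pattern.search(c) is not None for c in citations]
```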

Something for you to do, practically! To play with some annotations and appreciate the diversity of formats, and at the same time the huge savings that semi-automatic annotation can bring, we have set up a doccano platform for you: log in here with the username guest and password Gu3st#20.

Annotating with keywords for the purposes of the ICCS classification proved to be even harder. The International Classification of Crime for Statistical Purposes (ICCS)[4] is a classification of crimes as defined in national legislations; it has several levels, each with varying degrees of specification. We considered mainly Level 1 and wanted to classify our judgments according to its 11 types, as shown in the Table.

Table 1: Level 1 sections of the ICCS

We discovered that classification according to Level 1 requires a lot of work and is significantly complex (and the complexity only grows if we consider the sublevels of the ICCS). First, the legal expert on our team manually classified all criminal cases of 2019 according to Level 1 of the ICCS and worked on a correspondence between the Penal Code and the ICCS classification. This is excellent progress.

We are in the process of extending this to a mapping of the other Malawi laws, codes and statutes relevant to criminal cases onto the ICCS. This is a whole project in itself for the legal profession, requiring the processing of a lot of text and the making of ‘parallel correspondences’! Such national correspondence tables are still work in progress in most countries, and to our knowledge ours is the first such work for Malawi.

Looking at Level 1 of the ICCS kept us very busy. Our research centred on hard and important questions. How do we represent our text so that it can be processed efficiently? What kind of data labels are most useful for the ICCS classification? What type of annotations should we use (IOB or XML-based)? What algorithms should we employ (Hidden Markov Models, Recurrent Neural Networks or Long Short-Term Memory)? Most importantly, how should we prepare our annotated data for use with these algorithms?

We need to be mindful that this is a fine-grained classification, because we have to distinguish between texts that are quite similar. For example, classifying a judgment by the type of law it falls under, say whether it is a civil or a criminal case, would have been slightly easier, because the vocabulary used in civil cases is quite different from that used in criminal cases.

Here, we want to distinguish between types of crimes, and the language used in our judgments is very similar. Even within our data set there are levels of difficulty: theft and murder cases (Types 1 and 7 in the table above) may be easier to differentiate than, say, Types 1 and 2.

We have the added complication that most text representation models define the relevance of a keyword by its frequency (whether TF or TF-IDF), but in our text a word may appear only once and still be the most significant word for the purpose of our classification. For example, a keyword that distinguishes between Type 1 and Type 2 homicides is “malice aforethought”, and this may occur only once in the text of the judgment.

To help with this situation, one can extract first the structure of the judgment and focus only on the part that deals with the sentence of the judge. Indeed, there is research that focuses only on extracting various segments of a judgment.

This may work in many cases because usually the sentence is summarised in one paragraph. But it does not work for all cases. This is so especially when the case history is long, the crime committed has several facets, or the case has several counts, e.g., the murder victim is an albino or a disabled person.

In such situations one needs a combined strategy which uses: (1) a good set of annotated text with the meta-data described above; (2) the mapping of the Penal Code/Laws/Statutes relevant to the ICCS; (3) collocations of words or a thesaurus; (4) concordances to help us detect clusters and extract relevant portions of the judgments; and (5) sequence-modelling algorithms, e.g., HMMs or recurrent neural networks, for annotation and classification.

In the first part of the project, we focussed on the tasks (1) – (4) and experimented to some extent with (5).  What we wanted is to find a representation of our text based on all the information at (1) – (4) and attempt to use that in the algorithms we employ.

We have created a training set of over 2500 annotations for references to sections of the law and over 1000 annotations for references to other cases. We are still preparing these so that they are representative of the corpus and are good examples.

Finally, and most importantly, working on this AI4D project has brought me into contact with very clever people whom I would not otherwise have met. We appreciate the support and guidance of the AI4D team!



[3] Hereinafter is a term that is used to refer to the subject already mentioned in the remaining part of a legal document. Hereinafter can also mean from this point on in the document.

[4] United Nations Economic Commission for Europe, Conference of European Statisticians. Report of the UNODC/UNECE Task Force on Crime Classification to the Conference of European Statisticians. 2011.

Maria Fasli, University of Essex, UNESCO Chair in Data Science and Analytics on developing AI solutions in Africa

Play video by Maria Fasli, University of Essex, UNESCO Chair in Data Science and Analytics at workshop "Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa", Nairobi, Kenya, April 2019

What are you working on at the moment?

My name is Maria Fasli; I am a professor of computer science, and my area of expertise is Artificial Intelligence. I work for the University of Essex in the UK. I work on a range of projects with both industry and public sector organizations, helping them understand the data they have, their needs around data, and how to make better use of their data.

How do you perceive development and Artificial Intelligence?

This is a really interesting question; I think AI has a really big role to play in development. We need to bring AI into developing and transitioning countries to make a difference on the ground. It is not about us making up solutions in the west; it is about developing solutions here, locally.

There is a whole area that we need to work on around developing capacity and helping people create the right networks here in Africa as well as in other areas in the world, South Africa, Southeast Asia, to make a difference.

There is a big scope to use AI to support sustainable development goals and make progress, help developing and transitioning countries, develop into knowledge economies so that they can be the ones that have the power to make a difference for their own citizens.

What is your blue sky project in Africa?

This is another really good question. In the west, we’ve been using surveys to collect data and we’ve been doing clinical trials; we’re always trying to learn in a very structured way. What I would like to work on, if I had an unlimited budget, is techniques that can learn and reason from observational data.

Rather than running a survey and collecting data about a population, where you control what you get back, I would like to learn from the kind of data that is already available. There is an abundance of data, but we currently lack the techniques to make sense of it.

How do you feel about the workshop?

I think it has been amazing, we’ve made a lot of progress, we’ve had concrete ideas coming out as the next steps and I look forward to personally supporting the initiative going forward if I’m needed in whichever way is possible.

Do you have a one-liner for us? One line?

A slogan. AI for all!

December Review; AI4D – African Language Dataset Challenge // Bilan de décembre ; Défi AI4D – Jeu de Données sur les Langues Africaines

The close of 2019 marked the second month of the AI4D African Dataset Challenge, an effort aimed at incentivizing the uncovering and creation of African language datasets for improved representation in NLP. This challenge is hosted on Zindi and has been ongoing since the 1st of November. Each month we take stock and award a total of USD 1000 to the two most outstanding submissions.

In December, these two were as follows;

  • A Yoruba dataset submitted by David Adelani. This submission was put together by three individuals, David, Damilola Adebonojo and Omo Yooba, the latter two of whom are major Yoruba contributors for Global Voices Lingua, a movement which aims to bridge worlds and amplify voices through translating stories into dozens of languages. Beyond including some of the news stories from the Global Voices website, they translated several chapters of a book, got parallel sentences from a Twitter account that posts Yoruba proverbs, translated part of a movie dialogue found on YouTube and supplemented these with multi-domain sentences containing scientific and medical terms to work towards a representative dataset.
  • A Fongbe submission composed of datasets prepared for two tasks; 
    • Fongbe-French Machine Translation with data sourced from Bible translations, scraping a website and translating a book freely available online.
    • Automatic Speech Transcription data consisting of phoneme labels, single-speaker audio sentences as well as multi-speaker conversational audios.

We received 6 submissions in December, composed of data from 4 languages, Fongbe, Igbo, Swahili and Yoruba. This brings our overall language total, taking into consideration November and December submissions, to 6; Fongbe, Hausa, Igbo, Swahili, Wolof and Yoruba.

We observed one novel data collection process that involved first scanning text from a book containing a collection of folk-tales, then digitizing these using Google’s Text Recognition software for Optical Character Recognition (OCR). There was also a notable submission of Igbo names, a valuable resource that can be incorporated into the task of Named Entity Recognition. To learn more about other techniques being used to create datasets, be sure to check the November round-up here.

As we begin evaluation of the January submissions, we continue to be impressed by the calibre of datasets submitted and the effort put into their creation. 

This work actively challenges us to think more deeply about the copyright implications of some of these data collection sources and processes, about how to finally make all this data open, and about the choice of dataset to use for the Machine Learning task in the second phase of this challenge, as each month brings us closer to the end of the dataset-creation phase.

Contribution by:
Kathleen Siminyu, AI4D-Africa Network Coordinator
Sackey Freshia, Jomo Kenyatta University of Agriculture and Technology
Daouda Tandiang Djiba, GalsenAI

La fin de l’année 2019 a marqué le deuxième mois du défi AI4D African Dataset Challenge, un effort visant à encourager la découverte et la création de jeux de données sur les langues africaines pour une meilleure représentation en NLP. Ce défi est hébergé sur Zindi et se déroule depuis le 1er novembre. Chaque mois, nous faisons le point et attribuons un total de 1000 USD aux deux meilleures soumissions.

En décembre, il s’agissait des deux suivantes ;

  • Un jeu de données Yoruba soumis par David Adelani. Cette soumission a été réalisée par trois personnes, David, Damilola Adebonojo et Omo Yooba, ces deux derniers étant des contributeurs yorubas majeurs pour Global Voices Lingua, un mouvement qui vise à rapprocher les mondes et à amplifier les voix en traduisant des histoires dans des dizaines de langues. En plus d’inclure certains des articles du site web de Global Voices, ils ont traduit plusieurs chapitres d’un livre, obtenu des phrases parallèles d’un compte Twitter qui publie des proverbes yorubas, traduit une partie d’un dialogue de film trouvé sur YouTube et complété ces derniers par des phrases multi-domaines contenant des termes scientifiques et médicaux pour travailler sur un jeu de données représentatif.
  • Une soumission Fongbe composée d’un jeu de données préparées pour deux tâches ; 
    • La traduction automatique Fongbe-français avec des données provenant de traductions de la Bible, en grattant un site web et en traduisant un livre disponible gratuitement en ligne.
    • Données de transcription automatique de la parole comprenant des étiquettes de phonèmes, des phrases audio à un seul locuteur ainsi que des audios conversationnels à plusieurs locuteurs.


Nous avons reçu 6 soumissions en décembre, composées de données provenant de 4 langues, le fongbe, l’igbo, le swahili et le yoruba. Cela porte à 6 le nombre total de langues, en tenant compte des contributions de novembre et de décembre : le fongbe, le haoussa, l’igbo, le swahili, le wolof et le yoruba.

Nous avons observé un nouveau processus de collecte de données qui consistait à scanner le texte d’un livre contenant un ensemble de contes populaires, puis à numériser ces derniers à l’aide du logiciel de reconnaissance de texte de Google pour la reconnaissance optique de caractères (OCR). 

Il y a également eu une soumission notable de noms Igbo, une ressource précieuse qui peut être incorporée dans la tâche de reconnaissance des entités nommées. Pour en savoir plus sur les autres techniques de création de jeu de données, consultez le résumé de novembre ici.

Alors que nous commençons l’évaluation des soumissions de janvier, nous continuons à être impressionnés par la qualité des jeux de données soumis et par les efforts déployés pour leur création. 

Ce travail nous met activement au défi de réfléchir plus en profondeur aux diverses implications en matière de droits d’auteur de certaines de ces sources et processus de collecte de données et à la modalité de rendre enfin toutes ces données ouvertes. Outre le choix de l’ensemble de données à utiliser pour une tâche d’apprentissage automatique dans la deuxième phase de ce défi, puisque chaque mois nous rapproche de la fin de la phase de création de l’ensemble de données.

Contribution de:
Kathleen Siminyu, Coordinatrice du réseau AI4D-Africa
Sackey Freshia, Jomo Kenyatta University of Agriculture and Technology
Daouda Tandiang Djiba, GalsenAI

Delmiro Fernandez-Reyes from UCL on how AI can deliver better medicines in Africa

Delmiro Fernandez-Reyes, University College London at workshop "Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa", Nairobi, Kenya, April 2019

What are you working on at the moment?

I’m based at the department of computer science, University College London and as well at the College of Medicine at the University of Ibadan. My work is related to solutions for global health challenges such as paediatric infections, malaria, or communicable or noncommunicable diseases.

The work has basically been harnessing the algorithms we develop to look at data that can improve diagnostics or clinical pathways, or support faster decisions, and therefore generate savings for healthcare systems that are stretched.

So basically, we focus on these global health challenges. What I do at the moment is develop the hardware that the AI will run on: we develop a microscope that has many AI components for diagnostics, such as navigation and the detection of specific objects like malaria parasites, covering the various aspects of malaria screening.

Another important part of what we do: I think the role of AI, as I see it as someone who works on health challenges in the region, is more transformative in Africa because it creates opportunity. For example, the projects I am talking about are already running; they are generating employment and building teams. This is now being developed to put the technology on the front line.

We have a tool that improves MRI resolution, which is now being used by radiologists in Nigeria. Through such tools you can train people and professionals and increase interdisciplinarity, so it opens opportunity. This is the opposite of what you see in the north or the west, where AI seems to take jobs away from people. I think in Africa you can use these challenges to drive the development of the region.

How do you perceive development and Artificial Intelligence?

The way to facilitate development is to focus on the challenges the region has, and the region has many, from technological gaps to governance.

I want to focus on the ones closest to me, because of my background as a basic scientist in medicine and computer science. In those areas, we can clearly help address the key drivers of underdevelopment: inequality, neonatal mortality, and maternal mortality. Those are three axes that drive the region.

The region still has too many communicable diseases: HIV, tuberculosis, malaria; those are the present challenge. Another challenge is that, as people get older in countries like Nigeria and life expectancy increases along with GDP, noncommunicable diseases will have a bigger impact.

For those, I think we can bring a lot to management, health care systems, policy-making and strategy. Of course, there is another aspect of development: you cannot do this for health alone. You have to develop power, infrastructure, and water and sanitation, so there needs to be a concerted element to this; the health people cannot work alone, but alongside the infrastructure and telecommunications engineers.

What is your blue sky project in Africa?

The main project we would focus on is what we are already doing. We would like to have an AI-driven platform for the fast diagnosis of diseases in clinical labs. You can achieve that.

November Review: AI4D – African Language Dataset Challenge



On the 1st of November, we launched the AI4D-African Language Dataset Challenge on Zindi, an effort to incentivize the uncovering and creation of African language datasets for improved representation in NLP. This first phase of what is expected to be a two-phase challenge is taking place over five months, from November 2019 to March 2020, with submissions evaluated monthly. Each month, the top two submissions receive a cash prize of USD 500.

Being well into December, we are excited to announce the top two submissions for November:

  • Oshingbesan Adebayo, who submitted a dataset composed of three West African indigenous languages (Hausa, Igbo and Yoruba). The dataset was acquired from a wide variety of sources, ranging from transcriptions of songs, online news sites, excerpts from published books and websites in indigenous languages to blogs, Twitter, Facebook and more.
  • Thierno Diop, who submitted an Automatic Speech Recognition dataset for Wolof in the domain of transportation services. The data was prepared through a collaboration between BAAMTU Datamation, a Senegalese company focused on using data to help companies leverage AI and Big Data, and WeeGo, an app which helps passengers get information about urban transport in Senegal.

Overall, we received nine submissions in the month of November, composed of data from four unique languages: Hausa, Igbo, Wolof and Yoruba.

The majority of the data came from online sources. Scraping newspaper sites such as BBC, DW and VOA, which curate news in several African languages, emerged as one of the top ways participants went about creating datasets. A great strategy for putting together a sizeable dataset over the coming months would be to return to the site(s) every so often and keep your dataset up to date, as news is regularly published. Capturing a wide variety of news categories would go a long way toward ensuring the dataset is well balanced and representative of language variety. Wikipedia sites published in various languages also featured as a data source.

  • BBC publishes news in Afaan Oromoo, Amharic, Hausa, Igbo, Kirundi, Pidgin, Somali, Swahili, Tigrinya and Yoruba 
  • DW publishes news in Amharic, Hausa and Kiswahili 
  • VOA publishes news in Afaan Oromoo, Amharic, Bambara, Hausa, Kinyarwanda/Kirundi, Ndebele, Shona, Somali, Kiswahili and Tigrinya
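
To illustrate the scraping approach, here is a minimal sketch of the text-extraction step, using only Python's standard library. The `extract_paragraphs` helper and the sample page are illustrative assumptions; a real scraper would fetch live pages (for example with a library like `requests`) and should respect each site's terms of use.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text content of <p> elements from an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)

def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    return parser.paragraphs

# A toy page standing in for a fetched news article in Hausa.
sample = "<html><body><h1>Labari</h1><p>Sannu duniya.</p><p>Wannan labari ne.</p></body></html>"
print(extract_paragraphs(sample))  # ['Sannu duniya.', 'Wannan labari ne.']
```

Storing the article URL and date alongside each extracted paragraph makes it easy to revisit a site later and add only the newly published pieces.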

A closely related online source is Twitter data, which we have seen curated particularly for the task of sentiment analysis. A good place to start would be the Twitter profiles that accompany the above news sites. While we haven't had any data sourced from Facebook yet, I imagine that the profiles these news outlets maintain for various languages would also be a good place to start.
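
As a sketch of what curating Twitter data might involve, the snippet below normalises a raw tweet before sentiment annotation. The `clean_tweet` helper and the example tweet are hypothetical, shown only to make the preprocessing step concrete.

```python
import re

def clean_tweet(text):
    """Normalise a tweet before annotation: drop URLs and @mentions,
    keep hashtag words (minus the '#'), collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text)  # strip URLs
    text = re.sub(r"@\w+", "", text)          # strip mentions
    text = re.sub(r"#(\w+)", r"\1", text)     # '#habari' -> 'habari'
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("Habari njema leo! @BBCSwahili https://t.co/xyz #habari"))
# Habari njema leo! habari
```

Keeping the hashtag words (rather than deleting them outright) preserves content words that often carry the sentiment signal.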

Manual translation also emerged, with some submissions compiled by one or several individuals coming together to translate pieces of text, and custom applications, such as mobile apps, being used to crowdsource voice recordings for the Automatic Speech Recognition dataset.

I am also excited to announce that we will have a workshop at ICLR 2020, “AfricaNLP – Unlocking Local Languages”, which will be held in Addis Ababa in April of next year. Part of the workshop agenda is set aside to showcase exceptional work and the datasets that emerge as output from this exercise.

We will also use the workshop as an opportunity to launch the second phase of this challenge. If you have been following our thought process since the beginning, you will know that the second phase is largely dependent on the outcomes of this first phase. The one (or hopefully two) downstream NLP tasks that will be the object of the second phase will utilise datasets that result from this first phase.

Finally, we have a Call for Papers for the workshop, specifically for research work involving African languages. Feel free to start making your submissions on this page. Here are some key dates to keep in mind:

  • Submission deadline: 1st February, 2020
  • Notification to authors: 26th February, 2020
  • Workshop: 26th April, 2020

Happy Holidays!

Contribution by:
Kathleen Siminyu, AI4D-Africa Network Coordinator
Sackey Freshia, Jomo Kenyatta University of Agriculture and Technology
Daouda Tandiang Djiba, GalsenAI


Vukosi Marivate from University of Pretoria on Africa’s position in AI

Play video by Vukosi Marivate, University of Pretoria, CSIR, Deep Learning Indaba  at the workshop “Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa”, Nairobi, Kenya, April 2019

What are you working on at the moment?

I am Dr Vukosi Marivate. I hold a chair of data science at the University of Pretoria in South Africa, and I am also here representing the Deep Learning Indaba. My work mostly involves machine learning and natural language processing, as well as how we use data science for society.

How do you perceive development and Artificial Intelligence?

I see AI as a tool that we can use in society, so I do not restrict it to development. On the continent, I believe that we all have our own challenges, no matter where you are, and the question is how we can use AI as one of the tools to improve the lives of Africans. If we start from there, all the other things follow.

What is your blue sky project in Africa?

As Africa, I think we are in an interesting position when we are trying to look at AI and how it can be used. One of the things that becomes important is demystifying it for the public and decision-makers. I think the blue sky is: how do we get AI to be interpretable and transparent? That is one big part, and there should be more work done on it. It is great having very accurate models, with high accuracy and low error, but how does somebody else interpret what is going on and understand it? Because I think that is where a lot of the bias creeps in: things are used without an understanding of why they work the way they do.

How do you feel about the workshop?

The workshop has been great; it has been really good meeting a lot of great minds from across the continent and beyond. I am looking forward to seeing what we do with the network after this.

Short one-liner if you have one?

Okay. For what? For the workshop? Just a slogan. Oh, we said we need to capacitate AI strength on the African continent through our communities.


Benjamin Rosman from University of the Witwatersrand on AI and development

Benjamin Rosman, University of the Witwatersrand / CSIR at the workshop “Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa”, Nairobi, Kenya, April 2019

What are you working on at the moment?

I am Benjamin Rosman. I work at the University of the Witwatersrand in Johannesburg, South Africa. I also work at the CSIR, which is the Council for Scientific and Industrial Research in South Africa. And then, wearing another hat, I am also one of the founders and organizers of the Deep Learning Indaba.

In my research lab, which is based mainly at the University of the Witwatersrand, we focus on questions around machine learning and decision theory. We work predominantly in reinforcement learning and deep learning and areas around those, and we recently started working in applied areas as well.

How do you perceive development and Artificial Intelligence?

I think that the combination of AI and development is an interesting one. AI provides an opportunity to solve a lot of problems in the developing world, as it does around the world in general.

I think there are a lot of opportunities for students and society more generally to acquire these tools, which can be used in a wide variety of industries and application areas. If we think about this right, there is an opportunity to make a large impact and train a lot of people in very impactful areas.

What is your blue sky project in Africa?

There are so many research topics I would love to work on, but what I would really love to see is a pipeline from fundamental research in Africa to applied research, considering aspects of ethics and society, and finally running all the way through to commercialization, so that we can be training academics, educating society in general, starting start-ups and improving the way that large corporates and governments work across the continent.


AI4D – African Language Dataset Challenge

NLP Challenge

Getting started with programming is easy; it is a well-trodden path. Whether you are picking up the skill itself, a new programming language, or venturing into a new domain like Natural Language Processing (NLP), you can be sure that a variety of beginner tutorials exist to get you started. The ‘Hello World!’s, as you may know them.

Where NLP is concerned, some paths tend to be better trodden than others. It is infinitely easier to accomplish an NLP task, say sentiment analysis, in English than it is to do the same in my mother tongue, Luhya. This reality follows from the fact that the languages of the digital economy are major European languages.

The gap between languages with plenty of data available on the Internet and those without is ever increasing. Pre-trained language models have in recent times led to significant improvements in various NLP tasks, and Transfer Learning is rapidly changing the field. While leading architectures for pre-training models for Transfer Learning in NLP are freely available for use, most are data-hungry. The GPT-2 model, for instance, was trained on text from millions of web pages. (ref)

The only way I know to begin closing this gap is by creating, uncovering and collating datasets for low-resource languages. With the AI4D – African Language Dataset Challenge, we want to spur on some groundwork. Deep Learning techniques now make it possible to dream of a future where NLP researchers and practitioners on the continent can easily innovate in the languages their communities speak, a future where literacy and mastery of a major European language is no longer a prerequisite to participation in the digital economy. But these techniques require data: data that can only be created by the communities that speak these languages, by individuals who have the technical skills, by those of us who understand the importance of this work and have the desire to undertake it.

The challenge will run for five months (November 2019 to March 2020), with cash prizes of USD 500 awarded as an incentive to the top two submissions each month. This is the first of a two-phase challenge, and this first phase is about the creation of datasets. We would like to see some of these datasets developed for specific downstream tasks, but this is not necessary.
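
Before submitting a dataset, a few basic quality checks go a long way. The sketch below assumes a simple (text, language-code) row layout of our own devising, not a format prescribed by the challenge; it drops empty and exactly duplicated examples and counts how many remain per language.

```python
def dataset_stats(rows):
    """Drop empty and exactly duplicated examples; count rows per language."""
    seen, counts, clean = set(), {}, []
    for text, lang in rows:
        key = (text.strip(), lang)
        if key[0] and key not in seen:
            seen.add(key)
            clean.append(key)
            counts[lang] = counts.get(lang, 0) + 1
    return clean, counts

# Toy rows using ISO 639-3 style codes for Hausa ('hau') and Yoruba ('yor').
rows = [("Sannu duniya", "hau"), ("Sannu duniya", "hau"), ("", "hau"), ("Mo dupe", "yor")]
clean, counts = dataset_stats(rows)
print(counts)  # {'hau': 1, 'yor': 1}
```

Per-language counts like these also make it easy to spot imbalance early, before a dataset grows to a size where rebalancing is expensive.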

We have, however, earmarked four downstream NLP tasks and anticipate that one (or two) of these will frame the second phase of this challenge: Sentence Classification, Sentiment Analysis, Question Answering and Machine Translation. Other downstream tasks that participants may be interested in developing datasets for, or have already developed datasets for, are also eligible. Our intention is that the datasets are kept free and open for public use under a Creative Commons license once the challenge is complete.

The challenge is hosted on Zindi; head on over to this page for full details. The prize money is provided through a partnership between the International Development Research Centre (IDRC) and the Swedish International Development Cooperation Agency (SIDA), and the challenge is facilitated through the combined efforts of the Artificial Intelligence for Development Network (AI4D-Africa) and the Knowledge 4 All Foundation (K4All). Finally, we thank our expert panel, who have volunteered their time to undertake the difficult qualitative aspect of dataset assessment: Jade Abbott – RetroRabbit, John Quinn – Google AI/Makerere University, Kathleen Siminyu – AI4D-Africa, Veselin Stoyanov – Facebook AI and Vukosi Marivate – University of Pretoria.

The rest, we leave up to the community.  

Contribution by Kathleen Siminyu, AI4D-Africa Network Coordinator

Photo by Eva Blue on Unsplash.



Isaac Rutenberg, Strathmore University on development of AI in Africa

Isaac Rutenberg, Strathmore University at the workshop “Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa”, Nairobi, Kenya, April 2019

What are you working on?

My name is Isaac Rutenberg. I am the director of the Centre for Intellectual Property and Information Technology Law, CIPIT, at the Strathmore Law School here in Nairobi, Kenya. We work at the intersection of intellectual property and IT, particularly in the ways that people utilize both of those for various reasons, including development.

How do you perceive development and Artificial Intelligence?

I think at the moment it is quite early. There are some very nascent AI projects on the continent, actually quite a lot of them, but I think their impact so far has been quite minimal.

I think that we are at an early stage of determining how we want to use AI. In some ways that is really good, because the rest of the world has shown us, or has allowed us to see, some of the pitfalls, some of the major problems that we are going to encounter as we develop AI. We will encounter these in everyday life on a regular basis; we do already in some instances, but it is only going to grow.

What is your blue sky project in Africa?

If I could have AI solve any problem, it would be getting products to international markets. A lot of agricultural produce in Africa is wasted for a variety of reasons. I know many of those reasons are structural, and AI is obviously not going to solve all of them, but somehow we could use AI to help with the distribution systems and the analysis of all the data that is required or generated, which affects how products are moved around. I think that would have a very big impact on people in their daily lives.