Case Study: Dubawa Audio
Dubawa Audio is an innovative AI-powered tool developed by the Centre for Journalism Innovation and Development (CJID) to address the challenges of monitoring and fact-checking radio content in sub-Saharan Africa. This case study explores the journey from ideation to implementation, highlighting key lessons for small newsrooms in the region looking to adopt AI solutions.
Background
Radio is the primary mass medium in Africa, especially in rural areas where internet penetration is low, making it crucial for information dissemination and fact-checking. The vast amount of radio content produced daily makes manual monitoring and fact-checking nearly impossible, creating a need for an automated solution to transcribe, archive, and extract claims from radio shows.
As CJID’s Head of Innovation, Monsur Hussain, explained, the team “set out to achieve this [Dubawa] because the radio is the mass media in Africa. That’s where the majority of Africans get their news. Yet there’s no documentation on the radio.”
The sheer number of radio stations, many of which run 24 hours a day, makes it nearly impossible for any human to effectively track and analyse all the content being disseminated. As a result, fact-checkers have traditionally struggled to keep up with misinformation and disinformation spread on radio programmes.
CJID decided to improve monitoring efficiency through automation. “That was why we set out to build a tool that’d monitor the information being disseminated on the radio, by transcribing, archiving and extracting claims from the radio shows,” says Hussain.
Ideation: Identifying the Need for Dubawa Audio AI
The ideation process for Dubawa Audio AI began with recognising the documentation gap in radio content, despite its cultural importance in Africa. The team conducted user research and feasibility studies with journalists and fact-checkers to understand the challenges they faced. They identified specific examples, like the Brekete show, where disinformation flourished due to a lack of efficient monitoring and the show’s emotional appeal. This led to the conception of a tool that could automate the processes of monitoring, transcription, and claim identification.
Through this process, the vision for Dubawa Audio AI took shape as a solution that would use artificial intelligence to transform how radio content is monitored and fact-checked across Africa, addressing a critical need in the fight against misinformation.
Building the Tool: Features and Core Functionalities
From the outset, Dubawa Audio AI aimed to address two core functionalities: transcribing radio shows by automatically converting audio into text, which could be archived and referenced later, and claim extraction to identify specific statements from radio broadcasts that require fact-checking.
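To make the claim-extraction idea concrete, here is a minimal heuristic sketch: flag sentences containing checkable signals such as figures, absolute assertions, or attributions. This is an illustration only; Dubawa Audio’s actual pipeline uses machine learning models, and the signal patterns below are assumptions.

```python
import re

# Illustrative signals that a sentence may contain a checkable factual claim.
# These patterns are assumptions for the sketch, not Dubawa's real logic.
CLAIM_SIGNALS = [
    r"\d[\d,.]*",                                          # figures and statistics
    r"\b(?:always|never|every|all of)\b",                  # absolute assertions
    r"\b(?:claimed|announced|said that|according to)\b",   # attributions
]
CLAIM_PATTERN = re.compile("|".join(CLAIM_SIGNALS), re.IGNORECASE)

def extract_candidate_claims(transcript: str) -> list[str]:
    """Split a transcript into sentences and keep those matching a claim signal."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if CLAIM_PATTERN.search(s)]

transcript = (
    "Good morning and welcome to the show. "
    "The governor claimed that 70 percent of the roads have been repaired. "
    "Now let us hear from our listeners."
)
print(extract_candidate_claims(transcript))
```

A real model replaces the regex layer, but the contract is the same: transcript in, a short list of candidate claims out, which is what lets fact-checkers skip the hours of audio in between.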
As the developers worked on improving these core features, they realised that the archival function provided an unexpected benefit. By storing large volumes of transcribed radio content, they had effectively created a radio monitoring tool. The developer noted, “Based on the archival of all these transcriptions, we’ve actually designed a radio monitoring tool, which means by making our data public, anybody can track what’s going on on the radio. One could search for a keyword, and see anywhere it has been mentioned on all of these radio shows, and be able to understand the context in which it was mentioned.”
The radio monitoring feature evolved through several iterations from the initial solution. The team now has an archive of transcribed radio shows, which they are looking to make public to allow people to monitor ongoing broadcasts in real-time.
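The monitoring use the developer describes, searching the transcript archive for a keyword and seeing the context of each mention, can be sketched as follows. The storage schema and field names here are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    station: str   # hypothetical fields for the sketch
    show: str
    text: str

def search_archive(archive: list[Transcript], keyword: str, window: int = 40) -> list[dict]:
    """Return every mention of keyword in the archive with surrounding context."""
    hits = []
    needle = keyword.lower()
    for t in archive:
        lower = t.text.lower()
        start = lower.find(needle)
        while start != -1:
            a = max(0, start - window)
            b = start + len(needle) + window
            hits.append({"station": t.station, "show": t.show,
                         "context": t.text[a:b]})
            start = lower.find(needle, start + 1)
    return hits

archive = [
    Transcript("Radio A", "Morning Drive",
               "The minister discussed the fuel subsidy removal at length."),
    Transcript("Radio B", "Brekete",
               "Callers debated the subsidy again today."),
]
for hit in search_archive(archive, "subsidy"):
    print(hit["station"], "->", hit["context"])
```

At production scale this would sit behind a full-text index rather than a linear scan, but the user-facing behaviour is the one quoted above: search a keyword, see everywhere it was mentioned, and read the context.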
Development and Implementation
Monsur describes the development process: “This is a tool that people have not built before, and this is the first tool with that niche capability, perhaps anywhere in the world, so there was no template to follow. It just involved a lot of testing and experimenting.”
Initially, they started with the Google transcription tool, which did not work well due to accent and dialect barriers. The issue persisted with other transcription tools, leading the team to partner with a local transcription company. Together they manually transcribed hours of radio recordings, building a more accurate dataset.
Through this process, Monsur learned an important lesson: “We are the ones that need to focus on our own problems, especially in catching up with technological advancements in areas of natural language processing (NLP). We need to work on our own languages for the diversification of large language models.”
Technical Challenges
The development of Dubawa Audio AI faced several significant technical challenges, primarily in the areas of language and accent representation, transcription accuracy, and claim extraction.
Language and Accent Representation was one of the primary challenges in achieving accurate transcription, particularly given the diversity of accents and languages in the region. The lead developer noted, “If you look at our accents, even the way I would speak English. And we will mix English with our local languages. A lot of that. So you’d find that there’s a gap really between that and what AI systems understand.” Existing AI models struggled with African languages and accents, and the team had to deal with mixed language use, such as English interspersed with local languages. Different accents within the same language, like Yoruba, posed additional challenges.
Monsur explains their approach to addressing this challenge: “What we are trying to do now is contribute to dataset improvement on African languages representation. Through the Dubawa audio platform, we have an archive of 100s of hours of audio recordings and their transcriptions. That’s a large dataset, and we are looking at expanding that over the next few months. This dataset would be very valuable in contributing to the language support in Africa.” He further notes, “I know there are existing datasets for languages like Yoruba, for example, but they don’t use representative data. The ones I know are Bible TTS and Afro TTS; the bible data is not very representative of the Yoruba Language, and neither is the Afro TTS. There’s a lot of work into understanding how the representation and bias would affect any model being built on such a dataset.”
Transcription Accuracy presented another significant challenge. Initial attempts using Google’s transcription tool were inadequate for the task. To address this, the team partnered with a local transcription company to develop a more accurate model. They also had to differentiate between various language varieties, such as Nigerian English, Ghanaian English, and Nigerian Pidgin, to ensure each model was used specifically for that language.
Claim Extraction was the third major technical hurdle. The team initially used the DistilBERT model for claim extraction, then transitioned to OpenAI’s models after their release, which significantly improved performance. Monsur explained, “We started building before the launch of OpenAI, and for the claim extraction, we were using DistilBERT initially, and then after OpenAI’s launch, we tested out OpenAI models, and we saw that it had better performance. This meant we had to drop months of hard work to move to an OpenAI model, and this increased the tool’s accuracy in that regard.”
These technical challenges underscore the complexity of developing AI tools for diverse linguistic environments and the importance of continual adaptation to new technologies and methodologies.
Overcoming Challenges
The team’s approach to overcoming these obstacles involved continuous testing and iteration of different technologies, building a substantial dataset of over 100 hours of audio recordings and transcriptions, and focusing on creating value before considering monetisation.
One of the most formidable hurdles was managing the sheer volume of content. Initially rolled out to monitor three radio stations in Nigeria and one in Ghana, the tool had to sift through hours of radio broadcasts daily. The developers needed to find ways to streamline the claim detection process to make it manageable and effective.
The solution lay in leveraging AI to significantly reduce the workload. Instead of requiring human fact-checkers to manually listen to endless hours of radio, the AI was designed to present only claims that were potentially false or worth investigating. As one of the developers pointed out, “We basically reduced the time needed for our fact-checkers to get to the right information.”
This approach dramatically improved efficiency, allowing the team to process large volumes of content quickly. However, it’s important to note that while the tool provided automated insights, the final decision on whether to fact-check a claim still rested with human journalists. This approach highlighted the importance of maintaining human oversight in AI-powered journalism tools, ensuring that the technology augments rather than replaces human judgment.
By focusing on these strategies — iterative development, building a robust dataset, and prioritising efficiency and value creation — the Dubawa team was able to overcome the initial implementation challenges and create a tool that effectively addresses the needs of fact-checkers in the African media landscape.
Feedback Loop
Monsur explains that they have received substantial feedback on improvements to the user experience: “Before we launched the product we conducted internal testing with everyone at the CJID. We saw that a lot of features had to be improved, which helped us build a better product. The idea of radio monitoring came from user feedback as well. It’s a continuous process for us, and we have seen a lot of improvement over time.”
Outcome and Impact
Dubawa Audio has made significant strides in its implementation and adoption across West Africa. The tool currently monitors four radio shows — three in Nigeria and one in Ghana — demonstrating its ability to handle diverse content across different regional contexts.
The impact on the fact-checking process has been substantial. As one of the developers explains, “So rather than having to listen to hours and hours of radio… we basically reduce the time needed for our fact-checkers to get to the right information. It’s still a human decision to decide on what to fact-check. Dubawa AI just sources the claims from the radio shows.” This statement underscores how the tool has streamlined the fact-checking workflow, allowing human fact-checkers to focus their efforts on verifying the most relevant claims rather than sifting through hours of content.
The platform’s user base has grown impressively, now boasting over 150 users. Interestingly, while the tool was initially conceived for fact-checking, many journalists have found it valuable primarily for its transcription capabilities. This unexpected use case highlights the tool’s versatility and its potential to address multiple needs in the journalism ecosystem.
To ensure the tool’s effectiveness, the Dubawa team has implemented a rigorous accuracy benchmarking process. They compare the AI-generated transcriptions with manual transcriptions of the same content, calculating a percentage accuracy score. This ongoing evaluation allows them to continually refine and improve the tool’s performance.
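A percentage accuracy score of this kind is typically derived from word error rate (WER): the edit distance between the manual reference and the AI output, measured in words. The article does not specify Dubawa’s exact metric, so the sketch below is one standard way to compute it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: (substitutions + insertions + deletions) / reference word count,
    computed as word-level Levenshtein edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

manual = "the governor said the roads will be repaired by june"
auto = "the governor said the road will be repaired by june"
wer = word_error_rate(manual, auto)
print(f"Accuracy: {(1 - wer) * 100:.1f}%")  # one substitution in ten words: 90.0%
```

Tracking this score over successive model versions gives the team the ongoing-evaluation loop described above.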
Since its launch, Dubawa Audio AI has had a notable impact on fact-checking efforts, particularly by reducing the time spent reviewing radio broadcasts, allowing fact-checkers to focus on the most relevant information.
The tool’s real-time monitoring capabilities also improved the speed at which misinformation could be addressed, giving newsrooms in Nigeria and Ghana a tool that made fact-checking more efficient.
Overall, Dubawa Audio has demonstrated its value in significantly reducing the time needed for fact-checkers to identify relevant claims, while also providing useful transcription services to a broader journalistic audience. Its success in monitoring multiple radio shows across different countries suggests the potential for further expansion and impact in the fight against misinformation in Africa.
Lessons Learned
The development of Dubawa Audio AI provided valuable insights for AI implementation in journalism, particularly in the African context. These lessons can guide other newsrooms and organisations looking to develop similar tools:
Staying updated with technological advancements proved crucial. The team had to adapt quickly, switching from DistilBERT to OpenAI models for better performance. As one team member noted, “The biggest lessons we learned would probably be that we need to stay on top of technological advancements. You can’t just start with one technology and want to stick with that all along.”
The project highlighted Africa’s need for more representation in AI development. The lack of African language and accent representation in existing AI models underscores the importance of developing local solutions.
Collaboration with local partners was key to success. Partnering with a local transcription company proved more effective than using off-the-shelf solutions, especially in handling the nuances of local languages and accents.
User experience emerged as a critical factor. Continuous feedback and redesigns were necessary to improve usability for non-technical users, ensuring the tool’s adoption and effectiveness.
The team learned to consider broader applications beyond the initial scope. While originally conceived as a fact-checking aid, Dubawa Audio AI evolved into a comprehensive radio monitoring platform, demonstrating the value of remaining open to new possibilities.
Long-term challenges, particularly in language representation and accent variation, require ongoing attention and resources. The team recognised that these issues would continue to need focus and innovation.
Adaptability and evolution became central to the project’s success. The team remained open to evolving the tool based on early outcomes, which allowed them to add valuable features like archival monitoring that weren’t part of the original plan.
The project reinforced that AI complements rather than replaces human judgment. While AI significantly reduced the workload, human oversight remained crucial, with fact-checkers making the final decisions on what to fact-check.
Addressing local contexts proved essential. Dubawa Audio AI’s focus on radio, a dominant media source in Africa, underscored the importance of tailoring AI solutions to regional media consumption patterns. This approach highlighted that tools developed for other markets might not be as effective in different contexts.
Finally, the value of an in-house technical team became apparent. CJID’s team of machine learning engineers, data engineers, and software developers was crucial to the project’s success. As one team member explained, “We also have this AI in journalism fellowship, where we put tech guys and journalists or researchers on the same team to ideate and develop products to solve journalism problems. These journalists take the products to their newsrooms to support their work, and it motivates stakeholders in seeing why it’s important to have a technical team in-house.”
Future Outlook
The team behind Dubawa Audio AI has ambitious plans for the tool’s future development and impact. These plans encompass several key areas of focus:
Expanding language support is a primary goal. The team aims to include more African languages like Yoruba, Hausa, and Igbo, broadening the tool’s applicability across diverse linguistic communities in Africa.
Building comprehensive datasets is another crucial objective. The team is working on creating datasets that represent various accents, genders, and dialects. This effort will enhance the tool’s accuracy and effectiveness across different African contexts.
Government collaboration forms an important part of the future strategy. CJID is part of the Nigerian AI Collective and is working with the government, with support from Luminate, to advance the broader AI ecosystem in Nigeria. This collaboration could significantly accelerate the development of AI solutions in the information ecosystem.
The team is also positioning Dubawa Audio as a key tool for media monitoring during future election cycles. As one team member envisions, “I expect that by the next general election cycle, it’s going to be a key determinant of understanding every information being disseminated on the radio. It’s going to be a one-stop shop for everything media monitoring that relates to radio.” This focus could make Dubawa Audio an essential resource for ensuring informed and fair electoral processes.
Open-source contribution is another avenue the team is exploring. There are plans to contribute to the advancement of AI in African languages, potentially through open-source initiatives. This approach could foster collaboration and accelerate progress in developing AI solutions for African languages.
Conclusion
The development of Dubawa Audio demonstrates the potential for AI solutions tailored to the unique challenges of sub-Saharan African journalism. By focusing on local needs, collaborating with regional partners, and continuously adapting to technological changes, the team has created a valuable tool for fact-checkers and journalists. As AI continues to evolve, projects like Dubawa Audio pave the way for more inclusive and effective technological solutions in the region. Monsur says, “Foundational problems for Nigeria are infrastructure and compute, and there are different gaps in each newsroom. It all depends on the vision of the leadership in the newsrooms.”
Thank you to Aisha Bello for research and copy support.
First published on Medium By Stephanie 'S.I' Ohumu on January 30, 2025.