About The Project - Towards Amharic DBpedia
Amharic Wikipedia lacks structured knowledge representation, which limits its integration into global knowledge graphs like DBpedia. This gap hinders the accessibility, discoverability, and interoperability of Amharic content in multilingual semantic web applications.
This project proposes to enhance the Amharic chapter of DBpedia by extracting structured data from Amharic Wikipedia. It involves:
- Extending the DBpedia Extraction Framework to support Amharic
- Creating new mappings for infobox templates
- Extracting rich semantic data such as citations, disambiguations, and anchor texts
The output will be published as RDF triples and made accessible via a user-friendly web interface, following FAIR (Findable, Accessible, Interoperable, Reusable) data principles.
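Here is a minimal sketch of how an infobox mapping could turn Amharic infobox parameters into RDF triples in N-Triples syntax. It is not the DBpedia Extraction Framework's mapping mechanism. The mapping table, the `am.dbpedia.org` namespace, the `Infobox person` template name and the sample page are all hypothetical, and the wikitext parsing is deliberately naive: it does not handle nested templates. Only `foaf:name` and `dbo:birthPlace` are real vocabulary terms.

```python
import re

# Hypothetical namespace for an Amharic DBpedia chapter (assumption).
AM_RESOURCE = "http://am.dbpedia.org/resource/"

# Hypothetical mapping: Amharic infobox parameter -> (property IRI, object kind).
INFOBOX_MAPPING = {
    "ስም": ("http://xmlns.com/foaf/0.1/name", "literal"),                  # "name"
    "የትውልድ ቦታ": ("http://dbpedia.org/ontology/birthPlace", "resource"),  # "place of birth"
}


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Collect non-empty `| key = value` lines (naive: ignores nested templates)."""
    pairs = {}
    for line in wikitext.splitlines():
        match = re.match(r"^\s*\|\s*([^=]+?)\s*=\s*(.*?)\s*$", line)
        if match and match.group(2):
            pairs[match.group(1)] = match.group(2)
    return pairs


def local_name(title: str) -> str:
    # Wikipedia-style title -> IRI local name (only spaces are handled here).
    return title.strip().replace(" ", "_")


def nt_literal(text: str, lang: str = "am") -> str:
    # Escape backslashes, quotes and newlines, then add a language tag.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def infobox_to_ntriples(page_title: str, wikitext: str) -> list[str]:
    subject = f"<{AM_RESOURCE}{local_name(page_title)}>"
    triples = []
    for key, value in parse_infobox(wikitext).items():
        if key not in INFOBOX_MAPPING:
            continue  # unmapped parameters are skipped
        prop, kind = INFOBOX_MAPPING[key]
        if kind == "resource":
            # Keep only the link target of [[Target|label]] markup.
            target = value.strip("[]").split("|")[0]
            obj = f"<{AM_RESOURCE}{local_name(target)}>"
        else:
            obj = nt_literal(value)
        triples.append(f"{subject} <{prop}> {obj} .")
    return triples


# Made-up sample page; the template name is also an assumption.
example = """{{Infobox person
| ስም = ምሳሌ
| የትውልድ ቦታ = [[አዲስ አበባ]]
}}"""

if __name__ == "__main__":
    for triple in infobox_to_ntriples("ምሳሌ", example):
        print(triple)
```

Running it prints two triples for the sample page. The first is a `foaf:name` literal tagged `@am`. The second is a `dbo:birthPlace` link to `<http://am.dbpedia.org/resource/አዲስ_አበባ>`. Parameters without a mapping are silently dropped. This mirrors the idea that only mapped infobox templates yield ontology-typed properties.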
Mentors
Ricardo Usbeck, Meti, Hizkiel Alemayehu, Tilahun Abedissa
Deliverables
- An Amharic-compatible extension to the DBpedia Extraction Framework
- New infobox mapping templates for Amharic Wikipedia
- An automated extraction pipeline with support for:
  - Citations
  - Disambiguation
  - Topical concepts
  - Personal data
- A structured knowledge graph (RDF triples) linked with English DBpedia and Wikidata
- A lightweight web interface with:
  - SPARQL querying
  - Multilingual support
- Comprehensive documentation covering:
  - Setup
  - Usage
  - FAIR-compliant publishing
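The linking deliverable can be sketched as emitting `owl:sameAs` triples from an Amharic resource to its English DBpedia and Wikidata counterparts. This sketch makes several assumptions. The English title is taken from the article's interlanguage link, and the Wikidata item ID is already known; neither lookup is shown. The `am.dbpedia.org` namespace is hypothetical, and title-to-IRI conversion only handles spaces. Whether the project will use `owl:sameAs` specifically is also an assumption.

```python
import re
from typing import Optional

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
AM_RESOURCE = "http://am.dbpedia.org/resource/"  # hypothetical chapter namespace
EN_RESOURCE = "http://dbpedia.org/resource/"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def to_local_name(title: str) -> str:
    # Wikipedia-style title -> IRI local name (naive: only spaces are handled).
    return title.strip().replace(" ", "_")


def sameas_triples(am_title: str, en_title: Optional[str] = None,
                   wikidata_qid: Optional[str] = None) -> list[str]:
    """Link an Amharic resource to English DBpedia and/or a Wikidata item."""
    subject = f"<{AM_RESOURCE}{to_local_name(am_title)}>"
    targets = []
    if en_title:
        targets.append(EN_RESOURCE + to_local_name(en_title))
    if wikidata_qid:
        # Wikidata item IDs look like Q42: "Q" followed by a positive integer.
        if not re.fullmatch(r"Q[1-9]\d*", wikidata_qid):
            raise ValueError(f"not a Wikidata item ID: {wikidata_qid!r}")
        targets.append(WIKIDATA_ENTITY + wikidata_qid)
    return [f"{subject} <{OWL_SAME_AS}> <{target}> ." for target in targets]


if __name__ == "__main__":
    # The English interlanguage link is taken as given; no Wikidata lookup is done.
    for triple in sameas_triples("አዲስ አበባ", en_title="Addis Ababa"):
        print(triple)
```

Validating the QID format before building the IRI keeps malformed identifiers out of the published graph. A fuller pipeline would also need to percent-encode characters that are not valid in IRIs.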
Project Technologies
- Python
- DBpedia Extraction Framework (DEF)
- Apache Jena
- SPARQL
- Laravel (PHP)
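For the SPARQL side of the stack, a client can query an endpoint using the GET binding of the SPARQL 1.1 Protocol. It can then read the response in the SPARQL 1.1 JSON results format. The endpoint URL below is an assumption: a local Apache Jena Fuseki dataset named `amharic` on Fuseki's default port. The project's actual endpoint is not given here. The example query also assumes that `dbo:birthPlace` triples have been loaded.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: a local Fuseki dataset named "amharic" (assumption).
ENDPOINT = "http://localhost:3030/amharic/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?place WHERE {
  ?person dbo:birthPlace ?place .
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    # SPARQL 1.1 Protocol: send the query via GET as a URL-encoded `query` parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    # Send the request and flatten SPARQL JSON results into {variable: value} rows.
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as response:
        results = json.load(response)
    return [{var: binding["value"] for var, binding in row.items()}
            for row in results["results"]["bindings"]]
```

Against a running endpoint, `run_query(ENDPOINT, QUERY)` returns one dictionary per result row, keyed by variable name. The sketch uses only the standard library, so it does not depend on any particular SPARQL client package.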
Project Topics
- Semantic Web
- Knowledge Graph
- Web
_Weekly reports and updates will be shared here throughout GSoC 2025._
Written on May 25, 2025