About The Project - Towards Amharic DBpedia

Amharic Wikipedia lacks structured knowledge representation, which limits its integration into global knowledge graphs like DBpedia. This gap hinders the accessibility, discoverability, and interoperability of Amharic content in multilingual semantic web applications.

This project proposes to enhance the Amharic chapter of DBpedia by extracting structured data from Amharic Wikipedia. It involves:

  • Extending the DBpedia Extraction Framework to support Amharic
  • Creating new mappings for infobox templates
  • Extracting rich semantic data such as citations, disambiguations, and anchor texts

The output will be published as RDF triples and made accessible via a user-friendly web interface, following FAIR (Findable, Accessible, Interoperable, Reusable) data principles.
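To make the mapping step more concrete, here is a minimal sketch of how mapped infobox fields could be turned into RDF triples in N-Triples syntax. It is not the DBpedia Extraction Framework's own code. The `am.dbpedia.org` resource namespace, the `INFOBOX_MAPPING` table, and the function names are all assumptions made for illustration.

```python
# Assumed namespaces: the Amharic resource base is a guess modelled on other
# DBpedia language chapters; the ontology namespace is DBpedia's public one.
RESOURCE_BASE = "http://am.dbpedia.org/resource/"
ONTOLOGY_BASE = "http://dbpedia.org/ontology/"

# Hypothetical mapping from Amharic infobox keys to DBpedia ontology properties.
INFOBOX_MAPPING = {
    "ዋና ከተማ": "capital",
}


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def resource_iri(title: str) -> str:
    """Build a resource IRI from a page title (spaces become underscores).

    Simplification: titles are assumed not to contain characters that are
    illegal in an IRI, such as '<', '>' or '"'.
    """
    return RESOURCE_BASE + title.strip().replace(" ", "_")


def infobox_to_ntriples(title: str, infobox: dict, lang: str = "am") -> list:
    """Emit one N-Triples line per infobox field that has a known mapping.

    Values are written as language-tagged literals to keep the sketch short;
    a real mapping would emit resource IRIs for object properties.
    """
    subject = resource_iri(title)
    lines = []
    for key, value in infobox.items():
        prop = INFOBOX_MAPPING.get(key.strip())
        if prop is None:
            continue  # unmapped keys are skipped, not guessed
        literal = escape_literal(value)
        lines.append(f'<{subject}> <{ONTOLOGY_BASE}{prop}> "{literal}"@{lang} .')
    return lines


if __name__ == "__main__":
    for line in infobox_to_ntriples("ኢትዮጵያ", {"ዋና ከተማ": "አዲስ አበባ"}):
        print(line)
```

Keeping the key-to-property table separate from the serialization code reflects the idea behind the mappings: adding support for a new infobox template should mean adding entries, not changing the extractor.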

Mentors

Ricardo Usbeck, Meti, Hizkiel Alemayehu, Tilahun Abedissa


Deliverables


  • An Amharic-compatible extension to the DBpedia Extraction Framework
  • New infobox mapping templates for Amharic Wikipedia
  • An automated extraction pipeline with support for:
    • Citations
    • Disambiguation
    • Topical concepts
    • Personal data
  • A structured knowledge graph (RDF triples) linked with English DBpedia and Wikidata
  • A lightweight web interface with:
    • SPARQL querying
    • Multilingual support
  • Comprehensive documentation covering:
    • Setup
    • Usage
    • FAIR-compliant publishing
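
As a rough illustration of the linking and querying deliverables, the sketch below generates `owl:sameAs` links from an Amharic resource to English DBpedia and Wikidata, and builds a SPARQL query that would retrieve those links from an endpoint. The Amharic resource namespace and both function names are assumptions, and how matching English titles and Wikidata IDs are found is left out.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EN_DBPEDIA_BASE = "http://dbpedia.org/resource/"
WIKIDATA_BASE = "http://www.wikidata.org/entity/"


def same_as_triples(am_iri: str, en_title: str = None, wikidata_id: str = None) -> list:
    """Return N-Triples lines linking an Amharic resource to its counterparts.

    `am_iri` is assumed to be a complete resource IRI; either link target
    may be omitted when no counterpart is known.
    """
    targets = []
    if en_title:
        targets.append(EN_DBPEDIA_BASE + en_title.strip().replace(" ", "_"))
    if wikidata_id:
        targets.append(WIKIDATA_BASE + wikidata_id.strip())
    return [f"<{am_iri}> <{OWL_SAME_AS}> <{target}> ." for target in targets]


def links_query(am_iri: str) -> str:
    """Build a SPARQL query that lists every owl:sameAs target of a resource."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT ?target WHERE {{ <{am_iri}> owl:sameAs ?target }}"
    )


if __name__ == "__main__":
    iri = "http://am.dbpedia.org/resource/ኢትዮጵያ"  # assumed namespace
    for line in same_as_triples(iri, en_title="Ethiopia", wikidata_id="Q115"):
        print(line)
    print(links_query(iri))
```

The query string could be pasted into the planned SPARQL interface or sent to an endpoint by any SPARQL client.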

Project Technologies


  • Python
  • DBpedia Extraction Framework (DEF)
  • Apache Jena
  • SPARQL
  • Laravel (PHP)

Project Topics


  • Semantic Web
  • Knowledge Graph
  • Web

Weekly reports and updates will be shared here throughout GSoC 2025.

Written on May 25, 2025