undergraduate dissertation · early NLP · 2023-2024

Biological knowledge graph construction

Undergraduate dissertation using transformer-assisted triple extraction on biological abstracts, then turning the output into graph structure.

NLP · biology · knowledge graphs · transformers · PubMed

This was my undergraduate dissertation, and one of the projects where my biology background and later NLP interests overlap most directly. The question was whether transformer models could help extract useful subject-relation-object triples from biological abstracts, and whether those triples could be turned into a meaningful knowledge graph.

The work included corpus collection from PubMed abstracts, triple extraction, qualitative graph visualisation, and quantitative evaluation through downstream predictive models. It is rougher than my more recent projects, but it captures a transition point: using NLP tools on scientific text before I had fully moved into speech and language processing.
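To illustrate the step from extracted triples to graph structure, here is a minimal sketch in plain Python. It is not the dissertation's code: the function name `build_graph`, the normalisation rule, and the example triples are all assumptions made up for illustration, and the extraction model itself is left out.

```python
from collections import defaultdict


def build_graph(triples):
    """Group (subject, relation, object) triples into an adjacency mapping.

    Hypothetical helper: maps each subject node to a list of
    (relation, object) edges, skipping exact duplicates.
    """
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        # Assumed normalisation: trim whitespace and lowercase, so that
        # "Insulin" and "insulin " collapse into one node.
        s = subject.strip().lower()
        r = relation.strip().lower()
        o = obj.strip().lower()
        edge = (r, o)
        if edge not in graph[s]:
            graph[s].append(edge)
    return dict(graph)


# Made-up triples standing in for model output; not real project data.
example_triples = [
    ("Insulin", "regulates", "glucose uptake"),
    ("insulin ", "regulates", "Glucose uptake"),  # duplicate after normalisation
    ("Insulin", "binds", "insulin receptor"),
    ("Metformin", "lowers", "blood glucose"),
]

graph = build_graph(example_triples)
for node, edges in graph.items():
    for relation, target in edges:
        print(f"{node} --{relation}--> {target}")
```

An adjacency mapping like this is enough for simple inspection. For visualisation or graph-level features, the same triples could be loaded into a dedicated graph library instead.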

A scan of local file timestamps dates the project material to June 2023, with later file activity in August 2024, under Python/text-mining-project-main/.
