This project presents an integrated framework for detecting and classifying Obsessive-Compulsive Disorder (OCD) content in online discourse by combining ontology development with machine learning. The research develops a comprehensive OCD ontology that captures the multifaceted aspects of the disorder (its symptoms, behaviours, and related mental health concepts), drawing upon medical literature, psychological studies, and existing biomedical ontologies. The ontology bridges formal clinical terminology with the non-specialist language used by individuals in online forums, enabling accurate interpretation and classification of OCD-related content in digital communications.
Research Questions
- What knowledge about OCD exists in current medical and mental disease ontologies, and how complete is their coverage of the disorder?
- How can an ontology for OCD be built and evaluated to incorporate both formal medical definitions and the language used by non-experts in online forums?
- How effectively can the ontology categorise OCD data into obsessions and compulsions, and what is the impact of enriching it with non-specialist vocabulary?
- To what extent can NLP-based techniques, including BERT and BioBERT models, improve the prediction of obsession and compulsion when trained on ontology-labelled datasets?
Key Contributions
- OCD Ontology Development: Two publicly available OCD ontologies, both hosted on BioPortal: one focusing on clinical vocabulary, and an extended version that integrates non-specialist language through WordNet and contextual similarity features.
- Ontology Reuse Patterns: Novel ontology alignment design patterns enabling full, partial, or nominal reuse of existing biomedical ontologies (e.g. MDO, SNOMED CT, BFO).
- Deep Learning Enrichment: Automatic ontology enrichment using Bio_ClinicalBERT and Word2Vec models trained on OCD forum data (54,410 posts) to discover semantically related terms.
- Labelled Dataset Creation: An ontology-guided labelled dataset for OCD forum posts, specifically categorised for obsession and compulsion, providing a foundation for machine learning classification.
- Classification Models: Comprehensive evaluation of classification models (BERT, BioBERT, SVM, Random Forest) demonstrating that fine-tuned BERT significantly outperforms traditional approaches for OCD content classification.
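
The ontology-guided labelling listed among the contributions can be illustrated with a minimal sketch. It is not the project's actual pipeline: the vocabulary below is a hypothetical toy stand-in for ontology terms (a few clinical phrases plus lay variants of the kind the enrichment step is meant to add), and `label_post` is an assumed helper name. The idea shown is only that a post receives the "obsession" or "compulsion" label when it mentions a term attached to that concept.

```python
import re

# Hypothetical toy vocabulary (an assumption for illustration only).
# In the project, terms would come from the OCD ontology and its
# non-specialist enrichment, not from a hand-written dictionary.
ONTOLOGY_TERMS = {
    "obsession": {
        "intrusive thought",       # clinical phrasing
        "fear of contamination",   # clinical phrasing
        "can't stop thinking",     # lay phrasing
    },
    "compulsion": {
        "hand washing",            # clinical phrasing
        "checking",                # clinical phrasing
        "washing my hands",        # lay phrasing
    },
}


def label_post(text, vocabulary=ONTOLOGY_TERMS):
    """Return the sorted list of categories whose terms occur in the post."""
    # Lower-case and collapse whitespace so multi-word terms match reliably.
    normalised = re.sub(r"\s+", " ", text.lower())
    labels = []
    for category, terms in vocabulary.items():
        # Word boundaries avoid matching a term inside a longer word.
        if any(
            re.search(r"\b" + re.escape(term) + r"\b", normalised)
            for term in terms
        ):
            labels.append(category)
    return sorted(labels)


if __name__ == "__main__":
    posts = [
        "I keep checking the door lock before bed",
        "I can't stop thinking about germs, so I end up washing my hands",
        "Lovely weather today",
    ]
    for post in posts:
        print(label_post(post), "<-", post)
```

A post may receive both labels or none. Labels produced this way could then serve as training targets for the classifiers named above, although plain string matching of this kind ignores negation and context.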