Experiments on Approaches to Named Entity Recognition in IsiZulu
Named Entity Recognition (NER) is a core component of Natural Language Processing (NLP) technologies. This thesis investigates a range of approaches to the NER task in isiZulu, a morphologically complex Bantu language spoken by over 12 million people in South Africa. We present a deep-learning-based part-of-speech (POS) tagger with custom 50-dimensional word embeddings that automatically generates POS tags for the downstream NER task; the tagger outperforms the previous benchmark by around 9% in F-score. Furthermore, we evaluate both feature-based and sequence-based approaches to entity recognition (a CRF and a bidirectional RNN, respectively), showing the robustness of the former when combined with enhanced feature engineering and the challenges of applying the latter. We also propose potential improvements for future system development. Our work contributes to the development of core NLP technologies for comparatively low-resource languages that have not been examined thoroughly, and it provides insights for research on similar problems.