booknlp

UC Berkeley School of Information
Currently developing BookNLP, a multi-language NLP pipeline written in Python to annotate and disambiguate books and other long documents. Researching and implementing algorithms for quotation identification, quotation attribution, character name clustering, and character gender inference.

cross-linguistic analysis of acoustic data

UC Berkeley D-Lab
Generalized vowel formants across speakers of the same language. Parsed acoustic language data to find IPA transcriptions, developed and applied machine learning techniques to normalize acoustic data across languages, and condensed tokens for individual speakers to create language profiles. Worked in Python using the parselmouth library.