Talk abstract: The realities of building domain-specific language models for production. LSEG Labs has built a set of domain-specific language models, based on Google’s BERT architecture, using LSEG’s proprietary financial data. In this talk, we will discuss our journey taking these models from inception to production, covering the pain points along the way.
Focusing on our Financial News NLP model, we will look at the pre-processing of financial news, training on GCP Preemptible TPUs, and running inference via AWS Batch Transform. We will discuss how we benchmark our model using a downstream classification task. Finally, we will look at the pros and cons of the different ways we can serve these models to customers.
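As a rough illustration of the batch inference step mentioned above, submitting a SageMaker Batch Transform job from Python typically means assembling a request like the one below and passing it to boto3. All names here (bucket paths, model name, instance type) are hypothetical placeholders, not the actual LSEG setup:

```python
def build_transform_job_request(job_name, model_name, input_s3, output_s3):
    """Assemble kwargs for sagemaker.create_transform_job.

    The S3 URIs, model name, and instance type supplied by the caller
    are illustrative only.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}
            },
            "ContentType": "application/jsonlines",
            "SplitType": "Line",  # one news document per line
        },
        "TransformOutput": {"S3OutputPath": output_s3, "AssembleWith": "Line"},
        "TransformResources": {"InstanceType": "ml.g4dn.xlarge", "InstanceCount": 1},
    }

request = build_transform_job_request(
    "news-bert-inference",
    "financial-news-bert",
    "s3://example-bucket/news-input/",
    "s3://example-bucket/news-output/",
)
# With AWS credentials configured, the job would be submitted with:
# import boto3
# boto3.client("sagemaker").create_transform_job(**request)
```

The appeal of this pattern for offline scoring is that SageMaker provisions the instances, shards the input objects, and tears everything down when the job finishes, so no always-on endpoint is needed.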
Bio: Stanimir Vichev is a senior backend engineer who has worked with the LSEG Labs team for the past three years. He has designed, built, and deployed several applications in the fields of machine learning, NLP, real-time analytics, and blockchain. He has around seven years of commercial engineering experience, having worked for Thomson Reuters and Bank of America Merrill Lynch. Stanimir holds a BSc in Computer and Business Studies from the University of Warwick, as well as a Master's in Information and Data Science from UC Berkeley.