Talk abstract: The realities of building domain-specific language models for production. LSEG Labs have built a set of domain-specific language models, based on Google’s BERT architecture, using LSEG’s proprietary financial data. In this talk, we will discuss our journey taking these models from inception to production, covering all the pain-points along the way.
Focusing on our Financial News NLP model, we will look at the pre-processing of financial news, training on GCP Preemptible TPUs, and running inference via AWS Batch Transform. We will discuss how we benchmark our model using a downstream classification task. Finally, we will look at the pros and cons of the different ways we can serve these models to customers.
Bio: Matt Harding is a full-stack developer within the LSEG Labs team. Having been with the team for 4 years, he works on MVPs in the areas of machine learning and financial data analytics. He has 8 years' commercial experience working on trade analytics for companies including Barclays Investment Bank and Thomson Reuters. He enjoys participating in hackathons, recently winning one of the top prizes at TechCrunch Disrupt 2019. He holds a BSc in Economics and an MSc in Computer Science from the University of Nottingham.