Hi, I am

Anil Gurbuz

ML&AI Engineer

I build ML&AI based software solutions

01. About Me

I am an AI&ML Engineer with a keen interest in Programming, Statistics, Machine Learning and Deep Learning. My academic background is a mix of Engineering and Data Science.

Recently, I have been working with LLMs @JDoodle, running them efficiently and getting them to generate code that passes HumanEval questions :)

I enjoy competing in data science competitions. I find them great for sharpening Data Science & ML skills and very helpful for staying up to date.

02. Blog
03. Portfolio

Large Language Model Operations

Flockserve - LLM Inference Endpoint

Purpose: Most production LLM workloads run on closed-source solutions from cloud providers, such as Google's Vertex AI or Azure ML. The purpose was to develop an open-source, cloud-agnostic, cost-efficient and flexible alternative to those services.

Challenge: Handling dynamic request rates and high volumes of traffic.

Key Strategies Applied: Asynchronous processing of requests was key to handling high volumes. A custom metric, "Queue Length Running Mean", proved effective as the basis for up/down scaling decisions. Using SkyPilot for node provisioning was very helpful for achieving a cloud-agnostic solution.
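The scaling metric above can be sketched in a few lines. This is a minimal illustration only; the window size and thresholds here are hypothetical, not Flockserve's actual values.

```python
from collections import deque

class QueueLengthScaler:
    """Toy "Queue Length Running Mean" autoscaling policy.

    Window size and thresholds are illustrative assumptions,
    not the real Flockserve configuration.
    """

    def __init__(self, window=30, scale_up_at=10.0, scale_down_at=1.0):
        self.samples = deque(maxlen=window)  # recent queue-length samples
        self.scale_up_at = scale_up_at
        self.scale_down_at = scale_down_at

    def observe(self, queue_length: int) -> str:
        """Record a sample and return a scaling decision."""
        self.samples.append(queue_length)
        mean = sum(self.samples) / len(self.samples)
        if mean > self.scale_up_at:
            return "scale_up"    # sustained backlog -> add a worker node
        if mean < self.scale_down_at:
            return "scale_down"  # sustained idleness -> remove a node
        return "hold"

scaler = QueueLengthScaler(window=5)
decisions = [scaler.observe(q) for q in [0, 0, 12, 30, 40]]
```

Averaging over a window smooths out short bursts, so the endpoint scales on sustained load rather than reacting to every spike.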

Time Series Forecasting

M5 - Walmart Sales Forecasting Challenge

Purpose: Forecasting sales of 30,000 items across 12 Walmart stores for 28 days, using the last 6 years' sales data together with calendar and product-related information.

Challenge: Intermittent demand for products was the main challenge with this dataset. Also, sales at the single-product level were highly variable.

Key Strategies Applied: 200 features were generated, mainly statistics on sales data and interactions between sales and the calendar. Products were also clustered on intermittent-demand-related features and trained together within each group. Finally, 28×3 Gradient Boosting Machines were trained to forecast the different horizons from 1 to 28 days.
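The one-model-per-horizon idea can be sketched as follows. The data here is synthetic and scikit-learn's GradientBoostingRegressor stands in for the competition models; only 3 horizons are trained to keep the sketch fast.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy stand-in for the M5 setup: lag/calendar features X and daily sales.
n_days, n_features = 300, 8
X = rng.normal(size=(n_days, n_features))
sales = X @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_days)

# One model per forecast horizon: model h learns to predict sales h days
# ahead by shifting the target h steps into the future.
horizons = [1, 2, 3]
models = {}
for h in horizons:
    X_h, y_h = X[:-h], sales[h:]
    models[h] = GradientBoostingRegressor(n_estimators=50).fit(X_h, y_h)

# Forecast every horizon from the most recent feature row.
latest = X[-1:]
forecast = {h: float(models[h].predict(latest)[0]) for h in horizons}
```

Training a dedicated model per horizon avoids feeding predictions back recursively, so errors don't compound across the 28-day window.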

Image & Natural Language Processing

Product Matching

Purpose: Using an e-commerce platform's (Shopee) product listing images and the textual descriptions written by listing owners, identify identical products listed by different vendors.

Data: 35,000 listing images and descriptions in English, Indonesian, or both.

Strategy: Creating a combined embedding space of image and text, then quantifying the similarity of listings based on cosine distance.

Model Architecture: EfficientNet-b3 & BERT + FC + ArcFace

Key Properties: Micro-averaged F1-score of ~0.73 on unseen test data.
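The combined-embedding matching step can be sketched with NumPy. The random vectors below are hypothetical stand-ins for the EfficientNet-b3 image embeddings and BERT text embeddings, and the similarity threshold is illustrative.

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Stand-ins for image (EfficientNet-b3) and text (BERT) embeddings;
# random here, produced by the trained FC + ArcFace head in the project.
rng = np.random.default_rng(42)
n_listings = 6
img_emb = l2_normalize(rng.normal(size=(n_listings, 128)))
txt_emb = l2_normalize(rng.normal(size=(n_listings, 64)))

# Make listings 0 and 1 near-duplicates to simulate an identical product
# listed by two different vendors.
img_emb[1] = img_emb[0]
txt_emb[1] = txt_emb[0]

# Combined embedding: concatenate both spaces, then re-normalize so
# cosine similarity weighs image and text evenly.
combined = l2_normalize(np.hstack([img_emb, txt_emb]))

# Cosine similarity matrix; pairs above a (hypothetical) threshold match.
sim = combined @ combined.T
threshold = 0.8
matches = [(i, j)
           for i in range(n_listings)
           for j in range(i + 1, n_listings)
           if sim[i, j] > threshold]
```

Because the vectors are unit-normalized, the dot product equals cosine similarity, so a single matrix multiply scores every pair of listings at once.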

Web crawling & Scraping

Scraping car listings and images

Purpose: Scraping, transforming and storing car images together with other relevant information.

Scope: 1.5 million images

Storage: Amazon Web Services (AWS) – S3

Key Properties: Scraped responsibly, obeying robots.txt and limiting requests to 1 per second.
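The two politeness rules above can be sketched with the standard library. The robots.txt body and URL paths here are illustrative, not the actual target site's policy.

```python
import time
from urllib import robotparser

# Parse a robots.txt body directly (no network call); these rules are
# made up for illustration, not the real site's policy.
rp = robotparser.RobotFileParser()
rp.parse("""User-agent: *
Disallow: /admin/
Allow: /
""".splitlines())

class RateLimiter:
    """Space calls at least `interval` seconds apart."""

    def __init__(self, interval=1.0):  # 1 request/second, as in the project
        self.interval = interval
        self._last = 0.0

    def wait(self):
        now = time.monotonic()
        sleep_for = self._last + self.interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

limiter = RateLimiter(interval=1.0)

def polite_fetch(url, fetch):
    """Fetch `url` only if robots.txt allows it, at most once per second."""
    if not rp.can_fetch("*", url):
        return None       # disallowed path: skip it entirely
    limiter.wait()        # enforce the 1 req/s budget
    return fetch(url)
```

Checking `can_fetch` before the rate limiter means disallowed URLs cost nothing from the request budget.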

Deep Learning for Sequential Data

Predicting mRNA Folding Probabilities

Purpose: Given mRNA base sequences and the properties of each base pair, predict the folding probability of each base pair.

Data: Sequential data; the order of bases in the mRNA molecule is critical to understanding its behavior, which makes transformers and recurrent neural networks useful.

Model Architecture: Embedding + LSTM with 3 hidden layers + linear output layer

Key Properties: GPU training, Data augmentation, Weighted training by measurement errors, use of experiment tracking tools.
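The "weighted training by measurement errors" idea can be illustrated with a weighted least-squares fit: each sample is down-weighted in proportion to its measurement error. This is a NumPy sketch of the weighting principle only; the project applied the same weights inside the LSTM's training loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy targets with per-sample measurement error, as in the mRNA dataset:
# each observation y_i comes with an error estimate err_i.
n = 200
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])
true_w = np.array([0.5, 2.0])
err = rng.uniform(0.05, 1.0, size=n)        # per-sample measurement error
y = X @ true_w + rng.normal(scale=err)      # noisier where err is larger

# Weight each sample by 1/err^2 so unreliable measurements influence
# the fit less (inverse-variance weighting).
w = 1.0 / err**2
W = np.diag(w)
coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # weighted least squares
```

Inverse-variance weighting is the classical choice when noise levels vary per sample: it gives the maximum-likelihood estimate under Gaussian noise with known per-sample variance.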

Predictive Modelling

Predicting the Critical Temperature of Superconductors

Purpose: Understanding the influencing factors and predicting the critical temperature of superconductors.

Data: 20,000 rows and 81 columns representing the chemical properties of superconductors.

Model Development: Regression models were developed using stepwise feature selection and L1 & L2 parameter shrinkage. XGBoost hyper-parameter tuning with grid search was also performed, and the tuned XGBoost model was compared with the regression models.
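The model-development recipe above can be sketched end to end. The data is synthetic, and scikit-learn's GradientBoostingRegressor is used here as a dependency-free stand-in for XGBoost; the grid values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for the superconductor table (the real data has
# ~20k rows and 81 chemical-property columns).
X = rng.normal(size=(400, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# L2 (ridge) and L1 (lasso) parameter shrinkage on the linear model.
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)
lasso = Lasso(alpha=0.1).fit(X_tr, y_tr)

# Boosted-tree baseline tuned by grid search (stand-in for XGBoost).
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3,
).fit(X_tr, y_tr)

# Compare all three on held-out R^2.
scores = {"ridge": ridge.score(X_te, y_te),
          "lasso": lasso.score(X_te, y_te),
          "gbm": grid.score(X_te, y_te)}
```

Comparing shrunken linear models against a tuned tree ensemble on the same held-out split is what makes the "compared with regression models" conclusion meaningful.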


Processing Data Stream

Processing, Visualising and Storing real-time fire data

Purpose: Create 3 streams of temperature data, then process, join and pipeline them to feed a dynamic visualisation showing the recent highest temperature values and a static visualisation showing fire locations on a map.

Data: Historic surface temperature data from different NASA satellites.

System Architecture: 3 Kafka event producers were created to simulate real-time data with variable broadcasting frequencies. This data is processed in parallel by a Spark Streaming application, and the results are visualised and saved into MongoDB.
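A Kafka/Spark pipeline can't run inside a snippet, so here is a standard-library sketch of the core idea: three timestamped temperature streams with different frequencies are merged in time order and reduced to a running maximum. The satellite names and values are made up, and `heapq.merge` stands in for the Kafka producers plus the Spark streaming join.

```python
import heapq

def satellite_stream(name, start, step, temps):
    """Simulate one producer: yields (timestamp, satellite, temperature)."""
    t = start
    for temp in temps:
        yield (t, name, temp)
        t += step

# Three streams with different broadcasting frequencies, like the three
# Kafka producers in the project (all values here are illustrative).
streams = [
    satellite_stream("AQUA",  0, 2, [31.0, 35.5, 40.2, 33.1]),
    satellite_stream("TERRA", 1, 3, [29.4, 44.7, 38.0]),
    satellite_stream("NPP",   0, 5, [27.8, 50.1]),
]

# Merge by timestamp (standing in for the streaming join) and keep a
# running maximum, like the "recent highest temperature" visualisation.
running_max = []
best = float("-inf")
for ts, sat, temp in heapq.merge(*streams):
    best = max(best, temp)
    running_max.append((ts, best))
```

`heapq.merge` consumes the generators lazily in timestamp order, which mirrors how a streaming job processes events as they arrive rather than after all producers finish.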

Interactive Data Visualization

Comparison of Online Movie Platforms

Purpose: Create an interactive data visualization tool to compare for-profit (IMDb) and non-profit (TMDb) movie platforms' user ratings.

Data: 25 million rows of user ratings from both platforms, plus detailed information about each movie.

Key Achievements: Used a movie metadata API to render 100K movie posters instantly in response to user interaction.