Projects

OS Mixture of Depths

Implemented "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" by Raposo et al.

OS ShortGPT

Unofficial implementations of block/layer-wise pruning methods for LLMs.

OS Stealing Part of a Production Language Model

Implemented "Stealing Part of a Production Language Model" by Carlini et al.

General-GPT

Initial exploration of fine-tuning GPT-2 for interleaved CLIP embedding input and output. The goal of this project is to showcase that GPT is able to directly reason across multiple modalities.

shivaen.org

My Personal Website

ZEST

Zoom Education Suite, an add-on to Zoom calls that my team and I built as part of HooHacks 2020

TreasureAI

An AI that attempts to find the treasure in an OpenAI Gym environment

Work Experience

Apple

Present

NLP Software Engineer July, 2024 - Present

Swift
C++
Python
NLP

Toyon Research Corporation

June, 2024

Deep Learning Engineer January, 2023 - June, 2024

• Efficiently integrated deep-learning-based dense estimation techniques such as optical flow into C++ applications for overhead/satellite imagery. Performed optimizations using TensorRT and multiprocessing for 10-30x speedups.
• Configured and trained multi-agent RL experiments in both cooperative and competitive settings.
• Technical lead for a low-resource speech-to-speech translation project working with cascaded unit-based models.
• Lead developer for an extractive and generative RAG pipeline accompanied by more concrete evaluations.

C++
PyTorch
Computer Vision
Audio/Speech Processing
NLP

Multimodal Research

December, 2022

Student Researcher in CLAWS@GT August, 2021 - December, 2022

Advised by Professor Srijan Kumar and worked with Gaurav Verma at Georgia Tech on multimodal system robustness.

• Implemented an adversarial approach that utilizes XLAN and Meshed Memory image captioning models to test the robustness of current multimodal models on the CrisisMMD Dataset.
• Used Bottom-Up Attention, BERT, CLIP, and NLTK to create image-relevant text augmentations for multimodal models like CLIP. This work led to our ACL'23 paper, "Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning".

PyTorch
Multimodal
NLP

Brightcove

August, 2022

Machine Learning Engineer June, 2022 - August, 2022

Developed algorithms and methods using deep learning approaches for automatic video segmentation/chaptering.

• Designed various pipelines, supervised and unsupervised, for topic segmentation of video transcripts to detect important points/segments of time within videos.
• Implemented and modified recent segmentation methods using pre-trained language models and PyTorch. These models ranged from TextTiling as a baseline to hierarchical BERT-style models.
• Processed publicly available meeting and article corpora to evaluate approaches using multiple segmentation metrics on top of common classification metrics.
• Presented these results alongside a qualitative assessment to highlight the challenges and successes that come with each approach.

PyTorch
Multimodal
NLP
AWS SageMaker

Brain Technologies, Inc.

August, 2021

NLP Intern June, 2021 - August, 2021

Worked with a small team to create a Flask app using GPT-3 for query analysis and refinement in order to make recommendations on food and products.

• Used NLP libraries (spaCy, NLTK, etc.) alongside GPT-3 and prebuilt models for named entity recognition, question answering, and relevancy filtering on both queries and reviews.
• Built models such as bidirectional RNNs in TensorFlow as well as pretrained task-agnostic/task-specific BERT models for intent recognition and vague/non-vague classification.
• Integrated recommendation components with the production app by configuring an API.

Python
PyTorch
TensorFlow
GPT-3
AWS

Recommendation System Research

May, 2021

Student Researcher September, 2020 - May, 2021

Worked with Professor Hongning Wang and graduate student Renqin Cai at UVa on popularity bias in recommendation models.

• Implemented a pipeline for both RNN and self-attention models, including metrics, data preprocessing, training, evaluation, and the models themselves.
• Assessed bias, most interestingly in testing, where sequential predictions in the RNN would impact the bias over time.
• Created an evaluation metric called "temporal discounting" to assess popularity bias in sequential models, which will allow debiasing architectures to be developed.

Python
PyTorch
CUDA

Optum

August, 2020

Technology Development Intern June, 2020 - August, 2020

Worked as a full-stack software dev and product owner to create a COVID-19 dashboard where managers and directors could learn more about the virus as it pertains to their direct reports and/or the company as a whole.

• Set up a CI/CD pipeline through Jenkins to OpenShift
• Created and maintained our MongoDB on top of constructing our ML pipeline
• Worked with Express in Node.js to set up endpoints for our data and used Active Directory (LDAP) to collect the location and direct reports of our users
• Contributed to building the web app using Angular, in which I used D3 to create an interactive US map, integrated single sign-on, and used Optum’s own UI toolkit for various components

HTML5/CSS/JS
NodeJS
TensorFlow
MongoDB
D3JS
Angular