About Me
I am a passionate AI/ML professional with 2+ years of experience in scalable AI solutions, data engineering, and MLOps. I specialize in building intelligent systems that drive business impact and innovation. Committed to democratizing AI, I focus on ethical AI development, deep learning, and cloud-based ML solutions, continuously optimizing models for real-world applications.
Education
- MSc in Applied Data Analytics - Boston University, Massachusetts (January 2025)
Professional Experience
AI/ML Engineer at Steradian, Bhubaneswar
(June 2022 - July 2023)
- Enhanced data quality for medical datasets, improving accuracy and reliability.
- Optimized IoT data analysis, leading to a 30% increase in operational efficiency.
- Developed a machine learning-based recommendation engine, boosting cross-selling by 40%.
Research and Development Engineer at Innotrat, Remote
(November 2022 - August 2023)
- Built a document classification system using BERT and Generative AI, categorizing files with improved precision.
- Performed object detection on drawings with YOLOv5OBB, achieving mAP >85% across 200+ labeled drawings.
- Trained an OCR model using PaddleOCR to recognize handwritten text, achieving an hmean/F1 score >90%.
Data Analyst Intern at Remotecare, Chennai
(August 2021 - June 2022)
- Implemented ML models for customer churn prediction and sales forecasting.
- Conducted extensive data analysis to identify trends and drive company performance.
Applied Machine Learning Student Intern at Applied AI, Remote
(July 2021 - August 2022)
- Moderated machine learning assignments for 200+ students, ensuring consistent quality and timely feedback.
- Designed exercises on supervised learning and neural networks, improving students' understanding by 30%.
Skills
- Languages: Python, SQL, R, PySpark, MATLAB, Bash, C, HTML, CSS
- Libraries & Frameworks: Pandas, NumPy, Scikit-Learn, TensorFlow, Keras, PyTorch, NLTK, Spacy, OpenCV
Data Engineering
- ETL & Data Pipelines: Apache Spark, Airflow, dbt, Kafka, Luigi
- Data Warehousing & Storage: AWS Glue, Redshift, S3, EMR, Snowflake, Google BigQuery, Azure Synapse
- Big Data Technologies: Hadoop, Apache Spark, Data Lakes, Delta Lake
- Version Control & CI/CD: Git, GitHub Actions, Docker, Kubernetes, Terraform
Machine Learning & AI
- Traditional ML: Regression, Classification, Clustering, Random Forest, Gradient Boosting, SVM
- Deep Learning: CNNs, GANs, LSTMs, Transformer Models (BERT, GPT), Reinforcement Learning
- Feature Engineering & Model Optimization: Hyperparameter Tuning, Cross-Validation, Model Explainability
Data Analysis & Visualization
- Exploratory Data Analysis (EDA): Pandas, NumPy, Scipy, Statsmodels
- Data Visualization: Matplotlib, Seaborn, Power BI, Tableau
- Statistical Analysis: Probability, Hypothesis Testing, A/B Testing
MLOps & DevOps
- Cloud Platforms: AWS, GCP, Azure
- Model Deployment: Flask, FastAPI, TensorFlow Serving, MLflow, Vertex AI, Sagemaker
- CI/CD & Automation: Jenkins, Docker, Kubernetes, Airflow, Terraform
Soft Skills
- Problem-solving, Critical Thinking, Effective Communication, Team Collaboration, Agile Methodologies
Featured Work
Below are some of the projects I'm most proud of. Each showcases my technical skills and commitment to solving complex problems and delivering real value through data-driven innovation.
AWS Serverless Data Pipeline
Project Overview
This project demonstrates a serverless data pipeline using AWS S3, SNS, SQS, Lambda, Glue, and Athena to automate cross-region data migration and querying. The pipeline transfers data from an S3 bucket in one region to another, automates schema detection with Glue, and enables querying with Athena.
Key Learning Outcomes:
- Migrate data across S3 buckets using SNS, SQS, and Lambda.
- Automate workflows with AWS Lambda for event-driven processing.
- Catalog data in AWS Glue and query it via AWS Athena.
- Implement cross-region data management with serverless architectures.
- Build a scalable and automated data processing pipeline in AWS.
Project Steps
- Create S3 Buckets → Set up source and target buckets for data transfer.
- Configure SNS & SQS → Set up notifications and event-driven triggers.
- Develop Lambda Function → Automate data transfer and trigger the Glue Crawler.
- Run Glue Crawler → Detect the schema and register data in the Glue Catalog.
- Query Data in Athena → Use SQL queries to analyze the transferred data.
This project provides hands-on experience in cloud automation, serverless workflows, and scalable data processing.
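The Lambda step in this flow can be sketched roughly as follows. The bucket and crawler names are hypothetical placeholders, and the event-parsing assumes the standard SNS-in-SQS wrapping of S3 event notifications:

```python
import json


def extract_s3_records(sqs_event):
    """Unwrap (bucket, key) pairs from an SQS message whose body is an
    SNS notification wrapping an S3 event notification."""
    records = []
    for sqs_record in sqs_event.get("Records", []):
        sns_message = json.loads(sqs_record["body"])
        s3_event = json.loads(sns_message["Message"])
        for s3_record in s3_event.get("Records", []):
            records.append((s3_record["s3"]["bucket"]["name"],
                            s3_record["s3"]["object"]["key"]))
    return records


def lambda_handler(event, context):
    import boto3  # available by default in the Lambda runtime
    s3 = boto3.client("s3")
    glue = boto3.client("glue")
    for bucket, key in extract_s3_records(event):
        # Copy each new object into the target bucket in the other region
        s3.copy_object(CopySource={"Bucket": bucket, "Key": key},
                       Bucket="my-target-bucket",  # hypothetical name
                       Key=key)
    # Trigger schema detection once the data has landed
    glue.start_crawler(Name="my-target-crawler")  # hypothetical name
    return {"statusCode": 200}
```

Keeping the event parsing in its own function makes the handler easy to unit-test without touching AWS.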
AWS Glue & Snowflake ETL Pipeline
Project Overview
This project builds a scalable ETL pipeline using AWS Glue, S3, dbt, and Snowflake to extract data from an external API, store it in S3, and automate transformations with dbt before loading into Snowflake.
Key Learning Outcomes:
- Extract and store API data using AWS Glue and S3.
- Automate data transformation with dbt macros in Snowflake.
- Implement multi-layered data modeling (raw → transform → mart).
- Set up secure IAM roles & integrations for AWS and Snowflake.
- Deploy an end-to-end cloud-based ETL pipeline for analytics.
Project Steps
- Set Up IAM Roles → Grant Glue and Snowflake access permissions.
- Extract & Store Data → Use AWS Glue to pull API data into S3.
- Integrate Snowflake & AWS → Configure secure data access.
- Transform Data with dbt → Build raw, transform, and mart models.
- Deploy dbt Environment → Automate transformations for scalable workflows.
This project provides real-world experience in data extraction, ETL automation, and cloud-based data warehousing.
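The extract-and-store step might look like the sketch below. The API URL, bucket, and key are illustrative assumptions; the landing format is newline-delimited JSON, a layout both Glue crawlers and Snowflake ingest readily:

```python
import json
import urllib.request


def fetch_api_data(url):
    """Pull JSON records from the external API."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_jsonl(records):
    """Serialize records as newline-delimited JSON for S3 landing."""
    return "\n".join(json.dumps(r) for r in records)


def land_in_s3(records, bucket, key):
    import boto3  # bundled with the Glue job runtime
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key,
        Body=to_jsonl(records).encode("utf-8"))
```

From there, dbt takes over the raw → transform → mart layers inside Snowflake.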
LLM-Powered Wikipedia Chat Assistant with RAG
- Developed a conversational assistant that pairs an LLM with retrieval-augmented generation (RAG) over Wikipedia content.
- Implemented the ReAct prompt framework to guide the assistant in structured question answering, ensuring accurate and informative responses.
- Used OpenAI's models together with LlamaIndex and Chainlit to integrate Wikipedia knowledge into the conversational flow.
- Ensured user-friendly interaction by enabling users to ask questions and receive well-informed answers based on selected Wikipedia pages.
- Employed advanced retrieval mechanisms and context awareness to enhance response coherence and relevance, providing an enriched conversational experience.
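A stripped-down version of the ReAct loop described above, with the LLM call and the Wikipedia lookup injected as plain callables (the real project wires this through OpenAI's models, LlamaIndex, and Chainlit; the prompt format here is a simplification):

```python
def run_react(llm, tools, question, max_steps=5):
    """Minimal ReAct loop: the model alternates Thought/Action lines with
    tool Observations until it emits a Final Answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm("\n".join(transcript))
        transcript.append(reply)
        if reply.startswith("Final Answer:"):
            return reply[len("Final Answer:"):].strip()
        if reply.startswith("Action:"):
            # e.g. "Action: wiki Python" -> look up "Python" with the wiki tool
            tool_name, _, argument = reply[len("Action:"):].strip().partition(" ")
            observation = tools[tool_name](argument)
            transcript.append(f"Observation: {observation}")
    return None  # step budget exhausted without a final answer
```

Structuring the loop this way keeps the retrieval tool and the model independently swappable.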
Object Detection Using YOLOv8
- Developed a robust web application leveraging YOLOv8 for real-time object detection in videos.
- Implemented an intuitive user interface using Streamlit, enabling seamless interaction and analysis of uploaded content.
- Integrated OpenCV for efficient image processing and accurate object detection functionalities.
- Added customizable features such as class selection and bounding-box visualization to enhance the user experience and support precise analysis of detected objects.
- Ensured scalability and flexibility for future enhancements by following best practices in coding and project organization.
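The class-selection feature reduces to a small filtering step over the model's detections. The Streamlit wiring below is a sketch (weights file, widget labels, and layout are assumptions), while `filter_detections` is the testable core:

```python
def filter_detections(detections, selected_classes):
    """Keep only boxes whose class the user ticked in the sidebar.
    Each detection is (class_name, confidence, (x1, y1, x2, y2))."""
    return [d for d in detections if d[0] in selected_classes]


def run_app():
    # Heavy dependencies imported lazily so the helper above stays
    # importable without them.
    import streamlit as st
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # assumed pretrained weights file
    selected = set(st.multiselect("Classes", list(model.names.values())))
    video = st.file_uploader("Upload a video", type=["mp4"])
    # ...decode frames with OpenCV, run model(frame), convert the results
    # to (name, conf, box) tuples, then keep only the user's classes:
    #   frame_detections = filter_detections(frame_detections, selected)
```

Separating the filter from the UI keeps the detection logic reusable outside Streamlit.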
Chatbot with OpenAI GPT-3 using Flask
- Developed a chatbot application utilizing Flask and integrated the powerful GPT-3 API for generating human-like responses to user queries.
- Implemented features to maintain a comprehensive history of all user interactions with the GPT-3 API, ensuring transparency and traceability.
- Leveraged GPT-3's training on large text corpora to produce fluent, human-like responses.
- Ensured seamless integration of the GPT-3 API into the Flask-based chatbot, providing users with a natural and engaging conversational experience.
- Successfully deployed the chatbot application, empowering users to interact with a sophisticated AI-powered assistant capable of providing high-quality responses to a wide range of queries.
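The history-keeping described above can be factored like this. The OpenAI call is injected as a callable so the route logic is testable offline; the class and field names are assumptions, not the original code:

```python
from datetime import datetime, timezone


class InteractionLog:
    """Timestamped record of every exchange with the GPT-3 API,
    giving the transparency and traceability described above."""

    def __init__(self):
        self._entries = []

    def record(self, prompt, response):
        self._entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prompt": prompt,
            "response": response,
        })

    def history(self):
        return list(self._entries)


def chat_turn(user_message, log, complete):
    """One chatbot turn: call the model, log the exchange, return a reply.
    `complete` stands in for the OpenAI client call in the Flask route."""
    response = complete(user_message)
    log.record(user_message, response)
    return {"reply": response, "turns": len(log.history())}
```

In the Flask app, `chat_turn` would be invoked from the POST route with the real API client passed in as `complete`.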
Automatic Speech Recognition With TensorFlow
- Developed an end-to-end deep learning project focusing on audio data processing and visualization, leveraging TensorFlow and Python.
- Implemented audio processing techniques and spectrogram calculation to preprocess and analyze audio data effectively.
- Evaluated the model's performance using the Word Error Rate (WER) metric, ensuring accuracy and reliability of the final AI model.
- Successfully deployed the trained AI model, providing a practical solution for audio data analysis and processing tasks.
- Acquired valuable insights and hands-on experience in building real-world deep learning models for audio data applications through this comprehensive project.
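The WER metric mentioned above is word-level edit distance normalized by the reference length; a minimal implementation:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference words,
    computed with classic dynamic-programming edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

A perfect transcript scores 0.0; one substituted word out of three scores roughly 0.33.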
Conversational Assistant Using Rasa
- Developed a chatbot using Rasa, an open-source machine learning framework, by incorporating training data and visualizing chatbot stories for effective model training.
- Generated training data through various methods, including creating files with messages and actions, as well as interactive chatting with the bot to label messages and actions.
- Utilized Flask to build a simple application and seamlessly connected it with the chatbot backend, enabling smooth interaction and integration.
- Gained practical experience in chatbot development, from data generation and model training to application deployment, through hands-on implementation of Rasa framework and Flask integration.
- Demonstrated proficiency in leveraging machine learning frameworks and web development technologies to create functional and interactive chatbot applications, contributing to a holistic understanding of AI-powered conversational agents.
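The Flask-to-Rasa connection described above typically goes through Rasa's REST channel. A sketch, assuming a locally running Rasa server on its default port (the payload shape is Rasa's documented REST webhook format):

```python
import json
import urllib.request

# Rasa's REST channel endpoint (default local deployment)
RASA_WEBHOOK = "http://localhost:5005/webhooks/rest/webhook"


def build_payload(sender_id, message):
    """Shape the REST channel expects for an incoming user message."""
    return {"sender": sender_id, "message": message}


def send_to_rasa(sender_id, message, url=RASA_WEBHOOK):
    """POST a user message to the Rasa server; it replies with a JSON
    list of bot messages, which the Flask app can render directly."""
    data = json.dumps(build_payload(sender_id, message)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The Flask route then just forwards the user's text to `send_to_rasa` and returns the bot messages.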
Build a Road Sign Recognition System with CNN
- Extensive data collection: acquired a diverse road-sign dataset covering varied lighting conditions, perspectives, and environments.
- Utilization of specialized deep learning architecture: Employed Convolutional Neural Networks (CNNs) as the core technological foundation for image recognition tasks.
- Implementation of Python-based libraries: Leveraged TensorFlow to develop a robust road sign classification system.
- Road-safety relevance: automated sign recognition supports autonomous vehicles and traffic management systems.
- Contribution to automation: By automating the identification of road signs, the project aims to improve efficiency and accuracy in traffic management.
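A small CNN in the spirit of this project might be assembled as below. The layer sizes and input shape are illustrative assumptions, not the original architecture; TensorFlow is imported lazily so the preprocessing helper stays usable without it:

```python
def normalize_pixels(image_rows):
    """Scale 0-255 pixel values to [0, 1] before training."""
    return [[px / 255.0 for px in row] for row in image_rows]


def build_sign_classifier(num_classes, input_shape=(32, 32, 3)):
    """Sketch of a road-sign CNN: two conv/pool stages feeding a
    softmax head over the sign classes."""
    from tensorflow.keras import layers, models  # imported lazily
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With integer class labels, `sparse_categorical_crossentropy` avoids the need to one-hot encode the targets.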
The Learning Agency Lab - PII Data Detection
- Developed a solution for a Kaggle competition focused on automating the detection and removal of personally identifiable information (PII) from student writing.
- Performed careful data preprocessing to standardize and refine the text, ensuring reliable performance of downstream models.
- Leveraged advanced techniques such as Named Entity Recognition (NER) and machine learning models to accurately identify and categorize PII entities within the text.
- Implemented robust rule-based filtering and pattern-matching algorithms to effectively remove sensitive information while preserving the integrity and coherence of the educational content.
- Tested, refined, and fine-tuned the solution iteratively, producing a high-performing system for privacy protection in educational data.
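The rule-based filtering layer mentioned above can be sketched as regex pattern matching that complements the NER model's predictions. The two patterns here are illustrative examples, not the competition solution's full set:

```python
import re

# Illustrative PII patterns; a real system would cover more entity types
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}


def redact_pii(text):
    """Rule-based pass: replace matched PII spans with type placeholders,
    preserving the surrounding text's readability."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

In a combined pipeline, spans flagged by either the NER model or these rules would be redacted, trading a few false positives for higher recall on sensitive data.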
If you're interested in the more detailed aspects of my work or want to see my code in action, check out the repositories below. Each repository includes extensive documentation on how the projects were built and how they can be run.
I'm also open to feedback on my projects, so don't hesitate to raise an issue or submit a pull request!
Thank you for taking the time to explore my work. Letβs connect and make something great together!
How to Reach Me
I am always interested in hearing about new opportunities, collaborations, or just chatting about technology and AI. Feel free to reach out to me through the following channels: