// AI_ML_ENGINEER & COMPUTER_VISION_SPECIALIST

SOHAM BIT
PyTorch · Hugging Face · Computer Vision · GenAI

B.Tech CSE @ IIITR (3rd Year · CGPA 8.05). Research Intern at ISRO × DTU — satellite 6DoF pose estimation. Building production AI: RAG pipelines, voice assistants, custom LLMs. Open to AI/ML & Computer Vision internship roles.

9+ Projects
8.05 CGPA
ISRO Research
Top 15% Hackathon
// 01

About

soham@iiitr:~$ whoami
$ cat profile.json
{
"name":"Soham Bit",
"degree":"B.Tech CSE, IIIT Raichur (2023–2027)",
"cgpa":8.05,
"specialization":"Machine Learning · Computer Vision · GenAI",
"current_role":"Research Intern @ ISRO × DTU (6DoF Pose Estimation)",
"core_stack":["PyTorch", "Hugging Face", "LangChain", "OpenCV"],
"also_built":["RAG Pipelines", "Voice Assistants", "Custom GPT", "RL Agents"],
"coordinator":"DEEPLABS — AI & ML Society, IIITR",
"open_to":"AI/ML Internships · Computer Vision Roles"
}
$
// 02

Experience

Research Intern — 6DoF Pose Estimation
ISRO × DTU Internship
May 2025 – Jan 2026
  • Benchmarked 5 SOTA pose estimation architectures (KRN, SPN, ViTPose, PVNet, HRNet) on BOP/IPD datasets; analyzed accuracy-latency trade-offs for satellite deployment.
  • Engineered a lightweight neural network for 6DoF pose estimation, reducing model size by ~60% vs. baseline while retaining >85% ADD-S accuracy on BOP benchmarks.
  • Applied transfer learning & multi-modal fusion, cutting inference latency by ~40% to meet on-board satellite CPU constraints.
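For readers unfamiliar with the metric in the bullets above: ADD-S, the symmetric-object variant of ADD, transforms the object's 3D model points by the ground-truth and the predicted pose, then averages (over the ground-truth points) the distance to the nearest predicted point. "ADD-S accuracy" is commonly the fraction of test poses whose ADD-S falls below 10% of the object diameter. The stdlib sketch below assumes that 10% threshold and rotations given as 3×3 nested lists; the exact evaluation protocol behind the figures above is not specified here.

```python
import math

def transform(points, R, t):
    """Apply the rigid transform x -> R @ x + t to each 3D point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_s(points, R_pred, t_pred, R_gt, t_gt):
    """Mean closest-point distance between model points under the GT and predicted poses."""
    pred = transform(points, R_pred, t_pred)
    gt = transform(points, R_gt, t_gt)
    # For each ground-truth point, distance to its nearest predicted point (O(N^2) brute force).
    return sum(min(math.dist(g, p) for p in pred) for g in gt) / len(gt)

def add_s_accuracy(samples, diameter, frac=0.1):
    """Fraction of (points, R_pred, t_pred, R_gt, t_gt) samples with ADD-S < frac * diameter.
    Assumption: frac = 0.1 (10% of the object diameter), a common convention."""
    hits = sum(1 for s in samples if add_s(*s) < frac * diameter)
    return hits / len(samples)
```

Because each point is matched to its nearest neighbour rather than to itself, a pose that differs only by an object symmetry (for example a 180° flip of a symmetric part) scores an ADD-S of zero.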
Coordinator — DEEPLABS
AI & ML Society, IIITR
Jan 2025 – Present
  • Led 6+ workshops on NLP, RAG pipelines, transformer fine-tuning, and LLM deployment for 50+ junior students; average session rating 4.6/5.
  • Organized 3 technical sessions on conversational AI; grew the society's active membership by 30%.
Tech Manager — CodeSoc
IIITR
Nov 2023 – Sep 2024
  • Led collaborative GitHub projects and delivered technical DSA and AI workshops for society members.
// 03

Projects

2025
FEATURED · RAG · GenAI
Voice-Controlled Desktop Assistant
Modular voice assistant with a RAG pipeline & 5+ Google API integrations (including Calendar, Gmail, Drive), delivering <2s end-to-end response latency.
>93% intent accuracy
<2s latency
6 task categories
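The retrieval half of a RAG pipeline like the one above can be illustrated without any external services. This is a hypothetical, stdlib-only sketch: it swaps in bag-of-words cosine similarity for a learned embedding model (a production pipeline would more likely use dense embeddings, e.g. from Sentence-Transformers), and it stops at building the prompt; the LLM call and the Google API integrations are out of scope.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical stand-in for a dense encoder: lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Stuff the retrieved context into a prompt for the downstream LLM (hypothetical template)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Grounding the answer in retrieved snippets (calendar entries, mail threads, file names) is what lets a small assistant answer user-specific questions without retraining the language model.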
2025
ISRO · CV · 6DoF
6DoF Object Pose Estimation — Multi-modal CNN
Two-stage multimodal ResNet-50 with CBAM attention on the BOP/IPD dataset for industrial bin-picking, using RGB-D fusion with a residual refinement head.
<3 mm translation error
<2° rotation error
87% ADD-S score
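The translation and rotation figures above are presumably the usual per-pose errors: the Euclidean distance between translation vectors and the geodesic angle between rotation matrices, θ = arccos((trace(R_predᵀ R_gt) − 1) / 2). A stdlib sketch, assuming translations in millimetres and rotations as 3×3 nested lists; `rot_z` is a hypothetical helper used only to build test rotations.

```python
import math

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two rotations, in degrees."""
    # trace(R_pred^T @ R_gt) equals the element-wise sum of R_pred[i][j] * R_gt[i][j].
    tr = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp so rounding just outside [-1, 1] cannot make acos fail.
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))

def translation_error(t_pred, t_gt):
    """Euclidean distance between translation vectors (same units as the input, e.g. mm)."""
    return math.dist(t_pred, t_gt)

def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
```

`acos` is poorly conditioned near zero error, so tiny angles carry a floor of roughly 1e-6 degrees from float rounding; that is far below the 2° scale reported here.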
2025
Transformer · Docker · Hackathon
Dream11 — IPL Fantasy Score Predictor
Transformer model over player performance, match context & universal embeddings. The Dockerized pipeline deploys in under 5 minutes.
8.3 RMSE
200+ players
Top 15% rank
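The core operation of a Transformer like the one above is scaled dot-product attention, softmax(QKᵀ/√d)·V, which lets each player token weigh every other player and context token. A single-head, stdlib-only sketch under loudly stated simplifications: each player is one feature vector, and the learned Q/K/V projections, the match-context embeddings and the regression head a real fantasy-score model would need are all omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max before exponentiating
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each query row attends over all key/value rows.
    Simplification: Q, K, V are passed in directly (no learned projections)."""
    d = len(K[0])
    out, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value rows.
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return out, weights
```

Calling `attention(players, players, players)` on a list of player vectors gives self-attention; dividing by √d keeps the scores from growing with the feature dimension and saturating the softmax.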
2024
CV · Temporal · Deployed
Video Anomaly Detection
Frame-level anomaly classifier on UCF-Crime (1,900+ videos) using TSN & I3D temporal features. Deployed with a web frontend and backend.
89.2% AUC
1,900+ videos
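The frame-level AUC above is the area under the ROC curve, which equals the probability that a randomly chosen anomalous frame receives a higher anomaly score than a randomly chosen normal frame (ties counted as one half). A stdlib sketch of that rank-based definition, assuming binary frame labels with 1 = anomalous; the pairwise loop is for illustration only, and libraries such as scikit-learn compute the same quantity far more efficiently.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5.
    O(P * N) pairwise version: fine for illustration, too slow for millions of frames."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal frame")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because AUC depends only on the ranking of scores, it is threshold-free, which suits anomaly detection where the operating threshold is chosen later.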
2024
Diffusion · From Scratch
Stable Diffusion Re-Implementation
Recreated the core of Stable Diffusion from scratch: U-Net, VAE, CLIP text encoder, and DDPM/DDIM schedulers. Explored sampling strategies and their speed-diversity trade-offs.
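As a reference point for the schedulers mentioned above: the DDPM forward process noises a clean sample in closed form, x_t = √ᾱ_t·x₀ + √(1 − ᾱ_t)·ε with ᾱ_t = ∏(1 − β_s), and the deterministic DDIM update (η = 0) reuses the same ᾱ values to jump from step t to an earlier step. A scalar, stdlib-only sketch, assuming the linear β schedule from 1e-4 to 0.02 over 1,000 steps that is commonly used with DDPM; in a real model `eps_pred` would come from the U-Net.

```python
import math
import random

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed defaults)."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, eps=None):
    """Closed-form forward noising of a scalar x0 to step t."""
    eps = random.gauss(0.0, 1.0) if eps is None else eps
    ab = alpha_bars[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

def ddim_step(x_t, eps_pred, t, t_prev, alpha_bars):
    """Deterministic DDIM update (eta = 0) from step t to t_prev (t_prev = -1 means x0)."""
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    # Estimate the clean sample implied by the predicted noise, then re-noise to t_prev.
    x0_pred = (x_t - math.sqrt(1.0 - ab_t) * eps_pred) / math.sqrt(ab_t)
    return math.sqrt(ab_prev) * x0_pred + math.sqrt(1.0 - ab_prev) * eps_pred
```

Since DDIM can skip from t to any earlier t_prev, sampling with fewer steps trades some quality for speed, while η > 0 (or DDPM's ancestral sampling) reintroduces randomness and therefore diversity.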
2024
NLP · CNN+LSTM
Flickr8k Image Captioning
CNN encoder + LSTM decoder with attention, trained with teacher forcing. The MobileNetV2 variant is 4× smaller with <3% BLEU degradation vs. the full ResNet encoder.
0.27 BLEU-4
8,000 images
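BLEU-4, as reported above, is the geometric mean of clipped 1- to 4-gram precisions multiplied by a brevity penalty. Captioning results are usually computed at corpus level against several reference captions (Flickr8k provides five per image). A stdlib sketch, assuming whitespace-tokenized captions and no smoothing; scores from libraries such as NLTK or sacreBLEU can differ slightly because of tokenization and smoothing choices.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with uniform weights and no smoothing (assumptions).
    hypotheses: list of token lists; references: list of lists of token lists."""
    matched, total = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Reference length closest to the hypothesis length (ties go to the shorter one).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max count over references
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if min(matched) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Clipping stops a caption from scoring well by repeating a common word, and the brevity penalty stops it from scoring well by being very short.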
2024
RL · DQN · PPO
Reinforcement Learning Agents Suite
DQN, DQN-Conv, Actor-Critic & PPO agents implemented from scratch. DQN-Conv converged in <400 episodes; PPO trained stably across 4 OpenAI Gym environments.
195+ avg reward
<5% reward variance
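Two per-sample quantities sit at the heart of the agents above: DQN regresses Q(s, a) toward the one-step TD target r + γ·max Q_target(s′, ·), and PPO maximizes the clipped surrogate min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is the new-to-old policy probability ratio. A stdlib sketch, assuming the commonly used γ = 0.99 and ε = 0.2; a full agent would average these over a batch, turn them into losses, and (for PPO) add value and entropy terms.

```python
import math

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target r + gamma * max_a' Q_target(s', a'); no bootstrap at episode end."""
    return reward if done else reward + gamma * max(next_q_values)

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate (to be maximized). Assumed default eps = 0.2."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # Taking the minimum makes the objective pessimistic: large policy moves earn no extra credit.
    return min(ratio * advantage, clipped * advantage)
```

The clip removes any incentive to push the ratio outside [1 − ε, 1 + ε] in the direction the advantage favours, which is the main source of PPO's training stability.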
2023–24
LLM · GPT · From Scratch
Custom LLM Applications & GPT from Scratch
Full autoregressive GPT (6 layers · 6 heads · 384-dim embeddings) trained on a 1M+ character corpus in <20 min on a single GPU, reaching a character-level perplexity of ~4.2.
~4.2 perplexity
1M+ char corpus
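Perplexity, as quoted above, is the exponential of the mean negative log-likelihood the model assigns to each true next character. A perplexity of ~4.2 therefore corresponds to a mean cross-entropy of about ln 4.2 ≈ 1.44 nats (≈ 2.07 bits) per character, as if the model were choosing uniformly among roughly four characters at each step. A stdlib sketch, assuming the input is the list of probabilities the model gave to the actual next characters:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood of the true next tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

For example, a model that always assigns probability 0.25 to the correct character has perplexity 4.0, up to floating-point rounding.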
2023
HACKATHON · NLP · LegalAI
Automated Jurisdiction System — India
IPC Recommendation, Legal Chatbot, Abstractive Summarizer & Multilingual Explainer in one multi-user portal for judges, police, and lawyers. (Code Kshetra 2.0)
// 04

Tech Stack

// DL Frameworks
PyTorch · Hugging Face · LangChain · Sentence-Transformers · ONNX · TensorRT
// Computer Vision
6DoF Pose Estimation · CNNs · OpenCV · ViT · CBAM Attention · Object Detection · BOP/IPD · Point Clouds · ResNet / MobileNet
// NLP / GenAI
Transformers · RAG Pipelines · Diffusion Models · Whisper STT · XTTS TTS · Gemini API · VAEs · GPT (from scratch)
// ML Methods
Transfer Learning · Fine-tuning · Feature Engineering · Model Deployment · Mixed-Precision Training
// Reinforcement Learning
DQN · PPO · Actor-Critic · OpenAI Gym
// Tools & Platforms
Python · C++ · Docker · Git / GitHub · Django · React.js · Node.js · MongoDB · Google Colab · PyQt6 · SQL
// transmission_open
Let's Build Something Real

Actively seeking AI/ML & Computer Vision internship roles. Open to research collaborations and open-source work.
Reach out — let's talk about what I can build for your team.