Data Engineer → ML Engineer

Building the bridge from raw data to intelligent systems

I'm Nikhil Kudupudi, and I turn complex data challenges into scalable ML pipelines. I'm currently pursuing my Master's at Northeastern University, and I previously built fintech solutions at KFin Technologies.

3.9

GPA at NEU

40%

Faster Decisions

1000s

Records Daily

2+

Years Experience

From Data Pipelines to Machine Learning

SEP 2024 — PRESENT

Northeastern University

M.S. Data Analytics Engineering · Boston, MA

Deepening expertise in machine learning, statistical modeling, and MLOps. Building end-to-end systems that bridge data engineering fundamentals with intelligent automation.

Machine Learning · MLOps · Deep Learning · Data Mining

AUG 2023 — JUL 2024

Fintech Engineer

KFin Technologies · Hyderabad, India

Designed ETL workflows powering forecasting systems. Created dashboards that cut decision time by 40%. Seeing the impact of data-driven insights made me want to push further into ML.

ETL Pipelines · Power BI · Forecasting · Automation

NOV 2022 — AUG 2023

Software Engineer

WebileApps · Hyderabad, India

First exposure to data at scale. Automated collection and cleansing processes, reducing analyst workload by 25%. This sparked my passion for building systems that transform raw data into actionable insights.

Python · Automation · Analytics · Dashboards

Projects that tell the story

Data Engineering

Real-Time News Analytics

End-to-end streaming pipeline with dynamic topic ingestion, sentiment analysis, and multi-platform visualizations for actionable market insights.

Kafka → Spark Streaming → NLP Engine → Delta Lake → Dashboard
Kafka · Spark Streaming · NLP · dbt · Delta Lake

MLOps · AI Systems

Modular RAG Application

Scalable, Dockerized NLP chatbot with configurable content sources. GCP-native architecture with orchestrated pipelines and real-time retrieval.

Sources → Prefect → Embeddings → Vector Store → Streamlit
RAG · MLflow · Prefect · GCP · Docker

Data Engineering

Reddit Data Pipeline

PRAW-powered ETL processing thousands of daily records with AWS infrastructure, automated cataloging, and BI-ready analytics tables.

PRAW → Airflow → S3 → Glue → Redshift → BI
Airflow · Celery · AWS S3 · Glue · Redshift

ML · Orchestration

Time-Series Forecasting

Modular forecasting pipeline with automated feature engineering, multiple model variants, and anomaly detection for market trend analysis.

APIs → PostgreSQL → Features → Models → Dashboard
ARIMA · Prophet · LSTM · Airflow · AWS

Analytics · Visualization

Bank Churn Analysis

Comprehensive analytics workflow identifying high-risk customers through EDA, feature analysis, and interactive dashboards for retention strategies.

Data → EDA → Feature Analysis → Segmentation → Tableau
Pandas · EDA · Tableau · Matplotlib

Full-Stack · Automation

Employment System Dashboard

Backend system for applicant onboarding with Google APIs, DocuSign flow automation, document status tracking, and automated email alerts.

Client → APIs → DocuSign → Status Engine → Alerts
JavaScript · Python · Google APIs · DocuSign · Streamlit

Other Work

Heart Disease Prediction · Autonomous Car Steering Simulation · Crash Reporting Analytics · Crime Data Visualization

The technical foundation

Data Engineering

  • Apache Kafka
  • Spark / PySpark
  • Airflow / Prefect
  • dbt
  • Snowflake

ML & Analytics

  • Scikit-learn
  • XGBoost
  • PyTorch
  • MLflow
  • FAISS / RAG

Cloud & DevOps

  • AWS (S3, Glue, Redshift)
  • GCP (Cloud Run)
  • Docker
  • GitHub Actions
  • Terraform

Building something interesting?

I'm always open to discussing data engineering, MLOps, or new opportunities.