Intro

“Information is the oil of the 21st century, and analytics is the combustion engine!” This quote by Peter Sondergaard of Gartner has stayed with me as a reminder that the mysteries of big data are still unraveling. I have always been drawn to the complexities of mathematics and eager to try my hand at newer and more complicated problems. Working with data and making data-driven decisions excites me and motivates me to keep learning in this domain. My immediate goal after my master’s is to become a Data Analyst. In the long run, I see myself becoming a Data Scientist, as I am hugely inspired by the magic that data can show.

Working in teams since my undergraduate degree has taught me to value curiosity, team building, leadership, and solving social and local problems. Apart from academics, I took part in many extracurricular activities during my undergraduate studies. Participating in and managing various events has not only improved my decision making but also made me patient and tolerant. I strongly believe that hard work pays off, so I have made hard work a part of my lifestyle.

I am a motivated and curious engineer with demonstrated experience in the data analytics domain, including exploratory data analysis, data visualization, developing ML algorithms, and building predictive models. I am a strong problem solver, extremely patient, and an excellent communicator, and I am dedicated to exceeding goals and expectations.

Master of Science
Data Analytics Engineering

Graduating May 2023

Courses: Probability and Statistics, Database Management for Analytics, Data Mining Engineering, Computation and Visualization, Machine Learning, Deterministic Operational Research

Bachelor of Engineering
Information Technology

Graduated May 2021

Courses: Fundamentals of Data Structures, Object Oriented Programming, Discrete Structures, Computer Graphics, Theory of Computation, Database Management Systems, Software Engineering and Project Management, Systems Programming, Design and Analysis of Algorithms, Cloud Computing, Data Science and Big Data Analytics, Machine Learning and Applications, Software Design and Modeling, Software Testing and Quantitative Analysis, Internet of Things, Distributed Computing Systems, Social Media Analytics

Data Analyst Intern
Magna Plastech & Engineering

September 2020 - February 2021

  • Initiated development of star schema data models for sales and production of the company in collaboration with team members
  • Conducted data wrangling, implemented 30+ parallelized extract/transform/load (ETL) pipeline jobs, and deployed them on the Talend Administration Center (TAC) server
  • Developed interactive Power BI dashboards of overall sales and production performance to enhance data-informed decision making
  • The dashboards helped identify downtime during production, improving production by 7%, and informed a change in sales strategy that increased sales by 13%

Data Engineering Intern
MBGenesesis Healthcare Private Limited

June 2020 - September 2020

  • Built and managed the company’s data lake (hundreds of GB) and configured infrastructure provisioning using Terraform
  • Monitored query performance and logs for the Snowflake warehouse by creating 5 dashboards in Datadog
  • Provided structured data across various teams to help them make informed decisions, which resulted in 10% profit growth in the quarter

Credit Card Approval Prediction

January 2022 - May 2022

  • Performed exploratory data analysis (EDA) on a dataset with 1 million rows and 50 features
  • Cleaned and preprocessed the data using pandas and NumPy, then analyzed it further to predict credit card approval
  • Calculated precision, recall, and accuracy to evaluate the models and selected Random Forest as the best model

E-Commerce of US Market
Northeastern University

November 2021 - December 2021

  • Imported a 19-row dataset table into RStudio, applied data preprocessing techniques, and worked on probability distributions
  • Performed time series analysis of order and discount data in Python using Jupyter Notebook and integrated it with R to produce a visualization report using the recurrence plot method
  • Performed text and sentiment analysis on 2,000+ products to identify positive and negative words among the common words in product descriptions

Analysis of IMDb and MovieLens Dataset

November 2021 - December 2021

  • Devised a multi-star schema with 6 fact tables and 19 dimension tables in Alteryx and maintained referential integrity
  • Implemented Slowly Changing Dimensions (SCDs) to integrate data in Talend and designed data pipelines to load the data warehouse
  • Orchestrated an ETL workflow in Talend to load all tables (60 billion+ rows) on a schedule

Multimodal Biometric System

October 2020 - March 2021

  • Administered a project titled "Multimodal Biometric System," which combined iris, facial, speech, and fingerprint recognition implemented using CNNs
  • Integrated these modalities to achieve 85% better precision on various datasets and create a reliable biometric system
  • Deployed the model in the department's attendance system, saving about 15 minutes of attendance-taking in each class every day, and in the professor's cabin to enhance security

Analysis of Crime Data in NYC

November 2021 - December 2021

  • Performed data preprocessing and cleansing on a dataset of 24 columns and 140k rows using the pandas library
  • Visualized crime counts by location with a folium map and crime density across the city with a heat map, using matplotlib and seaborn, to gain insight into the crime rate in each borough
  • Applied K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM) algorithms to arrest by borough and by precinct to compare performance; obtained the best accuracy with Random Forest for arrest by borough (94%) and arrest by precinct (20%)

World University Ranking

September 2021 - October 2021

  • Unified 6 different data files for World University Ranking in RStudio and applied data preprocessing techniques
  • Answered business questions through queries to produce a statistical report based on Nobel Prizes, researchers, and other characteristics of universities
  • Visualized data for the top 100 universities, male-female ratio, quality of teaching based on given score, and education expenditure of different countries

Handwriting Analysis Software

October 2020 - March 2021


Predictive Maintenance

June 2022 - August 2022

  • Collaborated with team members and a professor on a project to identify machine failure, building models such as Logistic Regression, Naïve Bayes, and a Neural Network from scratch for industrial use
  • Extracted product failure data with over 1M records and 14 attributes; performed preprocessing, scaling, and exploratory data analysis
  • Achieved F1 scores over 90% and successfully implemented the model in the laboratory to reduce human labor

Merchandise Printing Press

September 2021 - December 2021

  • Applied data preprocessing and data cleaning techniques in MS SQL Server on 8 tables containing 3,000+ customers
  • Created EER and UML diagrams, then mapped the conceptual model to a relational model (MongoDB, MySQL) by product type, city of order, etc., which reduced the process time by 50%
  • Produced BI reports for sales and production in Python using matplotlib and built a Tableau dashboard to identify sales in each city using symbol maps

Publications

  • A Survey: Handwriting Analysis Software Using Image Preprocessing and Machine Learning