Intro
“Information is the oil of the 21st century, and analytics is the combustion engine.” This quote, from Peter Sondergaard of Gartner, has stayed with me as a reminder that the mysteries of big data are still unraveling. I have always been drawn to the complexities of mathematics and eager to try my hand at newer and more complicated problems. Working with data and making data-driven decisions excites me and motivates me to keep learning in this domain. My immediate goal after my master’s degree is to work as a Data Analyst; in the long run I see myself becoming a Data Scientist, as I am hugely inspired by the magic that data can show. Working in teams during my undergraduate degree taught me the value of curiosity, team building, leadership, and solving social and local problems. Beyond academics, I took part in many extracurricular activities as an undergraduate. Participating in and managing various events improved my decision making and made me more patient and tolerant. I am a strong believer that hard work pays off, so I have made hard work part of my lifestyle. I am a motivated and curious engineer with demonstrated experience in the data analytics domain, including exploratory data analysis, data visualization, developing ML algorithms, and building predictive models. I believe I am a strong problem solver who is extremely patient and has excellent communication skills. I am dedicated to exceeding goals and expectations.
Master of Science
Data Analytics Engineering
Graduating May 2023
Courses: Probability and Statistics, Database Management for Analytics, Data Mining Engineering, Computation and Visualization, Machine Learning, Deterministic Operational Research
Bachelor of Engineering
Information Technology
Graduated May 2021
Courses: Fundamentals of Data Structures, Object Oriented Programming, Discrete Structures, Computer Graphics, Theory of Computation, Database Management Systems, Software Engineering and Project Management, Systems Programming, Design and Analysis of Algorithms, Cloud Computing, Data Science and Big Data Analytics, Machine Learning and Applications, Software Design and Modeling, Software Testing and Quantitative Analysis, Internet of Things, Distributed Computing Systems, Social Media Analytics
September 2020 - February 2021
Initiated development of star schema data models for sales and production of the company in collaboration with team members
Conducted data wrangling, implemented 30+ parallelized extract/transform/load (ETL) pipeline jobs, and deployed them on the Talend Administration Center (TAC) server
Developed interactive Power BI dashboards of overall sales and production performance to enhance data-informed decision making
Identified production downtime through the dashboards, improving production by 7%, and informed a change in sales strategy that increased sales by 13%
June 2020 - September 2020
Built and managed the company’s data lake (hundreds of GB) and configured infrastructure provisioning using Terraform
Monitored query performance and logs for the Snowflake warehouse by creating 5 dashboards in Datadog
Provided structured data across various teams to help them make informed decisions, which resulted in 10% profit growth in the quarter
Performed exploratory data analysis (EDA) on a dataset with 1 million rows and 50 features
Preprocessed and cleaned the data using libraries such as pandas and NumPy, further analyzing it to predict the
Calculated precision, recall, and accuracy to evaluate the models and selected Random Forest as the best model
November 2021 - December 2021
Applied data preprocessing techniques by importing a dataset table of 19 rows into RStudio and worked on probability distributions
Performed time series analysis of order and discount data in Python using Jupyter Notebook and integrated it with R to produce a visualization report using the recurrence plot method
Performed text and sentiment analysis for 2000+ products to identify positive and negative words among the common words in product descriptions
November 2021 - December 2021
Devised a multi-star schema with 6 fact tables and 19 dimension tables in Alteryx and maintained referential integrity
Implemented Slowly Changing Dimensions (SCDs) to integrate data in Talend and designed data pipelines to load the data warehouse
Orchestrated an ETL workflow in Talend to load all tables (60 billion+ rows) on a schedule
October 2020 - March 2021
Administered a project entitled “Multimodal Biometric System,” in which iris, facial, speech, and fingerprint recognition were combined and implemented using CNNs
Integrated these factors to achieve 85% better precision on various datasets and create a reliable biometric system
Applied this model to the department's attendance system, saving about 15 minutes of attendance-taking in each class every day, and deployed it in the professor's cabin to enhance security
November 2021 - December 2021
Performed data preprocessing and cleansing on a dataset of 24 columns and 140k rows using the pandas library
Visualized the count of crimes by location on a folium map and the density of crimes across the city as a heat map using matplotlib and seaborn, to gain insight into the crime rate in each borough
Applied K-Nearest Neighbor (KNN), Random Forest, and Support Vector Machine (SVM) algorithms to predict arrests by borough and by precinct and compare performance; obtained the best accuracy with Random Forest for arrest by borough (94%) and arrest by precinct (20%)
September 2021 - October 2021
Unified 6 different data files for World University Ranking in RStudio and executed data preprocessing techniques
Answered business questions through queries to produce a statistical report based on Nobel Prizes, researchers, and other characteristics of universities
Visualized data for the top 100 universities, male-female ratio, quality of teaching based on given score, and education expenditure of different countries
October 2020 - March 2021
Collaborated with team members and a professor to build a project that identifies machine failure using models built from scratch, such as Logistic Regression, Naïve Bayes, and Neural Networks, for industry purposes
Extracted product failure data with over 1 M records and 14 attributes; also performed preprocessing, scaling, and exploratory data analysis
Achieved F1 scores above 90% and successfully implemented the model in the laboratory to reduce human labor
September 2021 - December 2021
Applied data preprocessing and data cleaning techniques in MS SQL Server on 8 tables containing 3000+ customers
Created EER and UML diagrams, then mapped the conceptual model to the relational model (MongoDB, MySQL) according to product type, city of order, etc., which reduced the process time by 50%
Produced BI reports for sales and production in Python using matplotlib and built a Tableau dashboard to identify sales in each city using symbol maps
Publications
A Survey: Handwriting Analysis Software Using Image Preprocessing and Machine Learning
Contact