Johan_Portfolio
Johan’s Data Science / Data Analyst Portfolio
These are some projects that I've been working on since I graduated from Purwadhika Bootcamp School in March 2021.
Project 1 : Email Campaign Prediction
- Create a classification model that can cut the estimated monthly advertising budget by about 74%, by helping the marketing team send promotional emails only to customers who are likely to read them
- Engineered features to help identify customer behaviour
- Visualize the data using seaborn and matplotlib
- Tuned Linear Regression, Gaussian Naive Bayes, and Decision Tree Classifier models with RepeatedStratifiedKFold, GridSearchCV, and threshold changing to reach the best model
- Deploy the model for users with Flask
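The threshold-changing step can be sketched in plain Python as below. This is only an illustration, not the project code: the toy labels and probabilities are invented, and choosing the threshold by F1 score is an assumption, since the metric that was actually optimized is not stated here.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when probabilities >= threshold are labelled 1."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_threshold(
    y_true: Sequence[int], y_prob: Sequence[float], candidates: Sequence[float]
) -> float:
    """Return the candidate threshold with the highest F1 score (assumed metric)."""

    def f1(threshold: float) -> float:
        p, r = precision_recall(y_true, y_prob, threshold)
        return 2 * p * r / (p + r) if (p + r) else 0.0

    return max(candidates, key=f1)


# Toy data, invented for illustration only (1 = customer read the email).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.65, 0.55, 0.90, 0.20]
candidates = [i / 10 for i in range(1, 10)]

chosen = best_threshold(y_true, y_prob, candidates)
print(chosen, precision_recall(y_true, y_prob, chosen))
```

Raising the threshold above the default 0.5 trades recall for precision, which fits a use case where every email sent to an uninterested customer costs budget.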
Results :
Project 2 : Delivery Truck On-Time / Delay Prediction
- Create a model that can predict whether a delivery will be delayed or on time
- Identify the top 10 factors that have the most influence on the delivery status. This helps the business owner focus on those 10 factors to reduce the delay ratio
- Tuned a Gradient Boosting Classifier with RepeatedStratifiedKFold and GridSearchCV to achieve the best model, with an accuracy of 89%
Top 10 factors affecting delivery delay :
Project 3 : Sentiment Analysis Predictor
- Create a model that accepts text input and classifies whether the input is a positive or negative comment
- Apply simple text preprocessing (lowercase, remove punctuation, stopwords, etc.)
- Vectorized the text with TfidfVectorizer and n-grams = 2 before applying Logistic Regression and Multinomial Naive Bayes, reaching up to 80% accuracy
- Deploy the model for users with Flask
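The preprocessing and 2-gram steps listed above can be sketched roughly as follows. This is a minimal standard-library illustration, not the project code: the stopword list is a tiny hypothetical one, and the `preprocess` and `bigrams` helpers are names made up for this example. In the project itself, the n-grams are produced by TfidfVectorizer.

```python
import string
from typing import List

# Tiny illustrative stopword list (an assumption); a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "to", "of"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


def bigrams(tokens: List[str]) -> List[str]:
    """Join each pair of adjacent tokens into one feature string."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


tokens = preprocess("This movie was GREAT, and the ending is perfect!")
print(tokens)           # ['movie', 'great', 'ending', 'perfect']
print(bigrams(tokens))  # ['movie great', 'great ending', 'ending perfect']
```

Bigrams keep some word order that single words lose, which is one reason pairs of adjacent words can help a sentiment classifier.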
Results :
Positive Result :
Negative Result :
Project 4 : Animal Image Classifier
- Create a model that can tell pictures of cats, dogs, and pandas apart. This model could be useful for animal photographers who take pictures of many kinds of animals
- I was able to get the model to predict the animal with 73% accuracy after minimal tuning. For most cases, this would meet the needs of an end user of the app. To get this result, I used a Sequential model with a random search over the dense layers, dropout, and number of neurons
- I ran this model in Google Colab with a GPU runtime for time efficiency. With the GPU I could also run the search and use color pictures
Results :
Project 5 : Text Summarizer
- Create a model using spaCy (NLP) that summarizes an English-language article into its most important sentences
- The summary is built by weighting every word in each sentence and taking the top n most important sentences
- This model helps people who do not have much time to read a whole article and just want to get to the point of it
- Deploy the model on Heroku for a better user experience
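The word-weighting idea above can be sketched as below. Note the swap: the project uses spaCy, while this sketch uses simple regular expressions from the standard library so it runs without extra installs. The stopword list, the `summarize` helper, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption), not spaCy's.
STOPWORDS = {"a", "an", "the", "and", "are", "was", "is", "of", "to", "in"}
WORD_RE = re.compile(r"[a-z']+")


def summarize(text: str, n: int = 2) -> List[str]:
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    words = [w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS]
    if not words:
        return sentences[:n]

    # Weight each word by its frequency relative to the most frequent word.
    freq = Counter(words)
    top = max(freq.values())
    weights = {word: count / top for word, count in freq.items()}

    def score(sentence: str) -> float:
        # A sentence's score is the sum of the weights of its words.
        return sum(weights.get(w, 0.0) for w in WORD_RE.findall(sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


article = (
    "Cats sleep a lot. Cats and dogs are popular pets. "
    "The weather was nice. Many pets need daily care."
)
summary = summarize(article, n=2)
print(summary)
```

Summing word weights favours longer sentences; dividing each score by the sentence length is a common variation if short, dense sentences should rank higher.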
Results :
You can try to summarize an article here :
Result :
Project 6 : Dota 2 Base Stat Visualization using Tableau
- Visualize the base stats of Dota 2 heroes as of patch 7.29 (Dawnbreaker included). This visualization is useful for anyone who loves and plays Dota 2 to choose the hero with the best stats, in order to improve their laning stage and boost their chance of winning the game
- Scrape the data from Dota 2 Fandom using Selenium
- Clean the data using pandas to make it easier to visualize
- Visualize the data using Tableau
Results :
Check my Tableau post here : Dota 2 Hero Base Stat Visualization
Project 7 : Scraping Job Vacancy in Data Industry from LinkedIn
- Visualize which jobs in the data industry companies are looking for. This visualization is useful for someone who is looking for a job in the data industry and wants to see the current situation
- Scrape the data from LinkedIn using Selenium
- Clean the data using pandas
- Visualize the data using Tableau
Results :
Check the interactive dashboard on my Tableau profile here