Table Of Contents:
1. Business Problem
2. Dataset Overview
3. Exploratory Data Analysis
4. Data Preparation/Feature Engineering
5. Model Building
6. Submit Model on Kaggle
1. Business Problem
1.1 Problem Description:
The “Elo Merchant Category Recommendation” challenge is aimed at helping understand customer loyalty through machine learning. Elo, a major Brazilian payment brand, has built machine learning models to analyze important aspects of their customers’ lifecycle. However, their existing models are not tailored to individual customers or profiles, preventing Elo from delivering personalized brand recommendations or filtering unwanted ones. The challenge is to develop algorithms that can identify and serve relevant opportunities to individuals based on their loyalty signals.
Competition Description:
Elo has partnered with merchants to offer promotions or discounts to cardholders, but it is unclear whether these promotions effectively benefit the consumers or the merchants. Personalization is key to improving the customer experience and ensuring repeat business for the merchants.
1.2 Problem Statement
Elo’s machine learning models currently lack personalized recommendations for individuals or profiles. In this competition, participants will develop algorithms that can uncover customer loyalty signals and provide personalized opportunities. The goal is to improve customers’ lives, reduce unwanted campaigns, and create the right experience for customers.
1.3 Real World/Business Objectives and Constraints
Objectives:
– Predict the loyalty score of each card so that Elo can improve customers’ lives and reduce unwanted campaigns.
– Minimize the root mean squared error (RMSE) between the predicted and actual loyalty scores.
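To make the RMSE objective concrete, here is a minimal sketch of how the metric can be computed with NumPy. The function name `rmse` and the sample numbers are illustrative assumptions, not values from the competition data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted loyalty scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the per-card errors, average them, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up scores, used only to show the call.
example_score = rmse([0.5, -1.0, 2.0], [0.0, -1.0, 3.0])
print(round(example_score, 4))
```

A lower value means the predictions sit closer to the actual loyalty scores; a perfect model would score 0.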
1.4 Dataset Source:
The dataset can be found at the following links:
– Data Source: [data source link](https://www.kaggle.com/c/elo-merchant-category-recommendation/data)
– Competition link: [competition link](https://www.kaggle.com/c/elo-merchant-category-recommendation/overview)
2. Dataset Overview
The dataset consists of various files, including training and test sets, historical transactions, information about merchants, and new merchant transactions. The features are largely anonymized, and their meanings are not elaborated. External data is allowed.
3. Exploratory Data Analysis:
3.1 Train and Test Set:
The training set includes the ID of the card, the activation date, anonymized categorical features, and the target variable (loyalty score). The test set has the same columns, but without the target variable. The train set has no null values, while the test set has a single null value, which can be replaced with the mode of its column. Because the existing features on their own may not be enough to predict the target, additional features need to be created.
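The mode replacement mentioned above could be done along these lines with pandas. This is only a sketch: the helper `fill_with_mode`, the toy frame, and the assumption that the missing value sits in an activation-month column named `first_active_month` are all illustrative.

```python
import pandas as pd

def fill_with_mode(df, column):
    """Return a copy of df with missing values in `column` replaced by its mode."""
    out = df.copy()
    # mode() ignores missing values; take the first entry if there are ties.
    mode_value = out[column].mode(dropna=True).iloc[0]
    out[column] = out[column].fillna(mode_value)
    return out

# Toy stand-in for the test set (hypothetical values and column name).
toy_test = pd.DataFrame({
    "card_id": ["C_1", "C_2", "C_3", "C_4"],
    "first_active_month": ["2017-01", "2017-01", None, "2016-11"],
})
filled = fill_with_mode(toy_test, "first_active_month")
print(filled)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare the data before and after imputation.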
3.2 New Merchant Transactions:
This dataset contains the transactions made at new merchants, including the merchant and the month_lag, which indicates when the transaction took place relative to a reference date. The dataset has some null values, which should be imputed before the data is fed to the model. The analysis shows patterns in the year of purchase, the hour of purchase, and the day of the week on which purchases are made. Additional features can be created from these patterns.
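As a rough sketch of how year, hour, and day-of-week features could be derived from a transaction timestamp, consider the following. The column name `purchase_date`, the helper name, and the extra `is_weekend` flag are assumptions made for illustration.

```python
import pandas as pd

def add_purchase_time_features(df, date_column="purchase_date"):
    """Return a copy of df with calendar features derived from a timestamp column."""
    out = df.copy()
    ts = pd.to_datetime(out[date_column])
    out["purchase_year"] = ts.dt.year
    out["purchase_hour"] = ts.dt.hour
    out["purchase_dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)
    return out

# Toy transactions (hypothetical values).
toy_transactions = pd.DataFrame({
    "card_id": ["C_1", "C_1", "C_2"],
    "purchase_date": [
        "2018-03-03 14:05:00",
        "2018-03-05 09:30:00",
        "2017-12-31 23:59:59",
    ],
})
features = add_purchase_time_features(toy_transactions)
print(features)
```

Since the target is defined per card, row-level features like these would typically be aggregated by `card_id` before being joined to the train and test sets.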
3.3 Merchant:
The merchant dataset provides information about merchants, including pipeline information. Some features have null values, which can be imputed with the mode. Categorical features should be one-hot encoded before they are fed to the model. The numerical features show similar distributions and values, so further investigation is needed to determine their relevance.
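The mode imputation and one-hot encoding steps described above might look like the following with pandas. The column name `merchant_category`, the helper name, and the toy values are hypothetical and only illustrate the technique.

```python
import pandas as pd

def encode_categoricals(df, categorical_columns):
    """Fill missing categories with the column mode, then one-hot encode them."""
    out = df.copy()
    for col in categorical_columns:
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    # dtype=int yields 0/1 indicator columns instead of booleans.
    return pd.get_dummies(out, columns=categorical_columns, dtype=int)

# Toy stand-in for the merchant table (hypothetical column and values).
toy_merchants = pd.DataFrame({
    "merchant_id": ["M_1", "M_2", "M_3", "M_4"],
    "merchant_category": ["A", "B", None, "B"],
})
encoded = encode_categoricals(toy_merchants, ["merchant_category"])
print(encoded)
```

Imputing before encoding keeps the missing row from ending up with zeros in every indicator column.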