Which city will Domino’s Pizza open in?

I’ve worked with recommender systems to predict where Bajaj Finance and Capital First would open next. You can see those posts here and here.

So what about Domino’s Pizza – the chain that really started my obsession with recommender systems and business analytics? When I first started this project, I ran a small-scale prediction for Domino’s Pizza.

Quoting from an earlier blog post “I took store location data for Domino’s Pizza that I’d collected back in 2015 and ran my model. It told me that Bharatpur, Chittorgarh and Palakkad were the top 3 towns Domino’s should open in. I went onto the Domino’s locations web page to see if I’d struck gold. And I had. One year later, Domino’s now has stores open in Chittorgarh and Palakkad.”

Let’s bring out the big guns for Domino’s Pizza now, shall we? Let’s run the full-fledged model and see if my predictions prove true one more time. Here are my model’s mapped predictions for the next 10 cities where Domino’s will open stores:

dominos-plot

Made with the R packages ggmap, ggplot2 and maps, and edited in Adobe Photoshop
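If you're curious how a map like this gets drawn, here's a minimal sketch using the maps package. It is not the script behind the figure above – the two cities and their coordinates are my own illustrative values:

```r
# two of the predicted cities, with approximate coordinates
cities <- data.frame(
  name = c("Malegaon", "Agartala"),
  lon  = c(74.53, 91.28),
  lat  = c(20.55, 23.83)
)

# draw an India outline and mark the cities on it
if (requireNamespace("maps", quietly = TRUE)) {
  maps::map("world", "India")
  points(cities$lon, cities$lat, pch = 19, col = "red")
  text(cities$lon, cities$lat, cities$name, pos = 4, cex = 0.8)
}
```

The actual figure layered ggmap and ggplot2 on top of this kind of base map before the Photoshop pass.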

If you’re interested, here are the next 10 recommendations for where Domino’s Pizza will open: Malegaon, Khammam, Agartala, Kumbakonam, Srinagar, Chittoor, Muktsar, Raichur, Tarn Taran and Akola.

Why don’t these cities look familiar? Because Domino’s Pizza has already opened in 292 cities; it is now targeting smaller towns.

I’ll keep you updated on how my predictions fare. I’ll also post predictions for more franchises’ store openings. Stay tuned.

Update: Kottayam has a Domino’s store. I collected the data to build these predictions in September 2016. So it looks like one of our ten predictions has proved true in only two months. Let’s see what happens to the other 9.

Where should Capital First open next?

Which cities should Capital First open in next? These recommendations are based on the same model and code as my recommendations for Bajaj Finance (with a few Capital First-specific tweaks).

capital-first.png

Here is my presentation on where Capital First should open branches next. Here is the code behind it.

I’ll be uploading recommendations for new franchises soon. I will also be talking about more technical matters – gathering and cleaning the data behind the model, user-based collaborative filtering, analyzing and building the model, and tracking how my ‘predictions’ perform over the next few months. Stay tuned for what’s to come.

Which cities should Bajaj Finance open in next?

Which cities should Bajaj Finance open in next? These are my recommendations for where Bajaj Finance should start offering financial products next.

Now that my ACT and SAT exams are over, I’m back to blogging. Before my work on predicting the FM auctions with a regression model, I left off talking about building a model that uses recommender systems to predict where franchises should open next – suggesting to them the cities most suitable for their business. I spent most of the last two months building and fine-tuning that model.

It took many weeks to collect this data – 26 franchises and 1,476 cities worth of data – and many more weeks to refine it: optimizing the data for each franchise, removing duplicate entries (a city like Tiruchirappalli, I learnt, can have up to 8 different spellings), cross-checking that data against annual reports and earnings presentations, deciding on a resolution for each franchise (should a store in Thane be counted as a store in Mumbai?) – that sort of thing.
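The duplicate-entry cleanup boils down to mapping every spelling variant to one canonical name. A sketch of the idea in R – these few entries are illustrative, the real table covered far more cities and variants:

```r
# hypothetical lookup table: spelling variant -> canonical city name
canonical <- c("Trichy"        = "Tiruchirappalli",
               "Tiruchy"       = "Tiruchirappalli",
               "Tiruchirapali" = "Tiruchirappalli")

# replace any known variant with its canonical spelling
clean.name <- function(x) ifelse(x %in% names(canonical), canonical[x], x)

unique(clean.name(c("Trichy", "Tiruchirappalli", "Tiruchy")))  # one city, not three
```

Once every franchise's list passes through a table like this, the same city stops showing up as several different rows.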

After all this work, I’m proud to say that I can now deliver: Here is my presentation on where Bajaj Finance should open branches next. Here is the code behind it.

I will be uploading recommendations like this one for every franchise I can get my hands on over the next few weeks. I will also be talking about more technical matters – gathering and cleaning the data behind the model, analyzing and building the model, and tracking how my ‘predictions’ perform over the next few months. Stay tuned for what’s to come.

Building the regression models behind my FM predictions

In my last post, I said that I’d explain the analysis behind its conclusions. This post is that explanation. It’s a bit technical – it’s all about the models I built and the process of building them – so it might only appeal to the data geeks out there. It’s quite involved, so I’ll jump right in. If you’re looking for the data and statistics behind my predictions post, I’ll be writing about that soon.

I collected the data of past successful bids here, data for reserve prices here, data for cities available for auction here, and the store location data for Domino’s, Cafe Coffee Day and Hero MotoCorp from their websites. Right off the bat, there’s a lot of data and a lot of data rarely comes clean.

Cleaning the data posed two main challenges. The first was the problem of resolution – do I treat Delhi, Gurgaon, Noida, New Delhi, Faridabad and Ghaziabad as one city or not? Nearly every large city suffered from this resolution problem. Eventually, I decided to resolve it to the same level as the FM list – if a separate frequency were being sold for Gurgaon (it isn’t), I’d consider it separately from Delhi. Otherwise, it’d be part of Delhi. It seems intuitive in hindsight, but it took me some time to arrive at this.

The second challenge was straight-up bad data. There are four ways to spell Trichy – the others being Tiruchirapali, Tiruchurappalli and Tiruchy – and every franchise had its own way of spelling it. I tried eliminating the vowels in city names to create a common spelling list that my data could safely rest upon, but that fell apart when my program encountered cities like Cuddapah (Kadapa) and Calicut (Kozhikode). I had to resolve these anomalies one at a time; each city really has its own challenges. After days of cleaning data, I was finally ready to build my model.

Here’s the code for my model in case you’re interested.

fm.df <- read.csv("Analysis.csv")               # read in the auction data

# training set: cities whose license fees are already known
fmtrain.df <- subset(fm.df, !is.na(LicenseFee))

# test set: cities whose license fees need predicting
fmtest.df <- subset(fm.df, is.na(LicenseFee))

# build the regression models
fm.1 <- lm(LicenseFee ~ Category, data = fmtrain.df)
fm.2 <- lm(LicenseFee ~ Category + Dominos, data = fmtrain.df)
fm.3 <- lm(LicenseFee ~ Category + CCD, data = fmtrain.df)
fm.4 <- lm(LicenseFee ~ Category + Hero, data = fmtrain.df)
fm.5 <- lm(LicenseFee ~ Category + CCD + Dominos + Hero, data = fmtrain.df)

# upper prediction limit for each test city at a given confidence level
upper.limits <- function(model, level) {
  minmax <- as.data.frame(predict(model, fmtest.df, interval = 'confidence', level = level))
  data.frame(City = fmtest.df$City, upr = minmax$upr)
}

df60 <- upper.limits(fm.1, 0.60)   # 60% confidence
df80 <- upper.limits(fm.1, 0.80)   # 80% confidence
df95 <- upper.limits(fm.1, 0.95)   # 95% confidence

bizcon.fm5 <- cbind(df60, df80, df95)    # upper limits for the specified confidence intervals
write.csv(bizcon.fm5, "bizcon.fm5.csv")  # save file (the script is rerun with fm.2 to fm.5 in turn)

First, I split my data into two groups – a training set (the prices I had from the round 1 auctions) and a test set (the questions I had to answer using my model). R has a convenient function, lm, that builds a linear regression model. I had four variables to build my models from – Domino’s, Cafe Coffee Day, Hero MotoCorp and the ‘Category’ of the city. Adding an extra variable always improved a model a little (I found this out by calculating each model’s R² value), but I couldn’t use all of them at once. A model only returns a prediction for a city that has values for all of its variables, so using every variable in the mix would leave me unable to predict for 159 of the 248 cities in my dataset.
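The blank-prediction problem is easy to reproduce on toy data (the numbers below are made up, not from my auction dataset):

```r
# toy data: the third city has no Domino's figure
toy <- data.frame(
  LicenseFee = c(10, 20, 35, 48, 52),
  Category   = c(1, 2, 3, 4, 4),
  Dominos    = c(1, 3, NA, 6, 8)
)

m <- lm(LicenseFee ~ Category + Dominos, data = toy)  # the NA row is dropped while fitting

predict(m, toy)  # the third prediction comes back NA
```

So every predictor you add shrinks the set of cities the model can actually score – the trade-off behind building five separate models instead of one.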

So I built 5 models (they’re in the code above) and ran the program for all of them. The program’s job is to find the upper limit at each confidence level, for every city, under each regression model. Here’s a good source you could check out if you’re not entirely sure what a regression model is.

Then I averaged out what the models gave me. If a city had a ‘Category’, a Domino’s store and a Hero dealership (but no Cafe Coffee Day outlet), it would run through every model that relied only on those variables – fm.1, fm.2 and fm.4. Then I summed the upper limits those models returned and divided the sum by 5 (at every confidence level) to place the city on a common scale with the rest of the cities.
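In code, that averaging step looks something like this (the upper limits here are made-up numbers for illustration):

```r
# hypothetical upper limits from the models that could score a city with a
# Category, a Domino's store and a Hero dealership (fm.3 and fm.5 need CCD data)
applicable <- c(fm.1 = 12.0, fm.2 = 14.5, fm.4 = 13.1)

# divide by the total model count (5), not by 3, so every city lands on the
# same scale regardless of how many models could score it
sum(applicable) / 5  # 7.92
```

Note the design choice: a city that only three models can score gets penalized relative to a city all five can score, which bakes data availability into the final figure.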

This gave me just what I needed: a price that each city should be bid at, based on how badly a company would want that city’s frequency. That’s how I ended up with my figures for each confidence interval and my bidding guide – the FM auction cheat sheet.

Stay tuned for a super interesting model based on new data coming up this weekend.