My experience at a venture-funded computer vision startup

I recently interned at Dragonfruit AI (https://dragonfruit.ai/), a Video AI startup currently focussed on physical security.

Every heist movie has a security guard looking at a wall of videos in a dimly lit backroom. Our heist protagonists pick a time when the guard is out on a restroom break. The next morning, the museum staff notices a priceless vase missing and all hell breaks loose. The security team starts poring over hours and hours of footage from the previous day, from dozens of different cameras, frantically looking for something that might lead them to the criminals, but it’s too late.

Imagine you could use computer vision to automatically and instantly analyze the dozens of videos pouring into the feed and identify all the people in them. Suppose you then had an algorithmic layer on top of the machine-learning-driven identification to correlate these people across different cameras and track their paths. You could be alerted the moment a person not identified as museum security lingered on the premises, and notify the authorities. This is what Dragonfruit can do.

Say you’re a traffic authority looking at an intersection and want to identify all the blue cars that took an illegal turn over the past day. You can use Dragonfruit to find them in a second. Say you’re a detective who needs to investigate an accident and see if a person in a red hoodie was hit by a green car. Use Dragonfruit to find out in a second.

I worked on the platform team at Dragonfruit, the team responsible for the infrastructure behind ingesting terabytes of data from thousands of cameras, feeding it to the machine learning algorithms, storing the results and organizing them so that other microservices and the frontend can access them easily.

My project was to work on the local version of Dragonfruit’s cloud infrastructure with the goal of deploying our software onto a private network. This is important because, for compliance reasons, it is not always possible for our customers to upload security footage to the cloud. Every week, I was getting my hands dirty building on layers upon layers of architecture, from setting up Kubernetes services and ensuring they communicated with one another correctly, to connecting them to our databases, making sure they talked to the right messaging queues and indexed the results correctly in the internal search engine.

Thinking about the on-premise infrastructure for a SaaS service is a serious technical puzzle. Figuring it out bit by bit taught me, in a short period of time, how to use a whole host of technologies, as well as the ins and outs of setting up these pieces within a larger system. During this internship I learned just how much challenging problems can accelerate your pace of learning. Watching the pieces come together over the course of 4 months was like watching a 1000-piece jigsaw puzzle come to completion. First, the mountains come into the picture. Then, you see a river and some houses. Finally, you have the landscape. It was very rewarding and personally fulfilling.

But this isn’t a coincidence. To have this special mix of motivation to work every morning, willingness to drink from a firehose of information and personal fulfillment during a persistent technical challenge, I think the work environment needs to provide you with a few things. First, you need to have a sense of ownership over your work. Second, you have to know that your work matters and substantially moves the needle for the “whole”. Third, you need to have support to turn to when you are stuck; otherwise it is too easy to lose steam and cruise.

In my experience, most good big companies provide you with the first and third but not the second. This makes sense because in big companies, by definition, intern projects are not as wide in scope since there is X amount of work but 1000+ employees to divide that among. Most good startups that hire interns provide you with one and two but not three. This also makes sense because they are moving too fast to stop and wait for the intern to catch up at that stage.

Dragonfruit has a unique culture of hiring more interns than other startups at that stage. Through trial and error (and more iterations, since there are more interns), it has developed a process for quickly making them effective, somewhat circumventing the “catch up” problem. This allows Dragonfruit to provide all three, making it the best-case work experience for someone curious about working at a startup.

Dragonfruit also gave me an insider’s view on how effective startups work. I learned about the high pace, the changing prioritization of tasks, the intensity of work and the value of quick prototyping and validating your market hypotheses based on customer response. I believe this is a great opportunity to work on a challenging technical project with a world class team while learning how a well-funded startup operates and would highly recommend anyone interested to apply.

Quick Recap: Where should businesses open next?

I have spent the past many months working on recommending to businesses which cities they should open outlets in next. And I’ve published a good bit of my findings on this blog. Here it is, all in one place:

The blog posts behind the same can be found here:

Here’s the code behind my model.

Aside: If you’re wondering why I haven’t uploaded in a while, it’s because I’ve been pretty busy studying for my twelfth grade board exams (exactly as stressful as it sounds). I’m going to be back to programming and conducting more research on businesses and their franchise locations as soon as my exams are done. I can’t wait to get back. Thanks for reading!

Which city should Uber start services in next?

Here’s my presentation on where Uber should open next. Too busy for a presentation? Here’s a map I created showing the top cities where Uber will open next:

Made using R packages: ggmap, maps, ggplot2; edited in Adobe Photoshop

Now that I’ve done both Uber and Ola, maybe I’ll predict where certain banks will open next. Stay tuned for what’s to come.

Which city should Ola Cabs start services in next?

Ola Cabs—India’s largest taxi service provider—are responsible for my travel to and from basketball practice, friends’ houses and school (when I miss the bus), so it’s only fair that we give Ola due respect. Last time we analyzed where Domino’s pizza should open stores. Let’s now run our recommender algorithm and see where Ola Cabs should (and hopefully, will) start services next.

Here is my presentation detailing my model’s top recommendations for where Ola cabs should open.

If you don’t have the time to go through a 15-slide presentation, here’s the skinny on my model’s recommendations. These are the cities where I predict Ola cabs will open next:

Made using R packages: ggmap, ggplot2; edited in Adobe Photoshop

Next, I’ll be doing the same for Ola’s top competitor—Uber. And of course, I’ll write about how these ‘predictions’ fare next quarter.

Which city will Domino’s pizza open in?

I’ve worked with recommender systems to predict where Bajaj Finance and Capital First would open next. You can see those posts here and here.

So what about Domino’s pizza, the chain that really started my obsession with recommender systems and business analytics? When I had just started my project, I ran a small-scale prediction for Domino’s pizza.

Quoting from an earlier blog post “I took store location data for Domino’s Pizza that I’d collected back in 2015 and ran my model. It told me that Bharatpur, Chittorgarh and Palakkad were the top 3 towns Domino’s should open in. I went onto the Domino’s locations web page to see if I’d struck gold. And I had. One year later, Domino’s now has stores open in Chittorgarh and Palakkad.”

Let’s bring out the big guns for Domino’s pizza now, shall we? Let’s run the full-fledged model and see if my predictions prove true one more time. Here are my model’s mapped predictions for the next 10 cities where Domino’s will open stores:

Made with R packages: ggmap, ggplot2, maps; edited in Adobe Photoshop

If you’re interested, here are the next 10 recommendations for where Domino’s pizza will open: Malegaon, Khammam, Agartala, Kumbakonam, Srinagar, Chittoor, Muktsar, Raichur, Tarn Taran, Akola.

Why don’t these cities look familiar? Because Domino’s pizza has already opened in 292 cities; they are now targeting the smaller towns.

I’ll keep you updated on how my predictions fare. I’ll also post predictions for more franchises’ store openings. Stay tuned.

Update: Kottayam has a Domino’s store. I collected the data to build these predictions in September 2016. So it looks like one of our ten predictions has proved true in only two months. Let’s see what happens to the other 9.

Where should Capital First open next?

Which cities should Capital First open in next? These recommendations are based on the same model and same code as my recommendations for Bajaj Finance (with a few Capital First specific tweaks).

Here is my presentation on where Capital First should open branches next. Here is the code behind it.

I’ll be uploading recommendations for new franchises soon. I will also be talking about more technical matters: gathering and cleaning the data behind the model, user-based collaborative filtering, analyzing and building the model, and tracking how my ‘predictions’ perform over the next few months. Stay tuned for what’s to come.
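
As a rough illustration of what user-based collaborative filtering looks like in this setting, here is a minimal sketch in R. It treats franchises as the "users" and cities as the "items". The presence matrix, the franchise and city names, and the helper functions `cosine` and `recommend` are all made up for the example; this is not the code behind the presentations.

```r
# Toy presence matrix: rows are franchises ("users"), columns are cities ("items").
# 1 means the franchise already has an outlet in that city. All values are made up.
presence <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    0, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("FranchiseA", "FranchiseB", "FranchiseC"),
                  c("City1", "City2", "City3", "City4"))
)

# Cosine similarity between two franchise rows
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Score the cities the target franchise has not opened in yet,
# using the similarity-weighted presence of the other franchises
recommend <- function(presence, target) {
  others <- setdiff(rownames(presence), target)
  sims <- sapply(others, function(f) cosine(presence[target, ], presence[f, ]))
  # Each row of the matrix is multiplied by that franchise's similarity
  scores <- colSums(presence[others, , drop = FALSE] * sims) / sum(sims)
  scores <- scores[presence[target, ] == 0]
  sort(scores, decreasing = TRUE)
}

recs <- recommend(presence, "FranchiseA")
print(recs)
```

In this toy example the franchise most similar to FranchiseA is already in City3, so City3 comes out as the top recommendation.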

Which cities should Bajaj Finance open in next?

Which cities should Bajaj Finance open in next? These are my recommendations for where Bajaj Finance should start offering financial products next.

Now that my ACT and SAT exams are over, I’m back to blogging. Before my work on predicting the FM Auctions using a regression model, I left off talking about building a model that uses recommender systems to predict where franchises should open next, suggesting to them the cities that are most suitable for their business. I spent most of the last two months building and fine-tuning this same model.

It took many weeks to collect this data – 26 franchises and 1476 cities worth of data – and many more weeks to refine it – optimizing the data for each franchise, removing duplicate entries (a city like Tiruchirappalli, I learnt, can have up to 8 different spellings), cross-checking that data against Annual reports and Earnings Presentations, deciding on a resolution for each franchise (Should a store in Thane be counted as a store in Mumbai?) – that sort of thing.

After all this work, I’m proud to say that I can now deliver: Here is my presentation on where Bajaj Finance should open branches next. Here is the code behind it.

I will be uploading recommendations like this one for every franchise I can get my hands on over the next few weeks. I will also be talking about more technical matters: gathering and cleaning the data behind the model, analyzing and building the model, and tracking how my ‘predictions’ perform over the next few months. Stay tuned for what’s to come.

Updated: How much should you bid for Phase 3 of the FM Auction?

FM Companies: Here’s the updated cheat sheet for bidding at phase 3 of the FM auctions.

This updated model has tighter ranges and is, in my opinion, more accurate. I used Facebook’s ad reach potential for each city as a predictor of its license fee to build a linear regression model. Predictions here are based on that model. If you’re interested, here’s the code for that model.

This Facebook-fueled model beats the franchise-based model I built a week ago for two reasons. First, Facebook’s ad reach potential is a better proxy for a city’s advertising potential than franchise data. Advertising potential is directly related to how much companies will spend on advertising their products on radio, which, in turn, is directly related to the price at which an FM station gets sold.

Second, the Facebook ad potential helped my predictions get more granular. Two cities, no matter how similar they might be, rarely get sold at the same price. Using franchises as a predictor didn’t give my predictions this price uniqueness. I only ended up with many buckets of similarly priced cities. If a city had 1 Cafe Coffee Day and 1 Domino’s pizza store, my model predicted it to sell at the exact same price as every other city with 1 Cafe Coffee Day and 1 Domino’s pizza store. That’s still a good approximation, but my Facebook-fueled model allows me to go beyond that approximation.
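
To show why a continuous predictor gives more granular prices than store counts, here is a minimal sketch of a one-variable linear regression in R. The column name `AdReach`, the city names and every number are invented for illustration; the real data and model are in the linked code, not here.

```r
# Made-up training data: ad reach (thousands of people) and license fee (lakhs)
train <- data.frame(
  AdReach    = c(50, 120, 200, 340, 500, 800),
  LicenseFee = c(110, 190, 310, 420, 640, 980)
)

# Simple linear regression of license fee on ad reach
fb.model <- lm(LicenseFee ~ AdReach, data = train)

# Two hypothetical cities with slightly different reach
new.cities <- data.frame(City = c("CityX", "CityY"), AdReach = c(150, 155))
new.cities$Predicted <- predict(fb.model, new.cities)
print(new.cities)
```

Because ad reach is a continuous number, even two very similar cities get slightly different predicted fees, whereas two cities with identical store counts would land in the same bucket.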

A third of the cities up for auction in the last round of Phase 3 of the auctions went unsold. The cities highlighted in the table below are my predictions for which cities are likely to go unsold in this round.

Ask yourself how badly you want the frequency of a particular city. Refer to the table to find the value that matches your priority for that city. Bid that amount. (All figures in lakhs of rupees.)

City	60% chance	80% chance	95% chance	Reserve Price
Achalpur	143	249	416	171
Agartala	279	383	546	16
Aizwal	308	411	573	12
Akola	213	318	483	30
Alappuzha	351	454	615	702
Amravati	259	363	527	351
Asansol	331	433	596	194
Barshi	152	258	424	171
Belgaum	323	426	588	702
Bellary	213	318	483	702

Click here to view and download the table.

If you’re very keen on buying the frequency for Achalpur, you’re best off bidding the amount in the 95% chance column. Bidding at 416 lakhs gives you an almost certain chance of clearing the round with the frequency in hand.

Suppose you’re not as confident about the frequency for Agartala; you’re not as desperate to go out and get it. You decide that you want to bid an amount that will give you a 60% chance of winning the auction for Agartala’s FM Station. Bidding at 279 lakh is your best bet.

A quick note about reserve prices: the higher a city’s reserve price is compared to its 60% chance bid, the lower the probability of the frequency selling. For Achalpur, the reserve price (171 lakh) is above the 60% bid (143 lakh). For Agartala, however, there’s a lot of distance between the reserve price (16 lakh) and the 60% bid (279 lakh). This means that the FM station for Agartala will sell like hot cakes but the one for Achalpur likely won’t. Cities highlighted in the table are likely to go unsold.
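
The reserve-price comparison can be sketched in a few lines of R using the ten rows shown in the table above. The rule used here (flag a city when its reserve price exceeds its 60% bid) is an assumption based on the Achalpur and Agartala examples; the exact threshold behind the highlighting is not stated.

```r
# The ten rows shown in the table above (all figures in lakhs of rupees)
bids <- data.frame(
  City    = c("Achalpur", "Agartala", "Aizwal", "Akola", "Alappuzha",
              "Amravati", "Asansol", "Barshi", "Belgaum", "Bellary"),
  Bid60   = c(143, 279, 308, 213, 351, 259, 331, 152, 323, 213),
  Reserve = c(171, 16, 12, 30, 702, 351, 194, 171, 702, 702)
)

# Assumed rule: a city is at risk of going unsold when the auction
# opens above the price the model gives a 60% chance of winning
bids$LikelyUnsold <- bids$Reserve > bids$Bid60
flagged <- bids$City[bids$LikelyUnsold]
print(flagged)
```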

I hope this table was valuable to you. Good luck at the auctions!

Building the regression models behind my FM predictions

In my last post, I said that I’d explain the analysis behind the conclusions in that post. This is where I’ll be writing about it. This post is a bit technical – it’s all about the models I built and the process behind building them, so it might only appeal to data geeks out there. It’s quite tedious, so I’ll jump right into it. If you’re looking for the data and statistics behind my predictions post, I’ll be writing about that soon.

I collected the data of past successful bids here, data for reserve prices here, data for cities available for auction here, and the store location data for Domino’s, Cafe Coffee Day and Hero MotoCorp from their websites. Right off the bat, there’s a lot of data and a lot of data rarely comes clean.

Cleaning the data proved to pose two main challenges. First was the problem of resolution – do I take Delhi, Gurgaon, Noida, New Delhi, Faridabad, Ghaziabad as one city or not? Nearly every large city suffered from this resolution problem. Eventually, I decided to resolve it to the same level as it was in the FM list – if they were selling a separate frequency for Gurgaon (they aren’t), I’d consider it separately from Delhi. Otherwise, it’d be a part of Delhi. I know it seems intuitive in hindsight but it took me some time to come by this.

The second challenge was straight-up bad data. There are four ways to spell Trichy, the others being Tiruchirapali, Tiruchurappalli and Tiruchy, and every franchise had its own way of spelling it. I tried eliminating the vowels in city names to create a common spelling list that my data could safely rest upon, but that fell apart when my program encountered cities like Cuddapah (Kadapa) and Calicut (Kozhikode). I had to resolve these anomalies one at a time; each city really has its own challenges. After days of cleaning data, I was finally ready to build my model.
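
Here is a minimal sketch of the two cleaning approaches described above: a vowel-stripping key, and a hand-built alias table for the names the key cannot reconcile. The helper names `strip.vowels` and `canonical`, the `aliases` table and the choice of canonical spellings are hypothetical, not taken from the original cleaning scripts.

```r
# Build a crude matching key: drop non-letters, lower-case, remove vowels
strip.vowels <- function(x) {
  gsub("[aeiou]", "", tolower(gsub("[^A-Za-z]", "", x)))
}

# Two spellings of the same city that collapse to the same key
print(strip.vowels(c("Trichy", "Tiruchy")))

# Renamed cities do not collapse, which is where the approach breaks down
print(strip.vowels(c("Cuddapah", "Kadapa")))

# Fallback: resolve the anomalies one at a time with an alias table
aliases <- c(
  Tiruchirapali   = "Trichy",
  Tiruchurappalli = "Trichy",
  Tiruchy         = "Trichy",
  Cuddapah        = "Kadapa",
  Calicut         = "Kozhikode"
)

# Replace known aliases with the chosen spelling; leave other names untouched
canonical <- function(x) {
  hit <- x %in% names(aliases)
  x[hit] <- aliases[x[hit]]
  unname(x)
}

print(canonical(c("Tiruchy", "Calicut", "Mumbai")))
```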

Here’s the code for my model in case you’re interested.

# Read the combined dataset (one row per city)
fm.df <- read.csv("Analysis.csv")

# Training set: cities with a known license fee from past auctions
fmtrain.df <- subset(fm.df, !is.na(LicenseFee))

# Test set: cities still up for auction (license fee unknown)
fmtest.df <- subset(fm.df, is.na(LicenseFee))

# Build the five regression models
fm.1 <- lm(LicenseFee ~ Category, data = fmtrain.df)
fm.2 <- lm(LicenseFee ~ Category + Dominos, data = fmtrain.df)
fm.3 <- lm(LicenseFee ~ Category + CCD, data = fmtrain.df)
fm.4 <- lm(LicenseFee ~ Category + Hero, data = fmtrain.df)
fm.5 <- lm(LicenseFee ~ Category + CCD + Dominos + Hero, data = fmtrain.df)

# Upper limit of the confidence interval at a given level for one model
upper.limit <- function(model, level) {
  predict(model, fmtest.df, interval = "confidence", level = level)[, "upr"]
}

# Upper limits at 60%, 80% and 95% confidence.
# Shown for fm.1; swap in fm.2 to fm.5 to repeat for the other models.
bizcon.fm1 <- data.frame(
  City  = fmtest.df$City,
  upr60 = upper.limit(fm.1, 0.60),
  upr80 = upper.limit(fm.1, 0.80),
  upr95 = upper.limit(fm.1, 0.95)
)

# Save the result
write.csv(bizcon.fm1, "bizcon.fm1.csv", row.names = FALSE)

First, I split my data into two groups – a training set (the prices I had from the round 1 auctions) and a test set (the questions I had to answer using my model). There’s a convenient function in R, lm, which builds a linear regression model. I had four tools to build my model – Domino’s, Cafe Coffee Day, Hero MotoCorp and the ‘Category’ of the city. Adding an extra variable always improved my model a little bit (I found this out by calculating the R value for every model), but I couldn’t use all of them at once. Using every variable in the mix would mean that unless a city had all four variables, it’d return a blank. I wouldn’t be able to predict for 159 of 248 cities in my dataset.
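
The two observations above (an extra variable never hurts the fit, but a missing variable blanks the prediction) can be reproduced on synthetic data. This sketch assumes the goodness-of-fit measure is R-squared; the data frame `toy` and all its values are randomly generated and have nothing to do with the real auction data.

```r
set.seed(1)
n <- 40

# Synthetic cities: a category plus two store counts
toy <- data.frame(
  Category = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  Dominos  = rpois(n, 2),
  CCD      = rpois(n, 3)
)
toy$LicenseFee <- 100 + 50 * as.integer(toy$Category) +
  20 * toy$Dominos + 10 * toy$CCD + rnorm(n, sd = 15)

# Nested models with one more predictor each time
m1 <- lm(LicenseFee ~ Category, data = toy)
m2 <- lm(LicenseFee ~ Category + Dominos, data = toy)
m3 <- lm(LicenseFee ~ Category + Dominos + CCD, data = toy)

# R-squared never decreases when a predictor is added to a nested model
r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3), function(m) summary(m)$r.squared)
print(round(r2, 3))

# A city with a missing predictor gets a blank (NA) prediction from the full model
newdata <- toy[1:2, ]
newdata$CCD[1] <- NA
pred <- predict(m3, newdata)
print(pred)
```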

So I built 5 models (it’s in the code) and ran the program for all of them. The program’s job is to find the upper limit for each confidence level for every regression model. Here’s a good source you could check out if you’re not entirely sure what a regression model is.

Then I averaged out what each model gave me. So if a city had a ‘Category’, a Domino’s store and a Hero dealership, it would run through all models which relied on these three variables (the models being fm.1, fm.2, fm.4). Then, I summed the upper limits each model returned and divided this sum by 5 (for every confidence interval) to place it on a common scale with the rest of the cities.
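
The combining step described above can be sketched as follows. The upper limits in `upper` are invented numbers for two hypothetical cities; `NA` marks a model that could not score the city because one of its variables was missing.

```r
# Hypothetical 60% upper limits (in lakhs) from the five models for two cities
upper <- rbind(
  CityP = c(fm.1 = 300, fm.2 = 320, fm.3 = NA,  fm.4 = 310, fm.5 = NA),
  CityQ = c(fm.1 = 200, fm.2 = 210, fm.3 = 220, fm.4 = 205, fm.5 = 215)
)

# Sum the limits from the applicable models, then divide by the
# total number of models (5), as described in the text
combined <- rowSums(upper, na.rm = TRUE) / ncol(upper)
print(combined)
```

Note that with a fixed divisor of 5, a city scored by fewer models ends up with a lower combined figure. `rowMeans(upper, na.rm = TRUE)` would instead average over only the applicable models; the sketch keeps the fixed divisor because that is what the text describes.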

This gave me just what I needed: a price at which to bid for a city, based on how desperately a company wanted the frequency of that city. That’s how I ended up with my figures for each confidence interval and my bidding guide, the FM auction cheat sheet.

Stay tuned for a super interesting model based on new data coming up this weekend.

How much should you bid for Phase 3 of the FM Auction?

Dear FM companies, here is your cheat sheet for the upcoming round of the phase 3 FM auctions:

For a 60% chance of winning the auction for a city, bid the amount in the second column. For an 80% chance, bid the amount in the third column. For an almost-certain 95% chance, bid the amount in the fourth column. (I’ll be talking about the statistical models and data collection that went into this in an upcoming post.)

City	60% chance	80% chance	95% chance	Reserve Price
Achalpur	237	420	710	171
Agartala	416	599	887	16
Aizwal	342	524	812	12
Akola	375	458	590	30
Alappuzha	272	356	490	702
Amravati	434	517	648	351
Asansol	912	1074	1329	194
Barshi	342	524	812	171
Belgaum	421	504	635	702
Bellary	284	369	504	702

Click here to view and download the full table.

Take Achalpur for example. Maybe you’re very keen on buying the FM station for Achalpur because of say, your business connections there. You’re best off bidding 710 lakhs for a 95% chance of bagging the frequency for Achalpur.

Maybe, unlike Achalpur, you’re not familiar with the business atmosphere in Agartala. Quite naturally, you decide that you want a lower, say 60% shot at winning the frequency for Agartala. In this case, you should look toward bidding at 416 lakh.

So that’s the way you want to bid for every frequency up for auction. But maybe just knowing what price to bid isn’t enough for you. You want to know if you’re getting a bang for your buck. The last column, the reserve price, gives an indication of just that. It’s the price at which an auction kicks off – a ‘base price’ for every city up for auction.

Some cities are overvalued and others are undervalued. For Agartala, the reserve price is far lower than the 60% bid; if you don’t buy it, someone else likely will. It’s reasonably valued. Now take a look at Alappuzha. Its reserve price is quite high; it even tops the 60% bid. It’s an overvalued frequency and you would not want to buy it unless you’re very interested in Alappuzha. These ‘overvalued frequencies’ are highlighted in the table. My model says that these cities are punching above their weight: their reserve prices are way too high.

I hope this table was of value to you. Good luck at the auctions!

Priyank Aranke

Personal Blog