One of the top priorities for any app owner is to keep their users engaged after the first use. Recently, we worked with one of our clients to analyze engagement across their portfolio of apps and determine which behaviors predict long-term engagement. In this post, I hope to share some insights from predictive analytics as best practices for driving long-term engagement, and to provide an example of using raw event-level data in Redshift for predictive modeling.
There is no shortage of market research showing that app developers face a very competitive market these days. Cost-per-install rates and competition for consumer mindshare continue to increase, while the number of hours in a day remains constant. This means that it’s all about retention: once you get a user to install your app, you need to ensure that the user sticks around for a while. Here are some statistics:
- According to Facebook at their F8 summit in March 2015, 85% of app users churn after just one use. Apps within our client’s portfolio perform a bit better, but 53% of users still appear on only a single day after install, as shown in the histogram below.
- On average, there are 95 apps installed on an Android phone, but only a third of them are used throughout the day.
- Nielsen reports that in Q4 of 2014, the average number of apps used per user per month was 26.7, a number that has stayed relatively flat over the last two years. Also, over 70% of total usage comes from the top 200 apps.
- The adoption barrier is high because app discovery is hard and storage space on mobile devices is at a premium.
TL;DR: After extensive research, we have seen that a positive first impression is critical to keeping users engaged in the long run. In fact, the first 2 hours after a user opens an app for the first time have a big impact on whether that user becomes an engaged user.
To build our model, we did the following with the data in Redshift:
- collected all users who opened a given app for the first time over the course of a 10-day period,
- captured their interactions in the app during the 2 hours following their first app open,
- built a data set from this 2-hour period,
- and then used these features to predict whether the users would become loyal app users in the 30 days following the 2-hour period.
Note that each user has a different 2-hour period, depending on the timestamp of that user’s first app open. We picked the following set of common user features, each of which is a simple count or aggregate of events within the 2-hour period.
- Number of app sessions.
- Average session length.
- Total time spent in the app.
- Number of crashes or exceptions that occurred.
- Total amount of revenue from in-app purchases, if any.
- Number of app state transitions, e.g., app going to background or coming to foreground of mobile device.
- Number of screen views.
- Number of push notifications received.
- For each custom event (e.g., Sign Up, Log off, Add a friend, etc.) logged in the app, count the number of occurrences.
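As an illustration of how per-user features like these could be derived from raw event rows, here is a minimal Python sketch. It is not the code used for this analysis (that was written in R against Redshift). The event layout (`user_id`, `ts`, `name`, optional `revenue`) and the `app_open` event name are assumptions made for the example, and it only covers the count-style features and revenue.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=2)  # the 2-hour period described above


def first_window_features(events):
    """Count events per user within 2 hours of that user's first app open.

    `events` is an iterable of dicts with hypothetical keys:
    user_id, ts (datetime), name (str), and optionally revenue (float).
    """
    events = sorted(events, key=lambda e: e["ts"])

    # Assumption: the earliest "app_open" seen for a user is their first open.
    first_open = {}
    for e in events:
        if e["name"] == "app_open":
            first_open.setdefault(e["user_id"], e["ts"])

    features = defaultdict(lambda: defaultdict(float))
    for e in events:
        start = first_open.get(e["user_id"])
        # Keep only events inside the half-open window [start, start + 2h).
        if start is None or not (start <= e["ts"] < start + WINDOW):
            continue
        row = features[e["user_id"]]
        row["n_" + e["name"]] += 1            # one count feature per event name
        row["revenue"] += e.get("revenue", 0.0)
    return {user: dict(row) for user, row in features.items()}
```

Each user gets their own window because the window start is looked up per user, which mirrors the note above about every user having a different 2-hour period.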
We measured user engagement level based on frequency of usage as well as the presence of a purchase event. Specifically, if the app has a purchase event, we label users who generate revenue within the 30-day period as high-value users; if the app does not collect purchase data, users who have been active on at least 5 distinct days out of the 30-day period are considered high-value users.
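The labeling rule can be written as a small function. This is a sketch of the rule as described above; the function and parameter names are made up for the example, and treating "revenue generating" as revenue greater than zero is an assumption.

```python
from datetime import date


def is_high_value(app_tracks_purchases, revenue_30d, active_dates, min_active_days=5):
    """Label a user for the 30-day period following their 2-hour window.

    app_tracks_purchases: whether the app logs a purchase event at all.
    revenue_30d: the user's total revenue in the 30-day period.
    active_dates: iterable of datetime.date values on which the user was active.
    """
    if app_tracks_purchases:
        # Apps with purchase data: any revenue marks the user as high value.
        return revenue_30d > 0
    # Apps without purchase data: count distinct active days.
    return len(set(active_dates)) >= min_active_days
```

For example, a user of an app without purchase tracking who shows up on June 1, 2, 3, 5, and 8 would be labeled high value, while one who opens the app five times on June 1 alone would not.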
Next, we split the data set into training and test sets, fit a regularized linear model on the training data, and evaluated the model performance on the test set. We used the R package glmnet with its default parameters for model fitting. The graph below shows a boxplot of AUC values from a few of the apps within the client’s portfolio (a median AUC value of 0.74 is decent given that we haven’t fine-tuned each model). What we saw was that users’ first interactions with an app are critically important to their engagement level in the long run.
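For readers less familiar with the evaluation step, the sketch below shows a random train/test split and a direct AUC computation in plain Python. It is illustrative only: the model in this post was fit with glmnet in R, the 30% test fraction is an assumed value rather than the one we used, and the helper names are hypothetical. AUC here is computed as the probability that a randomly chosen high-value user receives a higher score than a randomly chosen other user, with ties counted as half.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for high-value users, 0 otherwise.
    scores: model scores, where higher means more likely high value.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of users, which is fine for a small example; a rank-based formula gives the same value more efficiently on large test sets. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.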
Some interesting observations from the analysis:
- Not surprisingly, more interactions within the first 2 hours are a good indicator of long-run engagement. Depending on the app, more interactions could be represented by more sessions, more custom events, longer session length, or longer total time spent in the app. This is also consistent with findings published by one of our partners, Localytics.
- Certain app events are often good predictors of users’ long-term engagement level. Not surprisingly, bottom-of-funnel events like “sign up” and “add credit card” are very good predictors. Additionally, events core to an app’s specific user flow are highly predictive. Therefore, it’s a good practice to create a data plan and track custom events in your app, even if you aren’t thinking about using them right away.
- We also saw a number of behaviors that are indicative of poor long-term engagement. These include:
  - Changing default settings, such as the default sort order of items or the default chart style. This indicates that the user is doing your UX optimization for you and isn’t happy about it.
  - Errors during the sign-up or sign-in process.
  - Unhandled exceptions or app crashes.
An easy way to begin answering these complex questions for your own app is to start sending data to Amazon Redshift and run the same kind of analysis. You can get example R code for what we did in this blog post from GitHub here. It is only intended to get you started on unleashing the power of your app data. You can easily add more features to the model, apply some feature engineering to build more predictive features, or select a different modeling algorithm and fine-tune it to get more accurate predictions.
Hope this helps and good luck. Don’t hesitate to reach out if we can be helpful.