R's strength lies in its wide range of packages that provide advanced, ready-to-use algorithms. In this example we'll use the k-means algorithm to build a custom user segmentation.
Unsupervised learning: k-Means
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (Source: Wikipedia).
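To make the definition concrete, here is a minimal sketch on synthetic data (the toy matrix and the object names are made up for illustration and are not part of the sample dataset). It runs kmeans() from R's built-in stats package and then checks that each observation ends up in the cluster whose mean is nearest to it.

```r
# Toy data: two well-separated groups of 2-D points (synthetic, for illustration only)
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)

# Partition the 40 observations into k = 2 clusters
toy_fit <- kmeans(toy, centers = 2)

# Squared Euclidean distance from every observation to every cluster mean
dist_to_centers <- sapply(seq_len(nrow(toy_fit$centers)), function(j) {
  rowSums(sweep(toy, 2, toy_fit$centers[j, ])^2)
})

# Index of the nearest mean for each observation
nearest <- max.col(-dist_to_centers)

# Compare the nearest mean with the cluster that kmeans() assigned
all(nearest == toy_fit$cluster)
```

The last line compares the "nearest mean" computed by hand with the assignment returned by kmeans(), which is the property the definition describes.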
Because this example requires a custom Google Analytics tracking setup (content grouping, fingerprint), I've prepared a sample dataset for this purpose. You can find the complete code below.
# K-Means Cluster Analysis

# load data into R
# you can download data from Google Analytics API or download the sample dataset
# source('ga-connection.R')

# download and preview the sample dataset
download.file(url = "https://raw.githubusercontent.com/michalbrys/R/master/users-segmentation/sample-users.csv",
              destfile = "sample-users.csv", method = "curl")
gadata <- read.csv(file = "sample-users.csv", header = TRUE, row.names = 1)
head(gadata)

# clustering users into 3 groups
fit <- kmeans(gadata, 3)

# get the cluster means
aggregate(gadata, by = list(fit$cluster), FUN = mean)

# append and preview the cluster's assignment
clustered_users <- data.frame(gadata, fit$cluster)
head(clustered_users)

# visualize the results in 3D chart
# install.packages("plotly")
library(plotly)
plot_ly(clustered_users,
        x = clustered_users$beginner_pv,
        y = clustered_users$intermediate_pv,
        z = clustered_users$advanced_pv,
        type = "scatter3d",
        mode = "markers",
        color = factor(clustered_users$fit.cluster)
)

# write the results to the file
write.csv(clustered_users, "clustered-users.csv", row.names = TRUE)
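One caveat worth keeping in mind: kmeans() picks its starting centers at random, so the cluster numbering (and occasionally the clusters themselves) can differ between runs. The sketch below uses a synthetic stand-in for gadata (the data frame demo_users and its values are invented for illustration; only the column names are borrowed from the code above). It shows one common way to make the result repeatable: fix the seed with set.seed() and let kmeans() try several random starts via its nstart argument.

```r
# Synthetic stand-in for gadata: pageviews per content group (made-up values)
set.seed(1)
demo_users <- data.frame(
  beginner_pv     = c(rpois(20, 20), rpois(20, 8),  rpois(20, 2)),
  intermediate_pv = c(rpois(20, 10), rpois(20, 45), rpois(20, 7)),
  advanced_pv     = c(rpois(20, 5),  rpois(20, 5),  rpois(20, 30))
)

# Same seed + same data => identical cluster assignment on every run
set.seed(123)
fit_a <- kmeans(demo_users, centers = 3, nstart = 25)
set.seed(123)
fit_b <- kmeans(demo_users, centers = 3, nstart = 25)

# Check that both runs produced the same labels
identical(fit_a$cluster, fit_b$cluster)

# Number of users in each segment
table(fit_a$cluster)
```

Whether 25 restarts is appropriate depends on the data; it is only a reasonable starting point, not a recommendation from the original analysis.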
The results visualized in the 3D chart:
Results - clustered users
In addition to the chart, you get a .csv file with the userId (fingerprint) and the predicted label (the segment number). You can use the results by uploading them to your marketing systems. Example of the results:
> clustered_users
       Beginner Intermediate Advanced fit.cluster
266876        9           45        4           1
965265        9           51        7           1
...
981924       19           10        8           2
732529       19           16        1           2
...
377795        2            7       38           3
918083        2            8       28           3
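As a rough illustration of how the labelled output might be prepared for upload, the sketch below rebuilds the six rows shown above by hand and splits the user IDs by segment. The object names results and segments are made up for this example; with the real data you would start from clustered-users.csv instead.

```r
# Rebuild the six example rows shown above (row names are the user IDs)
results <- data.frame(
  Beginner     = c(9, 9, 19, 19, 2, 2),
  Intermediate = c(45, 51, 10, 16, 7, 8),
  Advanced     = c(4, 7, 8, 1, 38, 28),
  fit.cluster  = c(1, 1, 2, 2, 3, 3),
  row.names    = c("266876", "965265", "981924", "732529", "377795", "918083")
)

# Number of users in each segment
table(results$fit.cluster)

# User IDs grouped by segment, ready to be exported one segment at a time
segments <- split(rownames(results), results$fit.cluster)

# IDs of the users in segment 3 (the "Advanced" readers in this example)
segments[["3"]]
```

Each element of segments is a plain character vector of user IDs, so it can be written out with write.csv() or passed to whatever import format your marketing system expects.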
The complete source code of the examples shown above is available in my GitHub repository: