Automatic Classification for Landing Page Optimization

By Sushant Ajmani posted 12-02-2014 12:53 AM

  

Introduction

“Wouldn’t it be wonderful if someone could quickly provide their web analytics account credentials through a web interface and, in no time, know which Top 3 Landing Pages need immediate attention?”

This was the idea we began with: a quick, elegant, data-driven technique to assist web analysts working on website optimization. It formed the crux of a problem that the data science team here at Nabler realized could be tackled very effectively once it was framed as a machine learning exercise.


The Concept

The first models to hit the drawing board were classification models, since the whole outcome expected from the exercise was identifying which web pages needed optimization. But before building anything, we had to put certain things in place.

Discovery

With these ideas bubbling, we realized we would need a different model for every vertical, since web metrics differ vastly across B2B, B2C, and e-commerce websites. A ‘one size fits all’ model would fall fairly short.

Data Prep

We compiled a vertical-specific repository of sample landing pages, manually tagged by web consultants as requiring or not requiring optimization (target variable 1 vs. 0). The consultants drew on their years of experience and on visitor behaviour represented by metrics such as visits, time on site, bounce rate, and page views per visit. This served as our combined training and validation set. With the target variable in place, we trimmed the data using the Pareto rule and created a few artificial variables that we felt captured the essence of the consultants’ thought process, helping the machine understand why the target variable was assigned a particular value.
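To make this concrete, here is a minimal sketch of that preparation step in Python. The column names, the Pareto cut-off, and the engineered variables are illustrative assumptions, not the actual variables in our repository.

```python
import pandas as pd

# Minimal sketch; columns and engineered features are illustrative assumptions.
pages = pd.DataFrame({
    "page":                ["/landing-a", "/landing-b", "/landing-c", "/landing-d"],
    "visits":              [5000, 4200, 3100, 600],
    "time_on_site":        [95.0, 42.0, 130.0, 20.0],   # seconds per visit
    "bounce_rate":         [0.62, 0.81, 0.35, 0.90],
    "pageviews_per_visit": [1.8, 1.1, 3.2, 1.0],
    "needs_optimization":  [1, 1, 0, 1],                 # consultant tag (target)
})

# Pareto-style trim (assumption): keep the pages that account for ~80% of all visits.
pages = pages.sort_values("visits", ascending=False)
share = pages["visits"].cumsum() / pages["visits"].sum()
pages = pages[share <= 0.80]

# "Artificial" variables meant to mimic how a consultant reasons about a page.
pages["engagement_score"] = pages["time_on_site"] * (1 - pages["bounce_rate"])
pages["bounce_to_depth"] = pages["bounce_rate"] / pages["pageviews_per_visit"]
```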
 

Model Plan

We worked our way through most of the leading classification models that data science has to offer: Decision Trees, Random Forests, Bagging, and Logistic Regression.
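As a rough illustration, this shortlist maps naturally onto scikit-learn estimators. The hyperparameters below are placeholders, not the settings we actually used.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression

# The four shortlisted model families as scikit-learn estimators (placeholder settings).
candidates = {
    "decision_tree":       DecisionTreeClassifier(random_state=42),
    "random_forest":       RandomForestClassifier(n_estimators=200, random_state=42),
    "bagging":             BaggingClassifier(n_estimators=100, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
```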

Model Build 

We did a 70:30 random split of the dataset for training and validation. We began with Decision Trees, which gave us classification accuracy between 68% and 80%. But certain splits did not make sense: both child nodes of a split would contain only zeros or only ones, meaning the split was essentially not required. We therefore pruned the tree, which did far better. Ensemble methods such as Bagging and Random Forests gave us better results still, achieving consistent accuracy between 75% and 90% on the majority of runs. This was not surprising, because these models have the built-in mathematical rigour to reduce variance.
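The sketch below shows what such a build loop could look like. It uses synthetic stand-in data in place of our tagged landing pages, and cost-complexity pruning as one possible pruning approach; neither is a claim about our exact setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Stand-in data; in practice X, y come from the consultant-tagged landing-page set.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# 70:30 random split for training and validation.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.30, random_state=42)

# Unpruned tree vs. a cost-complexity-pruned tree vs. a random forest.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

for name, model in [("tree", full_tree), ("pruned tree", pruned_tree), ("forest", forest)]:
    print(name, accuracy_score(y_valid, model.predict(X_valid)))
```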

Model Review

The training and validation phase went well, and it was time to test the model on a completely unknown data set. Fingers crossed, bets placed and, open sesame, we achieved an accuracy as high as 91%, with just a 9% margin of error. The results were validated by web consultants, and thanks to the strength of the ensemble models they came very close to an actual consultant classifying the pages manually, but in a fraction of a second, freeing up considerable time and resources for more productive use.

Model Implementation

With the modeling done and the results cross-validated, we are now implementing the solution as a web interface: the user authenticates with a key for their web analytics account, the landing page data is read in, and the output highlights the top 3 pages requiring optimization.
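Conceptually, the final ranking step can be as simple as scoring each landing page with the trained classifier and surfacing the three pages with the highest predicted probability of needing optimization. The sketch below uses a stand-in model and made-up page URLs purely for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model and features; in the real interface the metrics would be pulled
# from the user's web analytics account after they authenticate with their key.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

live_pages = pd.DataFrame(X[:20])                         # hypothetical unseen landing pages
live_pages["page"] = [f"/landing-{i}" for i in range(20)]

# Probability that each page belongs to class 1 ("needs optimization"),
# then surface the three highest-probability pages.
live_pages["p_needs_optimization"] = model.predict_proba(X[:20])[:, 1]
top3 = live_pages.nlargest(3, "p_needs_optimization")[["page", "p_needs_optimization"]]
print(top3)
```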

Conclusion

Nabler envisioned a scenario where resources could be used to their fullest while still delivering high-quality output. One such vision has now been realized, and it will soon be up and running for clients to experience, powered in the back end by sophisticated data science.

In closing, we would like to thank our Data Sciences team for their wonderful effort.
