Probabilistic Visitor Stitching on Cross-Device Web Logs

Sungchul Kim, Nikhil Kini, Jay Pujara, Eunyee Koh, Lise Getoor
International Conference on World Wide Web (WWW), pages 1581--1589, 2017
Download the publication: p1581-kimwww17.pdf [1.2 MB]
Personalization – the customization of experiences, interfaces, and content to individual users – has catalyzed user growth and engagement for many web services. A critical prerequisite to personalization is establishing user identity. However, the variety of devices, including mobile phones, appliances, and smart watches, from which users access web services in both anonymous and logged-in sessions poses a significant obstacle to user identification. The resulting entity resolution task of establishing user identity across devices and sessions is commonly referred to as "visitor stitching". We introduce a general, probabilistic approach to visitor stitching using features and attributes commonly contained in web logs. Using web logs from two real-world corporate websites, we motivate the need for probabilistic models by quantifying the difficulties posed by noise, ambiguity, and missing information in deployment. Next, we introduce our approach using probabilistic soft logic (PSL), a statistical relational learning framework capable of capturing similarities across many sessions and enforcing transitivity. We present a detailed description of model features and design choices relevant to the visitor stitching problem. Finally, we evaluate our PSL model on binary classification performance for two real-world visitor stitching datasets. Our model demonstrates significantly better performance than several state-of-the-art classifiers, and we show how this advantage results from collective reasoning across sessions.

BibTeX reference

  author       = "Kim, Sungchul and Kini, Nikhil and Pujara, Jay and Koh, Eunyee and Getoor, Lise",
  title        = "Probabilistic Visitor Stitching on Cross-Device Web Logs",
  booktitle    = "International Conference on World Wide Web (WWW)",
  pages        = "1581--1589",
  year         = "2017",
  publisher    = "International World Wide Web Conferences Steering Committee",
