Semantically Enriched Variable Length Markov Chain Model for Analysis of User Web Navigation Sessions
Abstract
The rapid growth of the World Wide Web has produced intricate Web sites, which demand greater skill from users to find the information they need and more sophisticated tools able to generate apt recommendations. Markov chains have been widely used to generate next-page recommendations; however, the accuracy of such models is limited. Herein, we propose a novel Semantic Variable Length Markov Chain Model (SVLMC) that combines Web Usage Mining and the Semantic Web by enriching the Markov transition probability matrix with rich semantic information extracted from Web pages. We show that the method enhances prediction accuracy relative to usage-based higher order Markov models and to semantic higher order Markov models based on an ontology of concepts. In addition, the proposed model is able to handle the problem of ambiguous predictions. An extensive experimental evaluation was conducted on two real-world data sets and on one partially generated data set. The results show that the proposed model achieves 15–20% better accuracy than the usage-based Markov model, 8–15% better than the semantic ontology Markov model and 7–12% better than the semantic-pruned Selective Markov Model. In summary, SVLMC is the first work to propose integrating a rich set of detailed semantic information into higher order Web usage Markov models, and the experimental results reveal that including detailed semantic data enhances the prediction ability of Markov models.
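The abstract does not spell out how SVLMC merges semantic information into the transition matrix, so the following is only a rough illustration of the general idea, not the authors' formulation. It is a minimal sketch that assumes sessions are lists of page identifiers and that each page has a hypothetical keyword-weight vector (`page_terms`). It counts usage-based transitions for contexts of length 1 to K, backs off from the longest context seen in training to shorter ones (the variable-length idea), and resolves ambiguous predictions by blending each candidate's transition probability with the cosine similarity between the current and candidate pages. The blending weight `alpha`, the function names and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_model(sessions, max_order):
    """For each order k = 1..max_order, count how often each length-k
    context of pages is followed by each next page (usage data only)."""
    model = {k: defaultdict(Counter) for k in range(1, max_order + 1)}
    for session in sessions:
        for k in range(1, max_order + 1):
            for i in range(k, len(session)):
                model[k][tuple(session[i - k:i])][session[i]] += 1
    return model


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_next(model, history, page_terms, alpha=0.5):
    """Back off from the longest context seen in training to shorter ones,
    then score each candidate by blending its transition probability with
    its semantic similarity to the current page (hypothetical weighting)."""
    for k in range(min(max(model), len(history)), 0, -1):
        followers = model[k].get(tuple(history[-k:]))
        if not followers:
            continue  # context never observed: try a shorter one
        total = sum(followers.values())
        current = page_terms.get(history[-1], {})
        scores = {
            page: (1 - alpha) * n / total
            + alpha * cosine(current, page_terms.get(page, {}))
            for page, n in followers.items()
        }
        return max(scores, key=scores.get)
    return None  # no context of any length was observed in training


# Hypothetical toy data: four sessions and keyword vectors for some pages.
sessions = [
    ["home", "laptops", "laptop-a"],
    ["home", "laptops", "laptop-b"],
    ["home", "phones", "phone-a"],
    ["news", "laptops", "laptop-a"],
]
page_terms = {
    "laptops": {"laptop": 1.0, "computer": 1.0},
    "laptop-a": {"laptop": 1.0, "gaming": 1.0},
    "laptop-b": {"laptop": 1.0, "computer": 1.0, "office": 1.0},
    "phones": {"phone": 1.0},
    "phone-a": {"phone": 1.0, "camera": 1.0},
}
model = build_model(sessions, max_order=2)
print(predict_next(model, ["home", "laptops"], page_terms))    # laptop-b
print(predict_next(model, ["search", "laptops"], page_terms))  # laptop-a
```

In the first call, the order-2 context `("home", "laptops")` is followed equally often by `laptop-a` and `laptop-b`, so usage data alone cannot separate them. The semantic term favours `laptop-b`, whose keywords overlap more with the current page. In the second call, the order-2 context was never observed, so the sketch backs off to the order-1 context `("laptops",)`, where usage frequency outweighs the semantic term.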