Share with you (Internet Application): W7- The Interest of Web Mining (web2.0)

Looking at this photo, can you guess what it is? I am now taking a data mining course, and I am very interested in one of its topics: web mining!
Web mining is the application of data mining techniques to discover patterns from the Web. Depending on the analysis target, web mining can be divided into three types: Web usage mining, Web content mining and Web structure mining (from Wikipedia).
Web mining covers:
1. Web Content Mining: (Web Page Content Mining, Search Result Mining)
2. Web Structure Mining
3. Web Usage Mining: (General Access Pattern Tracking, Customized Usage Tracking)
We can apply data mining techniques to Web 2.0. This is worth doing because web mining gives us a better understanding of the Internet. It sounds great, but it is not an easy task! Please have a look at the basic introduction to web mining in the photo above.
Web mining is very good news for website companies, because it gives them exactly what they need. Most website companies want to keep track of their customers' behavior and practice CRM (Customer Relationship Management). They may want to know: Which pages do visitors like to browse most? Which services do customers like best? What are the top 10 most visited web pages? What sequential visiting patterns do certain visitors follow? By analyzing web log files, customer information files and similar data, we can answer these questions. I have read a lot about this topic, and here I want to introduce a basic method of analyzing web log files.
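As a rough illustration of answering the "top 10 most visited pages" question from a web log file, here is a minimal Python sketch. It assumes the server writes the Apache/NCSA Common Log Format, where the request appears in quotes as `"GET /path HTTP/1.1"`; the function name `top_pages` and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Captures the requested path from the quoted request field of a
# Common Log Format line, e.g. "GET /index.html HTTP/1.1".
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+)[^"]*"')

def top_pages(log_lines, n=10):
    """Return the n most requested paths as (path, hits) pairs."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:  # lines without a recognizable request are skipped
            hits[match.group(1)] += 1
    return hits.most_common(n)

# Hypothetical log lines, for illustration only.
log = [
    '10.0.0.1 - - [04/Mar/2007:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [04/Mar/2007:10:00:05 +0000] "GET /products.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [04/Mar/2007:10:01:00 +0000] "GET /products.html HTTP/1.1" 200 1024',
]
print(top_pages(log, n=2))  # [('/products.html', 2), ('/index.html', 1)]
```

A real log would also need filtering (for example, ignoring images, error responses and crawler traffic) before the counts mean "pages visitors like".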

Goal: Find out which combinations of pages most users like to browse together (e.g. pages 1, 5, 6, 7, …).
Materials: web log files, a database management system, data analysis software, …
Algorithm: Association Rules
Sample:
The XYZ Corporation maintains a set of five web pages: {A, B, C, D, E}. The following sessions have been created:
S1 = {U1, }
S2 = {U2, }
S3 = {U1, }
S4 = {U3, }
where U1, U2 and U3 are the identifiers of three users. The support threshold is 30%: with 4 sessions, an itemset must appear in at least 2 of them (30% × 4 = 1.2, rounded up to 2).
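The step from "30%" to "at least 2 sessions" is just a rounding-up calculation. A small Python helper (hypothetical name `min_support_count`) makes it explicit; it converts the threshold to an exact `fractions.Fraction` so that a product which should be a whole number is not nudged above it by floating-point rounding.

```python
import math
from fractions import Fraction

def min_support_count(threshold: str, n_sessions: int) -> int:
    """Smallest number of sessions an itemset must appear in to reach `threshold`.

    `threshold` is a decimal string such as "0.30"; Fraction keeps it exact.
    """
    return math.ceil(Fraction(threshold) * n_sessions)

print(min_support_count("0.30", 4))  # 2
```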
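C1 = {(A), (B), (C), (E)} (candidate 1-itemsets)
L1 = {(A), (B), (C), (E)} (frequent 1-itemsets: appear in at least 2 sessions)
C2 = {(A, B), (A, C), (A, E), (B, C), (B, E), (C, E)} (candidate 2-itemsets: all pairs from L1)
L2 = {(A, C), (B, C), (C, E)} (frequent 2-itemsets: appear in at least 2 sessions)
C3 = {(A, B, C), (A, C, E), (B, C, E)} (candidate 3-itemsets: unions of pairs in L2)
None of the C3 candidates qualifies: each one contains a pair, (A, B), (A, E) or (B, E), that is not in L2, and a set of pages cannot appear in more sessions than any of its subsets. So L3 is empty and the search stops.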
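The candidate lists above follow the Apriori pattern: each C(k) is built from L(k-1). Below is a minimal Python sketch of that step (the name `apriori_gen` is hypothetical, modeled on the "apriori-gen" step of Apriori). This version joins any two frequent itemsets whose union has exactly k items, then prunes. Run on the L1 and L2 above, it reproduces C2 and shows that pruning removes all three sets listed in C3.

```python
from itertools import combinations

def apriori_gen(frequent, k):
    """Build candidate k-itemsets from a set of frequent (k-1)-itemsets.

    Join: take the union of any two frequent (k-1)-itemsets that has exactly k items.
    Prune: drop a candidate if any of its (k-1)-subsets is not frequent, since an
    itemset can never appear in more sessions than its subsets.
    """
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

# L1 and L2 from the XYZ example, as sets of frozensets.
L1 = {frozenset(p) for p in "ABCE"}
L2 = {frozenset(pair) for pair in ("AC", "BC", "CE")}

print(sorted("".join(sorted(c)) for c in apriori_gen(L1, 2)))
# ['AB', 'AC', 'AE', 'BC', 'BE', 'CE']
print(apriori_gen(L2, 3))  # set()
```

Without the prune step, the join alone on L2 gives exactly the three sets listed in C3; the prune step discards them before any counting is needed.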
As a result, the following pages and page combinations appeared in at least 2 of the 4 sessions:
L = {(A), (B), (C), (E), (A, C), (B, C), (C, E)}
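Putting the steps together, here is a minimal end-to-end Apriori sketch in Python that goes from sessions to all frequent page sets with their counts. It needs concrete session contents, so the five sessions below are hypothetical (pages A to E) and are not the XYZ data; `min_count` is the absolute threshold in sessions.

```python
from itertools import combinations

def apriori(sessions, min_count):
    """Return {itemset: count} for every itemset found in >= min_count sessions."""
    def count(candidates):
        # Support counting: one pass over all sessions for this level.
        counts = {c: 0 for c in candidates}
        for s in sessions:
            for c in candidates:
                if c <= s:  # every page of the candidate occurs in this session
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    level = count({frozenset([p]) for s in sessions for p in s})  # L1
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join frequent (k-1)-itemsets into k-itemsets, pruning any candidate
        # that has an infrequent (k-1)-subset.
        candidates = {
            a | b for a in prev for b in prev
            if len(a | b) == k
            and all(frozenset(sub) in prev for sub in combinations(a | b, k - 1))
        }
        level = count(candidates)  # Lk
        result.update(level)
        k += 1
    return result

# Hypothetical sessions, for illustration only.
sessions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}]
freq = apriori(sessions, min_count=2)
for itemset, n in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Here page D appears only once and is dropped at the first level, so no later candidate includes it; this early pruning is what keeps Apriori practical on larger logs.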
This kind of analysis is a useful way to keep track of customers' behavior, so that website companies can quickly adjust and revise their sites according to customers' needs.
Another well-known technique is sequential pattern mining, which looks at the order in which web pages are visited, much like the finding that people tend to buy peanuts after they buy beer.
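Sequential pattern mining cares about order, not just co-occurrence. As a simplified sketch (only length-2 patterns, not a full algorithm such as GSP or PrefixSpan), the Python code below counts ordered page pairs "x visited before y in the same session" and keeps those seen in at least `min_count` sessions; the function name and visit sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_2_sequences(sequences, min_count):
    """Return ordered pairs (x, y), meaning page x was visited before page y in
    the same session, that occur in at least min_count sessions."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so every (x, y) it yields has x
        # earlier in the session than y. The set counts each pair once per session.
        counts.update(set(combinations(seq, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical visit sequences (pages in the order they were viewed).
visits = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"]]
print(frequent_2_sequences(visits, min_count=2))
```

Note that (A, B) and (B, A) are counted separately here, each once, so neither reaches the threshold even though A and B co-occur in two sessions.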
References:
1. Semantic Web Mining
2. Wikipedia
3. Information and Pattern Discovery on the World Wide Web
4. Web Mining and Web Usage Mining

Sunday, March 04, 2007
