Concept of web usage mining
Web servers record and accumulate data about user interactions whenever they receive requests for resources. Analyzing the access logs of different web sites can help us understand user behavior and the structure of the web, and thereby improve the design of this colossal collection of resources. Driven by the applications of its discoveries, Web Usage Mining follows two main tendencies: General Access Pattern Tracking and Customized Usage Tracking [2, 5]. Web Usage Mining mines the log records produced when web pages are accessed. These logs contain much useful information, such as the requested URL, the client IP address, and the access time. Analyzing the logs can help find more potential customers, track service quality, and so on.
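To illustrate what such a log record contains, the sketch below parses a single log line into its IP address, access time, URL, and status. It assumes the server writes the widely used Common Log Format; the text above does not name a format, and the names `CLF_PATTERN`, `LogRecord`, and `parse_line` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed format (Common Log Format):
# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


class LogRecord(NamedTuple):
    ip: str
    time: datetime
    method: str
    url: str
    status: int


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the assumed format."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return LogRecord(
        ip=m.group("ip"),
        time=datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        method=m.group("method"),
        url=m.group("url"),
        status=int(m.group("status")),
    )


sample = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_line(sample)
print(record.ip, record.url, record.time.isoformat())
```

Lines that do not match are returned as `None` rather than raising, so malformed entries can be counted or discarded later during pretreatment.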
Web usage mining is the process of applying data mining techniques to web data in order to extract, from users' browsing behavior, the patterns that reflect what they are interested in. When a person visits a website, he or she leaves data such as the IP address, the pages visited, and the visit time; web usage mining collects, analyzes, and processes these logs and records. Mathematical methods are then used to build models of user behavior and interest, and these models are used to understand how users behave and to improve the website structure, ultimately providing users with better personalized information services.
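As a minimal sketch of one such mathematical interest model, the code below represents each user's interests as the share of that user's visits that went to each page. Using plain visit frequency as the interest measure is an assumption chosen for simplicity; `build_interest_model` and `top_interests` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_interest_model(visits):
    """Map each user to {page: share of that user's visits}.

    visits: iterable of (user_id, page) pairs, e.g. taken from cleaned logs.
    """
    counts = defaultdict(Counter)
    for user, page in visits:
        counts[user][page] += 1
    model = {}
    for user, pages in counts.items():
        total = sum(pages.values())
        # Normalize so each user's weights sum to 1.
        model[user] = {page: n / total for page, n in pages.items()}
    return model


def top_interests(model, user, k=2):
    """Return the k pages with the highest interest weight for a user."""
    weights = model.get(user, {})
    return sorted(weights, key=weights.get, reverse=True)[:k]


visits = [
    ("u1", "/sports"), ("u1", "/sports"), ("u1", "/news"),
    ("u2", "/shop"), ("u2", "/shop"), ("u2", "/shop"), ("u2", "/news"),
]
model = build_interest_model(visits)
print(model["u1"])
print(top_interests(model, "u2", k=1))
```

Normalizing per user makes heavy and light visitors comparable; a real system might also weight pages by viewing time, which this sketch ignores.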
Approach of web usage mining
Web usage mining generally consists of the following steps: data collection, data pretreatment, establishing the interest model, and post-processing of the results (pattern analysis).
(1) Data collection. Data collection is the first step of web usage mining. The authenticity and completeness of the data directly affect whether the subsequent work proceeds smoothly and the quality of the final personalized recommendations. Therefore, scientific, sound, and up-to-date techniques must be used to gather the various kinds of data.
At present, the data used in web usage mining come mainly from three sources: server data, client data, and intermediate data (proxy server data and packet sniffing).
(2) Data pretreatment. Raw data are often incomplete, inconsistent, and noisy. Data pretreatment applies a unified transformation to these data so that they become complete and consistent, yielding a database that can be mined. Pretreatment mainly includes data cleaning, user identification, session identification, and data formatting.
(3) Establishing the interest model. Statistical methods are used to analyze and mine the pretreated data, which lets us discover the interests of individual users or user communities and then construct an interest model.
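The four pretreatment tasks of step (2) can be sketched as below. The specific heuristics are assumptions, not taken from the text: cleaning drops non-2xx responses and requests for images, stylesheets, and scripts; a user is identified by the (IP address, user agent) pair; and a new session starts after 30 minutes of inactivity. All function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed heuristics (not specified in the text):
IGNORED_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)


def clean(records):
    """Data cleaning: keep successful page requests, drop embedded resources."""
    return [
        r for r in records
        if 200 <= r["status"] < 300
        and not r["url"].lower().endswith(IGNORED_SUFFIXES)
    ]


def identify_users(records):
    """User identification: treat each (IP, user agent) pair as one user."""
    users = {}
    for r in records:
        users.setdefault((r["ip"], r["agent"]), []).append(r)
    return users


def identify_sessions(user_records):
    """Session identification: start a new session after a gap longer than the timeout."""
    sessions, current, last_time = [], [], None
    for r in sorted(user_records, key=lambda r: r["time"]):
        if last_time is not None and r["time"] - last_time > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(r["url"])
        last_time = r["time"]
    if current:
        sessions.append(current)
    return sessions


def preprocess(records):
    """Data formatting: return {user: [session, ...]}, each session a list of URLs."""
    users = identify_users(clean(records))
    return {user: identify_sessions(recs) for user, recs in users.items()}


t0 = datetime(2023, 10, 10, 9, 0)
raw = [
    {"ip": "192.0.2.1", "agent": "A", "time": t0, "url": "/index.html", "status": 200},
    {"ip": "192.0.2.1", "agent": "A", "time": t0 + timedelta(seconds=1), "url": "/logo.png", "status": 200},
    {"ip": "192.0.2.1", "agent": "A", "time": t0 + timedelta(minutes=5), "url": "/products", "status": 200},
    {"ip": "192.0.2.1", "agent": "A", "time": t0 + timedelta(hours=2), "url": "/cart", "status": 200},
    {"ip": "192.0.2.1", "agent": "A", "time": t0 + timedelta(hours=2, minutes=1), "url": "/missing", "status": 404},
    {"ip": "192.0.2.2", "agent": "B", "time": t0, "url": "/index.html", "status": 200},
]
print(preprocess(raw))
```

Here the image request and the 404 are removed, and the two-hour gap splits the first user's visits into two sessions. Identifying users by IP and user agent is only an approximation, since several people can share one proxy address.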
Currently, the most commonly used machine learning methods are clustering, classification, association rule discovery, and sequential pattern discovery. Each method has its own strengths and weaknesses; at present, the methods that work comparatively well are mainly classification and clustering.
(4) Pattern analysis. The interest patterns already established are further analyzed and summarized. First, rules or models of low significance are deleted from the interest model repository; next, OLAP and similar techniques are used for comprehensive mining and analysis; then, the discovered data or knowledge is visualized; finally, personalized services are provided to the e-commerce website.
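Association rule discovery, followed by the first task of pattern analysis (deleting low-significance rules), can be sketched on session data as follows. This is a deliberately simplified version that only considers rules between single pages (A -> B) instead of a full Apriori search, and the support and confidence thresholds are assumed values; `pairwise_rules` and `prune` are hypothetical names.

```python
from itertools import permutations


def pairwise_rules(sessions):
    """Return (A, B, support, confidence) for every rule A -> B seen in the data.

    A session "contains" a page if the page appears in it at least once.
    support = sessions with A and B / all sessions
    confidence = sessions with A and B / sessions with A
    """
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    pages = set().union(*page_sets)
    rules = []
    for a, b in permutations(sorted(pages), 2):
        count_a = sum(1 for s in page_sets if a in s)
        count_ab = sum(1 for s in page_sets if a in s and b in s)
        if count_ab:
            rules.append((a, b, count_ab / n, count_ab / count_a))
    return rules


def prune(rules, min_support=0.5, min_confidence=0.6):
    """Pattern analysis: discard rules below the (assumed) significance thresholds."""
    return [r for r in rules if r[2] >= min_support and r[3] >= min_confidence]


sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products"],
    ["/index", "/about"],
    ["/products", "/cart"],
]
for a, b, support, confidence in prune(pairwise_rules(sessions)):
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

Rules such as `/index -> /about`, which occur in only one of four sessions, fall below the support threshold and are dropped; the surviving rules are the ones worth passing on to OLAP analysis, visualization, or a recommendation service.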