The job of science is knowledge discovery. Data are incidental to this process: they represent the empirical foundations, but not the understanding per se. Much of this process is pattern recognition (including the discovery of correlations and clustering/classification), the discovery of outliers or anomalies, and so on.

M. Brescia, Data Mining, Lecture 2

Motivation
Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
We are drowning in data, but starving for knowledge!
Solution: data warehousing and data mining.
Data warehousing and on-line analytical processing.
Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.

Data Mining Methods and Some Examples
Methods: Clustering, Classification, Associations, Neural Nets, Decision Trees, Pattern Recognition, Correlation/Trend Analysis, Principal Component Analysis, Independent Component Analysis, Regression Analysis, Outlier Identification, Visualization, Autonomous Agents, Self-Organizing Maps (SOM), Link (Affinity) Analysis.
Examples:
Group together similar items and separate dissimilar items in the database.
Classify new data items using the known classes and groups.
Find unusual co-occurring associations of attribute values among database items.
Predict a numeric attribute value.
Organize information in the database based on relationships among key data descriptors.
Identify linkages between data items based on features shared in common.

Classification: Definition
Given a collection of records (training set).
Each record contains a set of attributes; one of the attributes is the class.
Goal: previously unseen records should be assigned a class as accurately as possible.
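To make the classification definition above concrete, here is a minimal sketch in Python. It assumes a 1-nearest-neighbour rule, which the slides do not prescribe; it is used here only because it is one of the simplest ways to assign a class to an unseen record. The toy records, the class names ("star", "galaxy") and the function names are hypothetical and chosen purely for illustration.

```python
import math


def predict_class(training_set, record):
    """Assign `record` the class of its nearest training record.

    Assumption: a 1-nearest-neighbour rule with Euclidean distance,
    picked for illustration; it is not taken from the slides.
    """
    _, nearest_class = min(
        training_set,
        key=lambda labelled: math.dist(labelled[0], record),
    )
    return nearest_class


def accuracy(training_set, test_set):
    """Fraction of held-out records whose predicted class matches the true class."""
    correct = sum(
        predict_class(training_set, attributes) == true_class
        for attributes, true_class in test_set
    )
    return correct / len(test_set)


# Hypothetical training set: each record is (attributes, class).
training_set = [
    ((1.0, 1.2), "star"),
    ((0.8, 1.0), "star"),
    ((5.0, 4.8), "galaxy"),
    ((5.5, 5.1), "galaxy"),
]

# Hypothetical previously unseen records, with known classes kept aside
# so that the accuracy of the assignment can be measured.
test_set = [
    ((1.1, 1.1), "star"),
    ((4.9, 5.2), "galaxy"),
]

print(predict_class(training_set, (1.1, 1.1)))
print(accuracy(training_set, test_set))
```

The held-out `test_set` plays the role of the "previously unseen records" in the goal statement: the classifier sees only `training_set`, and its quality is judged by how often it recovers the true class of records it was not given.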

