Data Mining in Engineering
FALL 2013 
Friday 5-8 p.m. Room: BA1240, Bahen Centre Information Tech
Professor S. Sayad 
OBJECTIVES

Data mining is about explaining the past and predicting the future by means of analyzing data. The objective of this course is to enable you to obtain accurate, precise, quantitative and qualitative knowledge from data.
INTRODUCTION

An exceptional ability to deal with data is the defining characteristic of an engineer. Three major problems currently associated with engineering data are: the presence of experimental error in the data; the presence of both qualitative and quantitative data; and very large quantities of data. Two closely related fields are directed at overcoming these problems: data mining and statistics. Data Mining is a branch of Informatics that refers to a wide variety of methods used to discover patterns in data (i.e. to obtain information from data). Statistics is the branch of mathematics concerned with how to collect and analyze data in the presence of variability. This course will focus on Data Mining and the statistics required to accomplish it. Although the necessary statistics will be examined in class, an undergraduate statistics course is a prerequisite for this course.

Data Mining methods have been used for an extremely wide variety of applications. The list includes: computer vision techniques, control systems, traffic control, geographical information systems, electric power systems stability, medical diagnosis, electronic commerce, customer support, credit and other financial analysis, information retrieval, software development, business processes, assembly sequences in engineering, skill learning, and transportation and production planning. In chemical engineering and chemistry, topics include image processing, color matching, elucidation of structure-property relationships as well as structure-biological function relationships, environmental assessment of processes, assessment of process efficiency and economic viability, detection of the hydrodynamic regime in two-phase flow, detection of coding and non-coding regions in DNA sequences, and analysis of screening data to identify classes of compounds that are promising cancer drugs.

This course will acquaint you with data mining methods that have been found particularly useful in a variety of fields and will show you how to apply these methods to practical problems. As will be seen below, the essential challenge in the course is for you to learn the methods sufficiently well to be able to apply them to diverse engineering data.
COURSE CONTENT 
Although there will be some supplementary references (published papers, web sites, etc.), the course textbook will provide the backbone of the course. This textbook is online: An Introduction to Data Mining. Topics of interest include methods of visualization, classification, clustering, regression and association.

Visualization allows data miners to find patterns or structures in datasets using graphical methods. These methods include Scatterplot Matrices, Parallel Coordinates, Pixel-Oriented Methods, Icon-Based Methods, Dimensional Stacking, Treemap and simple charts (line, bar and pie).

Classification attempts to predict whether or not a case is a member of a particular qualitative class by building a model based on some independent variables. These methods include: ZeroR, OneR, Decision Tree, Naïve Bayesian, Instance-Based Learning, Artificial Neural Networks and Support Vector Machine.

Regression methods emphasize fitting equations to data to predict the values of continuous variables. In addition to linear and nonlinear regression, we are also interested in methods involving both qualitative and quantitative variables (e.g. Classification and Regression Trees).

Clustering methods find groups of items that are similar. (Since the classes are unspecified before the analysis, clustering is sometimes referred to as unsupervised learning.) Clustering methods include K-Means Clustering and Self-Organizing Maps (SOM).

Association methods create rules that describe how often events occur together. A method termed "Apriori" is of particular interest.
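For readers who want to see the two simplest classification methods named above in concrete form, here is a minimal Python sketch of ZeroR and OneR. It is an illustration only, not course material: the toy dataset, the feature names and the function names `zero_r` and `one_r` are all assumptions made up for this example. ZeroR predicts the most frequent class and ignores every predictor. OneR builds one rule per predictor (each value maps to its most frequent class) and keeps the predictor whose rule makes the fewest errors on the training data.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (invented for illustration): (outlook, windy, play)
ROWS = [
    ("sunny", "no", "no"),
    ("sunny", "yes", "no"),
    ("overcast", "no", "yes"),
    ("rainy", "no", "yes"),
    ("rainy", "yes", "no"),
    ("overcast", "no", "yes"),
    ("sunny", "no", "yes"),
    ("rainy", "no", "yes"),
]
FEATURES = ["outlook", "windy"]  # the last column of each row is the class


def zero_r(rows):
    """ZeroR: return the majority class, ignoring all predictors."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


def one_r(rows, n_features):
    """OneR: return (feature_index, rule, training_errors) for the
    single-feature rule with the fewest training errors."""
    best = None
    for i in range(n_features):
        # Count class frequencies for each value of feature i.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[i]][row[-1]] += 1
        # Each feature value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[i]] != row[-1])
        if best is None or errors < best[2]:
            best = (i, rule, errors)
    return best


if __name__ == "__main__":
    print("ZeroR predicts:", zero_r(ROWS))
    index, rule, errors = one_r(ROWS, len(FEATURES))
    print("OneR uses:", FEATURES[index], rule, "training errors:", errors)
```

On this toy data ZeroR always predicts "yes" and is wrong on 3 of 8 cases, while OneR selects the "windy" predictor and is wrong on 1 of 8. ZeroR is mainly useful as a baseline against which the other classifiers are compared.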
ASSIGNMENTS

As alluded to above, examples in lectures will be drawn from a wide variety of applications. In some cases these methods have not previously been applied to engineering data. Thus, there will be many small projects in this course asking each of you to apply a recently learned method to such data. The preferred source of the data will be your own research. However, in some cases, data from the Internet or scientific publications may be used. Even computer-simulated data may occasionally be suitable. Assignments will be discussed in class, and the main basis for the mark in each case will be how well the example illustrates the Data Mining method. The course will not involve computer programming. However, you will be expected to be able to download and intelligently use specified software.
MARKS

The first presentation will make up 30% of the course mark, with the remaining 70% allocated to the final presentation. There will be no midterm or final examinations.
Revised: August 29, 2013