Table of Contents for Pattern Classification
Each entry lists the section's starting page and page count.

Preface  (p. xvii)

Introduction  (p. 1, 19 pp.)
  Machine Perception  (p. 1, 1 p.)
  An Example  (p. 1, 8 pp.)
  Related Fields  (p. 8, 1 p.)
  Pattern Recognition Systems  (p. 9, 5 pp.)
  Sensing  (p. 9, 1 p.)
  Segmentation and Grouping  (p. 9, 2 pp.)
  Feature Extraction  (p. 11, 1 p.)
  Classification  (p. 12, 1 p.)
  Post Processing  (p. 13, 1 p.)
  The Design Cycle  (p. 14, 2 pp.)
  Data Collection  (p. 14, 1 p.)
  Feature Choice  (p. 14, 1 p.)
  Model Choice  (p. 15, 1 p.)
  Training  (p. 15, 1 p.)
  Evaluation  (p. 15, 1 p.)
  Computational Complexity  (p. 16, 1 p.)
  Learning and Adaptation  (p. 16, 1 p.)
  Supervised Learning  (p. 16, 1 p.)
  Unsupervised Learning  (p. 17, 1 p.)
  Reinforcement Learning  (p. 17, 1 p.)
  Conclusion  (p. 17, 3 pp.)
  Summary by Chapters  (p. 17, 1 p.)
  Bibliographical and Historical Remarks  (p. 18, 1 p.)
  Bibliography  (p. 19, 1 p.)

Bayesian Decision Theory  (p. 20, 64 pp.)
  Introduction  (p. 20, 4 pp.)
  Bayesian Decision Theory---Continuous Features  (p. 24, 2 pp.)
  Two-Category Classification  (p. 25, 1 p.)
  Minimum-Error-Rate Classification  (p. 26, 3 pp.)
  Minimax Criterion  (p. 27, 1 p.)
  Neyman-Pearson Criterion  (p. 28, 1 p.)
  Classifiers, Discriminant Functions, and Decision Surfaces  (p. 29, 2 pp.)
  The Multicategory Case  (p. 29, 1 p.)
  The Two-Category Case  (p. 30, 1 p.)
  The Normal Density  (p. 31, 5 pp.)
  Univariate Density  (p. 32, 1 p.)
  Multivariate Density  (p. 33, 3 pp.)
  Discriminant Functions for the Normal Density  (p. 36, 9 pp.)
  Case 1: Σi = σ²I  (p. 36, 3 pp.)
  Case 2: Σi = Σ  (p. 39, 2 pp.)
  Case 3: Σi = arbitrary  (p. 41, 1 p.)
  Decision Regions for Two-Dimensional Gaussian Data  (p. 41, 4 pp.)
  Error Probabilities and Integrals  (p. 45, 1 p.)
  Error Bounds for Normal Densities  (p. 46, 5 pp.)
  Chernoff Bound  (p. 46, 1 p.)
  Bhattacharyya Bound  (p. 47, 1 p.)
  Error Bounds for Gaussian Distribution  (p. 48, 1 p.)
  Signal Detection Theory and Operating Characteristics  (p. 48, 3 pp.)
  Bayes Decision Theory---Discrete Features  (p. 51, 3 pp.)
  Independent Binary Features  (p. 52, 1 p.)
  Bayesian Decisions for Three-Dimensional Binary Data  (p. 53, 1 p.)
  Missing and Noisy Features  (p. 54, 2 pp.)
  Missing Features  (p. 54, 1 p.)
  Noisy Features  (p. 55, 1 p.)
  Bayesian Belief Network  (p. 56, 6 pp.)
  Belief Network for Fish  (p. 59, 3 pp.)
  Compound Bayesian Decision Theory and Context  (p. 62, 22 pp.)
  Summary  (p. 63, 1 p.)
  Bibliographical and Historical Remarks  (p. 64, 1 p.)
  Problems  (p. 65, 15 pp.)
  Computer exercises  (p. 80, 2 pp.)
  Bibliography  (p. 82, 2 pp.)

Maximum-Likelihood and Bayesian Parameter Estimation  (p. 84, 77 pp.)
  Introduction  (p. 84, 1 p.)
  Maximum-Likelihood Estimation  (p. 85, 5 pp.)
  The General Principle  (p. 85, 3 pp.)
  The Gaussian Case: Unknown μ  (p. 88, 1 p.)
  The Gaussian Case: Unknown μ and Σ  (p. 88, 1 p.)
  Bias  (p. 89, 1 p.)
  Bayesian Estimation  (p. 90, 2 pp.)
  The Class-Conditional Densities  (p. 91, 1 p.)
  The Parameter Distribution  (p. 91, 1 p.)
  Bayesian Parameter Estimation: Gaussian Case  (p. 92, 5 pp.)
  The Univariate Case: p(μ|D)  (p. 92, 3 pp.)
  The Univariate Case: p(x|D)  (p. 95, 1 p.)
  The Multivariate Case  (p. 95, 2 pp.)
  Bayesian Parameter Estimation: General Theory  (p. 97, 5 pp.)
  Recursive Bayes Learning  (p. 98, 2 pp.)
  When Do Maximum-Likelihood and Bayes Methods Differ?  (p. 100, 1 p.)
  Noninformative Priors and Invariance  (p. 101, 1 p.)
  Gibbs Algorithm  (p. 102, 1 p.)
  Sufficient Statistics  (p. 102, 5 pp.)
  Sufficient Statistics and the Exponential Family  (p. 106, 1 p.)
  Problems of Dimensionality  (p. 107, 7 pp.)
  Accuracy, Dimension, and Training Sample Size  (p. 107, 4 pp.)
  Computational Complexity  (p. 111, 2 pp.)
  Overfitting  (p. 113, 1 p.)
  Component Analysis and Discriminants  (p. 114, 10 pp.)
  Principal Component Analysis (PCA)  (p. 115, 2 pp.)
  Fisher Linear Discriminant  (p. 117, 4 pp.)
  Multiple Discriminant Analysis  (p. 121, 3 pp.)
  Expectation-Maximization (EM)  (p. 124, 4 pp.)
  Expectation-Maximization for a 2D Normal Model  (p. 126, 2 pp.)
  Hidden Markov Models  (p. 128, 33 pp.)
  First-Order Markov Models  (p. 128, 1 p.)
  First-Order Hidden Markov Models  (p. 129, 1 p.)
  Hidden Markov Model Computation  (p. 129, 2 pp.)
  Evaluation  (p. 131, 2 pp.)
  Hidden Markov Model  (p. 133, 2 pp.)
  Decoding  (p. 135, 1 p.)
  HMM Decoding  (p. 136, 1 p.)
  Learning  (p. 137, 2 pp.)
  Summary  (p. 139, 1 p.)
  Bibliographical and Historical Remarks  (p. 139, 1 p.)
  Problems  (p. 140, 15 pp.)
  Computer exercises  (p. 155, 4 pp.)
  Bibliography  (p. 159, 2 pp.)

Nonparametric Techniques  (p. 161, 54 pp.)
  Introduction  (p. 161, 1 p.)
  Density Estimation  (p. 161, 3 pp.)
  Parzen Windows  (p. 164, 10 pp.)
  Convergence of the Mean  (p. 167, 1 p.)
  Convergence of the Variance  (p. 167, 1 p.)
  Illustrations  (p. 168, 1 p.)
  Classification Example  (p. 168, 4 pp.)
  Probabilistic Neural Networks (PNNs)  (p. 172, 2 pp.)
  Choosing the Window Function  (p. 174, 1 p.)
  kn-Nearest-Neighbor Estimation  (p. 174, 3 pp.)
  kn-Nearest-Neighbor and Parzen-Window Estimation  (p. 176, 1 p.)
  Estimation of A Posteriori Probabilities  (p. 177, 1 p.)
  The Nearest-Neighbor Rule  (p. 177, 10 pp.)
  Convergence of the Nearest Neighbor  (p. 179, 1 p.)
  Error Rate for the Nearest-Neighbor Rule  (p. 180, 1 p.)
  Error Bounds  (p. 180, 2 pp.)
  The k-Nearest-Neighbor Rule  (p. 182, 2 pp.)
  Computational Complexity of the k-Nearest-Neighbor Rule  (p. 184, 3 pp.)
  Metrics and Nearest-Neighbor Classification  (p. 187, 5 pp.)
  Properties of Metrics  (p. 187, 1 p.)
  Tangent Distance  (p. 188, 4 pp.)
  Fuzzy Classification  (p. 192, 3 pp.)
  Reduced Coulomb Energy Networks  (p. 195, 2 pp.)
  Approximations by Series Expansions  (p. 197, 18 pp.)
  Summary  (p. 199, 1 p.)
  Bibliographical and Historical Remarks  (p. 200, 1 p.)
  Problems  (p. 201, 8 pp.)
  Computer exercises  (p. 209, 4 pp.)
  Bibliography  (p. 213, 2 pp.)

Linear Discriminant Functions  (p. 215, 67 pp.)
  Introduction  (p. 215, 1 p.)
  Linear Discriminant Functions and Decision Surfaces  (p. 216, 3 pp.)
  The Two-Category Case  (p. 216, 2 pp.)
  The Multicategory Case  (p. 218, 1 p.)
  Generalized Linear Discriminant Functions  (p. 219, 4 pp.)
  The Two-Category Linearly Separable Case  (p. 223, 4 pp.)
  Geometry and Terminology  (p. 224, 1 p.)
  Gradient Descent Procedures  (p. 224, 3 pp.)
  Minimizing the Perceptron Criterion Function  (p. 227, 8 pp.)
  The Perceptron Criterion Function  (p. 227, 2 pp.)
  Convergence Proof for Single-Sample Correction  (p. 229, 3 pp.)
  Some Direct Generalizations  (p. 232, 3 pp.)
  Relaxation Procedures  (p. 235, 3 pp.)
  The Descent Algorithm  (p. 235, 2 pp.)
  Convergence Proof  (p. 237, 1 p.)
  Nonseparable Behavior  (p. 238, 1 p.)
  Minimum Squared-Error Procedures  (p. 239, 10 pp.)
  Minimum Squared-Error and the Pseudoinverse  (p. 240, 1 p.)
  Constructing a Linear Classifier by Matrix Pseudoinverse  (p. 241, 1 p.)
  Relation to Fisher's Linear Discriminant  (p. 242, 1 p.)
  Asymptotic Approximation to an Optimal Discriminant  (p. 243, 2 pp.)
  The Widrow-Hoff or LMS Procedure  (p. 245, 1 p.)
  Stochastic Approximation Methods  (p. 246, 3 pp.)
  The Ho-Kashyap Procedures  (p. 249, 7 pp.)
  The Descent Procedure  (p. 250, 1 p.)
  Convergence Proof  (p. 251, 2 pp.)
  Nonseparable Behavior  (p. 253, 1 p.)
  Some Related Procedures  (p. 253, 3 pp.)
  Linear Programming Algorithms  (p. 256, 3 pp.)
  Linear Programming  (p. 256, 1 p.)
  The Linearly Separable Case  (p. 257, 1 p.)
  Minimizing the Perceptron Criterion Function  (p. 258, 1 p.)
  Support Vector Machines  (p. 259, 6 pp.)
  SVM Training  (p. 263, 1 p.)
  SVM for the XOR Problem  (p. 264, 1 p.)
  Multicategory Generalizations  (p. 265, 17 pp.)
  Kesler's Construction  (p. 266, 1 p.)
  Convergence of the Fixed-Increment Rule  (p. 266, 2 pp.)
  Generalizations for MSE Procedures  (p. 268, 1 p.)
  Summary  (p. 269, 1 p.)
  Bibliographical and Historical Remarks  (p. 270, 1 p.)
  Problems  (p. 271, 7 pp.)
  Computer exercises  (p. 278, 3 pp.)
  Bibliography  (p. 281, 1 p.)

Multilayer Neural Networks  (p. 282, 68 pp.)
  Introduction  (p. 282, 2 pp.)
  Feedforward Operation and Classification  (p. 284, 4 pp.)
  General Feedforward Operation  (p. 286, 1 p.)
  Expressive Power of Multilayer Networks  (p. 287, 1 p.)
  Backpropagation Algorithm  (p. 288, 8 pp.)
  Network Learning  (p. 289, 4 pp.)
  Training Protocols  (p. 293, 2 pp.)
  Learning Curves  (p. 295, 1 p.)
  Error Surfaces  (p. 296, 3 pp.)
  Some Small Networks  (p. 296, 2 pp.)
  The Exclusive-OR (XOR)  (p. 298, 1 p.)
  Larger Networks  (p. 298, 1 p.)
  How Important Are Multiple Minima?  (p. 299, 1 p.)
  Backpropagation as Feature Mapping  (p. 299, 4 pp.)
  Representations at the Hidden Layer---Weights  (p. 302, 1 p.)
  Backpropagation, Bayes Theory and Probability  (p. 303, 2 pp.)
  Bayes Discriminants and Neural Networks  (p. 303, 1 p.)
  Outputs as Probabilities  (p. 304, 1 p.)
  Related Statistical Techniques  (p. 305, 1 p.)
  Practical Techniques for Improving Backpropagation  (p. 306, 12 pp.)
  Activation Function  (p. 307, 1 p.)
  Parameters for the Sigmoid  (p. 308, 1 p.)
  Scaling Input  (p. 308, 1 p.)
  Target Values  (p. 309, 1 p.)
  Training with Noise  (p. 310, 1 p.)
  Manufacturing Data  (p. 310, 1 p.)
  Number of Hidden Units  (p. 310, 1 p.)
  Initializing Weights  (p. 311, 1 p.)
  Learning Rates  (p. 312, 1 p.)
  Momentum  (p. 313, 1 p.)
  Weight Decay  (p. 314, 1 p.)
  Hints  (p. 315, 1 p.)
  On-Line, Stochastic or Batch Training?  (p. 316, 1 p.)
  Stopped Training  (p. 316, 1 p.)
  Number of Hidden Layers  (p. 317, 1 p.)
  Criterion Function  (p. 318, 1 p.)
  Second-Order Methods  (p. 318, 6 pp.)
  Hessian Matrix  (p. 318, 1 p.)
  Newton's Method  (p. 319, 1 p.)
  Quickprop  (p. 320, 1 p.)
  Conjugate Gradient Descent  (p. 321, 1 p.)
  Conjugate Gradient Descent  (p. 322, 2 pp.)
  Additional Networks and Training Methods  (p. 324, 6 pp.)
  Radial Basis Function Networks (RBFs)  (p. 324, 1 p.)
  Special Bases  (p. 325, 1 p.)
  Matched Filters  (p. 325, 1 p.)
  Convolutional Networks  (p. 326, 2 pp.)
  Recurrent Networks  (p. 328, 1 p.)
  Cascade-Correlation  (p. 329, 1 p.)
  Regularization, Complexity Adjustment and Pruning  (p. 330, 20 pp.)
  Summary  (p. 333, 1 p.)
  Bibliographical and Historical Remarks  (p. 333, 2 pp.)
  Problems  (p. 335, 8 pp.)
  Computer exercises  (p. 343, 4 pp.)
  Bibliography  (p. 347, 3 pp.)

Stochastic Methods  (p. 350, 44 pp.)
  Introduction  (p. 350, 1 p.)
  Stochastic Search  (p. 351, 9 pp.)
  Simulated Annealing  (p. 351, 1 p.)
  The Boltzmann Factor  (p. 352, 5 pp.)
  Deterministic Simulated Annealing  (p. 357, 3 pp.)
  Boltzmann Learning  (p. 360, 10 pp.)
  Stochastic Boltzmann Learning of Visible States  (p. 360, 5 pp.)
  Missing Features and Category Constraints  (p. 365, 1 p.)
  Deterministic Boltzmann Learning  (p. 366, 1 p.)
  Initialization and Setting Parameters  (p. 367, 3 pp.)
  Boltzmann Networks and Graphical Models  (p. 370, 3 pp.)
  Other Graphical Models  (p. 372, 1 p.)
  Evolutionary Methods  (p. 373, 5 pp.)
  Genetic Algorithms  (p. 373, 4 pp.)
  Further Heuristics  (p. 377, 1 p.)
  Why Do They Work?  (p. 378, 1 p.)
  Genetic Programming  (p. 378, 16 pp.)
  Summary  (p. 381, 1 p.)
  Bibliographical and Historical Remarks  (p. 381, 2 pp.)
  Problems  (p. 383, 5 pp.)
  Computer exercises  (p. 388, 3 pp.)
  Bibliography  (p. 391, 3 pp.)

Nonmetric Methods  (p. 394, 59 pp.)
  Introduction  (p. 394, 1 p.)
  Decision Trees  (p. 395, 1 p.)
  CART  (p. 396, 15 pp.)
  Number of Splits  (p. 397, 1 p.)
  Query Selection and Node Impurity  (p. 398, 4 pp.)
  When to Stop Splitting  (p. 402, 1 p.)
  Pruning  (p. 403, 1 p.)
  Assignment of Leaf Node Labels  (p. 404, 1 p.)
  A Simple Tree  (p. 404, 2 pp.)
  Computational Complexity  (p. 406, 1 p.)
  Feature Choice  (p. 407, 1 p.)
  Multivariate Decision Trees  (p. 408, 1 p.)
  Priors and Costs  (p. 409, 1 p.)
  Missing Attributes  (p. 409, 1 p.)
  Surrogate Splits and Missing Attributes  (p. 410, 1 p.)
  Other Tree Methods  (p. 411, 2 pp.)
  ID3  (p. 411, 1 p.)
  C4.5  (p. 411, 1 p.)
  Which Tree Classifier Is Best?  (p. 412, 1 p.)
  Recognition with Strings  (p. 413, 8 pp.)
  String Matching  (p. 415, 3 pp.)
  Edit Distance  (p. 418, 2 pp.)
  Computational Complexity  (p. 420, 1 p.)
  String Matching with Errors  (p. 420, 1 p.)
  String Matching with the "Don't-Care" Symbol  (p. 421, 1 p.)
  Grammatical Methods  (p. 421, 8 pp.)
  Grammars  (p. 422, 2 pp.)
  Types of String Grammars  (p. 424, 1 p.)
  A Grammar for Pronouncing Numbers  (p. 425, 1 p.)
  Recognition Using Grammars  (p. 426, 3 pp.)
  Grammatical Inference  (p. 429, 2 pp.)
  Grammatical Inference  (p. 431, 1 p.)
  Rule-Based Methods  (p. 431, 22 pp.)
  Learning Rules  (p. 433, 1 p.)
  Summary  (p. 434, 1 p.)
  Bibliographical and Historical Remarks  (p. 435, 2 pp.)
  Problems  (p. 437, 9 pp.)
  Computer exercises  (p. 446, 4 pp.)
  Bibliography  (p. 450, 3 pp.)

Algorithm-Independent Machine Learning  (p. 453, 64 pp.)
  Introduction  (p. 453, 1 p.)
  Lack of Inherent Superiority of Any Classifier  (p. 454, 11 pp.)
  No Free Lunch Theorem  (p. 454, 3 pp.)
  No Free Lunch for Binary Data  (p. 457, 1 p.)
  Ugly Duckling Theorem  (p. 458, 3 pp.)
  Minimum Description Length (MDL)  (p. 461, 2 pp.)
  Minimum Description Length Principle  (p. 463, 1 p.)
  Overfitting Avoidance and Occam's Razor  (p. 464, 1 p.)
  Bias and Variance  (p. 465, 6 pp.)
  Bias and Variance for Regression  (p. 466, 2 pp.)
  Bias and Variance for Classification  (p. 468, 3 pp.)
  Resampling for Estimating Statistics  (p. 471, 4 pp.)
  Jackknife  (p. 472, 1 p.)
  Jackknife Estimate of Bias and Variance of the Mode  (p. 473, 1 p.)
  Bootstrap  (p. 474, 1 p.)
  Resampling for Classifier Design  (p. 475, 7 pp.)
  Bagging  (p. 475, 1 p.)
  Boosting  (p. 476, 4 pp.)
  Learning with Queries  (p. 480, 2 pp.)
  Arcing, Learning with Queries, Bias and Variance  (p. 482, 1 p.)
  Estimating and Comparing Classifiers  (p. 482, 13 pp.)
  Parametric Models  (p. 483, 1 p.)
  Cross-Validation  (p. 483, 2 pp.)
  Jackknife and Bootstrap Estimation of Classification Accuracy  (p. 485, 1 p.)
  Maximum-Likelihood Model Comparison  (p. 486, 1 p.)
  Bayesian Model Comparison  (p. 487, 2 pp.)
  The Problem-Average Error Rate  (p. 489, 3 pp.)
  Predicting Final Performance from Learning Curves  (p. 492, 2 pp.)
  The Capacity of a Separating Plane  (p. 494, 1 p.)
  Combining Classifiers  (p. 495, 22 pp.)
  Component Classifiers with Discriminant Functions  (p. 496, 2 pp.)
  Component Classifiers without Discriminant Functions  (p. 498, 1 p.)
  Summary  (p. 499, 1 p.)
  Bibliographical and Historical Remarks  (p. 500, 2 pp.)
  Problems  (p. 502, 6 pp.)
  Computer exercises  (p. 508, 5 pp.)
  Bibliography  (p. 513, 4 pp.)

Unsupervised Learning and Clustering  (p. 517, 84 pp.)
  Introduction  (p. 517, 1 p.)
  Mixture Densities and Identifiability  (p. 518, 1 p.)
  Maximum-Likelihood Estimates  (p. 519, 2 pp.)
  Application to Normal Mixtures  (p. 521, 9 pp.)
  Case 1: Unknown Mean Vectors  (p. 522, 2 pp.)
  Case 2: All Parameters Unknown  (p. 524, 2 pp.)
  k-Means Clustering  (p. 526, 2 pp.)
  Fuzzy k-Means Clustering  (p. 528, 2 pp.)
  Unsupervised Bayesian Learning  (p. 530, 7 pp.)
  The Bayes Classifier  (p. 530, 1 p.)
  Learning the Parameter Vector  (p. 531, 3 pp.)
  Unsupervised Learning of Gaussian Data  (p. 534, 2 pp.)
  Decision-Directed Approximation  (p. 536, 1 p.)
  Data Description and Clustering  (p. 537, 5 pp.)
  Similarity Measures  (p. 538, 4 pp.)
  Criterion Functions for Clustering  (p. 542, 6 pp.)
  The Sum-of-Squared-Error Criterion  (p. 542, 1 p.)
  Related Minimum Variance Criteria  (p. 543, 1 p.)
  Scatter Criteria  (p. 544, 2 pp.)
  Clustering Criteria  (p. 546, 2 pp.)
  Iterative Optimization  (p. 548, 2 pp.)
  Hierarchical Clustering  (p. 550, 7 pp.)
  Definitions  (p. 551, 1 p.)
  Agglomerative Hierarchical Clustering  (p. 552, 3 pp.)
  Stepwise-Optimal Hierarchical Clustering  (p. 555, 1 p.)
  Hierarchical Clustering and Induced Metrics  (p. 556, 1 p.)
  The Problem of Validity  (p. 557, 2 pp.)
  On-Line Clustering  (p. 559, 7 pp.)
  Unknown Number of Clusters  (p. 561, 2 pp.)
  Adaptive Resonance  (p. 563, 2 pp.)
  Learning with a Critic  (p. 565, 1 p.)
  Graph-Theoretic Methods  (p. 566, 2 pp.)
  Component Analysis  (p. 568, 5 pp.)
  Principal Component Analysis (PCA)  (p. 568, 1 p.)
  Nonlinear Component Analysis (NLCA)  (p. 569, 1 p.)
  Independent Component Analysis (ICA)  (p. 570, 3 pp.)
  Low-Dimensional Representations and Multidimensional Scaling (MDS)  (p. 573, 28 pp.)
  Self-Organizing Feature Maps  (p. 576, 4 pp.)
  Clustering and Dimensionality Reduction  (p. 580, 1 p.)
  Summary  (p. 581, 1 p.)
  Bibliographical and Historical Remarks  (p. 582, 1 p.)
  Problems  (p. 583, 10 pp.)
  Computer exercises  (p. 593, 5 pp.)
  Bibliography  (p. 598, 3 pp.)

A  Mathematical Foundations  (p. 601, 36 pp.)
  A.1 Notation  (p. 601, 3 pp.)
  A.2 Linear Algebra  (p. 604, 6 pp.)
  A.2.1 Notation and Preliminaries  (p. 604, 1 p.)
  A.2.2 Inner Product  (p. 605, 1 p.)
  A.2.3 Outer Product  (p. 606, 1 p.)
  A.2.4 Derivatives of Matrices  (p. 606, 2 pp.)
  A.2.5 Determinant and Trace  (p. 608, 1 p.)
  A.2.6 Matrix Inversion  (p. 609, 1 p.)
  A.2.7 Eigenvectors and Eigenvalues  (p. 609, 1 p.)
  A.3 Lagrange Optimization  (p. 610, 1 p.)
  A.4 Probability Theory  (p. 611, 12 pp.)
  A.4.1 Discrete Random Variables  (p. 611, 1 p.)
  A.4.2 Expected Values  (p. 611, 1 p.)
  A.4.3 Pairs of Discrete Random Variables  (p. 612, 1 p.)
  A.4.4 Statistical Independence  (p. 613, 1 p.)
  A.4.5 Expected Values of Functions of Two Variables  (p. 613, 1 p.)
  A.4.6 Conditional Probability  (p. 614, 1 p.)
  A.4.7 The Law of Total Probability and Bayes' Rule  (p. 615, 1 p.)
  A.4.8 Vector Random Variables  (p. 616, 1 p.)
  A.4.9 Expectations, Mean Vectors and Covariance Matrices  (p. 617, 1 p.)
  A.4.10 Continuous Random Variables  (p. 618, 2 pp.)
  A.4.11 Distributions of Sums of Independent Random Variables  (p. 620, 1 p.)
  A.4.12 Normal Distributions  (p. 621, 2 pp.)
  A.5 Gaussian Derivatives and Integrals  (p. 623, 5 pp.)
  A.5.1 Multivariate Normal Densities  (p. 624, 2 pp.)
  A.5.2 Bivariate Normal Densities  (p. 626, 2 pp.)
  A.6 Hypothesis Testing  (p. 628, 2 pp.)
  A.6.1 Chi-Squared Test  (p. 629, 1 p.)
  A.7 Information Theory  (p. 630, 3 pp.)
  A.7.1 Entropy and Information  (p. 630, 2 pp.)
  A.7.2 Relative Entropy  (p. 632, 1 p.)
  A.7.3 Mutual Information  (p. 632, 1 p.)
  A.8 Computational Complexity  (p. 633, 4 pp.)
  Bibliography  (p. 635, 2 pp.)

Index  (p. 637)