Table of Contents for Document Warehousing and Text Mining
Chapter/Section Title
Page #
Page Count
Acknowledgments
xiii
 
Preface
xv
 
Part One: Text Analysis for Business Intelligence
1
78
Expanding the Scope of Business Intelligence
3
26
The Need to Deal with Text
3
3
Growth of Textual Information: The Good News
6
1
Growth of Textual Information: The Bad News
7
2
Finding Information: It's Not as Easy as It Used to Be
7
2
Beware What You Wish for: Finding Too Much Information
9
1
The Document Warehousing Approach to the Information Glut
9
11
Supporting Business Intelligence with Text
10
1
Defining the Document Warehouse
11
9
The Role of Text Mining in Document Warehousing
20
7
Building the Document Warehouse
22
2
Benefits of Document Warehousing
24
3
Conclusions
27
2
Understanding the Structure of Text: The Foundation of Text-Based Business Intelligence
29
26
The Myth of Unstructured Texts
30
1
Natural Structures: It's All in Your Head
31
12
The Building Blocks of Language
31
11
Working with Statistical Techniques
42
1
Macrostructures: Introducing Artificial Structures in Documents
43
10
Hierarchical Conventions from Words to Documents
45
1
The Jewel in the Crown: Markup Languages for Arbitrary Structure
46
4
It Isn't So Linear After All: Hypertext
50
3
Conclusions
53
2
Exploiting the Structure of Text
55
24
Text-Oriented Business Intelligence Operations
56
9
Summarizing Documents
57
1
Classifying and Routing Documents
58
2
Answering Questions
60
1
Searching and Browsing by Topic and Theme
61
3
Searching by Example
64
1
Text-Oriented Business Intelligence Techniques
65
9
Full Text Searching: Text Processing 101
65
5
Undirected Summarization
70
1
Document Clustering
71
3
Integration with the Data Warehouse
74
2
Dimensional Models: A Quick Refresher
74
2
Integration with the World Wide Web
76
1
Adapting to Changing Users' Interests
77
1
Conclusions
78
1
Part Two: Document Warehousing
79
242
Overview of Document Warehousing
81
22
Meeting Business Intelligence Requirements
82
2
Who Are the End Users?
82
1
What Information Is Needed?
82
1
When Is It Needed?
83
1
Where Is the Information Found?
84
1
The Role of the Document Warehouse in Business Intelligence
84
1
The Architecture of the Document Warehouse
85
11
Document Sources
86
3
Text Processing Servers
89
3
Text Databases and Other Storage Options
92
1
Metadata Repositories
93
1
User Profiling
94
2
The Process of Document Warehousing
96
6
Identifying Document Sources
96
2
Document Retrieval
98
1
Preprocessing Operations
99
2
Text Analysis Operations
101
1
Managing the Document Warehouse
101
1
Supporting End-User Operations
102
1
Conclusions
102
1
Meeting Business Intelligence Requirements: More Than Just Numbers
103
20
A Variety of Problems to Choose From
104
4
Intelligent Document Management
104
1
Historical Reporting and Trend Analysis
105
1
Market Monitoring
106
1
Competitive Intelligence
107
1
Defining the Business Objectives
108
9
Getting What You Want from Your Text
108
4
Answering the Right Business Questions
112
2
Determining Who Will Use the System
114
1
Extracting the Right Information for Future Processing and Searching
115
1
Classifying Documents for Browsing
115
2
Setting the Scope
117
4
Time Requirements
118
1
Space Requirements and Sizing the Document Warehouse
119
1
Creating the Document Warehouse Project Plan
120
1
Design and Development
120
1
Conclusions
121
2
Designing the Document Warehouse Architecture
123
36
Document Sources
124
12
File Servers
125
4
Document Management Systems
129
5
Internet Resources
134
1
From Document Sources to Text Analysis
135
1
Text Processing Servers
136
5
Using Crawlers and Agents to Retrieve Documents
136
4
Text Analysis Services
140
1
Document Warehouse Storage Options
141
3
Database Options
142
2
The Metadata Repository and Document Data Model
144
6
Document Content Metadata
144
1
Search and Retrieval Metadata
145
2
Text Mining Metadata
147
1
Storage Metadata
148
1
Document Data Model
149
1
User Profiles and End-User Support
150
5
End-User Profiles
153
2
Data Warehouse and Data Mart Integration
155
3
Linking Numbers and Text
156
1
Integration Heuristics
157
1
Conclusions
158
1
Finding and Retrieving Relevant Text
159
24
Manual Retrieval Methods
160
3
Search Tools
161
2
Automatic Retrieval Methods
163
8
Data-Driven Searching
163
1
Searching Internal Networks
164
1
Configuring Crawlers
165
4
Batch versus Interactive Retrieval
169
2
Retrieving from Document Processing Systems
171
1
Tradeoffs between Manual and Automatic Retrieval
171
3
Precision
172
1
Recall
172
1
Cost
173
1
Effectiveness
173
1
Text Management Issues
174
2
Avoiding Duplication
174
1
Accommodating Document Revisions and Versioning
175
1
Assessing the Reliability of a Source
175
1
Improving Performance
176
6
Representing Users' Areas of Interest
176
1
Data Store for Interest Specifications
177
1
Creating Interest Specifications
178
2
Interest Specifications Drive Searching
180
1
Prototype-Driven Searching
181
1
Conclusions
182
1
Loading and Transforming Documents
183
26
Internationalization and Character Set Issues
184
2
Coded Character Sets
185
1
Translating Documents
186
9
Indexing Text
195
3
Full Text Indexing
195
1
Thematic Indexing
196
2
Document Classification
198
3
Labeling
198
2
Multidimensional Taxonomies
200
1
Document Clustering
201
3
Binary Relational Clustering
202
1
Hierarchical Clustering
202
1
Self-Organizing Map Clustering
202
2
Summarizing Text
204
4
Basic Summarization Methods
205
1
Dealing with Large Documents
206
2
Conclusions
208
1
Managing Document Warehouse Metadata
209
30
Metadata Standards
210
18
Common Warehouse Model
212
9
Knowledge Management Based on the Open Information Model
221
2
Dublin Core
223
5
Adapting Metadata Standards to Document Warehousing
228
1
Content Metadata
228
2
Technical Metadata
230
5
Controlling Document Loads in the Warehouse
231
2
Prioritizing Items in Multiple Processing Queues
233
1
Summarizing Documents
234
1
Business Metadata
235
3
Quality: Timeliness and Reliability
236
1
Access Control
236
1
Versioning
237
1
Conclusions
238
1
Ensuring Document Warehouse Integrity
239
22
Information Stewardship and Quality Control
240
14
Document Search and Retrieval
241
7
Text Analysis
248
5
Content Validation
253
1
Security
254
4
File System Security
255
1
Database Roles and Privileges
255
1
Programmatic Access Control
255
1
Virtual Database Security
256
2
Privacy
258
2
Contracts between Document Owners and the Warehouse
258
1
Is Privacy the Third Rail of Business Intelligence? Protecting Individuals and Organizations
259
1
Conclusions
260
1
Choosing Tools for Building the Document Warehouse
261
44
Choosing Text Analysis Tools
262
36
Statistical/Heuristic Approach
264
11
The Knowledge-Based Approach
275
12
Neural Network Approach: Megaputer's TextAnalyst
287
7
There Is More Than One Way to Mine Text
294
4
Choosing Supplemental Tools
298
3
Choosing Web Document Retrieval Tools
301
2
Conclusions
303
2
Developing a Document Warehouse: A Checklist
305
16
What Should Be Stored?
306
4
Understanding User Needs
306
1
Defining Document Sources
307
1
Metadata
308
1
User Profiles
309
1
Integration with the Data Warehouse
309
1
Where Should It Be Stored?
310
2
What Text Mining Services Should Be Used?
312
4
Indexing Services
313
1
Feature Extraction
314
1
Summarization
314
1
Document Clustering
314
1
Question Answering
315
1
Classification and Routing
315
1
Building Taxonomies and Thesauri
316
1
How Should the Warehouse Be Populated?
316
3
Crawlers
317
1
Searching
318
1
How Should the Warehouse Be Maintained?
319
1
Conclusions
319
2
Part Three: Text Mining
321
136
What Is Text Mining?
323
46
Defining Text Mining
324
2
Foundations of Text Mining
326
30
Information Retrieval
327
14
Computational Linguistics and Natural Language Processing
341
10
Discovering Knowledge in Text: Example Cases
351
5
Text Mining Methodology: Using the Cross-Industry Process Model for Data Mining
356
9
Business Understanding
358
1
Data Understanding
359
1
Data Preparation
360
1
Modeling
361
1
Evaluation
362
1
Deployment
362
1
Adapting the CRISP-DM to Text Mining
363
2
Text Mining Applications
365
3
Knowing Your Business
366
1
Knowing Your Customer
366
1
Knowing Your Competition and Market
367
1
Conclusions
368
1
Know Thyself: Using Text Mining for Operational Management
369
18
Operations and Projects: Understanding the Distinct Needs of Each
371
3
Enterprise Document Management Systems
374
4
Benefits of Enterprise Document Management Systems
374
1
Limits of Enterprise Document Management Systems
375
3
Integrating Document Management with Document Warehousing
378
3
Document Extraction
378
3
Steps to Effective Text Mining for Operational Management
381
3
Specifying a Process for Extracting Information
381
3
Meeting Wide-Ranging Organizational Needs
384
1
Conclusions
385
2
Knowing Your Business-to-Business Customer: Text Mining for Customer Relationship Management
387
20
Understanding Your Customer's Market
388
1
Developing a Customer Intelligence Profile
389
2
Sample Case of B-to-B Customer Relationship Management
391
15
Getting the Information I: Internal Sources
391
6
Getting the Information II: External Documents
397
1
Collecting External Documents
398
1
Preliminary Document Analysis
399
7
Conclusions
406
1
Text Mining for Competitive Intelligence
407
30
Competitive Intelligence versus Business Intelligence
408
2
Competitive Intelligence Profiles
410
3
Identifying Information Sources
413
3
XML Text Processing Operations
416
9
XML Interface Models
417
6
Getting Financial Information from XBRL Documents
423
2
The Practice of Competitive Intelligence
425
11
Competitive Intelligence in Health Care: Patent Analysis
426
5
Competitive Intelligence in Manufacturing: Financial Analysis
431
2
Competitive Intelligence in Financial Services Market: Market Issue Analysis
433
3
Conclusions
436
1
Text Mining Tools
437
20
Criteria for Choosing Tools
438
15
Preprocessing Tools
438
8
Text Mining Tool Selection
446
7
Still Looking for a Silver Bullet: The Limits of Text Mining
453
3
Discourse Analysis
454
1
Semantic Models
455
1
Conclusions
456
1
Part Four: Conclusions
457
26
Changes in Business Intelligence
459
24
Business Intelligence and the Dynamics of Organizations
459
7
Changing Decision Makers
461
1
Changing Technologies
462
1
Changing Strategies
463
3
Meeting BI Needs with the Document Warehouse and Text Mining
466
8
The Process of Document Warehousing
466
8
Text Mining for Decision Support
474
4
Text Mining and Operational Management
475
1
Text Mining and Customer Relationship Management
475
1
Text Mining and Competitive Intelligence
476
2
Shifting Emphasis of BI
478
1
Text, Not Just Numbers
478
1
Heuristics, Not Just Algorithms
478
1
Distributed Intelligence, Centralized Management
479
1
Next Steps: Where Do We Go from Here?
479
2
Conclusions
481
2
Appendix A: Templates
483
4
Appendix B: Tools and Resources
487
10
Appendix C: Basic Document Warehouse Data Model
497
18
Bibliography
515
8
Glossary
523
4
Index
527