Text Mining 2400-ZEWW330
1. Introduction to methods of analyzing unstructured data: Data Mining, Text Mining and Web Mining.
2. Functionality and tools of SAS Enterprise Miner.
3. Functionality and tools of SAS Text Miner.
4. Search methods for text information. Decomposition of text data. Quantitative representation of a set of documents.
5. Automatic processing of text data. Identification of keywords.
6. Stop lists and start lists. Canonical forms. Weighting functions. Frequency weights.
7. Transformation of text data. Reducing the size of the frequency matrix.
8. Data visualization. Creating a concept link tree.
9. Analysis of large document repositories. Using the %tmfilter macro in the text mining process.
10. Web content analysis. Using the %tmfilter macro in the web mining process.
11. Clustering methods. Analysis of segment and cluster profiles.
12. Classification models. Scoring. Evaluation of the generated model.
13. Clustering of text data and predictive modeling.
14. Predictions based on unstructured text.
15. Integration with other SAS Enterprise Miner packages. Other Text Mining tools.
Learning outcomes
Through participation in the course, students acquire knowledge of statistical methods useful in analyzing unstructured data, together with example applications for discovering previously unknown relationships, patterns and trends in collected data sets, as well as practical skills in using SAS Enterprise Miner and SAS Text Miner.
KW01, KW02, KW03, KU01, KU02, KU03, KK01, KK02, KK03
Assessment criteria
Students are graded on a final project in which they design and implement their own model for analyzing text data.
Bibliography
Obligatory reading:
[1] Lasek M., Pęczkowski M., Enterprise Miner. Wykorzystywanie narzędzi Data Mining w systemie SAS.
[2] Lasek M., Data Mining. Zastosowania w analizach i ocenach klientów bankowych, Oficyna Wydawnicza „Zarządzanie i finanse”, Warszawa 2002.
[3] Witkowska D., Sztuczne sieci neuronowe i metody statystyczne. Wybrane zagadnienia finansowe, Wydawnictwo C.H. Beck, Warszawa 2002.
[4] Text Mining Using SAS Software, SAS Education.
Further reading:
[1] Frątczak E., Pęczkowski M., Sienkiewicz K., Skaskiewicz K., Statystyka od podstaw z systemem SAS, ISBN 83-7225-179-7, Oficyna Wydawnicza Szkoły Głównej Handlowej, Warszawa 2002.
[2] Giudici P., Applied Data Mining. Statistical Methods for Business and Industry, Wiley 2003.
[3] Hadasik D. (1998), Upadłość przedsiębiorstw w Polsce i metody jej prognozowania, Wydawnictwo Akademii Ekonomicznej w Poznaniu, Poznań.
[4] Jagielska J., Matthews Ch., Whitfort T. (1999), An investigation into the application of neural networks, fuzzy logic, genetic algorithms, and rough sets to automated knowledge acquisition for classification problems, Neurocomputing 24, 37-54.
[5] Jain L.C., Martin N.M. (eds.) (1999), Fusion of Neural Networks, Fuzzy Sets, and Genetic Algorithms. Industrial Applications, CRC Press.
[6] Kudyba S., Managing Data Mining. Advice from Experts, IT Solutions Series, ISBN 1-59140-243-3, CyberTech Publishing, Idea Group Inc. 2004.
[7] Nelles O. (2001), Nonlinear System Identification. From Classical Approaches to Neural Networks and Fuzzy Models, Springer Verlag, Berlin Heidelberg.
[8] Osowski S. (2001), Sieci neuronowe wykorzystujące systemy wnioskowania rozmytego, Software no. 2, pp. 18-20 and 62.
[9] Raudys Š. (2001), Statistical and Neural Classifiers. An Integrated Approach to Design, Springer-Verlag, London.
[10] Ribeiro R., Zimmermann H.-J., Yager R., Kacprzyk J. (1999), Soft Computing in Financial Engineering, Studies in Fuzziness and Soft Computing, vol. 28, Physica-Verlag, Heidelberg.
[11] Wang J. (ed.), Data Mining. Opportunities and Challenges, IRM Press 2003.
[12] Witten I.H., Frank E. (2000), Data Mining. Practical Machine Learning Tools and Techniques with Java Implementations, Academic Press, Morgan Kaufmann Publishers.
[13] Zwierz U., Wstęp do systemu SAS, Oficyna Wydawnicza Szkoły Głównej Handlowej, Warszawa 2001.
[14] Data & Text Mining, Prentice Hall.
Additional information
Additional information (registration calendar, class instructors, locations and class schedules) may be available in the USOSweb system.