Text Mining (SAS track) 2400-ZEWW968
The objective of the course is to provide students with a comprehensive introduction to modern methods of text data exploration and analysis using tools available in the SAS environment. In light of the growing importance of unstructured data sources such as social media, online forums, corporate documents, and reports, the ability to process and analyze such data has become an essential part of a data analyst’s skill set.
Throughout the course, students will gain knowledge of both the theoretical foundations of natural language processing and practical techniques for preparing and cleaning text data, including tokenization, stop word removal, text normalization, and conversion into structures that enable further analysis (e.g., frequency matrices, feature vectors). The course will focus on the use of functionalities offered by SAS Text Miner and other SAS components designed for the analysis of unstructured data.
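The preparation steps named above (normalization, tokenization, stop word removal, and conversion into a frequency matrix) can be sketched in a few lines. This is a minimal illustration in plain Python, not SAS Text Miner code. The stop word list, the function names `preprocess` and `term_frequency_matrix`, and the sample documents are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "but"}


def preprocess(text):
    """Normalize to lowercase, tokenize on runs of letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequency_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample documents.
docs = [
    "The delivery was fast and the packaging was great.",
    "Delivery was slow, but the support team was helpful.",
]
vocabulary, matrix = term_frequency_matrix(docs)
for doc_row in matrix:
    print(dict(zip(vocabulary, doc_row)))
```

Each row of `matrix` is a feature vector for one document, which is the kind of structure that clustering or classification methods take as input.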
Students will learn to apply selected text mining methods such as frequency analysis, document clustering, sentiment analysis, topic extraction, and building classification models based on machine learning algorithms. Particular emphasis will be placed on the practical application of these methods in the context of real-world analytical challenges, such as customer opinion analysis, monitoring narratives about organizations, and supporting decision-making processes based on unstructured data.
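As a rough illustration of one of these methods, sentiment analysis in its simplest form can be done by counting words from positive and negative lexicons. The sketch below uses plain Python rather than SAS. The two mini-lexicons, the function name `sentiment_score`, and the sample opinions are hypothetical and chosen only for the example. Production tools rely on much larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"great", "fast", "helpful", "excellent"}
NEGATIVE = {"slow", "broken", "rude", "poor"}


def sentiment_score(text):
    """Score in [-1, 1]: (positive - negative) / matched words; 0.0 if none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


# Hypothetical customer opinions.
opinions = [
    "Fast delivery and great support",
    "Slow and rude, but helpful",
    "The parcel arrived on Tuesday",
]
for opinion in opinions:
    print(f"{sentiment_score(opinion):+.2f}  {opinion}")
```

A score near +1 suggests a mostly positive opinion, a score near -1 a mostly negative one, and 0.0 means either a balance or no lexicon words at all. Negation ("not great") is one case this simple approach misses.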
An integral part of the course will be independent student projects, which take participants through the full analytical cycle, from data preparation through exploration and modeling to the interpretation and presentation of results. The course aims not only to develop technical skills but also to foster the competencies needed for the informed and responsible use of text mining methods in business, public administration, and science.
Estimated student workload: 2 ECTS × 25h = 50h
(K) - contact hours (S) - hours of independent work
exercises (classes): 30h (K) 0h (S)
consultations: 2h (K) 0h (S)
preparation for exercises: 0h (K) 10h (S)
work with additional materials: 0h (K) 8h (S)
Total: 32h (K) + 18h (S) = 50h
Learning outcomes
Learning outcomes (codes): K_W01, K_W02, K_U01, K_K01.
Upon completion of the course, the student:
Knowledge
is familiar with text data exploration methods and understands their specific characteristics,
is familiar with the functions and capabilities of the SAS environment for analyzing unstructured data,
is able to describe the process of natural language processing, including methods for preparing text data for analysis,
understands the importance of text data analysis in the context of real-world business, social, and scientific problems.
Skills
is able to prepare text data sets for analysis using text mining techniques,
is able to apply selected text analysis methods such as frequency analysis, sentiment analysis, topic extraction, and document clustering,
is able to carry out the full analytical cycle, from data preparation to the presentation of results,
is able to use SAS tools to practically solve analytical problems involving text data.
Social Competences
demonstrates independence in applying theoretical knowledge in the field of text data exploration and analysis, as well as in working with real empirical examples,
demonstrates responsibility and self-discipline when working in settings that require choosing appropriate analytical methods and tools,
works systematically, organizing their own work effectively and completing practical tasks during classes,
shows interest in applying text data analysis to solve current business, social, and scientific problems,
acts with integrity and academic honesty by adhering to methodological standards and fulfilling course and examination requirements.
Assessment criteria
The course grade is based on two components: active class participation (50%) and a final project (50%), in which the student independently designs and carries out a complete text data analysis process using the methods and tools covered during the course.
Bibliography
Required reading (selected chapters):
Spinczyk D., Dzieciątko M., Text Mining: metody, narzędzia i zastosowania, PWN 2016
Silge J., Robinson D., Text Mining with R: A Tidy Approach, O’Reilly 2024, https://www.tidytextmining.com/
Wróblewski P., Machine learning i natural language processing w programowaniu. Podręcznik z ćwiczeniami w Pythonie, Helion 2024
Additional literature (selected chapters):
Gutman A. J., Goldmeier J., Analityk danych. Przewodnik po data science, statystyce i uczeniu maszynowym, Helion 2023
Jurafsky D., Martin J. H., Speech and Language Processing, 3rd ed., 2025, https://web.stanford.edu/~jurafsky/slp3/
Additional information
Additional information (registration calendar, instructors, class locations and schedules) may be available in the USOSweb system.