Text Mining 1600-SZD-SPEC-TM-EF
Text mining is an interdisciplinary approach to the quantitative analysis of textual data. It can be a valuable tool for analyzing large collections of text, such as web portals, newspapers, academic articles, social media, archives, transcripts, or government documents. Using techniques such as tokenization, lemmatization, sentiment analysis, and topic classification, automated text analysis makes it possible to explore content and identify potentially relevant patterns, trends, and information. Textual data can then be processed, categorized, and visualized to better understand economic, social, or political processes. The first part of the course covers basic text mining methods (data preparation, text categorization, clustering, topic modeling, sentiment analysis, visualization), while the second part covers advanced methods (word embeddings, neural networks, large language models).
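As a rough illustration of the basic techniques named above, the following minimal Python sketch tokenizes a few sentences, counts term frequencies after stop-word removal, and computes a crude lexicon-based sentiment score. The corpus, the word lists, and all function names are hypothetical and invented for this example; they are not course material. Lemmatization is not covered here, and real projects would typically rely on curated lexicons and a dedicated library.

```python
import re
from collections import Counter

# Hypothetical mini-corpus, invented for illustration only.
DOCUMENTS = [
    "The central bank raised interest rates again.",
    "Markets welcomed the good news about interest rates.",
    "Voters gave a bad review of the new tax policy.",
]

# Tiny illustrative word lists (assumptions, not curated resources).
STOP_WORDS = {"the", "a", "of", "about", "again"}
POSITIVE = {"good", "welcomed"}
NEGATIVE = {"bad"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count tokens across all documents, skipping stop words."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


def sentiment_score(text):
    """Return positive-word count minus negative-word count."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives


if __name__ == "__main__":
    print(term_frequencies(DOCUMENTS).most_common(3))
    for doc in DOCUMENTS:
        print(sentiment_score(doc), doc)
```

A word-count score of this kind ignores negation and context, which is one reason the more advanced methods in the second part of the course (word embeddings, neural networks, large language models) are of interest.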
Type of course
Course coordinators
Learning outcomes
Knowledge | The graduate knows and understands:
WG_01 - to the extent necessary for existing paradigms to be revised - a worldwide body of work, covering theoretical foundations as well as general and selected specific issues - relevant to a particular discipline within the social sciences
WG_02 - the main development trends in the disciplines of the social sciences in which the education is provided
WG_03 - scientific research methodology in the field of the social sciences
WK_01 - fundamental dilemmas of modern civilisation from the perspective of the social sciences
Skills | The graduate is able to:
UK_05 - speak a foreign language at B2 level of the Common European Framework of Reference for Languages, using the professional terminology specific to the discipline within the social sciences, to an extent enabling participation in an international scientific and professional environment
Social competences | The graduate is ready to:
KO_01 - fulfil the social obligations of researchers and creators
KO_02 - fulfil social obligations and take action in the public interest, in particular by initiating actions in the public interest
KO_03 - think and act in an entrepreneurial manner
Other outcomes: ‒ The student has knowledge of text mining; ‒ The student is familiar with text mining methodology; ‒ The student is able to apply knowledge of text mining in their own research
Assessment criteria
Description of requirements related to participation in classes, including the permitted number of excused absences: 2 absences are allowed
Principles for passing the classes and the subject (including resit session): Preparation of a project
Methods for the verification of learning outcomes: Evaluation of the prepared project
Evaluation criteria: The project is worth a maximum of 100 points. A passing final grade requires at least 50 points.
Practical placement
-
Bibliography
C. C. Aggarwal (2022). Machine Learning for Text. Springer
C. C. Aggarwal, C. Zhai (2012). Mining Text Data. Springer
J. Silge, D. Robinson (2020). Text Mining with R. O’Reilly Media
B. Bengfort, R. Bilbro, T. Ojeda (2018). Applied Text Analysis with Python. O’Reilly Media
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system: