Text analysis in economic research using econometric models and machine learning algorithms 2400-PL3SL338A
Initially, the principles of academic writing will be discussed, including the formulation of research problems, objectives, research questions and hypotheses, conducting literature reviews, selecting appropriate research methods, interpreting results, and, ultimately, the proper preparation and presentation of a Bachelor's dissertation. In addition, the theoretical and methodological foundations of text analysis will be introduced, including sources of textual data, natural language processing methods, econometric models incorporating textual data, and selected machine learning algorithms. Particular attention will be devoted to the methodology of sentiment analysis, topic modelling, and the development of econometric and predictive models based on text-derived variables. Subsequently, the seminar will be conducted in the form of workshops focusing on the practical implementation of research projects, the collection and preparation of textual data, the application of selected analytical methods, the presentation of results, and consultations regarding subsequent stages of the Bachelor's dissertation.
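To illustrate what a model "based on text-derived variables" can look like, the following minimal sketch turns each document into a dictionary-based sentiment score and then regresses an outcome on that score by ordinary least squares. It is an illustration under stated assumptions, not the method prescribed by the seminar: the word lists, the documents, the outcome values and the helper names `sentiment_score` and `ols_simple` are all hypothetical.

```python
import re

# Hypothetical toy lexicon; real projects would use an established dictionary.
POSITIVE = {"growth", "strong", "improve", "gain"}
NEGATIVE = {"loss", "weak", "decline", "risk"}


def sentiment_score(text):
    """Return (positive count - negative count) divided by the number of tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def ols_simple(x, y):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up documents and outcome values, for illustration only.
docs = [
    "Strong growth and gain expected",
    "Weak demand and decline in sales",
    "Risk of loss remains",
    "Results improve",
]
outcome = [1.2, -0.8, -0.5, 0.9]

# Step 1: text -> numeric variable; step 2: numeric variable -> regression.
scores = [sentiment_score(d) for d in docs]
intercept, slope = ols_simple(scores, outcome)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

In an actual dissertation project the score would typically come from a validated lexicon or a trained classifier, and the regression would include control variables and inference on the coefficients; the sketch only shows the two-step structure.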
Activity type                                                      C (contact)   S (self-study)
lecture (classes)                                                  15            0
exercise classes (classes)                                         15            0
exam                                                               0             0
consultations                                                      30            0
preparation for exercise classes                                   0             15
preparation for lectures                                           0             0
work with additional materials posted on the Moodle platform       0             0
preparation for the mid-term test                                  0             0
preparation for the exam                                           0             0
Total                                                              60            15   (= 75)
Course coordinators
Type of course
Learning outcomes
The student is familiar with the principles of preparing a Bachelor's dissertation and knows how to formulate research problems, objectives, research questions and hypotheses correctly, conduct literature reviews, select appropriate research methods, interpret results, and write and present a Bachelor's dissertation in proper academic form. The student understands the theoretical and methodological foundations of text analysis and is able to select methods appropriate to a defined research problem. The student understands the characteristics of econometric methods and machine learning algorithms and is able to construct simple models.
Assessment criteria
The requirements for passing the autumn semester are the selection of a Bachelor's dissertation topic, the presentation of the dissertation outline during the seminar, and the submission of a dissertation chapter containing the literature review from which the research hypotheses are derived. The requirement for passing the summer semester is the submission of the completed Bachelor's dissertation.
Bibliography
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of machine learning research, 3(Jan), 993-1022.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint, DOI: 10.48550/arXiv.1810.04805.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.
Kherwa, P., & Bansal, P. (2020). Topic modelling: a comprehensive review. EAI Endorsed transactions on scalable information systems, 7(24).
Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams engineering journal, 5(4), 1093-1113, DOI: 10.1016/j.asej.2014.04.011.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. Retrieved from: cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
Ramos, J. (2003). Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (Vol. 242, No. 1, pp. 29-48).
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint, DOI: 10.48550/arXiv.1908.10084.
Stevens, K., Kegelmeyer, P., Andrzejewski, D., & Buttler, D. (2012). Exploring topic coherence over many models and many topics. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (pp. 952-961).
Tunstall, L., Von Werra, L., & Wolf, T. (2022). Natural language processing with transformers: Building language applications with Hugging Face (1st ed.). O'Reilly Media.
Wankhade, M., Rao, A. C. S., & Kulkarni, C. (2022). A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55, 5731-5780, DOI: 10.1007/s10462-022-10144-1.