Introduction to Computational Social Science 3700-MSNS-24-ICSS
This course is an introduction to computational methods for social scientists and does not require any previous programming experience. During the first part of the course, students will learn the basics of Python and how it can be useful for research in social science. Through simple examples, they will become familiar with programming concepts in Python such as scalar objects, primitive operations, branching, iteration, functions, complex objects, data handling, and ‘memory-independent computing’.
The second part of the course will be devoted to how new computational methods may be applied in social science and how they can be used to study phenomena that are hard to track with traditional methods. It will cover basic concepts of computational social science: how to use external data sources (primarily web-based), the most important web data formats, popular computational tools and environments, working with APIs, web scraping, and Natural Language Processing (NLP). Each topic will be illustrated with real-life examples, and students will have the opportunity not only to learn basic concepts and see real-world applications but also to apply the methods in practice on very simple examples.
After the course, students should be able to solve very basic computing problems in Python, know what tools are available and how they work, and understand how these tools might be applied to answer questions social scientists may ask. The main aim of the course is to show students what is possible and to equip them with the basic concepts and terminology that will allow them to design studies using computational methods and communicate with technical specialists (data scientists, programmers, etc.).
The development of the Internet and social media has opened a whole new world of possibilities for social scientists to track human behavior. Questions that were hard to handle using traditional methods of data collection can now be addressed. Furthermore, the new possibilities allow for the formulation of new questions and the tracking of phenomena that were impossible to follow before. However, the new sources of information require social scientists to work at the intersection of social science and computer science, an area usually called computational social science. Social scientists therefore need to learn what types of data are available and how to collect them. This does not necessarily mean that they need to learn computer science, because they might cooperate with computer scientists, but they need to understand at least the basic concepts in order to plan adequate research.
This course is an introduction to computational methods for social scientists; it will therefore introduce basic concepts only and will not cover advanced methods, techniques, or theories. During the course, the following topics will be introduced: available data sources, data formats, popular computational tools and environments, working with APIs, web scraping, and Natural Language Processing (NLP). However, the focus will be not on the technical aspects but on the possible applications for social scientists. Each topic will be illustrated with real-life examples, and students will have the opportunity not only to learn basic concepts but also to apply the methods in practice on very simple examples.
The hands-on workshops will be based on prepared scripts that will require only simple configuration from students.
At the end of the course, students will be able to understand basic concepts of computational social science, communicate with data scientists and computer scientists using appropriate vocabulary, and, above all, formulate research questions that can be addressed with computational methods and/or data extracted from existing web-based data sources.
Type of course
Prerequisites (description)
Learning outcomes
Students who complete the course will have basic competence in the Python programming language and will be able to perform basic operations on their own. They will be able to apply the knowledge and skills gained during the course to their own computations and research. Furthermore, they will be able to find resources and improve their skills through self-learning.
By the end of the semester students should be able to:
understand basic concepts of programming such as algorithm, branching, iteration, and ‘memory-independent computing’;
know and understand the syntax and semantics of the Python programming language;
know and perform operations on different types of objects in the Python programming language;
write simple functions in Python;
handle JSON files in Python;
use the Google Colab workspace;
understand the importance of writing readable and reproducible code;
understand basic concepts of computational social science;
communicate with data scientists/computer programmers etc. (using adequate vocabulary);
understand the advantages, challenges, and limitations of computational methods in social sciences;
formulate research questions that can be addressed with computational methods and/or data extracted from existing web-based data sources;
plan research using computational methods (especially web scraping, web API data extraction, and natural language processing);
use materials from the course to scrape a website, work with a simple API, and perform basic Natural Language Processing.
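To give a sense of the level these outcomes target, here is a minimal sketch that combines a simple function, iteration, branching, and JSON handling. The data and the names `raw` and `count_posts_mentioning` are invented for illustration and are not taken from the course materials.

```python
import json

# Hypothetical data: a JSON string shaped like a response a web API might return.
raw = (
    '[{"user": "anna", "text": "I love open data"},'
    ' {"user": "ben", "text": "Data is everywhere"},'
    ' {"user": "cleo", "text": "Hello world"}]'
)


def count_posts_mentioning(posts, word):
    """Count how many posts contain `word`, ignoring letter case."""
    count = 0
    for post in posts:  # iteration over a list of dictionaries
        if word.lower() in post["text"].lower():  # branching
            count += 1
    return count


posts = json.loads(raw)  # parse the JSON text into Python lists and dictionaries
print(count_posts_mentioning(posts, "data"))  # 2
```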
Assessment criteria
The final grade will be determined by two components: six homework assignments and a written report (a research project proposal). The final grade will be the weighted average computed according to the following formula: 45% * (homework) + 55% * (written report) = Total Score.
Written Report – to pass the course, students will need to write a research project proposal in which they formulate a research problem that can be addressed with the methods discussed during the course and explain how this question may be answered (what data should be collected, how it should be processed and analyzed, etc.).
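As a worked illustration of the formula, a minimal Python sketch follows. It assumes that both components are already expressed as percentages from 0 to 100; how the six homework assignments are combined into a single homework score is not stated here, so that step is left out. The function name `total_score` and the sample scores are hypothetical.

```python
def total_score(homework: float, written_report: float) -> float:
    """Weighted average: 45% homework, 55% written report (both on a 0-100 scale)."""
    return 0.45 * homework + 0.55 * written_report


# Hypothetical student: 80% on homework, 90% on the written report.
example = total_score(homework=80, written_report=90)
print(round(example, 1))  # 85.5
```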
Grades will be assigned according to the following scale:
5 – 90-100% – outstanding performance
4+ – 85-89%
4 – 75-84% – good performance
3+ – 70-74%
3 – 60-69% – minimum passing performance
2 – 59% or less – performance not suitable for passing
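The scale can be read as a simple threshold lookup, sketched below in Python. The scale lists whole-number bands and does not say how fractional scores (for example 89.5%) are treated; this sketch assumes each band starts at its lower bound, which is an assumption and not a stated rule. The names `GRADE_BANDS` and `grade_for` are hypothetical.

```python
# Lower bound of each band from the scale, highest first.
GRADE_BANDS = [(90, "5"), (85, "4+"), (75, "4"), (70, "3+"), (60, "3")]


def grade_for(score: float) -> str:
    """Return the grade for a total score given in percent (0-100)."""
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    return "2"  # 59% or less: not a passing grade


print(grade_for(85.5))  # 4+
print(grade_for(59))    # 2
```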
Students are allowed to miss up to 2 classes without any formal excuse (e.g. sick leave). An additional 2 classes can be missed with a formal excuse. However, students are encouraged to schedule a meeting with the instructor during office hours if they miss a class. An absence does not exempt students from doing homework assignments.
Bibliography
Main
Vallacher, R. R., Read, S. J., & Nowak, A. (Eds.). (2017). Computational social psychology. Routledge.
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences of the United States of America, 110(15), 5802–5805.
Supplementary
Guttag, J. V. (2021). Introduction to Computation and Programming Using Python. The MIT Press.
Wickham, H., & Grolemund, G. (2016). R for data science: import, tidy, transform, visualize, and model data. O'Reilly Media, Inc.
Wickham, H. (2014). Advanced R. Chapman and Hall/CRC.
Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O'Reilly Media, Inc.
Kosinski, M., Wang, Y., Lakkaraju, H., & Leskovec, J. (2016). Mining big data to extract patterns and predict real-life outcomes. Psychological Methods, 21(4), 493–506.
https://docs.python.org/3/tutorial/index.html
https://docs.scrapy.org/en/latest/
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.