Module - Coding [Python] 2100-SPP-L-D3MOCO
Module 1: Python Fundamentals
- Core Concepts: Variables, data types, operators, lists, dictionaries.
- Control Flow: Conditional statements (if, else), loops (for, while).
- Functions: Defining and calling functions, parameters, return values.
- Modules and Libraries: Introduction to essential libraries (e.g., pandas, NumPy) for data manipulation.
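The core concepts listed above can be illustrated with a short sketch. It is only one possible example, not course material: the function name `word_counts` and the sample word list are made up for illustration, and only the standard library is used.

```python
def word_counts(words):
    """Return a dictionary mapping each word to how often it appears."""
    counts = {}                      # dictionary: word -> number of occurrences
    for word in words:               # loop over a list
        if word in counts:           # conditional statement
            counts[word] += 1
        else:
            counts[word] = 1
    return counts                    # return value


# Hypothetical sample data: a list of strings.
sample_words = ["vote", "policy", "vote", "budget"]
print(word_counts(sample_words))
```

The same counting could be done with `collections.Counter`; the explicit loop is used here only to show variables, control flow, and a function in one place.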
Module 2: Regular Expressions
- Syntax: Character classes, quantifiers, anchors, grouping.
- Pattern Matching: Searching and extracting text based on patterns.
- Applications: Data cleaning, validation, text parsing.
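As a minimal sketch of the topics above, the example below uses Python's built-in `re` module for extraction and validation. The date format (YYYY-MM-DD), the function names, and the sample sentence are assumptions chosen for illustration.

```python
import re

# Character class \d, quantifiers {4} and {2}, grouping (...), and the
# word-boundary anchor \b combined into one pattern for YYYY-MM-DD dates.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text):
    """Pattern matching: return every YYYY-MM-DD string found in the text."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]


def is_date_string(value):
    """Validation: True only if the whole string has the YYYY-MM-DD shape."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


# Hypothetical sample sentence.
print(extract_dates("Adopted on 2023-05-17, amended on 2024-01-02."))
print(is_date_string("2023-05-17"), is_date_string("17 May 2023"))
```

Note that the pattern checks only the shape of the string; it does not verify that the month or day is a valid calendar value.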
Module 3: Data Scraping
- Web Scraping: Extracting structured data from websites using libraries like Beautiful Soup or Scrapy.
- API Interactions: Fetching data from web services.
- Ethical Considerations: Respecting robots.txt and terms of service.
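One way to respect robots.txt before scraping is to check it with the standard-library `urllib.robotparser` module. The sketch below parses an invented robots.txt text directly, so it needs no network access; the bot name `course-bot` and the `example.org` URLs are placeholders, not real course resources.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content, used only for this illustration.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)
# Ask whether a given user agent may fetch a given URL.
print(parser.can_fetch("course-bot", "https://example.org/public/page.html"))
print(parser.can_fetch("course-bot", "https://example.org/private/data.html"))
```

For a live site, the same parser can load the file with `set_url(...)` followed by `read()`. A permitted URL in robots.txt does not replace reading the site's terms of service.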
Module 4: Natural Language Processing (NLP) Fundamentals
- Text Preprocessing: Tokenization, stemming, lemmatization, stop word removal.
- Feature Extraction: Bag-of-words, TF-IDF.
- Text Representation: Word embeddings (e.g., Word2Vec).
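Bag-of-words and TF-IDF can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: tokenization is a lowercase whitespace split, term frequency is the count divided by document length, and inverse document frequency is log(N / df) without smoothing. Libraries such as scikit-learn use somewhat different variants of the formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenization: lowercase, then split on whitespace."""
    return text.lower().split()


def bag_of_words(text):
    """Bag-of-words: map each token to its count, ignoring word order."""
    return Counter(tokenize(text))


def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical two-document corpus.
corpus = ["the budget vote", "the election"]
print(bag_of_words(corpus[0]))
print(tf_idf(corpus))
```

In this variant a term that occurs in every document, such as "the" here, gets a score of zero, which is the intended effect of the IDF weighting.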
Module 5: Text Document Processing
- File Handling: Reading and writing text files in various formats.
- Text Cleaning and Normalization: Removing noise, handling special characters, converting to lowercase.
- Parsing: Extracting structured information from unstructured text.
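The cleaning and parsing steps above might look like the following sketch. The choice of NFKC normalization, the rule of replacing punctuation with spaces, and the "Key: value" line format are all assumptions made for this example; real documents usually need rules tailored to the source.

```python
import re
import unicodedata


def normalize_text(text):
    """Normalize Unicode, lowercase, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)   # unify equivalent characters
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)         # replace special characters
    return re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace


def parse_fields(text):
    """Extract 'Key: value' lines from free text into a dictionary."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Made-up sample input.
print(normalize_text("  Hello,   WORLD!\n"))
print(parse_fields("Author: Jane Doe\nDate: 2024-01-02\nfree text"))
```

When reading such text from a file, passing an explicit encoding, for example `open(path, encoding="utf-8")`, avoids platform-dependent defaults.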
Module 6: Optical Character Recognition (OCR)
- Tesseract OCR: Introduction and usage for converting images of text to machine-readable text.
- Preprocessing Techniques: Improving OCR accuracy by enhancing image quality.
Module 7: Language Models in Document Analysis
- Sentiment Analysis: Determining the emotional tone of text using pre-trained models or custom classifiers.
- Text Classification: Categorizing documents into predefined classes (e.g., spam/not spam).
- Named Entity Recognition: Identifying and extracting entities like names, locations, and organizations.
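To make the idea of a custom classifier concrete, the sketch below implements a very small multinomial Naive Bayes text classifier with add-one smoothing for the spam/not spam example. It is not a language model and not necessarily the method used in the course; the four training sentences are invented, and a real task would need far more data and usually a library implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # class prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:                  # skip unseen words
                continue
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training set.
training_data = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("project report attached", "not spam"),
]
model = train_naive_bayes(training_data)
print(classify("free money", model))
print(classify("monday meeting report", model))
```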
Course coordinators
Learning outcomes
Knowledge: The student knows and understands:
- To an advanced degree, the methods of collecting, analyzing, and interpreting quantitative and qualitative data used in creating and analyzing political processes [K_W03]
Skills: The student is able to:
- Design and conduct complex social research, particularly research in the nature of a social diagnosis, select specialized tools, including analytical tools, appropriate to the research questions and the data collected, and justify the choices made [K_U01]
- Design, in team collaboration, a complex study assessing the relevance, effectiveness, and efficiency of a social program, and collect and utilize data for this purpose, including the use of modern IT tools [K_U04]
- Prepare and deliver a written and oral presentation on a selected social problem, including a complex and atypical problem, and propose and justify solutions to it [K_U06]
Assessment criteria
- Homework Assignments: (30 points)
- Group Project: (Code evaluation: 15 points, Presentation: 15 points)
The presentation will be evaluated by invited guests from both the fields of political science and machine learning.
- Passing Grade: To pass the course, students must earn at least 30 points from the group project and homework assignments combined. Successful completion of the group project is mandatory for passing the course.
Bibliography
Lutz, M. (2013). Learning Python (5th ed.). O'Reilly Media, Inc.
Lutz, M. (2011). Programming Python: Powerful object-oriented programming. O'Reilly Media, Inc.
Pilgrim, M. (2009). Dive into Python 3. Apress.
Ascher, D., & Martelli, A. (2002). Python cookbook. O'Reilly.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org
Jurafsky, D., & Martin, J. H. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson Prentice Hall.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.