Red-teaming, safety and explainability for artificial intelligence systems 1000-2M24RTS
The increasing complexity of artificial intelligence models and systems creates new challenges in analysing the security, robustness and behaviour of these models. In this course, we will analyse and discuss current articles and research directions on trustworthy artificial intelligence. Because the topic is developing rapidly, the course does not have a fixed programme; instead, it prioritises research challenges that are current at the time the course is taught, in particular articles published at this year's conferences such as CVPR, NeurIPS, ICML and ECML.
The course is organised into three blocks:
- Extending the XAI techniques presented in the course ‘Explainable Machine Learning’ to deep neural network models, including those used for computer vision tasks and language models.
- Adversarial analysis of models (model red-teaming) to identify and fix their weaknesses. Attack techniques from the NIST, OWASP and MITRE frameworks will be discussed in this block (a minimal example of such an attack is sketched after this list).
- Social issues such as bias analysis, AI ethics and predicting the non-obvious consequences of deploying AI solutions.
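As an illustration of the adversarial-analysis block, the following is a minimal sketch of a classic gradient-based attack, the fast gradient sign method (FGSM). It assumes a generic PyTorch image classifier; the model, tensors and parameter values are illustrative and not taken from the course materials.

```python
# Minimal FGSM sketch, assuming a generic PyTorch image classifier.
# `model`, `x` (images scaled to [0, 1]) and `y` (labels) are illustrative names.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Craft adversarial examples with a single signed-gradient step."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Shift every pixel by epsilon in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

A red-teaming exercise of this kind would then measure how much the model's accuracy drops on the perturbed inputs and propose mitigations, e.g. adversarial training.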
The lecture part consists of 7 two-hour meetings, of which:
- two are devoted to explanation methods for deep network models, e.g. counterfactual explanations: an introduction to the topic, current methods and tools, and a review of attribution methods and the evaluation of explanations in computer vision (a minimal attribution sketch follows this list)
- two are devoted to current methods for red-teaming AI models, e.g. ‘Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models’ and ‘Red Teaming Language Models with Language Models’
- two are devoted to social issues such as bias detection and AI ethics
- one is devoted to student presentations
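As a concrete companion to the attribution-methods meetings, here is a minimal sketch of a vanilla gradient (saliency) attribution for a vision model. It assumes a generic PyTorch classifier; the function and variable names are illustrative.

```python
# Minimal vanilla-gradient (saliency) attribution sketch,
# assuming a generic PyTorch vision classifier; names are illustrative.
import torch

def saliency_map(model, x, target_class):
    """Attribute the target-class logit to input pixels via input gradients."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]  # logit of the class being explained
    score.backward()
    # Aggregate absolute gradients over the colour channels.
    return x.grad.abs().max(dim=1).values
```

Evaluating such explanations, e.g. with deletion/insertion-style metrics, is the kind of task addressed in the evaluation-of-explanations meeting.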
The lab part consists of 7 two-hour meetings in a computer room, where students will solve tasks or reproduce results thematically related to the lecture part.
The project part consists of a larger team research project: carrying out an attack on, or developing an explanation for, one of the popular foundation models.
Type of course
Requirements
Prerequisites
Prerequisites (description)
Course coordinators
Additional information
Information on the level of this course, the year of study and semester in which it is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes:
- Bachelor's degree, first cycle programme, Computer Science
- Master's degree, second cycle programme, Computer Science
Additional information (registration calendar, class instructors, location and schedules of classes) may be available in the USOSweb system: