Praktikum Vision & Language
This page contains all relevant information about the Praktikum Vision and Language (Prof. Dr. Radu Timofte, Prof. Dr. Goran Glavaš) in the summer semester of 2023.
Description
The fields of Natural Language Processing and Computer Vision have both advanced greatly in recent years due to improvements in hardware and the huge amounts of data available on the internet. At the intersection of the two modalities, text and image, lies the multimodal vision+language field. It covers a wide range of problems, from text-based image generation to image search, caption generation, image understanding, and more.
How can these tasks be tackled? What are the challenges? What are the solutions? What is the state of the art? Can we improve it further or reduce existing limitations?
In this Praktikum Vision and Language, we work in groups (1-3 participants) on a vision+language project. Each group explores the current state of the art and then devises new ways to use the models as tools, proposes improvements over the current approach, or uncovers and possibly reduces existing limitations.
Each group is expected to prepare a written report (10 pages) covering their project and its research background, and a corresponding oral presentation (around 20-25 minutes; each member has to present a part).
Objectives
- Each participant will gain hands-on experience and teamwork skills, and will practice critical analysis, scientific discourse, and the preparation, writing, and presentation of work on a specific vision+language deep learning task.
Prerequisites
- Basic concepts of mathematical analysis and linear algebra.
- Basic knowledge of machine learning and deep learning is helpful.
- Basic programming skills; most of the Praktikum will be in Python and use the PyTorch framework.
The course language is English.
Locations and Dates
- The Praktikum (kickoff meeting and presentations) will take place at our office at John Skilton Str. 8A (we are located on the 3rd floor of the Sensalight office there).
- The office is best reached from the Skyline Hill Center bus stop. There is also a new path through the fence that leads from the Oswald-Külpe-Weg bus stop directly to our office; it is not yet visible in Google Maps.
- Week 3 Kickoff Meeting (Time TBA)
- By the following week: group and project assignments finalized
- Biweekly meetings with your supervisor to report progress and discuss problems
- July: Presentations held in blocks (depending on course size)
- 21.07.2023: Deadline for Final Reports, Slides, and Project Code
Contact
- Gregor Geigle (email: gregor.geigle@uni-wuerzburg.de)
- Prof. Dr. Radu Timofte (email: radu.timofte@uni-wuerzburg.de)
- Prof. Dr. Goran Glavaš (email: goran.glavas@uni-wuerzburg.de)
For general questions related to the course, please use the Moodle forum "General"; use email only for individual problems or questions.
Image classification with text labels
- Improving the zero-shot and few-shot setting (read: no or very little training data)
- Or: Extending existing methods to the multilingual setting (they currently support only English)
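To give a rough idea of what zero-shot classification with text labels can look like, here is a minimal sketch in plain Python. It assumes a setup where an image and each candidate text label have already been mapped into a shared embedding space by some pretrained encoders; the encoders themselves, the function names, and the toy three-dimensional vectors are hypothetical stand-ins and not part of the course material. The image is assigned the label whose embedding has the highest cosine similarity with the image embedding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the best-matching text label and all similarity scores.

    No training data is used: the only inputs are the image embedding
    and one embedding per candidate text label.
    """
    scores = {
        label: cosine(image_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy, hand-written embeddings (assumption): in a real project these
# would come from pretrained image and text encoders.
label_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

predicted, scores = zero_shot_classify(image_embedding, label_embeddings)
print(predicted)
```

In this setup, changing the set of classes only requires changing the text labels, which is why wording and language of the labels matter for the zero-shot and multilingual settings.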
Creating a task for "ChatGPT with images"
- Testing the limits of current models
(Challenging) Making your own "ChatGPT with images"
- Combine different existing models into your own model
- Leverage efficient training methods