
Tutorials

The role of the tutorials is to provide a platform for a more intensive scientific exchange amongst researchers interested in a particular topic and as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.

TUTORIALS LIST

A Guided Tour of Computational Modelling of Visual Attention  (VISIGRAPP)
Instructor : Olivier Le Meur

Building Personal AIs with First Person (Egocentric) Vision  (VISIGRAPP)
Instructor : Antonino Furnari



A Guided Tour of Computational Modelling of Visual Attention


Instructor

Olivier Le Meur
Univ Rennes, CNRS, IRISA
France
 
Abstract

Since the first computational model of visual attention, proposed in 1998 by Itti et al. [1], a lot of progress has been made. This progress concerns both the modelling itself and the way we assess the performance of saliency models. Recently, advances in machine learning, and more specifically in deep learning, have brought new momentum to this field. In this tutorial, we present saliency models as well as the metrics used to assess their performance. In particular, we will emphasize new saliency models based on convolutional neural networks. We will present different deep architectures and the different loss functions used during the training process. We will conclude this presentation by introducing saccadic models [2,3], which are a generalization of saliency models.

[1] Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254-1259.
[2] Le Meur, O., & Liu, Z. (2015). Saccadic model of eye movements for free-viewing condition. Vision Research, 116, 152-164.
[3] Le Meur, O., & Coutrot, A. (2016). Introducing context-dependent and spatially-variant viewing biases in saccadic models. Vision Research, 121, 72-84.
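As a concrete illustration of the evaluation side mentioned in the abstract, below is a minimal Python/NumPy sketch of the Normalized Scanpath Saliency (NSS), one of the standard metrics for comparing a predicted saliency map against recorded fixations. The array shapes and variable names are illustrative and are not taken from the tutorial material.

import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency: average of the standardized saliency
    values at the ground-truth fixation locations (higher is better)."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return s[fixation_map > 0].mean()

# Toy usage with a random prediction and two fixated pixels (illustrative data only).
pred = np.random.rand(480, 640)
fixations = np.zeros((480, 640))
fixations[100, 200] = 1
fixations[300, 400] = 1
print(nss(pred, fixations))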


Keywords

Visual attention
Saliency modelling
Eye movements
Deep saliency network


Aims and Learning Objectives

This tutorial aims to present both the history of and the latest achievements in the computational modelling of visual attention.

Target Audience

Master's students, PhD students, postdocs and researchers

Prerequisite Knowledge of Audience

Computer Vision, Image Processing and Machine Learning

Detailed Outline

I. Introduction
II. Definitions and concepts of visual attention
III. Unsupervised saliency models
IV. Ground truth definition and methods for assessing saliency models
V. Deep saliency network, a new breakthrough
VI. Saccadic model as a new generation of saliency models
VII. Attentive applications
VIII. Conclusions

Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org

Building Personal AIs with First Person (Egocentric) Vision


Instructor

Antonino Furnari
Mathematics and Computer Science, University of Catania
Italy
 
Brief Bio
Antonino Furnari is an Assistant Professor at the University of Catania. He received his PhD in Mathematics and Computer Science in 2017 from the University of Catania and has authored one patent and more than 50 papers in international book chapters, journals and conference proceedings. Antonino Furnari is involved in the organization of several international events, such as the Assistive Computer Vision and Robotics (ACVR) workshop series (since 2016), the International Computer Vision Summer School (ICVSS) (since 2017), the Egocentric Perception Interaction and Computing (EPIC) workshop series (since 2018), and the EGO4D workshop series (since 2022). Since 2018, he has been involved in the collection, release, and maintenance of the EPIC-KITCHENS dataset series, and in particular in the egocentric action anticipation and action detection challenges. Since 2021, he has been involved in the collection and benchmarking of the EGO4D dataset. He has been a co-founder of NEXT VISION s.r.l., an academic spin-off of the University of Catania, since 2021. His research interests concern Computer Vision, Pattern Recognition, and Machine Learning, with a focus on First Person Vision. More information is available at http://www.antoninofurnari.it/.

Abstract

The increasing availability of wearable devices capable of acquiring and processing images and video from the point of view of the user (e.g., Google Glass, Microsoft HoloLens and Magic Leap One) has promoted the interest of the computer vision community in first person (egocentric) vision. Being portable and able to mediate the reality perceived by their users, such devices are ideal candidates for implementing personal intelligent assistants which can understand our behavior and augment our abilities. Unlike standard “third person vision”, which assumes that the processed images and video are acquired from a static point of view neutral to the perceived events, first person (egocentric) vision assumes images and video to be acquired from the rather non-static point of view of the user by means of a wearable device. These unique acquisition settings make first person (egocentric) vision different from standard third person vision. Most notably, the visual information collected using wearable cameras always “tells something” about the user, revealing what they do, what they pay attention to and how they interact with the world. Moreover, wearable devices make it possible to effortlessly collect huge quantities of user-centric visual data. In this tutorial, we will discuss the challenges and opportunities offered by first person (egocentric) vision, cover the historical background and seminal works, present the main technological tools (including devices and algorithms) which can be used to analyze first person visual data, and discuss challenges and open problems.

Keywords

wearable, first person, egocentric, localization, action recognition, action anticipation

Aims and Learning Objectives

The participants will understand the main advantages of first person (egocentric) vision over third person vision for analysing the user’s behavior and building personalized applications. Specifically, the participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed, 2) the devices which can be used to collect data and provide services to the users, 3) the algorithms which can be used to manage first person visual data, for instance to perform localization, indexing, and action and activity recognition.
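To make the third learning objective more tangible, here is a minimal, illustrative Python sketch (assuming PyTorch and torchvision are available) of per-frame feature extraction with a generic pretrained CNN, a common building block for egocentric localization, indexing and action-recognition pipelines. It is not part of the tutorial material, and the function names and inputs are hypothetical.

import torch
from torchvision import models, transforms

# Generic pretrained backbone reused as a per-frame feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # return 512-d features instead of class scores
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(frames):
    # `frames` is a list of PIL images sampled from a wearable-camera video (hypothetical input).
    batch = torch.stack([preprocess(f) for f in frames])
    with torch.no_grad():
        return backbone(batch)  # (num_frames, 512) frame descriptors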

Target Audience

First year PhD students, graduate students, researchers.

Prerequisite Knowledge of Audience

Fundamentals of Computer Vision and Machine Learning (including Deep Learning)

Detailed Outline

The tutorial will cover the following topics:
- Outline of the tutorial;
- History of first person (egocentric) vision and motivation;
- Differences between third person and first person vision;
- Wearable devices to acquire/process first person visual data;
- Main problems of interest in first person vision:
  - Localization;
  - Attention;
  - Action recognition;
  - Object recognition;
  - Activity recognition;
  - Action anticipation;
- Indexing and exploitation of egocentric visual data;
- Technological tools (devices and algorithms) which can be used to build first person vision applications;
- Challenges and open problems;
- Conclusions and insights for research in the field.

Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org
