Human/Computer Vision Symposium 2005, Friday, May 13th at the Beckman Institute
Abstract: Computer vision methods offer a promising alternative to costly laser-based systems in applications ranging from the construction of realistic object models for the film, television, and video game industries, to the quantitative recovery of metric information (metrology) for data analysis. Their relative accuracy has, however, been limited so far to about 1/200 (1 mm for a 20 cm wide object). Illustrating the promise of research aimed at achieving a much higher level of precision, this presentation describes a method for acquiring high-quality solid models of complex 3D shapes from multiple calibrated photographs. The purely geometric constraints associated with the silhouettes found in each image are first used to construct a coarse surface approximation in the form of a visual hull. The visual hull is then carved to obtain a final 3D model while enforcing photometric and geometric constraints. Our preliminary experiments demonstrate the recovery of very fine surface detail for four test objects.
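As a rough illustration of the silhouette step only, the sketch below builds a visual hull on a small voxel grid. It is a toy under strong assumptions that are not from the abstract: three orthographic views along the coordinate axes stand in for calibrated photographs, and the names `project`, `silhouette`, and `visual_hull` are hypothetical. The carving stage with photometric constraints is not shown.

```python
from itertools import product

def project(voxel, axis):
    """Orthographic projection: drop the coordinate along `axis`."""
    return tuple(c for i, c in enumerate(voxel) if i != axis)

def silhouette(voxels, axis):
    """Set of image pixels covered by the object when viewed along `axis`."""
    return {project(v, axis) for v in voxels}

def visual_hull(silhouettes, size):
    """Keep every grid voxel whose projection lies inside all silhouettes."""
    return {
        v
        for v in product(range(size), repeat=3)
        if all(project(v, axis) in sil for axis, sil in silhouettes.items())
    }

SIZE = 4
# Toy object (assumed): a solid 4x4x4 cube with a 2x2x1 pit in its top face.
pit = {(1, 1, 3), (2, 1, 3), (1, 2, 3), (2, 2, 3)}
obj = set(product(range(SIZE), repeat=3)) - pit

sils = {axis: silhouette(obj, axis) for axis in range(3)}
hull = visual_hull(sils, SIZE)

print(len(obj), len(hull))  # 60 64
```

In this toy case the hull contains the whole object but fills in the pit, since no silhouette reveals a concavity. That gap is one reason a silhouette-only model is coarse and a later refinement step is needed.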
Abstract: Many computer vision algorithms limit their performance by ignoring the underlying 3D geometric structure in the image. We propose a technique for estimating the coarse geometric properties of a scene by learning appearance-based models of geometric classes. We show that we are able to estimate the geometric properties, even in cluttered natural scenes, and demonstrate the potential usefulness of the geometric context in two applications: object detection and automatic single-view reconstruction.
Abstract: Mediated communication between public spaces is a relatively new concept. One current example of this interaction is video conferencing among people within the same organization. Large-scale video-conferencing walls have begun to appear in public or semi-public areas such as workplace lobbies and kitchens. These connections provide a link via audio and/or video to another public space within the organization. When placed in public or semi-public work spaces, they are often designed for casual encounters among people within that community. Thus far, communicating via these systems has not met expectations. Drawbacks of such systems include lack of privacy, gaze ambiguity, spatial incongruity, and fear of appearing too social in a work environment. In this talk, I discuss a different goal and approach to linking public spaces. We are not creating a substitute for face-to-face interaction, but rather new modes of conversational and physical interaction within this public space. We address the need for designs best suited for linking public spaces and discuss future work in social computing at UIUC.
Abstract: This talk will focus on texture and object representations based on local scale- and affine-covariant image regions (keypoints), as applied to three problems: (1) recognizing single-texture images, (2) classifying individual regions in multi-texture images, and (3) recognizing object classes. First, I will talk about a texture representation suitable for recognizing images of textured surfaces under a wide range of transformations, including perspective distortions and non-rigid deformations. Second, I will discuss how this representation can be augmented with statistical co-occurrence relations for the task of classifying individual regions in a multi-texture image. Finally, I will describe a weakly supervised approach to learning models for object recognition based on semi-local parts, or spatially coherent, distinctive groups of keypoints. This object representation, in conjunction with a discriminative maximum entropy framework, has shown promising results for multi-category classification tasks.
Abstract: In this talk, we discuss an important problem in computer vision: how to segment an image based on the different dynamical motions of its pixels or their 3-D counterparts. The problem becomes particularly interesting and difficult when different types of motions (e.g., linear, bilinear) are simultaneously present in the image. We will introduce some mathematical techniques that allow us to tackle this problem.
Abstract: Visual selective attention can be conceptualized as a process of competitive interactions among stimuli for the control of responses within a hierarchical neural representation (e.g., Desimone & Duncan, 1995). Such competition is necessary to disambiguate the responses of high-level cells, whose large receptive fields would otherwise lead them to conflate the properties of multiple objects. Here, I will review the neurophysiological evidence and the computational necessity for competitive interaction models of attention, and discuss the psychophysical consequences of such models.
Abstract: Here we establish a statistical framework for studying effective and functional connectivity in the brain using data obtained with a relatively new neuroimaging method, the event-related optical signal (EROS). We apply dynamic factor analysis as a method for testing various structural models on the lagged covariance matrices derived from the EROS data. Examples include modeling propagation of activity from primary to secondary cortex in a visual task, as well as cross-activation and cross-inhibition between hemispheres in a task involving inter-hemispheric competition. We demonstrate how integrity of anatomical connections between the two hemispheres explains different patterns of cross-hemispheric interactions. This approach allows for fitting data to models that capture dynamic cognitive processes as they rapidly evolve over time.
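To make the notion of a lagged covariance concrete, here is a minimal sketch on made-up signals. It is not EROS data and not the authors' procedure, and the dynamic factor analysis itself is not shown. The helper `lagged_cov` and the series `x` and `y` are hypothetical; `y` is constructed as a copy of `x` delayed by two samples, loosely mimicking activity that propagates from one region to another.

```python
def lagged_cov(x, y, lag):
    """Sample covariance between x[t] and y[t + lag], for lag >= 0."""
    n = len(x) - lag
    xs, ys = x[:n], y[lag:lag + n]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)

# Made-up "source" signal and a copy that lags it by two samples.
x = [0, 1, 0, -1, 2, 0, -2, 1, 3, -1, 0, 2]
y = [0, 0] + x[:-2]

# Covariance at each candidate lag; the largest value marks the delay.
covs = {lag: lagged_cov(x, y, lag) for lag in range(5)}
best_lag = max(covs, key=covs.get)

print(best_lag)  # 2
```

Computing such a value for every pair of recording locations at a given lag yields one lagged covariance matrix, which is the kind of quantity the abstract describes fitting structural models to.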
Abstract: A complex scene is readily described as a "city skyline" or a "living room," and this label seems to capture the essence of the scene, leading to expectations of the kinds of objects and features typically present. This "gist" information can minimize demands on attention and memory by guiding attention to relevant locations in a scene and by facilitating retrieval of important scene elements rather than irrelevant ones. Importantly, research using rapid sequences of photographs shows that scene gist is detected after just 100 ms of viewing, suggesting the existence of a rapid, effortless, and possibly automatic gist-extraction mechanism, which then guides other forms of processing such as object detection and identification. To the extent that gist detection occurs prior to object identification, it must rely on different sources of information or processing mechanisms. Yet, behavioral research has only recently begun to examine the information underlying gist perception. In contrast, computational approaches to scene categorization and content-based image retrieval have sought such information in developing similarity metrics to group visually similar images and classification algorithms to distinguish between such groups. Still, such models struggle to classify images by meaning. By combining these largely separate approaches, this project aims to understand how humans effortlessly assign labels to pictures and to enhance computational approaches to scene categorization, with the overriding goal of developing a theory of scene gist acquisition.
Abstract: Functional neuroimaging methods are increasingly used in clinical applications to look for indications of disease. We present an explicit discrimination and classification procedure that operates directly on functional imaging time series data to classify a specific subject into one of two groups.
Abstract: When conversing face to face, we are immediately aware when we have surprised, amused, or perhaps angered our listener. We carry out the task of decoding expressions while continuing our thoughts and conversation, even though the signals listeners provide are often subtle and quick. The processes and percepts that underpin our ability to accurately decode these brief displays of subtle emotion are not widely studied and are poorly understood. Most research on expression recognition uses still photographs of high emotional intensity expressions. A significant obstacle to the study of these behaviors is the lack of controlled, ecologically sound dynamic stimuli and experimental paradigms. Spencer-Smith et al. (2000) created realistic three-dimensional models of facial movements that can be animated in real time. Using these tools, we demonstrated that very low emotional intensity expressions are accurately identified when presented in a dynamic fashion, while accuracy with static displays is near chance. Additionally, sad expressions are identified either as sad or happy depending on the direction of presentation (increasing or decreasing emotional intensity). In the last portion of the presentation, I describe recent work on modeling expression recognition using a geometric framework.