List of talks:
- 21.02.2012 / ICG / 13:00h
Title:Web-based Augmented Reality
Speaker:Christoph Oberhofer
Abstract:
Augmented Reality (AR) applications are usually built on top of dedicated visual tracking pipelines implemented in a high-performance compiled programming language, such as C++, or executed inside optimized runtime environments, like Adobe Flash. Such implementations tie applications to specific platforms and vendors, making it difficult to provide a single solution for multiple systems. Today's cross-platform AR solutions are mainly based on proprietary web technologies that lack mobile device support or open standards.
In this thesis, the development, implementation and evaluation of an AR tracking pipeline using natural features is presented. The whole pipeline, from camera access to final real-time 3D rendering, is based solely on standard web technologies including HTML5, JavaScript and WebGL. The novelty lies in the completely plugin-free nature of the solution, which runs in virtually every modern web browser on the PC and even on mobile phones.
An extensive evaluation shows that real-time frame rates are achieved on entry-level PCs, while an interactive experience is feasible on high-end smartphones.
- 17.02.2012 / ICG / 10:30h
Title:Fast Scalable Dense Reconstruction of the World from Photos and Videos
Speaker:Jan-Michael Frahm
Abstract:
In recent years, photo and video sharing web sites like Flickr and YouTube have become increasingly
popular. Millions of photos are now uploaded every day. These photos survey the world.
Given the scale of the data, we face significant challenges in processing them within a short time
frame with limited resources. In my talk I will present my work on the highly efficient organization
and reconstruction of 3D models from city-scale photo collections (millions of images per city) on
a single PC in the span of a day, as well as my work on real-time scene reconstruction from
video. The approaches address a variety of the current challenges to achieve a concurrent 3D
model from these data. For reconstruction from photo collections these challenges are selecting
the data of interest from the noisy datasets and efficient, robust camera motion estimation. Shared
challenges of photo-collection-based 3D modeling and 3D reconstruction from video are high
performance stereo estimation from multiple views and image-based location recognition for
topology detection. In the talk I will discuss the details of our appearance- and geometry-based
image organization method and explain our efficient stereo technique for determining scene depths from
photo collection images. This technique performs scene depth
estimation at multiple frames per second from a large set of views with considerable variation
in appearance. Additionally, I will discuss some of the lessons learned for how to approach these
large-scale challenges in the future.
Jan-Michael Frahm is a Research Assistant Professor at the University of North Carolina at Chapel
Hill. He received his Ph.D. in computer vision in 2005 from the Christian-Albrechts University
of Kiel, Germany. His Diploma in Computer Science is from the University of Lübeck. Jan-Michael Frahm's research interests include a variety of computer vision problems. He has worked
on structure from motion for single/multi-camera systems for static and dynamic scenes to create
3D models of the scene; real-time multi-view stereo to create a dense scene geometry from camera
images; use of camera-sensor systems for 3D scene reconstruction with fusion of multiple orthogonal
sensors; improved robust and fast estimation methods from noisy data to compensate for highly
noisy measurements in various stages of the reconstruction process; high performance feature
tracking for salient image-point motion extraction; and the development of data-parallel algorithms
for commodity graphics hardware for efficient 3D reconstruction. He has published over 60
peer-reviewed papers in international conferences and journals. Jan-Michael Frahm is Editor in Chief
of the Elsevier journal Image and Vision Computing.
- 31.01.2012 / ICG / 13:00h
Title:Work Toward Thesis Proposal: Online Reconstruction for Augmented Reality (AR)
Speaker:Thanh Nguyen
Abstract:
Part I: Revisit Marker Based Tracking and Marker-less Tracking in AR (20min)
Among tracking solutions in AR, visual tracking is a key and prominent approach. In our previous work, we proposed solutions for two niche scenarios: marker-based tracking and marker-less tracking for indoor environments (which can easily be extended to outdoor environments under certain assumptions). For these scenarios, the solutions are, respectively, random-dot-uniform-line and a fast-flexible-friendly SLAM.
Part II: Building Toward A New Tracking Solution (15min)
We have recently been developing an online localization and tracking system that uses a 3D point cloud (of an outdoor environment) produced by sparse reconstruction. The system is intended to run on a typical mobile device and to perform self-relocalization in real time. We also foresee that our system may suffer from poor localization results because of the difficulties of wide-baseline feature matching. Therefore, a fast affine-invariant feature detector has been proposed. We will present the current status and our vision for the thesis proposal.
- 24.01.2012 / ICG / 13:00h
Title:Online Model-Based Multi-Scale Pose Estimation
Speaker:Thomas Kempter
Abstract:
In this thesis we propose a novel model- and point-based pose estimation approach, which is able to operate on multiple scales in real time. We build our work on a state-of-the-art visual Simultaneous Localization And Mapping (SLAM) approach and extend it to exploit metric prior knowledge about the geometry of a single object in the scene. Using keypoint-based localization, which actually tracks the object by its surroundings, we are robust to occlusions and can even determine a pose when the object vanishes completely. Additionally, we refine this pose based on edge information to increase accuracy when the object occupies a certain amount of the image. In our experiments we show an improvement of the mean translational localization error compared to a state-of-the-art SLAM system from 6.1 cm to 1.7 cm for solid objects, and from 6.9 cm to 2.6 cm for wiry objects. Furthermore, when tracking is lost due to a lack of distinctive features, a purely model-based tracking component takes over. This reduces the number of frames for which the pose cannot be estimated by more than 20%. Our approach delivers a metrically correct pose estimate relative to a known object solely based on visual input from a single camera, which is useful for different robotic applications such as the autonomous inspection of a power pylon by an Unmanned Aerial Vehicle (UAV).
- 17.01.2012 / ICG / 13:00h
Title:Short review on Fractal & Chaos Game Theory
Speaker:Mahdi Jampour
Abstract:
- 13.01.2012 / ICG / 13:00h
Title:Convex relaxation of a class of vertex penalizing functionals
Speaker:Thomas Pock
Abstract:
We investigate a class of variational problems that incorporate in
some sense curvature information of the level lines. The functionals
we consider incorporate metrics defined on the orientations of pairs of
line segments that meet in the vertices of the level lines. We
discuss two particular instances: One instance that minimizes the
total number of vertices of the level lines and another instance
that minimizes the total sum of the absolute exterior angles between
the line segments. In case of smooth level lines, the latter
corresponds to the total absolute curvature. We show that these
problems can be solved approximately by means of a tractable convex
relaxation in higher dimensions. In our numerical experiments we
present preliminary results for image segmentation, image denoising
and image inpainting.
(Joint work with Kristian Bredies, Benedikt Wirth)
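For a polygonal level line, the second functional mentioned in the abstract above (the total sum of absolute exterior angles) can be illustrated with a small sketch. This is only an illustration of the quantity itself, not of the convex relaxation discussed in the talk; the function name and the example polygons are assumptions made for this example.

```python
import math

def total_absolute_exterior_angle(vertices):
    """Sum of absolute exterior (turning) angles of a closed polygon.

    vertices: list of (x, y) tuples in order; the polygon is closed implicitly.
    """
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x0, y0 = vertices[i - 1]
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Incoming and outgoing edge directions at vertex i
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        # Signed turning angle from the 2D cross and dot products
        turn = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        total += abs(turn)
    return total

# A convex square turns by pi/2 at each of its 4 vertices.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# An L-shape has 5 left turns and 1 right turn, each of magnitude pi/2.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]

print(total_absolute_exterior_angle(square) / math.pi)
print(total_absolute_exterior_angle(l_shape) / math.pi)
```

For any convex polygon the sum equals 2π; non-convex shapes such as the L-shape are penalized more strongly, which is consistent with the interpretation as total absolute curvature in the smooth case.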
- 10.01.2012 / ICG / 13:00h
Title:Probabilistic Joint Image Segmentation and Labeling
Speaker:Adrian Ion
Abstract:
We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag, followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset, as well as in VOC2010, where 41.7% accuracy on the test set is achieved.
References:
Probabilistic Joint Image Segmentation and Labeling. A. Ion, J. Carreira, C. Sminchisescu. Neural Information Processing Systems (NIPS), 2011
Image Segmentation by Figure-Ground Composition into Maximal Cliques. A. Ion, J. Carreira, C. Sminchisescu. International Conference on Computer Vision (ICCV), 2011
- 20.12.2011 / ICG / 13:00h
Title:Self-Introduction
Speaker:Vladimir Kanchev
Abstract:
- 13.12.2011 / ICG / 13:00h
Title:Introduction to Straight Skeletons
Speaker:Gernot Walzl
Abstract:
The Straight Skeleton is an internal structure of polygons.
It partitions the interior of a simple polygon with n vertices into
n monotone polygons.
The talk will investigate the problems of computing the straight
skeleton and will give an overview of known algorithms.
Can it be used as a replacement for the medial axis?
- 06.12.2011 / ICG / 13:00h
Title:Free Viewpoint Virtual Try-On With Commodity Depth Cameras
Speaker:Stefan Hauswiesner
Abstract:
We present a system that allows users to interactively control a 3D
model of themselves at home using a commodity depth camera. It
augments the model with virtual clothes that can be downloaded. As a
result, users can enjoy a private, virtual try-on experience in their
own homes. As a prerequisite, the user needs to enter or pass through
a multi-camera setup that captures him or her in a fraction of a
second. From the captured data, a 3D model is created. The model is
transmitted to the user's home system to serve as a realistic avatar
for the virtual try-on application. The system provides free-viewpoint
high quality rendering with smooth animations and correct occlusion,
and therefore improves the state of the art in terms of quality. It
utilizes cheap hardware and therefore is affordable for and accessible
to a wide audience.
- 29.11.2011 / ICG / 13:00h
No talk
- 22.11.2011 / ICG / 13:00h
Title:ISMAR Recap
Speaker:VR/AR group
Abstract:
- 16.11.2011 / ITI Geb. 16 1 OG / 13:00h
Title:Physical Qualities of Interaction
Speaker:Prof. Andreas Butz, LMU
Abstract:
While there are well established and widely accepted interaction
concepts for the personal computer world, a comprehensive interaction
concept for the often predicted era of ubiquitous computing is still
missing. When computers and digital artifacts mix with our physical
environment, physicality seems to be a promising candidate to form the
basis of such an interaction concept. In this talk, I will show some of
our investigations into the design space between physical and digital
worlds, and we can speculate together where this may lead.
- 15.11.2011 / ICG / 13:00h
Title:Recovery of Depth Information Using Paired Optical and Thermal Images
Speaker:Peter Pinggera
Abstract:
This thesis deals with the recovery of dense depth information from thermal (far
infrared spectrum) and optical (visible spectrum) images using computational stereo
techniques. Systems which originally employ optical and thermal cameras separately
could benefit from the obtained depth information based on the inherent stereo setup
and without the need for additional hardware. However, the large differences in the
characteristics of cross-spectral images make this task significantly more difficult
than for the common optical stereo case. As a result no method has been proposed
in previous work which is able to solve the considered problem. In this work we
therefore investigate if a solution can be achieved by utilizing novel approaches as
well as methods suggested in literature.
A modular framework based on a common taxonomy of stereo algorithms is
implemented as a basis for the conducted experiments. The most crucial aspect is
the definition of robust matching cost measures which are able to describe local
similarities between the cross-spectral images. Furthermore powerful optimization
techniques prove to be essential for the computation of valid depth estimates.
We implement, test and evaluate state-of-the-art robust matching cost methods
and compare their performance with novel approaches. The influence of
combinations with different types of optimization techniques is also investigated. Tests are
performed on simulated as well as real cross-spectral stereo data, including both
still images and video sequences. A qualitative evaluation and a comparison with
standard optical stereo results show that through the introduced approaches very
coarse but largely valid dense depth estimates can indeed be achieved. We obtain
best results by using distances between dense descriptors based on histograms of
unsigned oriented image gradients (HOG and DAISY descriptors) as a matching
cost in combination with semi-global matching optimization. In all our experiments
this approach outperforms methods which have previously been suggested for use
in such a scenario, such as mutual information or dense local self-similarity descriptors.
- 08.11.2011 / CGV / 13:00h
Title:tba
Speaker:Prof. Anders Hast
Abstract:tba
- 25.10.2011 / ICG / 13:00h
Title:Spatio-Temporal Video Processing
Speaker:Manuel Werlberger
Abstract:
Part I:
The ability to generate intermediate frames between two given images in a video sequence is an essential task for video restoration and video post-processing. In addition, restoration requires robust denoising algorithms, must handle corrupted frames and recover from impaired frames accordingly. In this paper we present a unified framework for all these tasks. In our approach we use a variant of the TV-L1 denoising algorithm that operates on image sequences in a space-time volume.
Part II:
There is a general trend to use space-time volumes for video processing. As motion is the essential feature for almost any video processing task, it is favourable to incorporate the temporal information already at the motion estimation stage. We demonstrate an approach to directly compute trajectories of arbitrary ordering.
- 18.10.2011 / ICG / 13:00h
Title:Indoor Navigation with Mixed-Reality Views and Sparse Localization
Speaker:Alessandro Mulloni
Abstract:
We present our recent work on supporting indoor navigation with Mixed Reality when continuous localization is not possible. We combine activity-based instructions with sparse localisation at selected info points in the building. Based on localisation accuracy the interface adapts the visualisation by changing the density and quality of information shown. We refine and validate our designs through user involvement in a series of user studies. Our results validate our design and show that info points act both as confirmation points and as overview points.
- 11.10.2011 / CGV / 13:00h
Title:Dynamic Illumination for Robust Microscopic 3D Metrology
Speaker:David Ferstl
Abstract:
Traditional microscopic shape from focus reconstruction is often limited by the surface dynamic and the texture of the analyzed specimen. In many real-world applications, surfaces have strongly varying reflectance, leading to saturated image parts, or lack detectable texture. In such cases, shape from focus generates incorrect and sparse depth maps. We present a novel method to eliminate these vulnerabilities without additional reconstruction time. Beyond that, we propose a novel method to further reduce the computational costs of traditional shape from focus to a minimum. To overcome the problems of high reflectance differences and lack of texture, we use a projector-camera system to compensate for the reflectance variations and additionally project measurable texture. The surface reflection is compensated by a local adaptation of the illumination for every acquisition. To reduce measurement time, the compensation pattern is tracked through the image stack and is updated in a prediction-correction step. The exact projector pattern to create additional texture is determined through a detailed analysis of the focus measure operator and the optical effects during the projection. The additional reduction in measurement time is achieved with a novel focus measure which calculates the focus through a comparison of an estimated all-in-focus image and the stack images by normalized cross correlation. With this measure, the depth estimation of each surface point in the shape from focus algorithm stops if a local focus maximum beyond a predefined threshold is found. The experiments show that our method outperforms the traditional shape from focus algorithm and also improves on comparable methods like high dynamic range imaging in terms of speed and accuracy.
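The focus measure described above compares an estimated all-in-focus image with the stack images by normalized cross correlation and stops at a local maximum beyond a threshold. A minimal sketch of that idea follows. The patch representation (flat lists of intensities), the function names, the threshold value and the fallback to the global maximum are assumptions made for illustration and are not taken from the talk.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches (flat lists)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Constant patches have zero variance; treat them as uncorrelated.
    return num / den if den > 0 else 0.0

def best_focus_index(reference_patch, stack_patches, threshold=0.9):
    """Index of the stack slice that best matches the reference patch.

    Stops early once a local NCC maximum at or above `threshold` is seen;
    otherwise falls back to the global maximum over the whole stack.
    """
    scores = []
    for i, patch in enumerate(stack_patches):
        scores.append(ncc(reference_patch, patch))
        if (i >= 2 and scores[i - 1] >= threshold
                and scores[i - 1] > scores[i - 2]
                and scores[i - 1] > scores[i]):
            return i - 1  # previous slice was a local maximum: stop here
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical 2x2 patch of the all-in-focus estimate and three stack slices.
reference = [1, 5, 2, 8]
stack = [[3, 4, 3, 5], [1, 5, 2, 8], [4, 4, 4, 5]]
print(best_focus_index(reference, stack))
```

In this toy stack the second slice is identical to the reference, so its score is a local maximum above the threshold and the search stops without visiting any further slices.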
- 04.10.2011 / ICG / 13:00h
Title:Video-Based Human Body Action Recognition for Games
Speaker:Markus Murschitz
Abstract:
If a person performs an action (like walking), and is captured by a camera, it turns out that the captured
video contains patterns corresponding to the action performed. This is exploited by the field of Human
Body Action Recognition, where videos are classified using computer vision algorithms. Human Body
Action Recognition is an active research topic. As with various modern-day human-computer-interface
technologies, one of the first applications is in the gaming industry.
The aim of this work is to utilize action-classifications to control a computer game, where a specific
action corresponds to a specific keystroke pressed in the game. While many published works address the
topic of video-based action recognition, there are not many that are able to perform it in real
time, which is a crucial requirement for any gaming application. Another requirement is to be able to
detect the number of times an action has been repeated.
In this work these two requirements are addressed. The real-time requirement is addressed by exploiting
the massively parallel computing capabilities of modern graphics cards for dense feature extraction and
actor detection. The features are Histograms of Oriented Gradients (HOG) and Local Binary Patterns
(LBP) on appearance, and the Histogram of Flow-Orientations (HOF) and Local Binary Patterns on
Flow-magnitude (LBFP). The motion and flow information of the actor are transformed into a prototype
per frame. The sequence of prototypes is analyzed by subsequence-matching to known action-specific
prototype sequences. This results in an action classification and information about their temporal alignment,
which is used to perform repetition detection. The repetition detection and action classification are
performed by utilizing Dynamic Time Warping (DTW) as a distance measure.
Evaluations are performed on several publicly available datasets and compared to results of other works.
It turns out that the system leads to comparable (but slightly inferior) results which are accomplished in
real time. Finally a proof of concept is given by integrating the full evaluation pipeline with a game,
where the evaluation pipeline works as a substitute for a keyboard.
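The abstract above uses Dynamic Time Warping (DTW) as a distance measure between prototype sequences. The textbook DTW recurrence can be sketched as follows. The scalar toy sequences and the absolute-difference local cost are assumptions for illustration; the actual system compares per-frame prototypes and performs subsequence matching, which this sketch does not cover.

```python
def dtw_distance(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Classic DTW distance between two sequences.

    cost[i][j] holds the minimal accumulated cost of aligning the first i
    elements of seq_a with the first j elements of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# The same "action" performed more slowly still aligns with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# A shifted sequence accumulates a non-zero alignment cost.
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

Because DTW allows elements to be repeated during alignment, sequences that differ only in execution speed remain close, which is what makes it attractive for comparing actions of varying duration.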
Title:Automatically Generated Transfer Function Galleries for High Dimensional Multivariate Data
Speaker:Markus Muchitsch
Abstract:
This thesis deals with the design of transfer functions for the visualization of volumetric data sets by means of direct volume rendering. We focus on the visualization of medical data sets. Transfer functions define which structures of a data set are visible and how they appear. To this end, the voxels are assigned optical properties such as color and opacity depending on their data values. A scalar data set is frequently insufficient for an unambiguous classification of different structures, as they are often defined by overlapping value ranges. Better discriminability can be achieved using multivariate data sets in which each voxel is described by multiple parameters. Their visualization requires multi-dimensional transfer functions. The design of simple one-dimensional transfer functions by means of primitive editors already entails a time-consuming, error-prone and inefficient trial-and-error approach. When creating multi-dimensional transfer functions, this issue intensifies drastically. In any case, users require knowledge about the technical background of transfer functions as well as the data set to be visualized. Furthermore, a lot of experience in handling the chosen editor is needed to obtain suitable results. This work presents a system with the primary goal of allowing a simple, intuitive and efficient creation of transfer functions. Using special histogram calculations, value ranges of structures in the transfer function space are detected. Based on them, initial transfer functions are created. This method spares the user the tedious trial-and-error approach of the manual editor. In addition, only minimal knowledge of the data set to be visualized is required. Thumbnails created from the initial transfer functions are arranged in a clear gallery that can be navigated simply and efficiently.
For its implementation, two proven user interface concepts, known for their good usability, have been selected and adapted to enable efficient transfer function design. By interacting with the thumbnails depicted in the gallery, the transfer functions associated with them can be adapted and combined within a reasonably constrained scope. Since interactions are not performed directly in the transfer function space, immediate knowledge about the impact of transfer functions on the visualization of a data set is not necessary. Even for users experienced in handling primitive editors, this technique entails a significant acceleration of the design work and an improvement of the created results. The possibility of further adaptation and combination of transfer functions is important for creating visualizations suited to the current task. For the distinction of structures with overlapping value ranges, the presented system works with multi-dimensional separable transfer functions. Using conventional multi-dimensional transfer functions, a reasonable interaction is only possible with up to three dimensions. In contrast, separable transfer functions allow the use of an arbitrary number of dimensions. These can be adapted separately and are combined according to a certain scheme. The complexity, which increases with the dimensionality of separable transfer functions, is hidden by the user interface of the developed system. Erroneous classifications, which can occur when separable transfer functions are applied, are prevented by means of a color test performed in the render system used. Weaknesses identified in the color-test-based evaluation are addressed with a preliminary prototype for improved evaluation of multi-dimensional separable transfer functions.
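As the abstract above explains, a transfer function assigns optical properties such as color and opacity to voxels depending on their data values. A minimal one-dimensional sketch is given below. The piecewise-linear interpolation, the control points and the function name are assumptions for illustration; the thesis' histogram-based generation and separable multi-dimensional scheme are not reproduced here.

```python
def make_transfer_function(control_points):
    """Build a 1D transfer function from control points.

    control_points: list of (value, (r, g, b, a)) with strictly increasing
    values. Returns a function mapping a scalar voxel value to an RGBA tuple
    by linear interpolation, clamped at both ends.
    """
    def tf(v):
        if v <= control_points[0][0]:
            return control_points[0][1]
        if v >= control_points[-1][0]:
            return control_points[-1][1]
        for (v0, c0), (v1, c1) in zip(control_points, control_points[1:]):
            if v0 <= v <= v1:
                t = (v - v0) / (v1 - v0)
                return tuple(a + t * (b - a) for a, b in zip(c0, c1))
    return tf

# Hypothetical control points: low values transparent, mid values orange and
# faint, high values white and opaque.
tf = make_transfer_function([
    (0, (0.0, 0.0, 0.0, 0.0)),
    (100, (1.0, 0.5, 0.0, 0.2)),
    (255, (1.0, 1.0, 1.0, 1.0)),
])
print(tf(50))
print(tf(300))
```

Editing such control points by hand is the trial-and-error process the abstract describes; two structures whose value ranges overlap cannot be separated by any choice of control points in a single scalar dimension.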
- 15.03.2011 / ICG / 13:00h
Title: Enforcing topological constraints in random field image segmentation
Speaker:Chao Chen
Abstract:
We introduce a new way to integrate knowledge about topological
properties (TPs) into random field image segmentation models. Instead of
including TPs as additional constraints during minimization of the
energy function, we devise an efficient algorithm for modifying the
unary potentials such that the resulting segmentation is guaranteed to have
the desired properties. Our method is more flexible in the sense that it
handles more topology constraints than previous methods, which were only
able to enforce pairwise or global connectivity. In particular, our
method is very fast, making it possible for the first time to enforce
global topological properties in practical image segmentation tasks.
- 04.03.2011 / ICG / 13:00h
Title: Visualization of Multimodal Volume Data to Diagnose Cardiac Disease
Speaker:Gitta Domik, Stephan Arens, Nico Bredenbals, Research Group “Computer Graphics, Visualization and Image Processing”, Department of Computer Science, University of Paderborn,
Germany
Abstract:
In cooperation with the Heart and Diabetes Centre of North Rhine-Westphalia
we are developing the software package Volume Studio to support
diagnosis of cardiac disease using medical volume data. Our flexible,
GPU-based framework is able to combine several volume data sets in the form
of different modalities (e.g. CT and PET), different metrics (e.g. size,
curvature, vesselness, texture) and segmentations. Hence complex
compositing of multimodal volumes, defined by a pipeline of transfer
functions, sampling elements, and logical operations, is possible at
runtime.
We also present work-in-progress on a heart model to increase the
effectiveness of diagnosis and on the use of controlled experiments to
test the effectiveness of various visualizations (e.g. transfer
functions, Curved Planar Reformations).
- 14.02.2011 / ICG / 16:00h
Title: Minimal representations for the estimation of uncertain projective entities
Speaker: Prof. Dr.-Ing. Wolfgang Förstner, Institut für Geodäsie und Geoinformation, Universität Bonn.
Abstract: Estimation using homogeneous entities has to cope with obstacles such as singularities of covariance matrices and redundant parameterisations which do not allow an immediate definition of maximum likelihood estimation and lead to estimation problems with more parameters than necessary. The talk presents a representation of the uncertainty of all types of geometric entities and estimation procedures for geometric entities and transformations which (1) only require the minimum number of parameters, (2) are free of singularities, (3) allow for a consistent update within an iterative procedure, (4) make it possible to exploit the simplicity of homogeneous coordinates to represent geometric constraints and (5) can handle geometric entities which are at infinity or at least very far away, avoiding the use of concepts like the inverse depth.
We discuss the concept and show its usefulness for bundle adjustment, estimating vanishing points or 3D lines from 3D points and for determining 3D lines from observed image line segments in a multi view setup.
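The simplicity of homogeneous coordinates for geometric constraints, and their ability to represent entities at infinity, can be illustrated in the 2D projective plane. The sketch below covers only this well-known background and not the minimal uncertainty representation or the estimation procedures of the talk; the helper names are chosen for this example.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c*w = 0 through two points."""
    return cross(p, q)

def intersection(l, m):
    """Homogeneous intersection point (x, y, w) of two lines."""
    return cross(l, m)

# Lines x = 1 and y = 2 meet at the finite point (1, 2), i.e. (1, 2, 1).
print(intersection((1, 0, -1), (0, 1, -2)))

# The parallel lines y = 0 and y = 1 meet at a point with w = 0:
# a point at infinity in the x-direction, with no special-case code.
print(intersection((0, 1, 0), (0, 1, -1)))

# The line through (0, 0) and (1, 1) is x - y = 0 up to scale.
print(line_through((0, 0, 1), (1, 1, 1)))
```

The price of this uniform treatment is redundancy: a 2D point has two degrees of freedom but three homogeneous coordinates defined only up to scale, which is the kind of over-parameterisation that leads to the singular covariance matrices mentioned in the abstract.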
- 11.02.2011 / ICG / 13:00h
Title:A Short Overview of Work on “Interactive 3D Graphics and Games” at the University of Paderborn
Speaker:Gitta Domik, Research Group “Computer Graphics, Visualization and Image
Processing”, Department of Computer Science, University of Paderborn,
Germany
Abstract:In this (very short) talk I will give an overview of our work in the
area of “Interactive 3D Graphics & Games”, which has established itself
in two main areas:
(a) Real-time graphics in medicine, where we concentrate on
multi-modality imaging and visualization to diagnose cardiac disease through
volume visualization (e.g. through transfer functions)
(b) Serious Games to support exposure therapy for children traumatized
by traffic accidents.
A third area we work in is the development of competency for
transdisciplinary collaboration in graduate students.
- 08.02.2011 / ICG / 11:00h
Title:Coherent Image-Based Rendering of Real-World Objects
Speaker:Stefan Hauswiesner
Abstract:Many mixed reality systems require the real-time capture and re-rendering of the real world to integrate real objects more closely with the virtual graphics. This includes novel viewpoint synthesis for virtual mirror or telepresence applications. For real-time performance, the latency between capturing the real world and producing the virtual output needs to be as low as possible. Image-based visual hull (IBVH) rendering is capable of rendering novel views from segmented images in real time. We improve upon existing IBVH implementations in terms of robustness and performance by reformulating the tasks of major components. Moreover, we enable high resolutions and low latency by exploiting view and frame coherence. The suggested algorithm includes image warping between successive frames under the constraint of redraw volumes. These volumes form a boundary of the motion and deformation in the scene, and can be constructed efficiently by describing them as the visual hull of a set of bounding rectangles which are cast around silhouette differences in image space. As a result, our method can handle arbitrarily moving and deforming foreground objects and free viewpoint motion at the same time, while still being able to reduce workload by reusing previous rendering results.
Title:Context, social aspects and the use of sensors in mobile computing
Speaker:Mariusz Nowostawski
Abstract:During the talk Mariusz will briefly introduce the Information Science
department of the University of Otago, faculty research interests and
ongoing projects. The talk will present a number of context-aware mobile
computing projects and the use of sensors in different areas. Mariusz
will briefly discuss past projects such as: virtual stickies, fall
detection, smoke alarm notification and Parkinson's disease tremor
studies. The talk will finish with the outlook on ongoing projects in
the area of human activity tracking and life analytics.
- 25.01.2011 / ICG / 13:00h
Title:Mobile Augmented Reality Campus Guide
Speaker:Claus Degendorfer
Abstract:Smartphones have become increasingly interesting as a mobile Augmented Reality
(AR) platform over the past few years because of improved hardware resources,
including processing power, memory capabilities, built-in cameras and GPS sensors. With such devices it is possible to create mobile AR information systems which provide augmented reality anywhere, at any time. Some limitations of current AR systems, which we want to address in this work, are a lack of appropriate AR content, the inaccuracy of current sensor-based annotation matching approaches and the poor matching rates of vision-based approaches under changing environmental conditions.
We therefore developed a system to enable end-users to create textual AR annotations which provide information about the surrounding environment on a global scale. Furthermore, we investigated the possibilities of vision-based annotation matching and implemented three different improvements to annotation matching. Tests showed that a combination of these improvements can increase the annotation matching rate in difficult lighting situations by up to 50%. This work was therefore one step in the evolution of mobile AR information systems.
Title:A Convex Approach to Layered Motion and Stereo
Speaker:Markus Unger
Abstract:Currently there is a trend towards layered motion models for optical flow estimation. A lot of the top performers on the Middlebury database already use some form of layers. In this talk we discuss advantages and disadvantages of a layered motion model. We present a novel approach that can handle a large number of layers (more than 100 times of current approaches). Our model consists of the Potts model
on top of parametric layers with an additional layer for occlusion. We show that we can realize two common
occlusion models by means of convex constraints in the Potts model. This allows us to jointly optimize for the layers and occlusions. We present some preliminary applications and results.
As this is ongoing work, lively discussion is welcome!
- 18.01.2011 / ICG / 13:00h
Title:Learning Transformation Invariant Representations from weakly-related Videos for Tracking and Detection
Speaker:Samuel Schulter
Abstract: For current computer vision systems, object detection and tracking are very challenging tasks, whereas humans perform very well on both of them. This comes from the fact that these systems have to cope with all variations and transformations that occur in natural scenes, such as shape, appearance, different illuminations and occlusions. In general, machine learning algorithms learn hypotheses based on labeled training data to correctly separate unseen test data according to their class, i.e., positive or negative. In computer vision, this principle also holds for detection algorithms as well as for tracking approaches that are based on classifiers. However, the amount of labeled training data available is often too small to capture all possible object transformations and intra-class variations, which makes generalization a hard task. In contrast, unlabeled data exist in large amounts and are typically easy to collect. But the extraction of useful information from unlabeled data is difficult in practice, as they often stem from different distributions than the labeled data, i.e., they are only weakly related to the target class.
Therefore, we exploit videos as a source of unlabeled data, because they comprise an underlying structure given by real-world constraints, namely the space-time coherence of naturally moving objects. This fact makes them more informative than a heterogeneous collection of single images.
That is, observing objects that undergo natural transformations allows for learning representations that are more transformation invariant, although the object labels are unknown. The main intent of this Master's Thesis is to incorporate video data into a state-of-the-art object detection and tracking system with the goal of learning more invariant object representations and achieving better generalization performance. Based on a Random Forest framework, we define an optimization problem, which also involves data containing local transformations of naturally moving objects extracted from video sequences. We gathered real-world video sequences from the web and applied a dense optical flow in order to extract useful motion information from video data. The evaluation of our methods shows that we can improve the generalization performance in object detection and tracking.
Title:3D OBJECT CATEGORIZATION WITH PROBABILISTIC CONTOUR MODELS
Speaker:Kerstin Pötsch
Abstract:We present a probabilistic framework for learning 3D contour-based category models
represented by Gaussian Mixture Models. This idea is motivated by the fact that even small sets of contour
fragments can carry enough information for a categorization by a human. Our approach represents an
extension of 2D shape based approaches towards 3D to get a pose-invariant 3D category model. We
reconstruct 3D contour fragments and generate what we call '3D contour clouds' for specific objects. The
contours are modeled by probability densities, which are described by Gaussian Mixture Models. Thus, we
obtain a probabilistic 3D contour description for each object. We introduce a similarity measure between
two probability densities which is based on the probability of intra-class deformations. We show that a
probabilistic model allows for flexible modeling of shape by local and global features. We show that even
with small inter-class difference it is possible to learn one 3D Category Model against another category
and thus demonstrate the feasibility of 3D contour-based categorization.
Title:Robust Multi-View Reconstruction from Highly Redundant Aerial Images
Speaker:Markus Rumpler
Abstract:This thesis investigates and presents robust multi-view matching methods to produce dense depth maps from highly redundant imagery. In several experiments we investigate the influence of different cost functions and cost aggregation schemes on the results of multi-view depth matching in a plane sweep framework. The evaluation includes local and global optimization methods. The main contribution of this thesis is an extension of the highly efficient TV-L1 optical flow algorithm that includes the epipolar constraint. While correspondence computation is still performed between pairs of images, we present a method for correspondence linking between nearby views. This enables the use of measurements from all neighboring views used for matching and provides wider baselines for robust and accurate triangulation. We provide evaluation results of the proposed method and compare its performance with a standard plane sweep approach. The benefits include lower computation time and memory costs, continuous results instead of discrete depth estimates and comparable but in most cases even better accuracy. It requires little or no user guidance, so our design is suitable for integration into a fully automatic reconstruction pipeline.
- 11.01.2011 / ICG / 13:00h
Title:Learning Potentials for Game Theory Based Graph Matching and Applications to Object Localization
Speaker:Michael Donoser
Abstract: This talk focuses on the graph matching problem of finding correspondences
between two point sets using unary and pairwise potentials which analyze
local descriptors and geometrical compatibility. Recently it was shown that
optimal parameters for the features used in the unary potentials can be
learned, which significantly improves results in supervised and unsupervised
settings. It was demonstrated that even linear assignments (not considering
geometry) with well learned potentials may improve over state-of-the-art
quadratic assignment solutions. In this work we focus on two extensions of
such methods. First we show that it is also possible to directly learn pairwise
potentials in terms of kernel functions for pairs of points in a supervised
setting (using a statistical shape model) which significantly improves
matching quality (up to 25%) in a priori known scenarios. Second, we
describe a graph matching optimization formulation based on finding an
evolutionary stable strategy which provides accurate assignments even in
cases of a large number of outliers (outperforming related spectral
approaches). Experiments on synthetic point sets, face alignment datasets
and an application in the area of object localization demonstrate the broad
applicability of the method.
- 15.12.2010 / ICG / 14:30h
Title:A Bayesian Approach to Variational Methods
Speaker:Rene Ranftl
Abstract: Variational models are among the most successful methods for low-level Computer Vision tasks today. While such models can be derived and formulated in a completely deterministic setting, they nonetheless have a deep connection to the probabilistic framework of Bayesian inference. This thesis highlights this connection and the advantages that a probabilistic approach to variational methods can have. A fundamental question in variational models is the formulation of an appropriate image model. An
especially popular image model is given by the Total Variation prior due to its edge-preserving properties.
It will be shown that the usually employed energy minimization approach is not able to fully exploit the
properties of the underlying models if such an image prior is used. An alternative approach that is based
on Bayesian estimation is introduced and the connections to energy minimization are highlighted.
The proposed estimator is defined by a very high-dimensional integral that cannot be solved with deterministic
numerical integration algorithms. To tackle this problem, the framework of Markov Chain
Monte Carlo (MCMC) integration is introduced and refined into an algorithm that is specifically tailored
to the needs of image processing. To speed up the computations, a parallelization scheme and an
implementation on graphics processing hardware are proposed.
We show the advantages of the proposed algorithm over the energy minimization approach on convex
image reconstruction models. For non-convex models the MCMC approach allows for global optimization.
Our experiments on different models for motion estimation and stereo reconstruction show that
such a global optimization approach is not only feasible but also provides superior results.
- 07.12.2010 / ICG / 13:00h
Title:Large-Scale Robotic SLAM through Visual Mapping
Speaker:Christof Hoppe
Abstract:Simultaneous Localization and Mapping (SLAM) in a
three-dimensional environment is an essential requirement for autonomous
mobile robots to accomplish high level tasks. An emerging sensor for SLAM
is the digital camera, because it is cheap, small, has low weight and can
be applied in many different application areas like marine, aerial or land robotics.
Today's camera-based solutions, called visual SLAM, are limited to small
environments like desktop or office scenes because of geometric error propagation and
limited scalability.
In this master's thesis, we developed a SLAM system that allows us to handle
large-scale environments using a stereo-camera mounted on a wheeled robot.
Our approach extends a keyframe-based method for augmented reality applications
by adding appearance-based loop detection and correction. Furthermore, we propose
a method for incorporating other sensor information like odometry into the visual
SLAM framework. We are hereby able to preserve connectivity between camera
poses even if visual features are absent. To maintain map accuracy without
sacrificing excessive computation time, we combine feature descriptors of
different strength for data association.
In the experiments, we show that our approach is able to handle trajectories of
several hundred meters containing several thousand visual features. The resulting
three-dimensional maps have correct metric scale. The absolute trajectory error is
below one percent. On a standardized benchmark dataset providing ground-truth
trajectories, our system outperforms other visual SLAM algorithms by a factor of two.
- 02.12.2010 / ICG / 11:00h
Title:The Narcissistic Robot: Robot Calibration Using a Mirror
Speaker:Matthias Rüther
Abstract:We present a novel method for calibration of a robotic manipulator. The robot kinematic chain and its tool are observed by a hand-mounted camera through a mirror. We demonstrate that this enables hand-eye, hand-tool, and kinematic robot calibration without incorporating accurate external
references, except the mirror. Using this particularly simple setup, hand-eye calibration becomes independent of the kinematic chain and parameter observability constraints in kinematic calibration become more relaxed, which makes pose planning for robot calibration more convenient.
- 26.11.2010 / ICG / 09:30h
Title:Real-Time Monocular SLAM and Dense Reconstruction
Speaker:Andrew Davison and Richard Newcombe
Abstract:Recent advances in probabilistic Simultaneous Localisation and Mapping
(SLAM) algorithms, together with modern computer power, have made it
possible to create practical systems able to perform real-time
estimation of the motion of a single camera in 3D purely from the image
stream it acquires. This is of interest in robotics, but also in other
fields like wearable computing and augmented reality. We will review our
research on visual SLAM over the past few years, and present new developments aimed at the challenges of estimating camera motion for very rapid or large-scale motion. In particular we will highlight new work which harnesses GPGPU processing power and variational algorithms in order to recover dense scene models in real-time as a camera browses a natural scene.
- 16.11.2010 / ICG / 13:00h
Title:On a first-order primal-dual algorithm
Speaker:Thomas Pock
Abstract:Variational methods have proven to be particularly useful to solve a number
of ill-posed inverse imaging problems. In particular variational methods
incorporating total variation regularization have become very popular for a
number of applications. Unfortunately, these methods are difficult to minimize
due to the non-smoothness of the total variation. The aim of this paper
is therefore to provide a flexible algorithm which is particularly suitable
for non-smooth convex optimization problems in imaging. In particular, we
study a first-order primal-dual algorithm for non-smooth convex optimization
problems with known saddle-point structure. We prove convergence to a
saddle-point with rate O(1/N) in finite dimensions for the complete class of
non-smooth problems we are considering in this paper. We
further show accelerations of the proposed algorithm to yield improved rates
on easier problems. In particular we show that we can achieve O(1/N^2) convergence
on problems where the primal or the dual objective is uniformly
convex, and we can show linear convergence, i.e., O(w^N) with w < 1, on problems where
both are uniformly convex. The wide applicability of the proposed algorithm
is demonstrated on several imaging problems such as image denoising, image
deconvolution, image inpainting, motion estimation and image segmentation.
- 05.11.2010 / ICG / 09:00h
Title:An Omnidirectional Time-of-Flight Camera and its Application to Indoor SLAM
Speaker:Katrin Pirker
Abstract:Photonic mixer devices (PMDs) are able to create reliable depth maps of indoor environments. Yet, their application in mobile robotics, especially in simultaneous localization and mapping (SLAM) applications, is hampered by the limited field of view. Enhancing the field of view by optical devices is not trivial, because the active light source and the sensor rays need to be redirected in a defined manner. In this work we propose an omnidirectional PMD sensor which is well suited for indoor SLAM and easy to calibrate. Using a single sensor and multiple planar mirrors, we are able to reliably navigate in indoor environments to create geometrically consistent maps, even on optically difficult surfaces.
Title:Interactive Multi-Label Segmentation
Speaker:Jakob Santner
Abstract:This paper addresses the problem of interactive multi-label segmentation. We propose a powerful new framework using several color models and texture descriptors, Random Forest likelihood estimation as well as a multi-label Potts-model segmentation. We perform most of the calculations on the GPU and reach runtimes of less than two seconds, allowing for convenient user interaction. Due to the lack of an interactive multi-label segmentation benchmark, we also introduce a large publicly available dataset. We demonstrate the quality of our framework with many examples and experiments using this benchmark dataset.
- 02.11.2010 / ICG / 13:00h
Title: Creation of Semantic Building Models from CAD Plans (Erstellung von semantischen Gebäudemodellen aus CAD-Plänen)
Speaker: Martin Mörth
Abstract:A semantic building model combines all data about a building that is relevant to the user in a single model. Such a model would be an excellent basis for many applications in building services engineering. In addition to visualizations for security and building control centers, higher-level building control tasks could also be implemented more easily. Making a building model the central source of building-specific information brings many advantages. When the system changes, the error-prone updating of decentralized data stores would no longer be necessary.
Many scientific works address the problem of automatically acquiring building models from digital and analog CAD plans. Building on this work, an approach for acquiring semantic building models from CAD plans is developed. In a cleanup phase, topological errors in the geometric representation of the plans are corrected first. Symbols that are drawn in dedicated layers of the plans to represent doors and windows are used to semantically enrich the plan data. The areas spanned by the geometric elements of the plan are determined and classified on the basis of these semantic attributes.
The resulting semantic area model of a building is subsequently stored in a relational database. The database system used supports the representation of geometric features in its tables.
Special functions can be used in queries to gain added value from the geometric representation of the objects.
The last part of the thesis illustrates the practical benefit of the building model. In a sample application, 3D models for visualizing the represented building are generated from the data in the relational database.
- 19.10.2010 / ICG / 13:00h
Title: Comparative Analysis of Multidimensional, Quantitative Data
Speaker: Alexander Lex
Abstract:When analyzing multidimensional, quantitative data, the comparison of two or more groups of dimensions is a common task. Typical sources of such data are experiments in biology, physics or engineering, which are conducted in different configurations and use replicates to ensure statistically significant results. One common way to analyze this data is to filter it using statistical methods and then run clustering algorithms to group similar values. The clustering results can be visualized using heat maps, which show differences between groups as changes in color. However, in cases where groups of dimensions have an a priori meaning, it is not desirable to cluster all dimensions combined, since a clustering algorithm can fragment continuous blocks of records. Furthermore, identifying relevant elements in heat maps becomes more difficult as the number of dimensions increases. To aid in such situations, we have developed Matchmaker, a visualization technique that allows researchers to arbitrarily arrange and compare multiple groups of dimensions at the same time. We create separate groups of dimensions which can be clustered individually, and place them in an arrangement of heat maps reminiscent of parallel coordinates. To identify relations, we render bundled curves and ribbons between related records in different groups. We then allow interactive drill-downs using enlarged detail views of the data, which enable in-depth comparisons of clusters between groups. To reduce visual clutter, we minimize crossings between the views. This paper concludes with two case studies. The first demonstrates the value of our technique for the comparison of clustering algorithms. In the second, biologists use our system to investigate why certain strains of mice develop liver disease while others remain healthy, informally showing the efficacy of our system when analyzing multidimensional data containing distinct groups of dimensions.
- 07.10.2010 / ICG / 16:00h
Title: Regression Forests for Efficient Anatomy Detection and Localization in CT Studies
Speaker: Antonio Criminisi
Abstract: Paper
- 05.10.2010 / ICG / 13:00h
Title: Image-based Ghostings for Single Layer Occlusions in Augmented Reality
Speaker: Stefanie Zollmann
Abstract: In augmented reality displays, X-Ray visualization techniques
make hidden objects visible through combining the physical view
with an artificial rendering of the hidden information. An important
step in X-Ray visualization is to decide which parts of the physical
scene should be kept and which should be replaced by overlays.
The combination should provide users with essential perceptual
cues to understand the relationship of depth between hidden
information and the physical scene.
In this paper we present an approach that addresses this decision
in unknown environments by analyzing camera images of the
physical scene and using the extracted information for occlusion
management. Pixels are grouped into perceptually coherent image
regions and a set of parameters is determined for each region. The
parameters change the X-Ray visualization for either preserving existing
structures or generating synthetic structures. Finally, users
can customize the overall opacity of foreground regions to adapt
the visualization.
Title: The City of Sights: Design, Construction, and Measurement of an Augmented Reality Stage Set
Speaker: Lukas Gruber
Abstract:We describe the design and implementation of a physical and virtual
model of an imaginary urban scene—the “City of Sights”—
that can serve as a backdrop or “stage” for a variety of Augmented
Reality (AR) research. We argue that the AR research community
would benefit from such a standard model dataset which can be
used for evaluation of such AR topics as tracking systems, modeling,
spatial AR, rendering tests, collaborative AR and user interface
design. By openly sharing the digital blueprints and assembly
instructions for our models, we allow the proposed set to be physically
replicable by anyone and permit customization and experimental
changes to the stage design which enable comprehensive
exploration of algorithms and methods. Furthermore we provide
an accompanying rich dataset consisting of video sequences under
varying conditions with ground truth camera pose. We employed
three different ground truth acquisition methods to support a broad
range of use cases. The goal of our design is to enable and improve
the replicability and evaluation of future augmented reality
research.
- 21.09.2010 / ICG / 13:00h
Title: Monitoring Social Expectations in Second Life
Speaker: Stephen Cranefield
Abstract:An active topic in multi-agent systems (MAS) research is the adaptation of social constructs from
human society, such as reputation, trust, norms and commitments, to enable autonomous software agents to gain an awareness of the social context of their interactions and to help preserve order in open societies of agents.
At the same time, human interaction within online virtual communities has become increasingly popular due to the advent of social networking Websites and online virtual worlds. However, while these technologies provide the middleware to enable interaction, they generally provide little support for users to maintain an awareness of the social context of their interactions.
There is therefore an opportunity for techniques developed in MAS research for maintaining social awareness, which were inspired by human society, to be applied in the context of electronically mediated human interaction, as well as in their original context of software agent interaction. This talk will discuss one application of this idea to the Second Life virtual world. I will describe an approach allowing individual Second Life users or communities to define conditional rules of social expectation and subscribe to a monitor that checks for the fulfilment and violation of these rules.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- 02.02.2010 / ICG / 13:00h
Title: Novel Applications of Electrocorticographic Signals (ECoG)
Speaker: Peter Brunner
Slides:http://www.bciresearch.org/pbrunner/graz_talk_2009_merged.pdf
- 26.01.2010 / ICG / 13:00h
Title: New Approaches to Airway Segmentation in CT Data
Speaker: Christian Bauer
Abstract: In this talk, we present and compare two different methods for the
segmentation of airways in CT datasets. The first method utilizes a
multi-scale tube detection filter for the identification of tubular
objects followed by a reconstruction of the airway tree. During the
reconstruction step, prior knowledge about the airway trees is utilized
to identify and link tubular objects that are part of the airway tree.
This approach enables robust handling of disturbances like tumors or
emphysema. The second method utilizes the Gradient Vector Flow (GVF)
field for the identification of airways and extraction of their
centerlines. The found centerlines are used in a second step to
initialize the actual GVF-based segmentation. The performance of both
methods has been evaluated on a set of 20 chest CT scans with available
reference segmentations. We present the results of this evaluation,
discuss properties of the two methods, and compare them to the results
of 13 other methods from different research groups.
- 19.01.2010 / ICG / 13:00h
Title: Simultaneous localisation and mapping for mobile robots with recent sensor technologies
Speaker: Elmar Rueckert
Abstract: Autonomous mobile robots need a map of the environment for navigation. Simultaneous Localisation and Mapping (SLAM) is essential for autonomous navigation, path planning and obstacle avoidance. SLAM describes the process of building a map of an unknown environment while simultaneously computing the current robot position. Both steps depend on each other: a good map is necessary to compute the robot position, and conversely only an accurate position estimate yields a correct map. Several popular SLAM packages, like DP-SLAM, GMapping or GridSLAM, are available for research purposes and allow a meaningful comparison between sensors and algorithms that has not been available so far. The aim of the work is to find a robust method to generate 2D or 3D maps with recent sensor technologies. We compare a grid-based method with two implementations of geometric feature-based SLAM algorithms. All methods rely on a probabilistic estimate of the robot state realised with a Particle Filter. Three recent sensor technologies, namely laser range finders, sonar sensors and time-of-flight cameras, are evaluated with respect to accuracy and robustness. Laser-based sensors yield the most exact results and are commonly used. Because of the low price of sonar sensors, ambitious efforts are being made to build cheap household robots with them. The last sensor technology listed is the newest and allows 3D scans of the environment. The experiments take place in indoor environments and a quantitative evaluation of the results is performed with the recently published RawSeeds datasets.
Title: Visual Analytics for Gene Expression Data
Speaker: Bernhard Schlegl
Abstract: The analysis of biomolecular gene expression data sets is a research area whose results are
used by a wide range of life science experts. Biologists and geneticists are only two sample
users who are interested in analyzing gene expression profiles. To support users in terms
of visualization, the Caleydo InfoVis framework provides several visualization techniques
for gene expression data and pathways. The combination of both data sources is a major
field of interest because it provides better insights into the biological processes occurring
inside a cell in the context of patients.
To help experts extract information from the data, visual analytics is sometimes used.
In this thesis we apply data mining methods and visualize the results. We set a focus
on clustering algorithms because they are well suited for finding co-expressed genes. Co-
expressed genes are relevant because experts assume that they are responsible for similar
functions inside a cell.
- 12.01.2010 / ICG / 13:00h
Title: Convex Approximation for Matching Subgraphs in Computer Vision
Speaker: Christian Schellewald
Abstract: I will present a convex approximation approach to the combinatorial problem
of matching subgraphs -- which represent object views -- against larger graphs which represent scenes. Starting from a linear programming formulation for computing optimal matchings in bipartite graphs, we extend
the linear objective function in order to take into account the relational
constraints given by both graphs. The resulting quadratic combinatorial
optimisation problem is approximately solved by a (convex) semidefinite
program. Some results with respect to view-based object recognition will be shown.
Title: Motion Estimation with Physical Prior Knowledge
Speaker: Annette Stahl
Abstract: We introduce a regularisation term for variational motion estimation approaches exploiting physical prior knowledge that is new in the field of image sequence processing. Using one of the motion estimation approaches along with an appropriate transport process we also propose a new
reconstruction approach for missing data in image sequences, also known as video inpainting. We exploit and extend the existing framework of standard variational optical flow approaches, which we use to recover optical flow
fields from image sequences by minimising an appropriate energy functional.
A partial differential equation is employed in order to obtain a physically
plausible regularisation term for dynamic image motion modelling. The resulting distributed-parameter approach incorporates a spatio-temporal regularisation in a recursive online fashion, contrary to previous variational
approaches which are designed to evaluate the entire spatio-temporal image volumes in a batch processing mode.
- 07.01.2010 / ICG / 13:00h
Title: Learning Components for Human Sensing
Speaker: Fernando De la Torre
Abstract: Providing computers with the ability to understand human behavior from
sensory data (e.g. video, audio, or wearable sensors) is an essential part
of many applications that can benefit society such as clinical diagnosis,
human computer interaction, and social robotics. A critical element in the
design of any behavioral sensing system is to find a good representation of
the data for encoding, segmenting, classifying and predicting subtle human
behavior. In this talk I will propose several extensions of Component
Analysis (CA) techniques (e.g. kernel principal component analysis, support
vector machines, and spectral clustering) that are able to learn
spatio-temporal representations or components useful in many human sensing
tasks.
In the first part of the talk I will give an overview of several ongoing
projects in the CMU Human Sensing Laboratory, including our current work on
depression assessment and deception detection from video, as well as
hot-flash detection from wearable sensors. In the second part of the talk I
will show how several extensions of the CA methods outperform
state-of-the-art algorithms in problems such as temporal alignment of human
behavior, temporal segmentation/clustering of human activities, joint
segmentation and classification of human behavior, and facial feature
detection in images. The talk will be adaptive, and I will discuss the
topics of major interest to the audience.
- 22.12.2009 / ICG / 13:00h
Title: Wavelet based real-time deformable objects
Speaker: Antonio Rella
Abstract: Calculating the deformation of 3-D models is a well-known problem in analytic mathematics. Algorithms to compute such deformations were developed as early as a century ago. In general, computing physically correct and accurate deformations is cumbersome and time-consuming. In recent years computers have become more powerful and are now capable of computing deformations of complex models in an acceptable amount of time. Besides these physically correct deformations, realistic and intuitive deformations have been described by less time-consuming algorithms, which can also be computed in real-time. However, these deformations are not precise and can only satisfy an observer at first sight. This thesis is concerned with the boundary element computation algorithm for accurate deformation descriptions of three-dimensional models. For this method the underlying coefficient matrix is fully populated and therefore more difficult to solve. The lazy wavelet approach, in contrast, removes less relevant geometrical information and accepts the resulting error in order to obtain a sparser coefficient matrix and to ease the calculation.
- 15.12.2009 / ICG / 13:00h
Title:Articulated Tracking With Few and With Many Parameters
Speaker: Prof. Konrad Schindler
Abstract:Estimating the 3D pose of articulated objects from image data is a notoriously difficult problem in computer vision. One reason is that articulated objects such as the human body have many degrees of freedom, and hence a huge pose space. The talk will present two very different ways of simplifying inference in the pose space.
In the first approach, strong a-priori assumptions about the expected motion pattern are imposed, which reduce the effective number of unknowns and allow one to perform pose estimation in a space of much lower dimension, found with (non-linear) dimensionality reduction.
In the second approach, the relations between parts of the articulated structure are relaxed to soft constraints. This increases the total number of parameters, but allows one to independently estimate the poses of the parts (each with a small number of unknowns) and afterwards enforce the constraints between them by message passing.
Examples will be shown of tracking walking people (for the "few parameters" setting) and of hand tracking during object manipulation (for the "many parameters" setting).
- 10.12.2009 / ICG / 13:00h
Title: Efficient Ray Casting of Volumetric Datasets with Polyhedral Boundaries on Manycore GPUs
Speaker: Bernhard Kainz
Abstract: We present a new system for hardware-accelerated ray casting of multiple volumes. Our approach supports a large number of volumes, complex translucent and concave polyhedral objects as well as CSG intersections of volumes and geometry in any combination. It is implemented as a software renderer in CUDA without any fixed function portions, which allows full control over the use of memory bandwidth. High depth complexity, which is problematic for conventional approaches based on depth peeling, can be successfully handled. As far as we know, our approach is the first framework for multi-volume rendering which provides interactive frame rates when concurrently rendering more than 50 arbitrarily overlapping volumes on current graphics hardware.
- 09.12.2009 / ICG / 13:00h
Title: Graphical Models for Object Detection and Pose Estimation
Speaker: Prof. Stefan Roth
Abstract: In my talk I will present two rather different approaches for object
detection. The first is focused on detecting generic classes of
objects in images, a problem that is made challenging by the large
intra-class appearance variation of typical object categories as well
as viewpoint changes and occlusions. Recent work has pointed to
advantages of combining several features, particularly global and
local ones. But it remained difficult to choose appropriate
combinations of features. Our work addresses this problem using a
conditional random field (CRF). To find the feature couplings that
yield the best discriminative power, we automatically learn the graph
structure of the CRF in a discriminative fashion. The resulting
approach yields state-of-the-art performance on the challenging PASCAL
2007 dataset.
In the second part of my talk, I will focus on detecting people in
particular, as well as estimating their 2D pose. While for people
detection monolithic, global features and discriminative learning are
widely used, pose estimation has often remained focused on simple
image features such as silhouettes that lack discriminative power.
Our work shows that combining a simple tree-structured graphical model
for modeling admissible body part configurations with powerful
discriminative features in a so-called pictorial structures framework
enables excellent detection as well as pose estimation performance. We
demonstrate results for a variety of challenging scenes including TV
footage.
This is joint work with P. Schnitzspan, M. Andriluka and B. Schiele.
- 01.12.2009 / ICG / 13:00h
Title:Towards a Collaborative Information Visualization System in a Multi-Display Environment
Speaker: Werner Puff
Abstract:Visual data analysis often operates on a vast amount of data. The size of the data sets, as well as the knowledge related to them, raises, among others, two problems. On the one hand, display areas and resolutions of standard workplaces are often insufficient to visualize the data in a proper manner. On the other hand, the knowledge and collaboration of experts from multiple domains are needed for an efficient data analysis process, but the collaborative tasks are not supported by the systems in use.
This work proposes an approach to counter these problems, based on the information visualization software Caleydo. The approach introduces extensions to the existing software in order to connect multiple Caleydo applications for the collaborative analysis of one data set. In addition, Caleydo is integrated into the Multi-Display Environment Deskotheque. This integration provides larger and more flexible display areas and also enables co-located collaboration for small groups. The extensions presented here include a communication layer to provide synchronization and data exchange between the applications. Visual Links are used to make users aware of changes, an important aspect in a setup consisting of multiple displays and multiple users.
Title:Real-Time 3D Rendering for the Atronic EGD Framework: A Hybrid Approach
Speaker:Christopher Dissauer
Abstract:The Austrian company Atronic is engaged in development and manufacturing of video-based
gaming machines and display systems for the world-wide casino market. Displaying
graphical content on these machines is performed via the company's proprietary EGD
scene graph framework; historically, the original design of this framework was strongly
influenced by the comparatively low GPU performance of the available hardware platform,
resulting in a necessarily CPU-bound 2D-only implementation. This thesis presents an
approach to augment the framework by GPU-accelerated real-time 3D rendering, suitable
for operation on Atronic's upcoming next-generation hardware platforms.
Necessary goals and requirements were defined; during this process, it was decided
to aim for a hybrid approach that represents a novel way for seamless integration of 3D
functionality into the existing 2D system. Based on these drafts, a preliminary study was
carried out to ensure overall technical feasibility within the given hardware and software
limits. With respect to the results of this study a concrete design was laid out that
eventually led to the implementation of a working prototype.
Corner test cases have been prepared to evaluate the hybrid prototype system with
regard to extensibility, stability and performance, yielding overall satisfactory results. Nevertheless,
certain inadequacies were revealed through this evaluation; most prominently,
the techniques used for parallelization of CPU-bound and GPU-bound tasks showed considerable
room for further optimization towards a technically mature product.