Advanced Computer Vision  (710.001)

Lecturer(s) Franz Leberl, Thomas Pock
Type SE, SS 2.0
Target group Elective (Freifach)
Literature

Infrastructure of the institute

Location Seminar room, Institut für Maschinelles Sehen und Darstellen (Institute for Computer Graphics and Vision), Inffeldgasse 16, 2nd floor
Time Tuesday, 13:00 - 14:30
Link(s)

TUG-Online

Details

List of talks:

  • 27.09.2011 / ICG / 13:00h

Title:Variational Multiview Range Estimation
Speaker:Gottfried Graber
Abstract: Variational methods have been shown to be very effective in computing dense depth maps from stereo images. Most algorithms aim at computing the disparity field, which makes the integration of multiple views hard. In this work we propose to compute depth directly, which not only simplifies the extension to multiple views significantly, but also increases the robustness and quality of the resulting depth maps. We provide results on both synthetic and real-world data, where our method performs comparably to current state-of-the-art multiview stereo algorithms.

Title:Image-Based Quality Inspection of Scarlet Runner Beans (Bildgestützte Qualitätsprüfung von Feuerbohnen)
Speaker:
Abstract: The presented work deals with the sorting of food. Specifically, the sorting of scarlet runner beans is addressed using an image-based method. Since the work cannot cover all necessary topics, it focuses on the implementation of the image processing part. The work starts with a general introduction to food sorting, some information about scarlet runner beans, and the project requirements. To give an overview, sorting concepts and software-related issues are discussed. This also helps to define exact specifications for the image processing system. Fundamentals of image processing are introduced afterwards, including image acquisition and technologies, preprocessing and feature extraction. Further required subjects are machine learning and real-time systems. Methods for implementing suitable software are described and analyzed in terms of complexity. The practical usability of the solutions is demonstrated in experiments that evaluate performance and timing behavior. The final chapter gives some conclusions and perspectives.

  • 13.09.2011 / ICG / 13:00h

Title:Transforming image completion
Speaker:Alex Mansfield, ETH Zürich
Abstract: Image completion is an important photo-editing task which involves synthetically filling a hole in the image such that the image still appears natural. State of the art image completion methods work by searching for patches in the rest of the image that fit well in the hole region. Our key insight is that image patches remain natural under a variety of transformations (e.g. scale, rotation and brightness change), and this should be exploited. We extend the image completion model of Wexler et al. 2007 and investigate numerous optimisation techniques. We show how to achieve results that outperform previous state of the art with reasonable efficiency.

  • 26.08.2011 / ICG / 13:00h

Title:TBA
Speaker:Paul Wohlhart
Abstract: TBA

  • 25.07.2011 / ICG / 13:00h

Title:Identifying social norms in virtual agent societies
Speaker:Bastin Tony Roy Savarimuthu
Abstract: Social norms are expectations of an agent (human or software) about the behaviour of other agents in the society. Examples of social norms followed in human society are the obligation norm of gift exchange during Christmas and the prohibition norm against littering a public place. Social norms are simple constructs that are used to facilitate cooperation in human societies. In the field of computer science, normative multi-agent system researchers study how norms can be used to facilitate cooperation and collaboration among software agents. In virtual environments such as Second Life, avatars embody software agents (i.e. they are proxies to humans). It is important for a software agent operating under the open world assumption to be endowed with computational mechanisms to identify norms that may govern its behaviour and its interactions with others. Otherwise, social sanctions may ensue for not following the norms. In this talk, I will first provide an overview of the internal agent architecture for norm identification which a software agent can use to identify the norms of a society. Second, I will discuss how one particular type of norm, the prohibition norm (e.g. don't litter the park), can be identified. Third, I will briefly discuss how obligation norms can be identified. The mechanisms developed in this work are applicable to software entities in virtual environments such as Second Life and massively multiplayer online games.

Biography: Bastin Tony Roy Savarimuthu received his Master of Engineering (ME) degree in Software Systems from Birla Institute of Technology and Science, Pilani, India. He is currently a Lecturer in Information Science at the University of Otago, Dunedin, New Zealand. His primary area of research is normative multi-agent systems. His recently submitted PhD work focuses on how norms emerge in artificial agent societies and how agents can identify norms in open agent societies. His other research interests include mobile computing, social networking and software engineering. More details can be found at http://www.infosci.otago.ac.nz/tony-bastin-roy-savarimuthu/

  • 28.06.2011 / ICG / 13:00h

Title:On-line Data Analysis Based on Visual Codebooks
Speaker:Vítězslav Beran, Department of Computer Graphics and Multimedia, Brno University of Technology
Abstract: This work introduces a new adaptable method for on-line video searching in real time based on a visual codebook. The new method addresses computational efficiency and retrieval performance on on-line data. The method originates in procedures used by static visual codebook techniques. These standard procedures are modified so that they can adapt to changing data. The procedures that improve the adaptability of the new method are a dynamic inverse document frequency, an adaptable visual codebook and a flowing inverted index. The adaptable method was evaluated, and the presented results show that it outperforms the static approaches on video searching tasks. The new adaptable method is based on the introduced flowing window concept, which defines how data is selected, both for system adaptation and for processing. Together with the concept, the mathematical background is defined for finding the best configuration when applying the concept to a new method. The method is particularly applicable to video processing systems where significant changes of the data domain, unknown in advance, are expected. It can be used in embedded systems that monitor and analyze broadcast on-line TV signals in real time.
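
The idea of a dynamic inverse document frequency over a flowing window can be illustrated with a minimal sketch. This is not the speaker's implementation: the class name `SlidingWindowIdf`, the fixed window length, and the convention that unseen words get an IDF of 0 are all assumptions made for illustration.

```python
import math
from collections import Counter, deque


class SlidingWindowIdf:
    """Hypothetical sketch: IDF computed over the most recent frames only."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()      # visual-word sets of the frames in the window
        self.doc_freq = Counter()  # number of window frames containing each word

    def add_frame(self, visual_words):
        words = set(visual_words)  # a word counts once per frame
        self.window.append(words)
        self.doc_freq.update(words)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.doc_freq.subtract(expired)
            for word in expired:
                if self.doc_freq[word] <= 0:
                    del self.doc_freq[word]

    def idf(self, word):
        df = self.doc_freq.get(word, 0)
        if df == 0:
            return 0.0  # assumed convention for words not in the window
        return math.log(len(self.window) / df)


idf = SlidingWindowIdf(window_size=2)
idf.add_frame([1, 2])
idf.add_frame([2, 3])
print(idf.idf(1), idf.idf(2))
```

As old frames leave the window, words that were once common can become rare again, so their weights adapt to the changing data instead of staying fixed as in a static codebook.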

  • 21.06.2011 / ICG / 13:00h

Title:3D Object Categorization and Pose Estimation using 3D Gaussian Mixture Contour Models
Speaker:Kerstin Pötsch
Abstract: In this talk I give an overview of my thesis, which is concerned with the problem of object categorization based on 3D shape models. 2D shape models based on 2D contour fragments are powerful cues for object categorization, but these methods are restricted to one aspect. Therefore, we move towards object categorization based on 3D shape models. 3D shape is modelled by 3D contour fragments (1D embedded in 3D). We use an approach based on Gaussian Mixture Models for learning 3D category models, which we further use for pose estimation in 2D images. I will demonstrate our 3D categorization system on our own dataset and our pose estimation approach on seventeen poses of the ETH80 database.

  • 14.06.2011 / ICG / 13:00h

Title:Efficient Structure from Motion with Weak Position and Orientation Priors
Speaker:Arnold Irschara
Abstract: In this paper we present an approach that leverages prior information from global positioning systems and inertial measurement units to speed up structure from motion computation. We propose a view selection strategy that advances vocabulary tree based coarse matching by also considering the geometric configuration between weakly oriented images. Furthermore, we introduce a fast and scalable reconstruction approach that relies on global rotation registration and robust bundle adjustment. Real-world experiments are performed using data acquired by a micro aerial vehicle equipped with GPS/INS sensors. Our proposed algorithm achieves orientation results that are sub-pixel accurate, and the precision is on a par with results from incremental structure from motion approaches. Moreover, the method is scalable and computationally more efficient than previous approaches.

Title:Artifact-free JPEG decompression as constrained optimization problem
Speaker:Martin Holler, Institute for Mathematics and Scientific Computing, Uni Graz
Abstract: The problem of artifact-free decompression of a given JPEG compressed image is addressed. This is done by formulating a constrained optimization problem involving a data fidelity term and a regularization term. The main focus is on the regularization term. First, the choice of the well-known Total Variation (TV) functional for regularization is discussed. Then, the recently introduced Total Generalized Variation (TGV) functional is considered as the regularization term, and the results obtained are compared with results from the TV-based model. Finally, the TGV-based model is extended to handle color JPEG images to which color sub-sampling has been applied. For both models, the resulting minimization problem is solved using a primal-dual algorithm. Computation times of graphics processing unit (GPU) based implementations are presented.

  • 07.06.2011 / ICG / 13:00h

Title:ISMAR 2011 Submissions
Speaker:Gerhard Reitmayr et al.
Abstract: Review of ISMAR 2011 submissions. Approximately 10 short presentations.

  • 25.05.2011 / ICG / 13:30h

Title:Talk of Markus Hadwiger
Speaker:Markus Hadwiger
Abstract: Dr. Markus Hadwiger is Assistant Professor of Computer Science in the Division of Mathematical and Computer Sciences and Engineering at KAUST. He assumed his duties in October 2009.

Prior to his appointment at KAUST, Dr. Hadwiger was a Senior Researcher at the VRVis Research Center for Virtual Reality and Visualization in Vienna, Austria. During this time, he conducted extensive basic and applied research in scientific visualization, especially volume visualization and medical visualization, as well as research on GPU-based algorithms.

Dr. Hadwiger’s research interests are in scientific visualization, especially petascale visualization and scientific computing, volume visualization, medical visualization, interactive segmentation and image processing, GPU-based algorithms, and general-purpose computations on GPUs.

He is a co-author of the book “Real-Time Volume Graphics” published by A K Peters in 2006, and has been involved in many courses and tutorials about volume rendering and visualization at ACM SIGGRAPH, ACM SIGGRAPH Asia, IEEE Visualization, and Eurographics. Dr. Hadwiger has co-authored more than 30 refereed articles.

In 2008, Dr. Hadwiger was awarded a multi-year ICT basic research grant from the Vienna Science and Technology Fund WWTF for research on scalable semantic petascale visualization. He was also a co-recipient of the Best Application Paper award at IEEE Visualization 2007. Dr. Hadwiger is a member of the IEEE and Eurographics.

Dr. Hadwiger received his doctoral and master’s degrees in Computer Science from Vienna University of Technology, Austria.

  • 24.05.2011 / ICG / 13:00h

Title:Vision-Based Quality Inspection in Robotic Welding
Speaker:Markus Heber
Abstract: In this work we present a novel method for assessing the quality of a robotic welding process. While most conventional automated approaches rely on non-visual information like sound or voltage, we introduce a vision-based approach. Although the weld seam appearance changes, we exploit only the information from error-free reference data, and assess the welding quality through the number of highly dissimilar frames. In our experiments we show that this approach enables an efficient and accurate separation of defective from error-free welds, as well as detection of welding defects in real time by exploiting the spatial information provided by the welding robot.

Title:Learning Face Recognition in Videos from Associated Information Sources
Speaker:Paul Wohlhart
Abstract: Videos are often associated with additional information that could be valuable for interpreting their content. This especially applies to the recognition of faces within video streams, where cues such as transcripts and subtitles are often available. However, this data is not completely reliable and might be ambiguously labeled. To overcome these limitations, we propose a new semi-supervised multiple instance learning algorithm, whose contribution is twofold. First, we can transfer information on labeled bags of instances, which lets us weaken the prerequisite of knowing the label for each instance. Second, we can integrate unlabeled data, given only probabilistic information in the form of priors. The benefits of the approach are demonstrated for face recognition in videos on a publicly available benchmark dataset.

  • 17.05.2011 / ICG / 13:00h

Title:Multi-View Stereo: Redundancy Benefits for 3D Reconstruction
Speaker:Markus Rumpler
Abstract: This work investigates the influence of using multiple views for 3D reconstruction with respect to depth accuracy and robustness. In particular, we show that multiview matching not only contributes to scene completeness, but also improves depth accuracy through improved triangulation angles. We start with synthetic experiments on a typical aerial photogrammetric camera network and investigate how baseline (i.e. triangulation angle) and redundancy affect the depth error. Our evaluation also includes a comparison of combined pairwise triangulated and fused stereo pairs with true multiview triangulation. By analyzing the 3D uncertainty ellipsoid of triangulated points, we demonstrate the clear advantage of a multiview approach over fused two-view stereo algorithms. We propose an efficient dense matching algorithm that utilizes pairwise optical flow followed by a robust correspondence chaining approach. We provide evaluation results of the proposed method on ground truth data and compare its performance with that of a multiview plane sweep method.
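
How baseline and redundancy affect depth error can be illustrated with the textbook first-order error model for a rectified stereo pair. This is a simplified sketch, not the uncertainty ellipsoid analysis of the talk: the function names, the 0.5 pixel matching noise and the independence assumption in the fusion step are all assumptions for illustration.

```python
import math


def depth_std(depth_m, baseline_m, focal_px, disparity_std_px=0.5):
    """First-order depth standard deviation for Z = f * B / d:
    sigma_Z = Z^2 / (f * B) * sigma_d."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_std_px


def fused_std(stds):
    """Standard deviation of the inverse-variance weighted mean
    of independent depth estimates (independence is assumed)."""
    return 1.0 / math.sqrt(sum(1.0 / s ** 2 for s in stds))


# Hypothetical aerial setup: 1000 m flying height, 10000 px focal length.
narrow = depth_std(1000, 100, 10000)  # 100 m baseline
wide = depth_std(1000, 200, 10000)    # 200 m baseline
print(narrow, wide, fused_std([narrow] * 4))
```

Under this model, doubling the baseline halves the depth uncertainty, and fusing four independent estimates of equal quality also halves it.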

Title:Large-Scale Robotic SLAM through Visual Mapping
Speaker:Christof Hoppe
Abstract: Keyframe-based visual SLAM systems perform reliably and fast in medium-sized environments. Currently, their main weaknesses are robustness and scalability in large scenarios. In this work, we propose a hybrid, keyframe-based visual SLAM system which overcomes these problems. We combine visual features of different strength, add appearance-based loop detection and present a novel method to incorporate non-visual sensor information into standard bundle adjustment frameworks to tackle the problem of weakly textured scenes. On a standardized test dataset, we outperform EKF-based solutions in terms of localization accuracy by at least a factor of two. On a self-recorded dataset, we achieve a performance comparable to a laser scanner approach.

  • 13.05.2011 / HSi5 / 14:30h

Title:Incremental learning with import vector machines
Speaker:Wolfgang Förstner
Abstract: Incremental learning addresses the adaptation of the learned model that becomes necessary due to changes in appearance, shape, the set of features used for identification, and the complexity of the model. Incremental learning needs a model with a generative and a discriminative component, which allows it to handle a large variety of object classes while remaining efficient at distinguishing similar object classes. Among the many approaches used for incremental learning, import vector machines (IVMs) appear to have great potential, which has not been exploited yet. IVMs are a sparse, kernel-based discriminative model, similar to the well-known support vector machines (SVMs). Because both models depend on an affine combination of the features, IVMs can be used for the same types of problems as SVMs. However, IVMs provide a probabilistic output, are sparser, and appear to contain a generative component, though they are not constructed this way. The talk presents the basic idea of import vector machines, in particular addresses the differences to SVMs, shows (based on recent work of Ribana Roscher) how to arrive at an efficient incremental multi-class learning scheme, and illustrates the potential using classical datasets and an object tracking approach.

  • 12.05.2011 / HSi11 / 16:00h

Title:On the role of order of training examples for incremental learning
Speaker:Susanne Wenzel
Abstract: We address the problem of learning classes where it is impossible to capture the huge variability of examples with one dataset at one time, and examples are instead obtained over time. A continuous learning system would be able to improve already learned models using new examples. There exist a number of incremental learning methods approaching this problem. However, one can easily show that the performance of these methods depends on the order of examples used for training, a problem which is not addressed in most publications. This talk points out the role of the order of samples for training an incremental learning method. We define characteristics of incremental learning methods to describe the influence of sample ordering on the performance of a learned model. We sketch different types of experiments to evaluate these properties. Based on the estimation of Bayes error bounds, we show how to find sequences of classes for training, based on the data alone, that always obtain the best possible error rates.

Title:Spatial and Hierarchical Structures for Interpreting Images of Man-made Scenes
Speaker:Michael Ying Yang
Abstract: Classification of various image components (pixels and regions) in meaningful categories is a challenging task due to ambiguities inherent to image data. Images of man-made scenes, e.g. building facade images, exhibit strong contextual dependencies in the form of spatial interactions among components. For example, neighboring pixels tend to have similar class labels, and different regions appear in restricted spatial configurations. Modeling these interactions is crucial to achieve good classification accuracy. Graphical models provide a consistent framework for the statistical modeling. In this talk, we present a conditional random field (CRF) to model the spatial structures in the image. The unary potentials are built on the probability output of an efficient randomized decision forest classifier which acts on the region level. The pairwise potentials are introduced to enforce spatial consistency between neighboring regions. To exploit different levels of contextual information in images, a hierarchical conditional random field (HCRF) is described as an extension of CRF. The hierarchical structure of the regions is integrated into pairwise potentials. The model is built on multi-scale image analysis in order to aggregate evidence from local to global level.
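
The role of unary and pairwise potentials can be illustrated with a minimal sketch of a CRF energy over a region adjacency graph. The abstract does not specify the form of the pairwise term, so the Potts penalty used here is an assumption, as are the function names, the toy costs and the brute-force minimizer, which is only feasible for tiny graphs.

```python
from itertools import product


def crf_energy(labels, unary, edges, pairwise_weight=1.0):
    """Sum of unary costs plus a Potts penalty for every edge
    whose two regions carry different labels (assumed pairwise form)."""
    data_term = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth_term = sum(pairwise_weight for i, j in edges if labels[i] != labels[j])
    return data_term + smooth_term


def best_labeling(unary, edges, pairwise_weight=1.0):
    """Exhaustive search over all labelings; for illustration only."""
    num_labels = len(unary[0])
    candidates = product(range(num_labels), repeat=len(unary))
    return min(candidates, key=lambda l: crf_energy(l, unary, edges, pairwise_weight))


# Toy example: three regions in a chain, two classes.
# unary[i][k] is the cost of giving region i label k, e.g. the negative
# log-probability from a region classifier (values here are made up).
unary = [[0.0, 2.0], [1.0, 0.5], [3.0, 0.0]]
edges = [(0, 1), (1, 2)]
print(best_labeling(unary, edges))
```

Raising `pairwise_weight` favors labelings in which neighboring regions agree, which is the spatial consistency effect the pairwise potentials are meant to enforce.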

Title:Real-time reconstruction of road-surfaces and curbs from stereo-image-sequences
Speaker:Jan Siegemund
Abstract: Robust registration and modeling of the ego vehicle's free driving space provides the basis for many high-level driving assistance applications, such as path planning and collision avoidance. In this context, curbs are an important delimiting structure of this free space, usually representing the boundary between the driving lane and the sidewalk. However, most existing systems for obstacle detection classify curbs as road inliers because of their low height. Concerning the sensor aspect, stereo cameras are becoming affordable and provide several advantages, such as a high data rate and a low space requirement inside the vehicle. In my talk I will present a method that employs Conditional Random Fields for real-time reconstruction of road surfaces and curbs from stereo image sequences.

  • 10.05.2011 / ICG / 13:00h

Title:Human Action Recognition using Multiple Instance Learning: A Comparative Study
Speaker:Gerald Fritz
Abstract: This thesis is situated in the scope of human action recognition and is concerned with two major objectives. First, it presents a comparative study of five different multiple instance learning (MIL) approaches and relates the results to those reported for state-of-the-art approaches in this field. Second, this work considers whether a sparse, part-based representation is able to support the consecutive classification process.

We investigate a non-negative matrix factorization with sparseness constraints and determine how such a representation contributes to performance improvements. Furthermore, we analyse the impact of a structured initialization towards a better part-based representation and present results for two different nearest neighbour approaches in a face recognition experiment. In the main part of this thesis we investigate whether a MIL concept is suitable for an action recognition task. We perform a thorough and detailed evaluation of different MIL approaches on the Weizmann action dataset and the KTH benchmark.

Results on the ORL database of faces demonstrate that a sparse, part-based representation beneficially supports the subsequent classifier. In particular, if the level of sparseness is significantly greater than that obtained by an unconstrained matrix factorization, then both classifiers achieve a performance increase of about 1.5%. Results on the Weizmann dataset show that three out of five MIL methods achieved competitive or better accuracies compared to a linear SVM classifier. Evaluations on the KTH benchmark demonstrate that the best MIL approach (miGraph) performed equally well up to a moderate level of noise. Finally, a solid comparison with recent approaches in the field of human action recognition complements the discussion of both datasets.

Title:Efficient SLAM: Ideas and theoretical analysis
Speaker:Handa Ankur
Abstract: Visual SLAM is the process by which a robot builds a map of the environment and computes its own position at the same time, using only a camera as the sensor. It uses various landmarks to build the map of the environment. These landmarks can be point features or edges, or the map can be fully dense. Building on the work on the classic MonoSLAM, which maps sparse point features, we explore the possibilities of making very efficient SLAM algorithms that scale well with an increasing number of features. Finally, we briefly delve into the theoretical side by asking very simple questions, e.g. what frame rate to use, how many features to use and which image resolution to use, in order to obtain an anytime SLAM algorithm that works under hard computational constraints.

  • 03.05.2011 / ICG / 13:00h

Title:Efficiently Locating Photographs in Many Panoramas
Speaker:Michael Kroepfl / Microsoft Research Redmond
Abstract:

Efficiently Locating Photographs in Many Panoramas

We present a method for efficient and reliable geo-positioning of images. It relies on image-based matching of the query images onto a trellis of existing images that provides accurate 5-DOF calibration (camera position and orientation without scale). As such it can handle any image input, including old historical images, matched against a whole city. On such a scale, care needs to be taken with the size of the database. We deviate from previous work by using panoramas to simultaneously reduce the database size and increase the coverage. To reduce the likelihood of false matches, we restrict the range of angles for matched features. Furthermore, we enhance the RANSAC procedure to include two phases. The second phase includes guided feature matching to increase the likelihood of positive matches. Hence, we devise a matching confidence score that separates between true and false matches. We demonstrate the algorithm on a large scale database covering a whole city in order to show its usefulness for a vision-based augmented reality system.

Read/Write World – Location Matching API

Based on our previous work on location based image matching, we developed a cloud-based image matching service, which allows matching of an arbitrary set of input images (panoramas and regular images) to a large index of geo-tagged images from different sources on the web. Through image matching, we can create a match-graph which can link media from different sources with other media and associated meta-data. We recently announced (http://www.youtube.com/watch?v=4X9u4JG9H6E) the availability of this web-service to developers through a publicly available API, allowing developers to create their own applications using the service. In this talk, I will provide an overview of the functionality showcased on the project’s web-page (http://readwriteworld.cloudapp.net/), including demonstrations of the individual service endpoints.

  • 19.04.2011 / ICG / 13:00h

Title:Natural Landmark-based Monocular Localization for MAVs
Speaker:Andreas Wendel
Abstract: Highly accurate localization of a micro aerial vehicle (MAV) with respect to a scene is important for a wide range of applications, in particular surveillance and inspection. Most existing approaches to visual localization focus on indoor environments, while such tasks require outdoor navigation. Within this work, we introduce a novel algorithm for monocular visual localization for MAVs based on the concept of virtual views in 3D space. Under the assumption that significant parts of the scene do not alter their geometry and serve as natural landmarks, the accuracy of our visual approach outperforms consumer grade GPS systems. In an experimental setup we compare our approach to a state-of-the-art visual SLAM algorithm and evaluate the performance by geometric validation from an observer's view. As our method directly allows global registration, it is neither prone to drift nor bias. This makes it well suited for long-term autonomous navigation.

Title:Facade Segmentation in a Multi-View Scenario
Speaker:Andreas Wendel
Abstract: We examine a new method of facade segmentation in a multi-view scenario. A set of overlapping, and thus redundant, street-side images exists, and each image shows multiple buildings. A semantic segmentation identifies primary areas in the image such as sky, ground, vegetation, and facade. Subsequently, repeated patterns are detected in image segments previously labeled as "facade areas" and are used to separate individual facades from each other. Experimentation is based on an industrial street-view dataset captured from a moving car with well-designed, calibrated, automated cameras. Highly overlapping images define a multi-view scenario. We achieve 97% pixel-wise segmentation effectiveness, outperforming current state-of-the-art methods.

  • CANCELED !!! 12.04.2011 / ICG / 13:00h CANCELED !!!

Title:Image-Based Quality Control of Scarlet Runner Beans (Bildgestützte Qualitätskontrolle von Feuerbohnen)
Speaker:Heinz Fleischhacker
Abstract: This thesis deals with the sorting of food; specifically, it presents a solution for the image-based sorting of scarlet runner beans. Since the thesis cannot cover the full range of technologies required, the focus is on the implementation of the image processing. To this end, the tasks of food sorting are explained first, and brief information on the scarlet runner bean as a crop introduces the project. This is followed by fundamentals of image processing in the areas of image acquisition and technologies, preprocessing, and feature extraction. Elements of machine learning and real-time systems, which are also needed for the subsequent discussion, are then introduced. Next, sorting concepts are presented, from which the requirements for image processing software capable of classifying scarlet runner beans are derived. Once these requirements are defined, algorithms designed for continuous data processing are presented. In the analysis of scarlet runner beans, this need arises from the use of line-scan sensors. The algorithms are analyzed and tested in detail using a concrete implementation, from which the usability of the developed solutions is argued. Concluding remarks give a further outlook.

  • 05.04.2011 / ICG / 13:00h

Title:Realtime Incremental Image Stitching For Industrial Quality Inspection
Speaker:Robert Lanner
Abstract: The task of image stitching is to create a high-quality panorama from a set of partially overlapping input images. In order to align images, image registration is applied. A blending method is finally used to stitch the aligned images and to create smooth transitions between them without visible artifacts. Hence, the two main steps for the generation of a panorama are image registration and image blending. This thesis presents an image stitching system that creates a weld seam survey image for visual quality inspection of welding processes. The image registration is based on salient keypoint extraction and robust motion estimation. By incorporating available tracking data, the system guarantees a successful mosaic generation. Furthermore, the system includes an incremental blending strategy to provide an online generation of image mosaics, and noise filtering methods to cope with e.g. smoke or sparks that typically occur during welding processes.

  • 15.03.2011 / ICG / 13:00h

Title:Enforcing topological constraints in random field image segmentation
Speaker:Chao Chen / IST Austria
Abstract: We introduce a new way to integrate knowledge about topological properties (TPs) into random field image segmentation models. Instead of including TPs as additional constraints during minimization of the energy function, we devise an efficient algorithm for modifying the unary potentials such that the resulting segmentation is guaranteed to have the desired properties. Our method is more flexible in the sense that it handles more topology constraints than previous methods, which were only able to enforce pairwise or global connectivity. In particular, our method is very fast, making it possible for the first time to enforce global topological properties in practical image segmentation tasks.

  • 24.08.2010 / ICG / 13:00h

Title:Content Creation for Augmented Reality on Mobile Devices
Speaker:Stefan Mooslechner
Abstract: Augmented Reality (AR) is becoming more and more attractive for different areas of application. In particular, the increasing number of smartphones with large displays, built-in cameras and fast wireless connections extends this area. Most applications in this case deal with pre-assembled content, and the user is a simple consumer. Users can in principle create their own content, but in most cases this requires special knowledge of different software solutions, so the user has to invest time in learning these applications. The possibility to create and share AR content in a smart and easy way would increase the number of users in this field of application. In this work, we present a prototype to create AR content directly on mobile devices. The user can create new 3D objects as well as 2D drawings. We provide different possibilities to color and texture the objects. The scenes are built and manipulated directly at the location where they will be shown, so a fast and exact adjustment to the environment is possible. We deliver an easy way to build virtual models out of the real environment or to generate totally new objects. Additionally, we use an existing infrastructure to distribute the content to a large number of users. This could be a further step towards greater acceptance of AR applications by end users.

  • 29.06.2010 / ICG / 13:00h

Title:Information-theoretic database building and querying for augmented reality applications
Speaker:Pawan Baheti
Abstract: Recently, there has been tremendous interest in the area of mobile Augmented Reality (AR), with applications including navigation, social networking, gaming and education. Current-generation mobile phones are equipped with a camera, GPS and other sensors (e.g., magnetic compass, accelerometer), in addition to ever-increasing computing/graphics capabilities and memory storage. Mobile AR applications process the output of one or more sensors to augment the real-world view with useful information. In this work the focus is on the camera sensor output, and a server-client framework is introduced to enable AR applications on mobile phones. The main focus of this talk is to present information-theoretic techniques for the server to build and maintain an image (feature) database based on reference images, and for the client to query the captured input images against this database. Building the database on the server involves pruning the descriptor set obtained from reference images with respect to statistical and entropy-based measures. Performance results on standard image sets are provided, demonstrating superior recognition performance even with dramatic reductions in feature database size. Further extensions in terms of client feedback are considered to improve the database optimization on the server side.

  • 15.06.2010 / ICG / 13:00h

Title:Robust Aerial Image Matching in Temporal Variant Regions
Speaker:Gernot Margreitner
Abstract: This thesis deals with the problem of finding correspondences between images of the same scene taken at different times. The time differences can vary from a few minutes up to several months, which makes it even harder to find correspondences reliably. Local features have proved to be a powerful way to find such correspondences because they are robust to background clutter, occlusions, and changes of viewpoint. Even though numerous comprehensive feature evaluations have been published, none of these works focused on performance evaluation in the presence of temporal variations. This is a major drawback because in many applications multi-temporal image matching is a crucial component in solving the posed problem. Consequently, this thesis presents a multi-temporal performance evaluation of selected local detectors and descriptors for non-planar aerial imagery. The primary goal of this work is to develop a temporally insensitive image matching workflow that is robust to temporal changes in aerial imagery and achieves highly accurate correspondence alignments. Such a matching algorithm may serve as a fundamental component of a broad range of applications. For example, the demonstrated algorithm prototype can be used to enhance existing photogrammetric workflows, where intensive manual user intervention is usually required to match images correctly in the presence of temporal changes.

  • 01.06.2010 / ICG / 13:00h

Title: Describing Buildings by 3-Dimensional Details Found in Aerial Photography
Speaker: Philipp Meixner
Abstract: A description of real properties is of interest in connection with Location-Based Services and urban resource management. The advent of Internet maps and location-aware Web search motivates producing such descriptions automatically and at very little incremental cost from aerial photography and its associated data products. The buildings are a very important part of each real property. We describe how one can recognize and reconstruct buildings in 3 dimensions with the purpose of extracting the building size, its footprint, the number of floors, the roof shapes, the number of windows, and the existence or absence of balconies. A key to success in this task is the availability of aerial photography at a greater overlap than has been customary in traditional photogrammetry, as well as a Ground Sampling Distance (GSD) exceeding the traditional values. We use images at a pixel size of 10 cm and with an overlap of 80% in the direction of flight and 60% across the flight direction. Such data support a robust determination of the number of floors and windows. Initial tests with data from the core of the City of Graz (Austria) produced an accuracy of 90% for the count of the number of floors and an accuracy of 80% for the detection of windows.

Title:Highly accurate Multiresolution Isosurface Rendering using compactly supported Spline Wavelets
Speaker:Markus Steinberger
Abstract: We present an interactive rendering method for isosurfaces in a voxel grid. The underlying trivariate function is represented as a spline wavelet hierarchy, which allows for adaptive (view-dependent) selection of the desired level of detail by superimposing appropriately weighted basis functions. Different root finding techniques are compared with respect to their precision and efficiency. Both wavelet reconstruction and root finding are implemented in CUDA to utilize the high computational performance of NVIDIA's hardware and to obtain high quality results. We tested our methods with datasets of up to 512³ voxels and demonstrate interactive frame rates for a viewport size of up to 1024x768 pixels.

  • 25.05.2010 / ICG / 13:00h

Title: Bridging the gap between 3D computer graphics and video/film
Speaker: Philippe Bekaert
Abstract: 3D computer graphics often looks unnatural and cartoon-like. Although it is entirely possible to create highly realistic computer graphics models and renderings, the cost of doing so is generally very large. Video technology, on the other hand, preserves realism from scene to screen by construction, but it does not offer the freedom of computer graphics in creating or modifying models, or in navigating and interacting with them. In this presentation, I will (re)discuss image and video based modeling and rendering in this light, discuss the particular case of the maturing technology of omni-directional video including its application in performance art, and argue that we are only at the beginning of the development of a new visual medium with its own grammar and applications.

  • 05.05.2010 / ICG / 16:00h

Title: Quantitative lung image analysis
Speaker: Reinhard Beichel
Abstract: Lung diseases are a major health problem. State-of-the-art volumetric imaging modalities like multi-detector computed tomography (MDCT) allow us to depict lung diseases in unprecedented detail, which enables us to use imaging as a biomarker. In the first part of the talk, the current challenges in quantitative lung image analysis will be discussed. In the second part, methods for lung shape analysis and robust segmentation of diseased lungs will be presented.

  • 29.04.2010 / ICG / 13:45h

Title: Multi-Frame Rate Volume Rendering
Speaker: Stefan Hauswieser
Abstract: This is the test talk of my paper for the Eurographics Symposium on Parallel Graphics and Visualization. It presents multi-frame rate volume rendering, an asynchronous approach to parallel volume rendering. The workload is distributed over multiple GPUs in such a way that the main display device can provide high frame rates and little latency to user input, while one or multiple backend GPUs asynchronously provide new views. The latency artifacts inherent to such a solution are minimized by forward image warping. Volume rendering, especially in medical applications, often involves the visualization of transparent objects. Former multi-frame rate rendering systems addressed this poorly, because an intermediate representation consisting of a single surface lacks the ability to preserve motion parallax. The combination of volume raycasting with feature peeling yields an image-based representation that is simultaneously suitable for high quality reconstruction and for fast rendering of transparent datasets. Moreover, novel methods for trading excess speed for visual quality are introduced, and strategies for balancing quality versus speed during runtime are described. A performance evaluation section provides details on possible application scenarios.

  • 27.04.2010 / ICG / 13:00h

Title: MCMC Sampling for urban scene analysis
Speaker: Florent Lafarge
Abstract: This talk presents a family of probabilistic tools, the Markov Chain Monte Carlo (MCMC) samplers, which are efficient at minimizing non-convex energies in spaces of high dimension. These optimization algorithms have several interesting properties. For example, they allow us to deal with energies of any form which, in turn, enables us to introduce complex interactions between the objects of interest. They are also well suited to exploring large configuration spaces of variable dimension.

In this talk, we first detail the principle of these samplers and present the various possibilities offered such as the birth and death of objects in a scene, the switching of object types from a library of models, and coupling with diffusion dynamics. We then propose some applications for urban scene analysis. In particular, we present models for representing natural textures, extracting objects of interest from aerial images such as road networks, reconstructing buildings from DEMs, and modelling facades from multi-view stereo images.
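
The basic idea of minimizing a non-convex energy with an MCMC sampler can be sketched in a few lines. The example below is only an illustration and not the speaker's method: it uses a plain fixed-dimension Metropolis sampler with a geometric cooling schedule (simulated annealing), without the birth-and-death or model-switching moves mentioned above. The one-dimensional energy, the function name `metropolis_anneal` and all parameter values are assumptions chosen for the example.

```python
import math
import random


def energy(x):
    # Hypothetical non-convex energy: local minimum near x = 1.13,
    # global minimum near x = -1.30.
    return x ** 4 - 3.0 * x ** 2 + x


def metropolis_anneal(energy_fn, x0, n_steps=20000, t_start=2.0,
                      t_end=0.01, step=0.5, seed=0):
    """Metropolis sampling with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy_fn(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        cand = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        e_cand = energy_fn(cand)
        delta = e_cand - e
        # Metropolis rule: always accept an improvement, accept a worse
        # state with probability exp(-delta / t).
        if delta <= 0.0 or rng.random() < math.exp(-delta / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Start inside the basin of the local minimum; the sampler can still
# cross the energy barrier while the temperature is high.
best_x, best_e = metropolis_anneal(energy, x0=1.1)
```

Because worse states are accepted with nonzero probability, the chain is not trapped in the basin where it starts, which is the property that makes such samplers usable on non-convex energies.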

  • 23.04.2010 / ICG / 13:00h

Title: Learning Object Detectors from Multiple Cameras by Centralized Information Fusion
Speaker: Armin Berger
Abstract: Automated object detection is an important task in computer vision and in visual surveillance in particular. It is difficult to train accurate detectors that perform well on a wide variety of scenes. For this reason, multi-camera networks have recently attracted interest in surveillance for training scene-specific detectors that improve detection performance and decrease the false positive rate, since many problems (e.g. occlusion handling) cannot be tackled with single-camera approaches.

This thesis introduces a novel centralized approach that simplifies information fusion within a multi-camera network by learning object detectors from multiple cameras. The approach allows information to be collected from an arbitrary number of cameras. Given calibrated cameras, where the calibration has to be performed only once for each camera, the centralized approach projects each camera's detection information to a central (virtual) camera. A mean-shift algorithm extracts local maxima from the fused information. This location information is back-projected to the single camera views to extract additional examples for training. The approach is demonstrated for the task of person detection within an on-line boosting framework. A detailed analysis of the learning behavior is given, and it is shown that the performance of state-of-the-art detectors can be achieved on single camera views although only a small number of labeled training examples are used.
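
The mean-shift step can be illustrated with a small sketch. This is a generic Gaussian-kernel mean shift on 2D points, not the implementation from the thesis; the points stand in for detections already projected into the central virtual camera, and the names `mean_shift_mode` and `find_modes`, the bandwidth and the merge distance are assumptions made for the example.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Follow the Gaussian-weighted mean from `start` until it stops moving."""
    x, y = start
    for _ in range(max_iter):
        wsum = sx = sy = 0.0
        for px, py in points:
            d2 = (px - x) ** 2 + (py - y) ** 2
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))
            wsum += w
            sx += w * px
            sy += w * py
        nx, ny = sx / wsum, sy / wsum
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y


def find_modes(points, bandwidth=1.0, merge_dist=0.5):
    """Run mean shift from every point and merge modes that coincide."""
    modes = []
    for p in points:
        m = mean_shift_mode(points, p, bandwidth)
        if all(math.hypot(m[0] - q[0], m[1] - q[1]) > merge_dist for q in modes):
            modes.append(m)
    return modes


# Hypothetical fused detections: two people seen by several cameras.
detections = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
              (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
modes = find_modes(detections, bandwidth=1.0)
```

Each returned mode is a local maximum of the kernel density estimate over the fused detections; in the approach described above, such locations would then be back-projected into the individual camera views.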

  • 13.04.2010 / ICG / 13:00h

Title: Planar Features for Visual SLAM
Speaker: Tobias Pietzsch
Abstract: In simultaneous localisation and mapping (SLAM), we are concerned with estimating the pose of a mobile robot and simultaneously building a map of the environment it is navigating. Visual SLAM, tackling the problem using a camera as the only sensor, has made astonishing progress in recent years, regarding both scalability and robustness of the devised solutions. The majority of existing systems focus on building sparse maps of point features to enable reliable camera pose tracking. However, the usefulness of sparse maps is limited for many other interesting scenarios. Examples include path planning in robotics or occlusion of artificial objects in augmented reality. These tasks would require maps representing dense structure which allow geometric reasoning. I argue that in order to achieve denser maps, we should go beyond point features to more descriptive features such as line or surface segments.

In this talk, I discuss the application of planar surface segments as features in EKF-based visual SLAM. These planar features are measured directly using the intensities of individual pixels in the camera images. In this way, the information provided by changes in feature appearance due to changing view-point is directly used to improve the state estimate. I will discuss several issues that arise from using intensity measurements and propose solutions. In particular I will address the cubic cost of the EKF update step in the dimension of the measurement vector, because planar feature measurements usually comprise thousands of individual pixel intensities. Finally, I will present experimental results that show robust camera tracking using planar features and increased accuracy in comparison to traditional point features.

  • 30.03.2010 / ICG / 13:00h

Title: Rapid 3D modeling from live video
Speaker: Qi Pan
Abstract: ProFORMA is a system capable of real-time 3D reconstruction of textured objects rotated by a user's hand. Partial models are rapidly generated from the live video and displayed to the user, as well as used by the system to robustly track the object's motion. The system works by calculating the Delaunay tetrahedralisation of a point cloud obtained from on-line structure from motion estimation which is then carved using a recursive and probabilistic algorithm to rapidly obtain the surface mesh. This talk will look at the techniques used in the system as well as future work we plan to conduct in rapid 3D modelling.

  • 23.03.2010 / ICG / 13:00h

Title: Computer-Vision based Pharmaceutical Pill Recognition on Mobile Phones
Speaker: Andreas Hartl
Abstract: In this work we present a mobile computer vision system which simplifies the task of identifying pharmaceutical pills. A single input image of pills on a special marker-based target is processed by an efficient method for object segmentation on a structured background. Estimators for the object properties size, shape and color deliver parameters that can be used to query an on-line database about an unknown pill. A prototype application is built using the Studierstube ES framework, which allows the entire pill recognition procedure to be carried out on off-the-shelf mobile hardware. For the purpose of pill retrieval, an additional piece of software is introduced which runs on an ordinary web server. It can deliver preprocessed pill information from an arbitrary database to the mobile device and serves as an interface for arbitrary sources of information. The performance of the estimators as well as their runtime is then evaluated under conditions that resemble typical environments of use. The retrieval performance on the Identa database, used here as an example, confirms that the system can facilitate the task of mobile pill recognition in a realistic scenario.

  • 18.03.2010 / ICG / 13:00h

Title: Rigid Body Reconstruction for Motion Analysis of Giant Honeybees Using Stereo Vision
Speaker: Michael Maurer
Abstract: Zoologists are interested in the defense waves of giant honeybees. The movement of every individual bee during the defense wave is of particular interest. Currently they are only able to measure the movement of a single bee using a laser vibrometer. A single measurement does not provide any information on the speed, intensity and starting point of a wave. They are interested in a sensor that enables a 3D reconstruction of the individuals while they perform a defense wave. To solve this problem, a vision-based measurement system is proposed. A portable stereo setup using two high-resolution cameras with high frame rates is designed in this thesis to acquire image sequences of the defense wave in an outdoor environment. The functionality of the acquisition setup has also been proven on an expedition to Nepal. Additionally, a framework to segment and reconstruct the individual bees is presented. For the segmentation, three different methods are proposed and evaluated. The correspondence problem is addressed using reduced graph cuts to obtain accurate matches in the presence of repetitive patterns. The evaluation has been done by comparison with manually labeled data.

Title: Defense Strategies of Giant Honeybees
Speaker: Gerald Kastberger
Abstract: As part of a research project funded by the FWF, the communication abilities of giant honeybees (Apis dorsata) are being investigated. In this context, a cooperation with Graz University of Technology was also sought. Michael Maurer of the Institute for Computer Graphics and Vision (ICG), together with Horst Bischof and Matthias Rüther, developed a portable stereo tracking system. It was used in Chitwan (Nepal) in an attempt to measure, in three spatial dimensions, a defense behavior on the free-hanging, sheet-like single-comb nests that is probably unique to giant honeybees. This behavior is the so-called shimmering, a cascading movement pattern of the surface bees hanging on the nests that proceeds much like the Mexican waves in football stadiums. The wave character and the speed of this collective behavior serve to repel predatory wasps. Beyond that, shimmering is also presumed to have colony-intrinsic significance, informing the members of the colony about the current defense status. The stereo tracking method, which Michael Maurer adapted to this application, has the advantage that the position of individual bees on the nest surface could be measured non-invasively in all three spatial dimensions at a frame rate of 60 Hz. This opens up new ways to study collective behaviors that unfold within fractions of a second and show an extraordinarily high degree of synchronization. In the talk I will introduce the unique biology of giant honeybees, describe the field-ready application of the experimental method mentioned above, and briefly present the first results obtained with it.

  • 09.03.2010 / ICG / 13:00h

Title: GPU-based reconstruction and visualization of needles in X-ray images
Speaker: Matthias Scharrer
Abstract: This work presents an algorithm to detect rigid, straight biopsy needles in multi-view C-arm X-ray images and to reconstruct their tip and orientation in three-dimensional space. Several well-known computer vision techniques are applied to achieve this goal. The processing pipeline consists of several stages: a denoising and preprocessing stage with image filtering, computation of a norm for better distinction of needle and background, the Radon transform to determine the orientation, refinement of the orientation using the random sample consensus (RANSAC) paradigm, needle tip detection, reconstruction of the data in three-dimensional space with the Direct Linear Transform, and improved robustness through checking the deviation in back projection. Afterwards, the results are visualized by a recently developed application, which displays volumetric data together with polygonal geometry and their intersection. The processing steps are described in detail, a short overview of the surrounding application is given, and finally the evaluation results of the experiments on real X-ray imaging data are presented and discussed.
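
The random sample consensus step named in the abstract can be sketched generically. The code below is a textbook RANSAC line fit on 2D points, not the pipeline from the talk; the point set is a made-up stand-in for needle candidate pixels plus clutter, and the function name `fit_line_ransac`, the inlier threshold and the iteration count are assumptions.

```python
import math
import random


def fit_line_ransac(points, n_iter=200, threshold=0.1, seed=0):
    """Fit a*x + b*y + c = 0 (with a^2 + b^2 = 1) to the largest consensus set."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iter):
        # Minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0.0:
            continue  # identical points define no line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # Consensus set: points within `threshold` of the candidate line.
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


# Hypothetical data: 20 points on y = 2x + 1 (the "needle") and 5 outliers.
needle = [(0.5 * i, 2.0 * (0.5 * i) + 1.0) for i in range(20)]
clutter = [(0.0, 10.0), (3.0, -4.0), (7.0, 2.0), (9.0, 30.0), (2.0, 15.0)]
line, inliers = fit_line_ransac(needle + clutter)
```

The outliers do not influence the result because a candidate line is scored only by how many points agree with it, which is why such a step is useful for refining an orientation estimate in cluttered images.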
