2018

Journal Articles

  1. QuadricSLAM: Constrained Dual Quadrics from Object Detections as Landmarks in Object-oriented SLAM Lachlan Nicholson, Michael Milford, Niko Sünderhauf. IEEE Robotics and Automation Letters (RA-L), 2018. In this paper, we use 2D object detections from multiple views to simultaneously estimate a 3D quadric surface for each object and localize the camera position. We derive a SLAM formulation that uses dual quadrics as 3D landmark representations, exploiting their ability to compactly represent the size, position and orientation of an object, and show how 2D object detections can directly constrain the quadric parameters via a novel geometric error formulation. We develop a sensor model for object detectors that addresses the challenge of partially visible objects, and demonstrate how to jointly estimate the camera pose and constrained dual quadric parameters in factor graph based SLAM with a general perspective camera (see the projection sketch after this list). [arXiv] [website]
  2. The Limits and Potentials of Deep Learning for Robotics Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, et al. The International Journal of Robotics Research, 2018. The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and to help fulfill the promising potential of deep learning in robotics. [arXiv]
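
The geometric core of the QuadricSLAM entry above is compact enough to state. The identity below is standard projective geometry: a dual quadric Q* projects to a dual conic C* under the camera projection matrix P. The bbox(·) operator and the exact error form are schematic stand-ins for the paper's geometric error, not copied from it.

```latex
% A dual quadric Q^* (symmetric 4x4) representing an object's size, position,
% and orientation projects to a dual conic C^* (symmetric 3x3) under a
% perspective camera with projection matrix P (3x4):
C^{*} = P \, Q^{*} P^{\top}
% A 2D detection b (an axis-aligned bounding box) can then constrain the
% quadric through an error between b and the box enclosing the conic:
e(x, Q^{*}) = \left\| \operatorname{bbox}\!\big( P(x) \, Q^{*} P(x)^{\top} \big) - b \right\|_{\Sigma}^{2}
```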

Special Issues

  1. Deep Learning for Robotic Vision Anelia Angelova, Gustavo Carneiro, Niko Sünderhauf, Jürgen Leitner. 2018.
  2. Special Issue on Deep Learning in Robotics Niko Sünderhauf, Jürgen Leitner, Ben Upcroft, Jose Neira. The International Journal of Robotics Research (IJRR), 2018.

Conference Publications

  1. Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal Jake Bruce, Niko Sünderhauf, Piotr Mirowski, Raia Hadsell, Michael Milford. In Proc. of Conference on Robot Learning (CoRL), 2018. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine-tuning, despite environmental appearance differences at test time. [arXiv] [website]
  2. LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics Sourav Garg, Niko Sünderhauf, Michael Milford. In Proc. of Robotics: Science and Systems (RSS), 2018. In this paper we develop a suite of novel semantic- and appearance-based techniques to enable, for the first time, high-performance place recognition in the challenging scenario of recognizing places when returning from the opposite direction. We first propose a novel Local Semantic Tensor (LoST) descriptor of images using the convolutional feature maps from a state-of-the-art dense semantic segmentation network (see the descriptor sketch after this list). Then, to verify the spatial semantic arrangement of the top matching candidates, we develop a novel approach for mining semantically-salient keypoint correspondences.
  3. Dropout Sampling for Robust Object Detection in Open-Set Conditions Dimity Miller, Lachlan Nicholson, Feras Dayoub, Niko Sünderhauf. In Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2018. Dropout Variational Inference, or Dropout Sampling, has recently been proposed as an approximation technique for Bayesian Deep Learning and evaluated for image classification and regression tasks. This paper investigates the utility of Dropout Sampling for object detection for the first time. We demonstrate how label uncertainty can be extracted from a state-of-the-art object detection system via Dropout Sampling (see the sampling sketch after this list). We show that this uncertainty can be utilized to increase object detection performance under the open-set conditions that are typically encountered in robotic vision. We evaluate this approach on a large synthetic dataset with 30,000 images, and a real-world dataset captured by a mobile robot in a versatile campus environment.
  4. Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018. To enable and encourage the application of vision and language methods to the problem of interpreting visually grounded navigation instructions, we present the Matterport3D Simulator – a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings – the Room-to-Room (R2R) dataset.
  5. Don’t Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition Sourav Garg, Niko Sünderhauf, Michael Milford. In Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2018. In this work, we develop a novel methodology for using the semantics-aware higher-order layers of deep neural networks for recognizing specific places from within a reference database. To further improve the robustness to appearance change, we develop a descriptor normalization scheme that builds on the success of normalization schemes for pure appearance-based techniques.
  6. SceneCut: Joint Geometric and Object Segmentation for Indoor Scenes Trung T. Pham, Thanh-Toan Do, Niko Sünderhauf, Ian Reid, Michael Milford. In Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2018. This paper presents SceneCut, a novel approach to jointly discover previously unseen objects and non-object surfaces using a single RGB-D image. SceneCut’s joint reasoning over scene semantics and geometry allows a robot to detect and segment object instances in complex scenes where modern deep learning-based methods either fail to separate object instances, or fail to detect objects that were not seen during training. SceneCut automatically decomposes a scene into meaningful regions which either represent objects or scene surfaces. The decomposition is qualified by a unified energy function over objectness and geometric fitting. We show how this energy function can be optimized efficiently by utilizing hierarchical segmentation trees.
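
The Local Semantic Tensor from item 2 above lends itself to a short sketch: aggregate the dense convolutional features of a semantic segmentation network per class. This is a minimal illustration of that aggregation idea only; the function name, mean pooling, and normalisation are assumptions rather than the paper's exact recipe.

```python
import numpy as np

def lost_style_descriptor(features, labels, num_classes):
    """Aggregate dense conv features per semantic class (illustrative sketch).

    features: (H, W, D) feature map from a dense segmentation network
    labels:   (H, W) per-pixel semantic class ids from the same network
    Returns a (num_classes, D) matrix; rows of absent classes stay zero.
    """
    H, W, D = features.shape
    desc = np.zeros((num_classes, D))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            v = features[mask].mean(axis=0)            # mean feature of class c
            desc[c] = v / (np.linalg.norm(v) + 1e-12)  # L2-normalise per class
    return desc
```

Two images can then be compared class-by-class, for example by summing cosine similarities over the classes visible in both, which is one plausible way such a descriptor supports matching across opposite viewpoints.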
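
Item 3's Dropout Sampling admits a similarly short sketch: keep dropout active at test time, run the detector several times, group the sampled detections spatially, and average each group's class distribution; the entropy of that average serves as label uncertainty. The `detector(image, dropout=True)` interface and the greedy IoU grouping below are illustrative assumptions (the arXiv preprint on merging strategies further down compares association strategies in depth).

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-12)

def dropout_sampling(detector, image, num_samples=20, iou_thresh=0.5):
    """Run the detector with test-time dropout, greedily group detections
    across samples by IoU, and average each group (a hypothetical sketch)."""
    pooled = []
    for _ in range(num_samples):
        boxes, probs = detector(image, dropout=True)   # assumed interface
        pooled.extend(zip(boxes, probs))

    groups = []                                        # greedy IoU association
    for box, prob in pooled:
        for g in groups:
            if iou(box, g[0][0]) >= iou_thresh:
                g.append((box, prob))
                break
        else:
            groups.append([(box, prob)])

    results = []
    for g in groups:
        mean_box = np.mean([b for b, _ in g], axis=0)
        mean_prob = np.mean([p for _, p in g], axis=0)
        entropy = -float(np.sum(mean_prob * np.log(mean_prob + 1e-12)))
        results.append((mean_box, mean_prob, entropy))
    return results
```

Detections whose averaged distribution has high entropy can then be rejected, which is one way the estimated uncertainty can translate into open-set robustness.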

Workshop Publications

  1. QuadricSLAM: Constrained Dual Quadrics from Object Detections as Landmarks in Semantic SLAM Lachlan Nicholson, Michael Milford, Niko Sünderhauf. In Workshop on Representing a Complex World, International Conference on Robotics and Automation (ICRA), 2018. Best Workshop Paper Award. We derive a SLAM formulation that uses dual quadrics as 3D landmark representations, exploiting their ability to compactly represent the size, position and orientation of an object, and show how 2D bounding boxes (such as those typically obtained from visual object detection systems) can directly constrain the quadric parameters via a novel geometric error formulation. We develop a sensor model for deep-learned object detectors that addresses the challenge of partial object detections often encountered in robotics applications, and demonstrate how to jointly estimate the camera pose and constrained dual quadric parameters in factor graph based SLAM with a general perspective camera. [arXiv] [website]

arXiv Preprints

  1. Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection Dimity Miller, Feras Dayoub, Michael Milford, Niko Sünderhauf. arXiv preprint, 2018. There has been a recent emergence of sampling-based techniques for estimating epistemic uncertainty in deep neural networks. While these methods can be applied to classification or semantic segmentation tasks by simply averaging samples, this is not the case for object detection, where detection sample bounding boxes must be accurately associated and merged. A weak merging strategy can significantly degrade the performance of the detector and yield an unreliable uncertainty measure. This paper provides the first in-depth investigation of the effect of different association and merging strategies (see the affinity sketch after this list). We compare different combinations of three spatial and two semantic affinity measures with four clustering methods for MC Dropout with a Single Shot MultiBox Detector. Our results show that the correct choice of affinity-clustering combinations can greatly improve the effectiveness of the classification and spatial uncertainty estimation and the resulting object detection performance. We base our evaluation on a new mix of datasets that emulate near open-set conditions (semantically similar unknown classes), distant open-set conditions (semantically dissimilar unknown classes) and the common closed-set conditions (only known classes). [arXiv]
  2. An Orientation Factor for Object-Oriented SLAM Natalie Jablonsky, Michael Milford, Niko Sünderhauf. arXiv preprint, 2018. Current approaches to object-oriented SLAM lack the ability to incorporate prior knowledge of the scene geometry, such as the expected global orientation of objects. We overcome this limitation by proposing a geometric factor that constrains the global orientation of objects in the map, depending on the objects’ semantics. This new geometric factor is a first example of how semantics can inform and improve geometry in object-oriented SLAM. We implement the geometric factor for the recently proposed QuadricSLAM that represents landmarks as dual quadrics. The factor probabilistically models the quadrics’ major axes to be either perpendicular to or aligned with the direction of gravity, depending on their semantic class. Our experiments on simulated and real-world datasets show that using the proposed factors to incorporate prior knowledge improves both the trajectory and landmark quality. [arXiv] [website]
  3. Zero-shot Sim-to-Real Transfer with Modular Priors Robert Lee, Serena Mou, Vibhavari Dasagi, Jake Bruce, Jürgen Leitner, Niko Sünderhauf. arXiv preprint, 2018. Current end-to-end Reinforcement Learning (RL) approaches are severely limited by restrictively large search spaces and are prone to overfitting to their training environment. This is because in end-to-end RL, perception, decision-making and low-level control are all learned jointly from very sparse reward signals, with little capability of incorporating prior knowledge or existing algorithms. In this work, we propose a novel framework that effectively decouples RL for high-level decision making from low-level perception and control. This allows us to transfer a learned policy from a highly abstract simulation to a real robot without requiring any transfer learning. We therefore coin our approach zero-shot sim-to-real transfer. We successfully demonstrate our approach on the robot manipulation task of object sorting. A key component of our approach is a deep sets encoder that enables us to learn the high-level policy via RL from the variable-length output of a pre-trained object detector, instead of from raw pixels (see the encoder sketch below). We show that this method can learn effective policies within mere minutes of highly simplified simulation. The learned policies can be directly deployed on a robot without further training, and generalize to variations of the task unseen during training. [arXiv]
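
The association step that the first preprint above studies slots into the grouping stage of the Dropout Sampling sketch shown earlier in this section. Below is a minimal illustration of blending one spatial and one semantic affinity; the particular measures (IoU, Bhattacharyya coefficient) and the 50/50 weight are assumptions, not necessarily among the paper's choices.

```python
import numpy as np

def semantic_affinity(p, q):
    """Similarity of two class distributions; here the Bhattacharyya
    coefficient, one of several possible semantic measures."""
    return float(np.sum(np.sqrt(p * q)))

def spatial_affinity(a, b):
    """IoU of two [x1, y1, x2, y2] boxes, one of several spatial measures."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-12)

def combined_affinity(det_a, det_b, w=0.5):
    """Blend spatial and semantic affinity; each det is (box, class_probs).
    The equal weighting is an illustrative assumption."""
    return (w * spatial_affinity(det_a[0], det_b[0])
            + (1 - w) * semantic_affinity(det_a[1], det_b[1]))
```

Any of the compared clustering methods can then operate on the resulting pairwise affinities.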
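
The deep sets encoder named in the third preprint handles the variable-length detector output by passing every detection through a shared network, summing (the sum makes the encoding permutation-invariant), and decoding the pooled vector. A toy sketch with linear stand-ins for the learned networks:

```python
import numpy as np

def deep_sets_encode(detections, phi, rho):
    """Permutation-invariant set encoding: rho(sum_i phi(x_i)).
    `detections` is a non-empty, variable-length list of per-object
    feature vectors, e.g. one vector per detected object."""
    pooled = np.sum([phi(d) for d in detections], axis=0)
    return rho(pooled)

# Toy usage: linear phi/rho stand in for small learned MLPs (assumption).
rng = np.random.default_rng(0)
W_phi, W_rho = rng.standard_normal((8, 4)), rng.standard_normal((3, 8))
phi = lambda d: np.tanh(W_phi @ d)
rho = lambda h: W_rho @ h
state = deep_sets_encode([rng.standard_normal(4) for _ in range(5)], phi, rho)
```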

2017

Journal Articles

  1. Multi-Modal Trip Hazard Affordance Detection On Construction Sites Sean McMahon, Niko Sünderhauf, Ben Upcroft, Michael J Milford. IEEE Robotics and Automation Letters (RA-L), 2017. Trip hazards are a significant contributor to accidents on construction and manufacturing sites. We conduct a comprehensive investigation into the performance characteristics of 11 different color and depth fusion approaches, including four fusion and one non-fusion approach, using color and two types of depth images. Trained and tested on more than 600 labeled trip hazards over four floors and 2000 m² in an active construction site, this approach was able to differentiate between identical objects in different physical configurations. Outperforming a color-only detector, our multimodal trip detector fuses color and depth information to achieve a 4% absolute improvement in F1-score. These investigative results and the extensive publicly available dataset move us one step closer to assistive or fully automated safety inspection systems on construction sites.

Conference Publications

  1. Meaningful Maps With Object-Oriented Semantic Mapping Niko Sünderhauf, Trung T. Pham, Yasir Latif, Michael Milford. In Proc. of IEEE International Conference on Intelligent Robots and Systems (IROS), 2017. For intelligent robots to interact in meaningful ways with their environment, they must understand both the geometric and semantic properties of the scene surrounding them. The majority of research to date has addressed these mapping challenges separately, focusing on either geometric or semantic mapping. In this paper we address the problem of building environmental maps that include both semantically meaningful, object-level entities and point- or mesh-based geometrical representations. We simultaneously build geometric point cloud models of previously unseen instances of known object classes and create a map that contains these object models as central entities. Our system leverages sparse, feature-based RGB-D SLAM, image-based deep-learning object detection and 3D unsupervised segmentation.
  2. The ACRV Picking Benchmark: A Robotic Shelf Picking Benchmark to Foster Reproducible Research Jürgen Leitner, Adam W Tow, Jake E Dean, Niko Sünderhauf, Joseph W Durham, Matthew Cooper, Markus Eich, Christopher Lehnert, Ruben Mangels, Christopher McCool, Peter Kujala, Lachlan Nicholson, Trung Pham, James Sergeant, Fangyi Zhang, Ben Upcroft, Peter Corke. In Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2017. Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark. Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of complete robotic systems - including perception and manipulation - instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.
  3. Deep Learning Features at Scale for Visual Place Recognition Zetao Chen, Adam Jacobson, Niko Sünderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford. In Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2017. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of place appearance change at thousands of different places, as opposed to the semantic place type datasets currently available. This new dataset enables us to set up a training regime that interprets place recognition as a classification problem. We comprehensively evaluate our trained networks on several challenging benchmark place recognition datasets and demonstrate that they achieve an average 10% increase in performance over other place recognition algorithms and pre-trained CNNs.
  4. Auxiliary Tasks to Improve Trip Hazard Affordance Detection Sean McMahon, Tong Shen, Niko Sünderhauf, Ian Reid, Chunhua Shen, Michael Milford. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA), 2017. We propose to train a CNN performing pixel-wise trip detection with three auxiliary tasks that help the CNN better infer scene geometric properties of trip hazards. Of the three approaches investigated (pixel-wise ground plane estimation, pixel depth estimation, and pixel height above ground plane estimation), the first allowed the trip detector to achieve an 11.1% increase in Trip IOU over earlier work. These new approaches make it plausible to deploy a robotic platform to perform trip hazard detection, and so potentially reduce the number of injuries on construction sites.

Workshop Publications

  1. Dropout Variational Inference Improves Object Detection in Open-Set Conditions Dimity Miller, Lachlan Nicholson, Feras Dayoub, Niko Sünderhauf. In Proc. of NIPS Workshop on Bayesian Deep Learning, 2017. One of the biggest current challenges of visual object detection is reliable operation in open-set conditions. One way to handle the open-set problem is to utilize the uncertainty of the model to reject predictions with low probability. Bayesian Neural Networks (BNNs), with variational inference commonly used as an approximation, are an established approach to estimating model uncertainty. Here we extend the concept of Dropout sampling to object detection for the first time. We evaluate Bayesian object detection on a large synthetic and a real-world dataset and show how the estimated label uncertainty can be utilized to increase object detection performance under open-set conditions.
  2. Episode-Based Active Learning with Bayesian Neural Networks Feras Dayoub, Niko Sünderhauf, Peter Corke. In Workshop on Deep Learning for Robotic Vision, Conference on Computer Vision and Pattern Recognition (CVPR), 2017. We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, such as commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final training on the accumulated acquisition set are essential for best performance, while limiting the amount of required human labeling labor.
  3. One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay Jake Bruce, Niko Sünderhauf, Piotr Mirowski, Raia Hadsell, Michael Milford. In Proc. of NIPS Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning, 2017. Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot. The robot leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation, to demonstrate successful zero-shot transfer under real-world environmental variations without fine-tuning.

2016

Conference Publications

  1. A Robustness Analysis of Deep Q Networks Adam W Tow, Sareh Shirazi, Jürgen Leitner, Niko Sünderhauf, Michael Milford, Ben Upcroft. In Proc. of Australasian Conference on Robotics and Automation (ACRA), 2016. In this paper, we present an analysis of the robustness of Deep Q Networks to various types of perceptual noise (changing brightness, Gaussian blur, salt and pepper, distractors). We present a benchmark example that involves playing the game Breakout through a webcam and screen environment, like humans do. We present a simple training approach to improve the performance maintained when transferring a DQN agent trained in simulation to the real world. We also evaluate DQN agents trained under a variety of simulation environments to report for the first time how DQNs cope with perceptual noise, common to real world robotic applications.
  2. High-Fidelity Simulation for Evaluating Robotic Vision Performance John Robert Skinner, Sourav Garg, Niko Sünderhauf, Peter Corke, Ben Upcroft, Michael J Milford. In Proc. of IEEE International Conference on Intelligent Robots and Systems (IROS), 2016. For machine learning applications, a critical bottleneck is the limited amount of real-world image data that can be captured and labelled for both training and testing purposes. In this paper we investigate the use of a photo-realistic simulation tool to address these challenges, in three specific domains: robust place recognition, visual SLAM and object recognition. For the first two problems we generate images from a complex 3D environment with systematically varying camera paths, camera viewpoints and lighting conditions. For the first time we are able to systematically characterise the performance of these algorithms as paths and lighting conditions change. In particular, we are able to systematically generate varying camera viewpoint datasets that would be difficult or impossible to generate in the real world. We also compare algorithm results for a camera in a real environment and a simulated camera in a simulation model of that real environment. Finally, for the object recognition domain, we generate labelled image data and characterise the viewpoint dependency of a current convolutional neural network in performing object recognition. Together these results provide a multi-domain demonstration of the beneficial properties of using simulation to characterise and analyse a wide range of robotic vision algorithms.
  3. LunaRoo: Designing a Hopping Lunar Science Payload. Jürgen Leitner, Will Chamberlain, Donald G. Dansereau, Matthew Dunbabin, Markus Eich, Thierry Peynot, Jon Roberts, Raymond Russell, Niko Sünderhauf. In Proc. of IEEE Aerospace Conference, 2016. We describe a hopping science payload solution designed to exploit the Moon’s lower gravity to leap up to 20m above the surface. The entire solar-powered robot is compact enough to fit within a 10cm cube, whilst providing unique observation and mission capabilities by creating imagery during the hop. The LunaRoo concept is a proposed payload to fly onboard a Google Lunar XPrize entry. Its compact form is specifically designed for lunar exploration and science missions within the constraints given by PTScientists. The core features of LunaRoo are its method of locomotion - hopping like a kangaroo - and its imaging system capable of unique over-the-horizon perception. The payload will serve as a proof of concept, highlighting the benefits of alternative mobility solutions, in particular enabling observation and exploration of terrain not traversable by wheeled robots, in addition to providing data for beyond line-of-sight planning and communications for surface assets, extending overall mission capabilities.
  4. Place Categorization and Semantic Mapping on a Mobile Robot Niko Sünderhauf, Feras Dayoub, Sean McMahon, Ben Talbot, Ruth Schulz, Peter Corke, Gordon Wyeth, Ben Upcroft, Michael Milford. In Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2016. In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence (a one-step sketch of such a filter follows below). We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot’s behaviour during navigation tasks. The system is made available to the community as a ROS module.
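
The Bayesian filter framework mentioned in the last entry above can be illustrated with a one-step discrete filter: predict with a transition model that encodes temporal coherence, then fuse the current network output multiplicatively. A minimal sketch; the class set and transition values are assumed for illustration:

```python
import numpy as np

def bayes_filter_step(belief, likelihood, transition):
    """One discrete Bayes filter step: temporal prediction followed by a
    multiplicative update with the current classifier output."""
    predicted = transition @ belief       # places rarely change class
    posterior = predicted * likelihood    # fuse per-frame class likelihood
    return posterior / posterior.sum()

# Example: three place classes with a strong self-transition (assumed values;
# each column of T sums to 1).
T = np.full((3, 3), 0.05) + 0.85 * np.eye(3)
belief = np.ones(3) / 3.0
belief = bayes_filter_step(belief, np.array([0.7, 0.2, 0.1]), T)
```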

2015

Journal Articles

  1. Visual Place Recognition: A Survey Stephanie Lowry, Niko Sünderhauf, Paul Newman, John J Leonard, David Cox, Peter Corke, Michael J Milford. Transactions on Robotics (TRO), 2015. This paper presents a survey of the visual place recognition research landscape. We start by introducing the concepts behind place recognition – the role of place recognition in the animal kingdom, how a “place” is defined in a robotics context, and the major components of a place recognition system. We then survey visual place recognition solutions for environments where appearance change is assumed to be negligible. Long term robot operations have revealed that environments continually change; consequently we survey place recognition solutions that implicitly or explicitly account for appearance change within the environment. Finally we close with a discussion of the future of visual place recognition, in particular with respect to the rapid advances being made in the related fields of deep learning, semantic scene understanding and video description.
  2. Superpixel-based appearance change prediction for long-term navigation across seasons Peer Neubert, Niko Sünderhauf, Peter Protzel. Robotics and Autonomous Systems, 2015. The goal of our work is to support existing approaches to place recognition by learning how the visual appearance of an environment changes over time and by using this learned knowledge to predict its appearance under different environmental conditions. We describe the general idea of appearance change prediction (ACP) and investigate properties of our novel implementation based on vocabularies of superpixels (SP-ACP). This paper deepens the understanding of the proposed SP-ACP system and evaluates the influence of its parameters. We present the results of a large-scale experiment on the complete 10-hour Nordland dataset and appearance change predictions between different combinations of seasons.

Conference Publications

  1. Evaluation of features for leaf classification in challenging conditions David Hall, Chris McCool, Feras Dayoub, Niko Sünderhauf, Ben Upcroft. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2015. Fine-grained leaf classification has concentrated on the use of traditional shape and statistical features to classify ideal images. In this paper we evaluate the effectiveness of traditional hand-crafted features and propose the use of deep convolutional neural network (ConvNet) features. We introduce a range of condition variations to explore the robustness of these features, including: translation, scaling, rotation, shading and occlusion. Evaluations on the Flavia dataset demonstrate that in ideal imaging conditions, combining traditional and ConvNet features yields state-of-the-art performance with an average accuracy of 97.3% ± 0.6%, compared to traditional features, which obtain an average accuracy of 91.2% ± 1.6%. Further experiments show that this combined classification approach consistently outperforms the best set of traditional features by an average of 5.7% for all of the evaluated condition variations.
  2. Multimodal Deep Autoencoders for Control of a Mobile Robot James Sergeant, Niko Sünderhauf, Michael Milford, Ben Upcroft. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA), 2015. Robot navigation systems are typically engineered to suit certain platforms, sensing suites and environment types. In order to deploy a robot in an environment where its existing navigation system is insufficient, the system must be modified manually, often at significant cost. In this paper we address this problem, proposing a system based on multimodal deep autoencoders that enables a robot to learn how to navigate by observing a dataset of sensor input and motor commands collected while being teleoperated by a human. Low-level features and cross modal correlations are learned and used in initialising two different architectures with three operating modes. During operation, these systems exploit the learned correlations in generating suitable control signals based only on the sensor information.
  3. TripNet: Detecting Trip Hazards on Construction Sites Sean McMahon, Niko Sünderhauf, Michael Milford, Ben Upcroft. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA), 2015. This paper introduces TripNet, a robotic vision system that detects trip hazards using raw construction site images. TripNet performs trip hazard identification using only camera imagery and minimal training with a pre-trained Convolutional Neural Network (CNN) rapidly fine-tuned on a small corpus of labelled image regions from construction sites. There is no reliance on prior scene segmentation methods during deployment. TripNet achieves comparable performance to a human on a dataset recorded in two distinct real-world construction sites. TripNet exhibits spatial and temporal generalization by functioning in previously unseen parts of a construction site and over time periods of several weeks.
  4. Enhancing Human Action Recognition with Region Proposals Fahimeh Rezazadegan, Sareh Shirazi, Niko Sünderhauf, Michael Milford, Ben Upcroft. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA), 2015.
  5. Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free Niko Sünderhauf, Sareh Shirazi, Adam Jacobson, Feras Dayoub, Edward Pepperell, Ben Upcroft, Michael Milford. In Proc. of Robotics: Science and Systems (RSS), 2015. Here we present an approach that adapts state-of-the-art object proposal techniques to identify potential landmarks within an image for place recognition. We use the astonishing power of convolutional neural network features to identify matching landmark proposals between images to perform place recognition over extreme appearance and viewpoint variations. Our system does not require any form of training; all components are generic enough to be used off-the-shelf. We present a range of challenging experiments in varied viewpoint and environmental conditions. We demonstrate superior performance to current state-of-the-art techniques. [Poster]
  6. On the Performance of ConvNet Features for Place Recognition Niko Sünderhauf, Feras Dayoub, Sareh Shirazi, Ben Upcroft, Michael Milford. In Proc. of IEEE International Conference on Intelligent Robots and Systems (IROS), 2015. This paper comprehensively evaluates and compares the utility of three state-of-the-art ConvNets on problems of particular relevance to robotic navigation: viewpoint invariance and condition invariance. For the first time, it enables real-time place recognition with ConvNets on large maps by integrating a variety of existing (locality-sensitive hashing) and novel (semantic search space partitioning) optimization techniques (see the hashing sketch after this list). We present extensive experiments on four real-world datasets cultivated to evaluate each of the specific challenges in place recognition. The results demonstrate that speed-ups of two orders of magnitude can be achieved with minimal accuracy degradation, enabling real-time performance. We confirm that networks trained for semantic place categorization also perform better at (specific) place recognition when faced with severe appearance changes, and provide a reference for which networks and layers are optimal for different aspects of the place recognition problem.
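
Of the optimization techniques named in the last entry, locality-sensitive hashing is the easiest to sketch: random hyperplanes compress a high-dimensional ConvNet feature into a short binary code whose Hamming distance approximates cosine similarity, so candidate places can be ranked without full-precision comparisons over the whole map. The code length and interface below are illustrative:

```python
import numpy as np

class HyperplaneLSH:
    """Random-hyperplane LSH: sign patterns approximate cosine similarity."""
    def __init__(self, dim, bits=256, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((bits, dim))

    def hash(self, feature):
        """Binary code: which side of each hyperplane the feature falls on."""
        return self.planes @ feature > 0

    @staticmethod
    def hamming(code_a, code_b):
        return int(np.count_nonzero(code_a != code_b))

# Map features are hashed once offline; at query time, comparing short binary
# codes stands in for comparing high-dimensional ConvNet features directly.
```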

Workshop Publications

  1. Sequence Searching with Deep-learnt Depth for Condition- and Viewpoint-Invariant Route-based Place Recognition Michael Milford, Stephanie Lowry, Niko Sünderhauf, Sareh Shirazi, Edward Pepperell, Ben Upcroft, Chunhua Shen, Guosheng Lin, Fayao Liu, Cesar Cadena, Ian Reid. In Workshop on Computer Vision in Vehicle Technology (CVVT), Conference on Computer Vision and Pattern Recognition (CVPR), 2015. Vision-based localization on robots and vehicles remains unsolved when extreme appearance change and viewpoint change are present simultaneously. In this paper we significantly improve the viewpoint invariance of the SeqSLAM algorithm by using state-of-the-art deep learning techniques to generate synthetic viewpoints. Our approach is different to other deep learning approaches in that it does not rely on the ability of the CNN network to learn invariant features, but only to produce “good enough” depth images from day-time imagery only. We evaluate the system on a new multi-lane day-night car dataset specifically gathered to simultaneously test both appearance and viewpoint change.
  2. Continuous Factor Graphs For Holistic Scene Understanding Niko Sünderhauf, Ben Upcroft, Michael Milford. In Workshop on Scene Understanding (SUNw), Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2015. We propose a novel mathematical formulation for the holistic scene understanding problem and transform it from the discrete into the continuous domain. The problem can then be modeled with a nonlinear continuous factor graph, and the MAP solution is found via least squares optimization. We evaluate our method on the realistic NYU2 dataset.
  3. How Good Are EdgeBoxes, Really? Sean McMahon, Niko Sünderhauf, Ben Upcroft, Michael Milford. In Workshop on Scene Understanding (SUNw), Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.
  4. SLAM – Quo Vadis? In Support of Object Oriented and Semantic SLAM Niko Sünderhauf, Feras Dayoub, Sean McMahon, Markus Eich, Ben Upcroft, Michael Milford. In Workshop on The Problem of Moving Sensors, Robotics: Science and Systems (RSS), 2015. Most current SLAM systems are still based on primitive geometric features such as points, lines, or planes. The created maps therefore carry geometric information, but no immediate semantic information. With the recent significant advances in object detection and scene classification we think the time is right for the SLAM community to ask where the SLAM research should be going during the next years. As a possible answer to this question, we advocate developing SLAM systems that are more object oriented and more semantically enriched than the current state of the art. This paper provides an overview of our ongoing work in this direction.

2014

Conference Publications

  1. Phobos and Deimos on Mars – Two Autonomous Robots for the DLR SpaceBot Cup Niko Sünderhauf, Peer Neubert, Martina Truschzinski, Daniel Wunschel, Johannes Pöschmann, Sven Lanve, Peter Protzel. In Proceedings of International Symposium on Artificial Intelligence, Robotics and Automation in Space (iSAIRAS), 2014. In 2013, ten teams from German universities and research institutes participated in a national robot competition called SpaceBot Cup organized by the DLR Space Administration. The robots had one hour to autonomously explore and map a challenging Mars-like environment, find, transport, and manipulate two objects, and navigate back to the landing site. Localization without GPS in an unstructured environment was a major issue as was mobile manipulation and very restricted communication. This paper describes our system of two rovers operating on the ground plus a quadrotor UAV simulating an observing orbiting satellite. We relied on ROS (robot operating system) as the software infrastructure and describe the main ROS components utilized in performing the tasks. Despite (or because of) faults, communication loss and breakdowns, it was a valuable experience with many lessons learned.

Workshop Publications

  1. Fine-Grained Plant Classification Using Convolutional Neural Networks for Feature Extraction. Niko Sünderhauf, Chris McCool, Ben Upcroft, Tristan Perez. In CLEF (Working Notes), 2014.

2013

Conference Publications

  1. Incremental Sensor Fusion in Factor Graphs with Unknown Delays Niko Sünderhauf, Sven Lange, Peter Protzel. In Proc. of ESA Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA), 2013. Our paper addresses the problem of performing incremental sensor fusion in factor graphs when some of the sensor information arrives with a significant unknown delay. We develop and compare two techniques to handle such delayed measurements under mild conditions on the characteristics of that delay: We consider the unknown delay to be bounded and quantizable into multiples of the state transition cycle time. The proposed methods are evaluated using a simulation of a dynamic 3-DoF system that fuses odometry and GPS measurements.
  2. Switchable Constraints and Incremental Smoothing for Online Mitigation of Non-Line-of-Sight and Multipath Effects Niko Sünderhauf, Marcus Obst, Sven Lange, Gerd Wanielik, Peter Protzel. In Proc. of IEEE Intelligent Vehicles Symposium (IV), 2013. Reliable vehicle positioning is a crucial requirement for many applications of advanced driver assistance systems. While satellite navigation provides a reasonable performance in general, it often suffers from multipath and non-line-of-sight errors when it is applied in urban areas and therefore does not guarantee consistent results anymore. Our paper proposes a novel online method that identifies and excludes the affected pseudorange measurements. Our approach does not depend on additional sensors, maps, or environmental models. We rather formulate the positioning problem as a Bayesian inference problem in a factor graph and combine the recently developed concept of switchable constraints with an algorithm for efficient incremental inference in such graphs. We furthermore introduce the concepts of auxiliary updates and factor graph pruning in order to accelerate convergence while keeping the graph size and required runtime bounded. A real-world experiment demonstrates that the resulting algorithm is able to localize successfully even when a large number of satellite observations are influenced by NLOS or multipath effects.
  3. Switchable Constraints vs. Max-Mixture Models vs. RRR – A Comparison of three Approaches to Robust Pose Graph SLAM Niko Sünderhauf, Peter Protzel. In Proc. of Intl. Conf. on Robotics and Automation (ICRA), 2013. SLAM algorithms that can infer a correct map despite the presence of outliers have recently attracted increasing attention. In the context of SLAM, outlier constraints are typically caused by a failed place recognition due to perceptual aliasing. If not handled correctly, they can have catastrophic effects on the inferred map. Since robust robotic mapping and SLAM are among the key requirements for autonomous long-term operation, inference methods that can cope with such data association failures are a hot topic in current research. Our paper compares three very recently published approaches to robust pose graph SLAM, namely switchable constraints, max-mixture models and the RRR algorithm. All three methods were developed as extensions to existing factor graph-based SLAM back-ends and aim at improving the overall system’s robustness to false positive loop closure constraints. Due to the novelty of the three proposed algorithms, no direct comparison has been conducted so far.
  4. Incremental Smoothing vs. Filtering for Sensor Fusion on an Indoor UAV Sven Lange, Niko Sünderhauf, Peter Protzel. In Proc. of Intl. Conf. on Robotics and Automation (ICRA), 2013. Our paper explores the performance of a recently proposed incremental smoother in the context of nonlinear sensor fusion for a real-world UAV. This efficient factor graph based smoothing approach has a number of advantages compared to conventional filtering techniques like the EKF or its variants. It can more easily incorporate asynchronous and delayed measurements from sensors operating at different rates and is supposed to be less error-prone in highly nonlinear settings. We compare the novel incremental smoothing approach based on iSAM2 against our conventional EKF based sensor fusion framework. Unlike previously presented work, the experiments are not only performed in simulation, but also on a real-world quadrotor UAV system using IMU, optical flow and altitude measurements.
  5. Appearance Change Prediction for Long-Term Navigation Across Seasons Peer Neubert, Niko Sünderhauf, Peter Protzel. In Proceedings of European Conference on Mobile Robotics (ECMR), 2013.

Workshop Publications

  1. Predicting the Change – A Step Towards Life-Long Operation in Everyday Environments Niko Sünderhauf, Peer Neubert, Peter Protzel. In Proceedings of Robotics: Science and Systems (RSS) Robotics Challenges and Vision Workshop, 2013.
  2. Are We There Yet? Challenging SeqSLAM on a 3000 km Journey Across All Four Seasons. Niko Sünderhauf, Peer Neubert, Peter Protzel. In Proceedings of Workshop on Long-Term Autonomy, IEEE International Conference on Robotics and Automation (ICRA), 2013.

2012

Conference Publications

  1. Multipath Mitigation in GNSS-Based Localization using Robust Optimization Niko Sünderhauf, Marcus Obst, Gerd Wanielik, Peter Protzel. In Proc. of IEEE Intelligent Vehicles Symposium (IV), 2012. Our paper adapts recent advances in the SLAM (Simultaneous Localization and Mapping) literature to the problem of multipath mitigation and proposes a novel approach to successfully localize a vehicle despite a significant number of multipath observations. We show that GNSS-based localization problems can be modelled as factor graphs and solved using efficient nonlinear least squares methods that exploit the sparsity inherent in the problem formulation. Using a recently developed novel approach for robust optimization, satellite observations that are subject to multipath errors can be successfully identified and rejected during the optimization process. We demonstrate the feasibility of the proposed approach on a real-world urban dataset and compare it to an existing method of multipath detection.
  2. Towards a Robust Back-End for Pose Graph SLAM Niko Sünderhauf, Peter Protzel. In Proc. of IEEE Intl. Conf. on Robotics and Automation (ICRA), 2012. We propose a novel formulation that allows the back-end to change parts of the topological structure of the graph during the optimization process. The back-end can thereby discard loop closures and converge towards correct solutions even in the presence of false positive loop closures. This largely increases the overall robustness of the SLAM system and closes a gap between the sensor-driven front-end and the back-end optimizers. We demonstrate the approach and present results on both large-scale synthetic and real-world datasets.
  3. Switchable Constraints for Robust Pose Graph SLAM Niko Sünderhauf, Peter Protzel. In Proc. of IEEE International Conference on Intelligent Robots and Systems (IROS), 2012. Current SLAM back-ends are based on least squares optimization and thus are not robust against outliers like data association errors and false positive loop closure detections. Our paper presents and evaluates a robust back-end formulation for SLAM using switchable constraints. Instead of proposing yet another appearance-based data association technique, our system is able to recognize and reject outliers during the optimization. This is achieved by making the topology of the underlying factor graph representation subject to the optimization instead of keeping it fixed (see the formulation sketch after this list). The evaluation shows that the approach can deal with up to 1000 false positive loop closure constraints on various datasets. This largely increases the robustness of the overall SLAM system and closes a gap between the sensor-driven front-end and the back-end optimizers.
  4. Towards Robust Graphical Models for GNSS-Based Localization in Urban Environments Niko Sünderhauf, Peter Protzel. In Proc. of IEEE International Multi-Conference on Systems, Signals and Devices (SSD), 2012.
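
The switchable constraints formulation that items 2 and 3 above build on is compact enough to state schematically: every loop closure gets a switch variable s_ij that the optimizer can drive toward zero to disable an outlier constraint, while a prior pulls each switch toward one. Odometry terms are omitted here, and Ψ denotes a scaling function (e.g. a sigmoid or a clamped identity); the notation follows the general shape of the formulation rather than the paper verbatim.

```latex
\min_{\mathbf{x},\,\mathbf{s}} \;
\underbrace{\sum_{ij} \big\| \Psi(s_{ij}) \left( f(x_i, x_j) - z_{ij} \right) \big\|_{\Sigma_{ij}}^{2}}_{\text{switched loop closure constraints}}
\; + \;
\underbrace{\sum_{ij} \big\| 1 - s_{ij} \big\|_{\Lambda_{ij}}^{2}}_{\text{switch priors}}
```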

Workshop Publications

  1. A Generic Approach for Robust Probabilistic Estimation with Graphical Models Niko Sünderhauf, Peter Protzel. In Proc. of RSS Workshop on Long-term Operation of Autonomous Robotic Systems in Changing Environments, 2012.

Misc

  1. Robust Optimization for Simultaneous Localization and Mapping Niko Sünderhauf. PhD Thesis. Chemnitz University of Technology, 2012.

2011

Conference Publications

  1. BRIEF-Gist – Closing the Loop by Simple Means Niko Sünderhauf, Peter Protzel. In Proc. of IEEE Intl. Conf. on Intelligent Robots and Systems (IROS), 2011.
  2. Autonomous Corridor Flight of a UAV Using a Low-Cost and Light-Weight RGB-D Camera Sven Lange, Niko Sünderhauf, Peer Neubert, Sebastian Drews, Peter Protzel. In Proc. of Intl. Symposium on Autonomous Mini Robots for Research and Edutainment (AMiRE), 2011. We describe the first application of the novel Kinect RGB-D sensor on a fully autonomous quadrotor UAV. In contrast to the established RGB-D devices that are both expensive and comparably heavy, the Kinect is light-weight and especially low-cost. It provides dense color and depth information and can be readily applied to a variety of tasks in the robotics domain. We apply the Kinect on a UAV in an indoor corridor scenario. The sensor extracts a 3D point cloud of the environment that is further processed on-board to identify walls, obstacles, and the position and orientation of the UAV inside the corridor. Subsequent controllers for altitude, position, velocity, and heading enable the UAV to autonomously operate in this indoor environment.
  3. Autonomous Corridor Flight of a UAV Using an RGB-D Camera Sven Lange, Niko Sünderhauf, Peer Neubert, Sebastian Drews, Peter Protzel. In Proc. of EuRobotics RGB-D Workshop on 3D Perception in Robotics, 2011. We describe the first application of the novel Kinect RGB-D sensor on a fully autonomous quadrotor UAV. We apply the UAV in an indoor corridor scenario. The position and orientation of the UAV inside the corridor is extracted from the RGB-D data. Subsequent controllers for altitude, position, velocity, and heading enable the UAV to autonomously operate in this indoor environment.

2010

Journal Articles

  1. Learning from Nature: Biologically Inspired Robot Navigation and SLAM – A Review Niko Sünderhauf, Peter Protzel. Künstliche Intelligenz (German Journal on Artificial Intelligence), Special Issue on SLAM, 2010. In this paper we summarize the most important neuronal fundamentals of navigation in rodents, primates and humans. We review a number of brain cells that are involved in spatial navigation and their properties. Furthermore, we review RatSLAM, a working SLAM system that is partially inspired by neuronal mechanisms underlying mammalian spatial navigation.

Conference Publications

  1. The Causal Update Filter – A Novel Biologically Inspired Filter Paradigm for Appearance Based SLAM Niko Sünderhauf, Peer Neubert, Peter Protzel. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), 2010. Recently a SLAM algorithm based on biological principles (RatSLAM) has been proposed. It was proven to perform well in large and demanding scenarios. In this paper we establish a comparison of the principles underlying this algorithm with standard probabilistic SLAM approaches and identify the key difference to be an additive update step (contrasted in the sketch after this list). Using this insight, we derive the novel, non-Bayesian Causal Update filter that is suitable for application in appearance-based SLAM. We successfully apply this new filter to two demanding vision-only urban SLAM problems of 5 and 66 km length. We show that it can functionally replace the core of RatSLAM, gaining a massive speed-up.
  2. Beyond RatSLAM: Improvements to a Biologically Inspired SLAM System Niko Sünderhauf, Peter Protzel. In Proc. of Intl. Conf. on Emerging Technologies and Factory Automation (ETFA), 2010. A SLAM algorithm inspired by biological principles has been recently proposed and shown to perform well in a large and demanding scenario. We analyse and compare this system (RatSLAM) and the established Bayesian SLAM methods and identify the key difference to be an additive update step. Using this insight, we derive a novel filter scheme and successfully show that it can entirely replace the core of the RatSLAM system while maintaining its desirable robustness. This leads to a massive speedup, as the novel filter can be calculated very efficiently. We successfully applied the new algorithm to the same 66 km long dataset that was used with the original algorithm.
  3. From Neurons to Robots: Towards Efficient Biologically Inspired Filtering and SLAM Niko Sünderhauf, Peter Protzel. In Proc. of KI 2010: Advances in Artificial Intelligence, 2010. We discuss recently published models of neural information processing under uncertainty and a SLAM system that was inspired by the neural structures underlying mammalian spatial navigation. We summarize the derivation of a novel filter scheme that captures the important ideas of the biologically inspired SLAM approach, but implements them on a higher level of abstraction. This leads to a new and more efficient approach to biologically inspired filtering which we successfully applied to a real-world urban SLAM challenge of 66 km length.
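
The "additive update step" that all three 2010 papers identify as the key departure from Bayesian filtering contrasts with the standard multiplicative update as follows. This is a schematic comparison rather than the papers' exact filter equations, and the gain kappa is an assumed parameter:

```python
import numpy as np

def bayes_update(belief, likelihood):
    """Standard multiplicative Bayes update over a discrete state space."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

def additive_update(belief, likelihood, kappa=0.5):
    """Additive evidence injection in the spirit of the Causal Update filter
    (and of RatSLAM's attractor dynamics): evidence is added rather than
    multiplied, so a single contradictory observation cannot zero out the
    belief the way a zero likelihood does in the Bayes update."""
    posterior = belief + kappa * likelihood
    return posterior / posterior.sum()
```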

2009

Conference Publications

  1. A Vision Based Onboard Approach for Landing and Position Control of an Autonomous Multirotor UAV in GPS-Denied Environments Sven Lange, Niko Sünderhauf, Peter Protzel. In Proc. of Intl. Conf. on Advanced Robotics (ICAR), 2009. We describe our work on multirotor UAVs and focus on our method for autonomous landing and position control. The paper describes the design of our landing pad and the vision based detection algorithm that estimates the 3D position of the UAV relative to the landing pad. A cascaded controller structure stabilizes velocity and position in the absence of GPS signals by using a dedicated optical flow sensor. Practical experiments prove the quality of our approach.
  2. Using Image Profiles and Integral Images for Efficient Calculation of Sparse Optical Flow Fields. Niko Sünderhauf, Peter Protzel. In Proceedings of the International Conference on Advanced Robotics, 2009.

2008

Conference Publications

  1. Autonomous Landing for a Multirotor UAV Using Vision Sven Lange, Niko Sünderhauf, Peter Protzel. In Workshop Proc. of SIMPAR 2008 Intl. Conf. on Simulation, Modeling and Programming for Autonomous Robots, 2008. We describe our work on multirotor UAVs and focus on our method for autonomous landing. The paper describes the design of our landing pad and its advantages. We explain how the landing pad detection algorithm works and how the 3D-position of the UAV relative to the landing pad is calculated. Practical experiments prove the quality of these estimations.

2007

Conference Publications

  1. Using the Unscented Kalman Filter in Mono-SLAM with Inverse Depth Parametrization for Autonomous Airship Control Niko Sünderhauf, Sven Lange, Peter Protzel. In Proc. of IEEE International Workshop on Safety Security and Rescue Robotics (SSRR), 2007. In this paper, we present an approach for aiding control of an autonomous airship by means of SLAM. We show how the Unscented Kalman Filter can be applied in a SLAM context with monocular vision. The recently published Inverse Depth Parametrization is used for undelayed single-hypothesis landmark initialization and modelling. The novelty of the presented approach lies in the combination of UKF, Inverse Depth Parametrization and bearing-only SLAM and its application for autonomous airship control and UAV control in general.
  2. FastSLAM using SURF Features: An Efficient Implementation and Practical Experiences Peer Neubert, Niko Sünderhauf, Peter Protzel. In Proceedings of the International Conference on Intelligent and Autonomous Vehicles, IAV07, 2007.
  3. Comparing several implementations of two recently published feature detectors Johannes Bauer, Niko Sünderhauf, Peter Protzel. In Proceedings of the International Conference on Intelligent and Autonomous Vehicles, IAV07, 2007.

Misc

  1. Stereo Odometry – A Review of Approaches Niko Sünderhauf, Peter Protzel. 2007.

2006

Conference Publications

  1. Bringing Robotics Closer to Students – A Threefold Approach Niko Sünderhauf, Thomas Krause, Peter Protzel. In Proc. of Intl. Conf. on Robotics and Automation (ICRA), 2006.
  2. Using and Extending the Miro Middleware for Autonomous Mobile Robots Daniel Krüger, Ingo Lil, Niko Sünderhauf, Robert Baumgartl, Peter Protzel. In Proceedings of Towards Autonomous Robotic Systems (TAROS06), 2006.
  3. Towards Using Bundle Adjustment for Robust Stereo Odometry in Outdoor Terrain Niko Sünderhauf, Peter Protzel. In Proceedings of Towards Autonomous Robotic Systems (TAROS06), 2006.

Misc

  1. Stereo Odometry on an Autonomous Mobile Robot in Outdoor Terrain Niko Sünderhauf. Diploma Thesis (Diplomarbeit). 2006.

2005

Conference Publications

  1. RoboKing - Bringing Robotics Closer to Pupils Niko Sünderhauf, Thomas Krause, Peter Protzel. In Proc. of Intl. Conf. on Robotics and Automation (ICRA), 2005.
  2. Visual Odometry using Sparse Bundle Adjustment on an Autonomous Outdoor Vehicle Niko Sünderhauf, Kurt Konolige, Simon Lacroix, Peter Protzel. In Proceedings of Autonome Mobile Systeme (AMS 2005), 2005.

Workshop Publications

  1. Comparison of Stereovision Odometry Approaches. Niko Sünderhauf, Kurt Konolige, Thomas Lemaire, Simon Lacroix. In Proceedings of IEEE International Conference on Robotics and Automation, Planetary Rover Workshop, 2005.