Semantic RGB-D Perception for Cognitive Robots
Sven Behnke
Computer Science Institute VI
Autonomous Intelligent Systems

Our Domestic Service Robots
- Dynamaid, Cosero
- Size: 100-180 cm, weight: 30-35 kg
- 36 articulated joints
- PC, laser scanners, Kinect, microphone, ...

RoboCup 2013 Eindhoven

Analysis of Table-top Scenes and Grasp Planning
- Detection of clusters above horizontal plane
- Two grasps (top, side)
- Flexible grasping of many unknown objects
[Stückler, Steffens, Holz, Behnke, Robotics and Autonomous Systems 2012]
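
The cluster detection named on this slide can be illustrated with a minimal sketch. It assumes the support plane height is already known and groups the remaining points by Euclidean distance; the function name, the thresholds, and the brute-force neighbour search are illustrative assumptions, not the method of the cited paper.

```python
from collections import deque


def clusters_above_plane(points, plane_z, min_height=0.01, radius=0.05):
    """Group points above a horizontal plane into Euclidean clusters.

    points: list of (x, y, z) tuples in metres (assumed input format).
    plane_z: height of the support plane, assumed to be known already.
    """
    # Keep only points that lie clearly above the support plane.
    above = [p for p in points if p[2] - plane_z > min_height]
    unvisited = set(range(len(above)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbour search; a k-d tree would be used in practice.
            near = [j for j in unvisited
                    if sum((a - b) ** 2 for a, b in zip(above[i], above[j]))
                    <= radius ** 2]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append(sorted(above[k] for k in members))
    return sorted(clusters)
```

Each returned cluster is an object candidate for which a top or side grasp could then be evaluated.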

3D-Mapping with Surfels


3D-Mapping and Localization
- Registration of 3D laser scans
- Representation of point distributions in voxels
- Drivability assessment through region growing
- Robust localization using 2D laser scans
[Kläß, Stückler, Behnke: Robotik 2012]
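
Region growing for drivability can be sketched as follows. The sketch simplifies the voxel map to a 2D grid of cell heights and treats a neighbouring cell as drivable when the height step is small; the grid representation, the function name, and the `max_step` threshold are assumptions made for illustration only.

```python
from collections import deque


def drivable_region(heights, seed, max_step=0.03):
    """Grow a drivable region over a 2D height grid, starting from a seed cell.

    heights: list of rows, each a list of cell heights in metres (assumed).
    seed: (row, col) of a cell known to be drivable, e.g. under the robot.
    """
    rows, cols = len(heights), len(heights[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                # Accept the neighbour only if the height step is traversable.
                if abs(heights[nr][nc] - heights[r][c]) <= max_step:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region
```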

3D Mapping by RGB-D SLAM
- Modelling of shape and color distributions in voxels
- Local multiresolution
- Efficient registration of views on CPU
- Global optimization
- Multi-camera SLAM [Stoucken, Diploma thesis 2013]
Figure labels: 5 cm, 2.5 cm
[Stückler, Behnke: Journal of Visual Communication and Image Representation 2013]
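
Modelling a shape distribution per voxel can be sketched as a sample mean and covariance of the points falling into each cell. The sketch below covers shape only (no color), uses a single resolution per call, and all names are illustrative assumptions rather than the data structure of the cited work.

```python
import math
from collections import defaultdict


def build_surfels(points, voxel_size):
    """Summarise the points in each voxel by count, mean and covariance.

    points: list of (x, y, z) tuples in metres (assumed input format).
    Returns {voxel_index: (count, mean, covariance)}.
    """
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    surfels = {}
    for key, pts in buckets.items():
        n = len(pts)
        mean = tuple(sum(p[d] for p in pts) / n for d in range(3))
        # Biased (1/n) sample covariance of the points in this voxel.
        cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pts) / n
                for j in range(3)] for i in range(3)]
        surfels[key] = (n, mean, cov)
    return surfels
```

Calling the function once with `voxel_size=0.05` and once with `voxel_size=0.025` would give two resolution levels matching the 5 cm and 2.5 cm labels in the figure.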

Learning and Tracking Object Models
- Modeling of objects by RGB-D SLAM
- Real-time registration with current RGB-D image

Deformable RGB-D-Registration
- Based on Coherent Point Drift method [Myronenko & Song, PAMI 2010]
- Multiresolution Surfel Map allows real-time registration
[Stückler, Behnke, ICRA 2014]

Transformation of Poses on Object
- Derived from the deformation field
[Stückler, Behnke, ICRA 2014]

Grasp & Motion Skill Transfer
- Demonstration at RoboCup 2013
[Stückler, Behnke, ICRA 2014]

Tool use: Bottle Opener
- Tool tip perception
- Extension of arm kinematics
- Perception of crown cap
- Motion adaptation
[Stückler, Behnke, Humanoids 2014]

Picking Sausage, Bimanual Transport
- Perception of tool tip and sausage
- Alignment with main axis of sausage
- Our team NimbRo won the RoboCup@Home League in three consecutive years
[Stückler, Behnke, Humanoids 2014]

Hierarchical Object Discovery through Motion Segmentation
- Motion is a strong segmentation cue
- Both camera and object motion
- Segment-wise registration of a sequence
- Inference of a segment hierarchy
[Stückler, Behnke: IJCAI 2013]

Semantic Mapping
- Pixel-wise classification of RGB-D images by random forests
- Inner nodes compare color / depth of regions
- Size normalization
- Training and recall on GPU
- 3D fusion through RGB-D SLAM
- Evaluation on own data set and NYU Depth v2
Figure panels: Ground truth, Segmentation [Stückler, Biresev, Behnke: IROS 2012]
Accuracy in %           Ø Classes   Ø Pixels
Silberman et al. 2012   59.6        58.6
Couprie et al. 2013     63.5        64.5
Random forest           65.0        68.1
3D-Fusion               66.8        70.6
[Stückler et al., Journal of Real-Time Image Processing 2014]
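
The two accuracy columns, Ø Classes and Ø Pixels, are read here as class-average accuracy and overall pixel accuracy; that reading is an assumption, since the slide does not define them. Under that assumption, both can be computed from a confusion matrix as sketched below (function names are illustrative).

```python
def pixel_accuracy(confusion):
    """Fraction of all pixels labelled correctly.

    confusion[i][j]: number of pixels of true class i predicted as class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def class_average_accuracy(confusion):
    """Mean of the per-class recalls, so rare classes count as much as frequent ones."""
    per_class = [row[i] / sum(row)
                 for i, row in enumerate(confusion) if sum(row) > 0]
    return sum(per_class) / len(per_class)
```

With a frequent class that is mostly correct and a rare class that is often wrong, the pixel accuracy stays high while the class average drops, which is why both numbers are usually reported together.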

Learning Depth-sensitive CRFs
- SLIC depth superpixels
- Unary features: random forest, height feature
- Pairwise features: color contrast, vertical alignment, depth difference, normal [...]
Figure panels: Random forest, CRF prediction, Ground truth
[Müller and Behnke, ICRA 2014]

Object Class Detection in RGB-D
- Hough forests not only decide the object class but also describe the object center
- RGB-D objects data set
- Color and depth features
- Training with rendered scenes
- Detection of object position and orientation
- Depth helps a lot
Figure panels: Scene, Class prob., Object centers, Orientation, Detected objects
[Badami, Stückler, Behnke: SPME 2013]

Bin Picking
- Known objects in transport box
- Matching of graphs of 2D and 3D shape primitives
- Grasp and motion planning
Figure labels: 3D, 2D, Offline, Online
[Nieuwenhuisen et al.: ICRA 2013]

Learning of Object Models
- Scan multiple objects in different poses
- Find support plane and remove it
- Segment views
- Register views using ICP
- Recognize geometric primitives
Figure panels: Registered views, Surface reconstruction, Detected primitives

Active Object Perception
[Holz et al. STAR 2014]

Active Object Perception
- Efficient exploration of the part arrangement in the transport boxes to handle occlusions
Figure labels: Detected cylinders, Partial occlusions, Detected object
[Holz et al. STAR 2014]

Active Object Perception
- Efficient exploration of the part arrangement in the transport boxes to handle occlusions
Figure label: Next best view

Active Object Perception
- Efficient exploration of the part arrangement in the transport boxes to handle occlusions
Figure label: Two more detected objects
[Holz et al. STAR 2014]

Industrial Application: Depalletizing
- Using work space RGB-D camera
- Initial pose of transport box roughly known
- Detect dominant horizontal plane above ground
- Cluster points above support plane
- Estimate main axes
[Holz et al. IROS 2015]
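
Estimating the main axis of a point cluster can be sketched with a 2D principal component analysis in closed form. The sketch assumes the cluster has been projected onto the support plane; the function name and the input format are illustrative assumptions, not taken from the cited paper.

```python
import math


def main_axis_angle(points_xy):
    """Angle (radians) of the principal axis of a 2D point cluster.

    points_xy: list of (x, y) tuples, e.g. cluster points projected onto
    the support plane (assumed input format).
    """
    n = len(points_xy)
    mx = sum(p[0] for p in points_xy) / n
    my = sum(p[1] for p in points_xy) / n
    sxx = sum((p[0] - mx) ** 2 for p in points_xy) / n
    syy = sum((p[1] - my) ** 2 for p in points_xy) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points_xy) / n
    # Orientation of the dominant eigenvector of the 2x2 covariance matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)
```

The second main axis is perpendicular to the returned direction.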

Object View Registration
- Wrist RGB-D camera moved above innermost object candidate
- Object views are represented as Multiresolution Surfel Map
- Registration of object view with current measurements using soft assignments
- Verification based on registration quality
[Holz et al. IROS 2015]
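
Soft assignments can be illustrated by weighting every model element for a given measurement instead of picking a single nearest neighbour. The sketch below uses isotropic Gaussian weights on plain 3D points and omits any outlier term; these choices, the function name, and the `sigma` parameter are assumptions for illustration and not the formulation of the cited work.

```python
import math


def soft_assignments(point, model_points, sigma):
    """Normalised Gaussian association weights of one measurement to all model points."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(point, m)) / (2.0 * sigma ** 2))
        for m in model_points
    ]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed; fall back to a uniform assignment.
        return [1.0 / len(model_points)] * len(model_points)
    return [w / total for w in weights]
```

A registration step would then minimise the weighted sum of residuals, so that ambiguous measurements pull the estimate toward several candidates at once.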

Part Detection and Grasping
[Holz et al. IROS 2015]

Depalletizing Results: 10 Runs
- Total time
- Component times and success rates
[Holz et al. IROS 2015]

Part Verification Results
- Parts used for verification
- Detection confidences
[Holz et al. IROS 2015]

Different Lighting Conditions
- Artificial light and daylight
- Only daylight
- Low light
- In all cases, the pallet was successfully cleared.
[Holz et al. IROS 2015]

Deep Learning
[Schulz and Behnke, KI 2012]

GPU Implementations (CUDA)
- Affordable parallel computers
- General-purpose programming
- Convolutional [Scherer & Behnke, 2009]
- Local connectivity [Uetz & Behnke, 2009]

Image Categorization: NORB
- 10 categories, jittered-cluttered
- Max-pooling, cross-entropy training
- Test error: 5.6% (LeNet7: 7.8%)
[Scherer, Müller, Behnke, ICANN'10]
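
The two ingredients named on this slide, max-pooling and cross-entropy training, can be sketched in a few lines. This is a plain-Python illustration with a fixed 2x2 pooling window and a softmax cross-entropy; window size and function names are assumptions and say nothing about the GPU implementation of the cited work.

```python
import math


def max_pool_2x2(image):
    """Non-overlapping 2x2 max-pooling of a 2D list with even side lengths."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]) - 1, 2)]
        for r in range(0, len(image) - 1, 2)
    ]


def cross_entropy(logits, target):
    """Softmax cross-entropy loss for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]
```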

Image Categorization: LabelMe
- 50,000 color images (256x256)
- 12 classes, clutter (50%)
- Error TRN: 3.77%; TST: 16.27%
- Recall: 1,356 images/s
[Uetz, Behnke, ICIS 2009]

Object-class Segmentation
- Class annotation per pixel
- Multi-scale input channels
- Evaluated on MSRC-9/21 and INRIA Graz-02 datasets
Figure panels: Input, Output, Truth
[Schulz, Behnke 2012]

Object Detection in Images
- Bounding box annotation
- Structured loss that directly maximizes overlap of the prediction with ground truth bounding boxes
- Evaluated on two of the Pascal VOC 2007 classes
[Schulz, Behnke, ICANN 2014]
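
The overlap between a predicted and a ground truth bounding box is commonly measured as intersection over union (IoU), which is also the Pascal VOC detection criterion. The slide does not say which overlap measure the structured loss uses, so treating it as IoU is an assumption; the box format `(x1, y1, x2, y2)` and the function name are likewise illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle, clamped at zero.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```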

RGB-D Object-Class Segmentation
- Scale input according to depth
- Compute pixel height
- NYU Depth V2
Figure panels: RGB, Depth, Height, Truth, Output
[Schulz, Höft, Behnke, ESANN 2015]
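
Both preprocessing steps follow from the pinhole camera model. The sketch below assumes a level camera (no tilt) at a known height above the floor and known intrinsics `fx`, `fy`, `cy`; these parameters and function names are assumptions for illustration, and the cited work may estimate the camera pose differently.

```python
def pixel_height(v, depth, fy, cy, camera_height):
    """Height above the floor (metres) of the 3D point seen at image row v.

    Assumes a level camera whose image y axis points downward.
    """
    y_cam = (v - cy) * depth / fy  # offset below the optical axis in metres
    return camera_height - y_cam


def patch_size_pixels(size_m, depth, fx):
    """Image size in pixels of a patch that covers size_m metres at the given depth."""
    return fx * size_m / depth
```

Cropping patches of `patch_size_pixels(...)` pixels and resizing them to a fixed input size makes objects appear at a similar scale regardless of their distance.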

Neural Abstraction Pyramid
Figure labels: Abstract features, Signals
- Data-driven - Analysis - Feature extraction
- Model-driven - Synthesis - Feature expansion
- Grouping - Competition - Completion
[Behnke, LNCS 2766, 2003]

Iterative Interpretation
- Interpret most obvious parts first
- Use partial interpretation as context to resolve local ambiguities
[Behnke, LNCS 2766, 2003]

Local Recurrent Connectivity
Figure labels: Lateral projection, Hyper column, Backward projection, Forward projection, Cell, Feature map, Layer, Processor element, Output, Projections, Less abstract, Hyper neighborhood, More abstract
[Behnke, LNCS 2766, 2003]

Neural Abstraction Pyramid for RGB-D Video Object-class Segmentation
- NYU Depth V2 contains RGB-D video sequences
- Recursive computation is efficient for temporal integration
Figure panels: RGB, Depth, Output, Truth
[Pavel, Schulz, Behnke, IJCNN 2015]

Geometric and Semantic Features for RGB-D Object-class Segmentation
- New geometric feature: distance from wall
- Semantic features pretrained from ImageNet
- Both help significantly
Figure panels: RGB, Truth, DistWall, OutWO, OutWithDist
[Husain et al. under review]

Semantic Segmentation Priors for Object Discovery
- Combine bottom-up object discovery and semantic priors
- Semantic segmentation used to classify color and depth superpixels
- Higher recall, more precise object borders
[Garcia et al. under review]

RGB-D Object Recognition and Pose Estimation
- Use pretrained features from ImageNet
[Schwarz, Schulz, Behnke, ICRA 2015]

Canonical View, Colorization
- Objects viewed from [...]
- Colorization based on distance from center vertical
[Schwarz, Schulz, Behnke, ICRA 2015]

Features Disentangle Data
- t-SNE embedding
[Schwarz, Schulz, Behnke, ICRA 2015]

Recognition Accuracy
- Improved both category and instance recognition
- Confusion: 1: pitcher / coffee mug; 2: peach / sponge
[Schwarz, Schulz, Behnke, ICRA 2015]

Conclusion
- Semantic perception in everyday environments is challenging
- Simple methods rely on strong assumptions (e.g. support plane)
- Depth helps with segmentation, allows for size normalization, geometric features, shape descriptors
- Deep learning methods work well
- Transfer of features from large data sets
- Many open problems, e.g. total scene understanding, incorporating physics, ...

Thanks for your attention!
Questions?
