## Abstract

This paper proposes a learning algorithm to speed up search in Task and Motion Planning problems (TAMP), and to solve three challenges in order to improve planning efficiency. These challenges are: what to predict, how to represent a planning problem instance, and how to transfer knowledge from one problem instance to another. The proposed method here is called the score-space representation, which aims to predict constraints on the search space based on a generic representation of a planning problem instance. This score space is where people can represent a problem instance in terms of performance of a set of solutions attempted so far. Through this representation, knowledge is transferred from previous problems in the form of constraints, based on the similarity in score space. Following the score-space representation, a sequential algorithm which efficiently predicts these constraints is designed, and is evaluated in three different challenging task and motion planning problems. Results indicate the approach presented in this paper perform orders of magnitudes faster than an unguided planner.

## Introduction

The groundwork of task and motion planning (TAMP) problems is to make a robot decide how to manipulate objects in cluttered scenes, in order to achieve a high-level goal such as setting the table. Unlike humans who can reduce the length of the planning process by learning from previous planning experience, the computation time for robots increases exponentially with problem size. Also, every time a new problem is encountered, the robots must solve it from scratch, which is extremely inefficient for real world tasks. Therefore, a new algorithm design is proposed in this paper, and it aims to solve the three main problems in TAMP, which are

- WHAT TO PREDICT
- HOW TO REPRESENT a problem instance
- HOW TO TRANSFER knowledge from past experience to the current problem instance.

**What to Predict**

The previous approaches of using learning to speed up planning is not preferable, due to the intricate relationship between object poses and the robot free-space. This intricacy means that the space of feasible solutions may be altered by a small change in the environment. Figure 1 illustrates this difficulty in a pick-and-place domain, which shows that it is difficult to predict a complete solution or a sub-goal purely based on experience. The top half of Figure 1 shows that, obstacle positions differing by only 0.02 m will require entirely different choices of grasp for picking the blue object. The bottom half shows that object placement position choices differing by 0.05 m is the difference between feasible and unfeasible.

*Figure 1*

Taking into account the intuition that constraints can generalize more effectively across problem instances than a complete solution or a sub-goal, the designed algorithm will have to learn to predict constraints on the solution, by finding a subset of decision variables that can be predicted reliably, while leaving the rest of the decision variables to be filled in by the planner. Figure 2 demonstrates a robot trying to pick an object from a shelf. The arrangement and number of obstacles vary randomly across different planning problem instances. The objective is to find an arm trajectory to a pre-grasp pose for the blue box, marked with a circle, whose position is randomly determined in each problem instance.

*Figure 2*

The subset of predicted decision variables is regarded as the solution constraints, which reduces the search space and preserves the robustness of the planner against changes in problem instances.

**How to Represent Problem Instances**

The score space is defined as a new type of representation, proposed in this paper, of the problem instance in terms of a vector of the scores for a set of plans on that instance. In the score space, every plan is computed based on a fixed set of solution constraints. The score space has an advantage of directly giving information on the similarity between problem instances and solution constraints without any explicit state representation, whereas its main disadvantage is that its information is computationally expensive.

**How to Transfer Knowledge from Past Experience to the Current Problem Instance Efficiently**

To solve this, an algorithm model needs to be defined to apply the expectation and correlation of the scores of solution constraints from the past instances, so the desired solution constraint can be decided.

The proposed algorithm, BOX, is defined as an upper-confidence-bound (UCB) type, experience-based, black-box function-optimization technique that learns to suggest appropriate solution constraints based on the scores of the previously attempted plans.

## Algorithm

The diagram above demonstrates the proposed BOX algorithm and an algorithm which can create score matrix and solution constraints. With these two algorithms ready, the experiments can run to see the performance of the BOX algorithm.

## Experiments

The experiments examine the effectiveness of the STATIC and BOX algorithms in three robotic planning domains. Five selected algorithms, which are STATIC, BOX, RAWPLANNER, RAND, and DOO, are compared in those three domains. Two different planners, RRT and Trajopt, are also used in this paper to show that the score-space algorithms can work with different planners. In each domain, the results are reported using two plots: the first showing the time to find the first feasible solution, and the second showing how the solution quality improves as the algorithms are given more time.

**1. Picking an Object in a Cluttered Scene**

This domain requires the robot to move to a pre-grasp configuration for an object. It involves choosing the configuration and a collision-free path in a domain. Here, the number and position of obstacles can vary.

The robots are asked to pinpoint an arm motion, and grasp an object that randomly lies on a desk or bookshelf. There also are randomly placed obstacles. The object grasp or the final robot configuration is specified. The complete planning problem includes choosing a grasp, performing inverse kinematics to find a pre-grasp configuration for the chosen grasp, and solving a motion planning problem to the computed pre-grasp configuration.

Figure 3a shows the comparison of the time required by each method to find the first feasible plan with RRT as the path planner, Figure 3d shows the same comparison, but with TrajOpt as the path planner instead.

Figures 3b and 3e demonstrate the time required by each method to find the first feasible plan, using RRT and TrajOpt as the planner.

Figure 3c shows the time required by each method to find the first feasible plan.

Figure 4a compares the solution quality vs time using RRT, and Figure 4d compares the same using TrajOpt.

Figures 4b and 4e compare the solution quality vs time when RRT TrajOpt is used, respectively.

Figure 4c shows the average solution score as a function of computation time.

**2. Picking Randomly Placed Object in a Cluttered Scene**

This domain requires the robot to pick an object that is randomly placed on one of the tables in a large room. It requires the choosing of a base and pre-grasp configuration, and a collision-free path from the initial configuration to the pre-grasp configuration. The robots need to search for a base configuration, a left arm pre-grasp configuration, and a feasible path between these configurations to pick an object, as shown in Figure 5. There are 20 boxes resting on two tables as obstacles, and they remain fixed for all instances. The location and orientation in the plane of table are randomly selected and not in collision for each the red box (as obstacle) or the blue box (as target).

**3.Moving Object to a Different Room**

This domain requires the robot to pick an object, move to another room by going through a narrow passage, and then place the object on a table in that room. To solve this problem, the robots need to choose a pre-grasp configuration, a placement position for the object on the table, a pre-place configuration, and paths that connects all these configurations. Again, the problem instances vary as before.

The solution constraints involving the placements of objects are introduced here, with the problem instances shown in Figure 6.

All plots display that the score-space algorithms STATIC and BOX outperform all other algorithm in terms of finding a good solution with a given amount of time, and BOX outperforms STATIC, showing the advantage of using correlation information.

All the results demonstrate that the proposed approach can significantly improve the speed of planning in complex TAMP problems

## Technical Comments:

With the addition of the correlation information, the proposed algorithm, BOX, really stands out and optimized the performance of task and motion planning problems using the score-space representation. In addition, the performances of BOX on different planners, RRT and TrajOpt, are approximately the same. The future work may focus on optimization of score-space algorithms in more complex TAMP problems, predict and generalize better kinematic path and poses. Potential studies may involve transferring of a heavy object, which cannot be done by only one robot. Thus, the problem of cooperation between two robots with the same BOX algorithm may arise.

## References:

[1] L. P. Kaelbling and T. Lozano-Perez, “Integrated task and motion ´ planning in belief space,” IJRR, vol. 32, no. 9-10, 2013.

[2] T. Lozano-Perez and L. Kaelbling, “A constraint-based method for ´ solving sequential manipulation planning problems,” IROS, 2014.

[3] S. Srivastava, E. Fang, L. Riano, R. Chitnis, S. Russell, and P. Abbeel, “Combined task and motion planning through an extensible plannerindependent interface layer,” ICRA, 2014.

[4] R. Munos, “From bandits to monte-carlo tree search: The optimistic principle applied to optimization and planning,” Foundations and Trends in Machine Learning, 2014.

[5] N. Srinivas, A. Krause, S. KaKade, and M. Seeger, “Gaussian process optimization in the bandit setting: No regret and experimental design,” ICML, 2010.

[6] R. Munos., “Optimization of deterministic functions without the knowledge of its smoothness.” Advances in Neural Information Processing Systems, 2011.

[7] D. Berenson, P. Abbeel, and K. Goldberg, “A robot path planning framework that learns from experience,” ICRA, 2012.

[8] J. Hodal and J. Dvo ´ ˇrak, “Using case-based reasoning for mobile robot ´ path planning,” Journal of Engineering Mechanics, 2008.

[9] S. Pandya and S. Hutchinson, “A case-based approach to robot motion planning,” IEEE Intl. Conf. on Systems, Man and Cybernetics, 1992.

[10] N. Jetchev and M. Toussaint, “Fast motion planning from experience: trajectory prediction for speeding up movement generation,” Autonomous Robots, vol. 34, no. 1-2, pp. 111–127, 2013.

[11] J. Lien and Y. Lu, “Planning motion in environments with similar obstacles,” RSS, 2009.

[12] M. Phillips, B. Cohen, S. Chita, and M. Likhachev, “E-graphs: Bootstrapping planning with experience graphs,” RSS, 2012.

[13] A. Dragan, G. Gordon, and S. S. Srinivasa, “Learning from experience in manipulation planning: Setting the right goals,” ISRR, 2011.

[14] S. Finney, L. P. Kaelbling, and T. Lozano-Perez, “Predicting partial ´ paths from planning problem parameters,” RSS, 2007.

[15] D. Dey, T. Y. Liu, B. Sofman, and J. A. Bagnell, “Efficient optimization of control libraries,” AAAI, 2012.

[16] D. Dey, T. Y. Liu, B. Sofman, M. Hebert, and J. A. Bagnell, “Contextual sequence prediction via submodular function optimization,” RSS, 2012.

[17] R. Diankov, “Automated construction of robotic manipulation programs,” Ph.D. dissertation, CMU Robotics Institute, August 2010.

[18] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan, S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequential convex optimization and convex collision checking,” IJRR, 2014.

Paper Authors: Beomjoon Kim, Leslie Pack Kaelbling and Tomas Lozano-Perez

Source: http://lis.csail.mit.edu/pubs/kim-icra17.pdf

**Author**: Bin Liu | **Editor**: Zhen Gao | **Localized by Synced Global Team**: Xiang Chen

## 0 comments on “Learning to Guide Task and Motion Planning Using Score-Space Representation”