A Task Learning Mechanism for the Telerobots

Telerobotic systems have attracted growing attention because of their advantages in dangerous or unknown interaction tasks. It is very challenging to exploit such systems to carry out complex tasks autonomously. In this paper, we propose a task learning framework to represent the manipulation skill demonstrated by a remotely controlled robot.


Introduction
With the development of science and technology, robots have been widely used in industry, daily life, education, entertainment, and military domains. [1][2][3][4][5][6][7] Restricted by many factors, e.g., sensor technologies, control technologies, and mechanism design, robots cannot yet achieve full autonomy. In structured environments, such as an automobile production line, robots perform remarkably well because the whole manipulation process has been preprogrammed. These technologies, however, cannot be transferred directly to robots working in unstructured and uncertain environments. In such cases, the telerobot, which is remotely controlled by an operator, is a practical choice. A telerobotic system consists of five parts: a human operator, a master device, a remote slave device, a communication channel, and an interaction environment. The human operator remotely controls the robot to perform a task in a dangerous or unknown environment. The telerobotic system integrates human intelligence with the robot's strengths to handle increasingly complex task scenarios and interaction environments. Combining the advantages of the robot and the operator, telerobots are widely used in telesurgery, deep-sea and space exploration, nuclear processing, and similar domains.
However, because of time delays, model uncertainties, and the lack of transparency in teleoperated systems, it is challenging for the operator to control the remote robot easily. Many works have concentrated on control algorithms that cope with model uncertainties and time delays. Yang et al. proposed neural networks (NNs) and an automatic collision method to deal with the kinematic and dynamic uncertainties of a Baxter robot. [8] Radial basis function NNs (RBFNNs) and a wave variable method were presented to compensate for the influences of time delay and dynamics uncertainties. [9] An adaptive fuzzy control method was developed to deal with system uncertainties and time delay in a dual-master-single-slave teleoperated system. [10,11] Sun et al. presented a wave-based time-domain passivity method to enhance the transparency of a four-channel teleoperated system. [12] Besides, other control algorithms, for example, hybrid position/force control, impedance control, and fuzzy control, have been widely applied in teleoperation to deal with these issues. [13][14][15][16][17] These advanced control algorithms effectively improve the performance of the telerobotic system. Yang et al. developed a variable controller with a tremor attenuation filter that involves muscle activation for personalized control in a dynamic environment. [18] A human-in-the-loop framework was proposed to adapt to changes in human behavior and ensure an optimal teleimpedance during teleoperation. [19] Generally, the human, or the human upper limb, is a main component of the teleoperated system. The energy cost of the upper limb was analyzed by evaluating hand grasp pressure and passivity of interaction in teleoperation. [20] Shahbazi et al. presented a passivity method with respect to the human upper limb for natural and safe human-robot interaction (HRI). [21] Potential solutions were developed to deal with human-related influence factors such as frame rates and image issues. [22] By introducing human-centered interaction methods, the performance of the teleoperated system can be strengthened during control.
The above-mentioned methods rigorously analyze and cope with the problems of time delays, model uncertainties, etc. However, it is hard to obtain an accurate model in the presence of environment uncertainty and load uncertainty. Alternatively, machine learning is a worthwhile choice for estimating the robot's dynamic and kinematic models. Robot learning is an alternative solution for estimating the relationship among the operator, the robot, and the executed task. Human operators face heavy manipulation pressure and workload in traditional teleoperation. In some extreme cases, operators even need to control every joint of the robot to guide it to interact finely with the environment. One way to ease the operator's workload is for the robot to learn the manipulation skill offline and to carry out the manipulation task online according to its perception capability. Along this direction, a very important question is how a robot can learn a skill from human demonstration. A dynamical movement primitive (DMP) algorithm with sEMG signals was developed to construct the learned skill and stiffness from task trajectories and stiffness information. [23,24] A human motion intention recognition approach was proposed to identify the object and grasp configuration by employing a hidden Markov model (HMM) in the teleoperated system. [25] Tanwani et al. proposed a hidden semi-Markov model (HSMM) algorithm to generate a task model for assisting the human operator. [26] A haptic impedance control method was proposed for an assembly task, learning the task by collecting motion and stiffness information. [27] An HMM and a dynamic time warping (DTW) method were used to learn a complex task from the task trajectories. [28] HMM and the Gaussian mixture model (GMM) were employed to learn a pouring task and a container-emptying task for a bilateral teleoperated system. [29,30] A small variance asymptotic method involving online robot learning was developed to learn a task model and achieve human intention recognition. [31] Zeestraten et al. proposed programming by demonstration (PbD) based on shared control, which was used to learn a task with state sharing and to improve task performance. [32]
To improve working efficiency and achieve automated task generation, this paper develops a robot task trajectory learning method based on machine learning. First, a DTW method is implemented to deal with the temporal mismatch among demonstrated observations, so that the demonstrated observations are normalized into the same time domain. Second, a GMM method is used to encode the task trajectory in order to describe the relationship among the robot, the teleoperator, and the task. Finally, a Gaussian mixture regression (GMR) method is employed to generate a continuous, smooth, and state-based task trajectory according to the variability of the demonstrations. The proposed method brings the following benefits: (1) Efficiency: Compared with traditional direct teleoperation, the proposed method enables the telerobot to perform a task automatically after several demonstrations, so that repetitive tasks no longer require a human operator. (2) Security: The teleoperated system does not need a human operator all the time. Instead, the human operator can concentrate on decision-making in teleoperation.
The rest of this paper is organized as follows. The proposed method, composed of DTW, GMM, and GMR, is described in Sec. 2. Section 3 presents the results. Finally, conclusions and future work are given in Sec. 4.

The description of task learning
In this paper, the system includes three modules, as shown in Fig. 1: human demonstration, robot learning and reproduction, and robot execution.
Human demonstration module: In this module, an operator remotely controls a slave robot to perform tasks via a master haptic device. The trajectories of the robot's end-effector, including position and velocity, are recorded. In this teleoperated system, the operator issues commands through the position of the master device. Because the kinematic structures of the master device and the slave robot differ, we use the following coordinate transformation:

$$[x, y, z]^{T} = O_{sm}\,\Lambda\,[x_m, y_m, z_m]^{T} + b, \qquad (1)$$

where $[x, y, z]^{T}$ and $[x_m, y_m, z_m]^{T}$ are the position vectors of the slave and the master in task space, respectively, $O_{sm}$ is the rotation transformation matrix, $\Lambda$ is the scale factor used to match the workspaces of the master and the slave, and $b$ is the position correction term in the X/Y/Z directions.

Robot learning and reproduction module: The robot learning module includes data preprocessing, task encoding, and task generation. Data preprocessing normalizes the demonstrated data into a unified time domain. We employ a GMM, a statistical learning method, to encode the human skill, and GMR to generate a learned skill model for the slave device according to the new situation.
Robot execution module: The main function of this module is to enable the robot to follow the trajectory generated by the robot learning module. To this end, the slave robot executes the generalized task learned from the demonstrated robot trajectories.
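The master-to-slave mapping used in the human demonstration module (rotation, scaling, and a position correction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `master_to_slave` and `rotation_z`, the scalar scale factor, and all numeric values are assumptions chosen for illustration.

```python
import math


def rotation_z(angle_rad):
    """Rotation matrix about the Z-axis (a hypothetical choice of rotation)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def master_to_slave(p_master, rotation, scale, offset):
    """Map a master position to the slave task space: rotate(scale * p_m) + b.

    The scale factor is assumed to be a scalar here; a per-axis scaling
    would use a diagonal matrix instead.
    """
    scaled = [scale * c for c in p_master]
    rotated = [sum(rotation[r][c] * scaled[c] for c in range(3)) for r in range(3)]
    return [rotated[r] + offset[r] for r in range(3)]


# Hypothetical values: 90-degree rotation, workspace enlarged by a factor of 2.
p_slave = master_to_slave(
    p_master=[0.1, 0.0, 0.05],
    rotation=rotation_z(math.pi / 2),
    scale=2.0,
    offset=[0.5, 0.0, 0.2],
)
print(p_slave)
```

In practice the rotation, scale, and offset would be calibrated once for a given master and slave pair and then applied to every sampled master position.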
In order to describe the proposed method more clearly, a procedure of the algorithm and a schematic diagram of the proposed method are outlined in Algorithm 1 and Fig. 2, respectively.

Data preprocessing with DTW
Given the human demonstration module, the position and velocity of the robot end-effector are sampled from multiple task demonstrations.
Across multiple demonstrations, the execution time of the task is not the same. We process the sampled data with the DTW method to bring the data into the same time domain. [33] The DTW algorithm has been widely used in speech recognition to measure the similarity of two temporal sequences of different lengths. [34,35] By introducing the boundary, monotonicity, and step-size conditions, we compute the similarity through an optimal warping path distance as follows:

$$D_{opt}\{L(x_1), L(x_2)\} = \min D\{L(x_1), L(x_2)\}, \qquad (2)$$

where $x_1$ and $x_2$ are two time trajectories (position and velocity) of different lengths, $L(x_1)$ and $L(x_2)$ are the lengths of the two trajectories, and $D\{L(x_1), L(x_2)\}$ and $D_{opt}\{L(x_1), L(x_2)\}$ are the warping path distance and the optimal warping path distance, respectively.
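The DTW preprocessing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses the textbook dynamic-programming recursion with the absolute difference as local distance, works on one-dimensional sequences, and the helper names `dtw` and `align_to_reference` as well as the toy data are hypothetical.

```python
def dtw(seq_a, seq_b):
    """Return (optimal warping distance, warping path) for two 1-D sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            # Step-size condition: only diagonal, vertical, or horizontal moves.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end point to the start point (boundary condition).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def align_to_reference(reference, other):
    """Resample `other` onto the time axis of `reference` along the DTW path."""
    _, path = dtw(reference, other)
    matched = [[] for _ in reference]
    for i, j in path:
        matched[i].append(other[j])
    return [sum(values) / len(values) for values in matched]


# Toy data: the second demonstration is a slower version of the first.
reference = [0, 1, 2, 3, 2, 1, 0]
slower = [0, 0, 1, 2, 3, 3, 2, 1, 0]
dist, path = dtw(reference, slower)
aligned = align_to_reference(reference, slower)
```

After alignment every demonstration has the same number of samples as the reference, which is what allows the trajectories to be stacked for encoding.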

Task encoded by GMM
Reinforcement learning (RL) and HMM have been widely used to encode task trajectories effectively. [30,36] Nevertheless, because task encoding must be timely, the RL method is limited by a search space that is too large for a relatively complex task. Besides, task encoding is a continuous process, whereas the HMM method requires interpolation over discrete sets. Therefore, we propose the combined GMM and GMR method, which copes with the above-mentioned shortcomings in robot learning because of its appropriate search space and continuity.
After preprocessing, we obtain N normalized demonstrated samples, which can be represented as

$$X = \{X_i\}_{i=1}^{N}. \qquad (3)$$

Algorithm 1.
Input: $X$, the demonstrated trajectories composed of position and velocity.
1. Encode the training trajectory: $\Theta \leftarrow$ EM algorithm, where $\Theta$ denotes the parameters of the GMM optimized by the EM algorithm.
2. Generate the trajectory for a new test task based on the GMM: $X_o(j)$ is the mean of the output in the jth step and $\sigma_o$ is the variance of the output in the jth step; $A_j$, $b_j$, and $h_j$ can be computed by (16)-(18).
3. Follow the generated trajectory to implement the new task.
J. Luo et al.

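A GMM is used to encode the demonstrated task from the observation sequences. [37,38] The probability density function (PDF) $p(X_i)$ can be expressed as

$$p(X_i) = \sum_{j=1}^{K} p(j)\, \mathcal{N}(X_i \mid \mu_j, \Sigma_j), \qquad (4)$$

where $\mu_j$ and $\Sigma_j$ are the mean values and variances of the observations. $p(j)$ is the prior of the jth of the $K$ Gaussian components, and it satisfies

$$\sum_{j=1}^{K} p(j) = 1. \qquad (5)$$

$\mathcal{N}(X_i \mid \mu_j, \Sigma_j)$ is the Gaussian distribution of the observations, given as

$$\mathcal{N}(X_i \mid \mu_j, \Sigma_j) = \frac{1}{\sqrt{(2\pi)^{d}\,|\Sigma_j|}} \exp\!\left(-\frac{1}{2}(X_i - \mu_j)^{T} \Sigma_j^{-1} (X_i - \mu_j)\right), \qquad (6)$$

where $d$ is the dimension of $X_i$. Inspired by Ref. 39, (4) based on (5)-(6) can be rewritten for the demonstrated observations as

$$p(X_i \mid \Theta) = \sum_{j=1}^{K} \pi_j\, \mathcal{N}(X_i \mid \mu_j, \Sigma_j), \qquad (7)$$

where $\Theta = \{\Theta_j\}_{j=1}^{K} = \{\pi_1, \mu_1, \Sigma_1, \ldots, \pi_j, \mu_j, \Sigma_j, \ldots, \pi_K, \mu_K, \Sigma_K\}$ are the parameters of the Gaussian components, with $\pi_j = p(j)$. In this paper, the demonstrated observations are regarded as independent and Gaussian distributed. Given the demonstrated observations, the parameters of the GMM are estimated by the expectation-maximization (EM) method. The EM method is initialized by using the k-means clustering algorithm.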
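The EM estimation with k-means initialization can be illustrated on one-dimensional data. The sketch below is a simplified stand-in rather than the authors' implementation: the names `fit_gmm_1d` and `kmeans_1d`, the deterministic choice of initial centres, the fixed iteration counts, the variance floor `min_var`, and the toy data are all assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def kmeans_1d(data, k, iters=20):
    """Tiny 1-D k-means, used only to initialise the component means."""
    ordered = sorted(data)
    # Deterministic start: spread the centres evenly over the sorted data.
    centres = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            clusters[nearest].append(x)
        centres = [sum(c) / len(c) if c else centres[idx]
                   for idx, c in enumerate(clusters)]
    return centres


def fit_gmm_1d(data, k, iters=100, min_var=1e-6):
    """Estimate priors, means, and variances of a 1-D GMM with EM."""
    n = len(data)
    means = kmeans_1d(data, k)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    priors = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            weighted = [priors[j] * gaussian_pdf(x, means[j], variances[j])
                        for j in range(k)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # M-step: re-estimate the parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            priors[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids a collapsed component
    return priors, means, variances


# Toy data with two well-separated groups (hypothetical values).
samples = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
priors, means, variances = fit_gmm_1d(samples, k=2)
```

For multi-dimensional trajectories the same two steps apply with mean vectors and full covariance matrices in place of the scalar means and variances.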

Task generated by GMR
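Given the demonstrated observation sequence encoded by the GMM, the GMR method is employed to generate a generalized task model. [40][41][42] By employing the joint probability distribution, the observation, the mean, and the covariance matrix of each component can be partitioned into input and output parts as

$$X = \begin{bmatrix} X^{\mathrm{i}} \\ X^{\mathrm{o}} \end{bmatrix}, \qquad \mu_j = \begin{bmatrix} \mu_j^{\mathrm{i}} \\ \mu_j^{\mathrm{o}} \end{bmatrix}, \qquad (8)$$

$$\Sigma_j = \begin{bmatrix} \Sigma_j^{\mathrm{i}} & \Sigma_j^{\mathrm{io}} \\ \Sigma_j^{\mathrm{oi}} & \Sigma_j^{\mathrm{o}} \end{bmatrix}, \qquad (9)$$

where the superscripts $\mathrm{i}$ and $\mathrm{o}$ denote the input and output parts. The conditional estimate $\hat{X}_j^{\mathrm{o}}$ of the output data for the jth component, given the input $X^{\mathrm{i}}$, can be computed as follows:

$$\hat{X}_j^{\mathrm{o}} = \mu_j^{\mathrm{o}} + \Sigma_j^{\mathrm{oi}} (\Sigma_j^{\mathrm{i}})^{-1} (X^{\mathrm{i}} - \mu_j^{\mathrm{i}}), \qquad (10)$$

$$\hat{\Sigma}_j^{\mathrm{o}} = \Sigma_j^{\mathrm{o}} - \Sigma_j^{\mathrm{oi}} (\Sigma_j^{\mathrm{i}})^{-1} \Sigma_j^{\mathrm{io}}. \qquad (11)$$

Motivated by the related works in Refs. 43 and 44, the conditional distribution of the output data $\hat{X}^{\mathrm{o}}$ for the $K$ Gaussian components is obtained as

$$\hat{X}^{\mathrm{o}} = \sum_{j=1}^{K} h_j \hat{X}_j^{\mathrm{o}}, \qquad (12)$$

$$\hat{\Sigma}^{\mathrm{o}} = \sum_{j=1}^{K} h_j^{2}\, \hat{\Sigma}_j^{\mathrm{o}}, \qquad (13)$$

$$h_j = \frac{p(j)\, \mathcal{N}(X^{\mathrm{i}} \mid \mu_j^{\mathrm{i}}, \Sigma_j^{\mathrm{i}})}{\sum_{k=1}^{K} p(k)\, \mathcal{N}(X^{\mathrm{i}} \mid \mu_k^{\mathrm{i}}, \Sigma_k^{\mathrm{i}})}. \qquad (14)$$

Thus, a generated task model is obtained based on GMR. The motion generated from the learned model can be performed smoothly without solving the inverse kinematics problem of the slave. This greatly improves the real-time capability of the telerobotic system for automated tasks.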
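The regression step can be illustrated for a scalar input (e.g., time) and a scalar output (e.g., one position coordinate). This is a minimal sketch under assumptions, not the authors' implementation: the function name `gmr`, the dictionary keys, and the component values are hypothetical, and the output variance is blended with squared weights, which is one common formulation and may differ from the one used in the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr(x_in, components):
    """Conditional mean and variance of the output given a scalar input.

    Each component is a dict with keys: prior, mean_in, mean_out,
    var_in, cov_oi (input/output covariance), and var_out.
    """
    # Responsibility of each component for the query input.
    weights = [c["prior"] * gaussian_pdf(x_in, c["mean_in"], c["var_in"])
               for c in components]
    total = sum(weights)
    h = [w / total for w in weights]

    mean, var = 0.0, 0.0
    for hj, c in zip(h, components):
        gain = c["cov_oi"] / c["var_in"]
        cond_mean = c["mean_out"] + gain * (x_in - c["mean_in"])
        cond_var = c["var_out"] - gain * c["cov_oi"]
        mean += hj * cond_mean
        var += hj ** 2 * cond_var  # squared-weight blending (assumed formulation)
    return mean, var


# Two hypothetical components of a GMM fitted on (time, position) pairs.
components = [
    {"prior": 0.5, "mean_in": 1.0, "mean_out": 0.0,
     "var_in": 0.5, "cov_oi": 0.2, "var_out": 0.1},
    {"prior": 0.5, "mean_in": 3.0, "mean_out": 2.0,
     "var_in": 0.5, "cov_oi": 0.2, "var_out": 0.1},
]

# Query the model on a regular time grid to obtain a smooth trajectory.
trajectory = [gmr(0.5 * step, components)[0] for step in range(9)]
mid_mean, mid_var = gmr(2.0, components)
```

Because the weights vary smoothly with the input, the blended output changes continuously between the regions dominated by each component.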
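According to the conditional probability of GMR and a given current position $x$, the desired velocity is obtained as

$$\hat{\dot{x}} = \sum_{j=1}^{K} h_j(x)\,(A_j x + b_j), \qquad (15)$$

where

$$A_j = \Sigma_j^{\dot{x}x} (\Sigma_j^{x})^{-1}, \qquad (16)$$

$$b_j = \mu_j^{\dot{x}} - A_j \mu_j^{x}, \qquad (17)$$

$$h_j(x) = \frac{p(j)\, \mathcal{N}(x \mid \mu_j^{x}, \Sigma_j^{x})}{\sum_{k=1}^{K} p(k)\, \mathcal{N}(x \mid \mu_k^{x}, \Sigma_k^{x})}, \qquad (18)$$

with $j = 1, \ldots, K$ in (16)-(18). Using Euler integration, the desired position in Cartesian space at time $t$ is updated from the computed desired velocity $\hat{\dot{x}}$ as follows:

$$x_t = x_{t-1} + \hat{\dot{x}}\,\Delta t, \qquad (19)$$

where $\Delta t$ is the sampling period.

Results and Discussion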

Experimental setup
We demonstrate the proposed method on the telerobotic system shown in Fig. 3, which is composed of a Touch X haptic device and a Baxter robot. The Touch X communicates with the Baxter robot via a User Datagram Protocol (UDP) module. A cleaning task with different initial conditions is performed to validate the effectiveness of the proposed GMM- and GMR-based method. In the cleaning task, a gray cardboard serves as the cleaning tool and a yellow cube serves as the rubbish. A human operator uses the Touch X to control the right arm of the Baxter robot and sweep the rubbish into the red garbage bucket. The task is demonstrated several times. After learning the cleaning task, the robot can execute it successfully from different initial positions.

Demonstrated observation preprocessing
The cleaning task is demonstrated 10 times in teleoperated mode, each time from a different initial position.

Results
As shown in Fig. 6, the 10 demonstrated trajectories are presented in the same time domain to indicate the similarity of the demonstrated cleaning tasks. It can be seen that the curve of each demonstration has a shape similar to those of the other demonstrations. In this experiment, the GMM method is employed to encode the cleaning task.

[Fig. 7. Encoded models by using GMM in (a) the X-axis, (b) the Y-axis, and (c) the Z-axis.]

The cleaning motion varies mainly along the X- and Y-axes, while it is relatively stable along the Z-axis. The encoding result indicates that the cleaning task has great variability in the X-Y plane.
In the generating phase, the desired position is obtained by the GMR method. The generated task, based on the learned model, is plotted in Figs. 8(a)-8(c). A smooth trajectory is generated along the X-, Y-, and Z-axes according to (10)-(14). It is noted that the generated trajectories could be adjusted.

[Fig. 8. Regressed models by using GMR. Left: (a) generated task in the X-axis. Middle: (b) generated task in the Y-axis. Right: (c) generated task in the Z-axis.]
[Fig. 9(a)-(b). (a) Robot execution process from the starting place to phase I of the cleaning task. (b) Robot execution process from phase I to phase II of the cleaning task.]

The robot execution process of the cleaning task is presented in Fig. 9. In Fig. 9(a), the robot starts from a given initial position that differs from those used in training. To describe the execution process of the slave, we artificially divide it into five phases (phases I-V). In phase I, the motion is updated according to (15)-(18) from the given initial position. This phase corresponds to phase (a) of the generating process. Accordingly, Figs. 9(b)-9(f) show the rest of the cleaning task execution (four phases), corresponding to the other phases (b)-(f) generated by the GMR method. In the robot execution process, the cleaning task is successfully implemented by using the GMM-GMR method.

Conclusion and Future Work
[Fig. 9 (continued). (c) Robot execution process from phase II to phase III of the cleaning task. (d) Robot execution process in phase III, pausing for a moment. (e) Robot execution process from phase III to phase IV. (f) Robot execution process from phase IV to phase V.]

This paper proposes a task learning framework for the teleoperated robot to explore the relationship among the robot, the operator, and the task. We adopt a DTW method to normalize the 10 demonstrated observations, which have different time scales. The proposed learning framework employs the GMM method to encode the demonstrated trajectory of the robot end-effector while an operator remotely controls the robot to perform a cleaning task. In the framework, we use GMR to generalize the training result to a new situation (a different initial position); in the evaluation experiment, the cleaning task starts from a new position. Experimental results indicate that the proposed learning framework is feasible. After 10 demonstrations, the proposed method successfully generates a trajectory whose initial position differs from those in the training set. Combined with the robot's Cartesian controller, the cleaning task in the new situation is implemented correctly. In the future, impedance and force information of the robot end-effector involved in human-robot interaction should be considered. The time-delay issue of the teleoperation system should also be addressed. [45,46] Furthermore, more industrial task scenarios should be explored in practical applications.