Embodied Perception during Walking using Deep Recurrent Neural Networks

by

Jacob Chen

B.Sc. Electrical Engineering, University of Maryland, College Park, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

July 2017

© Jacob Chen, 2017

Abstract

Movements such as walking require knowledge of the environment in order to be robust. This knowledge can be gleaned via embodied perception. While information about the upcoming terrain, such as compliance, friction, or slope, may be difficult to estimate directly, the walking motion itself allows these properties to be observed implicitly over time from the stream of movement data. However, the relationship between a parameter such as ground compliance and the movement data may be complex and difficult to discover. In this thesis, we demonstrate the use of a Deep LSTM Network to estimate the slope and ground compliance of terrain by observing a stream of sensory information that includes the character state and foot pressure information.

Lay Summary

Accurately estimating the environment that an agent is in is beneficial because it allows for immediate, robust, and effective responsiveness. Allowing the agent to build its own structured representations from input streams is a notion borrowed from recent ideas in the cognitive sciences. To carry out this data-driven approach, we use a modern machine learning model, the Deep Recurrent Neural Network, which enables the system to receive a stream of inputs, including the state of the bipedal agent as well as foot pressure sensors, and to estimate terrain properties that are difficult to analyze analytically. Such a model allows the agent to build its own structured representations of the data, enabling high-fidelity estimates that can supplement the information the agent uses to inform future decisions.

Preface

My supervisor, Michiel van de Panne, was instrumental in providing the guidance and direction for this thesis. I was responsible for its implementation, experimental setup, and results, as well as the writing of this thesis.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments

1 Introduction
  1.1 Motivations
  1.2 Thesis Overview

2 Related Work
  2.1 Embodied Perception
  2.2 Robot Locomotion with Environmental Knowledge
    2.2.1 Terrain Classification
    2.2.2 Terrain Estimation
  2.3 System Identification and State Estimation
  2.4 Prediction from Time Series

3 Background
  3.1 Bipedal Walking Control Policy
  3.2 Ground Contact Model
  3.3 State Estimation and System Identification
  3.4 Neural Networks
    3.4.1 Feed Forward Networks
    3.4.2 Recurrent Neural Networks (RNNs)
    3.4.3 Vanilla RNN
    3.4.4 LSTM (Long Short Term Memory) Cells

4 Methodology
  4.1 Physics Simulation
    4.1.1 Biped Simulation
    4.1.2 Terrain Generation
  4.2 Network Architecture
  4.3 Training Process
    4.3.1 Data Collection
    4.3.2 Training

5 Experiments
  5.1 Parameter Settings
  5.2 Experiments
    5.2.1 Shared Network
  5.3 Separate Networks
    5.3.1 Compliance
    5.3.2 Slope
  5.4 Decreased Parameter Space
  5.5 Varying Window Sizes
  5.6 Removal of Features
  5.7 Discussion

6 Conclusions

Bibliography

List of Tables

Table 4.1  Body parameters
Table 4.2  Joint PD gains
Table 4.3  Finite state machine parameters; WF denotes with respect to the world frame
Table 5.1  Validation losses for the separate and shared models. The lowest validation loss per trial is indicated by its respective arrow.

List of Figures

Figure 1.1  Sample features plotted with respect to time along with their labels
Figure 1.2  Simulation snapshots in order from left to right, top to bottom. The character is estimating the terrain properties. Red circles represent the actual values, blue circles represent the predictions. All predictions are normalized with respect to the training set.
Figure 2.1  Simple contact sensor [Giguere and Dudek, 2009] used for surface identification
Figure 2.2  Legged robot [Walas, 2015] used for classifying terrains
Figure 2.3  Robot [Sandeep Manjanna, 2013] used for classifying terrains using gait
Figure 2.4  UP-OSI system diagram of [Wenhao Yu, 2017]
Figure 3.1  System diagram. The SIMple BIped CONtrol (SIMBICON) walking controller generates data by being applied to the environment in the physics simulation. This data is then used to train a learning model, whose estimates can be displayed in real time.
Figure 3.2  Example finite state machine for SIMBICON. The state transitions that exit states 1 and 3 occur after a time delay, Δt. States 2 and 4 run until the corresponding foot has made contact. States 1 and 2, shown in green, are in right stance, while states 3 and 4, shown in orange, are in left stance.
Figure 3.3  The rigid body is drawn as a light green rectangle. Red circles represent the vertices of the rigid body used for collision detection and response. X's represent the locations of first contact with the ground. Green arrows represent the restorative force vectors applied to the vertices.
Figure 3.4  Spring and damper system used for the contact model.
Figure 3.5  The rigid body is represented as a light green rectangle. The circle represents a tracked vertex of the rigid body. The X represents the collision point of the vertex with the surface. The green arrow represents the restorative force vector applied to the vertex.
Figure 3.6  Example architecture of a dense feed-forward network.
Figure 3.7  Example architecture of an unrolled dense Recurrent Neural Network. Inputs are combined with the previous hidden state and the result is linearly combined before being passed through an activation function. The h_i represent neurons with weighted inputs followed by the activation function.
Figure 3.8  The function φ transforms the hidden state into an output. This operation can be performed at any stage in the sequence.
Figure 3.9  Various RNN output forms
Figure 3.10  LSTM cell proposed by Hochreiter. This architecture introduces the input and output gates, which regulate the cell's hidden state.
Figure 3.11  Modern LSTM cell with the forget gate proposed by [Gers et al., 1999]. The forget gate allows the LSTM cell to flush out irrelevant contents from the cell state while maintaining the constant error carousel.
Figure 4.1  The slope pattern that the bipedal character traverses.
Figure 4.2  The standard multilayer LSTM architecture
Figure 5.1  The experiments that were carried out. Each experiment branches off from the shared network architecture to measure its effect on accuracy.
Figure 5.2  The shared LSTM architecture for predicting both slope ŝ and compliance ĉ.
Figure 5.3  Results for slope and compliance prediction using a shared architecture
Figure 5.4  Results for compliance prediction using a separate model
Figure 5.5  Results for slope prediction using a separate model
Figure 5.6  Results for using half of the units
Figure 5.7  Results for a window size of 15
Figure 5.8  Results for a window size of 7
Figure 5.9  Results for a window size of 3
Figure 5.10  Results of only using state features
Figure 5.11  Results of only using foot pressure forces

Glossary

SIMBICON  SIMple BIped CONtrol
PD  Proportional Derivative
LSTM  Long Short Term Memory
RNN  Recurrent Neural Network
BPTT  Back Propagation Through Time
SVM  Support Vector Machines
UP  Universal Control Policy
OSI  On-line System Identification Model
RELU  Rectified Linear Unit
CEC  Constant Error Carousel
EKF  Extended Kalman Filters
POMDP  Partially Observed Markov Decision Processes
FFNN  Feed Forward Neural Network
FSM  Finite State Machine
COM  Center Of Mass

Acknowledgments

I thank my supervisor, Dr. Michiel van de Panne, for his mentoring and support throughout my Master's track. I thoroughly appreciate your oversight, patience, and willingness to work alongside your students. It is clearly evident that you take your students' interests to heart from the kinds of relationships you foster with them. Your inquisitive nature and character are ones that I will always strive to emulate.

I also thank my lab mates, Xue Bin Peng, Glen Berseth, and Shailen Agrawal, whom I had the pleasure and honor of meeting and sharing time with in the lab. Even though I was not directly involved in your projects, you had no qualms about discussing your work. The enthusiasm you had for it was palpable and contagious. I was able to grow and learn so much from our discussions. I wish I could bother you guys forever. I couldn't have asked for a more nurturing lab for knowledge.

I also extend my thanks to Dr. Dinesh Pai for being the second reader of my thesis.

Furthermore, I would like to thank Kimberly Dextras Romagnino for providing the necessary comedic relief of the lab. Without it, I may very well have gone insane.

Lastly, and most importantly, I thank my parents for all the support and love they have shown me. The opportunities that I have had would not have been possible without your encouragement, understanding, patience, and willingness to help. I would not be who I am today without your influences.

Chapter 1: Introduction

Our perception of the environment influences the way we interact with it. Thus, building a perception that reflects reality accurately is crucial in informing the decisions we make. Furthermore, building an accurate representation of reality is restricted to the physical and sensory modalities that are afforded to the agent. The agent has the task of discovering the mapping between the information gleaned from its modalities and its environment. Traditional approaches for allowing an agent to discover this relationship require domain knowledge, which is not representative of how humans and other organisms learn to discover and interact with their environment. For instance, in order to catch a ball, humans do not calculate the trajectory of the ball and predict where it will land. Instead, humans use a constant stream of sensory information to position themselves so as to catch the ball successfully. In this work, we seek to emulate aspects of such behavior.
This work explores insights related to sensorimotor embodiment, and hence how an agent can learn about and interact with its environment without the need for embedded domain knowledge.

1.1 Motivations

A major theme in cognitive science is the idea of perception, its relation to cognition, and how this is framed. Embodied perception and cognition is the idea that cognition is deeply dependent upon characteristics of the physical body of the agent [Berthoz, 2002; Wilson and Foglia, 2017]. Part of the theory is that perception is tied to the physical makeup of the agent, and so the cognitive system of the agent is constrained by the capabilities of the physical modalities and movements afforded to the agent. The agent thus uses an interplay between the simulated consequences of its actions and the environment as a means of understanding that environment. This thesis addresses this theme by using the consequences of the environment on the agent's own modalities, through walking, to understand and estimate terrain properties. By using a recent window of past consequences, we explore how this understanding can be achieved, possibly similar to how humans gain a better understanding through repeated temporal exposure and feedback. The primary motivation of this thesis is thus to explore and validate how well the movement of an agent can help it understand its own environment and discover its relationship to latent variables that may be difficult to discover analytically.

Our work is also motivated by control in animation and robotics. Typically, in order to animate characters, motion capture techniques and/or artist involvement are required to produce realistic, high-quality animation. Physics-based animation through control offers an alternative to these techniques, along with many desirable benefits: if successful, these control techniques provide responsive realism, interaction, and generalization to a variety of environments. Such approaches may reduce the effort and cost of animation by removing the need to tailor animations to specific environments.

In the above context, this thesis also seeks to improve the capacity of physics-based controllers by introducing predictive models for environment state estimation. With these models, controllers gain the ability to anticipate changes and perturbations in the environment and respond intelligently. If the predictive model has enough fidelity, the controller can use it to augment the current state of the physical agent when proposing an action. This work lies in the realm of state estimation and system identification, where the agent uses an awareness of itself and its encompassing environment. Traditional system identification techniques discover the behaviors of a system through strategic sampling; depending on the response of the system, an appropriate model is chosen and fit so as to minimize the error between prediction and observation. Traditional state estimation techniques seek to estimate the current state based on a combination of new observations and the previous estimate. This is most clearly demonstrated by the Kalman filter, which is used extensively in many applications.
Our work estimates the state of the environment through continuous sampling, and constructs a model that approximates the system mapping character movement and modality information to estimates of environmental latent variables, such as compliance and slope.

Deep learning techniques have been shown to be successful in the field of computer graphics; examples include motion manifold learning and synthesis, physics-based controllers, and 3D mesh labeling [Guo et al., 2015; Holden et al., 2016; Peng et al., 2016]. Additionally, much progress has been made using Recurrent Neural Networks (RNNs) in natural language processing due to their ability to process time-sequenced inputs [Socher et al., 2013]. This work demonstrates the potential of leveraging RNNs to enhance physics-based controllers. Furthermore, deep learning is able to discover relationships between variables that may be difficult to derive analytically. This work relates environmental properties to the movement of the agent's afforded modalities, a relationship that is difficult to model analytically. Here, the slope and compliance act as latent variables that affect the walking movement of the bipedal character. By using a data-driven approach, we can discover this relationship with sufficient accuracy that there is no need to resort to complex analytical models.

We propose that by using the recent history of the states and other sensory information of a bipedal walker, we can find key patterns hidden in the state sequence that enable the bipedal character to accurately estimate ground properties of the terrain it is traversing, such as slope and compliance. We thus allow the model to record and decide which elements in its input stream it should pay close attention to. An accurate estimate might only be obtainable during a particular part of the walk cycle, and so the model needs to remember this event in order to provide good estimates at other points in time. Similarly, the best result may only be obtainable by integrating important pieces of information that are obtained at different points in the walk cycle. To accomplish this, we use a multi-layer LSTM network as our predictive model in order to learn temporal state dependencies that can be used for the prediction of ground properties. For this thesis, we require a walking controller; we use the well-known SIMBICON control strategy [Yin et al., 2007] with a sufficiently robust gait, along with our own contact model, in order to predict the slope and compliance of the terrain. This introduces the possibility of using high-fidelity predictive models to enhance control.

The terrain follows a fixed pattern of presentation, but each terrain segment consists of a randomized slope and ground compliance. We model ground compliance using our own contact model based on springs and dampers. Further details on the simulation are found in Chapter 4. Figure 1.2 displays snapshots of the simulation. Figure 1.1 displays some state features plotted against time along with the terrain labels that the model uses to estimate terrain properties; the purpose is to illustrate the complex relationship that the model must learn to capture.

Figure 1.1: Sample features plotted with respect to time along with their labels

1.2 Thesis Overview

This thesis explores the effectiveness and potential of using deep Long Short Term Memory (LSTM) networks to build an embodied perception of the environment. In Chapter 2, we review the related work that has inspired this work.
Chapter 3 provides the background needed to understand the implementation of the thesis; we examine in detail each of the components that this thesis builds upon. Chapter 4 describes the experimental methodology, including the simulation and the strategy used to obtain our results. Chapter 5 presents and discusses the results of our experiments and compares them to draw meaningful conclusions. Lastly, Chapter 6 concludes with limitations of the work and the possible future directions it can take.

Figure 1.2: Simulation snapshots in order from left to right, top to bottom. The character is estimating the terrain properties. Red circles represent the actual values, blue circles represent the predictions. All predictions are normalized with respect to the training set.

Chapter 2: Related Work

Using motion as a means of understanding the environment has its foundation in the cognitive sciences and extends into robotics and control. Furthermore, understanding the environment from time series data in a data-driven fashion is still an active research topic. In this chapter, we review the relevant work that shapes and defines our research question.

2.1 Embodied Perception

Embodied perception, also referred to as grounded perception, is the notion that perception and cognitive activity are rooted in sensorimotor experience, namely situated actions and bodily states [Barsalou, 2007]. In this view, perception is shaped by the brain capturing the states and experiences across the modalities afforded to it and integrating them into a multimodal representation of an event. Later, when the event is recalled, these representations are replayed as simulations that allow the brain to represent and reason across events. This type of approach shifts the paradigm of control strategies [Vernon, 2008]. Traditional strategies involve embedding symbols and intent into the agent and using those representations for learning and task completion; this is known as the cognitivist position. The emergent strategy instead allows agents to self-organize and represent their environment, given the modalities afforded to them, through agent-environment interaction over time. Recent advances in machine learning explore this paradigm by allowing representations of the data to emerge through repeated exposure and training, instead of relying on hand-engineered symbols that are idealized descriptions of human cognitive representations.

Figure 2.1: Simple contact sensor [Giguere and Dudek, 2009] used for surface identification

2.2 Robot Locomotion with Environmental Knowledge

Locomotion in robotics has long been a challenging and continual effort. Creating robust controllers for robots requires exploiting the degrees of freedom inherent in the robot in a manner that optimizes its traversal of a dynamic environment. Thus, the more knowledge the robot has of itself and the environment, the better it will perform. By decoupling system identification and control, each problem can be solved separately for more flexibility and robustness. System identification of the environment can be separated into terrain classification and terrain estimation.

2.2.1 Terrain Classification

A number of works have addressed terrain classification by leveraging machine learning techniques. A first approach is to determine the degree of simplification a sensing modality can have while still accomplishing terrain classification.
Giguere and Dudek [2009] use a very simple contact dynamic sensor for surface identification, as shown in Figure 2.1. The authors attach an accelerometer to a rigid rod that contacts the surface and is dragged along by the robot at a particular speed to collect readings. They use a supervised learning method to identify 10 surfaces, and explore an unsupervised technique for discriminating between 2 surfaces. In the supervised setting, the data is aggregated over non-overlapping time windows; features are then extracted and processed by a 2-layer dense network. In this way, time dependence is preserved, since sequential data is preprocessed before being presented to the network. The supervised setting achieved a score of 94.6%. In the unsupervised setting, the authors explore using a mixture of 2 Gaussian classifiers to classify the feature vector into 2 classes. The intuition is to discover the parameters that minimize the variability of classification across sequential data points. This setting was limited by the discriminatory power of the sensor across the 2 terrains and was not competitive with their supervised approach, but it also was not fully explored. Nevertheless, unsupervised approaches are promising for terrain classification, since they allow the agent to be trained online.

Figure 2.2: Legged robot [Walas, 2015] used for classifying terrains

Another supervised approach classifies 12 types of terrain using a combination of visual, depth, and compliance data with a Support Vector Machines (SVM) classifier [Walas, 2015]. The data is collected and the classifiers trained offline to find the best method for combining the separate classifiers from each sensing modality so as to maximize classification accuracy. The best results come from combining the classifiers from all three sensing modalities, with a precision of 94.44% and a recall of 95.15%, which is sufficient to perform control actions based on this feedback. The authors modify the gait by changing the PD gains of their internal controller: depending on the type of terrain, the optimal gait is selected to allow the robot to traverse the terrain most efficiently with regard to speed and distance. The authors saw improvements using the optimal gait adjusted for terrain. They do not exploit any time dependencies in this work. The robot is displayed in Figure 2.2.

Figure 2.3: Robot [Sandeep Manjanna, 2013] used for classifying terrains using gait

Finally, an approach similar to our work identifies the terrain using the gait itself. All sensory information is proprioceptive; the robot acquires no information about the surrounding environment [Sandeep Manjanna, 2013]. This work has ties to the emergent idea mentioned above of using movement and action to build perception. The authors found that classification of terrain was possible by analyzing the effect of different terrains on the gaits. The robot the authors experimented with is shown in Figure 2.3. It was equipped with sensors measuring leg rotations, accelerations, rotations, and magnetic orientation, as well as an estimate of electrical currents. The robot sampled these modalities at a rate of 20 Hz with the goal of classifying 4 types of terrain. The authors used an unsupervised clustering algorithm in which each batch represents a sequence of consecutive sample measurements, thereby embedding time-dependent information.
The parameters for the algorithm are discovered by attempting to make consistent classifications across consecutive temporal samples in the batch. The authors found that at particular settings of the robot, the data had more separability, making their classifier more effective. Their unsupervised approach achieved a success rate of 92.11%, which is competitive with previous supervised approaches. It should be noted that the authors inject knowledge of optimal data separation in order for their algorithm to be most effective, rather than allowing that behavior to emerge automatically.

2.2.2 Terrain Estimation

Because of the many types of terrain, it may be beneficial to estimate the properties of the terrain rather than attempting to classify it as one of N discrete types. One work uses direct sensory information from the foot to analytically compute the surface gradient underneath the foot [Yi et al., 2010]. The agent follows a unique walking strategy to collect noisy sensory measurements of the pose and displacement of the foot at each step. This information is used to maintain and update a local estimate of the surface normal, and the estimates are then used to adapt the gait. The learning is performed online for a small number of model parameters, which allows learning to be achieved rapidly. The authors compared using their method to enhance the locomotion of their robot and found it to be more stable and robust than a baseline locomotion strategy.

Another approach to terrain estimation replaces LIDAR information with stereo vision to generate footstep plans for a robot on uneven terrain [Marion et al., 2015]. Using stereo vision, the authors build a 3D height map, which allows them to segment the terrain into possible regions for the robot's footstep planning. They found the results comparable to using LIDAR to gather the height map; stereo vision can act as a direct replacement, with the benefit of smaller weight and power costs.

Figure 2.4: UP-OSI system diagram of [Wenhao Yu, 2017]

In an attempt to leverage simulation to address a wide array of environmental conditions, another work builds a system of neural networks consisting of a Universal Control Policy (UP) and an On-line System Identification Model (OSI) for terrain estimation [Wenhao Yu, 2017]. The UP outputs a control vector based on the current state and the dynamic model parameters; the OSI uses a history of previous states and control vectors to predict the dynamic model parameters. The complete system diagram is shown in Figure 2.4. The intuition is that if the dynamic model parameters are correctly predicted for the environment, then the UP should be able to output the correct control vector that responds appropriately. The authors train the system by sampling from a distribution of model parameters so that it is robust to many variations of those parameters. During training, they couple the OSI model with the UP in order for the interaction to be more representative of the combined system. This system can then be used in a variety of unknown environments by leveraging data that is easily attainable through simulation. The authors test their approach on 4 different low-dimensional testing environments.
They discovered that their approach was comparable to using the true model parameters in each scenario, and in some cases performed better.

2.3 System Identification and State Estimation

System identification is the task of accurately modeling the behavior of a system on the basis of observed input-output relationships [Ljung, 1999]. The general approach is to strategically sample the system, collect the outputs, determine the class of models that represents the relationship well, and adjust the parameters of the model to accurately fit the input-output relationship. There are three main types of system identification frameworks, known as white-box, black-box, and grey-box modeling; these differ in the amount of prior knowledge the modeler injects into the modeled representation of the system. System identification techniques are well studied, and various techniques exist depending on the demands of the system approximation [Sjöberg et al., 1995]. Our work is a data-driven, black-box approach that uses a parametric model to capture non-linear relationships.

State estimation seeks to continuously estimate the dynamics of a system through feedback and observed input-output relationships [Simon, 2006]. A common approach is the Kalman filter, an iterative process which combines the previous state distribution, with uncertainty modeled by a covariance matrix, with sensor readings that carry uncertainty of their own; a new state distribution and uncertainty are estimated, and the process repeats. Many versions of this filter exist to handle various conditions. State estimation with RNNs has been compared with Extended Kalman Filters (EKF) [N. Yadaiah, 2006]. The authors found that, in their scenario, the RNN-based state estimator was more accurate than the EKF implementation. They develop an architecture which cascades an RNN layer into a Feed Forward Neural Network (FFNN); the motivation is that the RNN layers represent the system state dynamics, while the FFNN transforms the dynamics into measurements. This is similar to our architecture, where we cascade RNN layers into FFNN layers for the final estimate; however, we use the modern LSTM architecture with higher input dimensions.

2.4 Prediction from Time Series

Prediction from time series data has been studied and applied extensively in many fields, including economics, biology, linguistics, and industrial settings [Debasish Sena, 2015; Socher et al., 2013; Yazdani, 2009; Ziv Bar-Joseph, 2012]. Using time series data allows for deeper analysis of the behavior of a system. A popular choice for time series prediction is the class of models known as RNNs [Connor et al., 1994; Gers et al., 2000; Prasad and Prasad, 2014]. These models enable accurate and robust data-driven approaches to prediction problems involving highly nonlinear, non-stationary, noisy data. As such, they are very promising in their capability to model complex systems. Applications similar to this thesis leverage RNNs for control or state estimation tasks [Alanis et al., 2011; HuH and Todorov, 2009]; by training the RNN to emulate specific behaviors, the authors were able to provide generalized solutions in their task domains. Another use of RNNs for control is to preserve sequential observation-action sequences in order to solve Partially Observed Markov Decision Processes (POMDPs) in a scalable way in the context of reinforcement learning [Heess et al., 2015].
Using this approach, the authors were able to hide state information such as velocity from the model, requiring it to infer that information in order to achieve tasks such as balancing an inverted pendulum and performing cart-pole swing-ups. Furthermore, time-dependent environmental changes could be introduced and handled gracefully by this model, while a non-temporal model such as a feed-forward neural network performed poorly.

Chapter 3: Background

This chapter reviews a number of the key components of the simulation and learning system that we develop in this thesis. Figure 3.1 displays a high-level overview of the entire system. Each component is discussed in detail, beginning with the walking controller, then the physics environment, and finally the learning model. At a high level, we apply a physics-based bipedal controller to the terrain generated inside our physics simulation to gather data. This data is then given as input to our estimation model, whose output is fed back to the physics simulation for real-time estimation of the environment.

Figure 3.1: System diagram. The SIMBICON walking controller generates data by being applied to the environment in the physics simulation. This data is then used to train a learning model, whose estimates can be displayed in real time.

3.1 Bipedal Walking Control Policy

Virtual bipedal characters can be animated using physics, leading to rich and realistic interactions with their environment that would be difficult to achieve using traditional animation methods that ignore physics. Accomplishing this requires generating actuation patterns that enable the physics-based character to accomplish its tasks. Three design approaches have been explored for generating such patterns: pose-driven feedback control, dynamics-based optimization control, and stimulus-response network control. In a pose-driven approach, target trajectories are proposed for the character, and the controller minimizes the distance between its current measured state and the target states. In a dynamics-based optimization approach, actuator torques are proposed which optimize a set of high-level objectives while obeying constraints. Compared to the previous approach, this method establishes a definitive causal effect of torques; however, it is computationally more expensive, since a system of equations must be solved at regular intervals, though at a lower rate than the physics time step of the simulation. Finally, stimulus-response network control seeks to map stimuli from the environment to actuations. These methods do not assume any a priori knowledge and seek to build the mapping through repeated simulation and evaluation of a fitness function. However, developing and tuning this fitness function to produce natural motion for humans and animals remains elusive and requires extensive knowledge from a variety of different fields.

To that end, we require an effective control approach for bipedal locomotion, robust enough to handle slight terrain perturbations. We settle on the well-known bipedal control policy SIMBICON, a pose-driven feedback control design [Yin et al., 2007]. This control strategy allows for the customization of a wide range of gaits and styles that are robust to small unexpected forces, terrain changes, and dynamics.
SIMBICON controls characters by defining targets for the joint angles, using Proportional Derivative (PD) feedback controllers to compute torques that minimize the difference between the target pose and the current pose.

The controller is based on a Finite State Machine (FSM) along with a feedback mechanism for determining the swing hip position. The FSM can be altered to produce gaits of different styles while remaining robust to small perturbations. Each state provides a set of target joint positions and velocities for each individual joint. The FSM transitions between states either when the time duration of the state has been reached or when a contact has been made with the surface, as shown in Figure 3.2. PD controllers compute joint torques according to

    τ = k_p (θ_d − θ) + k_d (θ̇_d − θ̇).

Here, θ_d is the desired orientation of the child link with respect to its parent link, θ is the current orientation, θ̇_d is the desired angular velocity, θ̇ is the current angular velocity, and k_p and k_d are gains for position and velocity respectively. Typically the desired angular velocity θ̇_d is set to 0. Furthermore, there is a feedback law specific to the swing hip that modifies the target orientation based on the distance from the stance foot to the Center Of Mass (COM) and its linear velocity, in order to keep the character stable. This feedback component is computed according to

    θ_d = θ_d0 + c_d d + c_v v,

where θ_d0 is the original desired orientation, d is the distance between the hip and the stance foot, v is the linear velocity of the hip, and c_d and c_v are gain parameters. A virtual torque is applied to the torso to keep it upright with respect to the world frame; it is realized through the hips so that it can be produced via internal torques.

Figure 3.2: Example finite state machine for SIMBICON. The state transitions that exit states 1 and 3 occur after a time delay, Δt. States 2 and 4 run until the corresponding foot has made contact. States 1 and 2, shown in green, are in right stance, while states 3 and 4, shown in orange, are in left stance.

For our purposes, we require a basic walking gait that is able to traverse variations in terrain. These variations take the form of ground slope and ground compliance. A walking gait robust to these perturbations was found through manual selection of the parameters, based on the original SIMBICON paper [Yin et al., 2007]. Further details are given in Chapter 4.
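As a concrete sketch of the two control laws above, the following Python fragment computes a joint torque and a swing-hip target for one control step. The function names are ours and the numeric example is hypothetical (the k_p and k_d values happen to match the hip gains in Table 4.2, and the 300 N·m clamp comes from Chapter 4); this is a minimal illustration, not the thesis implementation.

    # Sketch of SIMBICON-style PD control, following the equations above.

    def pd_torque(theta_d, theta, theta_dot, kp, kd, theta_dot_d=0.0):
        """tau = kp*(theta_d - theta) + kd*(theta_dot_d - theta_dot)"""
        return kp * (theta_d - theta) + kd * (theta_dot_d - theta_dot)

    def swing_hip_target(theta_d0, d, v, cd, cv):
        """theta_d = theta_d0 + cd*d + cv*v (stance-foot-to-COM feedback)"""
        return theta_d0 + cd * d + cv * v

    # One hypothetical control step for a hip joint (angles in radians),
    # with the torque clamped to the 300 N*m limit used in Chapter 4:
    tau = pd_torque(theta_d=1.2, theta=1.0, theta_dot=0.5, kp=800.0, kd=80.0)
    tau = max(-300.0, min(300.0, tau))

In a full controller, pd_torque would be evaluated for every joint at each control step, with swing_hip_target supplying the modified θ_d for the swing hip only.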
3.2 Ground Contact Model

We require a ground contact model that can provide the information needed to model the sensing of pressure at various predetermined points along the sole of the foot; these pressures are assumed to be available as part of the sensory stream. We model ground compliance using dynamically instanced springs and dampers, which apply restorative forces to specific points on a rigid body in order to keep it above the ground.

Figure 3.3 shows an example of the forces being applied to a set of vertices on the foot. Each restorative force is computed using a spring and damper according to

    F_r = k_p (C_p − P) − k_d Ṗ,

where C_p is the point of first contact with the ground, P is the current position of the vertex, Ṗ is the velocity of the vertex, and k_p, k_d are stiffness and damping constants. A closeup of the spring and damper model is shown in Figure 3.4. In order to avoid creating unrealistic forces that pull the foot down, we further filter the vertical reaction force according to F′_y = max(0, F_y).

Figure 3.3: The rigid body is drawn as a light green rectangle. Red circles represent the vertices of the rigid body used for collision detection and response. X's represent the locations of first contact with the ground. Green arrows represent the restorative force vectors applied to the vertices.

Figure 3.4: Spring and damper system used for the contact model.

We apply a friction cone to limit the allowable directions of the forces, according to the coefficient of friction defined by μ = tan θ = |F_t| / |F_n|, where F_t is the tangential force component and F_n is the normal force component, as shown in Figure 3.5. Once the restorative force is calculated for each vertex, we compare it with the allowable range specified by the friction coefficient. If the force exceeds the bounds of the friction cone, we clip the force to the outside edge of the cone and move the collision point to reflect this change, i.e., |F_t| = min(|F_t|, μ |F_n|). The forces felt on the feet are shown as the blue lines in Figure 1.2.

Figure 3.5: The rigid body is represented as a light green rectangle. The circle represents a tracked vertex of the rigid body. The X represents the collision point of the vertex with the surface. The green arrow represents the restorative force vector applied to the vertex.
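The sketch below illustrates the per-vertex contact force just described, assuming a 2D world with y pointing up. The stiffness k_p = 2000 N/m with k_d = 0.1 k_p falls inside the range sampled in Chapter 4; the friction coefficient is a made-up value, and the collision-point update on clipping is omitted for brevity. Names are ours, not the thesis code.

    import numpy as np

    # Per-vertex spring-damper contact force with vertical clamp and
    # friction-cone clipping (2D, y-up convention). Illustrative only.

    def contact_force(cp, p, p_dot, kp, kd, mu):
        """F_r = kp*(cp - p) - kd*p_dot, then clamp and clip."""
        f = kp * (cp - p) - kd * p_dot
        f[1] = max(0.0, f[1])            # never pull the foot downward
        ft_max = mu * abs(f[1])          # friction cone: |F_t| <= mu*|F_n|
        f[0] = np.clip(f[0], -ft_max, ft_max)
        return f

    # Example: a vertex 1 cm below its first-contact point, moving down.
    cp = np.array([0.0, 0.0])            # point of first contact, C_p
    p = np.array([0.002, -0.01])         # current vertex position, P
    p_dot = np.array([0.1, -0.05])       # vertex velocity
    f = contact_force(cp, p, p_dot, kp=2000.0, kd=200.0, mu=0.8)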
3.3 State Estimation and System Identification

The problem we seek to solve is related to the task of determining the state of a dynamical system, which can be generalized to also model the state of the environment. Accurate knowledge of the state of the system is crucial for stabilization through state feedback. In our case, we are interested in using a series of character state measurements observed over time as a means to better approximate the environmental state. Our approach is loosely related to what the Kalman filter seeks to accomplish with continuous states. The Kalman filter recursively predicts and updates its current estimates and covariances, with corrections from observed measurements at each step. The predict step produces the next estimate of the state as well as its uncertainty, modeled by the updated covariance. Different variations of the Kalman filter are used ubiquitously in the field of robotics for state estimation, depending on the type of system encountered; examples include estimating the joint friction of bipedal walkers, vehicle states, and robot positioning [Hashlamon and Erbatur, 2016; N. Houshangi, 2005; Reina et al., 2017].

3.4 Neural Networks

Neural networks are hierarchical models which learn useful representations of the data at each layer; each layer extracts its own purposeful and useful representations from its input. Neural networks are well known as flexible function approximators that can be trained end-to-end through supervised learning. Because of their success in accurate generalization, these models are applied in fields including linguistics, computer vision, graphics, control, and biology.

3.4.1 Feed Forward Networks

Figure 3.6: Example architecture of a dense feed-forward network.

Traditional neural networks are feed-forward neural networks, which have a fixed input size and are fully connected. Figure 3.6 displays an example of a classical feed-forward deep neural network; each neuron in one layer is connected to every neuron in the subsequent layer. Neurons, represented as circles, compute a weighted sum of their inputs, typically followed by a non-linear activation function, to produce an output. This is expressed as

    a_i^j = σ( Σ_k w_ik^j a_k^(j−1) + b_i^j ),        (3.1)

where a_i^j is the activation value of the i-th neuron in the j-th layer, σ is the activation function, w_ik^j is the weight that connects the k-th neuron in layer j−1 to the i-th neuron in layer j, and b_i^j is the bias of the i-th neuron in layer j. As more layers are added, the complexity of the model grows because the parameter space increases; as model complexity grows, the risk of overfitting also increases.

In order to train these models, we require a loss function which measures the deviation between the desired values and the values predicted by the network. The backpropagation algorithm trains the network by computing the gradients of the loss function with respect to each of the parameters, taking advantage of the chain rule of calculus. Writing z_i^j = Σ_k w_ik^j a_k^(j−1) + b_i^j for the weighted input to a neuron, four equations are used for backpropagation:

    δ^L = ∇_a C ⊙ σ′(z^L)                             (3.2)
    δ^j = ((W^(j+1))ᵀ δ^(j+1)) ⊙ σ′(z^j)              (3.3)
    ∂C/∂b_i^j = δ_i^j                                 (3.4)
    ∂C/∂w_ik^j = a_k^(j−1) δ_i^j                      (3.5)

Here ⊙ represents the Hadamard product. In all cases, C is the loss function, w_ik^j represents the weight connecting the k-th node in layer j−1 to the i-th node in layer j, a represents the activation values, and b represents the biases. In Equations 3.2 and 3.3, the gradient with respect to a node's weighted input is stored as δ, since all the parameters of the network require this value: Equation 3.2 computes the gradients with respect to the nodes in the last layer L, while Equation 3.3 computes the gradients with respect to the nodes in an arbitrary layer j. Equations 3.4 and 3.5 describe how to compute the gradients with respect to each parameter given the layer gradients. Finally, each parameter is updated via gradient descent according to

    θ′ = θ − α ∂C/∂θ,

where θ represents the parameter to be updated and α is the learning rate. The update shifts the parameter towards minimizing the cost function.

Overly complex models open the possibility of overfitting, the situation in which training error is minimal but generalization performance is poor. In order to combat this, we employ a regularization technique known as dropout, in which random node activations are removed during training with some probability. This technique can be interpreted as model averaging, where the network sees a high volume of different possible network architectures and averages their outputs at test time. Though not strictly necessary when copious training data is available, we employ this regularization technique to ensure that the network does not overfit.
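As a concrete illustration of Equations 3.1 and 3.3-3.5, the sketch below implements a single dense layer's forward pass and the corresponding gradients for one training example, using tanh as the activation. It is a minimal NumPy illustration under our own naming, not the thesis code.

    import numpy as np

    # One dense layer forward/backward pass, following Equations 3.1-3.5.

    def forward(W, b, a_prev):
        z = W @ a_prev + b                 # weighted input z^j (Eq. 3.1)
        return np.tanh(z), z               # activation a^j = sigma(z^j)

    def backward(W_next, delta_next, z, a_prev):
        # Eq. 3.3: delta^j = ((W^{j+1})^T delta^{j+1}) (.) sigma'(z^j)
        delta = (W_next.T @ delta_next) * (1.0 - np.tanh(z) ** 2)
        grad_b = delta                     # Eq. 3.4
        grad_W = np.outer(delta, a_prev)   # Eq. 3.5
        return delta, grad_W, grad_b

    def sgd_step(theta, grad, alpha=1e-3):
        """Gradient descent update: theta' = theta - alpha * dC/dtheta."""
        return theta - alpha * grad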
3.4.2 Recurrent Neural Networks (RNNs)

Figure 3.7: Example architecture of an unrolled dense Recurrent Neural Network. Inputs are combined with the previous hidden state and the result is linearly combined before being passed through an activation function. The h_i represent neurons with weighted inputs followed by the activation function.

RNNs differ from traditional feed-forward neural networks in their ability to store and propagate internal memory across input sequences of arbitrary length. These networks specialize in retaining the time dependencies hidden in the input sequence.

3.4.3 Vanilla RNN

Figure 3.7 shows the basic architecture of an unrolled vanilla RNN. These networks operate sequentially on one input at a time, using the same parameters at every step, and compute a hidden state at each step. This makes learning much faster compared to traditional networks, because the number of parameters is much smaller than in a feed-forward network that receives the entire temporal sequence of data at once. The hidden values are passed forward in time and combined with the next input. The final output of the network can be viewed as another layer which uses the hidden state as the input to an activation function. Formally, the hidden state is computed as

    h_t = σ(U x_t + W h_(t−1) + b),        (3.6)

where h_t is the hidden activation at time t, U is a matrix of weights that multiplies the input, W is a matrix of weights that multiplies the previous hidden state, b is the bias, and σ is an activation function. A common activation function is tanh, which compresses the output to the range [−1, 1]. At the beginning of each sequence, the hidden state can be reset to an initial value.

Figure 3.8: The function φ transforms the hidden state into an output. This operation can be performed at any stage in the sequence.

The power and flexibility of these networks lies in leveraging the hidden states to produce meaningful outputs. Figure 3.8 shows an example of how to use the hidden state to compute an output. Using this capability, the RNN is able to provide many forms of sequential output. Figure 3.9 shows the various forms that RNN outputs can take: (a) one-to-one, (b) one-to-many, (c) many-to-one, and (d, e) many-to-many. The output form chosen depends on the task the network is trying to accomplish.

Figure 3.9: Various RNN output forms: (a) one-to-one, (b) one-to-many, (c) many-to-one, (d, e) many-to-many.

There are several methods for training an RNN. By far the most popular is the Back Propagation Through Time (BPTT) algorithm, which is what we use to train our network. BPTT is essentially standard backpropagation applied to the unrolled RNN; the difference is that the same parameters are used at every step, and the gradient contributions from each time step are summed together. As with deep feed-forward networks, gradients may vanish or explode when propagated through a long sequence, since repeated multiplications produce ever smaller or larger values depending on the magnitudes of the weights and the types of activation functions [Pascanu et al., 2012]. These gradients pose a problem: exploding gradients cause instabilities in training, and vanishing gradients make learning long-term dependencies difficult. There are numerous techniques for dealing with exploding gradients, such as time step truncation and gradient clipping, but vanishing gradients are much more difficult to remedy. In order to solve this problem, a different recurrent architecture was conceived, known as the LSTM [Lipton, 2015].
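Before turning to the LSTM, the sketch below shows Equation 3.6 as code: a single recurrent step applied across a window of inputs, with the same parameters reused at every step. Names are ours; the dimensions match this thesis's setting of 45-dimensional state vectors, 128 hidden units, and windows of 30 steps.

    import numpy as np

    # Vanilla RNN forward pass over a sequence (Equation 3.6):
    # h_t = tanh(U x_t + W h_{t-1} + b), same U, W, b at every step.

    def rnn_forward(xs, U, W, b, h0=None):
        h = np.zeros(W.shape[0]) if h0 is None else h0
        hs = []
        for x in xs:                   # one input vector per time step
            h = np.tanh(U @ x + W @ h + b)
            hs.append(h)               # keep every state (many-to-many)
        return hs                      # hs[-1] alone gives many-to-one

    d, k, T = 45, 128, 30              # input dim, hidden units, window
    rng = np.random.default_rng(0)
    hs = rnn_forward(rng.normal(size=(T, d)),
                     U=rng.normal(scale=0.1, size=(k, d)),
                     W=rng.normal(scale=0.1, size=(k, k)),
                     b=np.zeros(k))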
3.4.4 LSTM (Long Short Term Memory) Cells

LSTM networks have the same capabilities as the vanilla RNN; however, they solve the vanishing gradient problem by introducing a memory cell state, together with input and output gates that dictate when the cell state is written to and read from [Hochreiter and Schmidhuber, 1997].

Figure 3.10: LSTM cell proposed by Hochreiter. This architecture introduces the input and output gates, which regulate the cell's hidden state.

Figure 3.10 shows the architecture of the LSTM cell proposed by Hochreiter and Schmidhuber in 1997 [Hochreiter and Schmidhuber, 1997]. In this figure, the input and the previous hidden state are linearly combined and fed into the input node, the input gate, and the output gate; the gates apply non-linear activation functions to their inputs. In order to reach the cell state, the input node is multiplied element-wise by the input gate, which restricts the information that is allowed to modify the cell state. The cell state persists across many iterations by way of its own self-recurrent connection. The equations governing the LSTM cell, where each gate and node has its own weights W, U and bias b, are:

    I_t = σ(W_I x_t + U_I h_(t−1) + b_I)        (3.7)
    N_t = σ(W_N x_t + U_N h_(t−1) + b_N)        (3.8)
    C_t = N_t ⊙ I_t + C_(t−1)                   (3.9)
    O_t = σ(W_O x_t + U_O h_(t−1) + b_O)        (3.10)
    h_t = O_t ⊙ C_t                             (3.11)

Here N_t is the input node, I_t is the input gate, C_t is the cell state, O_t is the output gate, and h_t is the hidden state that is propagated forward. σ represents a non-linear function and is usually the tanh function for the input node. Having the self-recurrent connection on C_t allows the error to be propagated back over long time spans: because the cell state is combined linearly, with a connection of constant weight spanning adjacent time steps, errors can be backpropagated without vanishing or exploding. This is known as the Constant Error Carousel (CEC). More specifically, gradients from the upper levels of computation can flow directly down to the lower levels of the cell. The input and output gates enable the LSTM cell to selectively update or read from its cell state.

Since the original formulation in 1997, another gate has been added, known as the forget gate [Gers et al., 1999]. The motivation for this addition is to give the LSTM cell the ability to flush its cell state and discard irrelevant information, thereby learning to forget. This gate modulates the previous cell state when producing the new cell state. The modern LSTM cell is shown in Figure 3.11.

Figure 3.11: Modern LSTM cell with the forget gate proposed by [Gers et al., 1999]. The forget gate allows the LSTM cell to flush out irrelevant contents from the cell state while maintaining the constant error carousel.

The forget gate and the new cell state equations are:

    F_t = σ(W_F x_t + U_F h_(t−1) + b_F)        (3.12)
    C_t = N_t ⊙ I_t + F_t ⊙ C_(t−1)             (3.13)

As with the vanilla RNN, there can be multiple hidden units making up the hidden state. If the dimension of h_t is k and the dimensionality of the input vector is d, then the concatenated input to each gate has dimension d + k. Each gate and node receives its own copy of the inputs, which are densely connected to each hidden unit; thus there are approximately 4k(d + k) parameters per LSTM layer. The LSTM network is likewise trained using the BPTT algorithm.
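The sketch below implements one step of the modern LSTM cell (Equations 3.7-3.13), with sigmoid gates and a tanh input node; the update h = o * c follows Equation 3.11 as written in this thesis. Names are ours, and the parameter-count check uses this thesis's dimensions of d = 45 inputs and k = 128 hidden units, for which 4k(d + k) gives roughly 88,600 parameters in the first layer.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One step of a modern LSTM cell (Equations 3.7-3.13). Each gate g has
    # input weights W_g, recurrent weights U_g, and bias b_g.

    def lstm_step(x, h_prev, c_prev, p):
        i = sigmoid(p["W_I"] @ x + p["U_I"] @ h_prev + p["b_I"])  # input gate
        f = sigmoid(p["W_F"] @ x + p["U_F"] @ h_prev + p["b_F"])  # forget gate
        o = sigmoid(p["W_O"] @ x + p["U_O"] @ h_prev + p["b_O"])  # output gate
        n = np.tanh(p["W_N"] @ x + p["U_N"] @ h_prev + p["b_N"])  # input node
        c = n * i + f * c_prev       # Eq. 3.13: gated write plus gated carry
        h = o * c                    # Eq. 3.11 as written in this thesis
        return h, c

    d, k = 45, 128
    rng = np.random.default_rng(0)
    p = {f"{m}_{g}": rng.normal(scale=0.1, size=(k, d if m == "W" else k))
         for g in "IFON" for m in ("W", "U")}
    p.update({f"b_{g}": np.zeros(k) for g in "IFON"})
    h, c = lstm_step(rng.normal(size=d), np.zeros(k), np.zeros(k), p)

    n_params = 4 * (k * (d + k) + k)   # 4k(d+k) weights + 4k biases = 89,088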
Chapter 4: Methodology

In this chapter we discuss how the physics simulation is set up, the architecture of the LSTM network, and the training process of the LSTM network.

4.1 Physics Simulation

We use the Bullet physics engine to develop our simulation, with a time step of 0.0005 s in order to achieve stable simulation while maintaining real-time performance.

4.1.1 Biped Simulation

To drive the biped, we compute torques using PD controllers along with a feedback component for calculating the swing hip angle. These torques are applied before each physics step. The body parameters of the 2D biped are shown in Table 4.1, the joint PD gains in Table 4.2, and the finite state machine parameters in Table 4.3. The segments are connected using revolute joints with joint limits. There is also a maximum allowable torque of 300 N·m per application.

Table 4.1: Body parameters

    Body    Mass (kg)    Length (m)
    Torso   70           0.48
    URL     5            0.45
    ULL     5            0.45
    LRL     4            0.45
    LLL     4            0.45
    RF      1            0.25
    LF      1            0.25

Table 4.2: Joint PD gains

    Body    kp      kd
    Torso   4200    420
    URL     800     80
    ULL     800     80
    LRL     700     70
    LLL     700     70
    RF      80      8
    LF      80      8

Table 4.3: Finite state machine parameters; WF denotes with respect to the world frame

    Parameter       States 1, 3    States 2, 4
    Δt              0.31 s         contact
    Cd              10             10
    Cv              4              4
    Torso (WF)      1.5°           1.5°
    Swing hip       46.0°          -7.0°
    Swing knee      68°            17°
    Swing ankle     -9°            -5°
    Stance knee     8°             15.7°
    Stance ankle    -7°            -3.0°

As previously noted, the ground compliance model that we implement is based on spring-and-damper restorative forces. The compliance model consists of a stiffness gain k_p and a damping gain k_d, with k_d = 0.1 k_p. The stiffness is sampled uniformly from the range k_p ∈ [1000, 3000] N/m. We apply this spring system to 22 vertices located along the bottom edge of each foot of the biped; we found that using more vertices along the feet made the gait more robust. Figure 1.2 shows, in blue, the force vectors that act upon the vertices along the bottom of the foot.

4.1.2 Terrain Generation

The terrain that the biped traverses consists of repeated instances of 3-segment chunks, each segment 4 meters in length. The first segment uses a slope m_1 ∈ [−5°, 5°]. The second segment is flat, m_2 = 0°. The last segment's slope is given by m_3 = −m_1. The purpose of the flat segments, i.e., m_2 = 0, is to give the biped the opportunity to reset its gait. Figure 4.1 displays an example of the terrain pattern that the biped traverses.

Figure 4.1: The slope pattern that the bipedal character traverses.
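A sketch of the terrain sampling scheme just described: each 3-segment chunk pairs a random slope with its negation around a flat middle segment. We assume here that every segment also draws its own ground stiffness from the compliance range of Section 4.1.1; that pairing, and all names, are ours.

    import random

    # 3-segment terrain chunks of Section 4.1.2. Each segment is 4 m long;
    # slopes are in degrees and stiffness k_p in N/m, per the text above.

    def sample_chunk(rng=random):
        m1 = rng.uniform(-5.0, 5.0)      # first segment slope
        slopes = [m1, 0.0, -m1]          # flat middle lets the gait reset
        # Assumed: each segment gets its own compliance (k_d = 0.1 * k_p).
        return [(s, rng.uniform(1000.0, 3000.0)) for s in slopes]

    def terrain(n_chunks):
        segments = []
        for _ in range(n_chunks):
            segments.extend(sample_chunk())
        return segments                  # list of (slope_deg, k_p) per 4 m

    print(terrain(5)[:3])                # 5 chunks -> 15 segments, 60 m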
4.1.2 Terrain Generation

The terrain that the biped traverses consists of repeated instances of 3-segment chunks. Each segment is 4 meters in length. The first segment uses a slope m1 ∈ [−5°, 5°]. The second segment receives a slope m2 = 0°. The last segment's slope is given by m3 = −m1. The purpose of the flat terrain segments, i.e., m2 = 0°, is to give the biped the opportunity to reset its gait. Figure 4.1 displays an example of the terrain pattern that the biped traverses.

Figure 4.1: The slope pattern that the bipedal character traverses.
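A sketch of this terrain pattern as a generator is given below. The function and variable names are illustrative, and it assumes the 4 m segment length is measured along the slope and that the ground stiffness is re-sampled per segment:

```python
import math
import random

SEGMENT_LENGTH = 4.0  # meters, measured along the slope (assumption)

def generate_chunk(start_x, start_y):
    """Generate one 3-segment chunk: slope m1, flat, then -m1.

    Returns each segment's (start, end) points together with its labels:
    slope in degrees and ground stiffness kp in N/m.
    """
    m1 = random.uniform(-5.0, 5.0)           # first slope, in degrees
    segments = []
    x, y = start_x, start_y
    for slope_deg in (m1, 0.0, -m1):         # the repeating slope pattern
        kp = random.uniform(1000.0, 3000.0)  # per-segment ground stiffness
        x_end = x + SEGMENT_LENGTH * math.cos(math.radians(slope_deg))
        y_end = y + SEGMENT_LENGTH * math.sin(math.radians(slope_deg))
        segments.append(((x, y), (x_end, y_end), slope_deg, kp))
        x, y = x_end, y_end
    return segments
```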
4.2 Network Architecture

For our experiments, we employ a 2-layer LSTM network followed by a dense feed-forward network with a Rectified Linear Unit (RELU) before the final output. This architecture was selected by varying hyper-parameters and monitoring the validation loss. Deeper LSTM layers are helpful for more complex relationships. More parameters allow for the modeling of more complex relationships, but leave open the possibility of overfitting. Finding an architecture which generalizes well depends on the complexity of the problem, and the number of parameters should scale accordingly. The dense feed-forward network at the end of the model transforms the hidden state into the estimate. Intuitively, the first LSTM layer transforms the input sequence into a hidden sequence that the model learns to understand well, while the second LSTM layer takes this hidden sequence and learns to recognize important data representations, which are then decoded by the final dense layers. Figure 4.2 displays the architecture that we use in our experiments.

Figure 4.2: The standard multilayer LSTM architecture

Our framework uses the many-to-many output form of the RNN, as shown in Figure 3.9d, for training, in order to take advantage of the corresponding labels at every time step in the sequence. When testing, we use the many-to-one output form, as shown in Figure 3.9c, which predicts the terrain properties after the last input of the sequence has been consumed. The outputs of the model are linear units because our problem resembles a regression.
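A minimal sketch of this architecture is given below, written against the Keras API for illustration (the exact framework used here is not specified in the text). With a 45-dimensional input, 128 hidden units per layer, and a single linear output head, this construction reproduces the parameter count of 237,313 reported for the separate models in Section 5.3; the shared model simply adds a second time-distributed output head:

```python
from tensorflow.keras import layers, models, optimizers

WINDOW, STATE_DIM, HIDDEN = 30, 45, 128

# Two stacked LSTM layers; return_sequences=True yields the many-to-many
# form used for training, with a prediction (and hence a loss term) at
# every time step of the window.
inputs = layers.Input(shape=(WINDOW, STATE_DIM))
h = layers.LSTM(HIDDEN, return_sequences=True)(inputs)
h = layers.LSTM(HIDDEN, return_sequences=True)(h)
h = layers.TimeDistributed(layers.Dense(HIDDEN, activation='relu'))(h)
out = layers.TimeDistributed(layers.Dense(1, activation='linear'))(h)

model = models.Model(inputs, out)
model.compile(loss='mse', optimizer=optimizers.RMSprop())

# At test time only the final time step's output is kept (many-to-one):
# prediction = model.predict(window_batch)[:, -1, :]
```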
4.3 Training Process

The training process consists of data collection and offline training of the model. Data collection is the process of consolidating the simulation results, which include the biped character state as well as the terrain labels. Our training process consists of sampling batches of data, providing the corresponding labels, and updating the model.

4.3.1 Data Collection

We collect consecutive input vectors along with their associated labels. The input vector includes features representing the torso's linear velocity, each body's distance to the root (hip joint), orientation, and angular velocity, as well as an averaged window of foot pressure forces. In total, our input vector has 45 dimensions. We gather the labels for the terrain based on the position of the hip joint with respect to the ground. That is, the label for a state is determined by which ground segment the hip position is currently over. Thus, there are cases where the swing leg is ahead of the hip and contacts the next terrain segment first while the hip is still over the current terrain segment. These situations give the model a notion of anticipation, where it observes a change in terrain prior to its complete arrival. Along with our proposed network model and state input, we also experiment with different window lengths and subsets of the state features.

We found that sampling at greater frequencies allowed for more accurate results. For our simulation we sample at approximately 30 Hz. Each sample consists of a state vector comprising the biped state features described above, together with the foot pressure forces computed by the compliance model, i.e., the restorative force vectors acting upon the vertices along the bottom edge of each foot. We run the simulation on an Intel Core i5 CPU at 2.66 GHz.

4.3.2 Training

Once we have collected the data, we begin offline training of the model. During preprocessing, we normalize the inputs and the outputs to have zero mean and unit variance. We then employ an overlapping windowing strategy to present sequences to the network. Each window includes 30 consecutive states along with the averaged foot pressure forces, which corresponds to roughly a 1-second stream of data given to the model for estimation. Because the data does not fit into RAM, we select batches of sequences uniformly during training. Our network uses the mean squared error loss along with the RMSProp optimizer [Tieleman and Hinton, 2012]. Furthermore, it should be noted that for the final test predictions, we normalize the test values with respect to the training values, using the mean and variance from the training data.
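A sketch of this preprocessing is given below; the stride of one sample between overlapping windows is an assumption, as the exact overlap is not specified above:

```python
import numpy as np

def make_windows(states, labels, window=30, stride=1):
    """Slice a long trajectory into overlapping training sequences.

    states: (T, D) array of state vectors sampled at ~30 Hz
    labels: (T, 2) array of per-sample terrain labels (slope, compliance)
    """
    xs, ys = [], []
    for start in range(0, len(states) - window + 1, stride):
        xs.append(states[start:start + window])
        ys.append(labels[start:start + window])  # one label per time step
    return np.stack(xs), np.stack(ys)

# Toy stand-in for a collected trajectory (45-D states, 2 labels per step).
states = np.random.randn(1000, 45)
labels = np.random.randn(1000, 2)

# Normalization statistics come from the training data only; the same
# mean and variance are reused to normalize validation and test values.
mu, sigma = states.mean(axis=0), states.std(axis=0)
X, Y = make_windows((states - mu) / sigma, labels, window=30)
print(X.shape)  # (971, 30, 45)
```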
Chapter 5

Experiments

This chapter describes the experiments that we perform. Each experiment tests different architectures and input state configurations. In the following sections, we describe our training framework and the evaluation measures we use to gauge performance.

5.1 Parameter Settings

For all of our experiments, we use a training, validation, and test set framework; the model updates itself on the training set while measuring its performance on the validation set after each epoch. We stop training if the validation error does not improve within a patience parameter (the number of consecutive epochs in which the validation loss fails to decrease) of 10, or if the maximum number of epochs, 40, is reached. Training is performed using batches of 32 samples, drawn 530 times, as sufficient performance was reached with these settings. Validation losses are aggregated and averaged over every sample in a sequence, since we predict on each sample. We then use the model with the smallest validation loss for testing. The final test score is realized by predicting on the test set.
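A minimal sketch of this early-stopping loop is given below; train_one_epoch and validation_loss are hypothetical stand-ins for the actual training and evaluation routines, and a Keras-style model exposing get_weights/set_weights is assumed:

```python
def train_with_early_stopping(model, max_epochs=40, patience=10):
    """Stop when the validation loss hasn't improved for `patience` epochs."""
    best_loss, best_weights, stale = float('inf'), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)         # e.g., 530 batches of 32 samples
        loss = validation_loss(model)  # averaged over every sample in a sequence
        if loss < best_loss:
            best_loss, best_weights, stale = loss, model.get_weights(), 0
        else:
            stale += 1
            if stale >= patience:
                break                  # patience exhausted
    model.set_weights(best_weights)    # keep the best model for testing
    return model
```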
Experiments are each run 3 times in order to gain an understanding of the distribution of the final models. Each experiment contains a validation loss graph, a sample test error graph, and a histogram of errors. In the validation loss graphs, the arrows represent the minimum validation losses for each trial. In the sample test error graphs, the blue circles represent the model estimates while the red circles represent the actual values; the black lines show the distance between the actual values and the estimates.

Figure 5.1: The experiments that were carried out. Each experiment branches off from the shared network architecture to measure its effect on accuracy.

5.2 Experiments

We begin with the shared network model, where the outputs share the same LSTM layers before splitting into separate outputs. Each experiment follows from this base case of using shared LSTM layers to measure its effect on estimation accuracy. We also try separate experiments in which each experiment consists of a dedicated model that estimates an individual terrain parameter. By doing this, we determine whether each estimate benefits from having dedicated parameters. Next, we determine whether having fewer hidden units affects the estimation accuracy. Finally, we gauge the effect of different input state configurations. Specifically, we measure the effect of including only the bipedal character state, and the effect of including only the foot pressure forces, to estimate terrain properties. Figure 5.1 graphically displays the conducted experiments. To get an idea of the scale of the errors, we note that 0.1σs = 0.288° and 0.1σc = 57.192 N/m, where σs and σc stand for the slope and compliance standard deviations respectively; that is, an error of 0.1 in normalized units corresponds to 0.288° of slope or 57.192 N/m of stiffness.

5.2.1 Shared Network

We introduce the shared LSTM network with separate time-distributed fully connected layers as separate outputs for slope and ground compliance, shown in Figure 5.2. Each LSTM layer consists of 128 hidden units. Each fully connected layer consists of 128 units. There are 2 linear outputs from the model, one for slope and one for compliance. The total number of trainable parameters is 253,954. We use a fixed window size of 30 and predict at every step in the sequence. Figure 5.3a displays the validation losses per trial. For comparison, the validation error obtained by predicting the average label value for both slope and compliance is 1.892. This value gives an idea of the improvement that this model offers over just using the average label. The lowest achieved validation loss is 0.0481. Using the trained model with the lowest validation loss, we run the model on a test set of 2000 samples to get an idea of the estimation accuracy, as shown in Figure 5.3b. Here, the blue circles represent the predictions made by the model, the red circles are the actual labels, and the black lines represent the distance of each prediction to the actual label. We see that the predictions track the labels quite closely. There are leading and trailing edges during estimation because the correct label corresponds to the terrain segment that the hip is currently over; discrepancies arise because the swing foot is ahead of the hip and anticipates terrain changes. Finally, to get an understanding of the distribution of errors, we plot a histogram of errors, shown in Figure 5.3c. In both histogram plots, the errors are closely centered around 0, which suggests high fidelity to the actual labels. Note that the test losses for slope are smaller than those for compliance for two main reasons. First, the slope generation follows a specified terrain pattern while the compliance generation does not, which means that a terrain slope of 0° occurs more frequently than other slopes, giving the model the opportunity to learn its representation well and minimize errors. Second, the range of compliance values is much greater than that of slope, which may require more sampling in order for the model to build an accurate representation of the data. Furthermore, to explain the multimodal distribution of the slope errors, we note that there are much larger errors along the edges of the estimations than between the edges, which gives rise to the small sharp peaks in the error histogram.

Figure 5.2: The shared LSTM architecture for predicting both slope ŝ and compliance ĉ.

5.3 Separate Networks

The model that we use for separate slope and compliance predictions is shown in Figure 4.2. Each model is dedicated to estimating one of the terrain properties. Each LSTM layer has 128 hidden units. The dense layer consists of 128 units as well. The total number of trainable parameters is 237,313. We begin with the dedicated compliance model, followed by a dedicated slope model. Table 5.1 displays the validation losses across all trials. The validation losses of the separate models are summed for comparison with the validation losses of the model with shared parameters. It is interesting to note that the lowest validation loss for the shared model is lower than the combined lowest validation losses of the separate models for slope and compliance.

5.3.1 Compliance

Figure 5.4 displays the results of using a separate network for compliance estimation. In each trial, the network is trained from a different random parameter initialization. The comparison validation loss, obtained by using the average compliance value as the estimate, is 3.4846. The lowest achieved validation loss is 0.031. We then use the model with the lowest validation loss as our final test prediction model. Each test sample represents a sequence of inputs, and we take the last predicted output as our final prediction to compare with the ground truth. Note the anticipation of the next compliance values shown by the large compliance errors towards the edges; the model tries to predict the next values while the hip is still over the current terrain segment. Once the foot pressures of the next segment are felt, the model anticipates the change in terrain and adjusts its estimate. The histogram of errors shows a concentration of errors centered around 0, implying accurate prediction. Comparing the test histogram of the dedicated compliance model with that of the shared model, we see that the test errors of the dedicated compliance model are actually less accurate than those of the shared model. The dedicated model has a larger mean and standard deviation, implying that there may be information to be gained from estimating slope as well.

5.3.2 Slope

We perform the same experiments with slope as we did with compliance. Figure 5.5a displays the validation losses across 3 different parameter initializations. The comparison validation loss using the average slope value is 0.2986. The lowest validation loss is 0.015. Once we have established the model with the minimum validation loss, we use that model on our test data set. The test estimations are shown in Figure 5.5b. As mentioned previously, since the model sees more examples of 0° slope, it learns that representation very well, keeping the variance of the errors minimal during those stretches. To get a better understanding of the distribution of the errors, we plot the error histogram shown in Figure 5.5c.

Test         Trial 1   Trial 2   Trial 3
Slope        0.0169    0.0177    0.0159
Compliance   0.0314    0.0329    0.0320
Shared       0.0453    0.0482    0.0509

Table 5.1: Validation losses of the separate and shared models. The lowest validation losses per trial are indicated by their respective arrows.

Comparing this histogram of errors with the histogram of errors from the shared model, we see that the distributions look about the same, with minor improvements in mean and spread. Thus, we conclude that there is not much improvement in using a dedicated model for slope estimation over using the shared model.

5.4 Decreased Parameter Space

We now repeat our methodology using half the number of units in the shared architecture. Each LSTM layer consists of 64 units. Each fully connected layer consists of 64 units as well. The total number of trainable parameters becomes 69,634.

Figure 5.6a displays the validation losses across 3 separate trials from different parameter initializations. Comparing these validation losses with those of the shared model with more parameters, we see that the larger shared model achieves a lower validation loss consistently across all trials. The lowest validation loss when using half the units is 0.053. Test predictions are shown in Figure 5.6b. Comparing sample test predictions, we see that the predictions made by this model have slightly more variance. We verify this by comparing the histogram plots of test errors generated by this model and the previous model. The test histogram is shown in Figure 5.6c. Comparing the slope error distributions, this model has a mean farther from 0 as well as a larger deviation. Comparing the compliance errors, we see that the mean is slightly farther from 0 and the spread is slightly larger for this model. We conclude that the size of the parameter space affects the estimation quality of the model, but more experiments are needed to gauge the relationship between the number of parameters, parameter selection, and estimation quality.

5.5 Varying Window Sizes

In this experiment we vary the window sizes of the input data to gauge the effect on the prediction quality of the shared architecture. Our initial experiments with the shared architecture were run with a window of size 30. In this section, we display the results of changing the window to sizes of 15, 7, and 3. We run each experiment for 20 epochs instead of 40 because the validation improvements after 20 epochs are minimal.

Experimental results using a window of 15 are displayed in Figure 5.7. Experimental results using a window of 7 are displayed in Figure 5.8. Experimental results using a window of 3 are shown in Figure 5.9. Comparing these results, we observe a general loss of predictive quality as the window size decreases. The validation losses grow larger as the window size decreases. From the sample test examples, we see that the variance of the estimates increases as the window size decreases. This is reflected in the spread of the test errors, which also increases for both the slope and compliance estimations as the window size decreases. This leads us to conclude that using a larger window size indeed provides greater accuracy for estimation; there is relevant information contained in the sequence history that requires a larger window to track.

5.6 Removal of Features

The following experiments gauge the effect that the inputs have on estimation accuracy. In all cases, the model using the entire feature set had higher estimation accuracy and precision. We begin by removing the foot pressure forces from the inputs, thereby using only the bipedal character state features to estimate terrain properties. Afterwards, we remove the bipedal character state features and use only the foot pressure forces to estimate terrain properties. We then compare the results of these experiments to draw a conclusion on how each feature set affects estimation.

Results of removing the foot pressure forces and using only the state features are shown in Figure 5.10. Results of removing the state features and using only the foot pressure forces are shown in Figure 5.11. Comparing the validation losses, we see that in all cases the validation loss after one epoch improved more sharply when using just the foot pressure forces than when using just the state information. This may be due to the homogeneity of the foot pressure force data. Regardless, the validation losses at the end of training appear similar. By comparing the test predictions, we begin to see some interesting behavior. Using only the bipedal character state features, the test predictions of slope fluctuate less and are more accurate than those obtained using only the foot pressure forces. Conversely, the test predictions for compliance using only the foot pressure forces oscillate less than those using only the state information. This hints at the notion that the bipedal character state captures slope more accurately than the foot pressure forces do, while the foot pressure forces capture compliance more accurately than the bipedal state features do. We verify this by comparing the test histograms. Comparing the error histograms for slope from the respective figures, we see that using the state features results in a lower spread for slope with higher accuracy. Comparing the error histograms for compliance, we see that the spread of errors from the compliance estimation using the bipedal state features is slightly larger than that using the foot pressure forces. This suggests that the model learns to be sensitive to the particular features that capture the desired variable most accurately.

5.7 Discussion

This section provides a brief discussion of the results. We show that a shared architecture gives a sufficient model for our estimation purposes; this model scales the number of parameters with the complexity of the estimation task appropriately. In general, larger window sizes provide better estimates. From our experiments, we believe that the slope information is contained in the bipedal state of the character, while the foot pressure forces carry the compliance information. Having a memory of the time-series inputs, represented by the cell state of the LSTM cell, allows the model to pick up on important patterns hidden in the input sequence. We suspect that by using the sequence of bipedal states, the model is able to pick up on the differences in state values and map those differences to slopes. To make this more obvious, we conducted an experiment (not shown) in which we provided a sequence of state differences as the time-series input to make this information more explicit, and we verified similar performance quality.
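A sketch of the preprocessing for that variant is given below; the first-order difference is our illustrative reading of "a sequence of state differences":

```python
import numpy as np

def state_differences(states):
    """Convert a (T, D) state sequence into consecutive differences.

    The first-order difference makes inter-step changes, which we suspect
    encode slope, explicit rather than leaving them implicit in the states.
    """
    return np.diff(states, axis=0)  # shape (T - 1, D)
```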
Figure 5.3: Results for slope and compliance prediction using the shared architecture. (a) Validation losses; the arrows point to the respective minimum losses of each trial. (b) Sample test predictions for each output. (c) Histograms of test prediction errors: µs = 0.006, σs = 0.084, µc = −0.001, σc = 0.113.

Figure 5.4: Results for compliance prediction using a separate model. (a) Validation losses. (b) Sample test predictions. (c) Histogram of test prediction errors: µ = 0.027, σ = 0.118.

Figure 5.5: Results for slope prediction using a separate model. (a) Validation losses. (b) Sample test predictions. (c) Histogram of test prediction errors: µ = −0.002, σ = 0.072.

Figure 5.6: Results for using half of the units. (a) Validation losses. (b) Sample test predictions. (c) Histograms of test prediction errors: µs = 0.008, σs = 0.094, µc = −0.020, σc = 0.124.

Figure 5.7: Results for a window size of 15. (a) Validation losses. (b) Sample test predictions. (c) Histograms of test prediction errors: µs = 0.001, σs = 0.144, µc = −0.009, σc = 0.235.

Figure 5.8: Results for a window size of 7. (a) Validation losses. (b) Sample test predictions. (c) Histograms of test prediction errors: µs = 0.016, σs = 0.151, µc = 0.025, σc = 0.242.

Figure 5.9: Results for a window size of 3. (a) Validation losses. (b) Sample test predictions. (c) Histograms of test prediction errors: µs = −0.131, σs = 0.334, µc = 0.161, σc = 0.406.

Figure 5.10: Results of using only the state features. (a) Validation losses. (b) Sample test predictions. (c) Histograms of test prediction errors: µs = 0.00178, σs = 0.120, µc = −0.005, σc = 0.152.

Figure 5.11: Results of using only the foot pressure forces. (a) Validation losses. (b) Sample test predictions. (c) Histograms of test prediction errors: µs = 0.028, σs = 0.146, µc = −0.014, σc = 0.140.

Chapter 6

Conclusions

In this thesis, we explored the use of a deep recurrent network, together with a recent history of locomotion and foot pressure data, for terrain estimation. We were able to develop an architecture with enough capacity to handle temporal data. Our approach was motivated by the idea of building perception through structured movement and the physical sensations gained through the agent's modalities. Through our experiments, we discovered that the history length, the parameter count, and the feature types all contribute to the fidelity of the model. Using a longer history allowed the model more opportunity to capture key patterns. Increasing the parameter space gave the model more freedom to search for optimal parameter values.
Lastly, our experiments hinted that key pieces of information were embedded in particular types of features given to the model; the model learned which set of features corresponded most strongly to each terrain parameter without any explicit information. Automatically learning these correspondences is similar to how humans learn to pay attention to certain modalities to learn more about certain aspects of the environment. We show that this model achieves high-fidelity estimates that can be used in real time as supplemental information. A limitation of this model is its tight coupling with the locomotion, the sensory information, and the environment; a change in any of these components would require a separate model to be trained. Generalizing a model like this to a variety of motions, terrains, and sensory information is a challenge. In the future, we would like to explore the best use cases for a similar model in boosting controllers. Incorporating a memory element into a controller may enable it to memorize key events and anticipate important changes, allowing for highly dynamic responses by the agent.

Bibliography

Alanis, A. Y., Sanchez, E. N., Loukianov, A. G., and Perez, M. A. (2011). Real-time recurrent neural state estimation. IEEE Transactions on Neural Networks, 22(3):497–505.

Bar-Joseph, Z., Gitter, A., and Simon, I. (2012). Studying and modelling dynamic biological processes using time-series gene expression data. Nature.

Barsalou, L. W. (2007). Grounded cognition. Annual Review of Psychology, 59:1–21.

Berthoz, A. (2002). The Brain's Sense of Movement. Harvard University Press.

Connor, J. T., Martin, R. D., and Atlas, L. E. (1994). Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2):240–254.

Gers, F., Eck, D., and Schmidhuber, J. (2000). Applying LSTM to time series predictable through time-window approaches. Technical report.

Gers, F. A., Schmidhuber, J., and Cummins, F. (1999). Learning to forget: Continual prediction with LSTM. Neural Computation, 12:2451–2471.

Giguère, P. and Dudek, G. (2009). Surface identification using simple contact dynamics for mobile robots. In 2009 IEEE International Conference on Robotics and Automation, pages 3301–3306.

Guo, K., Zou, D., and Chen, X. (2015). 3D mesh labeling via deep convolutional neural networks. ACM Trans. Graph., 35(1):3:1–3:12.

Hashlamon, I. and Erbatur, K. (2016). Joint friction estimation for walking bipeds. Robotica, 34(7):1610–1629.

Heess, N., Hunt, J. J., Lillicrap, T. P., and Silver, D. (2015). Memory-based control with recurrent neural networks. CoRR, abs/1512.04455.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Comput., 9(8):1735–1780.

Holden, D., Saito, J., and Komura, T. (2016). A deep learning framework for character motion synthesis and editing. ACM Trans. Graph., 35(4):138:1–138:11.

Houshangi, N. and Azizi, F. (2005). Accurate mobile robot position determination using unscented Kalman filter. IEEE.

Huh, D. and Todorov, E. (2009). Real-time motor control using recurrent neural networks. In 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages 42–49.

Lipton, Z. C. (2015). A critical review of recurrent neural networks for sequence learning. CoRR, abs/1506.00019.

Ljung, L., editor (1999). System Identification (2nd ed.): Theory for the User. Prentice Hall PTR, Upper Saddle River, NJ, USA.
Manjanna, S., Dudek, G., and Giguère, P. (2013). Using gait change for terrain sensing by robots. In 2013 10th International Conference on Computer and Robot Vision (CRV 2013), pages 16–22.

Marion, P., Fallon, M., Deits, R., Whelan, T., Antone, M., McDonald, J., and Tedrake, R. (2015). Continuous humanoid locomotion over uneven terrain using stereo fusion, pages 881–888. IEEE.

Pascanu, R., Mikolov, T., and Bengio, Y. (2012). Understanding the exploding gradient problem. CoRR, abs/1211.5063.

Peng, X. B., Berseth, G., and van de Panne, M. (2016). Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Trans. Graph., 35(4):81:1–81:12.

Prasad, S. C. and Prasad, P. (2014). Deep recurrent neural networks for time series prediction. CoRR, abs/1407.5949.

Reina, G., Paiano, M., and Blanco-Claraco, J.-L. (2017). Vehicle parameter estimation using a model-based estimator. Mechanical Systems and Signal Processing, 87:227–241.

Sena, D. and Nagwani, N. K. (2015). Application of time series based prediction model to forecast per capita disposable income. IEEE.

Simon, D. (2006). Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley-Interscience.

Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., Hjalmarsson, H., and Juditsky, A. (1995). Nonlinear black-box modeling in system identification: A unified overview. Automatica, 31(12):1691–1724.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Stroudsburg, PA. Association for Computational Linguistics.

Tieleman, T. and Hinton, G. (2012). Lecture 6.5—RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.

Vernon, D. (2008). Cognitive vision: The case for embodied perception. Image and Vision Computing, 26(1):127–140. Cognitive Vision Special Issue.

Walas, K. (2015). Terrain classification and negotiation with a walking robot. J. Intell. Robotics Syst., 78(3-4):401–423.

Wilson, R. A. and Foglia, L. (2017). Embodied cognition. In Zalta, E. N., editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, spring 2017 edition.

Yadaiah, N. and Sowmya, G. (2006). Neural network based state estimation of dynamical systems. IEEE.

Yazdani, H. (2009). Prediction of chaotic time series using neural network. In Proceedings of the 10th WSEAS International Conference on Neural Networks, NN'09, pages 47–54, Stevens Point, Wisconsin, USA. World Scientific and Engineering Academy and Society (WSEAS).

Yi, S.-J., Zhang, B.-T., and Lee, D. D. (2010). Online learning of uneven terrain for humanoid bipedal walking. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI'10, pages 1639–1644. AAAI Press.

Yin, K., Loken, K., and van de Panne, M. (2007). SIMBICON: Simple biped locomotion control. ACM Trans. Graph., 26(3).

Yu, W., Liu, C. K., and Turk, G. (2017). Preparing for the unknown: Learning a universal policy with online system identification. Preprint.
