UBC Theses and Dissertations

Reinforcement-learning control framework and sensing paradigm for flapping-wing micro aerial vehicles. Motamed, Mehran, 2006.

Full Text
Reinforcement-Learning Control Framework and Sensing Paradigm for Flapping-Wing Micro Aerial Vehicles

by

Mehran Motamed
B.Sc., Sharif University of Technology, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
April 2006
© Mehran Motamed, 2006

Abstract

Insects are fascinating for their maneuverability and complex aerobatics. Flapping-wing micro aerial vehicles are inspired by insect flight and aim to achieve high maneuverability at low speeds as well as hovering. Such a vehicle would have unique applications in the social and economic sectors as well as in the military. This work introduces a learning approach to flight control for flapping-wing micro aerial vehicles. A reinforcement-learning control framework is proposed as a suitable biomimetic candidate for the control of micro aerial vehicles. This work also discusses a matching sensing paradigm as a byproduct of the control approach. The control framework is then implemented using the Q-learning algorithm for the case study of lift generation for microflight. The results from a computer simulation using a quasi-steady aerodynamic model, and from an experimental investigation on a dynamically scaled model, confirm the applicability of the proposed framework. Moreover, the results of the learning scheme are shown to be comparable to those of a biological fruit fly,
Drosophila melanogaster, in terms of the mean lift-force coefficient and the mean aerodynamic efficiency.

Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
1.1 Flapping-Wing Micro Aerial Vehicles
1.2 Flapping Flight in Biological Insects
1.3 Control and Sensing for MAVs
1.4 Outline
2 Proposed Control Framework and Sensing Paradigm
2.1 Introduction
2.2 Control Framework: Reinforcement Learning
2.2.1 Problem Statement
2.2.2 Learning in Nature
2.2.3 Reinforcement Learning
    RL class of problems
    RL versus other methods
2.3 Sensing Paradigm for Micro Flight
2.3.1 Problem Statement
2.3.2 Scaling
2.3.3 Review of Miniature Force Sensing
    Biological sensors
    Biomimetic Sensors
    Purely Engineering Sensors
    Sensor-rich Paradigm
2.4 Summary and Discussion
3 An RL Approach to Lift Generation in Flapping MAVs
3.1 Algorithm
3.1.1 Q-learning
3.2 Simulation
3.2.1 Quasi-steady World Model
3.2.2 RL Implementation
3.2.3 Evaluation
3.3 Experiment
3.3.1 Dynamically Scaled Model
    Theory
    Model Specification
    Hardware
    Software
3.3.2 Power Requirement
3.3.3 Implementation: Learning on Scaled Model
3.3.4 Evaluation of Experiment
4 Conclusion
4.1 Future Work
Bibliography
A Wing-shape Equations Used for Calculating Wing Morphological Parameters
B Water-glycerine Mixture Properties
C The Effect of Side Walls, Surface and Ground on Mean Lift-Force Coefficient
D Calibration Procedure for ATI-Nano17 Force Sensor
E The Contribution of the Wing Missing Part to the Total Aerodynamic Force
F Force Filtering: Spectral Analysis of Raw Forces

List of Tables

1.1 Reynolds numbers for different moving objects
2.1 Possible target specifications of a force
sensor to be used on an MAV
2.2 Comparison between robot and agent in a sensor-rich paradigm
3.1 Simulation parameters for a typical Drosophila
3.2 Morphological and functional parameters of the prototype based on flight performance of Drosophila melanogaster
3.3 Morphological and functional parameters of the model based on experimental results
3.4 Parameters used in lift and power equations for the extrapolated MAV of wingspan b_q based on experimental results
3.5 Parameters used in power equations for the biological Drosophila in its maximum performance
3.6 Comparison result between a fruit fly and the extrapolated MAV in terms of lift generation (C_L) and aerodynamic power efficiency (η)
A.1 Morphological parameters calculated for Drosophila melanogaster's wing shape
B.1 Some properties of the water/glycerine (C3H8O3) mixture. The source for all data except for glycerine 100% is [47]; glycerine 100% is from [88].

List of Figures

1.1 Poetry in motion
2.1 Organization of Chapter 2 showing the flow of topics discussed in the chapter
2.2 Agent-environment interaction in an RL model
2.3 Finite State Machine (FSM) models for the agent and the environment
2.4 The relationship between stationary, deterministic and Markovian problem boundaries
2.5 Schematic layout of the isomorphic scaling problem
2.6 Simplified sensor power-analysis model
2.7 Campaniform sensilla
2.8 A closer look at the haltere of an insect, the angular-acceleration feedback sensor
2.9 A biomimetic haltere
2.10 Schematic of a piezoresistive pressure sensor
2.11 Silicon-based capacitive pressure sensors
2.12 The agent-environment and the robot-external world boundaries
3.1 Force decomposition of the aerodynamic force, F_aero
3.2 Isometric view of a flapping wing
3.3 Force coefficients of aerodynamic forces
3.4 Graph of ε_t, γ_t and λ_t used in simulation
3.5 Wing-chord representation of the optimal policy found by the agent maximizing the lift force in the simulation
3.6 The position and velocity profiles of the optimal policy found in the simulation
3.7 Improvement on lift generation and jerk avoidance during the simulation
3.8 Wing planform of Drosophila melanogaster
3.9 Plot of force, F, and torque, T, versus model wing span, b, and model flapping frequency, n
3.10 Schematic of the single-wing driving mechanism (not to scale)
3.11 Photos of the experimental setup
3.12 Close-up of the wing joint before the wing is attached and submersed in glycerine
3.13 The actual wing used in the experiment
3.14 Power expenditure of the total input power budget in biological insects
3.15 Graph of ε_t, γ_t and λ_t used in simulation
3.16 Control loop of the experiment showing different components involved in actuation and sensing
3.17 Wing chord representation of the optimal policy found by the agent maximizing lift force in the experiment
3.18 The position and velocity profiles of the optimal policy found in the experiment
3.19 Improvement on lift generation and jerk avoidance during the experiment
C.1 The effect of the side walls, bottom and ground of the tank on mean lift-force coefficient
F.1 Power Spectral Density (PSD) estimate via Welch's method

Acknowledgements

Coming to Canada for study has changed me and my life in many ways. I have done this work during this sweet and sour period. It is therefore a very difficult task to count everybody who contributed to this achievement, and an impossible one to thank them all. A special thank you is owed to my supervisor, Dr. Joseph Yan, whose continuous support and friendly dedication made this work a possibility. He believed in me from the beginning and kept my mind open to new ideas. I am grateful for how much I have learned from him, personally and professionally. I also thank Dr. Sheldon Green for his share in developing the dynamically scaled model, and Dr. John Madden for his help and support while I worked closely with him and his students. I thank Uli Seibold, Winson Lai and Eric Yim from our research group for helping me during this work and for their valuable feedback. Last but not least, I thank the long list of friends I have for their support and encouragement during the emotional highs and lows of doing this work. Thank you all.

This work is dedicated to my parents, my brother and my sister. With their constant love and attention, they made the time and distance between us much less than it really is.

Chapter 1

Introduction

Evolution over billions of years on this planet has resulted in organisms that are "optimal" in the sense that they have characteristics that allow them to perform efficiently and survive robustly in their environment; insects are such examples.
Who hasn't witnessed mating dragonflies flying in pairs, or mosquitoes seemingly disappearing as one tries to track them, or flies stopping instantly to turn sharply in another direction? And who hasn't tried to catch a fly by hand and been amazed by its agility? Insects can land upside-down on a ceiling, take off in any direction, and can fly straight up, down, or backwards. They often hit walls or windows and start falling, then regain control in a few microseconds and continue flying. Insects beat their wings from 5 to 1000 times per second. The wing-beat patterns of most insects, like flies or mosquitoes, normally cannot be seen. However, butterflies, among the slowest-flapping insects, give everyone a chance to admire the beautifully sophisticated manner in which they flap their wings. It almost looks as if insects, with their amazing aerobatics, are constantly trying to prove nature's supremacy over human knowledge. On the other hand, humans have not only tried to understand the phenomenon of insect flight but have also tried to mimic it robotically.

1.1 Flapping-Wing Micro Aerial Vehicles

One definition for micro aerial vehicles is employed by DARPA (the U.S. Defense Advanced Research Projects Agency) [51]. According to a DARPA program, Micro Aerial Vehicles (MAVs) are aircraft limited in size to less than 15 cm (≈ 6 in.) in length, width and height. MAVs must have a weight of 50 g or less and must be capable of staying aloft for 20 to 60 minutes over a distance of 10 kilometers. This definition is based on DARPA's applications, and actual specifications should be determined for a given project. MAVs have wide applications in the civil service, industrial and commercial sectors, and educational and scientific projects, as well as in the military. Current assets, such as satellites, manned aircraft and Unmanned Aerial Vehicles (UAVs), are unable to explore inside buildings, stairwells, shafts, caves, tunnels, pipes and rubble heaps.
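The DARPA definition in §1.1 is concrete enough to encode as a simple check. The sketch below is a hypothetical helper (the function and parameter names are ours, not DARPA's) that tests a candidate design against those published limits:

```python
def meets_darpa_mav_definition(length_cm, width_cm, height_cm,
                               mass_g, endurance_min, range_km):
    """Check a candidate vehicle against the DARPA MAV limits cited in §1.1:
    under 15 cm in every dimension, 50 g or less, 20-60 minutes aloft,
    and a range of 10 km."""
    return (max(length_cm, width_cm, height_cm) < 15.0
            and mass_g <= 50.0
            and 20.0 <= endurance_min <= 60.0
            and range_km >= 10.0)

# A 12 cm, 40 g vehicle flying 30 minutes over 10 km fits the definition.
ok = meets_darpa_mav_definition(12, 12, 12, 40, 30, 10)
```

As the text notes, this definition reflects DARPA's own applications; a real project would substitute mission-specific limits.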
To successfully do so, a vehicle would need to be very agile, highly maneuverable at low speeds, and able to hover.

Although one might consider designing low-speed MAVs by scaling down the dimensions of a conventional aircraft, doing so is not feasible, because it would significantly change the qualitative behavior of the surrounding fluid. As a result, the behavior of the surrounding fluid can no longer be explained by the classical principles on which these aircraft were built. Moreover, many conventional aircraft use thrust to generate lift to stay aloft, and are unable to hover. In view of the remarkable capabilities of insect flight, numerous researchers have turned to the use of flapping wings to engineer a solution. Insect-scale flapping-wing MAVs have been proposed for applications where maneuverability at low speed is necessary, often in confined spaces, as well as for the general applications of UAVs. Examples include the following:

• Exploration of inaccessible or hard-to-reach areas:
  - Internal inspection of pipes
  - Geological exploration
  - Providing information for police patrols
  - Mars exploration
• Investigation of hazardous environments:
  - Locating victims trapped under rubble
  - Locating civilians trapped in a fire
  - Guiding forest-fire fighting
  - Identifying chemical leak zones
• Surveillance (mostly of indoor environments):
  - Quality-of-Service assurance
  - Air quality sampling
  - Traffic monitoring and management
  - Law enforcement (e.g., border surveillance)
• Ecology and science:
  - Forestry and wildlife surveys
  - Agriculture (e.g., monitoring crops)
  - Flight studies
• Entertainment:
  - Toys
• Military:
  - Reconnaissance and surveillance
  - Tagging
  - Biochemical sensing

Some notable work in making flapping-wing MAVs includes the Micromechanical Flying Insect (MFI) project by Fearing et al.
[32], the Entomopter design by Michelson and Reece [54], and the MicroBat Ornithopter by Pornsin-Sirirak et al. [62].

1.2 Flapping Flight in Biological Insects

The first notable obstacle in building an MAV capable of sustained flight is to understand the high aerodynamic forces generated by flying insects, which distinguish insect flight from that of a classical aircraft. The traditional steady-state aerodynamics of current fixed- and rotary-wing aircraft cannot account for the forces generated by flapping flight, resulting in underestimation of such forces. (For example, the steady-state equations of aerodynamics predict flight forces for bumble bees that are about four times smaller than what is required, resulting in the famous paradox that "bumble bees can't fly"!) The answer lies in the unsteady nature of the aerodynamics of flapping flight.

In aerodynamics, the Reynolds number (Re) is a measure of the ratio of inertial to viscous forces, and determines the general qualitative behavior of the fluid interaction with an object. Table 1.1 shows typical Reynolds numbers for different moving objects. As can be seen, the Reynolds number of a flying insect differs greatly from that of a commercial aircraft, as does the aerodynamics.

Objects                      Reynolds Number
Swimming sperm               10^-2
Fruit fly                    10^2
Dragonfly                    10^3
Hummingbird                  10^5
Swimming human               10^6
F-18 fighter aircraft        10^7
C-5 heavy-cargo transport    10^8

Table 1.1: Reynolds numbers for different moving objects.

Let's look more closely at the flapping stroke of a biological insect to explain, to some degree, four examples of unsteady aerodynamic mechanisms that are partly responsible for the generation of flight forces in insects: namely, delayed stall, the Kramer effect, wing-wake interaction and clap-and-fling. Figure 1.1 shows a fly in different stages of one full stroke. Red (darker) and blue (lighter) colors indicate either sides of the wings
(For purposes of introduction, this explanation is simplified and should not be taken as the current stage of research advancement in this field.)

Figure 1.1: Poetry in motion. Each wing stroke forms a rough sideways figure eight. The upper side of the wing can be seen in (a-d) in red/darker color, while the underside is visible in (e-h) in blue/lighter color. During the downstroke, the wings sweep forward. Then, at the end of the stroke, they rotate so that the undersides face up, and the wings sweep backward for the upstroke. At the end of the upstroke, they rotate again and sweep forward, starting the cycle anew. Illustration modified from [96].

and represent the downstroke (a-d) and upstroke (e-h) halves, respectively. In stage (a), the wing is ready to begin the downstroke. The wings start their translational motion in (b). This translational component of the total aerodynamic force is the one that contributes most to the generated force. The reason lies in the stable attachment of the leading-edge vortex, even at high angles of attack. This mechanism is often named delayed stall. In (c) and (d), the wings undergo rapid rotations in preparation for the next half-stroke. This rapid rotation contributes to the total aerodynamic force by generating peak forces at either end of the stroke, and is called the Kramer effect. During (e-g), the wings interact with their own wake produced in the preceding half-stroke. This mechanism is difficult to model and is called wing-wake interaction. At the end of the upstroke, (h) to (a), the wings clap dorsally, with leading edges touching before the trailing edges, and then fling apart again with the trailing edges following the leading edges.
This mechanism, called clap-and-fling, is a combination of separate mechanisms and is believed to partially account for the high force generation in flapping flight. The full wing stroke, (a-h), roughly forms a sideways figure of eight.

The seminal work of C. P. Ellington in 1984 [21]-[26] gave momentum to, and provided a basis for, the quantitative understanding of insect flight aerodynamics. Yet until recently, researchers were unable to rigorously quantify this behavior or adequately measure its forces and flow patterns. Qualitative mechanisms of insect flight, however, have been revealed to some extent. The best qualitative and quantitative understanding of insect unsteady aerodynamics has come from three general techniques:

1. Direct methods: These methods use direct measurement of aerodynamic forces or capture the qualitative behavior of insect flight, mainly on tethered biological specimens (e.g., [23], [19] and [75]). Direct measurement from biological insects is limited in the sense that it provides understanding of insect flight as the insect flaps its wings, but it cannot be used to command arbitrary flapping trajectories. Moreover, measurements from insects in free flight are very hard to obtain, while the concern with tethering is that it may affect true free-flight wing-beat patterns.

2. Computational and analytical methods:

2.a. Computational fluid dynamics (CFD) models: With ongoing advances in computational methods, scientists are trying to numerically solve the Navier-Stokes equations to explain the unsteady aerodynamics of insect flight in a more rigorous manner (e.g., [68] and [80]). CFD methods are achieving increasingly high accuracy in modeling flapping flight, but at the current stage of advancement, simulating a single wing beat could take several hours of computing time, depending on the degree of accuracy.

2.b.
Analytical modeling of insect flight: In search of near-field models for the aerodynamics of insect flight, some researchers have developed analytical models of unsteady aerodynamics with a degree of success. These models are not
For the time being, using dynamically scaled models of flying insects seems to be the best way to confirm control methods, although building such apparatuses with accuracy involves a number of experimental challenges. S.P. Sane provides a comprehensive survey on the aerodynamics of insect flight that included the above methods, with their advantages and drawbacks [69].  1.3 Control and Sensing for MAVs  1.3  8  Control and Sensing for M A V s  In order to achieve autonomy for a flapping-wing MAV, actuators would be required to drive the wings, sensors to provide measurements and a controller that processes these measurements to determine what signals to send to the actuators. With some degree of achievement in understanding unsteady mechanisms of flapping flight, the research on sensor design and flapping-flight control for microflight seems to be getting more attention. The issue of control is coupled with the sensing issue, and exploring the one would affect the other. That is, taking a certain approach to addressing the control issue, for example, would require taking a consistent approach in sensing for microflight. Insect sensors are fascinating in their simplicity in design, efficiency in function, and robustness to variable environments; many engineering applications would benefit from these features. In general, literature on insect flight control and sensing is mostly implicit, and work devoted to the subject is limited. This is mainly because the research has been focused on aerodynamics; the actual approaches and algorithms for control need to be inferred from these works. From a more conventional control point of view, Schenato [71] views the control of flapping flight from its superproblem class of underactuated  nonlinear mechanical control  systems, and proposes a theoretical approach using high-frequency periodic feedback to increase the control inputs. 
The process by which mechanisms come into being in traditional engineering can be quite different from that by which organisms evolve in nature. The typical engineering process of developing a system is to specify the requirements, generate a design, analyze the design and then test it to determine how well the requirements are met. At any point, the previous steps (especially the requirements and/or the design) may be revised as more is learned (especially during analysis and testing). In contrast, biological structures exist in an organism and may not be designed for a specific predefined task. Nevertheless, a biological component somehow adapts itself to best benefit the system (e.g., in Diptera, the hind wings have evolved into halteres, which provide angular ve-  1.3 Control and Sensing for MAVs  9  locity feedback). Provided with enough senses to feel and enough capacity to learn, it seems an insect system manages with what it has to survive and reproduce. Several key advantages of natural evolution over human engineering include (a) the robustness of an individual organism to change and to uncertainty; (b) the ability to evolve distinctly different viable organisms to provide diversity (which is a means of providing robustness to an ecology); and (c) the simplicity in design rules by which organisms survive and adapt, precluding the need for complex and lengthy design cycles. Some engineering methodologies try to capture some of the advantages of evolution. For mechanism design, "rapid prototyping" methods are similar to evolution, in that numerous prototypes with simpler designs are quickly generated and tested on a shortened life cycle. This procedure allows engineers to more rapidly identify viable solutions in design and fabrication than if they had carried out full analysis of a few prototypes on long development cycles. 
Neural networks, genetic algorithms and reinforcement learning are examples of engineering methods that try to understand and replicate in artificial systems, the complexity with which learning can be achieved. The conventional engineering approach to designing a flight control system would require a good mathematical model of unsteady aerodynamics. As mentioned, none of the available models are mature enough for use in on-board flight control. However, insects are capable of controlling their course of flight with their limited computational power. This observation suggests that a biomimetic approach of "learning" to fly might be more suitable. A direct inspiration from nature can be seen in evolutionary algorithms (EAs), which seem suitable for determining good wing trajectories. EAs start off with a population that can evolve with each successive generation. As "survival of the fittest" is built into the algorithm, the best traits are passed down, so it is only a matter of time before "natural selection" finds good fliers. EAs have been simulated by Augustsson et al. [2] and Hunt et al. [40] for finding suitable wing trajectories for bird-like flight. However, these results are not necessarily applicable to insect-scale flight, with its lower Reynolds number (quasi-steady models can predict flight forces for birds fairly accurately, but  1.4 Outline  10  not yet for insects). In biomimetic engineering, the concept is that nature already provides the blueprints for solving many problems faced by engineers. Despite our vast wealth of knowledge about biological systems, conventional engineering designs have little or no basis in nature. Although the pace of human technological advances has sometimes been phenomenal, there are many instances where a limitation, inherently associated with the technology, has been an obstacle, and where a more elegant solution may be found in nature. Insect-scale flight is such an instance. 
It is therefore humbling to observe biological insects achieving flight with their limited computational power. Such observations, along with shortcomings of the more demanding classical engineering methods for achieving micro-scale flight, encouraged researchers to explore and propose different solutions to this problem. For example, in [95], the author compared a fly to a modern aircraft and discussed how the fly's biological control and sensing schemes differ fundamentally from those of a modern aircraft, and proposed taking a more biomimetic approach to escape from the complications imposed by conventional engineering. Fish locomotion control is also related to flapping-flight control in that both systems are interacting with a viscous fluid (e.g., [18] and [90]). This research area is inherently multidisciplinary in nature, calling for collaboration from biologists, neurologists and engineers. Human engineering ingenuity still falls short in comparison to nature's capabilities and therefore the biomimetic robotics research area is widely open to inspiring solutions.  1.4  Outline  A top-down approach has been taken to the research in this area as well as to the work done in this thesis. The project of constructing an autonomous flapping-wing MAV capable of sustained flight is a distant goal at the moment, therefore researchers around the world are working on many related subgoals of this project. Like all farreaching goals, the procedures and methods are yet to be set, and the general direction  1.4 Outline  11  of research has to be led by surveys that identify the needs and possible methodologies. Ground-breaking solutions need to be proposed, verified and compared against other methods. The nature of research in this area, calls for a top-down approach for the following reasons.  
First, as the published work is recent and mainly proposes solutions to a  problem that has not been fully addressed before, there is a lack of general consensus about methods, models and procedures. Second, the higher-level explanations act as justification for such methods, described later in detail, and therefore have to precede them to support them and provide the motivation, and to suggest the importance of such works. Third, it places the work in relation to other work and to the final goal. Therefore, the approach to the problem presented in this thesis is believed to be of importance itself. The work of this thesis is widely interdisciplinary, gathering knowledge from mechanical and electrical engineering, biology, neuroscience and computer science. However, the level of readers' prior knowledge of these areas is assumed to be minimal. The essential building blocks of the work are introduced before addressing the subject itself and references are made to direct curious readers toward more detailed discussions. The thesis is organized in two main chapters: Chapter 2 provides higher-level explanation and a proposal of a control framework and sensing paradigm, while Chapter 3 provides a detailed case study as an example of how the proposed control framework can be employed. In Chapter 2, a reinforcement-learning framework for the control problem of flapping flight of MAVs is proposed. The applicability of the framework to the unsteady aerodynamics problem of flapping flight is confirmed, as an example, by classifying the unsteady aerodynamics as a subset of reinforcement-learning problems. The second half of the chapter is dedicated to a sensing paradigm suitable for the proposed framework, based on a survey for current sensor-design schemes. As mentioned, Chapter 3 is a case study that implements an algorithm using the control framework of Chapter 2. 
It contains a computer simulation using a quasi-  1.4 Outline  12  steady model, and an experiment on a dynamically scaled model, using the implemented algorithm. The dynamically scaled model developed for experimental investigation of flapping flight is also described. Simulation and experimental results are evaluated qualitatively and quantitatively to enable future comparison of the methods. Chapter 4 concludes the thesis and suggests future work at both the framework and implementation levels.  Chapter 2  Proposed Control Framework and Sensing Paradigm 2.1  Introduction  This chapter is organized as depicted in Figure 2.1. Dashed items are those already discussed in the introduction or that may be found in the literature. Light bulbs mark documents that propose something for the first time. Magnifiers signify browsing a multi-document, and wrenches identify experimental items. Numbers on items represent the section number in which they are discussed. As can be seen, there are two main ideas that will be discussed in §2.2 and §2.3. Each of the ideas is supported and derived from a list of requirements and an inspiration from browsing a multi-document.  2.2 2.2.1  Control Framework: Reinforcement Learning Problem Statement  The problem is to find a "suitable" control framework for flapping MAVs. "Suitable" should be defined with regard to the applications of MAVs, as detailed in §1.1. The following is a possible list of criteria against which any proposed control approach can 13  2.2 Control Framework: Reinforcement Learning  MAV Applications  11  Aerodynamic Models  §2.2.2 §2.2.1 Control Needs  §2.2.3 > Experiments; s  Learning In Nature IT  Proposed Control Framework  §2.3.2  ©  Proposal  P  Literature review  «jf  Experiment  Force Sensing Review Proposea Sensing Paradigm  Figure 2.1: Organization of Chapter 2 showing the flow of topics discussed in the chapter. Dashed items are not discussed in this chapter. 
Numbers on other items represent the section number in which they are discussed.

be compared. Notice that some of the items in the list are closely related to each other, and could therefore be grouped as single items. Yet the distinctions made in the list are believed to shed light on the problem from different perspectives:

• The learning capacity of a control/learning method is identified by the class of problems it can address. The method should be capable of addressing issues concerning the control of MAVs, and in particular the basic problem of the unsteady aerodynamics of flapping flight.

• Considering typical applications of MAVs, robustness to uncertainties and to transient changes in the dynamic environment is one of the main desired characteristics (e.g., wind gusts, shock waves from explosions and so on).

• Another concern is adaptability to permanent changes in circumstances (e.g., a broken wing, a transition into water and so on). For that matter, the method should be as independent of a specific model or structure as possible. This is because an MAV deals with a dynamic world; depending on the application and level of implementation, the world model could be unknown or unavailable. This issue can also be seen as a case of robustness.

• Stratification of control layers is required due to the layered nature of control goals in MAV applications (e.g., mission statements are at a higher level than trajectory planning). A hierarchical control approach can address this issue.

• Intuitiveness is a subtle yet important property. Intuitiveness is an abstract property that makes a method generally more consistent with known and established facts, and therefore more communicable (i.e., easier to explain and debate), easier to implement and so on.
• Generality, another abstract property, which could be defined as having general applicability in terms of being supported by a single idea or scheme, is also desired.

• Abstraction is a mechanism and practice for reducing and factoring out details so that one can focus on a few concepts at a time. Abstraction is therefore desired to reduce the overhead in control layers and to minimize the number of parameters needed at each layer of control. Together with generality, abstraction allows one to build a hierarchy of control layers based on a single main idea. Most often, intuitiveness, generality and abstraction are closely related attributes, and one will often change as a result of a change in another.

• Existence of analytical tools is desirable. Such tools can provide powerful analysis and encourage discussion of different issues in control engineering. Moreover, every method should be evaluated against its goals, which provides a means of comparing different methods. Ease of evaluation is reflected by how easily a performance metric can be defined for comparison.

• Ease of implementation is an important property in engineering. Although a framework can have different implementations, some issues are inherently related to the framework itself, and should be accounted for at this stage.

In general, biomimetic frameworks are preferred, mainly because biological systems have already been verified and proven by nature to work. In fact, it was biological systems that first showed us what properties a system needs to survive successfully in the real world. The first challenge for a proposed framework would be to deal with the unsteady aerodynamics problem.
Since, for this special case, no aerodynamic model is available that is both fast enough and accurate enough to be used for control purposes in an actual prototype (as discussed in §1.2), the control method should deal with this particular situation.

2.2.2 Learning in Nature

The discussion of when and how learning occurs in nature requires years of study in psychology and related fields. This work is related to "animal learning theory" in its focus on the interface with the environment. Learning in nature, for example flapping flight in flying insects, is founded on a set of very basic and limited concepts. The following two concepts are particularly inspirational for our problem:

1. All animals, from insects to humans, are born with a set of genetically pre-wired reflexes, that is, responses to environmental stimuli. The fundamental building block of learning is Pavlovian conditioning, the ability of animals to associate pre-existing relationships between stimuli and responses with new stimuli. Pavlovian conditioning therefore exists on top of the limited number of unconditional reflexes mentioned before [61].

2. When a response following a stimulus is followed by a reinforcing feedback, the future probability of the response occurring increases (i.e., the relationship between them is reinforced). The converse holds true for punishing feedbacks. This modification of a behavior by the experience of a response and its consequence is called "operant conditioning" or "instrumental conditioning", and the response is said to be conditioned [76].

These concepts provide the building blocks of the complex learning behaviors of all animals, including insects. Therefore, a possible biomimetic control/learning scheme could also be built upon these concepts.

2.2.3 Reinforcement Learning

A reinforcement-learning model has been proposed for control of biomimetic flapping-wing MAVs.
The reinforcement-learning approach, much like learning in nature, is based on conditioning by means of rewards and punishments. Reinforcement learning (RL) is also thought of as a class of problems. The learner, or decision-maker, is called the agent. RL is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Issues in animal behavior learning closely resemble this problem, as acknowledged in [81]. With this definition, RL seems to be a good biomimetic long-term approach for the control problem of MAV flight. Formally, the model consists of (as a convention throughout the thesis, parameters having more than one dimension are represented in bold):

• a discrete set of states, S;
• a discrete set of actions, A; and
• a set of scalar reinforcement signals, R.

As depicted in Figure 2.2, in the RL model the agent is connected to a dynamic environment via perception and action. Both environment and agent are modeled as stochastic finite state machines (Figure 2.3). (Readers with a control-engineering background should note the difference between this definition of state and the one in some control textbooks such as [8] and [11]; the latter assume determinism, which is not generally the case in this discussion.) On each time step t, the agent is in state

Figure 2.2: Agent-environment interaction in an RL model. The three main signals are: the action, a_t, which represents the agent's choice; the state, s_t, on which the agent's decisions are based; and the reward, r, which represents the goal. Reward r_{t+1} will be associated with (s_t, a_t); the subscript t+1 emphasizes that the reward calculation is carried out at time step t+1.

Figure 2.3: Finite State Machine (FSM) models for the agent and the environment.
In the agent FSM, inputs are the observations and rewards sent from the environment, and outputs are actions. Conversely, in the environment FSM, inputs are actions and outputs are observations and rewards.

s_t ∈ S. The agent then chooses an action a_t ∈ A(s_t) to perform, where A(s_t) is the set of possible actions associated with state s_t. The action causes the state of the environment to change and, as a result, the environment sends two pieces of information as feedback: a reinforcement signal, or reward, r_{t+1}, as the value of this transition, and s_{t+1}, an indication of the new state. The agent should choose actions so as to increase the long-term sum of the reinforcement signals.

Future states and reinforcement signals are determined by the transition functions T_s : S × A → Pr(S) and T_r : S × A → Pr(R), where members of Pr(S) and Pr(R) are distributions over the sets S and R, respectively.

In the actual world, the model also consists of functions I_a, I_s and I_r (Figure 2.3) that change the way the agent affects and perceives the environment. I_a determines how the agent's actions affect the environment. I_s determines how the agent perceives the environmental states. I_r can be interpreted in a similar manner, as the way the agent perceives reinforcement signals coming from the environment. As can be seen, if I_a, I_s and I_r are not identity functions, they may affect the observability of the environmental states/reinforcements. (Here, observability is used as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs; the formal definition is deliberately avoided in this section and can be found in control textbooks such as [8] and [11].)

In the case where I_s and I_r are identity functions, the world is observable and the model becomes a Markov Decision Process (MDP). In the more realistic case, where I_s and I_r are not identity functions, the agent's perception of its environment is exact neither in states nor in reinforcement signals. The model in this case is called a Partially Observable MDP (POMDP, pronounced "pom-dp").
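The agent-environment interaction just described can be sketched as a pair of coupled loops. The following is a minimal illustration only, not code from this thesis: the two-state environment, its transition probabilities T_s, its rewards T_r, and the uniform random policy are all invented for demonstration.

```python
import random

# Toy stochastic environment: a hypothetical two-state FSM.
# T_s maps (state, action) to a distribution over next states;
# T_r maps (state, action) to a scalar reward (deterministic here for brevity).
T_s = {(0, 'a'): [(0, 0.2), (1, 0.8)], (0, 'b'): [(0, 1.0)],
       (1, 'a'): [(0, 0.5), (1, 0.5)], (1, 'b'): [(1, 1.0)]}
T_r = {(0, 'a'): 1.0, (0, 'b'): -1.0, (1, 'a'): 0.5, (1, 'b'): 2.0}

def step(state, action):
    """Environment FSM: consume an action, emit (next_state, reward)."""
    outcomes, weights = zip(*T_s[(state, action)])
    next_state = random.choices(outcomes, weights)[0]
    return next_state, T_r[(state, action)]

def random_policy(state):
    """Agent FSM placeholder: pi_t(a|s) uniform over the available actions."""
    return random.choice(['a', 'b'])

def run_episode(n_steps=10, seed=0):
    """Run the agent-environment loop and accumulate reinforcement."""
    random.seed(seed)
    s, total = 0, 0.0
    for t in range(n_steps):
        a = random_policy(s)   # agent chooses a_t from A(s_t)
        s, r = step(s, a)      # environment returns s_{t+1} and r_{t+1}
        total += r             # long-term sum the agent should increase
    return total
```

A learning agent would replace `random_policy` with one that adapts based on the rewards received, which is exactly the policy search discussed next.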
What the agent should do in its interactions with the environment is to find a mapping that maximizes some long-term measure of reinforcement. A mapping from states to probabilities of selecting each possible action is called a policy and is denoted by π, where π_t(a|s) is the probability of selecting action a at time step t, given s_t = s. Here is a simplified example showing more intuitively how the learning agent interacts with the environment. In this example, one round of negotiation between the agent and the environment indicates a time step.

Example 2.1.
Agent: ... I see I'm in state 3. I choose action 98 to perform.
Environment: Action 98 received. Your new state is 34. You are given a reinforcement of +0.8 units. You have 9 possible actions to choose from.
Agent: I received a reinforcement of +0.8 units and I see I'm in state 34. I choose action 5 to perform.
Environment: Action 5 received. Your new state is 4. You are given a reinforcement of −6 units. You have 2 possible actions to choose from.
Agent: I received a reinforcement of −6 units and I see I'm in state 4. I choose action 1 to perform.
Environment: Action 1 received. Your new state is 15. You are given a reinforcement of −3 units. You have 9 possible actions to choose from. •

Unlike in most robotic applications, where the agent-environment boundary coincides with the physical boundary between the robot and its environment, the boundary between the agent and its environment in an RL model is closer to the agent. The general rule is that whatever the agent cannot change arbitrarily is external to the agent and therefore within the environment.
This, in most cases, includes the mechanical parts of a robot, its links and all sensory and actuation parts. In other words, the agent-environment boundary is not defined by the boundary of the agent's knowledge but by its absolute control. It is worth mentioning that we do not require the environment to be completely unknown. In fact, in many cases the agent has clear knowledge of how the environment works and how the reward is calculated, yet faces a difficult problem, much as we face a difficult puzzle. However, the rules of the environment are external to the agent (most importantly the reward calculation) and therefore are beyond the agent's absolute control.

The properties that support the RL framework for control of flapping-wing MAVs are as follows:

• Robustness and adaptability are important advantages of the proposed framework. The agent is given a goal and is rewarded or punished by the reactions of its own actions upon the environment. No constraint is put on the agent to flap its wings as biological insects do. The agent will essentially explore its action space (i.e., every sequence of actuation it can make) to fulfill the given goal; nothing in this statement prescribes how it will do so. In fact, if the agent came up with what insects came up with in the first place, it would be very rewarding. It is interesting to note that with the same control approach the agent could walk with its wings (much as a canoe paddles itself through the water surface), or it could swim under water, if the goal dictates and the environment's feedback so indicates (e.g., in the case of broken wings or a possible transition into water). Therefore, if the agent is well implemented in the different layers of the hierarchy, it can show great robustness and adaptability to a changing environment.

• The RL framework is very flexible and very general.
The time steps do not have to correspond to real time and can, for example, refer to major events arbitrarily spaced in time. The actions could be low-level, like input voltages to motors in the case of a robot, or high-level, for example monitoring a building or finding a victim in the case of a rescue agent. They could also be abstract, like a shift in mentality toward some point (e.g., being less intrusive). RL's generality comes from its conceptual simplicity, inspired by nature. The RL framework can also be generalized to accommodate continuous actions, states and rewards, which can potentially be more efficient in terms of memory requirements (for more explanation see [42]).

• The RL framework proposes a considerable abstraction: every goal-oriented behavior, regardless of its sensory, memory, actuation and objective details, can be reduced to three signals passing back and forth between the agent and its environment: one representing the agent's choice (the action), one representing the basis on which the choices are made (the state) and one defining the agent's goal (the reward). This allows building a hierarchy of behaviors on the same principle, which is very important.

• The RL framework can be model-less. The agent, of course, uses a model to behave; here, "model-less" means that no information about the environmental model need be assumed by the agent. Through its trial-and-error interactions with the unknown (yet at least partially observable) environment, the agent is able to perform as desired. As mentioned in §1.2, the unsteady aerodynamics of flapping flight lacks a model fast and accurate enough to simulate the unsteady phenomena of flapping flight in real time for control algorithms. Therefore, RL is very practical for this problem. In fact, this is very common in nature.
For example, insects do not solve the Navier-Stokes equations for unsteady aerodynamics in order to fly, nor do we consciously calculate dynamic forces as we walk. What happens is that because the environment is stationary (or at least slowly-varying non-stationary) and partially observable, one can build an approximate model based on the mutual interactions. Note that these conditions, i.e., not having the exact environmental model and only being able to partially observe the environment, hold for every real-world control case. An exact environmental model and/or full observability of the environment can only be assumed, because in the real world no mathematical model can be as exact as nature, and perceptions are always modified versions of reality. Therefore, frameworks based on either of these completeness assumptions are inherently limited if they need to interact with the real world.

• Learning by reinforcement can be done online, since it is similar in this sense to the broad and well-studied problems of supervised learning. The discussion of how RL differs from supervised learning comes later in this chapter.

• RL is very suitable for repetitive behaviors (and all types of locomotion are usually repetitive behaviors). The reason lies in the difference between RL and heuristic search algorithms: in the former, a policy, in the form of a sequence of actions, is sought to maximize a function, while in the latter a fixed point is the answer to the optimization problem.

• As a result of the agent's policy search, an agent can potentially learn policies with delayed reward. That is, the agent could perform a series of actions with little or no reward in order to arrive in a state with a high reinforcement. This could be particularly important in learning vortex generation at low Reynolds numbers, an important cause of the unsteady mechanisms of flapping flight.
Delayed reward is addressed in detail in the next chapter.

RL class of problems

In this subsection, we focus on the learning capacity of RL methods. As an example, we confirm that the RL approach has the capability to learn the unsteady aerodynamics of flapping flight by showing that this problem is a subset of the class of RL problems. Before entering this discussion, the following formal definitions are called for:

• Stationary vs. non-stationary: Stationary models are those in which the probabilities of making state transitions or receiving specific reinforcement signals do not change over time. That is, in a stationary environment:

T_s(s, a, t) = T_s(s, a),    T_r(s, a, t) = T_r(s, a).    (2.1)

There are times when there is a notion of slowly-varying non-stationary environments. "Slowly-varying" should be defined in relation to an application; generally it means that the rates of the changes in the model are slow, or that the magnitudes of the changes are small, or both. Here, the first interpretation (a slow rate of change) is meant.

• Deterministic vs. non-deterministic: A deterministic environment is one in which no randomness is involved in the development of future states of the system and/or reinforcement signals. In a non-deterministic environment, taking the same action in the same state on two different occasions may result in a different next state and/or reinforcement signal. In general, the rules of classical physics are considered deterministic, while the rules of quantum physics are non-deterministic. More formally, states are mapped to probabilities by which the next states can be reached.
That is, the environment is said to be deterministic if and only if for any state-action pair (s_t, a_t) there exist a unique state s*_(s,a) and a unique reinforcement signal r*_(s,a) such that:

T_s(s_t, a_t)(s_{t+1}) = 1 if s_{t+1} = s*_(s,a), and 0 otherwise;
T_r(s_t, a_t)(r_{t+1}) = 1 if r_{t+1} = r*_(s,a), and 0 otherwise.    (2.2)

• Markovian vs. non-Markovian: A model is Markovian if the state transitions are independent of any previous environmental states or agent actions. Generally, the dynamics of the system can be written as:

Pr{ s_{t+1} = s*, r_{t+1} = r* | (s_0, a_0, r_0), (s_1, a_1, r_1), ..., (s_t, a_t, r_t) }.    (2.3)

If the state/reinforcement signals have the Markov property, then the dynamics can be simplified to:

Pr{ s_{t+1} = s*, r_{t+1} = r* | (s_t, a_t, r_t) }.    (2.4)

In other words, the state/reinforcement signals have the Markov property, and are Markov state/reinforcement signals, if and only if (2.3) is equal to (2.4) for all s*, r* and all histories (s_i, a_i, r_i), 0 ≤ i ≤ t. In this case, the environment is also said to have the Markov property. In a more general definition, this is called a first-order Markov property. A k-th-order Markov property is satisfied when (2.3) equals:

Pr{ s_{t+1} = s*, r_{t+1} = r* | (s_{t−k}, a_{t−k}, r_{t−k}), ..., (s_t, a_t, r_t) }.    (2.5)

Having defined these terms, we can make a statement about the RL domain of problems: in general, it is assumed that the environment is stationary, or at least slowly-varying non-stationary; however, the environment can be non-deterministic or non-Markovian.

At first glance, the unsteady aerodynamics problem of flapping flight is a stationary, deterministic but non-Markovian problem. It is stationary because the rules of aerodynamics, although unsteady (i.e., time-dependent), are not time-varying. It is deterministic because no randomness is involved in the calculation of the forces.
And finally, it is non-Markovian because at each state the next state/reinforcement, derived from the unsteady aerodynamic mechanisms of flapping flight, depends not only on the current state-action pair, but also on the sequence of state-action pairs leading to that state. (It is worth mentioning that these properties almost never hold true in practice, as our perceptions are always altered and our understanding limited; the statement is therefore true only in an approximate sense.)

A closer look, however, with regard to MAV applications, suggests that the problem is better approximated as slowly-varying non-stationary, deterministic and k-th-order Markovian. (The term "k-th-order Markovian" is preferred here over its equivalent "not far from Markov". In this text, k identifies a trade-off in approximating the unsteadiness of the phenomenon: k = 1 approximates the problem as steady. The actual value of k should be found per application, since it depends on the actual Reynolds number.) It is slowly-varying non-stationary because for typical applications of MAVs it is possible, and sometimes likely, that the model goes through a permanent change (e.g., a broken wing, a transition into water and so on); however, the rate at which such changes occur is most likely slow. It is k-th-order Markovian because although the current state/action is not enough to determine the next state/reinforcement, the latter relies more heavily on the current and recent states, approximated using the state/action history of length k.

The relationships between the three defined properties, the boundary of the RL domain of problems and the location of our example problem of unsteady aerodynamics are illustrated in Figure 2.4.
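A standard way to exploit a k-th-order Markov property (not spelled out in the text above) is to stack the last k state-action pairs into an augmented state, so that a first-order method can be applied to the augmented process. The sketch below is illustrative only; the wrapper class and the state encoding are invented for demonstration.

```python
from collections import deque

class KthOrderStateWrapper:
    """Turn a k-th-order Markov observation stream into an (approximately)
    first-order state by stacking the last k (state, action) pairs."""

    def __init__(self, k):
        self.k = k
        self.history = deque(maxlen=k)  # old pairs fall off automatically

    def augment(self, state, action):
        """Record the latest (state, action) and return the stacked state."""
        self.history.append((state, action))
        return tuple(self.history)  # hashable, so usable as a table key

# Hypothetical stream of four (state, action) observations with k = 3:
wrapper = KthOrderStateWrapper(k=3)
for s, a in [(0, 'up'), (1, 'down'), (2, 'up'), (3, 'down')]:
    augmented = wrapper.augment(s, a)
# After four steps, only the three most recent pairs remain in the key.
```

The cost of this trick is the trade-off noted above: the augmented state space grows with k, so larger k (stronger unsteadiness) means a larger learning problem.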
The above discussion makes it clear that the class of unsteady aerodynamics of the flapping-flight problem is a subset of the class of RL problems, and can be addressed by the RL framework.

RL versus other methods

RL is an approach to machine intelligence that combines supervised learning and dynamic programming to solve problems that neither of the two disciplines can solve individually.

Figure 2.4: The relationship between the stationary, deterministic and Markovian problem boundaries. The example problem of unsteady aerodynamics is slowly-varying non-stationary, deterministic and k-th-order Markovian.

Genetic algorithms offer a similar approach that is also biomimetic; as mentioned, there are several related works that use evolutionary methods. RL, without dependence on a specific model, is also very similar to adaptive control. In this section, RL is compared to these four techniques.

• Supervised learning is a general, widely studied problem of training a parameterized function approximator. Supervised learning requires sample input-output pairs from the function to be learned. In RL, however, there is no presentation of input-output pairs: the agent is rewarded after each action, but there is no indication of which action would be in its best long-term interest. Another difference is that in RL the evaluation of the system is concurrent with learning, so online performance is important.

• Dynamic programming is a method for reducing the runtime of algorithms exhibiting the properties of overlapping subproblems and optimal substructure. RL is a dynamic programming approach, in that it constructs value functions that are themselves locally optimal.
RL, therefore, in a standard discrete implementation, suffers from what is called "the curse of dimensionality": its computational requirements grow linearly with the number of states, which itself grows exponentially with the number of state variables. However, many RL methods are able to partially escape this situation by sampling and by function approximation. It is commonly accepted that both methods use the optimal-substructure approach; opinions differ, however, as to whether one is a subset of the other.

• Genetic algorithms are related to RL through being applied to the same problems. As mentioned before, there are two methods of behavior learning. One is to search the space of behaviors to find one that is suitable (i.e., performs well by some measure in the environment); this is the approach taken by genetic algorithms. RL algorithms, on the other hand, use statistical techniques and dynamic programming to approximate the usefulness of taking certain actions in the environment. Most work with genetic algorithms simulates evolution, not learning during an individual life, and so is very different from RL. Neither is better; however, in problems where state information is available (e.g., the flapping-flight problem), RL seems to have a distinct advantage over evolutionary methods such as genetic algorithms. In problems where state information is limited (e.g., ballistic motion) and the system is completely non-Markovian, there is no clear advantage to using RL over evolutionary methods.

• Adaptive control is a well-developed discipline dealing with algorithms that allow experience to improve a sequence of decisions. The system dynamics is assumed to be smooth (i.e., linear or locally linearizable around a desired trajectory). A common cost function in adaptive control consists of quadratic penalties on desired states and actions, which are represented as vectors.
As in RL, the dynamic model of the system is not known in adaptive control and must be estimated. Unlike in RL, however, the structure of the dynamic model is fixed, so the model-estimation problem turns into a parameter-estimation problem. This property enables powerful mathematical analysis, and leads to practical and widely deployed adaptive-control algorithms.

2.3 Sensing Paradigm for Micro Flight

In this section, we discuss the sister problem of flight control, the sensing problem, and propose a sensing paradigm for micro flight. Although our look at this problem is biomimetic, simply copying existing biological sensors is not desired, because natural organisms have, over time, developed tools and methods much different from those developed by humans. Thus, when looking to nature for inspiration we should also keep our engineering limitations in mind. For this reason, we discuss engineering efforts and limitations in parallel with the other discussions in this section.

2.3.1 Problem Statement

As mentioned in §1.3, the control framework either dictates or loosens some constraints on sensing paradigms for micro flight, constraints that make these two components consistent and coupled. The first outcome of employing the RL framework is the need for highly informative signals. The agent interprets the environment through input signals. As mentioned in §2.2.3, the agent receives two pieces of information from the environment: a reward, and an indication of the current state. The agent makes associations and updates them (i.e., they become conditioned, in a more physiological sense) by pieces of experience over time. These associations are built on top of the input signals; therefore, the actual attained capacity and performance of learning depends on the amount of information these signals deliver. Obviously, as we go up the hierarchy of control layers, these signals become more informative and abstract.
(Each piece of experience can be defined by the quadruple <s, a, r, s'>. In this update, s, a and s' identify a unique association, and r identifies the strength (or weakness) of the relation.)

On the other hand, there is no constraint on how these signals are generated: no constraint on the type of sensors used, on how they are placed, or on how their information is distributed in space and time. Yet there is a need for the sensing paradigm to provide meaningful input signals for the agent. This might suggest sensor elements that would span a "basis" for the agent's "feeling space", providing global information about its environment. Moreover, considering the applications of MAVs, robustness is a key property.

Having identified the desired properties of the sensing paradigm, we proceed to a more detailed discussion to identify some limitations in designing a typical force sensor for microflight. For the application of MAVs, some important sensor attributes include bandwidth, range, resolution, size and mass. Bandwidth must be considered in relation to wing-beat frequency, which for flying insects ranges from about 5 Hz to 1000 Hz. (The wing-beat frequency of insects ranges from ~5 Hz for some butterflies (Lepidoptera: Papilio) to ~40 Hz in dragonflies, ~200 Hz in fruit flies, and up to ~1000 Hz in some tiny midges of the genus Forcipomyia (Diptera: Ceratopogonidae) [78].) For flight control, it can be argued that less bandwidth than the wing-beat frequency is needed, since average force measurements during each stroke may be sufficient for varying wing trajectories on a stroke-by-stroke basis; this would, however, primarily be true only for steady maneuvers such as hovering or cruising [71]. One would expect that higher harmonics in wing motion would necessitate higher bandwidth, especially in the case of agile maneuvers.
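The stroke-averaging argument above can be made concrete with a short sketch. The numbers are illustrative, not from this thesis: a 40 Hz wing beat sampled at 1 kHz, with a synthetic force trace of 5 mN mean lift plus a wing-beat-frequency ripple.

```python
import math

def stroke_averages(samples, sample_rate_hz, wingbeat_hz):
    """Average a force trace over each wing stroke.

    For steady maneuvers, a controller may only need one averaged force
    value per stroke, i.e., an effective bandwidth equal to the wing-beat
    frequency rather than that of the raw force signal.
    """
    per_stroke = int(sample_rate_hz / wingbeat_hz)   # samples per stroke
    return [sum(samples[i:i + per_stroke]) / per_stroke
            for i in range(0, len(samples) - per_stroke + 1, per_stroke)]

# Synthetic example: mean lift of 5 mN plus a 2 mN ripple at the wing-beat
# frequency (hypothetical values for illustration).
fs, fw = 1000, 40                       # 1 kHz sampling, 40 Hz wing beat
trace = [5e-3 + 2e-3 * math.sin(2 * math.pi * fw * t / fs)
         for t in range(fs)]            # one second of data
avg = stroke_averages(trace, fs, fw)    # 40 per-stroke averages
```

Because each window spans exactly one stroke, the ripple averages out and each entry of `avg` recovers the 5 mN mean lift; higher harmonics from agile maneuvers would not cancel this way, which is why they demand more bandwidth.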
Sensor range and resolution must be considered in relation to the vehicle weight. Weight affects the average wing lift force and the sensor bandwidth, since a higher bandwidth requires a higher range to record the large instantaneous forces. In order to minimize intrusiveness, the sensor should be as compact and lightweight as possible. Table 2.1 shows some attributes typical of a dragonfly and approximate constraints on a force sensor to be installed on an MAV of the same size, to provide an indication of what is being sought.

Table 2.1: Possible target specifications of a force sensor to be used on an MAV.

Attribute  | Typical dragonfly          | Force sensor
Size       | Wing span: 54 mm           | Length: ≈ 1 mm
Mass       | 530 mg                     | ≈ 5 mg
Force      | Lift during hover: 5 mN    | Peak force: ≈ 50 mN
Frequency  | Wing-beat frequency: 40 Hz | Bandwidth: ≈ 1 kHz

Scaling

For the ideal sensor to have no effect on the object that it is measuring, it would have no mass, no physical size, and consume no power; of course, this can never be achieved in practice. Considering the size constraint of MAVs, miniaturization plays a central role in the design and implementation of such small sensing elements. As sensor size is reduced, some attributes become more dominant than others, and as a result the operating point changes. The scaling analysis presented here quantifies these effects for a change in sensor size by a scaling factor of n. (For context on the range of flyer weights: the smallest known flying insect is Dicopomorpha echmepterygis, weighing only a few μN [55], and the heaviest is Deinacrida heteracantha of New Zealand, weighing about 687 mN [56].)

Figure 2.5: Schematic layout of the isomorphic scaling problem.

For this analysis, consider the force/pressure sensor located at the end of a beam, and its isomorphically reduced counterpart, as illustrated in Figure 2.5.
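As a numerical preview of the scaling analysis that follows, the effect of isomorphic down-scaling on a resistive sensing element can be sketched directly from the proportionalities l ∝ n, A ∝ n² and constant resistivity. The baseline dimensions, resistivity and excitation voltage below are invented for illustration.

```python
def scale_sensor(n, l0=1e-3, w0=1e-4, d0=1e-5, rho=1e-2, v=1.0):
    """Isomorphic down-scaling of a resistive sensor by factor n (0 < n <= 1).

    All three dimensions shrink by n, so length l ∝ n, cross-section
    A ∝ n², volume V ∝ n³, and resistivity rho stays constant.
    """
    l, w, d = n * l0, n * w0, n * d0
    A = w * d                    # cross-sectional area, ∝ n²
    R = rho * l / A              # resistance R = rho*l/A, ∝ 1/n
    P = v ** 2 / R               # dissipated power at fixed voltage, ∝ n
    V = l * w * d                # volume, ∝ n³
    return {'R': R, 'P': P, 'power_density': P / V}   # P/V ∝ 1/n²

full = scale_sensor(1.0)
half = scale_sensor(0.5)
# Halving every dimension doubles R, halves P, and quadruples the power
# density, illustrating the static-stability concern discussed below.
```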
Initially, the sensor with resistance R_0 has dimensions w × l; the beam length is L; and F is a point load applied at the far end of the beam.

• Power density, static stability: According to R = ρl/A, and the facts that l ∝ n, A ∝ n² and the resistivity ρ is constant, the resistance of the scaled sensor is proportional to 1/n. Therefore, assuming that both sensors are excited with the same voltage level v, from P = v²/R we conclude that as the sensor is reduced in size, the power dissipation is also scaled by the scaling factor n. The power density p_P = P/V, however, is proportional to n/n³ = 1/n². This increased power density introduces decreased static stability of the sensory system, one of the side effects of down-scaling. The electric field E = v/d scales as 1/n and therefore also grows as the sensor shrinks, so the breakdown field can be exceeded for very small d.

• Responsiveness/sensitivity to thermal variations:

Figure 2.6: Simplified sensor power-analysis model. The effect of the sides is neglected. The sensor surface has temperature T_s, while every point outside the sensor element stays at temperature T_0.

In order to investigate behavioral changes in response to thermal variations due to scaling, we look at the energy analysis of the sensor model depicted in Figure 2.6. To simplify the analysis to some degree, we assume l, w ≫ d and neglect the effect of the sides. The rise in the temperature of the sensor is due to the power dissipated in the sensor, P = i²R_0. Conservation of energy dictates a power balance among three processes:

    P = Q_f + Q_s + Q_p                                          (2.6)

where Q_f, Q_s and Q_p are, respectively, the rates of the convective transfer of heat from the sensor to the ambient flow, the storage of heat in the sensor element, and the conductive transfer of heat to the underlying plate.
Assuming the plate is large and conductive enough to stay constant at temperature T_0, we have:

    Q_f = h_f S (T_s − T_0)                                      (2.7)
    Q_s = c_s m_s dT_s/dt                                        (2.8)
    Q_p = (K_s/d) S (T_s − T_0)                                  (2.9)

where h_f is the convective heat-transfer coefficient of the fluid, which depends on the fluid material as well as the fluid velocity; S = l × w is the surface area of the sensor exposed to the ambient flow; T_s and T_0 are the temperatures of the sensor and the ambient flow, respectively, as shown in Figure 2.6; and c_s, m_s, K_s and d are, respectively, the specific heat, mass, thermal conductivity and height of the sensor element. From (2.6)-(2.9) one can write the first-order system:

    c_s m_s dT_s/dt = −S(h_f + K_s/d) T_s + T_0 S(h_f + K_s/d) + i²R_0    (2.10)

from which we determine the time constant τ of the thermal change in the sensor, which describes how quickly the sensor can track transient changes in temperature:

    τ = c_s m_s / [S(h_f + K_s/d)] = c_s m_s / [S h_f (1 + K_n)]          (2.11)

From (2.10), the equilibrium temperature is:

    T_s,eq = T_0 + i²R_0 / [S h_f (1 + K_n)]                              (2.12)

where we defined K_n = (K_s/d) × (1/h_f). Therefore, as m_s ∝ n³, S ∝ n² and K_n ∝ 1/n (since d ∝ n), the time constant τ of equation (2.11) relates to the scaling factor n as:

    τ ∝ n     for K_n ≪ 1,
    τ ∝ n²    for K_n ≫ 1.                                               (2.13)

In either case, as the sensor is reduced in size, it will more rapidly follow transient changes in temperature, which once again is not desired. In miniature sensors d ≪ K_s/h_f, and as a result we are concerned with τ ∝ n². As an example, for Entran semiconductor strain gauges K_s is about 150 W/(m·K), h_f for a gas in free convection is on the order of 10 W/(m²·K), and d is on the order of 10⁻⁴ m, so K_n ≫ 1.

One can repeat a similar analysis for T_s,eq of equation (2.12) to find a bound on the growth of i²R_0. Assuming thermal stability is a function of temperature, and that we do not want it to change, we would want to keep T_s,eq constant.
That is, we need the second term in (2.12) to stay constant with n. As the denominator S h_f (1 + K_n) is proportional to n² × (1/n) = n, the power i²R_0 should also be proportional to n. In summary, the thermal stability will not change with size scaling as long as the excitation voltage remains unchanged.

• Resolution: The few drawbacks of miniaturization are the price we pay for the considerable advantages of miniature sensors. Here, one of these advantages, namely high resolution, is introduced. One can write:

    ΔR_min/R_0 = F_G ε_min = F_G M_min c / (EI) = F_G P_r L c / (EI)     (2.14)

where F_G is the gauge factor and ε_min is the minimum strain, caused by the minimum moment M_min; ε_min translates into a minimum change in resistance, ΔR_min, that can be read by the external instrumentation. E, I and c are the elastic modulus, cross-sectional moment of inertia and cross-sectional centroid of the beam, respectively. Therefore, the minimum force P_r that can be resolved is:

    P_r = EI / (R_0 F_G L c) × ΔR_min                                    (2.15)

ΔR_min is limited by the instrumentation, while I ∝ n⁴ and L and c are proportional to the scaling factor. Therefore P_r ∝ n², and the resolution, being inversely proportional to the minimum readable force, is proportional to 1/n².

• Bandwidth: Another advantage of miniature sensors is their phenomenally high natural frequency, calculated as ω_n = √(k/M), where k and M are the stiffness and effective mass of the structure, respectively. The stiffness of the cantilever structure is k = F/δ = 3EI/L³, where F is the applied test force and δ is the resultant beam deflection. As I ∝ n⁴ and L³ ∝ n³, the stiffness k is proportional to the scaling factor (i.e., k ∝ n). Therefore, as M ∝ n³, the natural frequency, and hence the bandwidth, increases with a decreasing scaling factor (i.e., ω_BW ∝ ω_n ∝ √(n/n³) = 1/n).
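The scaling relations derived in this subsection can be collected into a short numerical sketch (the baseline is normalized and assumed; the code only reproduces the exponents derived above):

```python
# Sketch of the isomorphic scaling laws derived above (normalized baseline).
# Shrinking every dimension by a factor n (n < 1) gives:
#   resistance      R   ∝ 1/n      power (fixed v)  P = v^2/R ∝ n
#   power density   P/V ∝ 1/n^2    min. force       P_r ∝ n^2
#   stiffness       k   ∝ n        natural freq.    ω_n = sqrt(k/M) ∝ 1/n

def scaled_attributes(n: float) -> dict:
    """Relative change of key sensor attributes under isomorphic scaling by n."""
    R = 1.0 / n                      # R = rho*l/A with l ∝ n, A ∝ n^2
    P = 1.0 / R                      # dissipated power at fixed excitation voltage
    power_density = P / n**3         # volume V ∝ n^3
    min_force = n**2                 # smallest resolvable force, eq. (2.15)
    bandwidth = (n / n**3) ** 0.5    # omega_n = sqrt(k/M), k ∝ n, M ∝ n^3
    return {"R": R, "P": P, "power_density": power_density,
            "min_force": min_force, "bandwidth": bandwidth}

# Shrinking the sensor tenfold (n = 0.1):
print(scaled_attributes(0.1))
```

For a tenfold reduction (n = 0.1), the sketch reproduces the conclusions above: the power density grows by a factor of 100, the minimum resolvable force shrinks by a factor of 100 (higher resolution), and the bandwidth grows by a factor of 10.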
2.3.2 Review of Miniature Force Sensing

In this subsection, we discuss currently available biological, biomimetic and miniature sensors, in order to compare existing attempts and identify issues needing further attention. This review covers recent experiments in sensor design, as well as other key works in this area. Because of the overwhelming amount of information available, we limit the discussion to force/pressure mechanosensors.

The different processes involved make it difficult to compare biological and classical miniature sensors. A discussion of which sensors are better would require a performance metric; however, it can be quite difficult to quantify features such as robustness. Furthermore, the metric would be specific to some application. Inevitably, mechanisms that are optimized in one domain may be unsuitable for another domain and therefore cannot be compared in general.

Biological sensors

Although insects have the same five senses as humans, their sensory capabilities are very different both qualitatively and quantitatively. Depending on their function, the sensory receptors involved can be categorized into four groups: mechanoreceptors, photoreceptors, chemoreceptors and thermoreceptors. In force sensing, the mechanoreceptors are the only receptors we are concerned with.

Mechanoreceptors detect mechanical disturbances such as movement, stress and vibration. Insect mechanoreceptors can be found almost everywhere on the surface of an insect's body. They are innervated by one or more sensory neurons that fire in response to stretching, bending, compression, vibration, or other mechanical disturbances. However, they may differ in the way they react to stimuli. Some produce a phasic response when stimulated, firing once when activated and again when deactivated. Other receptors generate a tonic response, firing repeatedly as long as a stimulus persists.
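The phasic/tonic distinction can be made concrete with a toy encoding sketch (purely illustrative; the binary stimulus and the firing rules are assumed, not a physiological model):

```python
# Toy illustration of phasic vs. tonic mechanoreceptor responses (assumed
# binary stimulus; not a physiological model). A phasic receptor fires on
# changes of the stimulus; a tonic receptor fires while the stimulus persists.

def phasic(stimulus):
    """Fire (1) only when the stimulus switches on or off."""
    spikes, prev = [], 0
    for s in stimulus:
        spikes.append(1 if s != prev else 0)
        prev = s
    return spikes

def tonic(stimulus):
    """Fire (1) for as long as the stimulus is present."""
    return [1 if s else 0 for s in stimulus]

stim = [0, 1, 1, 1, 0, 0]
print(phasic(stim))  # activation and deactivation events only
print(tonic(stim))   # sustained response while the stimulus lasts
```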
Keil provides a nice review of insect mechanoreceptors [43]. Four types of mechanoreceptors, namely trichoid sensilla, campaniform sensilla, stretch receptors and chordotonal organs, are defined as follows:

• Trichoid sensilla (i.e., hair sensilla) are probably the simplest and most common mechanoreceptors, being found in almost all body parts. They typically have a phasic response and are used to detect touch and air movements, including sound. They are also found as hair beds or hair plates (clusters of tactile setae), often behind the head, on the legs, or near joints, where they respond to movements of the body.

• Campaniform sensilla (from the Latin campana, bell, and forma, shape) are flattened oval discs (Figure 2.7) found in many locations, especially on the appendages near articulated joints (e.g., the base of the wings, the halteres, etc.) (Figure 2.8) [64], [79], [37]. They are mostly found in clusters, act as a unit and almost always have a proprioceptive function (i.e., providing sensory input about the position or orientation of the body and its appendages). They respond to cuticle deformation and act as biological strain gauges. Modeling and simulation of campaniform sensilla are reported in [77] and [12].

• Stretch receptors are found throughout the body wherever muscle control is required (e.g., mouth parts, foregut and hindgut; there are also dorsal, ventral and longitudinal stretch receptors in the thorax and abdomen, and stretch receptors are found in hinge regions, like the base of the wing and legs).

• Chordotonal organs are subcuticular sensilla (i.e., with no external sign of their presence), composed of many sensilla called scolopidia. They are mainly very sensitive to vibration.

These various sensing organs can be seen as the building blocks of the sensing mechanisms or sensing systems in which engineers are interested. Halteres of the group of flying insects known as Diptera are a good example of such systems.
Each haltere contains ~400 mechanoreceptors [37], mainly campaniform sensilla and chordotonal organs, at its base (see Figures 2.8a and 2.8b). The halteres evolved from the hind wings in insects belonging to the group Diptera (Figure 2.8c).

Figure 2.7: Campaniform sensilla. (a) A group of campaniform sensilla on a cockroach leg. Reprinted from [97]. (b) The structure of a single campaniform sensillum, showing the cap covering the neuron's dendrite. Redrawn from [53].

Derham [16] was the first to note that insects cannot stay aloft immediately after their halteres are removed. Subsequent discussions about how halteres work and what they measure are found in [35], [34], [65] and [30]. It is known that although one haltere is blind to rotations around the axis of its oscillation [59], [60], [39], roll turns can still be sensed because the two halteres are not coplanar. Pringle [66] proposed a bilateral phase comparison for pitch-roll discrimination, but the fact that flies with only one haltere removed can fly almost normally (as discussed by Nalbach [59]) indicates that it is theoretically possible for a haltere system to have three measuring axes unilaterally. Therefore, the Dipteran haltere system can detect rotations about all three axes [30]. The campaniform sensilla and chordotonal organs at the base of the halteres encode the strain generated by Coriolis forces during rotational movements. Therefore, the haltere system is a means of maintaining equilibrium during flight. A comparison between visual and haltere-mediated reflexes reveals that the visual system is tuned to relatively slow rotation, whereas the haltere-mediated response to mechanical rotation increases with rising angular velocity [74]. Additionally, haltere mechanoreceptors even provide synaptic input to neurons of the wing steering muscle [31], [84].
It is worth mentioning that Fraenkel and Pringle [35] noted that such specimens become almost indistinguishable from normal individuals in this respect after one day, an example of redundancy in sensing.

Figure 2.8: A closer look at the haltere of an insect, the angular-acceleration feedback sensor. (a) Field of campaniform sensilla on the base of the left haltere of Calliphora. The plane of oscillation is toward the observer; arrows indicate the axis of movement (×300). (b) Rows of campaniform sensilla on the haltere. The caps are hidden beneath the semicircular collars (×1300). (a) and (b) reprinted from [43]. (c) Left haltere from above. Most of the mass of the haltere is in the knob (left). The base houses more than 300 strain receptors. Modified from [39].

Biomimetic Sensors

The term biomimetic devices can cover a wide range of human-made, biologically inspired systems and tools. It is important to note that a system can be biomimetic in design, manufacture or both, and that what is considered biomimetic depends on how the word is defined. Almost every sensor could have a biomimetic inspiration in design or manufacture.

In biomimetic sensor design, copying the existing morphological and/or functional characteristics of biological systems may not always be desired. Instead, the goal is to draw inspirational concepts from nature that apply to the engineering domain. Research in biomimetic robotics is necessarily multidisciplinary, requiring knowledge from various engineering fields (typically electrical, mechanical, materials and computer engineering) as well as various biological sciences (neuroscience, locomotion). MEMS and BioMEMS offer the possibility of new approaches to the design and fabrication of miniature sensors.
On the other hand, tissue engineering, which may be viewed as a form of biomimetic manufacturing, aims to grow biological systems in a physiological environment.

As mentioned before, campaniform sensilla work as biological strain gauges. By sensing strains on the wing base, for instance, these sensilla permit the insect to feel the wing position and orientation by encoding cuticle deformations. Modeling and simulation of campaniform sensilla have been studied in [77] and [12]. Wood and Fearing [89] discussed the use of strain gauges for flight-force measurement for a micromechanical flying insect (MFI). They also analyzed placement of the gauges on different body parts.

The halteres of an insect detect rotations via gyroscopic forces and play a fundamental role in attitude stabilization in insects.

Figure 2.9: A biomimetic haltere. (a) Biomimetic haltere schematic diagram. (b) Completed haltere with strain sensor. Modified from [91].

The simulation and fabrication of this type of biologically inspired angular-rate sensor, for use on an MAV, is described in [91] (Figure 2.9). The authors also discussed the use of halteres to reduce body oscillations due to flapping by phase-locking to the wing. There are several advantages to using halteres instead of MEMS gyroscopes as angular-rate sensors, such as lower power consumption and larger dynamic range.

A biomimetic hair-like sensor is described in [4] for use in an underwater lobster robot. A biomimetic lateral-line flow sensor, based on an artificial hair cell (AHC), has also been developed at the University of Illinois. Fan et al. discussed the development of the fabrication process for the lateral-line flow sensor using bulk micromachining technology [29]. Other interesting biomimetic designs from the flight-force-measurement point of view include the biomimetic flow and contact/bending MEMS sensor [50] and moving antennae [3] for a biomimetic lobster robot.
The latter, based on strain gauges, serves as a touch, collision-detection and water-motion sensor. Also, [3] described the use of MEMS accelerometers as a biomimetic implementation of lobsters' statocysts (the organs sensing orientation and gravity in some plants and invertebrates).

Comparing biological sensory systems in insects to human-made sensors reveals the great gap between them. Little work has been done on biomimetic sensor design and fabrication for microflight, considering the great need for miniature, reliable, robust and cheap sensors.

Purely Engineering Sensors

As human technology advanced in a very different way from what nature presents, engineers came up with their own solutions to problems nature had already solved. Miniature sensors are part of the human attempt to meet the need for compact sensors with specific properties. Conventional technologies for microfabrication of these miniature sensors are silicon bulk micromachining and silicon surface micromachining. Nonlinearity, hysteresis and thermal sensitivity are among the problems in the design and application of miniature sensors. For overviews of force sensing in microrobotics and of microfabricated force and position sensors, see [28] and [13], respectively.

Figure 2.10: Schematic of a piezoresistive pressure sensor.

Based on the property employed to sense the force/pressure, miniature sensors can be roughly divided into the following six categories:

• Piezoresistive force/pressure sensors: Most piezoresistive force/pressure sensors utilize metal-foil strain gauges or semiconductor strain gauges whose resistance varies with stress (Figure 2.10). Since the resistance also varies with temperature, four piezoresistors are often connected in a bridge configuration (either quarter, half or full) for thermal compensation.
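A small numerical sketch shows why the bridge configuration compensates for temperature (the resistor and drift values are assumed for illustration): the strain-induced imbalance appears at the output, while a uniform thermal change of all four arms cancels in the ratio.

```python
# Quarter-bridge Wheatstone sketch (assumed values): one active strain gauge
# R0*(1 + x) and three fixed arms R0, excited by voltage v. A uniform thermal
# drift multiplies every arm by the same factor and cancels in the ratio,
# which is the point of the bridge configuration.

def bridge_output(v, r1, r2, r3, r4):
    """Differential output of a Wheatstone bridge with arms r1..r4."""
    return v * (r2 / (r1 + r2) - r4 / (r3 + r4))

R0, v, x = 350.0, 5.0, 1e-3     # ohms, volts, fractional gauge change
out = bridge_output(v, R0, R0 * (1 + x), R0, R0)

# Uniform temperature drift: every arm changes by the same factor.
drift = 1.02
out_hot = bridge_output(v, R0 * drift, R0 * (1 + x) * drift,
                        R0 * drift, R0 * drift)

print(out)                 # small signal proportional to strain
print(abs(out - out_hot))  # ~0: common-mode temperature change cancels
```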
The design, fabrication and characterization of a micromechanical piezoresistive force sensor is presented in [67]. A piezoresistive cantilever system is considered in [7] in a discussion of optimal design of dynamic force/torque sensors. A simple idea is proposed in [48] for the design of a micro strain gauge with high gain: the authors designed their micro strain gauge with a mechanical amplifier and used beam theory to analyze the new system. The development of a six-axis silicon micro force sensor through a bulk micromachining process is described in [41]. Fabrication of a temperature-compensated dual-beam pressure sensor by surface micromachining is described in [52]. Fundamentally similar to the piezoresistive sensors are many flow sensors [83] that measure lift force in order to calculate flow. One of the interesting piezoresistive sensors for flight-force-measurement applications is the flexible shear-stress sensor skin described in [92].

• Capacitive force/pressure sensors: Two examples of capacitive force/pressure systems are shown in Figure 2.11. In capacitive sensors, deflection of the membrane causes the distance between the two electrodes of the capacitor to change. The change in capacitance is then measured, and the amount of force/pressure is calculated from this change. Despont et al. discussed the design of a micromachined capacitive force sensor [17]. Capacitive sensors, as opposed to their piezoresistive counterparts, have no hysteresis, better long-term stability and are generally more sensitive. However, they require more complex signal processing and have higher production costs.

• Piezoelectric force/pressure sensors: Piezoelectric materials establish a bidirectional connection between their electrical and mechanical domains. In other words, a change in mechanical stress on a piezoelectric material can be detected
Figure 2.11: Silicon-based capacitive pressure sensors. Redrawn from [17] and [73].

by a change in the electric field and, hence, the voltage across the material. The converse is also true, making it possible to use these materials as both actuators and sensors. Campolo et al. discussed the use of piezoelectric actuators with embedded piezoelectric sensors for the MFI [9]. Piezoelectric actuators that also work as sensors are becoming increasingly appealing due to their high bandwidth, high output force, compact size and high power density.

• Piezomagnetic force/pressure sensors: Similar to piezoelectric materials, piezomagnetic materials establish a coupling between their two domains. In the presence of a magnetic field, they undergo a dimensional change (Joule effect). The application of mechanical strain may change the state of magnetization within the sample (Villari effect), which is usually detected as a change in permeability. There are a number of basic limiting characteristics that a high-performance piezomagnetic material should have. Gibbs et al. studied the development stages of piezomagnetic materials and the evaluation of their performance in microelectromechanical systems [36]. The development of a highly sensitive strain sensor based on magnetoelectronic devices is given in [49].

• Resonant force/pressure sensors: Resonant force/pressure sensors detect force/pressure on the membrane by measuring the change in the resonant frequency of the membrane. The main advantage of using resonators is that the measured value, in the form of a frequency, provides a higher SNR. Also, the signal can be digitally processed. The nonlinearity and hysteresis effects of resonant strain gauges are studied in [38]. Beeby et al.
detailed the design and a new fabrication process for a dynamically balanced resonator [6]. In their work, the resonator receives electrostatic excitation and the vibration is monitored using the piezoresistive effect in the resonator itself.

• Optical force/pressure sensors: These sensors use changes in optical properties to detect force/pressure. Two types of well-known optical force/pressure sensors are described.

  - Mach-Zehnder interferometer: This method is based on the fact that a modulated light beam, in the presence of force/pressure, has a different propagation speed than a reference light beam. Here, laser light is brought into the device and is split between two waveguides; one channel serves as the reference signal. The photodiode at the end of the channels detects the phase shift, which is then translated into force/pressure [63].

  - Laser Raman spectrophotometer: This method of force sensing is presented in [1]. It is based on the fact that, when light strikes a solid object, the peak frequency and bandwidth of the scattered light (known as Raman scattering) provide quantitative information about the internal stress of the target. When force is applied, the Raman scattering frequency is decreased or increased by the stress.

2.3.3 Sensor-rich Paradigm

Although agent and environment are defined in §2.2.3, it is necessary to further differentiate the layers between the external world and the agent by introducing another boundary:

• Agent: the brain, the decision-making component of the robot, as explained in §2.2.3.

• Environment: whatever is external to the agent. It consists of whatever is outside the agent's absolute control. Therefore, mechanical parts of the MAV are usually considered to be part of the environment.

• Robot: includes all the mechanical parts, their links and all sensory and actuation parts. It therefore comprises the agent and part of the environment.
• External world: whatever is external to the robot. It consists of whatever is outside the robot's absolute control. In control engineering, this term is often used to define the environment.

The reader is advised to differentiate these terms as defined. Figure 2.12 illustrates the agent-environment and the robot-external world boundaries, as well as a schematic of the sensors by which the robot perceives the external world. As can be seen, the sensing paradigm should be implemented in between these two boundaries to provide highly informative signals for the agent from these sensor readings.

The complexities seen in an environment are mostly due to our limited perception/understanding of such environments. However, these complications might be considered simple by a more perceptive and understanding entity. The main reason is that the latter can correlate the facts using its superior perception of the environment, whereas the former views these facts as uncertainties. Although a part of this perception is not achievable in practice simply because of the limitations of our material world (e.g., identifying a dark body on a dark night with the naked eye), a part of it is attainable by better sensing of the environment (e.g., wearing thermal or night-vision goggles). Here is an example:

Figure 2.12: The agent-environment and the robot-external world boundaries. The figure shows that the robot perceives the external world by sensor elements e_1 - e_N. The sensing paradigm should be implemented in between these two boundaries to provide highly informative signals (s,r)_1 - (s,r)_M for the agent.

Example 2.2. Consider robots A and B, both identical fliers, except that robot B has an added antenna mechanism to sense wind gusts.
Suppose both agents have the ability, under our framework of §2.2, to build associations and conditions between causes and effects. In wind gusts, robot A receives changes in effects (e.g., oscillations in flight) without any changes in causes (i.e., the same sensor readings) and therefore labels the environment as acting randomly. On the other hand, robot B is able to distinguish this new state by incorporating its antenna sensor readings. Therefore, robot B is looking at a simpler class of problem, although still unknown (i.e., not identified), with fewer ambiguous states than robot A perceives.

As can be inferred, there is a relationship between the amount of information delivered by the sensors and the degree of simplicity of the problem viewed by the robot. Let us assume the robot has N independent sensor elements that provide N pieces of information at each time step t. The sensing input, U_t, is the information received (i.e., felt) by the robot at time step t (Figure 2.12), and can be written as:

    U_t = [f_1  f_2  f_3  ...  f_N]^T                                   (2.16)

This vector over time is the only input to the robot, and therefore the robot's perception and understanding depend solely on the length of this vector. A subset of this vector identifies the current state, and a subset is used to reward the agent. Better perception of the environment not only simplifies the problem, but also potentially provides highly informative signals to better reward the agent.

In the sensor-rich paradigm, the amount of information received by the agent and the robot is the same. However, the robot acquires this from the external world through many distributed sensor elements, whereas the agent receives a few highly informative signals. This abstraction is done in a sensor-rich paradigm by networking the distributed sensor
The presence of many sensor elements, on the other hand, implies a more complex state space, which is not desired. This complexity, however, is at the robot level and not at the agent level. That is, in a sensor-rich paradigm much like what we have been looking at in biological insects, the agent has simple states and the robot has complex states. Table 2.2 summarizes the comparison between robot and agent in a sensor-rich paradigm.  2.4  Summary and Discussion  Reinforcement learning has been proposed as a framework to address the flight-control problem of MAVs. Pros and cons of this approach have been explored and compared to those of other methods. The unsteady aerodynamics problem has been shown to be a subset of the R L class of problems, confirming the applicability of this approach in theory. A compatible sensing approach has been introduced more in the form of a paradigm rather than a framework to complete the discussion. In analyzing scaling issues, it was found that as the sensor size is reduced, some properties become more significant than others. Operational power density is inversely proportional to the square of the scaling ratio (i.e., oc  resulting in reduced static  stability of the system. The responsiveness of the sensor to transient thermal variations is increased and therefore to keep the sensor in a thermally stable region we required a bound on the growth of power dissipation in the sensor. Among the advantages of scaling, we explained how sensor resolution is increased proportional to the square of the scaling ratio (i.e., a p ) . Also, although miniaturization implies reduced structural stiffness, the overall bandwidth of the sensory system would increase with a decreasing scaling factor (i.e., ex ^). A survey of possible insect sensory structures was provided. Analysis of these purely biological, biomimetic and engineering viewpoints is a rich source of inspiration for the design and development of microsensory systems. 
As noted, not much work has  2.4 Summary and Discussion  Robot Interacts with: external world Complex states Receives numerous simple data networked to generate many feedback loops and to provide highly informative signals  48  Agent Interacts with: environment Simple states Receives a few highly informative signals  Table 2.2: Comparison between robot and agent in a sensor-rich paradigm. been done in miniature biomimetic sensor design for microflight, since the problem of building an autonomous MAV is still in the early stage of understanding the napping flight mechanisms. A great lesson from nature is redundancy. In biological systems, there are many instances of redundancy in the form of existence of duplicate sensors or existence of the same sensor but with a completely different mechanism. Different sensors with partly overlapping ranges make for more reliable sensory information (e.g., haltere and visual systems in maintaining balanced flight in some flying insects). Together with the distributed property of biological systems, they make biological systems very robust. The direction of biorobotics should lean toward finding simpler solutions for problems and preventing overkill in system design as biological systems suggest. Simple reaction-based mechanisms seem to work better. This suggests avoiding ad hoc devices that act as a single unit in favor of distributed adaptive systems with many small simple elements. Moreover, making use of the environment itself and benefiting from the properties of the physical world should be employed (i.e., intuitiveness) much like the way the shape of the inner ear helps in recognizing individual frequencies. The RL framework and sensor-rich paradigm discussed in this chapter conform with the sensor-rich feedback control paradigm suggested by Zbikowski for microflight [95]. Such conformity, itself, shows the promise of this approach and reinforces employing a biomimetic solution.  
Chapter 3

An RL Approach to Lift Generation in Flapping MAVs

In this chapter, we employ the RL framework described in the previous chapter (§2.2) to solve the basic problem of lift generation for microflight, deriving a wing trajectory that maximizes lift by means of learning from reinforcements. This motion will then be compared to the flapping motion of a biological insect. It is important to note that no flapping motion is assumed. Nevertheless, the low-level controller described here ultimately achieves flapping motion by means of learning. This should be contrasted with an approach in which a flapping motion is assumed but parameterized by a few values that need to be adjusted (e.g., [71]).

3.1 Algorithm

Although complete observability is necessary for algorithms based on MDPs, in real-world applications it is often not possible for the agent to completely perceive its environment, due, for example, to noisy, incomplete and error-prone observations. This problem is referred to as the problem of "hidden" states, and the resulting formal model for solving partially observable problems is called the partially observable MDP, or simply POMDP [15].
The agent should learn to decide which actions are profitable based on the cumulative reward, which can take place arbitrarily far in the future. In our opinion, this process can play a fundamental role in learning the unsteady mechanisms of flapping flight, which are theoretically based on the underlying unsteady mechanisms of vortex generation and shedding.

Before tackling how an agent should learn to behave optimally in an environment, we should discuss what the optimal model of behavior is. This is similar, in concept, to determining a cost function for an optimization algorithm. In RL, the real cost function (i.e., the reward calculation unit) is in the environment and may not be available. However, by collecting information about this function over time through reinforcement signals, the agent is able to build an estimate of the utility of a policy. The agent will then choose the immediate action in the direction of the selected policy. In this estimation the agent considers not only the immediate reward but also a window of future rewards. The two best-known cases are:

• Finite-Horizon: In the finite-horizon model the agent optimizes its expected reward for the next h steps. That is:

E{ Σ_{t=0}^{h} γ_t r_t },   (3.1)

where γ_t ∈ [0, 1] is the weighting parameter.

• Infinite-Horizon: In the infinite-horizon model, however, the agent takes into account the long-term reward. That is:

E{ Σ_{t=0}^{∞} γ_t r_t }.   (3.2)

However, the rewards that are received in the future are geometrically discounted by a discount factor γ, that is, γ_t = γ^t. Mathematically, in this case, we require γ ∈ [0, 1).

As stated earlier, the unsteady flapping flight problem is best approximated as slowly-varying non-stationary, deterministic and kth-order Markovian. We chose the infinite-horizon case as the model of optimality.
Therefore, the suitable algorithm for our problem is a discounted, infinite-horizon algorithm that can learn to behave optimally in a slowly-varying non-stationary, deterministic and kth-order Markovian environment. The effective algorithms that address non-Markovian problems are those that use memory elements to disambiguate the current state. Among the most effective algorithms with internal states for this case are "Recurrent Q-learning" and POMDP-based algorithms, i.e., those using Hidden Markov Model (HMM) techniques. If a non-Markovian problem is at least kth-order Markovian with k small, some memoryless techniques that are simpler and computationally less expensive might work. However, full observability is a requirement for algorithms based on MDPs. In cases where this assumption cannot be made, convergence cannot be guaranteed analytically. For a good survey of RL algorithms, refer to [42].

The algorithm used in this chapter is the memoryless Q-learning algorithm developed for Markovian problems [86]. Three reasons motivated this pragmatic choice. First, in an unsteady aerodynamic model, the aerodynamic forces generated on the wings rely most heavily on the current and recent states. Therefore, the unsteady aerodynamics problem faced by the agent is not far from being Markovian¹, and Q-learning is known to handle small breaches of the Markov requirement very well [42]. Second, given the limited choice of aerodynamic models, the one assumed for simulation purposes is quasi-steady and not even unsteady. Third, Q-learning is computationally less expensive than algorithms with internal states and relatively easier to implement.

¹ In a more analytical sense, this means that the unsteady aerodynamics problem can be approximated by a kth-order Markovian model with k being small.
For these reasons, the theoretic choices of algorithms mentioned above should only be considered after possible pragmatic failure of the simpler algorithms. However, as stated, an important disadvantage is that there is no analytical proof of the convergence of Q-learning on non-Markovian problems. The description of the Q-learning algorithm follows.

3.1.1 Q-learning

Let the world state at time step t be s_t, and assume that the agent chooses action a_t. The immediate result is that a reward r_{t+1} is received by the agent, and the world undergoes a transition to the next state s_{t+1} (refer to Figure 2.2). The agent tries to choose actions maximizing discounted cumulative rewards over time. More precisely, let γ be a specified discount factor. The total discounted reward received by the agent starting at time t is given by:

r_d(t) = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ⋯ + γ^{n−1} r_{t+n} + ⋯   (3.3)

The objective is to find a policy π, or a rule for selecting actions, such that the expected value of the return is maximized. For any such policy π, and for any state s, the estimate of the value of the state s can be expressed as:

V^π(s) = E{ r_d(0) | s_0 = s; a_i = π(s_i) for all i ≥ 0 },   (3.4)

which is equal to the expected total discounted return received when starting in state s and following policy π thereafter. If π is an optimal policy we also use the notation V* for V^π. Many dynamic programming-based RL methods involve trying to estimate the state values V*(s) or V^π(s) for a fixed policy π.
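As a small numerical illustration of (3.3), the discounted return can be computed directly from a sequence of observed rewards. This sketch is ours, not part of the thesis implementation:

```python
def discounted_return(rewards, gamma):
    """Eq (3.3): r_d(t) = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ...

    `rewards` lists the rewards observed after time t, in order."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# A constant unit reward approaches the geometric limit 1/(1 - gamma):
near_limit = discounted_return([1.0] * 1000, gamma=0.9)
```

With γ = 0.9 the long-run value of a constant unit reward approaches 1/(1 − γ) = 10, which is why distant rewards contribute less and less to the agent's decision.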
Formally, a Q-value can be assigned to each state s and action a, such that Q : S x A i—> 3?, and: Q*(s,a) =  J  B{r +7^*(si)|so = s;a = a} 1  0  = R(s,a)+jJ2 ss'^ *( ">> P  V  s  s'  (3.5)  (-) 3  6  where R(s, a) = E{ri\so — s; an = a} and P >(a) is the probability of reaching state ss  s' as a result of taking action a in state s. It follows that:  V*(s) = maxQ*(s,a). a  (3.7)  Intuitively, (3.6) suggests that the state-action value, Q*(s,a), is the expected total discounted return resulting from taking action a in state s and continuing with the optimal policy thereafter. More generally, the Q-function can be defined with respect to an arbitrary policy n as:  Q«(s, a) = R(s, a) + ^ P s' (a)V"V), 7  S  (3-8)  8'  and Q* is just Q for an optimal policy re. n  The Q-learning algorithm works by maintaining an estimate of the Q* function, which we denote by Q*, and adjusting Q* values (i.e., Q-values) based on actions taken and rewards received. This is done using Sutton's prediction difference, or TD error; that is, the difference between the immediate reward received plus the discounted value of the next state and the Q-value of the current state-action pair: TD error = r + -yV*(s') -  Q*(s,a),  (3.9)  3.1 Algorithm  54  where r is the immediate reward, s' is the next state resulting from taking action a in state s, and V*(s') = nmxQ*(s,a). Then the values of Q* are updated according to: Q*{s,a) := (l-\)Q*(s,a)  + \{r + V*(s')), 7  (3.10)  where A € (0 1] is a learning-rate parameter . Note that the current estimate of the 2  Q* function implicitly defines a greedy policy by 7r(s) = argmaxQ*(s,a). That is, the greedy policy is to select actions with the largest estimated Q-value. It is important to note that the Q-learning method does not specify what actions the agent should take at each state as it updates its estimates. This means that Qlearning allows arbitrary experimentation while at the same time preserving the current best estimate of states' values. 
This is possible because Q-learning constructs a value function on the state-action space, instead of the state space. Furthermore, since this function is updated according to the optimal choice of action at the following state, it does not matter what action is actually followed at that state. For this reason, Q-learning is said to be "experimentation-insensitive" [42]. To eventually find the optimal Q-function, however, the agent must try out each action in every state many times. It has been shown [86] [14] that if (3.10) is applied in any order to update each state-action pair's Q-value iteratively where the number of iterations approached infinity, then Q* will converge to Q* and V* will converge to V* with probability 1 as long as A is reduced to 0 at a suitable rate. This is why the policy 7r(s) = argmaxQ*(s,a) is used part of the time, in order to be able to explore the state-action space completely. Parameter e controls this process, as described later. The Q-learning algorithm for one episode of learning can be written as (pseudocode modified from [82]): In RL, learning rate is commonly expressed by a. Here we used A to make a distinction from a, the angle of attack used later in this chapter. 2  3.1 Algorithm  55  A l g o r i t h m 3.1: Q-Learning 1  Initialize Q(s,a) arbitrarily  2  Initialize s  3 4,  repeat (for each step in episode) Choose a from s using policy derived from Q (e.g., £-greedy) Take action a , observe r, s' Q(s,a) 4= (1 - X)Q(s,a) + \(r + 7 max Q(s',a'))  5  6 7 8.  until episode terminal condition reached  The following parameters are considered higher-level signals that control the process of learning and therefore, if not hard-coded, must be externally provided in real-time to the agent: • £ £ [ 0 1 ] : exploration rate, • 7 € [0 1): discount factor, and • A € (0 1]: learning rate. At each time step t, we choose to make either a random action or the optimal action at = Tr(st). 
We will take the first possibility (the random action) with probability ε. Therefore, ε identifies the threshold between exploitation and exploration. At the beginning of a learning episode, ε should be set at or near 1 to allow sufficient exploration. At the end, however, ε is set to 0 to ensure following the optimal policy. This is true only for stationary environments. In non-stationary environments, exploration should continue to take place at all times, in order to take notice of changes in the environment.

The discount factor, γ, is used to weight near-term reinforcements more heavily than distant future reinforcements. The closer γ is to 1, the greater the weight given to future reinforcements. γ can be set to a small value at the beginning and increased to reach its final value as the optimal policy is found. Because the agent becomes more experienced over time, it can rely more heavily on its estimates of receiving certain future reinforcement, given a state-action pair. λ is the learning-rate parameter and should be greater than 0 during the learning process, to allow Q-values to be updated by (3.10).

3.2 Simulation

The primary goal of this section is to introduce the first step in employing the RL framework for flight control in flapping-wing MAVs. The computer simulation presented here demonstrates the application of a Q-learning RL technique to a quasi-steady aerodynamic model, resulting in successful convergence of the wing trajectory to a flapping motion. Although the adequacy of such a model in capturing complex qualitative behavior is questionable, this does not diminish the importance of the simulation procedure and results presented in this section in finding suitable algorithms and developing the conditions needed to carry out physical-model experiments.
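Before turning to the aerodynamic model, the update loop of Algorithm 3.1 with ε-greedy action selection can be sketched as follows. The toy chain environment and all numeric values here are illustrative assumptions that stand in for the wing environment, not the thesis implementation:

```python
import random

random.seed(0)  # fixed seed so this sketch is reproducible

def q_learning(step, actions, q, eps, gamma, lam, s0, n_steps):
    """Tabular Q-learning (Algorithm 3.1) with epsilon-greedy selection.

    `step(s, a) -> (r, s_next)` stands in for the environment."""
    s = s0
    for _ in range(n_steps):
        if random.random() < eps:                         # explore
            a = random.choice(actions)
        else:                                             # exploit greedily
            a = max(actions, key=lambda a_: q[(s, a_)])
        r, s_next = step(s, a)
        best_next = max(q[(s_next, a_)] for a_ in actions)
        q[(s, a)] = (1 - lam) * q[(s, a)] + lam * (r + gamma * best_next)
        s = s_next
    return q

# Toy chain world: states 0..2; only reaching state 2 yields reward.
def step(s, a):
    s_next = min(2, max(0, s + a))
    return (1.0 if s_next == 2 else 0.0), s_next

actions = [-1, 1]
q = {(s, a): 0.0 for s in range(3) for a in actions}
q = q_learning(step, actions, q, eps=0.2, gamma=0.9, lam=0.5, s0=0,
               n_steps=2000)
```

After training, the greedy policy argmax_a Q(s, a) moves right toward the rewarding state from every state, even though the reward is delayed, which is exactly the behavior the flapping-wing agent relies on.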
3.2.1 Quasi-steady World Model

The aerodynamics of flapping flight was simulated by a modified version of the 2D quasi-steady aerodynamic model described by Schenato et al. [72]. The model combines analytical models of the delayed stall and rotational lift mechanisms of flapping flight with empirically matched data from a dynamically scaled model [20] of a typical Drosophila at low Reynolds numbers (Re ≈ 136).

Figure 3.1: Force decomposition of the aerodynamic force, F_aero. F_N, F_T are, respectively, the normal and tangential force components w.r.t. the wing. F_L, F_D are the lift and drag force components, respectively, w.r.t. the absolute frame.

Figure 3.2: Isometric view of a flapping wing. The figure shows the left wing in an upstroke, assuming the insect body lies along the x axis facing the positive direction. The small circle on the wing chord indicates the leading edge. For the simulation, a horizontal stroke plane is assumed. φ is the stroke variable, defined as the angle between the wing's chordwise axis of rotation and the y axis; α is the angle between the chord and the stroke plane.

According to the quasi-steady approach, the total force on a wing is computed by dividing the wing into infinitesimal blades and integrating the forces along the wing, that is, F_net(t) = ∫₀^R F(t, r) dr. The normal and tangential components of the translational and rotational forces are depicted in Figure 3.1. Rotational forces are purely pressure forces and act only perpendicular to the wing surface. Therefore, the three forces can be written as:

F_tr,N(t) = ½ C_N(α(t)) ρ R c U_cp²(t),   (3.11)
F_tr,T(t) = ½ C_T(α(t)) ρ R c U_cp²(t),   (3.12)
F_rot,N(t) = C_rot ρ R c² U_cp(t) α̇(t),   (3.13)

where α and φ are the angle of attack and stroke angle, respectively, as shown in Figure
3.2. C_N(α), C_T(α) and C_rot are the normal, tangential and rotational force coefficients, respectively, fitted to empirical data from the scaled model [20], and are shown in Figure 3.3.

Figure 3.3: Force coefficients of the aerodynamic forces. C_N and C_T are the normal and tangential force coefficients, respectively, based on data extracted from experiments in [20]. C_N and C_T vary with α, and a good fit to the data is C_N(α) = 3.4 sin α for all α, and C_T(α) = 0.4 cos²(2α) if 0° ≤ α ≤ 45° and C_T(α) = 0 if 45° < α ≤ 90°. C_rot is the coefficient of the rotational force. C_rot is independent of α, and from thin airfoil theory can be expressed as C_rot = 2π(¾ − x̂₀), where x̂₀ is the non-dimensional distance between the rotational axis and the leading edge.

ρ is the density of the fluid (air), R is the wing length, and c is the mean wing chord width. U_cp is the velocity of the wing's center of pressure:

U_cp(t) = r̂₂ R φ̇(t),   (3.14)

where r̂₂ is the non-dimensional center of pressure. Finally, the total lift, F_L(t), and drag, F_D(t), acting on the wing in Figure 3.1 can be derived by a trigonometric transformation, as follows:

F_N(t) = F_tr,N(t) + F_rot,N(t),   (3.15)
F_T(t) = F_tr,T(t),   (3.16)
F_L(t) = F_N(t) cos α(t) − F_T(t) sin α(t),   (3.17)
F_D(t) = F_N(t) sin α(t) + F_T(t) cos α(t).   (3.18)

These equations are used throughout this section to simulate the environment for the Q-learner agent.

Figure 3.4: Graph of ε_t, γ_t and λ_t used in the simulation. The simulation was carried out using λ = 1 and with ε and γ varying according to equations (3.19) and (3.20), respectively.
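A minimal sketch of the quasi-steady force computation of equations (3.11), (3.12), (3.14), (3.17) and (3.18), using the coefficient fits of Figure 3.3 and the morphology of Table 3.1, might look as follows. The rotational term (3.13) is left out for brevity, and the function names are ours:

```python
from math import cos, pi, radians, sin

# Morphology of a typical Drosophila (Table 3.1), in SI units.
RHO = 1.29        # air density, kg/m^3
R = 2.47e-3       # wing length, m
C = 1.60e-3       # mean wing chord, m
R2_HAT = 0.35     # non-dimensional second moment of wing area

def c_n(alpha):
    """Normal force coefficient fit of Figure 3.3: C_N(a) = 3.4 sin a."""
    return 3.4 * sin(alpha)

def c_t(alpha):
    """Tangential fit: C_T(a) = 0.4 cos^2(2a) for 0-45 deg, else 0."""
    return 0.4 * cos(2 * alpha) ** 2 if alpha <= pi / 4 else 0.0

def translational_lift_drag(alpha, phi_dot):
    """Translational part of the model; angles in rad, rates in rad/s."""
    u_cp = R2_HAT * R * phi_dot                  # eq (3.14)
    common = 0.5 * RHO * R * C * u_cp ** 2       # shared factor of (3.11)/(3.12)
    f_n = c_n(alpha) * common                    # normal component
    f_t = c_t(alpha) * common                    # tangential component
    lift = f_n * cos(alpha) - f_t * sin(alpha)   # eq (3.17)
    drag = f_n * sin(alpha) + f_t * cos(alpha)   # eq (3.18)
    return lift, drag

# Mid-stroke snapshot at 45 deg angle of attack, 200 Hz-scale stroke rate:
lift, drag = translational_lift_drag(radians(45.0), phi_dot=2 * pi * 200.0)
```

At zero angle of attack the normal force vanishes and only drag remains, while near 45° the translational lift peaks, matching the coefficient curves of Figure 3.3.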
As can be seen from the figure, the exploration rate ε is decreased and the discount factor γ is increased with the time step. The former implies decreased exploration near the end of the simulation, and the latter indicates increased reliance on expected future reinforcements as the agent becomes more "experienced".

3.2.2 RL Implementation

Figure 3.4 shows how ε, γ and λ change over time. ε has been chosen to change according to:

ε_{t+1} = ε_t / (1 + 1.2 × 10⁻⁵).   (3.19)

This is particularly useful for determining whether the robot is gradually learning the movement. γ has been set to vary according to the exploration rate, ε, by the following equation:

γ_t = 0.95 − 0.3 ε_t.   (3.20)

λ has been set to 1 during the learning process. Morphological and kinematic parameters have been chosen similar to those typical of Drosophila, as shown in Table 3.1.

Symbol | Parameter | Value
R | Wing length (mm) | 2.47
c | Mean wing chord length (mm) | 1.60
r̂₂(S) | Non-dimensional second moment of wing area | 0.35
Φ | Stroke amplitude (deg) | 160
x̂₀ | Non-dimensional chordwise axis of rotation from leading edge | 0.25
ρ | Air density (kg m⁻³) | 1.29

Data from [45] and [46].
Table 3.1: Simulation parameters for a typical Drosophila.

The states and actions are defined for a 2-DOF wing motion as:

s = [φ  α  φ̇  α̇]ᵀ,   a = [Δφ̇  Δα̇]ᵀ,   (3.21)

where φ̇ and α̇ are the velocities of the joints. At each time step the agent can control its speed on each degree of freedom independently by increasing, decreasing or keeping the same speed, that is, Δφ̇, Δα̇ ∈ {−δv, 0, +δv}. Therefore, at each time step, the agent has a set of 9 possible actions to choose from. The stroke angle, φ, can vary between −80° and 80°, giving the maximum stroke amplitude of Φ = 160° (Table 3.1). The angle of attack, α, can vary between 0° and 180°. Both variables are discretized in steps of 5°. The boundary conditions on φ and α have not been checked manually; instead, a biomimetic concept is implemented.
If the motion of the wing requires a joint to go beyond its range, a discontinuity in acceleration will stop the joint at the end of the range, causing the velocity to drop suddenly to zero. This situation returns a large negative reinforcement (i.e., a "jerk") to represent a possible feeling of "pain" at the wing joint (i.e., r = −R_max). Otherwise, the reinforcement function is the instantaneous lift force (i.e., F_L(s_t, a_t)), as we are trying to maximize the lift force during this process. That is,

r_{t+1} = R(s_t, a_t) = { −R_max, if |φ| > threshold_φ;  F_L(s_t, a_t), otherwise.   (3.22)

It can be seen how easy it is, under the RL framework, to modify the implementation for a particular goal. With the same state and action signals, one can build arbitrary goal functions by changing only the reinforcement function. This is because the agent will eventually explore its actuation space to maximize this function over time. For example, in order to minimize the power expenditure of the flapping trajectory, one can define the power requirement as a function of the state transition, and punish the agent by adding a new term to the generated lift with a proper negative weighting. As another example, hovering can be achieved by subtracting the desired hovering lift force (i.e., the MAV weight) and at the same time punishing for the average drag force over a window of time. In general, the reinforcement function, R(s, a), can be written as:

R(s_t, a_t) = W_{1×n} × G_{n×1},   (3.23)

where G = [g₁ g₂ ⋯ g_n]ᵀ is a vector of desired goals and W = [w₁ w₂ ⋯ w_n] is the weighting vector. Obviously, (3.23) is not the only way to incorporate different goals. For example, here we implemented two goals (i.e., lift generation and jerk avoidance) by considering them in two separate cases.
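The case-based reinforcement (3.22) and the weighted multi-goal form (3.23) can be sketched as follows. R_max is an illustrative value; only the ±80° threshold comes from Table 3.1:

```python
R_MAX = 1.0           # jerk punishment magnitude (illustrative value)
PHI_THRESHOLD = 80.0  # stroke-angle range limit, degrees (Table 3.1)

def reward(phi_deg, lift):
    """Eq (3.22): punish an out-of-range 'jerk', otherwise reward lift."""
    return -R_MAX if abs(phi_deg) > PHI_THRESHOLD else lift

def weighted_reward(goals, weights):
    """General weighted form of eq (3.23): R(s, a) = W x G."""
    return sum(w * g for w, g in zip(weights, goals))

in_range = reward(40.0, 5.0e-6)   # lift rewarded inside the range
jerk = reward(85.0, 5.0e-6)       # out-of-range motion punished
```

Swapping in a different `weights` vector changes the agent's goal (e.g., trading lift against power) without touching the state or action signals, which is the flexibility the text describes.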
3.2.3 Evaluation

Without any a priori assumption of the desired trajectory, the agent converged to a smooth flapping motion, as shown in Figures 3.5 and 3.6, much like the flapping motion of biological insects. The agent preferred to accelerate at the beginning of each half stroke and to decelerate at the end, to avoid a sudden stall (or possible injury to the joint) and to generate as much lift as possible to maximize the cumulative reward. As can be seen in Figure 3.5, the rotations initiate before the end of each half stroke, and the duration of each rotation is about Δt_rot ≈ 0.2 of the total stroke cycle. The rotations are also delayed in each stroke, as can be seen from the dashed lines of Figure 3.5.

Figure 3.5: Wing-chord representation of the optimal policy found by the agent maximizing the lift force in simulation. The agent makes use of its full stroke amplitude. The rotations occur before the end of each half stroke, with a duration of Δt_rot ≈ 0.2 of the stroke cycle. The rotations are not symmetric around the end points. The dashed lines indicate the approximate start and end of the rotations and show how the rotations are delayed. The arrows on the wing chord show only the instantaneous lift forces.

The parameter measured to evaluate the convergence of the learning, and to evaluate the learning performance, is the rate at which the agent collects rewards over time. To make the parameter more illustrative, we define the average reward at time step t to be the average of the instantaneous rewards gained over the time window [t − T_w, t], where the length of the window, T_w + 1, is arbitrary. That is,

R̄(s_t, a_t) = ( Σ_{i=t−T_w}^{t} F_L(s_i, a_i) ) / (T_w + 1).   (3.24)

Figure 3.7 shows that the algorithm converges to a mean lift of F̄_L = 5.04 μN in about 300K steps.
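The sliding-window average reward of (3.24) can be sketched as follows (the window length is arbitrary, as in the text; the class name is ours):

```python
from collections import deque

class AverageReward:
    """Sliding-window mean of instantaneous lift rewards, eq (3.24).

    The window holds the last T_w + 1 samples."""

    def __init__(self, t_w):
        self.window = deque(maxlen=t_w + 1)

    def update(self, lift):
        """Append the newest sample and return the windowed mean."""
        self.window.append(lift)
        return sum(self.window) / len(self.window)

avg = AverageReward(t_w=2)
means = [avg.update(x) for x in (1.0, 2.0, 3.0, 4.0)]
```

A rising sequence of these windowed means over an episode is precisely the convergence signal plotted in Figure 3.7.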
A salient point in this graph is that the lift force reached a plateau of F̄_L,max = 5.04 μN before the end of the learning episode, which indicates the locally maximum mean lift that can be generated under the constraints of our simulation. Moreover, from Figure 3.7, we see that the agent minimized the frequency of out-of-range motions, for which it was punished, as described. The optimal policy converged to a stroke length of N_stroke = 25 steps, as can be seen in Figure 3.6. That is, the final flapping motion cycle is completed in 25 time steps.

Figure 3.6: The position and velocity profiles of the optimal policy found in the simulation. Parameter ψ = α − 90° is the rotation angle. The figure shows nearly sinusoidal motion in φ. The discontinuities in acceleration seen in the velocity profile are due to the quantization of velocities in 9 steps.

On the other hand, Q-learning converged in about N_episode = 300K steps. This means, assuming enough computational power, that at the prototype scale of a Drosophila, which beats its wings at about f = 200 Hz, the total time needed to learn a flapping motion producing lift force for the flight of a biomimetic Drosophila would be:

T = N_episode / (N_stroke × f) = (300 × 10³ steps) / (25 steps/stroke × 200 Hz) = 60 s.   (3.25)

However, this only shows the possible applicability of an RL controller for online learning of flapping flight. A number of challenges are involved in implementing RL algorithms in the real world; these are addressed in future work (§4.1). The algorithm took approximately 3 hours to learn the flapping motion presented above. The speed of convergence was not an issue in this work and the program has not been optimized for time.
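The arithmetic of (3.25) is simple enough to check directly (the function name is ours, not the thesis's):

```python
def learning_time_s(n_episode, steps_per_stroke, wingbeat_hz):
    """Eq (3.25): total learning steps converted to strokes, divided by
    the wing-beat frequency, giving wall-clock learning time in seconds."""
    strokes = n_episode / steps_per_stroke
    return strokes / wingbeat_hz

t = learning_time_s(n_episode=300e3, steps_per_stroke=25, wingbeat_hz=200)
```

With the values from the simulation this gives 60 s, i.e., the 300K learning steps correspond to 12,000 wing strokes at 200 Hz.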
In fact, as mentioned earlier, other algorithms with more involved models and implementations exist, at the expense of added complexity, that might behave more efficiently overall. However, the simulation presented above was not meant to provide the best solution, but rather to pave the way toward implementing the RL framework in the real world. Based on the evaluations, it successfully accomplished this goal.

Figure 3.7: Improvement in lift generation and jerk avoidance during the simulation. The graph shows the mean lift increasing with the simulation steps until it reaches the maximum mean lift force of F̄_L = 5.04 μN. An interesting point in the graph is that the lift force found a plateau of F̄_L,max = 5.04 μN before the end of learning, which indicates the locally maximum mean lift that can be generated under the constraints of the simulations. On the other hand, it shows that the agent minimized the frequency of the jerk associated with a motion-range violation.

3.3 Experiment

As mentioned in §1.2, the quasi-steady model used in the simulation may be inadequate for good quantitative force estimations. Thus, the RL algorithm was implemented on a dynamically scaled model to determine its use under true unsteady aerodynamics, and the results show promise [44]. This section is dedicated to the findings of this experiment.

3.3.1 Dynamically Scaled Model

Theory

The theory behind conserving the aerodynamics while scaling the geometric dimensions of a problem is based on the dynamic scaling method commonly used in fluid mechanics studies (the basic concept can be found in a standard textbook on fluid dynamics such as [33]). Sometimes, experimental testing of the actual objects of interest is difficult or impossible. However, a scaled model can be built for such tests, which involves not only geometric scaling, but adjustments to various other parameters in the system.
In the case of insect flight, building a dynamically scaled model accommodates experimental procedures, providing conditions comparable to those of the real insect. For example, changes such as enlarging the wing, beating the wing in a fluid denser than air, and lowering the wing-beat frequency result in larger forces and slower wing beats that are easier to measure. There are also many other complications involved in performing experiments on live insects, despite their natural ability to control wing-beat kinematics.

When such scaling is used, the scaled version is known as the "model" and the actual object of interest is known as the "prototype". In order for experiments on the model to be relevant for the prototype as well, the model must be both geometrically and kinematically similar to the prototype. While the former is ensured by scaling every dimension in the prototype by a single value, the latter is ensured by considering all the pertinent parameters associated with a flapping wing moving through a fluid. These include the velocity of forward motion (V), the density (ρ) and kinematic viscosity (ν) of the fluid, the stroke amplitude (Φ) and frequency of flapping motion (n), and a characteristic length of the system; in this case, the wingspan (b) is used. Using the Buckingham Π theorem, these basic parameters can be organized into independent non-dimensional parameter groups (Π-groups). In this way, only each of the independent Π-groups must be equal to ensure that the results are comparable, providing a bridge for converting measured values from the model into predicted values appropriate to the prototype. In order to meaningfully extrapolate results from the model to the prototype for our experiments, given the basic parameters that have been identified, the theorem tells us that two independent non-dimensional groups must be matched.
The natural ones to use for flapping flight are the Reynolds number (Re) and Strouhal number (St):

b_p V_p / ν_p = b_m V_m / ν_m,   (3.26)

and

n_p b_p / V_p = n_m b_m / V_m,   (3.27)

where the subscripts identify either the prototype or the model. For example, given that the lift force (F_L) can be expressed as a function of the basic parameters mentioned above,

F_L = f₁(V, ρ, ν, Φ, n, b),   (3.28)

all the basic parameters in this system can then be expressed in terms of non-dimensional parameters, as follows:

C_L = f₂(Re, St, Φ).   (3.29)

The coefficient of lift (C_L) is the dependent non-dimensional Π-group, which is also a function of F_L. Since C_L is the same for the prototype and the model, and all the basic parameters are known, the prototype lift force can be solved for using measurements of the model's lift force.

It is important to comment on two points here. First, in many of the other scaled models, researchers have ignored St and used a modified Re derived by Ellington in [26] that uses the time-averaged velocity of the wingtip instead of the flight velocity:

Re̅ = Φ n b² / (ν Λ),   (3.30)

where Λ is the aspect ratio of the wing span to its mean chord length. Since geometric similarity requires that Φ and Λ remain the same between the model and prototype, we need only ensure that n b² / ν also remains the same. By satisfying (3.26) and (3.27), one can easily verify that (3.30) is also satisfied (although the converse does not hold true). Second, in hovering, where V = 0, (3.26) is automatically satisfied but a division-by-zero problem arises in computing St. It is perhaps for this reason that many researchers use the modified Re equation of (3.30). Although matching (3.26) and (3.27) is equivalent to matching (3.30) in the trivial hovering case, some care should be taken when the flight speed is not zero.

Model Specification

The specification of morphological and functional parameters is best described in six sequential steps:

1.
Specifying the basis for scaling: The basis for specification of the scaled model is a typical Drosophila melanogaster, and the parameters mentioned below are adjusted to geometrically and kinematically scale Drosophila flight. However, the design is also easily adaptable to a wide range of prototypes (i.e., very flexible ranges of wing size/shape, configuration, and motion profiles). Table 3.2 summarizes the morphological and kinematic parameters associated with Drosophila flight.

2. Specifying Λ_m and Φ_m: Geometric scaling requires matching wing-shape parameters to those of Drosophila. All of the non-dimensional wing-shape parameters have been calculated from a Drosophila melanogaster wing shape, drawn in Figure 3.8 and detailed in Appendix A. Consequently, Λ_m = Λ_p and Φ_m = Φ_p. Out of the possible range for the stroke amplitude, Φ_m = 160° was selected.

3. Finding (F/T)_max: One of the constraints on scaling is the upper bound on the aerodynamic forces. Based on steady equations, an estimate of F_max can be obtained from:

F_max = ½ C_F,max ρ_max A V_tip,max²,   (3.31)

Symbol | Parameter | Value
b_p | Wing span (mm) | 4.94
Λ_p | Aspect ratio | 5.487
r̂₂(S)_p | Non-dimensional second moment of wing area | 0.328
Φ_p | Stroke amplitude (deg) | 148–169
n_p | Stroke frequency (Hz) | 190–212
(dφ̂/dt̂)²_max | Square of maximum non-dimensional angular velocity | 30.3
ρ_p | Air density (kg m⁻³) | 1.2
ν_p | Air kinematic viscosity at 20°C (cSt) | 15
Re_Drosophila | Reynolds numbers reported for Drosophila | 115–178
Re_p | Reynolds number of the prototype | 165

Subscript p identifies the prototype. Re_p calculated based on the values of this table. Λ_p and r̂₂(S)_p calculated from the wing shape depicted in [98] (see Appendix A for calculations); ν_p from [85]; all other data from [45], [46].

Table 3.2: Morphological and functional parameters of the prototype based on the flight performance of Drosophila melanogaster.

Figure 3.8: Wing planform of Drosophila melanogaster. Redrawn from [98].
 3.3 Experiment  70  10  20  30  40  50  1  2  3  4  Figure 3.9: Plot of force, F, and torque, T, versus model wing span, 6, and model napping frequency, n. The lines show the limits of the sensor on forces and torques. A combination of model wing span and model flapping frequency is valid if and only if it identifies a point below these two lines. where Cf  = 1.88 is the maximum force coefficient derived for  tmax  Drosophila^  maximum performance by Lehmann and Dickinson in [46]. p x-, for the scaled ma  model, was predicted to be the density of 100% glycerine, which is 1260  kgm~ . 3  A is the single wing area and Vti max is the maximum velocity of the wingtip. Pi  Eventually, it can be seen that F  max  gests that T  max  oc  against b and n m  m  cx  b^n^,  and a similar derivation sug-  6m m- The maximum expected forces and torques are plotted n  in Figure 3.9. Thick dashed lines represent the limits from the  force/torque sensor used. Therefore, the desired combinations of the model wing span and freqtiency are those that spot a point below the two limit lines.  3.3 Experiment  71  4. Choosing b  m  and n  m  based on (F/T) :  Since the fluid to be used, and  max  thus the ranges of kinematic viscosity as well, is quite flexible, and because b  m  has the greatest impact on both the maximum torques and the required volume of fluid, the model wingspan was determined first. A large wingspan would demand a larger tank and greater driving torques, while too small a wingspan would defeat part of the original purpose of the scaling. Given all of these considerations, a parametric study was performed and a model wingspan of b = 0.4m was chosen m  as a good compromise. With this choice of wing span, the possible values for n  m  fall below 420 mHz. 5. Setting u  m  to match Re'. All of the morphological and functional parameters  needed to match Reynolds numbers are now set except for u . Based on equation m  (3.30), u  m  can be selected so that Re  = Re .  
From Table 3.2 and what is specified for the scaled model so far, and for 100 < n_m (mHz) < 420, this value will be in the range of 50 < ν_m (cSt) < 207. Based on the kinematic viscosity of the water/glycerine mixture, this range can be obtained by mixing 80% to 90% glycerine with water (see Appendix B).

6. Choosing an appropriate tank size: The final step is to choose the tank size, with the aim of minimizing the effect of the sides. On the other hand, as the volume of the needed fluid increases with the third power of the tank size, it is preferable to keep the tank small. A 40' x 20' x 20' container was used, with the total effect of the side walls, surface and ground on the mean lift-force coefficient calculated to be +13.5% (see Appendix C).

Hardware

The model runs in a container filled with a water/glycerine mixture. These fluids were chosen for their miscibility as well as the transparency of the final mixture. They also provide a wide range of possibilities for ν_m (between 1 cSt for water and about 1180 cSt for glycerine). The final mixture has a ratio of roughly 11% water to 89% glycerine to achieve a kinematic viscosity of about 160 cSt.

A rectangular frame is built around the container from extruded aluminum beams in order to support the drive mechanism. Vibrations of the frame due to movement of the mechanism have been minimized by reinforcing the bases and all joints on the frame using triangular plates (Figure 3.11(a)). This frame holds a platform on which the wing-driving motor assembly is mounted. A schematic of this mechanism for driving a single wing is shown in Figure 3.10. The main vertical shaft connects the motor assembly platform to the immersed base-joint assembly. Figure 3.12 shows the base-joint assembly. An ATI Nano17 6-axis force/torque sensor has been hermetically sealed and mounted between the wing and the base joint, and calibrated to measure the instantaneous forces.
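The quoted viscosity window can be checked directly. The sketch below assumes the Reynolds-number form implied by the numbers in this section, Re = Φ n b² / (ν Æ) with Φ in radians (equation (3.30) itself is outside this chunk); solving for ν at the two frequency extremes reproduces the 50-207 cSt range:

```python
import math

# Assumed form of (3.30): Re = Phi * n * b^2 / (nu * AR), Phi in radians.
PHI = math.radians(160.0)   # stroke amplitude chosen for the model
B_M = 0.4                   # model wing span (m)
AR = 5.487                  # aspect ratio (matched to Drosophila)
RE_TARGET = 165.0           # Reynolds number to match

def nu_for(n_hz):
    """Kinematic viscosity (cSt) needed so the model hits RE_TARGET."""
    nu_m2s = PHI * n_hz * B_M**2 / (RE_TARGET * AR)
    return nu_m2s / 1e-6    # 1 cSt = 1e-6 m^2/s

lo, hi = nu_for(0.100), nu_for(0.420)   # n_m between 100 and 420 mHz
# -> roughly 49 cSt and 207 cSt, matching the 50-207 cSt window in the text
```

The same one-liner explains the 80-90% glycerine mixture: that is simply where the water/glycerine viscosity curve passes through this window.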
(For the calibration procedure, refer to Appendix D.) The wing is fabricated from a flat acrylic plate shaped to have a planform geometrically similar to the wing of a Drosophila melanogaster (Figure 3.8). A wing holder connects the wing to the sensor. The wing, attached to the wing holder and the sealed sensor, is shown in Figure 3.13. As can be seen, a part of the wing planform from Drosophila's wing shown in Figure 3.8 is missing due to the space occupied by the force sensor, wing holder and other attachments. The contribution of this missing part to aerodynamic force production has been estimated at about 0.03% and is therefore neglected. (For calculations, see Appendix E.)

As illustrated in Figure 3.11(b), the topmost motor drives the entire shaft assembly through a planetary gearhead about the stroke axis (motor 1 for yaw). The wing joint, attached at the distal end of the long shaft, is driven by an assembly of pulleys, timing belts and two motors to control rotation about the other two axes (motor 2 for roll and motor 3 for pitch) (Figure 3.12). The motors for all three wing-driving axes are NEMA 23 sized stepping motors with a minimum holding torque of 264 oz-in. The gear mount rotates freely on ball bearings under the C-shaped bracket supporting the horizontal shafts. The motor-3 pulley drives this rotation directly, while the motor-2 pulley drives the bevel gears connected to the wing shaft.

Figure 3.10: Schematic of the single-wing driving mechanism (not to scale), labeling the planetary gearhead, stepper motors, timing belts, main shaft, gear mount, bevel gears, force/torque sensor and wing.

The mechanism design is such that motor 1 only drives the yaw rotation and motor 2 only drives the roll rotation, but motor 3 drives coupled motions of pitch and roll and thus needs more coordinated motion for a desired trajectory.
Aside from running each orthogonal angular trajectory on its corresponding motor axis, motor 2 must also account for motor-3 motion (i.e., by adding the pitch motion to its motion profile). The adjustments should also include the gear-reduction ratios (5:1 on motor 1 and 3:1 on motor 2). The motors are driven in a microstepping mode of 12800 steps/rev. Most of the mechanism is machined out of aluminum plates and shafts, with the exception of the bevel gears, which due to wear and tear are made of stainless steel. A GALIL DMC-2183 8-axis motion-controller board controls the three motors from the PC commands through three driver boards (Figure 3.11(a)). The extra axes are reserved for possible future modifications (e.g., adding a second wing).

Figure 3.11: Photos of the experimental setup. (a) The experimental setup showing the aluminum frame reinforced against vibrations, the single-wing driving mechanism and the controller boards. (b) Isometric view of the motor assembly showing the position and orientation of each motor.

Software

A collection of software has been written in MATLAB® using the GUI Development Environment to create a user interface for collecting forces and commanding desired trajectories. This software allows the user to interact with the scaled model through a host computer.

• Trajectory Planner is a stand-alone GUI for creating and commanding common figure-of-eight and oval flapping trajectories by specifying the kinematic parameters.

• Translator component translates velocities (the output of a velocity controller) into motion-controller language so they can be sent as motor commands.

Figure 3.12: Close-up of the wing joint (main shaft, gear mount, bevel gears) before the wing is attached and submersed in glycerine.

Figure 3.13: The actual wing used in the experiment, showing the sealed sensor, wing holder and wing.
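The Translator's job of turning joint velocities into motor commands is essentially a unit conversion. A hypothetical sketch follows; the function names are illustrative, and only the 12800 microsteps/rev and the 5:1 and 3:1 gear reductions come from the text:

```python
MICROSTEPS_PER_REV = 12800           # microstepping mode from the text
GEAR = {1: 5.0, 2: 3.0, 3: 1.0}      # reductions: 5:1 on motor 1, 3:1 on motor 2
                                     # (motor 3 direct drive assumed for illustration)

def steps_per_degree(motor):
    """Motor microsteps per degree of joint motion (illustrative helper)."""
    return MICROSTEPS_PER_REV * GEAR[motor] / 360.0

def joint_rate_to_step_rate(motor, deg_per_s):
    """Translate a joint angular velocity into a microstep rate."""
    return deg_per_s * steps_per_degree(motor)

# Motor 1 (yaw, 5:1 reduction): 12800 * 5 / 360 microsteps per joint degree.
spd = steps_per_degree(1)
```

A real translator would additionally fold the pitch/roll coupling of motor 3 into the motor-2 profile, as described above.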
• Ethernet Interface provides the UDP connection between the host computer and the motion controller.³

• Calibrator applies the calibration matrix, found previously by the calibration procedure outlined in Appendix D, to the raw measurements from the force sensor.

• Serial Interface connects the force sensor to the host computer through an RS-232 serial communication line at 9600 bps.

³ Implemented using external code: the PNET package as a tcp/udp/ip toolbox. Copyright © 1998-2003, Peter Rydesater et al., under GNU General Public License.

The scaled-model apparatus, described above, ensures the underlying aerodynamics of Drosophila's flight is conserved by matching the Reynolds numbers between the prototype and the model. Since the learning algorithm explores all the possible actuations it can make, one more attribute has to be matched before forces from the scaled model can be compared to Drosophila's flight forces. Drosophila's flight muscles have limited abilities to accelerate and decelerate the wing, and therefore the scaled model, in our experimental case, should be similarly limited in software to enable future comparisons with Drosophila. A power-requirement analysis of the insect's flight is required to determine the desired constraint.

3.3.2 Power Requirement

Figure 3.14 shows how the input power is spent in an insect system. The power input budget is equal to the mechanical power output plus the amount dissipated as heat by muscular inefficiencies. The mechanical power output consists of the inertial power and the aerodynamic power. The inertial power is the power needed to accelerate and decelerate the wing, and the aerodynamic power is the power needed to move the wings and body through the air.
The aerodynamic power itself can be divided into three components: the induced power imparted to the vortex wakes, the profile power arising from profile drag, and the parasite power needed to move the body through the air. In order to make the scaled model comparable to Drosophila in terms of actuation abilities, a maximum angular velocity has been found for Drosophila and is used as an upper bound on the joint velocities in the scaled model. Looking at the mean specific inertial power, P̄*_acc (= P̄_acc / mg), from Ellington's formulations in [26], we have⁴:

    P̄*_acc = ρ n³ Φ² R³ (dφ̂/dt̂)²_max / (2 p_w Æ),   (3.32)

⁴ For the definitions of all the parameters in this equation, see [22] and [26].

Figure 3.14: Power expenditure of the total input power budget in biological insects: the mechanical power output (inertial power plus aerodynamic power, the latter comprising induced, profile and parasite power) and the power dissipated mostly as heat by muscular inefficiencies. Redrawn from [10].

where dφ̂/dt̂ is the non-dimensional angular velocity defined in [23] from the dimensional angular velocity, dφ/dt, as:

    dφ̂/dt̂ = (2 / (Φ n)) (dφ/dt).   (3.33)

|dφ̂/dt̂|_max can be used as the non-dimensional upper bound on the angular velocity we are looking for. Hereafter in this text, |dφ̂/dt̂|_max refers to the maximum non-dimensional angular velocity observed or measured for a flapping trajectory, and |dφ̂/dt̂|_bound refers to the upper theoretical bound on the non-dimensional angular velocity, which may not have been reached. The actual value for |dφ̂/dt̂|_bound can be calculated based on |dφ̂/dt̂|_max of Drosophila in its maximum flight performance. Lehmann and Dickinson [45] investigated the power requirements of Drosophila in its minimum, hovering and maximum performances. However, they assumed a constant maximum non-dimensional angular velocity of |dφ̂/dt̂|_max = 5.50 in all three cases.
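Under the non-dimensionalization of (3.33), reconstructed here as dφ̂/dt̂ = 2(dφ/dt)/(Φn), the two simple stroke models discussed next have closed-form maxima. A short numerical check (the profile parameterizations are assumptions, chosen to be consistent with the values 4.00 and 6.28 quoted in the text):

```python
import math

def nondim(omega_rad_s, phi_rad, n_hz):
    """Non-dimensional angular velocity per the reconstructed (3.33)."""
    return 2.0 * omega_rad_s / (phi_rad * n_hz)

PHI, N = math.radians(160.0), 0.325   # example amplitude and frequency

# Sawtooth: the wing sweeps Phi each half-period at constant speed 2*Phi*n,
# so the non-dimensional velocity is constant at 4.0.
saw_max = nondim(2.0 * PHI * N, PHI, N)

# Harmonic: phi(t) = (Phi/2) cos(2 pi n t) gives |dphi/dt|_max = pi*Phi*n,
# so the non-dimensional maximum is 2*pi ~ 6.28.
harm_max = nondim(math.pi * PHI * N, PHI, N)
```

Note that both results are independent of the particular Φ and n chosen, which is exactly why the bound transfers unchanged between prototype and model.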
Hence, for our purposes, there is no advantage in using their results on Drosophila's maximum flight performance over other works on hovering Drosophila.

Another way to estimate the non-dimensional angular velocity is from two simple models: in the sawtooth model, the wing-tip velocity follows a constant profile, and in the harmonic model, it follows a half-cosine profile [93][23]. The values of the non-dimensional angular velocity for these models have been calculated to be 4.00 and 6.28, respectively. Lehmann and Dickinson discussed that the actual value falls somewhere between these two extremes [45]. Consequently, the value |dφ̂/dt̂|_bound = 6.25 has been selected to give a reasonable, yet practical, estimate of the bound on the non-dimensional angular velocity. This value is non-dimensional and therefore stays constant between the model and the prototype. We assume the same bound on the angular velocity for the muscles/motor of the other DOF, that is, |dα̂/dt̂|_bound = |dφ̂/dt̂|_bound. In §3.3.3, we will see how these parameters enter the process of velocity quantization.

3.3.3 Implementation: Learning on Scaled Model

The states, actions and reward function are defined, in the same way as before, for 2 DOF (horizontal stroke plane) as:

    s = (φ, α),  a = (Δφ, Δα),  r = R(s, a) = { punishment,  if |φ| > Φ_threshold;  F_L(s, a),  otherwise }.   (3.34)

The actual flapping frequency and stroke amplitude are not known until the end of the experiment, when the agent reaches the end of its learning episode and converges to a flapping trajectory. This is because the flapping motion is being learned by the agent and the parameters of this motion are not assumed beforehand. Therefore, the actual Reynolds number is not known until the end of the experiment (based on (3.30)), which introduces some difficulty into how well the Reynolds numbers can be matched.
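The reward structure of (3.34) is simple to express in code. A minimal sketch follows, in which the punishment magnitude and the threshold are hypothetical placeholders (the text specifies only that out-of-range motion is punished and that lift is rewarded otherwise):

```python
PHI_THRESHOLD = 80.0   # deg, hypothetical half-range limit (|phi| <= 80 spans 160 deg)
JERK_PENALTY = -1.0    # hypothetical punishment magnitude

def reward(phi_deg, lift_force):
    """R(s, a) from (3.34): punish range violations, otherwise reward lift."""
    if abs(phi_deg) > PHI_THRESHOLD:
        return JERK_PENALTY
    return lift_force

# In-range motion earns the instantaneous lift; a violation returns the
# fixed punishment regardless of the lift produced at that instant.
r_ok, r_bad = reward(45.0, 1.2), reward(95.0, 1.2)
```

Because the punishment is independent of lift, the agent learns to stay inside the stroke range even when a range-violating motion would momentarily generate more force.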
The actual converged Reynolds number can then be used to change other parameters in the model (in our case, ν_m) to achieve matching Reynolds numbers. However, the learning process would then have to be repeated for this new environment, and there is no guarantee that the agent in the new environment would converge to the same flapping frequency or stroke amplitude! According to (3.33), a similar problem arises in matching the maximum non-dimensional angular velocities. Therefore, the process of parameter matching, in our case, is inevitably iterative. The simulation data presented in §3.2 provide insight into the possible ranges of these values, and have been used to estimate Φ and n and reduce the number of iterations. Note that the simulation parameters are insufficient to extract the flapping frequency from, since in simulation there was no notion of time but rather of time steps; the actual temporal length of a step, dt_step, is not known. If this value is known, the flapping frequency can simply be found from:

    n = 1 / (N_stroke · dt_step).   (3.35)

As mentioned in §3.2, each of the two angular velocities has been quantized in 9 steps, that is:

    (dφ/dt)_quantized ∈ {-4, -3, -2, -1, 0, 1, 2, 3, 4}.   (3.36)

The maximum value of this quantization should be matched to the value found previously in §3.3.2 for |dφ̂/dt̂|_bound. Using (3.33) and (3.35) to express the maximum angular velocity in deg/step, we have:

    φ̇_max = (Φ_sim / (2 N_sim)) |dφ̂/dt̂|_bound.   (3.37)

The same derivation can be followed to find α̇_max. These values, along with the values for the total number of microsteps and the gearing ratios for the motors mentioned in §, are then used to command the motors accordingly.

The experiment is carried out for the last third of the learning episode (i.e., 100K cycles). This is because the quasi-steady model is believed to sufficiently approximate the environment for the first two-thirds of the learning episode.
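Equations (3.36) and (3.37), reconstructed here as φ̇_max = Φ_sim |dφ̂/dt̂|_bound / (2 N_sim), together map the nine integer velocity levels onto physical step sizes. A sketch (Φ_sim = 160° is from the text; N_sim = 25 steps per cycle is an assumption used only for illustration):

```python
LEVELS = list(range(-4, 5))          # the 9 quantized velocity levels of (3.36)

def phi_dot_max(phi_sim_deg, n_sim_steps, bound=6.25):
    """Max step size in deg/step per the reconstructed (3.37)."""
    return phi_sim_deg / (2.0 * n_sim_steps) * bound

def level_to_deg_per_step(level, step_max):
    """Scale an integer level in {-4..4} to a physical velocity increment."""
    return level / 4.0 * step_max

step_max = phi_dot_max(160.0, 25)    # 20 deg/step under these assumptions
deltas = [level_to_deg_per_step(k, step_max) for k in LEVELS]
```

The outermost levels ±4 then saturate exactly at the Drosophila-derived bound, which is the matching condition the text describes.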
This way the agent can learn the essential properties of the environment and then perform a more detailed search on the actual physical environment to fine-tune its behavior. Two issues guided this decision. First, the total learning episode could be very long considering the speed of the scaled model (e.g., an experimental episode would have taken more than 10 hours). Second, as the scaled model was not designed primarily to carry out such long experiments, the wear and tear on the mechanical parts was a real concern.

ε and γ have been chosen to change according to equations (3.19) and (3.20), respectively. λ has been set to 1 during the learning process. This time, the three parameters are fed into the simulation for the first two-thirds of the learning episode, and then are used in the experiment for the last third of the episode. Figure 3.15 shows how ε, γ and λ change over time. Figure 3.16 shows the data flow in the control loop. As can be seen, the Q-learning agent is connected to the scaled model via actuation and sensing lines. A control cycle is initiated by the Q-learning agent issuing an action. This action is then translated by the Translator component into a command form understandable by the motion controller.

Figure 3.15: Graph of ε_t, γ_t and λ_t over the 300K learning steps (the first portion in simulation, the last in experiment). The experiment was carried out using λ = 1 and ε and γ varying by equations (3.19) and (3.20), respectively. As can be seen from the figure, the exploration rate ε is decreased and the discount factor γ is increased with the time step. The former implies decreased exploration near the end of the simulation, and the latter indicates increased reliance on expected future reinforcements as the agent becomes more "experienced".
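Equations (3.19) and (3.20) are defined earlier in the thesis and are not reproduced in this chunk; the sketch below therefore uses generic annealing schedules with the same qualitative shape as Figure 3.15 (decaying exploration, growing discount), so all constants are illustrative only:

```python
TOTAL_STEPS = 300_000   # 200K simulated + 100K experimental steps

def epsilon(t, eps0=1.0, eps_min=0.05):
    """Illustrative linear decay of the exploration rate."""
    frac = min(t / TOTAL_STEPS, 1.0)
    return eps0 + (eps_min - eps0) * frac

def gamma(t, g0=0.4, g_max=0.95):
    """Illustrative linear growth of the discount factor."""
    frac = min(t / TOTAL_STEPS, 1.0)
    return g0 + (g_max - g0) * frac

# Early on the agent explores heavily and discounts the future; late in
# the episode it exploits and trusts its long-term value estimates.
early, late = (epsilon(0), gamma(0)), (epsilon(TOTAL_STEPS), gamma(TOTAL_STEPS))
```

Any monotone schedule with these endpoints would reproduce the qualitative behavior the figure describes.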
Figure 3.16: Control loop of the experiment showing the components involved in actuation and sensing: the Q-learning agent's action passes through the Translator and Ethernet Interface as a UDP package to the motion controller, which actuates the dynamically scaled model; data from the force sensor return through the Serial Interface, Calibrator, Digital Filter and Lift-Force Extractor (with the Exp2Sim factor) to the agent. The control loop runs at 128 ms.

This command is then sent as a UDP package to the motion controller, which in turn actuates the motors on the scaled model. The force sensor sends raw measurements through an RS-232 communication line to the Calibrator. The calibrated force vector is then filtered using a zero-phase-delay, digital low-pass Butterworth filter with a cutoff frequency of ω_c = 2.5 Hz. More information on filtering the sensor data, including spectral analysis of the data, is available in Appendix F.

The forces are now calibrated and filtered, and therefore meaningful. However, they do not represent pure aerodynamic forces. The measured forces at the wing base consist of gravitational, inertial and aerodynamic components. The inertial components represent the acceleration forces due to the mass of the sensor and the wing, as well as the added mass of the fluid around the wing⁵, and are harder to isolate, since they are trajectory-dependent. Since the stroke plane is horizontal and there are no deviations from the stroke plane, only the horizontal component of the total force is assumed to be affected by inertial forces, which we discard as we are interested in the lift force. Moreover, inertial forces do not affect the mean aerodynamic forces, and the mean total force over a cycle can be obtained without calculating the inertial forces.

⁵ When a wing accelerates, it sets the surrounding fluid in motion, causing inertial forces by the fluid. This phenomenon is called the added-mass effect.
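The claim that inertial forces drop out of cycle-averaged measurements can be verified numerically: for any periodic velocity profile, the acceleration (and hence m·a) integrates to zero over a full cycle. A small check with a sinusoidal stroke (purely illustrative numbers):

```python
import math

def mean_inertial_force(mass, n_hz, samples=1000):
    """Cycle-average of m*a for a sinusoidal stroke phi(t) ~ cos(2 pi n t)."""
    total = 0.0
    for k in range(samples):
        t = k / samples / n_hz                        # spans one full period
        accel = -(2 * math.pi * n_hz) ** 2 * math.cos(2 * math.pi * n_hz * t)
        total += mass * accel
    return total / samples

# Over a whole cycle the accelerations cancel: the mean is ~0, so the mean
# aerodynamic force can be read off without modelling the inertial terms.
avg = mean_inertial_force(mass=0.1, n_hz=0.325)
```

The same cancellation holds for the added-mass contribution, since it is also proportional to the (periodic) wing acceleration.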
The vertical component of the force can be derived by a trigonometric transformation, followed by subtracting the gravitational contribution of the masses of the wing holder and the wing, calculated during the calibration process (see Appendix D), from the vertical component. The lift force is then calculated by accounting for the total effect of the side walls, surface and bottom of the tank. There is one more step before rewarding the agent with this value, which is to make sure that the magnitude of the reward distribution of the simulation matches that of the experiment. Such a match ensures a smooth transition between simulation and experiment. This normalization is done by multiplying the lift forces by the Exp2Sim factor⁶, as can be seen from Figure 3.16. The control loop, as explained above, runs at f_control = 7.81 Hz. Therefore, every time step is 128 ms in length, that is, dt_step = 128 ms.

3.3.4 Evaluation of Experiment

Without any a priori assumptions on the desired trajectory, the agent, once more, converged to a smooth flapping motion, shown in Figures 3.17 and 3.18, similar to but slightly different from the one found in the simulation. The agent preferred to accelerate at the beginning of each half stroke and to decelerate at the end to avoid a sudden stall (or possible injury to the joint), and at the same time to generate as much lift as possible to maximize the cumulative reward. The discontinuities in acceleration seen in the velocity profile are due to the quantization of velocities in 9 steps. Increasing the number of steps in this quantization will result in a smoother trajectory but will also increase the total number of steps and therefore the time of convergence. As mentioned before, the performance of the learning implementation has not been the focus of this work and is

⁶ This factor is found experimentally by recording the final trajectory from the simulation and computing the mean lift forces in both simulation and experiment.
The Exp2Sim factor can then be derived from: Exp2Sim = L̄_sim / L̄_exp.

Figure 3.17: Wing chord representation of the optimal policy found by the agent maximizing lift force in the experiment. The agent makes use of its full stroke amplitude. The rotations have been delayed until the start of the next half stroke, with a duration of t̂_rot ≈ 0.25 of the stroke cycle. The dashed lines indicate the approximate start and end of rotations and show how the rotations are delayed. The arrows on the wing chord show only the instantaneous lift forces.

Figure 3.18: The position and velocity profiles of the optimal policy found in the experiment. Parameter ψ = α - 90° is the rotation angle. The figure shows nearly sinusoidal motion on φ. The discontinuities in acceleration seen in the velocity profile are due to the quantization of velocities in 9 steps.

considered as a future work. Figure 3.19 shows that the algorithm converges to a mean lift of F̄_L = 1.82 N in about 300K steps. From the figure it is also clear that the agent minimized the frequency of out-of-range motions due to the associated punishment, as described before. As can be seen in Figure 3.17, the duration of each rotation is about one-quarter of the total stroke cycle. More technically, by defining the non-dimensional time as t̂ = nt, we have t̂_rot ≈ 0.25. Compared to the simulation results of Figure 3.5, the optimal policy found from the experiment differs slightly in terms of the rotational timings and the instantaneous lift traces. The rotations in Figure 3.5 were initiated before the end of each half-stroke, whereas the rotations in Figure 3.17, as indicated by the dashed lines, are delayed almost until the start of the next half stroke. Moreover, unlike in the experiment, the lift traces from the simulation seem to fade near the end of each half
stroke, when the wing undergoes rotation. This mainly suggests that the quasi-steady aerodynamic model used in the simulation is insufficient to account for the lift forces generated during the wing's rotational movement.

The experiment took approximately 3.5 hours⁷ to learn the flapping flight presented above. As mentioned before, the speed of convergence was not an issue in this work, and the program has not been optimized for time.

⁷ T_exp = 100,000 (steps) × 0.128 (s/step).

The optimal policy converged to a length of N_exp = 24 steps. That is, the final flapping motion cycle is completed in 24 time steps. The flapping frequency can then be found from equation (3.35) to be 325 mHz. The agent made use of the whole available stroke motion range, as expected, and therefore Φ_exp = 160°.

Figure 3.19: Improvement on lift generation and jerk avoidance during the experiment. The graph shows the mean lift increasing with steps until it reaches the maximum mean lift force of F̄_L = 1.82 N. The graph also shows the agent minimized the frequency of jerk associated with a motion-range violation.

Having identified these parameters, we can confirm that the actual Reynolds number and the bound on the non-dimensional angular velocity, given the new n_m and Φ_m, are within the desired ranges, making the scaled model comparable, in this sense, to a typical Drosophila melanogaster. From (3.30) and (3.37), we have:

    Re_m,actual = ((160 × π/180) × 0.4² × 0.325) / (1.6 × 10⁻⁴ × 5.487) ≈ 165,   (3.38)

    |dφ̂/dt̂|_bound,actual = (2 N_exp / Φ) φ̇_max = (N_exp / N_sim) × 6.25 = 6.00.   (3.39)

As can be seen, Re_m,actual and Re_p are matched, and |dφ̂/dt̂|_bound,actual is within the desired range.
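The converged kinematics can be checked directly from (3.35) and the Reynolds-number expression used in (3.38). A verification sketch (the form Re = Φ n b² / (ν Æ) is the one implied by the numbers in (3.38)):

```python
import math

N_EXP, DT_STEP = 24, 0.128          # steps per cycle, seconds per step
n_m = 1.0 / (N_EXP * DT_STEP)       # (3.35): flapping frequency, ~0.325 Hz

PHI = math.radians(160.0)           # converged stroke amplitude
B_M, AR = 0.4, 5.487                # model span (m) and aspect ratio
NU = 160e-6                         # 160 cSt expressed in m^2/s

# (3.38): reproduces Re ~ 165, matching the prototype Reynolds number.
re_actual = PHI * n_m * B_M**2 / (NU * AR)
```

Running these two lines confirms the iteration described above has closed: the learned trajectory lands back inside the Drosophila-matched regime.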
The experiments on the scaled model have successfully evaluated the control framework of §2.2 in the case study of lift generation for flapping flight, as qualitatively illustrated in Figure 3.19. The final parameters for the dynamically scaled model are summarized in Table 3.3. The model parameters were set after several iterations to give a complete match with the prototype.

Symbol            Parameter                                                              Value
b_m               Wing span (m)                                                          0.4
Æ_m               Aspect ratio                                                           5.487
r̂₂²(S)_m          Non-dimensional second moment of wing area                             0.328
Φ_m               Stroke amplitude (deg)                                                 160
n_m               Stroke frequency (Hz)                                                  0.325
|dφ̂/dt̂|_max       Maximum non-dimensional angular velocity                               6.00
(dφ̂/dt̂)²          Mean square of the non-dimensional angular velocity                    21.0
|dφ̂/dt̂|^|         Mean |-power of the absolute value of the non-dimensional
                  angular velocity                                                       48.9
t̂_rot             Non-dimensional rotation time                                          0.25
F̄_L               Mean lift force (N)                                                    1.82
ρ_m               Fluid density (kg m⁻³)                                                 1235
ν_m               Fluid kinematic viscosity (cSt)                                        160
Re_m              Reynolds number                                                        165

Subscript m identifies the model. Re_m calculated based on values of this table. The |-power of the absolute value of the non-dimensional angular velocity is calculated for future reference.

Table 3.3: Morphological and functional parameters of the model based on experimental results.

The scaled model is not intended to scale every attribute of biological insects, since the final goal of building an MAV capable of sustained flight will not be achieved by replicating biological insects (as also acknowledged in §2.3). Therefore, fundamental technological differences in mechanism and design (e.g., actuation) between the prototype (i.e., nature) and the model (i.e., engineering) make some comparisons pointless.
Moreover, there is a fundamental difference between the goal functions of the agent and an insect: the agent was not concerned with minimizing energy (and therefore had greater freedom in generating aerodynamic forces), while it is believed that evolution in biological insects did minimize the energy cost of flapping. One can argue that, based on this discussion, the promise of the findings from this experiment will only be realized when an actual MAV is built. In the meantime, however, nature can still be regarded as an intuitive guideline for confirming the final achievements and, to some extent, identifying the promise of the results. The rest of this section is therefore dedicated to comparing the experimental findings from the scaled model to biological insects.

The purpose of building any dynamically scaled model is to scale the prototype, gather information that is hard to obtain at the prototype scale, and scale the results back down to obtain the corresponding results for actual use on a smaller-scale MAV. Therefore, the comparison is made between a biological Drosophila (denoted by subscript p, as before) and an extrapolated⁸ Drosophila-based MAV of 4 cm in wingspan⁹ (denoted by subscript q to distinguish it from the scaled model, denoted by subscript m). To compare the generated force, it is more appropriate to compare the mean lift-force coefficients. This is because C̄_L is not affected by the different scaling factors in different works, and can be directly compared to that of a biological insect. Since the prototype and the model have different policies for expending power, it is logical to compare the mean aerodynamic efficiencies of the two flapping trajectories in parallel¹⁰. Therefore, pairs of (C̄_L, η̄_a) give a better metric for comparison.
The mean lift-force coefficient, C̄_L, can be found by substituting the generated mean lift force as F̄ in the following equation (modified from [24]):

    C̄_F = 64 F̄ / (ρ Φ² n² b⁴ Æ⁻¹ (dφ̂/dt̂)² r̂₂²(S)).   (3.40)

As mentioned earlier, the force coefficient stays unchanged with scaling; therefore,

⁸ That is, extrapolated from the Drosophila-based scaled model.
⁹ The choice of wing span was arbitrary.
¹⁰ It can be argued that the mechanical efficiencies of the two systems should be compared. However, the inertial power depends on the actuation mechanism and design, and therefore can be quite different for an MAV and an insect.
The value for K from [45] has been trusted, which is based on Ellington's estimations [25] and visualizations of wake during tethered flight by Dickinson and Gotz [19]. r (S) is the third moment 3  of wing area (see Appendix A), \dq)/di\ is the mean cube of the absolute value of the 3  non-dimensional angular velocity and Cu,pro is the mean profile drag coefficient which according to the approximation of Ellington in [26] can be estimated as 7/(\/~Re).  3.3 Experiment  Symbol  89  Parameter Wing span (m) Aspect ratio Non-dimensional third moment of wing area Stroke amplitude (deg) Stroke frequency (Hz)  Value 0.04 5.487 0.230 160 3.04  \d4>/dt\l  Mean cube of the absolute value of the non-dimensional angular velocity  114.9  Wo  Mean total force (pN) Air density (kg m~ ) Air kinematic viscosity (cSt) Rankine-Froude correction factor Reynolds number  18.53 1.2 15 1.28 165  rl(S)  q  n  q  3  Pq v  q  K Re  q  Subscript q identifies the MAV. n calculated from (3.30); Ft calculated based on (3.40). q  >g  Table 3.4: Parameters used in lift and power equations for the extrapolated M A V of wingspan b based on experimental results. q  The parameters needed to calculate the lift and power for the extrapolated M A V of wingspan b — 4 cm are summarized in Table 3.4. Plugging into the lift and power q  equations yields (CT., rf^) = (3.56, 0.40). The maximum observed lift coefficient q  for Drosophila  can be found from [46] (also mentioned by Sane and Dickinson [70])  to be (CL) — 1.9. The parameters needed to calculate the power for a biological P  Drosophila  are summarized in Table 3.5 (from [45]). Using the same equations of power,  (rfo) = 0.25. Therefore, (C£, rh) = (1.9, 0.25). p  p  The comparison is summarized in Table 3.6. From the table, the first row indicates that the extrapolated MAV has demonstrated promising capability in generating a mean lift coefficient in the range of the Drosophila melanogaster  prototype and beyond. 
In  this comparison case and using the same equations for power, the second row suggests that the MAV has achieved a higher mean aerodynamic efficiency. This increase was  3.3 Experiment  Symbol bp Mp r\(S) <J> n  90  Parameter Wing span (mm) Aspect ratio Non-dimensional second moment of wing area Stroke amplitude (deg) Stroke frequency (Hz)  p  P  p  Value 4.94 6.18 0.242 169 212  \dcf>/dt\ Mean cube of the absolute value of the non-dimensional angular velocity  104.6  Ft~p~ pp K Re  13.8 1.2 1.28 165  3  p  Mean total force (uN) Air density (kg m~ ) Rankine-Froude correction factor Reynolds number 3  Subscript p identifies the Drosophila  prototype.  Table 3.5: Parameters used in power equations for the biological Drosophila maximum performance.  Drosophila  Va~  m.  in its  extrapolated M A V  1.9  3.56  0.25  0.40  Table 3.6: Comparison result between a fruit fly and the extrapolated MAV in terms of lift generation (CL) and aerodynamic power efficiency  3.3 Experiment  91  unexpected, considering the unlimited power budget of the MAV versus the limited budget of the biological insect. This conclusion is valid. However, the reader is advised not to conclude a subtle but important implication of this sentence, that is the MAV being overall more efficient than Drosophila.  Several reasons can be mentioned. First,  the flapping trajectories were not the same. Second, the flapping trajectories compared do not represent the overall performances of either the MAV or the Drosophila.  Finally,  it is not discussed how much the biological insect has sacrificed power, in its maximum performance, to temporarily generate high aerodynamic forces. Nevertheless, the comparison with the biological insect shows great results and confidence in the actual values achieved by the extrapolated MAV.  Chapter 4  Conclusion In Chapter 2, we introduced a learning control framework and a sensor-rich paradigm and discussed them in detail. 
We implemented the high-level control framework for the case study of lift generation for microflight in Chapter 3, and evaluated it on the scaled model. The promising results confirmed the applicability of the framework. This work is, to the knowledge of the author, the first attempt at implementing a learning approach to the MAV flight-control problem. This thesis was motivated by the apparent need for, and lack of, research in this biomimetic area, and aimed to justify and motivate the employment of this and similar biomimetic approaches. The biggest contribution of this work is, therefore, to make this interdisciplinary link between biology, neuroscience, control and computer science more apparent. We also made the following contributions in pursuing this goal.

1. We classified the problem of unsteady aerodynamics under the slowly-varying, non-stationary, deterministic and k-th-order Markovian class of problems (§). This classification is important in selecting a proper control method to handle these conditions.

2. We reviewed the biological, biomimetic and engineering approaches to sensor design for microflight (§2.3.2) in order to consider different approaches side by side. Together with the investigation of the scaling issue (§), these comparisons are intended to generate inspiration from nature while bearing in mind engineering limitations. This topic has been published in [57].

3. We simulated and evaluated the low-level task of lift generation, using a quasi-steady aerodynamic model, to pave the way for experimental investigations (§3.2). The simulation has been published in [58].

4. The dynamically scaled model of §3.3.1 is a valuable apparatus for understanding and visualizing unsteady aerodynamics. It can quickly and easily be altered to model different wing-beat patterns, wing shapes, frequencies and Reynolds numbers. The dynamically scaled model has been published in [44].
The findings produced from designing the current dynamically scaled model are particularly important in designing future generations of scaled models with more accuracy and capabilities.

5. As mentioned, the experiment with the dynamically scaled model (§3.3) answered questions about the applicability of the RL framework. It also confirmed the ability of the Q-learning algorithm to understand true aerodynamics to some extent. We successfully evaluated the learning approach by showing an improvement in the lift generated on the scaled model, and quantified it to allow future comparisons. Moreover, a comparison between an MAV-scale robotic insect and a biological fruit fly (Drosophila melanogaster) showed promise in the actual numbers produced.

4.1 Future Work

From the author's point of view, this work is relatively novel and is focused on a project that is in the early stages of development; it is therefore far from complete and exclusive. The RL framework can be made more precise and detailed. A more detailed explanation of the framework would make comparisons between RL and other frameworks easier. The framework can also be made more comprehensive. The basic idea of hierarchical reinforcement learning, for example, has been deliberately implied as the consequence of hierarchical goal functions, but not explicitly pursued. The sensing paradigm, introduced in this thesis as a byproduct of the RL framework, can be completed to explain in greater detail how it should be implemented. We discussed in this work how the complexity of the states is shifted from the control paradigm to the sensing paradigm. However, we have not addressed how this complexity should be handled, which identifies a possible future work. Sensor design for microflight can be explored as a separate research avenue that is useful not only for constructing MAVs but also for micro-robotics research in general.
One of the aspects that should be addressed in sensor design is integration. The level of integration in nature far exceeds that in human-made machines. Integration of CMOS and MEMS should be explored to provide the infrastructure for integrating the whole silicon-based sensory system. Ideas for integrating actuators with sensors are also interesting and important with regard to insect sensor-design criteria, and should be studied further.

In Chapter 3, we chose a simple case study to achieve a low-level task that demonstrates the applicability of the RL framework. Other case studies, at the same or higher levels, can also be implemented. In the present study, Q-learning, a popular RL algorithm, was selected and implemented based on the author's familiarity and experience. The specific algorithm was not the focus of this work, since we wished to concentrate on the framework rather than on a particular implementation. A possible future work is therefore to identify suitable RL algorithms. For example, more involved algorithms, such as those based on POMDPs or HMMs, can be compared to Q-learning from a practical point of view. Different layers of control can be implemented and handled simultaneously by a hierarchical RL approach that defines subgoals to achieve higher-level goals. Such a demonstration would be particularly interesting for the inter-communication of adjacent layers. Continuous-RL implementations should also be explored. The computational demands of RL algorithms should be studied to achieve realtime implementations suitable for controlling flapping flight. Furthermore, a solution to the curse of dimensionality in RL algorithms, based on function approximation, should be investigated.

We conducted the simulation using a quasi-steady model. The simulation model, as discussed, seems too simplified to account for the unsteady mechanisms of insect flight in general.
Although our results were not based on the simulation (the simulation was used as a first step toward the experiment), it is preferable to replace the aerodynamic model used in simulation once a more accurate model becomes available: the more information obtained from the simulation, the better the experimental procedures can be planned.

The dynamically scaled model needs to be redesigned based on the findings of the current implementation. In particular, the wing joint can be made smaller; the space occupied by the force sensor, wing holder and other attachments should be minimized; the wing-drive mechanism can be placed on a linear stage to allow for forward/backward motion; the effect of the side walls should be minimized; and the submersed parts should be anodized to prevent slight corrosion of the aluminum parts in the fluid.

Bibliography

[1] F. Arai, Y. Nonoda, T. Fukuda, and T. Oota. New force measurement and micro grasping method using laser Raman spectrophotometer. In The IEEE International Conference on Robotics and Automation (ICRA'96), volume 3, pages 2220-2225, 1996.
[2] P. Augustsson, K. Wolff, and P. Nordin. Creation of a learning and flying robot by means of evolution. In The Genetic and Evolutionary Computation Conference (GECCO'02), pages 1279-1285, New York, NY, Jul 2002.
[3] J. Ayers. Underwater walking. Arthropod Structure and Development, 33:347-360, 2004.
[4] J. Ayers, P. Zavracky, N. McGruer, D. Massa, W. Vorus, R. Mukherjee, and S. Currie. A modular behavioral-based architecture for biomimetic autonomous underwater robots. In The 3rd International Conference on Technology and the Mine Problem. Naval Postgraduate School, 1998. CDRom.
[5] A. Azuma, S. Azuma, I. Watanabe, and T. Furuta. Flight mechanics of a dragonfly. Journal of Experimental Biology, 116:79-107, 1985.
[6] S.P. Beeby, G. Ensell, B.R. Baker, M.J. Tudor, and N.M. White.
Micromachined silicon resonant strain gauges fabricated using SOI wafer technology. Microelectromechanical Systems, 9(1):104-111, 2000.
[7] A. Bicchi, A. Caiti, and D. Prattichizzo. Optimal design of dynamic force/torque sensors. In The IFAC Conference on Control of Industrial Systems, Belfort, France, 1997.
[8] W.L. Brogan. Modern Control Theory. Prentice Hall, 3rd edition, 1991.
[9] D. Campolo, R. Sahai, and R.S. Fearing. Development of piezoelectric bending actuators with embedded piezoelectric sensors for micromechanical flapping mechanisms. In The IEEE International Conference on Robotics and Automation (ICRA'03), Taipei, Taiwan, 2003.
[10] T.M. Casey. A comparison of mechanical and energetic estimates of flight cost for hovering sphinx moths. Journal of Experimental Biology, 91:117-129, 1981.
[11] C.-T. Chen. Linear System Theory and Design. Oxford University Press, New York, NY, 1998.
[12] J.H. Cocatre-Zilgien and F. Delcomyn. Modeling stress and strain in an insect leg for simulation of campaniform sensilla responses to external forces. Biological Cybernetics, 81:149-160, 1999.
[13] P. Dario, C. Laschi, S. Micera, F. Vecchi, M. Zecca, A. Menciassi, B. Mazzolai, and M.C. Carrozza. Biologically-inspired microfabricated force and position mechano-sensors. In F.G. Barth, J.A.C. Humphrey, and T.W. Secomb, editors, Sensors and Sensing in Biology and Engineering, pages 109-128. Springer-Verlag, 2003.
[14] P. Dayan. The convergence of TD(λ) for general λ. Machine Learning, 8:117-138, 1992.
[15] T. Dean, L.P. Kaelbling, J. Kirman, and A. Nicholson. Planning under time constraints in stochastic domains. Artificial Intelligence, 76(1-2):35-74, Jul 1995.
[16] W. Derham. Physico-theology. W. and J. Innys, London, 1714.
[17] M. Despont, G.A. Racine, P. Renaud, and N.F. de Rooij. New design of micromachined capacitive force sensor. Micromechanics and Microengineering, 3:239-242, 1993.
[18] M.H. Dickinson.
Unsteady mechanisms of force generation in aquatic and aerial locomotion. American Zoology, 56:537-554, 1999.
[19] M.H. Dickinson and K.G. Gotz. The wake dynamics and flight forces of the fruit fly, Drosophila melanogaster. Journal of Experimental Biology, 199:2085-2104, 1996.
[20] M.H. Dickinson, F.-O. Lehmann, and S.P. Sane. Wing rotation and the aerodynamic basis of insect flight. Science, 284:1954-1960, 1999.
[21] C.P. Ellington. The aerodynamics of hovering insect flight. I. The quasi-steady analysis. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 305(1122):1-15, 1984.
[22] C.P. Ellington. The aerodynamics of hovering insect flight. II. Morphological parameters. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 305(1122):17-40, 1984.
[23] C.P. Ellington. The aerodynamics of hovering insect flight. III. Kinematics. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 305(1122):41-78, 1984.
[24] C.P. Ellington. The aerodynamics of hovering insect flight. IV. Aerodynamic mechanisms. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 305(1122):79-113, 1984.
[25] C.P. Ellington. The aerodynamics of hovering insect flight. V. A vortex theory. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 305(1122):115-144, 1984.
[26] C.P. Ellington. The aerodynamics of hovering insect flight. VI. Lift and power requirements. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 305(1122):145-181, 1984.
[27] C.P. Ellington, C. van den Berg, A.P. Willmott, and A.L.R. Thomas. Leading edge vortices in insect flight. Nature, 384:626-630, Dec 1996.
[28] S. Fahlbusch, S. Fatikow, and K. Santa. Force sensing in microrobotic systems: an overview. In S.G.
Tzafestas, editor, Advances in Manufacturing: Decision, Control and Information Technology, volume 1, pages 233-244. Springer-Verlag, London, 1999.
[29] Z. Fan, J. Chen, J. Zou, D. Bullen, C. Liu, and F. Delcomyn. Design and fabrication of artificial lateral line flow sensors. Micromechanics and Microengineering, 12(5):655-661, Sep 2002.
[30] R. Faust. Untersuchungen zum Halterenproblem. Zool. Jahrb. Physiol., 63:325-366, 1952.
[31] A. Fayyazuddin and M.H. Dickinson. Haltere afferents provide direct and electrotonic input to a steering motor neuron in the blowfly, Calliphora. Neuroscience, 16:5225-5232, 1996.
[32] R.S. Fearing, K.H. Chiang, M.H. Dickinson, D.L. Pick, M. Sitti, and J. Yan. Wing transmission for a micromechanical flying insect. In The IEEE International Conference on Robotics and Automation (ICRA'00), pages 1509-1516, San Francisco, CA, Apr 2000.
[33] R.W. Fox and A.T. McDonald. Introduction to Fluid Mechanics. John Wiley & Sons, New York, 3rd edition, 1985.
[34] G. Fraenkel. The function of the halteres of flies. Zool. Soc. Lond. A, 109:69-78, 1939.
[35] G. Fraenkel and J.W.S. Pringle. Halteres of flies as gyroscopic organs of equilibrium. Nature, 141:919-921, 1938.
[36] M.R.J. Gibbs, R. Watts, W. Karl, A.L. Powell, and R.B. Yates. Microstructures containing piezomagnetic elements. Sensors and Actuators A: Physical, 59:229-235, 1997.
[37] W. Gnatzy, U. Grunert, and M. Bender. Campaniform sensilla of Calliphora vicina (Insecta, Diptera). I. Topography. Zoomorphology, 106:312-319, 1987.
[38] C. Gui, R. Legtenberg, H.A.C. Tilmans, J.H.J. Fluitman, and M. Elwenspoek. Nonlinearity and hysteresis of resonant strain gauges. Microelectromechanical Systems, 7(1):122-127, 1998.
[39] R. Hengstenberg. Controlling the fly's gyroscope. Nature, 392:757-758, 1998.
[40] R. Hunt, G. Hornby, and J.D. Lohn. Toward evolved flight.
In The Genetic and Evolutionary Computation Conference (GECCO'05), pages 957-964, Washington, DC, Jun 2005.
[41] W.L. Jin and C.D. Mote Jr. A six-component silicon micro force sensor. Sensors and Actuators A: Physical, 65:109-115, 1998.
[42] L.P. Kaelbling, M.L. Littman, and A.W. Moore. Reinforcement learning: a survey. Artificial Intelligence Research, 4:237-285, 1996.
[43] T.A. Keil. Functional morphology of insect mechanoreceptors. Microscopy Research and Technique, 39:506-531, 1997.
[44] W. Lai, J. Yan, M. Motamed, and S. Green. Force measurements on a scaled mechanical model of dragonfly in forward flight. In The IEEE International Conference on Advanced Robotics (ICAR'05), pages 595-600, Seattle, WA, Jul 2005.
[45] F.-O. Lehmann and M.H. Dickinson. The changes in power requirements and muscle efficiency during elevated force production in the fruit fly Drosophila melanogaster. Journal of Experimental Biology, 200:1133-1143, 1997.
[46] F.-O. Lehmann and M.H. Dickinson. The control of wing kinematics and flight forces in fruit flies (Drosophila spp.). Journal of Experimental Biology, 201:385-401, 1998.
[47] David R. Lide, editor. CRC Handbook of Chemistry and Physics. CRC Press, 85th edition, 2005.
[48] L. Lin, A.P. Pisano, and R.T. Howe. A micro strain gauge with mechanical amplifier. Microelectromechanical Systems, 6(4):313-321, 1997.
[49] M. Lohndorf, T.A. Duenas, A. Ludwig, M. Ruhrig, J. Wecker, D. Burgler, P. Grunberg, and E. Quandt. Strain sensors based on magnetostrictive GMR/TMR structures. IEEE Transactions on Magnetics, 38(5):2826-2828, 2002.
[50] N.E. McGruer, G.G. Adams, T.Q. Truong, T.G. Barnes, X. Lu, and J.C. Aceros. Biomimetic flow and contact/bending MEMS sensors. In J. Ayers, J.L. Davis, and A. Rudolph, editors, Neurotechnology for Biomimetic Robots, pages 13-30. The MIT Press, Cambridge, Massachusetts, 2002.
[51] J.M. McMichael and M.S. Francis.
Micro air vehicles - toward a new dimension in flight. Online, Dec 1997.
[52] P. Melvas, E. Kalvesten, and G. Stemme. A temperature compensated dual beam pressure sensor. Sensors and Actuators A: Physical, 100:46-53, 2002.
[53] J.R. Meyer. Mechanoreceptors, September 2001. Department of Entomology, NC State University.
[54] R.C. Michelson and S. Reece. Update on flapping wing micro air vehicle research - ongoing work to develop a flapping wing, crawling entomopter. In The 13th Bristol International Conference on RPV/UAV Systems, pages 30.1-30.12, Bristol, England, Mar-Apr 1998.
[55] E.L. Mockford. A new species of Dicopomorpha (Hymenoptera: Mymaridae) with diminutive, apterous males. Annals of the Entomological Society of America, 90:115-120, 1997.
[56] M.W. Moffet. Wetas - New Zealand's insect giants. National Geographic, 180(5):100-105, 1991.
[57] M. Motamed and J. Yan. A review of biological, biomimetic and miniature force sensing for microflight. In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'05), pages 3939-3946, Edmonton, AB, Aug 2005.
[58] M. Motamed and J. Yan. A reinforcement learning approach to lift generation in flapping MAVs: Simulation results. In The IEEE International Conference on Robotics and Automation (ICRA'06), Orlando, FL, May 2006. In press.
[59] G. Nalbach. The halteres of the blowfly Calliphora. I. Kinematics and dynamics. Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 173:293-300, 1993.
[60] G. Nalbach. Extremely non-orthogonal axes in a sense organ for rotation: behavioral analysis of the dipteran haltere system. Neuroscience, 61(1):149-163, 1994.
[61] I.P. Pavlov. Conditioned reflexes: an investigation of the physiological activity of the cerebral cortex. Oxford University Press, London, 1927. G.V. Anrep, Translator.
[62] T.N. Pornsin-sirirak, S.W. Lee, H. Nassef, J. Grasmeyer, Y.C. Tai, C.M. Ho, and M. Keennon.
MEMS wing technology for a battery-powered ornithopter. In The 13th IEEE International Conference on Microelectromechanical Systems, pages 799-804, Miyazaki, Japan, Jan 2000.
[63] H. Porte, V. Gorel, S. Kiryenko, J. Goedgebuer, W. Daniau, and P. Blind. Imbalanced Mach-Zehnder interferometer integrated in micromachined silicon substrate for pressure sensor. Lightwave Technology, 17(2):229-233, 1999.
[64] J.W.S. Pringle. Proprioception in insects. II. The action of campaniform sensilla on the leg. Journal of Experimental Biology, 15:114-131, 1938.
[65] J.W.S. Pringle. The gyroscopic mechanism of the halteres of Diptera. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 233:347-384, 1948.
[66] J.W.S. Pringle. Insect Flight. Cambridge University Press, London, 1957.
[67] B.L. Pruitt and T.W. Kenny. Piezoresistive cantilevers and measurement system for characterizing low force electrical contact. Sensors and Actuators A: Physical, 104:68-77, 2003.
[68] R. Ramamurti and W.C. Sandberg. A three-dimensional computational study of the aerodynamic mechanisms of insect flight. Journal of Experimental Biology, 205:1507-1518, 2002.
[69] S.P. Sane. The aerodynamics of insect flight. Journal of Experimental Biology, 206:4191-4208, 2003.
[70] S.P. Sane and M.H. Dickinson. The control of flight force by a flapping wing: lift and drag production. Journal of Experimental Biology, 204:2607-2626, 2001.
[71] L. Schenato. Analysis and control of flapping flight: from biological to robotic insects. PhD thesis, University of California at Berkeley, Berkeley, CA, Fall 2003.
[72] L. Schenato, X. Deng, W.C. Wu, and S. Sastry. Virtual insect flight simulator (VIFS): a software testbed for insect flight. In The IEEE International Conference on Robotics and Automation (ICRA'01), pages 3885-3892, Seoul, Korea, May 2001.
[73] U. Schoneberg, F.V. Schnatz, W. Brockherde, P. Kopystynski, T. Melhorn, E. Obermeier, and H. Benzel.
CMOS integrated capacitive pressure transducer with on-chip electronics and digital calibration capability. In The International Conference on Solid-State Sensors and Actuators, pages 304-307, 1991.
[74] A. Sherman and M.H. Dickinson. A comparison of visual and haltere-mediated equilibrium reflexes in the fruit fly Drosophila melanogaster. Journal of Experimental Biology, 206:295-302, 2003.
[75] A. Sherman and M.H. Dickinson. Summation of visual and mechanosensory feedback in Drosophila flight control. Journal of Experimental Biology, 207:133-142, 2004.
[76] B.F. Skinner. The behavior of organisms: An experimental analysis. Copley, Acton, MA, 1938.
[77] A. Skordos, P.H. Chan, J.F.V. Vincent, and G. Jeronimidis. A novel strain sensor based on the campaniform sensillum of insects. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 360:239-253, 2002.
[78] O. Sotavalta. Recordings of high wing-stroke and thoracic vibration frequency in some midges. Biological Bulletin, 104:439-444, 1953.
[79] S.M. Spinola and K.M. Chapman. Proprioceptive indentation of the campaniform sensilla of cockroach legs. Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 96:257-272, 1975.
[80] M. Sun and S.L. Lan. A computational study of the aerodynamic forces and power requirements of dragonfly (Aeschna juncea) hovering. Journal of Experimental Biology, 207:1887-1901, 2004.
[81] R.S. Sutton and A.G. Barto. Time-derivative models of Pavlovian reinforcement. In M. Gabriel and J. Moore, editors, Learning and Computational Neuroscience: Foundations of Adaptive Networks, pages 497-537. MIT Press, Cambridge, MA, 1990.
[82] R.S. Sutton and A.G. Barto. Reinforcement learning: an introduction. MIT Press, Cambridge, MA, Mar 1998.
[83] N. Svedin, E. Kalvesten, and G. Stemme. A lift force sensor with integrated hot-chips for wide range flow measurements.
Sensors and Actuators A: Physical, 109(1-2):120-130, 2003.
[84] J.R. Trimarchy and R.K. Murphey. The shaking-B-2 mutation disrupts electrical synapses in a flight circuit in adult Drosophila. Neuroscience, 17:4700-4710, 1997.
[85] S. Vogel. Life in Moving Fluids. Princeton University Press, Princeton, NJ, 1981.
[86] C.J.C.H. Watkins. Learning from delayed rewards. PhD thesis, King's College, Cambridge, UK, 1989.
[87] T. Weis-Fogh. Energetics of hovering flight in hummingbirds and in Drosophila. Journal of Experimental Biology, 56:79-104, 1972.
[88] Frank M. White. Fluid Mechanics. McGraw Hill, 1998.
[89] R.J. Wood and R.S. Fearing. Flight force measurements for a micromechanical flying insect. In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'01), Maui, HI, Oct-Nov 2001.
[90] T.W. Wu. On theoretical modeling of aquatic and aerial animal locomotion. Advances in Applied Mechanics, 38:292-353, 2001.
[91] W.C. Wu, R.J. Wood, and R.S. Fearing. Halteres for the micromechanical flying insect. In The IEEE International Conference on Robotics and Automation (ICRA'02), Washington, DC, May 2002.
[92] Y. Xu, F. Jiang, Y.-C. Tai, A. Huang, C.-M. Ho, and S. Newbern. Flexible shear-stress sensor skin and its application to unmanned aerial vehicle. Sensors and Actuators A: Physical, 105:321-329, 2003.
[93] J.M. Zanker. The wing beat of Drosophila melanogaster. I. Kinematics. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 327:1-18, 1990.
[94] R. Zbikowski. On aerodynamic modelling of an insect-like flapping wing in hover for micro air vehicles. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 360:273-290, 2002.
[95] R. Zbikowski. Sensor-rich feedback control. IEEE Instrumentation & Measurement Magazine, 7(3):19-26, Sep 2004.
[96] R. Zbikowski. Fly like a fly [micro-air vehicle].
IEEE Spectrum, 42(11):46-51, Nov 2005.
[97] S.N. Zill and E.A. Seyfarth. Exoskeletal sensors for walking. Scientific American, 275(1):86-90, Jul 1996.
[98] E. Zimmerman, A. Palsson, and G. Gibson. Quantitative trait loci affecting components of wing shape in Drosophila melanogaster. Genetics, 155(2):671-683, Jun 2000.

Appendix A

Wing-shape Equations Used for Calculating Wing Morphological Parameters

The morphological parameters for the wing shape of Figure 3.8 have been calculated using Ellington's definitions in [22]. The aspect ratio, AR, is found by dividing the wing span by the mean chord length. That is:

AR = b/c̄,  (A.1)

where b is the wing span and c̄ is the mean chord length. The k-th moment of wing area is defined as:

S_k = 2 ∫₀^R c r^k dr = S R^k ∫₀¹ ĉ r̂^k dr̂,  (A.2)

where S = b²AR⁻¹ is the total wing area, ĉ = AR·c/b is the normalized wing chord and r̂ = r/R is the non-dimensional radius. The non-dimensional k-th moment of wing area can then be derived from:

r̂ₖᵏ(S) = S_k/(S R^k) = ∫₀¹ ĉ r̂^k dr̂.  (A.3)

In order to find the wing-shape parameters, the wing has been divided into 20 rectangular elements. The elements have the same width of w = 10 mm, with their heights, d, changing according to the length of the wing chord. The parameters are summarized in Table A.1.

Symbol      Parameter                                       Value
AR          Aspect ratio                                    5.487
r̂₂²(S)      Non-dimensional second moment of wing area      0.328
r̂₃³(S)      Non-dimensional third moment of wing area       0.230

Table A.1: Morphological parameters calculated for Drosophila melanogaster's wing shape.

Appendix B

Water-glycerine Mixture Properties

The dynamically scaled model used in this experiment requires choosing a proper mixture of water and glycerine to achieve a certain Reynolds number, as discussed in §. Table B.1 can be used as a reference for choosing a percentage of glycerine in the mixture.
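To illustrate how Table B.1 might be used in practice, the sketch below log-interpolates the tabulated kinematic viscosity over glycerine mass fraction and picks the integer mass percentage that brings Re = UL/ν closest to a target. The velocity and length scales, and all helper names, are illustrative assumptions, not values from the thesis:

```python
import math

# A subset of (mass %, kinematic viscosity in cSt) pairs from Table B.1.
TABLE = [(40, 3.3257), (48, 4.833), (56, 7.3115), (64, 11.7298),
         (72, 23.2808), (80, 49.5861), (88, 119.9236)]

def viscosity(mass_pct):
    """Piecewise log-linear interpolation of kinematic viscosity (cSt)."""
    for (m0, v0), (m1, v1) in zip(TABLE, TABLE[1:]):
        if m0 <= mass_pct <= m1:
            t = (mass_pct - m0) / (m1 - m0)
            return math.exp((1 - t) * math.log(v0) + t * math.log(v1))
    raise ValueError("mass fraction outside tabulated range")

def reynolds(U, L, mass_pct):
    """Re = U L / nu, with nu converted from cSt (mm^2/s) to m^2/s."""
    return U * L / (viscosity(mass_pct) * 1e-6)

# Illustrative velocity and length scales only:
U, L, target_Re = 0.25, 0.05, 165
best = min(range(40, 89), key=lambda m: abs(reynolds(U, L, m) - target_Re))
```

Log-linear interpolation is used because viscosity grows roughly exponentially with glycerine fraction over this range, so straight linear interpolation between widely spaced rows would overestimate ν.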
Mass (%)   Density (kg/m³)   Dynamic viscosity (mPa·s)   Kinematic viscosity (cSt)
0.5        999.4             1.011                       1.0116
1          1000              1.022                       1.022
2          1002.8            1.048                       1.0451
3          1005.1            1.074                       1.0686
4          1007.4            1.1                         1.0919
5          1009.7            1.127                       1.1162
6          1012              1.157                       1.1433
7          1014.4            1.188                       1.1711
8          1016.7            1.22                        1.2
9          1019.1            1.256                       1.2325
10         1021              1.291                       1.2644
12         1026.2            1.365                       1.3302
14         1031.1            1.445                       1.4014
16         1036              1.533                       1.4797
18         1040.9            1.63                        1.5660
20         1045.9            1.737                       1.6608
24         1056.1            1.988                       1.8824
28         1066.4            2.279                       2.1371
32         1077              2.637                       2.4485
36         1087.6            3.088                       2.8393
40         1098.4            3.653                       3.3257
44         1109.2            4.443                       4.0056
48         1120              5.413                       4.833
52         1130.8            6.666                       5.8949
56         1141.9            8.349                       7.3115
60         1153              10.681                      9.2637
64         1164.3            13.657                      11.7298
68         1175              18.457                      15.7081
72         1186.6            27.625                      23.2808
76         1197.6            40.571                      33.8769
80         1208              59.9                        49.5861
84         1219.2            84.338                      69.1749
88         1229.9            147.494                     119.9236
92         1240.4            384.467                     309.954
96         1250.8            780.458                     623.9671
100        1261.1            1490                        1181.5082

Table B.1: Some properties of the water/glycerine (C₃H₈O₃) mixture. The source for all data except for glycerine 100% is [47]. Glycerine 100% is from [88].

Appendix C

The Effect of Side Walls, Surface and Ground on Mean Lift-Force Coefficient

Based on the experimental findings of Dickinson et al. on Drosophila's scaled model [20], one can write the following equation for the mean lift-force coefficient of a wing at distance x from a border:

C̄_L(x̂) = [1 + r(x̂)] C̄_L,∞,  (C.1)

where x̂ = x/c̄ is the non-dimensional distance associated with a scaled model and C̄_L,∞ is the lift-force coefficient of a wing far enough from the border for the change in force coefficient to be small enough to be neglected. The set of equations empirically found by Dickinson et al. is based on a dynamically scaled model with Re = 136 and is for C̄_L,∞ = 1.39. In general, to the knowledge of the author, r is not only a function of x̂ but also a function of Re and C̄_L,∞. Therefore, care should be taken in using (C.1) in cases where Re or C̄_L,∞ differ greatly from their above-mentioned values.
In our experiment, the distances between the nearest point on the wing and a side wall, the surface and the ground were measured to be 145 mm, 37 mm and 110 mm, respectively. Figure C.1 plots the effect of each border on the mean lift-force coefficient in terms of the distance between the closest point on the wing and the corresponding border, based on [20].

Figure C.1: The effect of the side walls, bottom and ground of the tank on mean lift-force coefficient.

Given Figure C.1, assuming r(x̂) to stay unchanged in our case, and taking the three effects to be cumulative, the total effect of the side walls, surface and ground on the mean lift-force coefficient, E, can be approximated as:

E = (100 + 32.79) × (100 − 0.3699) × (100 − 14.21) / 100² − 100 = +13.5%.  (C.2)

Appendix D

Calibration Procedure for the ATI-Nano17 Force Sensor

The raw gauge measurement, g, from the force sensor is in counts. This measurement can be mapped to force/torque by the matrix T = diag(T₁, T₂, …, T₆). Moreover, the sensor has some constant bias offset, b, also in counts. The relationship between the measured wrench, [f τ]ᵀ, and the raw gauge measurement, g, is:

T(g − b) = [f τ]ᵀ.  (D.1)

Initially, T is known, but b needs to be determined. To this end, as many static measurements as possible are taken, namely g_i for various angles of wing orientation α_i (e.g., every 5°). The simultaneous equations for these measurements, for the x and y axes, are:

T₁(g_i,fx − b_fx) = −W sin α_i,
T₂(g_i,fy − b_fy) = W cos α_i,  (D.2)

where W = m_w g − B_w is the wing weight minus its buoyant force. Putting the measurements into matrix form, we have:
9kjy _  2  k  (D.S)  = Mb  fy  b  2  W  2  2  A least square error (LSE) estimation of the biases can be solved as b =  (M M)~ M b, T  1  T  which solves for bf , bf and W. x  y  In order to increase SNR to get better calibration results, known masses can be added to the wing, in which case the measured weight W is:  W = (m g - B ) + (m g - B ) w  where rriM is the total mass added and  w  BM  M  M  (D.4)  is its buoyant force. The second parentheses should  finally be subtracted from the W to get the first parentheses.  Appendix E  The Contribution of Wing Missing Part to the Total Aerodynamic Force A part of the wing is missing due to the space occupied by the force sensor, wing holder and other attachments. The contribution of this missing part is approximated by comparing F\, the total force produced by a wing having only the missing part (i.e., the complete wing minus the current part), to F , the total force produced by a complete wing. From (3.40) we have: c  _ [fc *2(g)]i 4  ( E 1 ]  The values of b and r|(5) for the complete wing are .4 m and 0.328, respectively. These parameters for the wing comprised only of the missing part have been calculated from Figure 3.13 and Appendix A as 0.144 m and 0.359, respectively. The contribution of the missing part of the wing can be found from (E.l) to be 3.5%, and has been neglected in the calculations of this thesis.  110  Appendix F  Force Filtering: Spectral Analysis of Raw Forces The collected forces on the wing have been filtered using a zero-phase-delay digital low-pass Butterworth filter with cutoff frequency of w = 2.5 Hz. The distribution of power per unit c  frequency of the raw data is plotted in Figure F . l . It is desirable to not filter out any steady and unsteady components of the aerodynamic force trails. Generally it can be seen from the graph that the filter cuts off a small portion of the spectrum.  16 14|  CD  S . 10  6  3  o  CL  4  2 0L 10"  -2  Frequency (Hz)  10  Figure F . 
l : Power Spectral Density (PSD) estimate via Welch's method.  Ill  
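The zero-phase filtering used here can be illustrated without any signal-processing library: run a causal low-pass filter over the record forward and then backward, so that the phase lags of the two passes cancel. The sketch below is our own toy first-order filter standing in for the thesis's Butterworth design (same forward-backward principle, gentler roll-off); the smoothing constant and test signal are illustrative:

```python
import math

def lowpass(x, alpha):
    """Causal single-pole low-pass: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]."""
    y, state = [], x[0]
    for sample in x:
        state = alpha * sample + (1 - alpha) * state
        y.append(state)
    return y

def zero_phase_lowpass(x, alpha):
    """Forward pass, then backward pass: the two phase shifts cancel."""
    return lowpass(lowpass(x, alpha)[::-1], alpha)[::-1]

fs = 100.0                              # sampling rate (Hz), illustrative
t = [i / fs for i in range(400)]
# slow 0.5 Hz "aerodynamic" component plus a 20 Hz "noise" component
raw = [math.sin(2 * math.pi * 0.5 * ti) + 0.5 * math.sin(2 * math.pi * 20 * ti)
       for ti in t]
smooth = zero_phase_lowpass(raw, alpha=0.15)
```

Because the record is filtered once in each direction, the slow component comes through essentially undelayed, which is exactly the property needed when force traces are compared against wing kinematics sample by sample.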


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items