Sensing and Sorting Ore Using aRelational Influence DiagrambyMatthew DirksB.C.I.S., Okanagan College, 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinFaculty of Graduate and Postdoctoral Studies(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2014c© Matthew Dirks 2014AbstractMining companies typically process all the material extracted from a minesite using processes which are extremely consumptive of energy and chem-icals. Sorting this material more effectively would reduce the resources re-quired. A high-throughput rock-sorting machine developed by MineSenseTMTechnologies Ltd. provides the sensors and diverting equipment. After re-ceiving noisy sensor data, the sorting system has 400 ms to decide whetherto activate the diverters which will divert the rocks into either a keep or adiscard bin.The problem tackled in this thesis is to sort an unknown number of rocksby sensing their mineralogy, position, and size using electromagnetic sensorsand diverting them according to how valuable the mineral is to the mine.In real-time we must interpret the sensor data and compute the best actionto take.We model the problem with a relational influence diagram which showsrelations between random variables, decision variables, and utility nodes.We learn the model offline and do online inference. Inference is achieved us-ing a combination of exhaustive and random search. The model parametersare learned using Sequential Model-based Algorithm Configuration (SMAC).We simulate the diverters for offline evaluation and evaluate our solution onrecorded sensor data. Our result improves over the current state-of-the-artacross the entire range of utility.iiPrefaceA paper based on portions of this thesis has been published, entitled “Rep-resentation, Reasoning, and Learning for a Relational Influence DiagramApplied to a Real-Time Geological Domain” [8]. I am the first author of thepaper, with contributions by: Andrew Csinger, Managing Director at Mine-Sense Technologies Ltd.; Andrew Bamber, CEO at MineSense TechnologiesLtd.; and David Poole, Professor of Computer Science at the University ofBritish Columbia.The experiments performed in Section 3.2 were performed by myself,Matthew Dirks, using equipment owned by MineSenseTM Technologies Ltd.at their premises in Vancouver, British Columbia, Canada.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . viii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Sensing and Sorting Ore . . . . . . . . . . . . . . . . . . . . 21.3.1 ConductOreTM . . . . . . . . . . . . . . . . . . . . . . 21.3.2 SortOre . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Value-Based Sorting Algorithm . . . . . . . . . . . . . . . . . 71.5 Related Work in Electromagnetic Sensing . . . . . . . . . . . 72 Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.1 Relational Influence Diagram . . . . . . . . . . . . . . . . . . 92.2 Gaussian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.3 Model Details . . . . . . . . . . . . . . . . . . . . . . . . . . 132.4 One Rock: Analytical Solution . . . . . . . . . . . . . . . . . 162.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 16ivTable of Contents2.4.2 Calculate Rock X-Position . . . . . . . . . . . . . . . 162.4.3 Calculate Mineral Content and Rock Size . . . . . . . 202.5 Multi-Rock . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 New Sorting Algorithm . . . . . . . . . . . . . . . . . . . . . . 233.1 Rock Predictor Sorting Algorithm . . . . . . . . . . . . . . . 233.1.1 Timing Considerations . . . . . . . . . . . . . . . . . 233.1.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 243.1.3 Data Window Size and Performance . . . . . . . . . . 283.2 Real-World Data and Testing . . . . . . . . . . . . . . . . . . 303.2.1 Data Collection . . . . . . . . . . . . . . . . . . . . . 303.2.2 Real-World Test . . . . . . . . . . . . . . . . . . . . . 303.2.3 Real-World Test Result . . . . . . . . . . . . . . . . . 304 Offline Sorting and Optimization . . . . . . . . . . . . . . . . 334.1 Diverter Simulator . . . . . . . . . . . . . . . . . . . . . . . . 334.1.1 Activate . . . . . . . . . . . . . . . . . . . . . . . . . 334.1.2 Bin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.1.3 Accuracy of Simulation . . . . . . . . . . . . . . . . . 344.2 Algorithm Performance Metric . . . . . . . . . . . . . . . . . 354.3 Utility Tradeoffs . . . . . . . . . . . . . . . . . . . . . . . . . 374.3.1 Utility Example . . . . . . . . . . . . . . . . . . . . . 374.4 Automatic Parameter Configuration . . . . . . . . . . . . . . 374.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45vList of Tables3.1 RPSA Algorithm Parameters . . . . . . . . . . . . . . . . . . 263.2 Counting good and bad rocks in keep and discard bins. . . . 313.3 Definition of TPr, FNr, FPr, and TNr. . . . . . . . . . . . . 313.4 Average rates for real-world runs on SortOre. Best resultsare in bold. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.1 Average rates for real-world runs on SortOre (real), divertersimulation runs (sim), and the difference between these forboth RPSA and VBSA. . . . . . . . . . . . . . . . . . . . . . 354.2 C(FN) and C(FP ) which correspond to PC(+) values. . . 39viList of Figures1.1 ConductOreTM unit at UBC’s Mining Department and sam-ple rock. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 3D plots of two sample rocks on the ConductOre unit. . . . . 41.3 MineSenseTM SortOreTM [22] . . . . . . . . . . . . . . . . . . 51.4 A schematic of the sorter system (SortOreTM). . . . . . . . . 62.1 Relational Influence Diagram . . . . . . . . . . . . . . . . . . 102.2 Example of 2-D Gaussian Function from two perspectives.Darker color represents lower sensor readings, and lightercolor represents higher sensor readings. . . . . . . . . . . . . 122.3 Example of a cut (green line) of a 2-D Gaussian function fromtwo perspectives. . . . . . . . . . . . . . . . . . . . . . . . . 132.4 Relational Influence Diagram (Detailed) . . . . . . . . . . . . 142.5 Two perspectives of the data from Rock Sample 2 taken viaConductOreTM unit with 2 cuts highlighted. . . . . . . . . . 172.6 Rock’s sensor signal on Sensor A and B; intersection high-lighted with a star where y-distance from Sensor B is 2.056cm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.7 Rock x-position calculation. The two rocks in this diagramare the same rock at two different times. . . . . . . . . . . . 203.1 Picture of SortOre running during tests. . . . . . . . . . . . 314.2 Comparison of two sorting algorithms using the best param-eters for each algorithm for each PC(+). Lower values ofNorm(E[Cost]) are better . . . . . . . . . . . . . . . . . . . 40viiAcknowledgementsI would like to acknowledge Dr. David Poole and David Buchman for theirgreat understanding and guidance. I also wish to thank Dr. Andrew Csinger,for his support and vision for innovation.I am particularly grateful for my friends and family, for the support theyhave provided for many years, and to my friends and colleagues, SarwarAlam, Albert Thompson, and Shuo Shen, for great ideas and discussionsthroughout the preparation of this thesis.Finally, I wish to express a special thanks to Rachel Dirks for her love,support, patience, and dedication; I truly appreciate everything you havedone for me.viiiChapter 1Introduction1.1 OutlineIn this chapter we describe the sorting problem, sorting equipment, currentsorting algorithm, and related work. In Section 2.1 we describe our modelof the sorting problem using a relational influence diagram. We describetwo solutions to the problem: the first is an analytical approach which isnot practical for real-time use but motivates our final solution (Sections 2.4to 2.5) and the second is an approximation method, called Rock PredictorSorting Algorithm (RPSA), which uses online optimization. We evaluateRPSA on SortOreTM in Chapter 3. Lastly, we optimize RPSA’s parametersoffline on recorded sensor data using a state-of-the-art algorithm configura-tion method and show improved performance in Chapter 4.1.2 MotivationMining companies typically process all the material extracted from a minesite using processes which are extremely consumptive of energy and chem-icals. Sorting the good material from the bad would effectively reduce re-quired resources by leaving behind the bad material and only transportingand processing the good material. MineSenseTM is a BC-based mining andprocess technology company founded by Andrew Bamber with a vision ofreducing the material which needlessly leaves the mining site. Two of theirtechnologies, ConductOreTM and SortOreTM, make use of high-frequencyelectromagnetic sensors for this purpose (Section 1.3).11.3. Sensing and Sorting Ore1.3 Sensing and Sorting Ore1.3.1 ConductOreTMFigure 1.1: ConductOreTM unit at UBC’s Mining Department and samplerock.ConductOreTM (Figure 1.1) is a High-Frequency Electromagnetic Spec-troscopy sensing device developed by MineSenseTM. ConductOreTM consistsof an electromagnetic coil which measures magnetic and electromagnetic fluxabove the sensor and can detect materials (rocks) which are either conduc-tive or magnetic.A conductive or magnetic rock positioned near a sensor will disrupt themagnetic field produced by the sensor. The sensor is able to measure thisdisruption and report it, which we call a sensor reading. In general, thesensor readings from a sensor increase the closer a rock is to the sensor. Wedefine the function underlying the sensor readings of one rock as electro-magnetic response.The sensor is rotationally symmetric such that if the sensor is rotated,the sensor readings will not change (assuming nothing else has changed).Placing a rock off center in different directions, but at the same distancefrom the center, tends to produce the same signal. To demonstrate thiseffect and to provide data for later experiments, a square grid spaced at 121.3. Sensing and Sorting Orecm increments was created. Rock sample 1 and rock sample 2 were placedat all grid points to record the sensor reading of each rock at each point.The result is visualized in Figure 1.2. Notice the contour lines projectedonto the bottom plane; each contour is approximately circular due to thesymmetric property. Furthermore, the projections on the coordinate planesat y = 11 and x = 1 appear to be Gaussian-like – this will be importantlater.1.3.2 SortOreIn SortOreTM (Figure 1.3), rocks are dumped on a conveyor belt, movingdownward on the y-axis as in the schematic in Figure 1.4. As rocks travel,they pass over a sensor array consisting of 7 electromagnetic sensors (eachone the same as a ConductOreTM sensor). The computer processes thesensor readings while the rocks travel to the end of the conveyor belt wherethey fall off onto the diverter array which may have one or more divertersactivated, displacing rocks into the keep bin or else leaving them to fallinto the discard bin.The sensor array is mounted beneath the conveyor belt; as the beltmoves, the rocks are moved over the sensor array, producing a signal overtime. The diverter array consists of 7 diverters in line with the 7 sensors.The diverters are like flippers in pinball hitting rocks instead of metal balls.When a diverter is moving into the upward (activated) state, rocks passingover will be hit into the keep bin. Furthermore, so long as the diverter iskept in the activated position, any rocks passing over will be diverted intothe keep bin as well. When the diverter is down (de-activated), any rockspassing over will end up in the discard bin.Rocks are considered good when they contain a percentage of valuablemetal above some threshold (dependent on the type of metal and the par-ticular mining site). Any rocks that fall below this threshold are consideredbad rocks. The goal is to sort all the good rocks into the keep bin, and thebad rocks into the discard bin.The sensors in the array are staggered; there are two rows of sensors31.3. Sensing and Sorting Ore(a) Rock Sample 1(b) Rock Sample 2Figure 1.2: 3D plots of two sample rocks on the ConductOre unit.41.3. Sensing and Sorting OreFigure 1.3: MineSenseTM SortOreTM [22]with the odd numbered sensors in one row and the even numbered sensorsin the other row (as in Figure 1.4). They are designed this way becausehad the sensors been closer together, a rock positioned in between sensors1 and 2 may have the eddy currents produced by sensor 1 interact withthe currents produced by sensor 2. The sensors are positioned to reducethe interference from neighboring sensors. For the sake of presentation andalgorithm simplicity, we describe them as being in the same row and assumeindependence between the sensors.Furthermore, we assume that the movement (and speed) of the rock doesnot affect the sensor readings. Thus all the sensor readings can be seen assnapshots.51.3. Sensing and Sorting OreFigure 1.4: A schematic of the sorter system (SortOreTM).61.4. Value-Based Sorting Algorithm1.4 Value-Based Sorting AlgorithmWe define a sorting algorithm to be any algorithm which takes sensorreadings as input and outputs decisions on how to sort. Currently the exist-ing sorting algorithm is to sort rocks based solely on the sensor reading value.We call this algorithm the Value-Based Sorting Algorithm (VBSA).If a sensor’s reading is above a predefined threshold, then the diverterin line with that sensor is told to activate when the rock has moved intoposition over the diverter, hitting any rocks passing over it into the keepbin, otherwise the rocks are left to fall into the discard bin. This algorithmis quite trivial and as such can easily process every millisecond of data inless than a millisecond so lends itself quite naturally to real-time sorting.We use this algorithm as the baseline which we aim to surpass.This algorithm has some inherent flaws. Firstly, if a rock passes inbetween two sensors, the sensor readings from either sensor will be signifi-cantly lower than it would otherwise have been had it passed directly overthe sensor. In this case, a comparison of sensor reading to threshold mayyield an incorrect decision to not activate the diverter(s). Furthermore, thisalgorithm tends to activate the diverters over long segments of time. Forexample, a diverter may be activated from the moment a sensor encountersthe edge of a large good rock until the opposite edge. Ideally, the diverterwould be activated for as short of time as possible at the rock’s center ofmass. We aim to overcome these drawbacks.1.5 Related Work in Electromagnetic SensingSurprisingly, little research has been done in the use of electromagnetic coilsin sensors to detect and locate minerals. Much research has been done inelectromagnetic sensing for other applications, such as detecting cracks inmetal pipes [3] or detecting and locating unexploded ordnance beneath theground, however at a long range and with the goal of identifying singleobjects at a time [7]. Likewise, inductive sensing has also been used in coinrecognition, but again only considers the case of single objects [23] [5].71.5. Related Work in Electromagnetic SensingMore closely related work by David Lowther et al. [20] is in electromag-netic device simulations in a full 3D setting in which they very accuratelyapproximate electromagnetic physics. However, computation time was notappropriate for our needs - ranging from 1 second to 1 hour typically -whereas our work requires very fast approximations in less than 1 second.8Chapter 2Modelling2.1 Relational Influence DiagramProblems where the set of objects is unknown – in our case there are anunknown number of rocks – are called open-universe problems [21]. Wedescribe our open-universe problem using a relational influence diagram.Influence diagrams, developed by Howard and Matheson ([13], [26]), are anextension to belief networks and consist of 3 node types: random variables(ovals), decision variables (rectangles), and utility nodes (diamonds). Ar-rows going into random variables represent probabilistic dependence: thevalue of a random variable is probabilistically dependent on its parents. Ar-rows going into decision variables represent the information available whenmaking a decision. Arrows going into a utility node represent deterministicdependence on the value of its parents.We extend influence diagrams further with the idea of relations betweenvariables as in relational probabilistic models (see [24] for an overview). Asimilar technique is used under the name relational decision diagram [14].We use plate notation [4] to avoid redundancy when a node, or set of nodes,is repeated more than once. A plate is drawn as a box around one or morenodes, and is labelled with the name of a class. For instance, we have zeroor more rocks which are represented by the plate labelled “Rock (≥ 0)”.The nodes within a plate are duplicated, along with any arrows connectedto them, for each instance of the class.We chose to use a relational influence diagram to explain the model andto reveal explicitly what is and isn’t being modelled. We have two diagrams:the first one (Figure 2.1), shows the system at a high-level and we believeis conceptually easier to understand, and the second (Figure 2.4) describes92.1. Relational Influence DiagramRock PropertiesSensor (7)Sensor SignalActivateDiverter (7)UtilityElectromagnetic ResponseBin(keep or discard)Sensor ReadingRockSensor SignalRock ( 0)Figure 2.1: Relational Influence Diagramthe same system broken down into simpler components with their relationsexplained more clearly.First, refer to the high-level influence diagram (Figure 2.1). There is anunknown number of rocks, which may or may not be overlapping, each withrock properties describing the size, minerality, and position of the rock.We assume rocks are circles and we do not model mineral type (Iron, Copper,etc), instead we focus on amount of mineral content which is assumed to bespread equally throughout the entire rock.A rock’s properties influence the electromagnetic response (defined inSection 1.3.1). The rocks do not necessarily pass directly over a sensor,but may pass between sensors or over multiple sensors. The rock sensorsignal node is within two plates (rock and sensor) in Figure 2.1 becausefor each rock, there is one rock sensor signal for each of the 7 sensors. The102.2. Gaussianrock sensor signal is the signal of the rock over a particular sensor, whichdepends on the rock’s electromagnetic response and on how close the rockis when it passes the sensor.For each sensor, sensor signal combines each rock sensor signal fromthe rocks passing (partially or wholly) over the sensor. Finally, we modela sensor reading as a sample from the sensor signal at discrete times(milliseconds).Given the sensor readings from all 7 sensors, a decision must be madeto activate or not activate each diverter for each millisecond in time. Thediverters guide each rock into either the keep bin or the discard bin.Finally, the bin and mineral content determine the value of the utilityfunction. The bin which each rock ended up in and whether the rock wasgood or bad (high or low mineral content respectively) determines how wellthe sort performed.2.2 GaussianWe have chosen to use 2-D and 1-D circularly symmetric Gaussian functionsfor modelling electromagnetic response and rock sensor signal. The 1-DGaussian takes the following form:f(x) = a e−1/2(x−b)2s2 (2.1)Where a is the height of the Gaussian’s peak, b is the position of thecenter of the Gaussian’s peak (which we refer to as mean), and s controlsthe width of the Gaussian (also called standard deviation in the case ofGaussian distributions).A 2-D Gaussian, shown in Figure 2.2, is defined as:f(x, y) = a e−1/2 ( (x−b)2s21+ (y−c)2s22)(2.2)where a is the height, (b, c) is the 2-dimensional mean, s1 controls thewidth on the x-axis, and s2 controls the width on the y-axis. Since we are112.2. Gaussian(a) Gaussian - View from above (b) Gaussian - 3D ViewFigure 2.2: Example of 2-D Gaussian Function from two perspectives.Darker color represents lower sensor readings, and lighter color representshigher sensor readings.using circularly symmetric Gaussians, s1 = s2.We define a cut of a 2-D Gaussian to be the resulting function of takinga plane and “cutting” through the 2-D Gaussian surface. An example of acut is shown in Figure 2.3.A cut can be created by a vertical plane which is parallel to either the x-axis or y-axis. For instance, a cut parallel to the y-axis is derived by allowingx to be constant, then separating the function into two 1-D functions, one ofwhich will be constant. Therefore, Equation 2.2 simplifies to the followingwhich is a 1-D Gaussian function:f(x, y) = a e−1/2(x−b)2s2 e−1/2(y−c)2s2f(y) = ah e−1/2(y−c)2s2(2.3)where h = e−1/2(x−b)2s2 as read directly from Equation 2.3 and is a con-stant because x is constant, and ah is the height of the cut. Also note thatif x = b, then h = 1 and the height of the cut becomes simply a whichwas the height of the 2-D Guassian’s peak to begin with from Equation 2.2.122.3. Model Details(a) Gaussian Cut - View 1 (b) Gaussian Cut - View 2Figure 2.3: Example of a cut (green line) of a 2-D Gaussian function fromtwo perspectives.Gaussian separability results in a couple useful properties:• Gaussian Cut: When a 2-D Gaussian is cut, the resulting cut is a 1-D Gaussian function. Furthermore, a cut crossing the 2-D Gaussian’smean will have the same height as the 2-D Gaussian.• Symmetric Gaussian Width: When a symmetric 2-D Gaussian iscut, the resulting 1-D Gaussian will have the same width.2.3 Model DetailsWe describe the relationship between rocks and sensors in more detail inthe detailed relational influence diagram in Figure 2.4 which is a refinementof the high-level influence diagram (Figure 2.1) and which helps to explainhow each variable is connected.Rocks are denoted with an i subscript and sensors are denoted with a jsubscript (1 thru 7). We model rock properties as 4 components: mineralcontent (ci), size as a radius (ri), and position (xi, yi).Electromagnetic response is modelled as a circularly symmetric 2-D132.3. Model DetailsSensor (7)Sensor Signal(f j )ActivateDiverter (7)UtilityHeight (h i)Electromagnetic ResponseBin(keep or discard)Sensor ReadingHeight (h ij )Rock Sensor SignalRock ( 0)X - Position (x i)Y - Position (yi)Mineral Content (ci)Size Radius (ri)Width (si)Electromagnetic ResponseRock PropertiesFigure 2.4: Relational Influence Diagram (Detailed)Gaussian function with mean, height, and width. The mean is assumedto be equal to the rock’s position (xi, yi).The width (si) parameter depends on the size of rock i:si = ari + b (2.4)where slope (a) and bias (b) are parameters which are learned. The height(hi) is proportional to the rock’s mineral content:hi = cim (2.5)where m is a parameter which is learned.We model a rock sensor signal as an univariate (1-D) Gaussian function142.3. Model Detailswhich is a cut from the 2-D Gaussian of the electromagnetic response andhas three parameters: mean, height, and width.When a rock passes directly over a sensor, the 1-D Gaussian of the rocksensor signal has equal width and height as the 2-D Gaussian of the electro-magnetic response. In general, when a rock passes by a sensor (not directlyover), the rock’s sensor signal is a cut (1-D Gaussian) of the electromagneticresponse (2-D Gaussian). The mean is simply equal to the rock’s y-position.The width is equal to the electromagnetic response’s width because of thesymmetric Gaussian width property. The height of the 1-D Gaussian ismodelled as the variable height (hij) that is a point on a cut parallel to thex-axis. hij is derived from Equation 2.3 by allowing y to be constant.hij = hie−1/2(xj−xi)2s2i (2.6)where hi is the height of the electromagnetic response, xj is the x-positionof the jth sensor, xi is the x-position of rock i, si is the width of the electro-magnetic response for rock i, and hij is the height of the rock sensor signalfor rock i on sensor j.Lastly, we model sensor signal as a sum of all the rock sensor signals.Since rock sensor signal is a 1-D Gaussian, summing creates a sum of Gaus-sians.fj (t) =n∑i=1hije−1/2(t−yi)2s2i (2.7)where fj is the sensor signal for sensor j, t is continuous time, hij is theheight of the rock sensor signal for rock i on sensor j, yi is rock i’s y-position,and si is the width of the electromagnetic response for rock i.152.4. One Rock: Analytical Solution2.4 One Rock: Analytical Solution2.4.1 IntroductionWe first present an analytical solution for one rock. We show how well thisattempt works, and outline its drawbacks. In the end, this solution doesnot become the sorting algorithm we use to solve the sorting problem formultiple rocks in real-time but motivates our final solution. Refer to thedetailed influence diagram again (Figure 2.4), we estimate hij , si, and yithen analytically calculate the remaining rock properties ci, ri, and xi.The first step is to take the sensor readings and estimate the sensorsignal. Recall that we model a sensor signal as a sum of Gaussians, butsince there is only one rock in this example, the sum of Gaussians is overonly one Gaussian and is therefore equivalent to the rock’s sensor signal (1-DGaussian). Therefore, to estimate the sensor signal and rock sensor signal,we fit a 1-D Gaussian to the sensor readings of each sensor. This estimationprovides us hij , si, and yi.2.4.2 Calculate Rock X-PositionExampleFor this example, we use the data from Rock Sample 2 of Figure 1.2. Thisdata was collected from a ConductOreTM unit, but we use it as if it weredata collected by the SortOreTM (the sensors are the same). Let the y-axis represent time as the rock moves over the conveyor belt, and let anarbitrary sensor’s readings be any column (cut) of data (the points alongany x-position as y varies). Let Sensor A and Sensor B be two neighbouringsensors, shown in Figure 2.5, with sensor readings at x = 3 and at x = 9respectively which are highlighted as green stars. These two sensors are 6cm from each other. The line passing through the sensor readings (stars)of each sensor is its sensor signal. In this example, the rock’s x-position atx = 5 is the solution we are looking for – highlighted as a bold black line inFigure 2.5 (b).Recall from Section 1.3.1 that sensors are rotationally symmetric, and162.4. One Rock: Analytical Solution(a) Two rock sensor signals - 3D view(b) Two rock sensor signals - View from aboveFigure 2.5: Two perspectives of the data from Rock Sample 2 taken viaConductOreTM unit with 2 cuts highlighted.172.4. One Rock: Analytical Solutiontherefore a rock placed α cm away from a sensor will have the same sensorreading regardless of direction from the sensor, so long as α remains constant.Thus a sensor reading from Sensor A which is equal to a sensor reading fromSensor B represents a rock with distance from Sensor A equal to distancefrom Sensor B. We take each rock sensor signal and find where they intersectwhen positioned 6 cm apart from each other, because the sensors are 6 cmapart in this example (Figure 2.6).Figure 2.6: Rock’s sensor signal on Sensor A and B; intersection highlightedwith a star where y-distance from Sensor B is 2.056 cm.In Figure 2.6, the x-axis shows the relative y-distance of the rock fromeach sensor. The y-position of the rock relative to each sensor is known sincewe can measure time as the rock moves, and the peak of the rock’s sensorsignal is where the rock’s y-position is equal to the sensor’s y-position.The star in Figure 2.6 highlights the intersection point — this is where182.4. One Rock: Analytical Solutionthe sensor reading from Sensor A is equal to the sensor reading from SensorB. Therefore, the rock has an equal distance from Sensor A as from Sensor B.From Figure 2.6 we know that the rock’s estimated y-distance from SensorA is 3.944 cm and y-distance from Sensor B is 2.056 cm. However, wewant to know the rock’s x-distance from either sensor, not the y-distance.Two congruent triangles are formed from the information we have, shownin Figure 2.7. The hypotenuses of the two triangles are equal since theintersection point is when the rock has an equal distance from Sensor A asfrom Sensor B. The information we have is shown in Equation 2.8. We solvethe system of equations below to determine the rock’s x-position:y1 = 3.944y2 = 2.056x1 + x2 = 6y1 + y2 = 6(x1)2 + (y1)2 = d2 = (x2)2 + (y2)2··· x1 = y2, x2 = y1(2.8)ExperimentWe setup an experiment to determine the accuracy of using the methodfrom Section 2.4.2 to calculate the x-position on real data. In the coordinatesystem of Figure 2.5, the rock is positioned at (5, 5).In the example in Section2.4.2, Sensor A’s x-position was 3, and Sensor B’s x-position was 9; Now wemove Sensor A’s x-position to be one of {1, 2, 3, 4, 5} (cm) and Sensor B’sx-position be one of {6, 7, 8, 9, 10} (cm) and take all the combinations fora total of 25 possibilities. The rock is always between the sensors, and thesensors’ x-positions move relative to the rock with the distance betweensensors varying.We collected data via ConductOreTM of two sample rocks (Figure 1.2),Rock Sample 1 and Rock Sample 2, giving us 50 test cases total. For each192.4. One Rock: Analytical SolutionFigure 2.7: Rock x-position calculation. The two rocks in this diagram arethe same rock at two different times.of the 50 possibilities, we calculated a position estimate, and compared theestimate to the actual position. All of our estimations were within 1 cm ofthe true position, and 88% were within 0.5 cm.2.4.3 Calculate Mineral Content and Rock SizeIn the previous section we calculated rock x-position (xi), we still need tocalculate ci and ri. In this section we will solve for electromagnetic responseheight (hi) followed by rock mineral content (ci).Using the Gaussian cut property, we solve Equation 2.6 for the 2-DGuassian’s height (hi) given a 1-D Gaussian as shown below:hi = hije1/2(xi−xj)2s2i (2.9)202.5. Multi-RockThen we solve for ci from Equation 2.5:ci = hi/m (2.10)Lastly, we solve for ri from Equation 2.4, which becomes:ri =si − ba(2.11)We used a rough estimate of a, b, andm based on our data and knowledgefrom experts in the field – later we automatically learn these parameters.2.5 Multi-RockWe attempted to extend the Analytical Solution for one rock to multiplerocks. When multiple rocks are present near a sensor, the rock sensor sig-nals from each rock are summed to create the sensor’s signal (a sum ofGaussians). We separated the sensor signal of each sensor into rock sensorsignals using maximum likelihood estimation (MLE) with a sum of Gaus-sians fit to the sensor readings, with the hope of later determining to whichrock each rock sensor signal belongs to.MLE required searching over the entire space of the model, which inthis case was over 3n parameters for each of the 7 sensors, where n is themaximum number of rocks (chosen arbitrarily). Searching over the entirespace is not feasible in a real-time scenario. Nonetheless, we experimentedwith global optimization methods available in Python’s SciPy library [17]including gradient-based methods [27].After fitting the sum of Gaussians we correlate rock sensor signals totheir corresponding rock by matching up the mean of a rock sensor signalon one sensor with a nearby mean of a rock sensor signal on another sensor.Once each rock sensor signal was correlated to a rock, we proceeded withcalculating the electromagnetic response and rock properties analyticallyjust as in the one rock scenario.Correlating the rock sensor signals works reasonably well for two sensors,but for more than two sensors correlating rock sensor signals becomes am-212.5. Multi-Rockbiguous. Furthermore, fitting the sum of Gaussians often would find moreGaussians than there were real rocks due to the noise in the data, and thesearch procedure is computationally difficult due to the large search space.Therefore, the drawbacks above motivate using a search procedure whichsearches for rock properties directly.22Chapter 3New Sorting Algorithm3.1 Rock Predictor Sorting AlgorithmWe develop a new ore sorting algorithm called Rock Predictor Sorting Al-gorithm (RPSA) where we search over rock properties directly.3.1.1 Timing ConsiderationsWe have two time constraints to meet. Firstly, to be real-time the algo-rithm should produce decisions at the same rate as the rocks passing overthe sensors. Secondly, there is a time window of less than 1 second betweensensing a rock and acting upon it. RPSA does not process sensor read-ings independently however, but collects a group of them before processingbegins.Data WindowWe define a data window to be a group of sensor readings collected priorto processing. Since the system runs continuously and indefinitely, we can-not collect all readings and process them together to provide a completepicture. Therefore, we collect some number of sensor readings as definedby the data window, process the sensor readings, then collect another datawindow starting where the previous left off (there is no overlap between datawindows).Cutting off data in this manner may cause the boundaries of the datawindow to pass through a rock, providing only a partial reading of that rock.When a rock is on the boundary, it may be detected in both data windows.The larger the data window, the less often this will occur.233.1. Rock Predictor Sorting Algorithm3.1.2 AlgorithmAlgorithm 1 Rock Predictor Sorting Algorithm (RPSA)1: procedure RPSA(trueSensorReadings)2: for separatePass in numSeparatePasses do3: for rock in tempRocks do . resets tempRocks4: rock.size← 05: for pass in numPasses do6: for i in numRocksPerSensor do7: for innerPass in numInnerPasses do8: for s in sensors do . for each of 7 sensors9: for randJumps in numRandJumps do10: tempRocks2← tempRocks . tempRocks set copied11: rock ← tempRocks2[i] . reference to rock instance12: rock.x← random within conveyor width13: rock.y ← random within data window boundary14: for s in sizes and m in mineralContents do15: rock.size← s16: rock.mineralContent← m17: newCost← modelCost(tempRocks2, trueSensorReadings)18: if newCost < bestCost then19: tempRocks← tempRocks2 . tempRocks2 set copied20: bestRocks← tempRocks2 . tempRocks2 set copied21: bestCost← newCost22: return bestRocksRPSA pseudo-code is shown in Algorithm 1. RPSA has many algorithmparameters which we learn in Chapter 4 and which are listed in Table 3.1.A rock is an object with 4 properties: x, y, size, and mineralContent.The variables tempRocks, tempRocks2, and bestRocks are sets of rock ob-jects. RPSA goes through a number of outer and inner passes within whichwe perform a search over rock properties for each rock in a set of rocks,populating each rock object’s properties as we go. A cost function is usedto determine if one set of rocks is better than another. The number of rocksvaries for every set of sensor readings. We parametrize the maximum num-ber of rocks the search can consider (line 6) and later we automatically learn243.1. Rock Predictor Sorting Algorithm23: procedure modelCost(rocks, trueSensorReadings)24: cost← 025: for j in sensors do26: Generate rock sensor signals from rock predictions for sensor s27: sensorSignal←∑i∈rocks rockSensorSignali,j28: for k = 0, k < dataWindowSize, k+ = stepSize do29: cost + =(sensorSignal[k]− trueSensorReadings[k])230: activeRocks← { i | i ∈ rocks, i[size] 6= 0 }31: for rock1 in activeRocks do32: for rock2 in rocks \{rock1} do33: dist← distance between rock1 and rock234: overlapRatio← ratio of overlap between rock1 and rock235: cost + =overlapRatio2 ∗ overlapCostWeight36: cost + =nCostWeight ∗ |activeRocks|37: return costits value. The numSeparatePasses parameter defines the number of sepa-rate passes the algorithm will perform which starts a new search over rocksfrom scratch. numPasses is the number of times to give each rock anotherchance at improving the cost function. numInnerPasses is basically thenumber of times to loop through each sensor, since the search is performedsequentially over the sensors in line 8.The search can be stopped at any time in which case the algorithmreturns the best result it has found (called bestRocks). The longer the algo-rithm runs the better the result. This makes RPSA an anytime algorithm[29] (sometimes called a flexible algorithm [12]). We have experimentedwith a number of search algorithms including exhaustive search, MCMC[25], gradient-based methods, and derivative-free random search [2]. Anexhaustive search over all possible rock properties cannot be done in a rea-sonable amount of time. Similarly, MCMC and gradient-based methods aretoo computationally intensive for our timing requirements. We found thatwhat works the best is a mix between random and exhaustive search.253.1. Rock Predictor Sorting AlgorithmParameter Name SizestepSize 100numRocksPerSensor 5numRandJumps 300numPasses 10numInnerPasses 10numSeparatePasses 4mineralContents 5overlapCostWeight 2000nCostWeight 2000stddevSlope 1500stddevBias 26rockMaxMag 2500threshold 1001Table 3.1: RPSA Algorithm ParametersRandom SearchOur random search randomly jumps to new rock positions (x, y) for everyrock. numRandJumps is a parameter which defines the number of randomsamples to perform per iteration of the loops (line 9), typically around 100samples. We constrain a rock’s x-position to be within the edges of theconveyor belt (line 12), and the y-position to be within the boundaries ofthe data window (line 13).Exhaustive SearchFor every position of every rock, we do a coarse exhaustive search over allvalues of rock size and rock mineral content (line 14), where rock size hasonly 3 possible values including 0 and mineral content has up to 10 values.To allow for a variable number of rocks within the maximum, we allowrock size to be 0 and consider such rocks to be non-existent. A rock’s sizeshould be between 3 and 15 cm approximately (varies between mining sites)but we discretize to only 4.5 and 7 cm. Furthermore, a rock’s mineral contentshould be within 0 and 1 which we also discretize. The values of mineral263.1. Rock Predictor Sorting Algorithmcontent is parametrized (mineralContents, line 14) and are automaticallylearnt later.CostWe use the maximum a posteriori (MAP) probability model where the goalis to maximize the probability of the model given the data [18]. Our modelis a set of rocks, and the data is the set of sensor readings for all the sensors.The MAP model is the one that maximizes likelihood times the prior. Like-lihood is the likelihood of the sensor reading data given the set of rocks andprior is the probability of any particular set of rocks a priori. Equivalently,we can take the negative log and minimize, which is what we do with a costfunction.Our cost function, with value cost, is effectively the negative log of thelikelihood plus the negative log of the prior. The set of rocks with thelowest cost is assumed to be the most likely estimate. Cost is calculated inthe procedure “ModelCost” in Algorithm 1, line 23, which is the sum of thefollowing:• Likelihood: Difference between predicted and actual sensor readings(line 29).• Prior: Cost associated with overlap between rocks (line 35).• Prior: Cost associated with number of rocks (line 36).LikelihoodAs in the detailed influence diagram (Figure 2.4), if we provide rock prop-erties we can calculate electromagnetic response, rock sensor signals, andthe final sensor signal for each sensor (lines 26-27) which uses the param-eters a, b, and m. The parameters a, b, and m are renamed stddevSlope,stddevBias, and rockMaxMag respectively and are not shown in the pseudo-code, but are defined in Equations 2.4 and 2.5. We use a sample from thesensor signal as the predicted sensor reading at the kth point in time, writ-ten sensorSignal[k]. The difference between each predicted sensor reading273.1. Rock Predictor Sorting Algorithmand actual sensor reading is summed to create a cost which we will try tominimize. Rather than calculating the difference for every sensor reading,we skip over some as defined by the parameter stepSize whose value weautomatically learn later. By skipping some sensor readings, we improvecomputation time which allows for more iterations of the algorithm to beperformed, increasing the quality of the result.PriorsWe penalize for overlapping rocks because rocks tend to fall beside each otherrather than sit on top of each other. In lines 31-35 we loop through every pairof rocks, measure the distance between them, and calculate the percentageof overlap, which we call overlapRatio. We take the square of overlapRatiobecause we assume that two rocks on top of each other are exponentiallyless likely than two rocks that are barely overlapping. The overlapRatio isweighted by a parameter, overlapCostWeight. The number of rocks is alsopenalized such that more rocks is more heavily penalized than fewer rocks(line 36). The number of rocks is multiplied by a parameter, nCostWeight.Both overlapCostWeight and nCostWeight can be 0, in which case thecalculations are not performed.ThresholdOnce Algorithm 1 has completed, there is one final step not shown. With thepredicted set of rocks (bestRocks) we compare each rock’s mineralContentto a threshold parameter. If the mineral content exceeds this threshold, therock is classified as good and the appropriate diverter decisions are madeaccordingly.3.1.3 Data Window Size and PerformanceA shorter data window results in faster algorithm run time. Lines 9 - 19 ofRPSA (Algorithm 1) are executed for every rock. The ModelCost procedureis performed for every configuration of a rock (line 17) which loops over allthe sensor readings within the data window and loops over every pair of283.1. Rock Predictor Sorting Algorithmrocks. The number of rocks is linearly dependent on the size of the datawindow, therefore the complexity within each inner pass in RPSA is roughlyO(dataWindowSize3).Because of the complexity, we want to keep the data window as small aspossible. Our belt speed is 0.5 m/s, and sensor readings are taken at 1 msintervals. Based on the largest rocks we may encounter and the speed of thebelt, we estimate that a data window of 100 sensor readings is sufficientlylarge to allow the majority of a rock to pass completely over the sensor.Any smaller and no rock would ever have its complete signal recorded. Dueto networking overhead, 200 sensor readings was the smallest we could gowithout a bottleneck building up in packet transmission. Furthermore, thelarger the data window, the less time remains between gathering the groupof sensor readings and requiring a decision to divert. Therefore we set thesize of the data window to 200 sensor readings which takes 200 ms to collect.To be real-time, an algorithm running on a single core would have totake 200 ms or less to compute a decision before it is required to moveon to the next data window. We use parallel processing via MPI[1] onour computer with 8 virtual cores (Intel processor with hyper-threading, 4physical cores); We leave 1 unassigned to allow the operating system to runwithout interfering and we use 2 to manage communication to and from thediverters and sensors respectively. The 5 remaining are used as workers. Wedistribute the workload by pipelining the jobs round-robin style; each jobcontains one data window.In theory, each worker could use up to 1000 ms each (5 ∗ 200 = 1000ms). However, cores with hyper-threading do not get 100% of processingpower, therefore we empirically determined that a deadline of 400 ms wasthe highest we could go with our computer to keep up with the real-timeconstraint. We keep to this 400 ms deadline throughout the remainder ofthis thesis.293.2. Real-World Data and Testing3.2 Real-World Data and Testing3.2.1 Data CollectionWe performed real-world tests and recorded all the sensor data and a videoof the rocks passing over the conveyor belt. This data is used to rerun testsoffline in the next chapter. We ran 240 rocks per algorithm (approximately– some rocks split into two during the process). These rocks consisted of3 different types: 1/4 are “good” rocks with a high concentration of metal(painted green), 1/4 are “bad” which have a small concentration of metal(painted red), and the remaining 2/4 are also “bad” but with no significantmetal content (unpainted - grey or white).We synchronized the video to the sensor data and manually labelledevery rock’s position on the conveyor belt, its approximate size, and itsmineral content as either good or bad. The rocks were painted red andgreen to make this visual classification possible.3.2.2 Real-World TestAs a sanity check to ensure the feasibility of our algorithm in the real-world,our real-world tests ran the Value-Based Sorting Algorithm(VBSA) and theRock Predictor Sorting Algorithm (RPSA). The results from this test arepreliminary, as we hadn’t optimized the algorithm parameters at this pointbecause the main purpose was to collect sensor data. VBSA has 1 parameter,threshold, and RPSA has parameters listed in Table 3.1. Both VBSA andRPSA are run with a single parameter configuration which is a hand-pickedbest guess; in Chapter 4 we automatically configure the parameters.3.2.3 Real-World Test ResultWe measure the performance of a sorting algorithm in terms of a countof True Positives, False Negatives, True Negatives, and False Positives (TP,FN, TN, and FP respectively – see Table 3.2). For easy comparison betweenruns we use rates - True Positive Rate denoted as TPr and so on (Table 3.3).303.2. Real-World Data and TestingThe number of good rocks is equal to TP + FN , and the number of badrocks is equal to FP + TN .keep discardgood True Positive (TP) False Negative (FN)bad False Positive (FP) True Negative (TN)Table 3.2: Counting good and bad rocks in keep and discard bins.TPr = TPTP+FN FNr =FNTP+FN FPr =FPFP+TN TNr =TNFP+TNTable 3.3: Definition of TPr, FNr, FPr, and TNr.Our results from running on SortOre are shown in Table 3.4.Figure 3.1: Picture of SortOre running during tests.We notice that the base-line sorting method (VBSA) captured 31% moregood rocks into the keep bin (TP) than our method (RPSA) did but italso had 83% more bad rocks in the keep bin (FP) than RPSA. Thus, ifone’s preference is to minimize bad rocks from entering the keep bin thenRPSA performed better. However, it may be the case where the oppositeis preferred: maximize true positives (TP) as much as reasonably possible313.2. Real-World Data and TestingRPSA VBSATPr 0.620 0.813FNr 0.380 0.188FPr 0.206 0.378TNr 0.794 0.622Table 3.4: Average rates for real-world runs on SortOre. Best results are inbold.while still discarding a large percentage of bad rocks (TN). We will look atthese trade-offs in Chapter 4.32Chapter 4Offline Sorting andOptimization4.1 Diverter SimulatorDifferent mining sites will have different distributions of rocks and the min-eral type being mined will affect the costs associated with those rocks.Therefore, the utility changes between mining sites. Our goal is to optimizethe utility of the mine site and operator. Since we don’t know the mine sitesbeforehand, we want to optimize over all possible mine sites and evaluateperformance for each so that any mine operator can read directly from theresults what performance to expect and what algorithm or algorithm pa-rameters to use given their mine. Furthermore, knowing the performanceacross all utilities will allow us to compare future sorting algorithms to ourswithout favoring a particular utility.Training an algorithm with real-world tests is expensive in terms of man-hours and SortOre equipment time, both of which are limited; instead wesimulate the behavior of the SortOre’s diverters and run sorting algorithmsoffline. The simulator models the activate and bin nodes from the detailedinfluence diagram (Figure 2.4). The goal of diverter simulator is to compareone algorithm relative to another, so a perfect simulation is not required.4.1.1 ActivateA sorting algorithm outputs decisions of if and when to activate each di-verter. When a diverter is activated, there is a delay between the deactivatedand activated states while the diverter physically moves into place. When a334.1. Diverter Simulatordecision to divert is made at x ms, the diverters are activated at x ms minusthe delay time. We measured the delay time from a video captured of thediverters to use in our simulation.4.1.2 BinWhen doing tests on the real device we can physically count the number ofgood and bad rocks in each bin. In simulation, however, we calculate whereeach rock would probably end up. The sorting decisions from a sortingalgorithm are compared to manually labelled rocks to determine whetherthey should be diverted into the keep bin or left to fall into the discard bin.However, an activated diverter does not guarantee that the rocks passingover it will end up in the keep bin - and similarly for a deactivated diverter -due to a variety of reasons including but not limited to: contacting a rock onits edge, rocks tumbling to some other position, and rocks colliding into otherrocks. Therefore, the result of activation is stochastic. The probabilities arelearnt from the data to ensure that the average result from a batch of rocksrun on SortOre is comparable to the same batch run in simulation.A rock flies out entirely on occasion, missing both bins, or may split into2 smaller rocks due to the force of the diverters but these were rare so weassume they do not occur.4.1.3 Accuracy of SimulationThe performance of any sorting algorithm as run on the diverter simulatorreasonably estimates the same sorting algorithm run on SortOre. To showthis we take the recorded sensor data from the real runs and run it throughthe diverter simulator process with one of the sorting algorithms, the re-sulting TP/FP/TN/FN should have the same, or similar, averages as to thereal-world runs. As seen in Table 4.1, the difference between the real-worldruns and the diverter simulation is between about 3% and 7%. Next we willexplore modifications to the sorting algorithm’s parameters and evaluatetheir performance on this simulation platform.344.2. Algorithm Performance MetricRock Predictor Sorting Algorithm Value-Based Sorting Algorithmsim real difference sim real differenceTPr 0.646 0.620 0.026 0.746 0.813 0.067FNr 0.354 0.380 0.026 0.254 0.188 0.067FPr 0.230 0.206 0.025 0.353 0.378 0.025TNr 0.770 0.794 0.025 0.647 0.622 0.025Table 4.1: Average rates for real-world runs on SortOre (real), divertersimulation runs (sim), and the difference between these for both RPSA andVBSA.4.2 Algorithm Performance MetricSorting algorithms perform differently depending on the configuration ofthe parameters. Diverter simulator allows us to vary the algorithm parame-ter configurations and see the change in performance (TP/FP/TN/FN willvary). For instance, the threshold in VBSA (Section 1.4) may be increasedto cause less rocks to go into the keep bin. In some cases it may be that onlyrocks with high-concentrations of metal are worth keeping, in which case itis preferable to raise the threshold until that preference is met.These preferences are expressed as misclassification costs. For exam-ple, if a good rock going into the discard bin (FN) is twice as costly as a badrock going into the keep bin (FP), then the misclassification costs would beas follows:C(FN) = 2C(FP ) = 1(4.1)Note that C(FP ) and C(FN) are unitless measures – the ratio betweenC(FN) and C(FP ) is what matters. It is reasonable to assign misclassifi-cation costs in terms of monetary costs and loss of profits.Class probabilities are the probability of a positive sample (good rock),P (+), and probability of a negative sample (bad rock), P (−). In the real-world evaluation (Section 3.2), 1/4 of the rocks are positive. Thus the class354.2. Algorithm Performance Metricprobabilities are:P (+) = 0.25P (−) = 0.75(4.2)We want to express utility in a single term for ease of comparison and alsoin a way that explicitly shows misclassification costs. Probability Costexpresses the misclassification costs and class distributions in a single term.PC(+) =P (+)C(FN)P (+)C(FN) + P (−)C(FP )(4.3)If the number of good and bad rocks were equal, then a PC(+) of 0.5is where misclassification costs are equal, C(FN) = C(FP ). For the classprobabilities in Equation 4.2, a PC(+) of 0.25 is where misclassificationcosts are equal. For a given set of class probabilities, lowering PC(+) impliesincreasing C(FP ), and raising PC(+) implies increasing C(FN).We need a utility function that tells us the performance of a rock sortingalgorithm. Ideally, it should tell us how well the sorting algorithm per-formed for a given set of misclassification costs and class distributions. Weuse Normalized Expected Cost, written as Norm(Exp[Cost]), which isdefined below:Norm(E[Cost]) =E[Cost]max E[Cost]=FN ∗ C(FN) + FP ∗ C(FP )P (+) ∗ C(FN) + P (−) ∗ C(FP )(4.4)Where FN and FP are the counts of false negatives and positives re-spectively, C(FN) and C(FP ) are the costs of a false negative and positive,and P (+) and P (−) are the the probabilities of a positive and negative.The denominator in Equation 4.4 is simply a normalizing constant which isuseful as it allows the scale of misclassification costs or class distributions tochange while still being able to compare one algorithm relative to another.364.3. Utility TradeoffsNormalized Expected Cost and Probability Cost are measures definedin Cost Curves[9]. We chose to use their measures as they are becominga standard approach for evaluating binary classifiers. Cost Curves havebeen used in a variety of applications, including: software fault prediction[16], landslide susceptibility models in the geology domain [10], and neuralnetworks in the medical domain [28].4.3 Utility TradeoffsWe can change the balance of true positives versus false negatives by chang-ing the parameters of each algorithm, which also affects the value of theutility function, Norm(E[Cost]).4.3.1 Utility ExampleWe want to find the best parameter configuration for the given class distri-butions and misclassification costs. In VBSA there is only one parameter,threshold, which ranges between 0 and 1000. We have discretized thresholdinto 1001 values. For this example, let the misclassification costs and classprobabilities take on the assignments below:C(FN) = 3C(FP ) = 1P (+) = 0.25P (−) = 0.75(4.5)The Normalized Expected Cost, Norm(E[Cost]), for this utility over allof VBSA’s thresholds can be seen in Figure 4.1.4.4 Automatic Parameter ConfigurationWe evaluate a sorting algorithm’s parameter configuration offline by run-ning the algorithm on all our recorded data (Section 3.2.1) then passing the374.4. Automatic Parameter ConfigurationBest threshold(a) VBSAFigure 4.1: Example of Norm(E[Cost]) over VBSA’s threshold parame-ter for misclassification costs and class probabilities as defined in Equation4.5. Best threshold value is marked (smaller values of Norm(E[Cost]) arebetter).diverter decisions through the diverter simulator (Section 4.1) and finallycalculating the Norm(E[Cost]).For each utility, we optimize the parameter configuration looking for thebest Norm(E[Cost]). We choose one parameter configuration per utility.If multiple parameter configurations tie for best Norm(E[Cost]), we chooseone arbitrarily. VBSA has only one parameter, threshold, so we find thebest one using an exhaustive search.For a specific utility, the number of parameter combinations is approx-imately 1026 (RPSA’s parameters are listed in Table 3.1). Therefore, wecannot afford to do an exhaustive search over all combinations of RPSA’sparameters. Instead, we search using automatic algorithm parameter con-figuration.384.5. ResultsFor automatic algorithm parameter configuration, we turned to a state-of-the-art technique: Sequential Model-Based Optimization for AlgorithmConfiguration (SMAC) [15]. SMAC has been successfully used to optimizelocal search and tree search algorithms.We formulated the problem as follows. 3/4 of the labelled data is usedas the training set, with the remaining 1/4 held aside as the test set. SinceRPSA is stochastic, its performance varies between runs, so the training setis repeated 50 times to find an average Norm(Exp[Cost]), for a total of18, 000 rocks. We use SMAC to search over RPSA’s parameters, minimizingNorm(Exp[Cost]) for a particular utility.We did this for 9 utilities shown in Table 4.2. The probability costs areshown in addition to the corresponding misclassification costs, given thatP (+) = 0.25 and P (−) = 0.75 in our data set.PC(+) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9C(FN) 0.3 0.75 1.3 2.0 3.0 4.5 7.0 12.0 27.0C(FP ) 1 1 1 1 1 1 1 1 1Table 4.2: C(FN) and C(FP ) which correspond to PC(+) values.For each value of PC(+), we ran SMAC for 4 hours on 8 2.67-GHz Xeonx5550 cores on WestGrid’s Hermes cluster.4.5 ResultsWe compare the performance of RPSA’s best parameters (found using SMAC)to VBSA’s best parameters (found using exhaustive search) on the test dataset. Results can be seen in the Cost Curve in Figure 4.2 which plots the prob-ability cost of a positive on the x-axis and normalized expected cost on they-axis. The ”trivial” case shows the cost of rejecting (for PC(+) < 0.5) oraccepting (for PC(+) > 0.5) all the rocks all the time. At PC(+) of 0.5 onecan accept or reject all or randomly sort in order to achieve the trivial cost.When a sorting algorithm’s Normalized Expected Cost (Norm(E[Cost])) ishigher (worse) than the trivial case, it is better to do a trivial sort.394.5. ResultsOur algorithm, RPSA, improves on VBSA by 9% on average across all9 utilities and is statistically significant for utility where PC(+) is 0.5 andlower. In other words, if the cost of a false negative (C(FN)) is 3 timesthe cost of a false positive (C(FP )) or less then RPSA should be used overVBSA. Error bars represent standard error of the mean.00.050.10.150.20.250.30.350.40.450.50.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9Norm(E[Cost]) PC(+) RPSA VBSA TrivialFigure 4.2: Comparison of two sorting algorithms using the best parametersfor each algorithm for each PC(+). Lower values of Norm(E[Cost]) arebetter40Chapter 5Conclusion5.1 Future WorkRelational Influence Diagrams Applied to Rock SortingWe hope to use the relation influence diagram to discover new areas ofimprovement by identifying where the diagram is lacking and to describechanges in the future in terms of adding or removing diagram elements(variables, arrows, plates, etc) allowing future work to more clearly relateto what we have done in this thesis. A more detailed and accurate modelcould consist of the following:1. Model electromagnetic response with a more realistic function (cur-rently using 2-D Gaussian).2. Create a model of how to learn diverter behavior, such as active learn-ing.3. Allow for different diverter types (such as air jets).4. Model multiple rocks with various mineral types (with 1 mineral typeper rock).5. Model multiple rocks where each rock may have different mineral typeswithin it.6. Relax the assumption that rocks are circles.7. Consider alternate utility functions.415.1. Future WorkSensor FusionWe’d like to investigate the use of additional sensors types in combinationwith HFEMS sensors to contribute to the decision making process. Sensortypes include cameras (visible or infrared light), x-ray fluorescence, laser-induced breakdown spectroscopy, and laser displacement sensors. Some ofthese sensor types would overcome the problem that a material which isneither conductive nor magnetic is impossible to detect with HFEMS. Theproblem of localizing rocks could be aided by 2-D laser scanners or opticalcameras to map out where any and all physical objects are on the conveyorbelt - regardless of their magnetic or conductive properties. Solving thelocalization problem with lasers or cameras would also provide the abilityto automatically generate labelled data which could be used in training asystem that would not require the lasers or cameras upon deployment.RPSA Inference Quality and EfficiencyIn performing online inference, we want to decrease the computational re-quirements for real-time sorting and improve solution quality. For instance,the quality of results RPSA can produce is dependent on how much CPUtime is available. One option is estimate the value of computation [11] todetermine a reasonable trade-off between the cost of computing time and thevalue of the result resulting in dynamic CPU time scheduling. In dynamicCPU scheduling, we can allocate CPU time depending on the hardness ofthe current problem instance, such that trivial instances will take little timeand harder instances will be given extra time to compute.Data windows are sequential with no overlap in our implementation; abetter method may be to have overlapping data window - i.e. a slidingwindow. This would add computational burden, but would likely improveresult quality.Online Inference MethodsOther methods of identifying rocks and their properties should be consid-ered, resulting in new rock sorting algorithms. One option is to use machine425.2. Conclusionlearning models to improve recognition of patterns in sensor data. Secondly,independent component analysis (ICA) is a method often used in separat-ing sound sources in an audio signal [6]. With ICA, it may be possible toseparate the signals from each rock in our sensor data.Modeling and RepresentationFormal models for sensor-based, continuous and discrete time, object-orientedopen-universe problems would be beneficial. We have made one attempt us-ing a relational influence diagram. Given such a model we want to automat-ically generate an algorithm by determining policies offline and performinginference online where necessary.For example, we want to draw a relational influence diagram and au-tomatically perform inference and learning on it. Specifically, we want todiscover alternative rock sorting algorithms that have never been evaluatedbefore including introducing new sensor or diverter types.Furthermore, we want to consider computational efficiency in solving re-lational influence diagrams offline. One option is to consider limited memoryinfluence diagrams [19] where decisions and values from the past are not re-membered at all later times but are forgotten.5.2 ConclusionSensing and sorting ore is a difficult problem to which we provided a possi-ble solution. We modelled the problem of sensing and sorting rocks basedon their mineralogy, size, and position via electromagnetic sensors using arelational influence diagram. We presented an analytical solution and anapproximate solution (RPSA) for multiple rocks.We developed a simulation platform (diverter simulator) to train andevaluate the performance of any sorting algorithm. We compared the simu-lator to the real world to ensure comparable output. The simulator is notlimited to only evaluating our own sorting algorithm, but can be used in thefuture for new sorting algorithms.435.2. ConclusionRPSA was optimized by minimizing over its many algorithm parame-ters for all utility values. We found improved parameter configurations foreach utility, and show the performance compared to VBSA. RPSA improveson the previous best method (VBSA) by 9%. Our solution demonstratesthat there is potential to increase rock sorting performance resulting in areduction of material which needlessly leaves the mining site.44Bibliography[1] MPICH [Software], Accessed December 2013. http://www.mpich.org.[2] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research,13:281–305, 2012.[3] Jack Blitz and Geoff Simpson. Electrical and magnetic methods ofnon-destructive testing. Non-Destructive Evaluation Series. SpringerNetherlands, second edition, 1997.[4] Wray L. Buntine. Operations for learning with graphical models. Jour-nal of Artificial Intelligence Research, 2:159–225, 1994.[5] Alfonso Carlosena, Antonio J Lopez-Martin, Fernando Arizti, AneMart´ınez-de Guerenu, Jose L Pina-Insausti, and JL Garc´ıa-Saye´s. Sens-ing in coin discriminators. In Proceedings of IEEE Sensors ApplicationsSymposium, pages 6–8, 2007.[6] Pierre Comon. Independent component analysis, a new concept? SignalProcess., 36(3):287–314, April 1994.[7] Yagadhish Das, John E. McFee, Jack Toews, and Gregory C. Stuart.Analysis of an electromagnetic induction detector for real-time loca-tion of buried objects. IEEE Transactions on Geoscience and RemoteSensing, 28(3):278–288, May 1990.[8] Matthew Dirks, Andrew Csinger, Andrew Bamber, and David Poole.Representation, reasoning, and learning for a relational influence dia-gram applied to a real-time geological domain. In AAAI Workshop on45BibliographyStatistical and Relational Artificial Intelligence, Menlo Park, CA, 2014.AAAI Press.[9] Chris Drummond and Robert C. Holte. Cost curves: An improvedmethod for visualizing classifier performance. Machine Learning,65(1):95–130, 2006.[10] Paolo Frattini, Giovanni Crosta, and Alberto Carrara. Techniques forevaluating the performance of landslide susceptibility models. Engi-neering Geology, 111(14):62 – 72, 2010.[11] Michael C. Horsch and David Poole. Estimating the value of computa-tion in flexible information refinement. In In Proceedings of FifteenthConference on Uncertainty in Artificial Intelligence (UAI-99), pages297–304, 1999.[12] Eric J. Horvitz. Reasoning about beliefs and actions under computa-tional resource constraints. In In Proceedings of the 1987 Workshop onUncertainty in Artificial Intelligence, pages 429–444, 1987.[13] Ronald a. Howard and James E. Matheson. Influence Diagrams. Deci-sion Analysis, 2(3):127–143, September 2005.[14] William Hsu and Roby Joehanes. Relational decision networks. InProceedings of the ICML Workshop on Statistical Relational Learning,page 122, 2004.[15] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequentialmodel-based optimization for general algorithm configuration. In Pro-ceedings of the 5th International Conference on Learning and Intelli-gent Optimization, LION’05, pages 507–523, Berlin, Heidelberg, 2011.Springer-Verlag.[16] Yue Jiang, Bojan Cukic, and T. Menzies. Cost curve evaluation of faultprediction models. In Software Reliability Engineering, 2008. ISSRE2008. 19th International Symposium on, pages 197–206, Nov 2008.46Bibliography[17] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open sourcescientific tools for Python, 2001.[18] Poole David L. and Mackworth Alan K. Artificial Intelligence: Foun-dations of Computational Agents. Cambridge University Press, NewYork, NY, USA, 2010.[19] Steffen L Lauritzen and Dennis Nilsson. Representing and solvingdecision problems with limited information. Management Science,47(9):1235–1251, 2001.[20] David Lowther. The development of industrially-relevant computa-tional electromagnetics based design tools. IEEE Transactions on Mag-netics, 49:2375–2380, May 2013.[21] Brian Milch and Stuart Russell. Extending Bayesian networks to theopen-universe case. Heuristics, Probability and Causality. A Tribute toJudea Pearl. College Publications, 5, 2010.[22] MineSense. MineSense. http://minesense.com/products, August 2013.[23] Ph. A. Passeraub, P-A. Besse, C. de Raad, O. Dezuari, F. Quinet, andR. S. Popovic. Coin recognition using an inductive proximity sensormicrosystem. In In Proceedings of International Conference on SolidState Sensors and Actuators, volume 1, pages 389–392, 1997.[24] David Poole. Logic, probability and computation: Foundations and is-sues of statistical relational AI. Logic Programming and NonmonotonicReasoning, 2011.[25] Christian Robert and George Casella. A short history of Markov chainmonte carlo: Subjective recollections from incomplete data. StatisticalScience, 26(1):102–115, 02 2011.[26] Ross D. Shachter. Probabilistic inference and influence diagrams. Op-erations Research, 36(4):589–604, 1988.47Bibliography[27] Jan A. Snyman. Practical mathematical optimization : an introduc-tion to basic optimization theory and classical and new gradient-basedalgorithms. Applied optimization. Springer, New York, 2005.[28] Zhi-Hua Zhou and Xu-Ying Liu. Training cost-sensitive neural networkswith methods addressing the class imbalance problem. Knowledge andData Engineering, IEEE Transactions on, 18(1):63–77, Jan 2006.[29] Shlomo Zilberstein. Using anytime algorithms in intelligent systems.AI Magazine, 17(3):73–83, 1996.48