c© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license.Practical guidelines for fast, efficient and robust simulations of yield-stressflows without regularisation using accelerated proximal gradient oraugmented Lagrangian methodsTimm Treskatisa,∗, Ali Roustaeid,∗, Ian Frigaarda,b, Anthony Wachsa,caDepartment of Mathematics, The University of British Columbia, 1984 Mathematics Road, Vancouver, BC, Canada V6T1Z2bDepartment of Mechanical Engineering, The University of British Columbia, 2054 Applied Science Lane, Vancouver, BC,Canada V6T 1Z4cDepartment of Chemical and Biological Engineering, The University of British Columbia, 2360 East Mall, Vancouver, BC,Canada V6T 1Z3dDepartment of Engineering Science, College of Engineering, University of Tehran, Tehran, IranAbstractThe mathematically sound resolution of yield stress fluid flows involves nonsmooth convex optimisationproblems. Traditionally, augmented Lagrangian methods developed in the 1980’s have been used for thispurpose. The main drawback of these algorithms is their frustratingly slow O(1/√k) worst-case convergence,where k is the iteration counter. Recently, an improved ‘dual FISTA’ algorithm (short: FISTA*) wasintroduced, which achieves the higher and provably optimal rate of O(1/k). When implementing thesealgorithms in two finite-element packages (FreeFem++ by Fre´de´ric Hecht, UPMC Paris and Rheolef byPierre Saramito, UGA Grenoble), we observed that these theoretical convergence rates are not generallyattained. In this article, we present four common numerical pitfalls that adversely impact the convergenceof the optimisation algorithms. By means of constructive and practical guidelines we point out how a carefulimplementation can not only recover the full order of convergence, but also reduce the computational costper iteration for further efficiency gains. Furthermore, we assess the performance and accuracy of FISTA*for the practical case of flow in wavy walled channel and demonstrate significant speed-up when FISTA* isemployed instead of the classical augmented Lagrangian method.Keywords: yield-stress flows, viscoplastic fluids, augmented Lagrangian method, FISTA*, finite elements2010 MSC: 65K15, 76M101. IntroductionFor several decades, the alternating direction method of multipliers (ADMM or ALG2) from the class ofaugmented Lagrangian methods has served as the workhorse for genuinely nonsmooth simulations of yield-stress fluid flows. With the attribute ‘genuinely nonsmooth’ we refer to algorithms that solve viscoplasticflow problems without artificial smoothing or other forms of regularisation.5As a minimal working example that captures all relevant effects without adding extra complexity, we∗Corresponding authorsEmail addresses: timmt@math.ubc.ca (Timm Treskatis), aroustaei@ut.ac.ir (Ali Roustaei)Preprint submitted to cIRcle November 27, 2017predominantly focus on the model problem of stationary creeping Bingham flow in two spatial dimensionsτ = 2µDu+ τ0 Du|Du| if |Du| 6= 0 (1a)|τ | ≤ τ0 if |Du| = 0 (1b)−div τ +∇p = f in Ω (1c)divu = 0 in Ω (1d)u = uΓ on Γ (1e)where τ : Ω → R2×2sym is the deviatoric part of the Cauchy stress tensor, u : Ω → R2 the velocity field andp : Ω → R the pressure. The problem data consisting of a plastic viscosity parameter µ > 0, a yield stressτ0 ≥ 0, a body force f : Ω→ R2 and an admissible boundary velocity uΓ : Γ→ R2 with∫ΓuΓ · nds = 0are given. We use the notation D := (∇+∇>)/2 for the symmetric part of the gradient operator and for atensor τ we define a norm by setting|τ | :=√√√√∑i,jτ2ij2.Γ is the boundary of the Lipschitz domain Ω ⊂ R2 with outward pointing unit normal vectors n.A convenient alternative formulation of the Bingham model from Equations (1a) and (1b) readsDu = fBi(τ ) :=12µ(|τ | − τ0)+τ|τ | . (2)Whenever the thresholding function (·)+ = max { 0, · } returns zero, we define the entire expression on theright hand side to be zero.The main difficulty of genuinely nonsmooth formulations of viscoplastic flow problems consists in their10lack of differentiability, due to the presence of a yield stress. Simultaneously, the stress τ is generally notuniquely determined whenever τ0 > 0.1.1. Genuinely nonsmooth optimisation algorithmsThe traditional solution approach of Duvaut and Lions [1] identifies Problem (1) with the nonsmoothconvex minimisation problemminu∈Ud∈S I(u,d) := 2µ∫Ω|d|2 dx+ 2τ0∫Ω|d|dx−∫Ωf · udx (3a)subject to Du = d, (3b)where the extra strain-rate field d is introduced for algorithmic reasons. The set of admissible flow velocitiesU ={u ∈ H1(Ω)2 ∣∣ divu = 0 and u = uΓ on Γ }incorporates incompressibility and the boundary condition into this formulation. Strain rates belong to thespaceS ={d ∈ L2(Ω)2×2∣∣∣ d = d> }of symmetric tensor-valued functions, as does the stress. In the sequel, we will use boldface symbolsH1(Ω) =H1(Ω)2 or L2(Ω) = L2(Ω)2×2 for product spaces, where their multiplicity shall be clear from the context.15We will also suppress the domain Ω.2Note that the stress τ does not explicitly appear in the primal problem (3). It does, however, act as aLagrange multiplier for the constraint (3b). Conversely, in the equivalent dual problem [2, 3]minτ∈Sp∈P J(τ ) := 12µ∫Ω(|τ | − τ0)2+ dx (4a)subject to − div τ +∇p = f . (4b)with the pressure spaceP = p ∈ L2(Ω)∣∣∣∣∣∣∫Ωp dx = 0 ,the velocity u has been eliminated, but it assumes the role of a Lagrange multiplier for the constraint (4b).It is an important observation that the primal problem (3) is strongly convex but, if τ0 > 0, notdifferentiable. In contrast, the dual problem (4) is once (Lipschitz) continuously differentiable but, if τ0 > 0,not strongly convex [3].20Both traditional and more recently devised optimisation algorithms exploit this structural information:starting from an arbitrarily chosen initial guess, numerical methods for the solution of Problem (3) computeupdates iteratively by taking steps along the negative gradient of the dual problem (4).The classical augmented Lagrangian method applied to Problem (3) has appeared e.g. in [4, 5, 6] andreads as follows:25Algorithm 1 (Alternating Direction Method of Multipliers (ADMM / ALG2) for Bingham flow).Input Initial guesses d0, τ 0, a penalty parameter r > 0Initialisation k = 0Step 1 Solve the Stokes problem−div (rDuk+1)+∇pk+1 = f − div (rdk − τ k) in Ωdivuk+1 = 0 in Ωuk+1 = uΓ on Γ.Step 2 Evaluate the modified Bingham modeldk+1 =12µ+ r(|τ k + rDuk+1| − τ0)+ τ k + rDuk+1|τ k + rDuk+1| .Step 3 If the algorithm has converged then stop else update the stress fieldτ k+1 = τ k + r(Duk+1 − dk+1).set k ← k + 1 and go to Step 1.The Bermu´dez-Moreno method [7] is a more general algorithm for solving variational inequalities. When30applied to the Bingham flow problem, it does however result in a very similar algorithm, possibly withdifferent free parameters [8].A few remarks on the convergence of Algorithm 1 are in order. In the literature on convex optimisation,convergence results are usually available for the optimality gap in function values, i.e. how fast the sequenceI(uk,Duk)− I(u¯,Du¯) or J(τ k)− J(τ¯ ) approaches zero. Here, u¯ and τ¯ denote the unique velocity solutionand an admissible shear stress, respectively. Since Problem (3) is strongly convex, convergence in function3values automatically enforces convergence of the velocity iterates uk → u¯ in the energy norm of the problem,by means of the inequality [2, p 23]I(uk,Duk)− I(u¯,Du¯) ≥ 2µ‖Duk −Du¯‖2L2 . (5a)In a similar fashion, convergence in the dual objective forces strain-rate iterates to converge as well [2, p 53]:J(τ k)− J(τ¯ ) ≥ 2µ‖fBi(τ k)−Du¯‖2L2 (5b)under the proviso that the stress iterate τ k satisfies the constraint (4b) of the dual problem. Recall thatfBi(τk) defines the strain-rate tensor corresponding to the stress τ k according to the Bingham model (2).The convergence analysis of the alternating direction method of multipliers is and remains challenging,but a number of theoretical results are available which prove thatI(uk,Duk)− I(u¯,Du¯) ≤ O(1/k)at least in special cases [9, 10]. In such a case, we obtain from (5) that the iterates of velocity fields ukgenerated by Algorithm 1 converge to the analytical solution u¯ like‖Duk −Du¯‖L2 ≤ O(1/√k).Indeed, this low sublinear convergence rate has been observed in numerical simulations of viscoplastic flow35problems [3].However, from the theory of complexity we know [11] that by exploiting only function values andgradients—just like Algorithm 1—optimisation problems of the form (4) can be solved at the rateJ(τ k)− J(τ¯ ) ≤ O(1/k2).According to (5), this yields a still sublinear, but compared to ADMM / ALG2 higher rate of order O(1/k)for the iterates. Very recently, Attouch and Peypouquet [12] refined this result further to even faster o(1/k2)convergence in function values or o(1/k) convergence of the iterates. They also re-confirmed that there isno > 0 for which a rate of order O(1/k1+) could generally be guaranteed. In that sense, the sublinear40o(1/k) rate is optimal for the genuinely nonsmooth solution of yield-stress flow problems.The augmented Lagrangian method from Algorithm 1 requires only minor modifications to turn it intoa method with such optimal convergence. The following class of algorithms was recently introduced in [3]:Algorithm 2 (Dual-based Proximal Gradient Methods for Bingham flow).Input τ 045Initialisation k = 0 , τ 0aux = τ0Step 1 Evaluate the Bingham model (2)dk =12µ(|τ k| − τ0)+ τ k|τ k| .Step 2 Solve the Stokes problem−div (2µDuk)+∇pk = f − div (2µdk − τ k) in Ωdivuk = 0 in Ωuk = uΓ on Γ.Step 3 If the algorithm has converged then stop else update the auxiliary stress fieldτ k+1aux = τk + 2µ(Duk − dk).4Step 4 Extrapolateτ k+1 = τ k+1aux + ξk+1(τ k+1aux − τ kaux),set k ← k + 1 and go to Step 1.There are several common choices for the step-size sequence (ξk)k:• The constant sequence ξk ≡ 0 gives the suboptimal dual iterative shrinkage-thresholding algorithm(ISTA*) [13], which is almost equivalent to the augmented Lagrangian method from Algorithm 150(ADMM / ALG2) [3].• The traditional FISTA sequence is defined through the recursion [13]t1 = 1, tk+1 =1 +√1 + 4(tk)22, ξk =tk − 1tk+1. (6)With this dual-based fast iterative shrinkage-thresholding algorithm (FISTA*), iterates converge at arate of the order O(1/k). [3]• Another classical choice stems from Nesterov’s acceleration scheme [11]55ξk =k − 1k − 1 + α (7)with a parameter α > 0, and typically α = 3. o(1/k) convergence of the iterates is obtainable withα > 3 [12]. Surely, the improvement from ‘big O’ to ‘little o’ is of no practical relevance, but settingα > 3 also yields additional and stronger convergence results e.g. for the sequence of stress iteratesτ k [12, 14]. We therefore use Algorithm 2 with the Nesterov sequence and α = 4.Both Algorithms 1 and 2 define an ‘outer’ optimisation loop, which converges at a sublinear rate. Every60single iteration only demands for simple function evaluations and the solution of a Stokes problem. ThisStokes problem in Step 1 or 2, respecitively is conventionally solved with an ‘inner’ iterative method, such asthe conjugate-gradient-Uzawa algorithm [15] which is robust, which converges linearly and for which efficientpreconditioners [16, Ch 2] are readily available.1.2. Other genuinely nonsmooth methods65In recent years, several attempts to solve Steps 1-3 in Algorithms 1 and 2 in a more coupled fashion haveappeared in the literature. For instance, Aposporidis and co-authors [17] replaced the methodology basedon optimisation techniques with a Picard iteration for Problem (1). A Newton algorithm for viscoplasticpipe flow problems appeared in [18], and Saramito replaced the trust-region strategy of the latter work withline searches in [19].70Compared to the algorithms presented above, the alternative methods from [17] and [19] transfer themain difficulty of yield-stress flow simulations—non-differentiability in u and / or d and non-uniqueness ofτ—from the ‘outer’ to the ‘inner’ loop. More specifically, the Picard and Newton methods lead to singularlinear systems, where the linear operator possesses a large nontrivial kernel. This affects the convergence ofthese methods as follows:75Convergence of ‘outer’ iterations Since the objective I of the primal optimisation problem (3) is bothstrongly convex and some of its terms are Lipschitz continuously differentiable, complexity theoryguarantees that ‘outer’ iterations of a suitable algorithm may converge at a linear rate (see e.g. [20]).Whether the algorithm in [17] actually attains this rate has not yet been fully investigated, as far aswe are aware. Aposporidis et al. [17] showed that their proof of linear convergence breaks down in the80genuinely nonsmooth case. The system matrix in the Newton iterations in [19] is generally singular.Therefore [21], the convergence rate of Newton’s method deteriorates from locally quadratic e.g. forgeneralised Newtonian flows, to linear when applied to the problem of yield-stress flows.5Convergence of ‘inner’ iterations The singular linear systems that arise in each ‘outer’ Picard or New-ton iteration may be identified with a quadratic and hence smooth, but not strongly convex optimi-85sation problem. Hence, the optimal convergence rate of ‘inner’ iterates for the solution of these linearsystems is the sublinear rate of o(1/k). We are not currently aware of works that would have addressedthe question whether common solvers such as MINRES or GMRES actually attain this optimal rate.Due to the infinite condition number of the linear operators involved, we certainly do not expect linearconvergence.90In summary, while the classical optimisation approach leads to algorithms with slow, sublinearly convergent‘outer’ iterations and fast, linearly convergent ‘inner’ iterations, the situation is reversed for the algorithmsin [17, 19]: here the ‘outer’ iterations are (at best) linearly convergent and the ‘inner’ iterations converge ata sublinear rate.It should also be noted that GMRES has a significantly larger memory footprint than Algorithms 1 and 295insofar as it accumulates the full convergence history up until the method is restarted. In contrast, FISTA*only requires that two auxiliary stress iterates be kept in memory.Another open question, as pointed out in [17], concerns LBB-stable discretisation schemes for theseformulations.Our objective is to obtain solvers that are designed for a robust and efficient solution of viscoplastic100flow problems. Considering the more mature state of the optimisation algorithms ADMM / ALG2 andparticularly FISTA* in terms of convergence analysis, preconditioning and efficient implementations, wefocus on these latter two algorithms in this work. However, many of our results are equally valid forimplementations of other genuinely nonsmooth algorithms, such as Picard iterations or Newton’s method.1.3. Outline of this paper105This paper is mostly based on our experience with implementing Algorithm 2 in the free and open sourcefinite-element packages FreeFem++ [22] and Rheolef [23]. We present and analyse four numerical challengesin Section 2, which negatively impact the convergence of the optimisation algorithms, along with strategiesto overcome them. Section 3 is a short case study to asses FISTA* for solving more practical flows. We solvethe Stokes flow of a Bingham fluid in a wavy walled channel and compare it to the previously published110results obtained using augmented Lagrangian. Finally, we close with some concluding remarks in Section 4.2. Implementation of ADMM / ALG2 and FISTA*This part is devoted to practical implementation guidelines for Algorithms 1 and 2, backed by convexand numerical analysis. In the following four subsections, we shall address these computational aspects:1. What is a feasible way of tracking the convergence of algorithms for viscoplastic flow problems?1152. What accuracy is required for the solution of the Stokes problem in each iteration?3. What finite elements should be used to discretise flow problems of yield-stress fluids?4. What triangulations of the flow domain are admissible?To illustrate the results of our numerical studies in this section, we select the prototypical benchmarkproblem of force-driven flow enclosed in a square cavity. More precisely, in Problem (1) we define the flowdomain as the unit square Ω = ]0, 1[2, on the wall we impose the no-slip condition uΓ ≡ 0 and we set theright-hand side tof(x) = 300(x2 − 0.50.5− x1).For the rheological parameters we choose µ = 1 and τ0 = 10 (all units are SI units). Reference solutions areavailable in the literature [2, 3, 24].120An analytical solution for this problem appears to be unknown. For its numerical solution, we startfrom a Cartesian grid of 32 × 32 squares and subdivide each cell diagonally into four congruent triangles.6(a) Flow velocity u¯ and its norm. (b) Yielded (white) and unyielded (grey) zones.Figure 1: Numerical reference solution for the force-driven Bingham flow problem in a square cavity, computed with theMATLAB / Octave package TOOTHPASTE [26] on a 32× 32 grid. There is no flow in the corners of the domain. In the centralplug region that is shaped like a truncated square, the flow is constrained to rigid-body rotation.On this triangulation, we discretise the flow equations with the conforming and inf-sup stable Bercovier-Pironneau elements [25], i.e. P1-iso-P2/P1 elements for the velocity-pressure complex. Since the velocityis a continuous, piecewise linear function, the discrete shear stress and strain rate are naturally chosen to125be (discontinuous) piecewise constant. After k = 1, 000, 000 iterations of FISTA*, initialised with τ 0 ≡ 0and always solving the Stokes problems to machine accuracy, we obtain our numerical reference solutionu¯ ≈ u1,000,000. This ‘rotating plug’ is shown in Figure 1.2.1. Measuring convergenceThe objective of a numerical flow simulation is to find a velocity field u, which approximates the true solu-130tion u¯ as closely as possible. With the error measured in the energy norm, we shall accept an approximationu as soon as‖Du−Du¯‖L2 ≤ tol (8)for a prescribed tolerance tol > 0. In practice, a sensible numeric value for tol could be derived from thecharacteristic length and velocity scales of the problem, or one could use a relative criterion of the formtol = reltol‖Du‖L2with reltol set to e.g. 1% or 0.1%.Of course, the stopping criterion (8) is not practically applicable, since u¯ is unknown. Instead of (8),the common stopping criterion used in simulations of viscoplastic fluids [27, 28, 29, 30, 31] is of the form135‖Du− d‖L2 ≤ . (9)This mismatch between the two strain-rate fields Du and d is commonly referred to as residual in theliterature. The field Du − d is the negative gradient corresponding to the dual problem (4), and thisgradient surely vanishes at the optimal solution.Sometimes, the criterion (9) is also complemented with a restriction on the velocity increment uk−uk−1between two subsequent iterations [29, 31]. If measured in the energy norm, such a criterion assumes the140form‖Duk −Duk−1‖L2 ≤ . (10)7Quantity Definition Measure forerror u− u¯ proximity to the true solutionresidual Du− d proximity to the affine subspace where Du = dincrement uk − uk−1 progress of the algorithmprimal-dual gap I(u,Du) + J(τ ) difference between the primal functional I and thedual functional −JTable 1: Common nomenclature for the convergence analysis of algorithms in viscoplasticity.Figure 2: Sketch of Problem (3): contours of the nonsmooth convex objective I(u,d), with its minimiser subject to theconstraint Du = d marked by a +. Sets of points that satisfy a stopping criterion are shaded in grey. Note that the usualcriterion ‖Du − d‖ ≤ alone does not imply proximity to the solution (left); a second contribution to the error has to becontrolled as well (right).In Table 1 we offer a summary of various technical expressions that are used to describe the convergenceof algorithms for simulating yield-stress flows, along with some less abstract interpretations. In other works,the notion of convergence has sometimes been used to describe at what rate the residuals approach zero. Toavoid any confusion, we consistently refer to the sequence of errors whenever we speak of the convergence145of an optimisation algorithm.Our analysis shall lead us towards two main conclusions, a negative result and a positive one:• The commonly used stopping criteria (9) and (10) for Algorithms 1 and 2 do not reliably control theerror.• A both reliable and sharp error bound is computable at relatively low cost.150We will try to illustrate why the residuals are not an adequate measure of convergence. Unfortunately,the plausible implicationsmall residual =⇒ small erroris false. In Figure 2, we provide a sketch of the primal problem (3) that we obtain after replacing theunknowns of the vector-valued velocity field u and the tensor-valued strain-rate field d with real numbersonly. In this simplified setting, the constraint Du = d defines a line. The set of all points that satisfy theresidual-based stopping criterion (9) forms an infinitely long band around this line. Consequently, even ifan iterate (u,d) gives a tiny residual, u could still lie arbitrarily far away from the true solution u¯. Some155numerical experiments shall reveal shortly that this does indeed happen for our test problem of force-drivenBingham flow.8Note that we may decompose the error of the k-th velocity iterate as follows:‖Duk −Du¯velocity error‖L2 ≤ ‖Duk − dkresidual‖L2 + ‖dk −Du¯‖L2strain-rate error(11)Firstly, this inequality shows once once more that a small residual is not generally a sufficient criterion fora small error. Secondly, one may hypothesise that there is a certain risk of using a large penalty parameter160r in the augmented Lagrange formulation of the primal problem (3) that Algorithm 1 is based on: in thatcase, ADMM / ALG2 converges towards the affine subspace where Du = d with high priority (residualsdecay rapidly), whereas convergence towards the true solution plays a subordinate role (errors remain large).Observing only residuals, one might get the impression of seemingly accelerated convergence.Fortunately, convex optimisation offers a solution to these issues. The strong duality of the primal and165dual problems (3) and (4) signifies that for arbitrary velocity fields u and for arbitrary stress fields τ thatsatisfy the dual constraint (4b), we generally haveI(u,Du) ≥ −J(τ ) (12)and if equality holds, then u and τ already solve the viscoplastic flow problem [2, p 45]. Combining theinequalities (5) and (12) yields2µ‖Duk −Du¯‖2L2 ≤ I(uk,Duk)− I(u¯,Du¯) ≤ I(uk,Duk) + J(τ )for an arbitrary shear stress τ subject to the constraint (4b). It makes sense to evaluate this so-calledprimal-dual gap on the right-hand side with the best available guess for a solution τ¯ that also satisfies (4b),namely the auxiliary iterate τ k+1aux . For k ≥ 1 but not for k = 0, the extrapolated stress τ k+1 would be170another admissible choice. This way, we obtain‖Duk −Du¯‖L2 ≤1√2µ√I(uk,Duk) + J(τ k+1aux ) =: ηk. (13)Even though the error on the left-hand side is unknown, the right-hand side is computable at every iteration.Explicitly, we haveηk =1√2µ√√√√2µ∫Ω|Duk|2 dx+ 2τ0∫Ω|Duk|dx−∫Ωf · uk dx+ 12µ∫Ω(|τ k+1aux | − τ0)2+ dx (14)Since this quantity ηk is guaranteed to be a reliable upper bound for the actual error, we refer to it as anerror estimate.175Numerical experimentsAfter these theoretical considerations, we now support our conclusions with numerical results. We shallshow that residuals and velocity increments are not generally suitable measures of convergence for simulationsof yield-stress flows. Furthermore, while the theory already ensures that ηk is a reliable upper bound forthe error, we also want to find out whether it is sharp, i.e. whether it decays at the same and not a lower180rate than the actual error.To that end, we conduct four different experiments, each time solving our test problem of force-drivenBingham flow. In each case, we compute for the first 1,000 iterations• the error measured in the energy norm, ‖Duk −Du¯‖L2 , with the numerical reference solution u¯ fromFigure 1,185• the estimator ηk =√I(uk,Duk) + J(τ k+1aux )/√2µ,• the residual ‖Duk − dk‖L2 ,• the velocity increment, also measured in the energy norm, ‖Duk −Duk−1‖L2 .9iteration kerror ‖Duk −Du¯‖L2estimator ηkresidual ‖Duk − dk‖L2increment ‖Duk −Duk−1‖L210010110210310−810−610−410−2100102(a) Algorithm 1 (ADMM / ALG2).iteration k10010110210310−810−610−410−2100102(b) Algorithm 2 (FISTA*).iteration k10010110210310−1510−1010−5100105(c) Algorithm 1 as penalty method.iteration k10010110210310−810−610−410−2100102(d) Algorithm 1 with incorrect strain rate.Figure 3: Inadequacy of customary stopping criteria: neither the residuals nor the increments provide reliable error bounds.The error estimator ηk, however, is both reliable and sharp.10Algorithm 1 (ADMM / ALG2). In a first experiment, we use the classical augmented Lagrangian methodwith a fixed penalty parameter of r = 2µ = 2, initialised with d0 = τ 0 = 0. The results are shown in190Figure 3a.As predicted by the convergence analysis of this method, the velocity iterates approach the true solutionat a low sublinear rate (namely O(1/√k)). Meanwhile, both the residuals and the increments converge ata higher rate (namely O(1/k)). Consequently, if one only measures these two quantities as it is commonpractice, one might be misled to believe that the algorithm converges faster than it actually does. In other195words, these two quantities cannot measure the actual error reliably, which limits their applicability for astopping criterion.In contrast, the error estimate exhibits the same convergence rate of order O(1/√k) as the errors. Asexpected, it is also consistently an upper bound for the error. Hence, the estimator does indeed provideboth a reliable and sharp error bound in this case.200Algorithm 2 (FISTA*). Repeating the same procedure for Algorithm 2, we obtain a very similar picture,as shown in Figure 3b. FISTA* reduces the error more efficiently than ADMM / ALG2: the velocityiterates now approach the solution at a rate of order O(1/k) (or o(1/k)). Due to the non-monotone natureof accelerated methods, it is difficult to infer a quantitative convergence rate of residuals and incrementsfrom the graph. The error estimator, however, also captures the improved convergence rate compared to205Algorithm 1 accurately.Algorithm 1 as penalty method. In a third experiment, we study the influence of the penalty parameter ron the convergence of the augmented Lagrangian method. Bernabeu observed that ‘the convergence [of theresiduals] attains a linear regime’ [32, p 59] if the penalty parameter grows exponentially with the numberof iterations. For a flow problem different from the one considered here, the author initialised the sequence210of penalty parameters with r = 10 and then increased their value by 1% from one iteration to the next.We use the same methodology for the test problem at hand, but increasing the parameter r by 2.5% inevery iteration. We do this solely for practical purposes, to ensure that the interesting dynamics are stillobservable within the first 1,000 iterations. The qualitative features remain the same as with the slowergrowth studied by Bernabeu.215As can be seen from Figure 3c, residuals and velocity increments converge linearly, starting at about300 iterations and until machine accuracy is reached. Since the growing penalty parameter increasinglyprioritises satisfaction of the constraint Du = d over the minimisation of the objective, such behaviouris expected. Meanwhile, it also becomes clear that large penalty parameters do not actually improve theconvergence of the algorithm, but they rather lead to early stagnation.220In line with our aforementioned hypothesis, these findings should serve as a word of warning to prac-titioners who may feel tempted to tune an implementation of Algorithm 1 for faster convergence by usinga large (fixed or variable) penalty parameter. Even though the method will likely satisfy the two commonstopping criteria after fewer iterations already, the approximate solution may well be further away from theexact solution as with a smaller parameter.225Algorithm 1 with incorrect strain rate. Our fourth and final convergence experiment is a particularly extremecase. This time, we use Algorithm 1 set up as for the first experiment. However, we deliberately implementa mistake in Step 2 and add a 1% error to the strain-rate tensor by re-scaling dk+1 ← 1.01dk+1. Of course,this algorithm does not converge to the solution of the test problem.Indeed, from Figure 3d we observe that after approximately 50 iterations, the error sequence ‖Duk −230Du¯‖L2 stagnates at a value between 10−2 and 10−1. The error estimator ηk predicts this value correctly evenquantitatively. Residuals and increments, however, converge in the same manner as in the first experiment,giving the false appearance of convergence to the solution of the problem! From the residuals and incrementsalone, it is impossible to distinguish a good approximation of u¯ from a completely nonsensical velocity field.Practical consequences235We return to our initial question regarding the issue of how the convergence of optimisation algorithmsfor viscoplastic flow simulations can be measured.11Residuals and velocity increments paint an overly optimistic picture of the convergence, risking thatan algorithm is stopped too early while iterates uk are still far away from the solution u¯. The conclusionthat ‘the algorithm has converged’ because these commonly observed quantities have become small is not240generally valid.The expression (14) for the error estimator ηk requires about the same computational cost as the evalu-ation of residuals and velocity increments combined. However, it does not share the latters’ inadequacy formeasuring convergence. The estimator is provably reliable, it is sharp in all four of the above experimentsand it hence qualifies as an ideal candidate for a rigourous stopping criterion. To test for convergence in245Step 3 of Algorithm 1 or 2, we suggest to check whether ηk ≤ tol, cf (8). Once this criterion is met, theactual error ‖Duk −Du¯‖L2—even though usually unknown—is automatically bounded by tol as well.Finally, large penalty parameters in the augmented Lagrangian method may be harmful. Besides, toobtain actual and not just apparent convergence acceleration, we recommend to replace ADMM / ALG2with FISTA*.2502.2. Solving the Stokes problemWe now turn towards the second question regarding the required accuracy of solutions to the Stokesproblems which appear in Step 1 of Algorithm 1 and Step 2 of Algorithm 2.We now turn towards the second question regarding the required accuracy of solutions to the Stokesproblems− div (cDu) +∇p = f − div(cdk − τ k)in Ωdivu = 0 in Ωu = uΓ on Γ.which appear in Step 1 of Algorithm 1 with c = r, u = uk+1, p = pk+1 and in Step 2 of Algorithm 2 withc = 2µ, u = uk, p = pk.255In our analysis so far we have assumed that these subproblems are solved exactly. In our numerical testswe have respected this assumption as far as practically possible, by solving the velocity problems in theconjugate-gradient Uzawa algorithm directly through matrix factorisation and iterating until‖divu‖L2 ≤ abstol (15)where the absolute tolerance is set to machine accuracy ≈ 2× 10−16.The solution of Stokes’ problem is by far the most expensive component of Algorithms 1 and 2; in fact260it is the only step that contributes a significant computational cost. Consequently, it appears wasteful tosolve these subproblems with excessive accuracy while iterates are still far from convergence.On another hand, it has already been observed in the past that large errors in the evaluation of iteratesmay impact the convergence rate of fast methods to an extent at which their advantage of convergenceacceleration is lost [33]. Thus, while it would be desirable to solve the Stokes problems only inexactly,265a certain level of accuracy is still required to preserve the fast and robust convergence of the numericalsimulation. Fortunately, recent results from the convergence analysis of convex optimisation algorithmsprovide a simple and practical criterion that achieves just that.In [20], Schmidt and co-authors analyse how the convergence rates of proximal gradient methods, suchas the basic ISTA and the accelerated FISTA, are affected by inexact evaluations of the subproblems. With270their results put in the context of our problem, they show that the convergence rates remain unchanged,provided that the Stokes problem at iteration k of Algorithm 2 is solved withabstol ≤ O(1/k2+δ), (16)where δ > 0 is an arbitrarily small constant. For the more slowly converging method without acceleration,the rate of convergence remains unaffected ifabstol ≤ O(1/k1+δ). (17)These results motivate the following strategy for an inexact algorithm:275121. For the initial iteration, solve the Stokes problem for some absolute tolerance abstol = α > 0.2. At iteration k, solve the Stokes problem with abstol = α/k1+δ (for a basic method like Algorithm 1)or abstol = α/k2+δ (for an accelerated method like Algorithm 2).Numerical experimentsWe proceed with a few simulations of the force-driven Bingham flow problem in order to validate this280strategy of inexact augmented Lagrangian or dual FISTA methods.Inexact variants of Algorithm 1. In Figure 4a we show how the convergence rate of ADMM / ALG2 isaffected by inexact evaluations of the Stokes problem. We initially compute an error bound η0 for the datathe algorithm is initialised with, then we define the Stokes tolerance for the first iteration to be α = 10−2η0,i.e. we accept a 1% error in the ‘inner’ iterations relative to the ‘outer’ iterations. We set the safety factor285δ to 10−3.If the absolute tolerance for the Stokes solver decays at a rate of only O(1/k0.5+δ), then the convergencerate of the augmented Lagrangian method deteriorates from O(1/√k) to what appears to be O(1/k3/8).This observation indicates that the theoretically derived sufficient condition of [20] cannot be relaxed.With the Stokes tolerance adjusted more rapidly at a rate of O(1/k1+δ), no visual difference are per-290ceivable between the convergence of the inexact and exact versions of the augmented Lagranian algorithm.Despite virtually identical convergence behaviour in terms of iteration numbers, the inexact method isactually three times faster in terms of computing time. These results are shown in Figure 4c.Inexact variants of Algorithm 2. In an analogous fashion, we first solve the Stokes problems in FISTA*with abstol decaying like 1/k1+δ or 1/k2+δ. As can be seen from Figure 4b, the former rate is insufficient295for optimal convergence while the latter strategy yields errors that are indistinguishable from the exactalgorithm.Looking at the performance of the different algorithms over time, we first observe that FISTA* withexact subproblems is more efficient than either version of ADMM / ALG2. With inexactly solved Stokesproblems, the computing time of FISTA* is improved further by an additional factor of approximately two.300Algorithm 2 (FISTA*) with default settings of finite-element programs. The FreeFem++ software package[22] implements two types of stopping criteria for Stokes problems: an absolute stopping criterion of theform (15) or, alternatively, a relative one,‖divu‖L2 ≤ reltol‖div u˜‖L2 , (18)where u˜ = u0 refers to the initial velocity iterate of the Stokes solver.Rheolef [23] by default measures convergence of the Stokes solver with a relative criterion of the form(18), however with u˜ defined as the solution to the elliptic problem−div (cDu˜) = f − div(cdk − τ k)which originates from the Schur-complement formulation of the Stokes problem.305It is important to note that the convergence analysis of convex optimisation and the previous twonumerical experiments require the absolute tolerance, i.e. the entire right-hand side of (18) to be sufficientlysmall. This may be difficult to achieve with a relative stopping criterion, since the factor ‖div u˜‖L2 is notimmediately controlled. Indeed, as presented in Figure 5, even with a very small relative tolerance imposedon the Stokes solver, the optimisation algorithm may fail to converge.310Practical consequencesIn conclusion, careful attention should be given to the seemingly small detail of a stopping criterionfor the Stokes problems in Algorithms 1 and 2. While it is crucial to control the absolute magnitude ofthe Stokes residuals, it is not necessary to impose a small tolerance right from the start. With a sufficient13iteration kerrorabstol = ǫabstol = α/k1+δabstol = α/k0.5+δ10010110210310−610−510−410−310−210−1100101(a) Convergence of Algorithm 1 (ADMM / ALG2)with graphs of O(1/k3/8) and O(1/k1/2)iteration kerrorabstol = ǫabstol = α/k2+δabstol = α/k1+δ10010110210310−610−510−410−310−210−1100101(b) Convergence of Algorithm 2 (FISTA*) withgraphs of O(1/k1/2) and O(1/k).errorCPUtime(s)10−610−410−2100020406080100120140160180(c) Computing time of exact vs inexact algorithmsdepending on the imposed level of accuracy. Pleaserefer to the legends in (a) and (b).Figure 4: Algorithms 1 and 2 with exact or inexact solutions of the Stokes problems. If the absolute tolerance for the Stokessolver decays at a sufficiently high rate, then the inexact algorithm converges at the same rate as the original algorithm. Withrespect to computing time, the inexact version of each algorithm converges significantly faster than the original method. Data:δ = 10−3, α ≈ 6× 10−2.1410 0 10 1 10 2 10 3 10 4Iteration k10 -1210 -1010 -810 -610 -410 -210 010 2(a) Relative stopping criteria10 0 10 1 10 2 10 3 10 4Iteration k10 -1210 -1010 -810 -610 -410 -210 010 2(b) Absolute stopping criteriaFigure 5: Convergence plots of FISTA* for the rotating plug flow. a) Using relative stopping criterion reltol = 10−11 for solvingStokes subproblems b) Using absolute stopping criterion for solving Stokes subproblems. The blue, black and red curves showvelocity increments in the L2 norm, residuals and the primal-dual gap respectively; the dashed line shows the theoretical 1/k2decay rate. Implementation in FreeFem++.decay rate, viscoplastic flow problems may be solved several times faster in terms of CPU time, while the315algorithms still maintain their favourable robust convergence behaviour.We point out that for inexactly solved Stokes problems, the error estimator ηk may no longer be a validerror bound since stress iterates may now violate the constraint (4b). We leave a generalised error estimatefor inexact algorithms for future research. For now, we recommend to solve only the first and the lastiteration with a sharp Stokes tolerance:320• As in our numerical experiments, the error estimator for the initial iterate can provide a sensible valuefor the absolute Stokes tolerance that an inexact method is initialised with.• Once it appears like the inexact algorithm has converged, another evaluation of the error estimatorwith a strict Stokes tolerance confirms whether convergence has indeed been attained.2.3. Evaluating the constitutive model325It is rather obvious that an ‘inner’ iterative method for Stokes’ problem stopped after a finite numberof iterations introduces a possibly significant error. A much more subtle, but at least equally onerous errorsource lurks in the constitutive equation. Step 2 of Algorithm 1 and Step 1 of Algorithm 2 demand for anevaluation of the constitutive model defined by fBi in (2). Generally, this cannot be done exactly.We shall motivate this issue with an example: a common, inf-sup stable element pair for velocity and330pressure is the Taylor-Hood or P2/P1 element. Due to the continuous, piecewise quadratic velocity approxi-mation, it is natural to discretise the strain-rate and shear-stress tensors with P dc1 elements of discontinuous,piecewise linear functions. However, the nonlinear and nonsmooth function fBi turns a piecewise linear stressinput τ k into an output fBi(τk) that is usually no longer piecewise linear. Hence, an additional interpolationstep is needed to obtain a piecewise linear strain-rate field dk. Overall, we obtain335dk = fBi(τk) + ek (19)where ek assumes the role of an interpolation error. This error renders the Bingham flow problem, discretisedwith Taylor-Hood elements, inconsistent.15(a) Discontinuous P dc1 elements for shear stress andstrain rate, e.g. due to a Taylor-Hood discretisationof the velocity-pressure system.(b) Discontinuous P0 elements for shear stress andstrain rate, e.g. due to a Bercovier-Pironneau dis-cretisation of the velocity-pressure system.Figure 6: Evaluation of the Bingham model in 1D: piecewise linear and higher-order elements for the shear stress and strainrate give rise to a mesh-dependent interpolation error; piecewise constant elements allow for an exact evaluation. In 2D and3D, piecewise linear and higher-order elements additionally introduce an interpolation error of order O(h) or higher on cellswhere the fluid is fully yielded.We propose two solution strategies:1. To avoid this inconsistent discretisation, one may use low-order finite elements that lead to a piecewiseconstant approximation of the strain-rate and shear-stress variables.3402. To alleviate the inconsistency issue of higher-order elements, we suggest adaptive enrichment of thediscretisation to reduce the magnitude of the interpolation error ek as the optimisation algorithmprogresses.The avoidance strategy is remarkably simple: if the input τ k is piecewise constant, then the outputfBi(τk) is automatically piecewise constant as well and both at the continuous and the discrete level itholdsdk = fBi(τk)with ek ≡ 0. In Figure 6 we illustrate this concept in one spatial dimension.Since Du is a piecewise constant function if and only if u is piecewise linear, we suggest the following345two ‘special’ choices of finite elements that are both inf-sup stable and that automatically lead to a fullyconsistent discretisation of yield-stress fluid flows:Bercovier-Pironneau elements P1-iso-P2 elements for the velocity, which is continuous and piecewiselinear on a refined grid; P dc0 elements for the shear stress and strain rate on the same grid; P1elements for the pressure, which is continuous and piecewise linear on the given, coarser grid. This350approximation is conforming with respect to the function spaces that these variables live in, but notgenerally discretely mass-conservative.1st order Brezzi-Douglas-Marini elements BDM1 elements for the velocity, which is piecewise lin-ear and continuous in the normal direction across cell interfaces; P dc0 elements for the shear stress,strain rate and pressure. This approximation is a nonconforming discontinuous Galerkin method and355discretely mass-conservative.A sketch of these elements is shown in Figure 7.16Figure 7: Degrees of freedom of the Bercovier-Pironneau elements (top) and the first-order Brezzi-Douglas-Marini elements(bottom) with the corresponding elements for the velocity (left), pressure (centre) and shear stress / strain rate (right). Nodalvalues are represented by solid markers, normal components by arrows. Since the velocity is piecewise linear in both cases, thestrain rate is a piecewise constant function.17In order to improve upon the approximation quality of these low-order elements, adaptive mesh refine-ment may be applied if so desired. However, the optimisation algorithm will also converge to a solution onarbitrarily coarse meshes.360This is no longer true if the problem is discretised with higher-order elements, such as Taylor-Hoodelements or a discontinuous Galerkin scheme with shape functions of higher polynomial degree. Applyingonce again the convergence analysis of inexact algorithms of Schmidt and co-workers [20], the interpolationerror in the evaluation of the strain rate is required to decay like‖ek‖L2 ≤ O(1/k1+δ) (20)for the basic ISTA* method without acceleration and like365‖ek‖L2 ≤ O(1/k2+δ) (21)for FISTA* in order to guarantee that the convergence rate does not deteriorate. As for the inexact solutionof the Stokes problems, δ > 0 is an arbitrary parameter.The interpolation error ek depends on the chosen elements, in particular their polynomial degree, thesmoothness of the approximated function and the mesh parameter h, which we define as the maximumdiameter over all cells of the grid. These interpolation estimates are available in the literature on finite-370element methods, e.g. [34].On cells where |τ k| > τ0, i.e. where the fluid is fully yielded, f(τ k) is a fully smooth function. Aninterpolation estimate then assumes the form‖ek‖L2 ≤ O(hp) (22)for the error restricted to those cells of the mesh that are fully contained in the yielded subdomain. With pwe denote the degree of the largest possible polynomial space P dcp that is a subset of the discrete stress and375strain-rate space. Consequently, the error may be reduced by h-adaptivity, p-adaptivity or a combinationthereof, i.e. by refining the mesh cells locally or by increasing the polynomial degree of the shape and testfunctions.On cells that are cut by the yield surface, however, the approximation order is limited by the lackof smoothness of f(τ k). In the direction perpendicular to the interface between yielded and unyielded380regions, this function possesses a ‘kink’. Such weak singularities are only poorly approximated by polynomialfunctions, resulting in a lower rate of convergence than in the smooth case (22) (cf [35, Ch 6]) and a possiblyvery large leading-order coefficient.Thus, the total approximation error will usually be dominated by the contributions from cells on theyield surface. Local mesh refinement at least in the direction perpendicular to the boundary of any plugsis one theoretically admissible strategy to obtain a convergent algorithm—an approach implemented byRoquet and Saramito in [36]. Note that it is crucial to refine the mesh at a sufficiently high rate: for anoptimal convergence rate of FISTA*, (21) imposes that the mesh be refined locally around the yield surfaceat a rate ofh ≤ O(1/k2+δ).Considering that due to the algorithms’ sublinear convergence a relatively large number of iterations k ∼ 103is often necessary for a reasonable accuracy, it may be challenging to achieve the required mesh resolution.385Extended finite element methods (XFEM) have been designed as a generalisation of p-adaptive enrich-ment for similar problems that exhibit singularities across interfaces, see e.g. [37, 38]. In the context ofviscoplastic fluid flow problems, the following approach may be viable:1. Compute the level set |τ k| = τ0 which defines the yielded-unyielded interface, and approximate it witha piecewise polynomial curve.3902. Enrich or replace polynomial shape functions of elements cut by these curves with more tailor-madeshape functions, which also possess a weak singularity (‘kink’) in the direction perpendicular to thiscurve.Further research will be required to assess the feasibility of such an approach in the context of numericalsimulations of viscoplastic fluid flow.39518iteration kerrorabstol = ǫ, ek = 0abstol = ǫ, ek = χ/k1/2+δabstol = α/k1+δ, ek = χ/k1/2+δabstol = ǫ, ek = χ/k1+δabstol = α/k1+δ, ek = χ/k1+δ100 101 102 10310−610−510−410−310−210−1100101(a) Convergence of Algorithm 1 (ADMM / ALG2)with graphs of O(1/k1/4) and O(1/k1/2)iteration kerrorabstol = ǫ, ek = 0abstol = ǫ, ek = χ/k1+δabstol = α/k2+δ, ek = χ/k1+δabstol = ǫ, ek = χ/k2+δabstol = α/k2+δ, ek = χ/k2+δ100 101 102 10310−610−510−410−310−210−1100101(b) Convergence of Algorithm 2 (FISTA*) withgraphs of O(1/k1/2) and O(1/k).Figure 8: Convergence of Algorithms 1 and 2 in the presence of errors in the strain rate and with exact or inexact solutions of theStokes problems. Strain-rate errors may adversely affect the convergence of the algorithm, unless they decay sufficiently fast.In the latter case, the algorithms remain robust even with inexactly solved Stokes problems. Data: δ = 10−3, α ≈ 6 × 10−2.The entries of the tensor field χ are randomly drawn from a uniform distribution on [−0.1, 0.1].Numerical experimentsWe first investigate how general errors in the strain-rate affect the convergence of ADMM / ALG2 andFISTA*, then we compare Bercovier-Pironneau and Taylor-Hood elements for viscoplastic flow simulations.ADMM with strain-rate errors. We solve the rotating-plug problem once more, at this stage still withBercovier-Pironneau elements for which no interpolation error occurs. Instead, we generate a strain-rate400perturbation field χ, a piecewise constant function on the finer velocity grid whose function values arerandomly drawn from a uniform distribution on the interval [−0.1, 0.1]. After evaluating the exact strain-rate iterate dk as prescribed by Step 2 of Algorithm 1, we add an artificial error ek = χ/k1+δ that decaysat a rate sufficient for unobstructed convergence, or ek = χ/k1/2+δ, which decays slower. We perturb boththe exact algorithm and the algorithm with inexact solutions of the Stokes problems.405In Figure 8a we show that strain-rate errors of orderO(1/k1/2+δ) have a detrimental effect on convergence.No negative impact is observable when the accuracy of the strain-rate iterates improves sufficiently fast.Remarkably, even if both the velocity iterates and the strain-rate iterates are computed inexactly, thealgorithm remains robust.FISTA* with strain-rate errors. Figure 8b admits analogous conclusions. Provided that the sufficient decay410rates for errors in the Stokes problems and the constitutive model are obeyed, FISTA* converges at theexpected higher rate with no perceivable differences to the exact algorithm. In the presence of strain-rateerrors that only decay at a rate proportional to 1/k1+δ, convergence like 1/k2 still appears to be preserved,at least for the first 2,000 iterations. Local superconvergence effects have vanished, though. The acceleratedalgorithm, too, always converges in a robust manner even in the presence of velocity and /or strain-rate415errors.Bercovier-Pironneau vs Taylor-Hood elements. We now analyse more specifically the effect of the interpo-lation error that results from a Taylor-Hood discretisation with discontinuous, piecewise linear stress andstrain-rate approximations.In Figure 9 we compare how Algorithm 2 converges with a discretely consistent approximation using420Bercovier-Pironneau elements and with an inconsistent approximation using Taylor-Hood elements. We donot employ any adaptation techniques such as mesh refinement. Hence, the Taylor-Hood interpolation error1910 0 10 1 10 2 10 3 10 4Iteration k10 -1010 -810 -610 -410 -210 010 2(a) Discretisation with Taylor-Hood elements. Im-plementation in Rheolef.10 0 10 1 10 2 10 3 10 4Iteration k10 -1210 -1010 -810 -610 -410 -210 010 2(b) Discretisation with Bercovier-Pironneau ele-ments. Implementation in FreeFem++.Figure 9: Convergence of Algorithm 2 for the benchmark problem of force-driven Bingham flow; L2 norm of the velocityincrements (blue), residuals (black) and the primal-dual gap (red). On a fixed grid, the interpolation error introduced byTaylor-Hood elements does not decay, leading to stagnation as soon as this error dominates.is of order O(1) with respect to the iteration counter k. This systematic error ek thus becomes increasinglynotable as the optimisation algorithm progresses towards the solution, leading to increasingly perturbedupdates until the method stagnates. Indeed, convergence of FISTA* is neither expected, nor is it observed425in our numerical results. In constrast, no such issue occurs for the discretisation with Bercovier-Pironneauelements.A viscoplastic flow problem in only one spatial dimension allows us to pinpoint that the dominantcontribution to the interpolation error arises from the poor approximation quality in cells that are cut bythe yield surface. We consider a simple Bingham-Poiseuille flow problem, for which the exact Algorithm 1430or 2 returns the solution after the first iteration. This it not generally true, however, when this problem isdiscretised with Taylor-Hood elements. In Figure 10a, we show that even after ten iterations the methodfails to converge, unless the boundary of the central plug coincides exactly with the boundary of two meshcells. Whenever the yielded-unyielded interface lies in the interior of an element, the results are of pooraccuracy, even on a very fine mesh.435Practical consequencesThere are two kinds of ‘special’ finite-element methods that are particularly interesting for simulationsof yield-stress fluid flows: the Bercovier-Pironneau and the Brezzi-Douglas-Marini elements of first order.These schemes lead to a piecewise constant approximation of the strain-rate and shear-stress fields and arethus fully consistent at the discrete level. The convergence of the optimisation method is independent of the440mesh resolution. Adaptive mesh refinement, e.g. to improve the spatial resolution of stagnant flow regions,is optional.Higher order elements generally introduce a mesh-dependent consistency error. The optimisation algo-rithm does not converge, unless this error is decreased at a sufficiently high rate. Adaptive mesh refinement(or other model enrichment) is mandatory.44520(a) Velocity profile of a Bingham-Poiseuilleflow with the plug shaded in grey. If discre-tised with an even number of intervals acrossthe channel width, the yielded-unyielded inter-face lies exactly on an element border.ny ‖u− u¯‖L2 ‖d−Du¯‖L214 1.6e-14 4.5e-1415 2.6e-03 1.8e-0216 3.3e-15 2.2e-1429 7.1e-04 6.7e-0330 6.2e-13 1.1e-13119 4.2e-05 8.1e-04120 1.8e-12 3.3e-12(b) Velocity / strain-rate errors of FISTA* after 10 it-erations (using the P2/Pdc1 element pair for velocity /strain rate) compared to the analytical solution u¯ of thePoiseuille flow problem. ny is the number of elementsacross the channel width.Figure 10: Poiseuille flow of a Bingham fluid. The yield stress is specifically chosen such that the central plug fills exactly halfof the channel.2.4. MeshingFinally, we address one seemingly minor detail, which may nevertheless have a notable negative effecton convergence:Some mesh generators, including the meshing module of FreeFem++, may produce corner triangleswith two edges on the boundary of the domain, unless specifically configured otherwise. Certain elements,450such as P1 elements, then have no remaining degrees of freedom on such cells. The Stokes problems arehence overdetermined and solved in a least-squares sense. This case may be interpreted as an inconsistentdiscretisation as well, with analogous consequences as discussed in the previous subsection. Fortunately, anyinconsistencies resulting from singular corner triangles are easy to eliminate.In Figure 11 we show how the mesh topology may affect the convergence of FISTA*.4553. Case Study: wavy channelIn this section we demonstrate the capabilities of FISTA* for the resolution of complex and more practicalflows. One such example is the Stokes flow of a Bingham fluid through a wavy channel. This flow has richfeatures that a good numerical algorithm should be able to capture accurately. In [39], the case of small waveamplitude (h) for long channels (δ = w/l 1, w: width, l: length of channel) is studied using asymptotic460methods. It is shown that for h < O(δ) the plug at the centre of the channel is intact, additionally is widerat narrower sections of the channel which is counter-intuitive. Larger amplitudes h ∼ O(δ) are consideredin [40]. The central plug breaks into two islands at the widest and narrowest sections of the channel in thiscase and a pseudo-plug forms between the two. A pseudo-plug is a region with small strain-rates where thestress only exceeds the yield stress slightly. For large wave amplitudes h = O(1), as examined in [41], a465static zone forms in the widest section of the channel.In Figure 12 we have used FISTA* to reproduce Figure 3 of [41] which was done using an augmentedLagrangian method and which shows the subsequent variations in the flow starting from asymptotically smallwave amplitude to bigger O(1) value at fixed Bingham number B = 10. We can see that FISTA* resolvesall the features: the intact central plug at small wave amplitude, the broken central plug and pseudo-plug4702110 0 10 1 10 2 10 3 10 4 10 5Iteration k10 -1210 -1010 -810 -610 -410 -210 010 2(a)10 0 10 1 10 2 10 3 10 4 10 5Iteration k10 -1210 -1010 -810 -610 -410 -210 010 2(b)Figure 11: The convergence plots and corresponding mesh (shown below) for two cases. a) Mesh with corner points dividedbetween two elements b) Mesh with corner points owned by one element leading to overdetermined system. The black and redcurves show strain rate residual and duality gap respectively and the dashed line shows the theoretical 1/k2 decay rate.22Figure 12: Figure 3 of [41] reproduced by FISTA*. Speed and streamline (left panel) and pressure (right panel) for B = 10and wave amplitude h = .01, .25, 1, 2, 4. The color keys at bottom are valid for all h, gray regions on the right show unyieldedregions.at slightly bigger amplitude, the thin fouling layer at the widest section of the channel for h = 2 and finally,the large static zone for h = 4, all as shown in Figure 3 of [41].As in [41], we used three cycles of mesh adaptation to increase the accuracy of computations. Notethat adapting mesh elements to the yield surface can have significant effect on the accuracy of solutions asmentioned in Section 2.3. Fig 13 shows the final adapted mesh for the channel of h = 2 in Fig 12.475Fig 14 shows the comparison of convergence plots of FISTA* and the canonical augmented LagrangianALG2 for the wavy channel of h = 2 in Fig 12. Both implementations were done in FreeFem++ using exactlythe same mesh, element type (velocity: P1-iso-P2, pressure: P1, stress/strain rate: P0-iso-P2) and Stokessolver to make the comparison as fair as possible. We see that in ∼8000 iterations FISTA* has narrowed theduality gap down to ∼ 10−6 while the augmented Lagrangian method reaches ∼ 10−4 which is a significant480difference. Note that the main computational cost per iteration of both algorithms is the Stokes solver andthus for an equal number of iterations both take approximately equal time.4. ConclusionsA fast optimisation algorithm, a carefully designed discretisation scheme and fine-tuned algebraic solversare essential for convergent and efficient numerical simulations of yield-stress fluid flows:4851. The residual, i.e. the mismatch between the two numerical strain-rate fields, is neither an adequatestopping criterion nor a valid measure for convergence. Instead, we recommend to gauge convergencethrough the error estimator that is based on the primal-dual gap.23Figure 13: The mesh of wavy channel flow of h = 2 in Fig 12 after three cycles of adaptation.10 0 10 1 10 2 10 3 10 4Iteration k10 -810 -610 -410 -210 010 2(a) FISTA*10 0 10 1 10 2 10 3 10 4Iteration k10 -810 -610 -410 -210 010 210 4(b) Augmented Lagrangian (ALG2)Figure 14: Convergence plots of FISTA* and augmented Lagrangian for the wavy channel flow. The black and red curves showthe residuals and the duality gap respectively and the dashed line shows the theoretical 1/k2 decay rate of FISTA*.242. The Stokes problems that arise in every iteration of the optimisation algorithm may be solved inexactlywithout affecting the order of convergence, provided that the tolerance decays at a sufficiently high490rate. Inexact algorithms may reduce the computing time by a factor of two or even more.3. Since all viscoplastic constitutive relations are inherently nonlinear and nonsmooth, an interpolationerror is automatically introduced when the numerical strain-rate and stress fields are discretised withpiecewise linear or higher-order elements. This interpolation error imposes a barrier for the optimisa-tion algorithm, sooner or later resulting in stagnation. h, p or other forms of adaptive enrichment are495mandatory for convergence. Only for piecewise constant stress and strain-rate approximations, whichappear naturally in a discretisation with Bercovier-Pironneau or first-order Brezzi-Douglas-Marini ele-ments, the interpolation error vanishes and convergence is generally guaranteed, independently of theunderlying mesh.4. Finally, mesh generators should be configured to compute triangulations with no singular corner ver-500tices in order to avoid elements with no effective degrees of freedom. Else, the discrete Stokes problemswill be solved in a least-squares sense, leading to unnecessary errors and inhibited convergence.All of these conclusions are valid both for classical optimisation algorithms such as the augmentedLagrangian method ADMM / ALG2 and accelerated methods such as FISTA*. The simulation results offlow through the wavy channel demonstrated that FISTA* can achieve its fast theoretical convergence rate,505provided that the necessary details are met. This order of magnitude speed-up is definitely a step forwardto practical 3D computations of yield-stress flows and even for unsteady 2D flow. Finally, it is important tomention the value of open source software. It would have been very difficult or impossible to pinpoint andresolve some details (e.g. relative/absolute stopping criteria in Section 2.2) without inspecting the sourcecode of the software.510In future work, we shall investigate the issue of error estimation for inexact algorithms further. Eventhough our practical guidelines apply analogously to yield-stress fluids governed by constitutive modelsother than the Bingham model, the extra smoothness of the Herschel-Bulkley or Casson models can likelybe exploited for even more efficient algorithms. To generalise this work further, we also consider the fullNavier-Stokes equations supplemented with a viscoplastic model an interesting computational challenge.515AcknowledgementWe would like to thank P. Saramito and F. Hecht for their advice on Rheolef and FreeFem++. Com-pute Canada is acknowledged for providing computational infrastructure. Ali Roustaei and Ian Frigaardacknowledge the support of NSERC and Schlumberger through CRD Project 444985-12.References520[1] G. Duvaut, J. L. Lions, Inequalities in mechanics and physics, Grundlehren der mathematischen Wissenschaften, Springer-Verlag, Berlin, Heidelberg, New York, 1976.[2] T. Treskatis, Fast proximal algorithms for applications in viscoplasticity, Ph.D. thesis, University of Canterbury,Christchurch (2016).[3] T. Treskatis, M. A. Moyers-Gonza´lez, C. J. Price, An accelerated dual proximal gradient method for applications in525viscoplasticity, Journal of Non-Newtonian Fluid Mechanics 238 (2016) 115–130.[4] M. Fortin, R. Glowinski, Augmented Lagrangian methods: applications to the numerical solution of boundary-valueproblems, North-Holland, 1983.[5] R. Glowinski, P. Le Tallec, Augmented Lagrangian and operator-splitting methods in nonlinear mechanics, Vol. 9, SIAM,1989.530[6] R. Glowinski, A. Wachs, On the numerical simulation of viscoplastic fluid flow, in: P. G. Ciarlet (Ed.), Numerical Methodsfor Non-Newtonian Fluids, Vol. 16 of Handbook of Numerical Analysis, North-Holland, Amsterdam, 2011.[7] A. Bermu´dez, C. Moreno, Duality methods for solving variational inequalities, Computers & Mathematics with Applica-tions 7 (1) (1981) 43–58.[8] E. D. Ferna´ndez-Nieto, J. M. Gallardo, P. Vigneaux, Efficient numerical schemes for viscoplastic avalanches. Part 1: the5351D case, Journal of Computational Physics 264 (2014) 55–90.[9] B. He, X. Yuan, On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers, Nu-merische Mathematik (2012) 1–11.25[10] B. He, X. Yuan, On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method, SIAM Journalon Numerical Analysis 50 (2) (2012) 700–709.540[11] Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k2), in: Soviet MathematicsDoklady, Vol. 27, 1983, pp. 372–376.[12] H. Attouch, J. Peypouquet, The rate of convergence of Nesterov’s accelerated forward-backward method is actually fasterthan 1/k2, SIAM Journal on Optimization 26 (3) (2016) 1824–1834.[13] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on545Imaging Sciences 2 (1) (2009) 183–202.[14] A. Chambolle, C. Dossal, On the convergence of the iterates of the fast iterative shrinkage/thresholding algorithm, Journalof Optimization Theory and Applications 166 (3) (2015) 968–982.[15] J. Cahouet, J.-P. Chabard, Some fast 3D finite element solvers for the generalized Stokes problem, International Journalfor Numerical Methods in Fluids 8 (8) (1988) 869–895.550[16] S. Turek, Efficient Solvers for Incompressible Flow Problems: An Algorithmic and Computational Approach, Vol. 6,Springer Science & Business Media, 1999.[17] A. Aposporidis, E. Haber, M. A. Olshanskii, A. Veneziani, A mixed formulation of the Bingham fluid flow problem:analysis and numerical solution, Comput. Methods Appl. Mech. Engrg. 200 (29-32) (2011) 2434–2446. doi:10.1016/j.cma.2011.04.004.555[18] T. Treskatis, M. A. Moyers-Gonza´lez, C. J. Price, A trust-region SQP method for the numerical approximation of vis-coplastic fluid flow, arXiv preprint.URL http://arxiv.org/pdf/1504.08057[19] P. Saramito, A damped Newton algorithm for computing viscoplastic fluid flows, Journal of Non-Newtonian Fluid Me-chanics 238 (2016) 6–15.560[20] M. Schmidt, N. L. Roux, F. R. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization,in: Advances in Neural Information Processing Systems, 2011, pp. 1458–1466.[21] D. Decker, H. Keller, C. Kelley, Convergence rates for Newtons method at singular points, SIAM Journal on NumericalAnalysis 20 (2) (1983) 296–314.[22] F. Hecht, New development in FreeFem++, Journal of Numerical Mathematics 20 (3-4) (2012) 251–265.565[23] P. Saramito, Efficient C++ finite element computing with Rheolef, CNRS-CCSD ed., 2015, http://cel.archives-ouvertes.fr/cel-00573970.[24] J. C. De los Reyes, S. Gonza´lez Andrade, A combined BDF-semismooth Newton approach for time-dependent Binghamflow, Numerical Methods for Partial Differential Equations 28 (3) (2012) 834–860.[25] M. Bercovier, O. Pironneau, Error estimates for finite element method solution of the Stokes problem in the primitive570variables, Numerische Mathematik 33 (2) (1979) 211–224.[26] T. Treskatis, TOOTHPASTE: The viscoplastic CFD app for MATLAB, version 1.1, https://www.mathworks.com/matlabcentral/fileexchange/62924-toothpaste (May 2017).[27] P. Saramito, N. Roquet, An adaptive finite element method for viscoplastic fluid flows in pipes, Computer methods inapplied mechanics and engineering 190 (40) (2001) 5391–5412.575[28] E. J. Dean, R. Glowinski, G. Guidoboni, On the numerical simulation of Bingham visco-plastic flow: old and new results,J. Non-Newtonian Fluid Mech. 142 (1) (2007) 36–62.[29] Z. Yu, A. Wachs, A fictitious domain method for dynamic simulation of particle sedimentation in Bingham fluids, Journalof Non-Newtonian Fluid Mechanics 145 (2) (2007) 78–91.[30] L. Muravleva, Uzawa-like methods for numerical modeling of unsteady viscoplastic Bingham medium flows, Applied580Numerical Mathematics 93 (2015) 140–149.[31] A. Wachs, I. A. Frigaard, Particle settling in yield stress fluids: limiting time, distance and applications, Journal ofNon-Newtonian Fluid Mechanics 238 (2016) 189–204.[32] N. Bernabeu, Mode´lisation multi-physique des e´coulements viscoplastiques: application aux coule´es de lave volcanique,Ph.D. thesis, Universite´ de Grenoble, Grenoble (2015).585[33] O. Devolder, F. Glineur, Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle, Mathemat-ical Programming 146 (1-2) (2014) 37–75.[34] P. G. Ciarlet, The finite element method for elliptic problems, Elsevier, 1978.[35] D. Funaro, Polynomial approximation of differential equations, Vol. 8, Springer Science & Business Media, 2008.[36] N. Roquet, P. Saramito, An adaptive finite element method for Bingham fluid flows around a cylinder, Comput. Methods590Appl. Mech. Engrg. 193 (2003) 3317–3341.[37] T.-P. Fries, T. Belytschko, The extended/generalized finite element method: an overview of the method and its applica-tions, International Journal for Numerical Methods in Engineering 84 (3) (2010) 253–304.[38] T. Carraro, S. Wetterauer, On the implementation of the eXtended Finite Element Method (XFEM) for interface problems,arXiv preprint.595URL http://arxiv.org/pdf/1507.04238[39] I. Frigaard, D. Ryan, Flow of a visco-plastic fluid in a channel of slowly varying width, Journal of Non-Newtonian FluidMechanics 123 (1) (2004) 6783. doi:10.1016/j.jnnfm.2004.06.011.URL http://dx.doi.org/10.1016/j.jnnfm.2004.06.011[40] A. Putz, I. Frigaard, D. Martinez, On the lubrication paradox and the use of regularisation methods for lubrication flows,600Journal of Non-Newtonian Fluid Mechanics 163 (1-3) (2009) 6277. doi:10.1016/j.jnnfm.2009.06.006.URL http://dx.doi.org/10.1016/j.jnnfm.2009.06.006[41] A. Roustaei, I. Frigaard, The occurrence of fouling layers in the flow of a yield stress fluid along a wavy-walled channel,26Journal of Non-Newtonian Fluid Mechanics 198 (2013) 109124. doi:10.1016/j.jnnfm.2013.03.005.URL http://dx.doi.org/10.1016/j.jnnfm.2013.03.00560527
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Faculty Research and Publications /
- Practical guidelines for fast, efficient and robust...
Open Collections
UBC Faculty Research and Publications
Practical guidelines for fast, efficient and robust simulations of yield-stress flows without regularisation… Treskatis, Timm; Roustaei, Ali; Frigaard, Ian; Wachs, Anthony M., 1983- Nov 27, 2017
pdf
Page Metadata
Item Metadata
Title | Practical guidelines for fast, efficient and robust simulations of yield-stress flows without regularisation using accelerated proximal gradient or augmented Lagrangian methods |
Creator |
Treskatis, Timm Roustaei, Ali Frigaard, Ian Wachs, Anthony M., 1983- |
Date Issued | 2017-11-27 |
Description | The mathematically sound resolution of yield stress fluid flows involves nonsmooth convex optimisation problems. Traditionally, augmented Lagrangian methods developed in the 1980's have been used for this purpose. The main drawback of these algorithms is their frustratingly slow O(1/√k) worst-case convergence, where k is the iteration counter. Recently, an improved 'dual FISTA' algorithm (short: FISTA*) was introduced, which achieves the higher and provably optimal rate of O(1/k). When implementing these algorithms in two finite-element packages (FreeFem++ by Frédéric Hecht, UPMC Paris and Rheolef by Pierre Saramito, UGA Grenoble), we observed that these theoretical convergence rates are not generally attained. In this article, we present four common numerical pitfalls that adversely impact the convergence of the optimisation algorithms. By means of constructive and practical guidelines we point out how a careful implementation can not only recover the full order of convergence, but also reduce the computational cost per iteration for further efficiency gains. Furthermore, we assess the performance and accuracy of FISTA* for the practical case of flow in wavy walled channel and demonstrate significant speed-up when FISTA* is employed instead of the classical augmented Lagrangian method. |
Subject |
yield-stress flows viscoplastic fluids augmented Lagrangian method FISTA* finite elements |
Genre |
Article Preprint |
Type |
Text |
Language | eng |
Date Available | 2017-11-29 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0360784 |
URI | http://hdl.handle.net/2429/63760 |
Affiliation |
Applied Science, Faculty of Science, Faculty of Chemical and Biological Engineering, Department of Mathematics, Department of Mechanical Engineering, Department of |
Peer Review Status | Unreviewed |
Scholarly Level | Faculty Postdoctoral |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 52383-Treskatis_et_al_Practical_guidelines_fast.pdf [ 3.51MB ]
- Metadata
- JSON: 52383-1.0360784.json
- JSON-LD: 52383-1.0360784-ld.json
- RDF/XML (Pretty): 52383-1.0360784-rdf.xml
- RDF/JSON: 52383-1.0360784-rdf.json
- Turtle: 52383-1.0360784-turtle.txt
- N-Triples: 52383-1.0360784-rdf-ntriples.txt
- Original Record: 52383-1.0360784-source.json
- Full Text
- 52383-1.0360784-fulltext.txt
- Citation
- 52383-1.0360784.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.52383.1-0360784/manifest