underactuated/optimization.html at master · HaydenTedrake/underactuated · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
<!DOCTYPE html>

<html>

  <head>
    <title>Underactuated Robotics: Optimization and
  Mathematical Programming</title>
    <meta name="Underactuated Robotics: Optimization and
  Mathematical Programming" content="text/html; charset=utf-8;" />
    <link rel="canonical" href="http://underactuated.mit.edu/optimization.html" />

    <script src="https://hypothes.is/embed.js" async></script>
    <script type="text/javascript" src="htmlbook/book.js"></script>

    <script src="htmlbook/mathjax-config.js" defer></script>
    <script type="text/javascript" id="MathJax-script" defer
      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
    </script>
    <script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>

    <link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
    <script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
    <script>hljs.initHighlightingOnLoad();</script>

    <link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
  </head>

<body onload="loadChapter('underactuated');">

<div data-type="titlepage">
  <header>
    <h1><a href="underactuated.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
    <p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
    <p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
    <p style="font-size: 14px; text-align: right;">
      &copy; Russ Tedrake, 2020<br/>
      <a href="tocite.html">How to cite these notes</a><br/>
    </p>
  </header>
</div>

<p><b>Note:</b> These are working notes used for <a
href="http://underactuated.csail.mit.edu/Spring2020/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2020 semester.  <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture  videos are available on YouTube</a>.</p>

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=multibody.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=underactuated.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=playbook.html>Next Chapter</a></td>
</tr></table>


<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter class="appendix" style="counter-reset: chapter 2"><h1>Optimization and
  Mathematical Programming</h1>

  <section><h1>Optimization software</h1>

    <p>We have a short introduction to mathematical programming in
      <drake></drake> <a href="https://nbviewer.jupyter.org/github/RobotLocomotion/drake/blob/master/tutorials/mathematical_program.ipynb">here</a>.</p>

  </section>

  <section><h1>General concepts</h1>

    <subsection><h1>Constrained optimization with Lagrange multipliers</h1>

      <p>Given the equality-constrained optimization problem $$\minimize_\bz
      \ell(\bz) \quad \subjto \quad \bphi(\bz) = 0,$$ where $\bphi$ is a vector.
      Define a vector $\lambda$ of Lagrange multipliers, the same size as
      $\phi$, and the scalar Lagrangian function, $$L(\bz,{\bf \lambda}) =
      \ell(\bz) + \lambda^T\phi(\bz).$$ A necessary condition for $\bz^*$ to be
      an optimal value of the constrained optimization is that the gradients of
      $L$ vanish with respect to both $\bz$ and $\lambda$: $$\pd{L}{\bz} = 0,
      \quad \pd{L}{\lambda} = 0.$$  Note that $\pd{L}{\lambda} = \phi(\bz)$, so
      requiring this to be zero is equivalent to requiring the constraints to be
      satisfied.</p>

      <example><h1>Optimization on the unit circle</h1>

        <p>Consider the following optimization: $$\min_{x,y} x+y, \quad \subjto
        \quad x^2 + y^2 = 1.$$ The level sets of $x+y$ are straight lines with
        slope $-1$, and the constraint requires that the solution lives on the
        unit circle.</p> <figure> <img width="80%"
        src="figures/lagrange_multipliers_unit_circle.svg"/> </figure>
        <p>Simply by inspection, we can determine that the optimal solution
        should be $x=y=-\frac{\sqrt{2}}{2}.$ Let's make sure we can obtain the
        same result using Lagrange multipliers. </p>

        <p>Formulating $$L = x+y+\lambda(x^2+y^2-1),$$ we can take the
        derivatives and solve

        \begin{align*}
        \pd{L}{x} = 1 + 2\lambda x = 0 \quad & \Rightarrow & \lambda = -\frac{1}{2x}, \\
        \pd{L}{y} = 1 + 2\lambda y = 1 - \frac{y}{x} = 0 \quad & \Rightarrow & y = x,\\
        \pd{L}{\lambda} = x^2+y^2-1 = 2x^2 -1 = 0 \quad & \Rightarrow & x = \pm \frac{1}{\sqrt{2}}.
        \end{align*}

        Given the two solutions which satisfy the necessary conditions, the
        negative solution is clearly the minimizer of the objective.</p>

      </example>

    </subsection>

  </section>

  <section><h1>Convex optimization</h1>
    <subsection><h1>Linear Programs/Quadratic Programs/Second-Order Cones</h1>
      Example: Balancing force control on Atlas
    </subsection>
    <subsection><h1>Semidefinite Programming and Linear Matrix Inequalities</h1>
    </subsection>
    <subsection id=sums_of_squares><h1>Sums-of-squares optimization</h1>

      <p>It turns out that in the same way that we can use SDP to search over
      the positive quadratic equations, we can generalize this to search over
      the positive polynomial equations.  To be clear, for quadratic equations
      we have \[ {\bf P}\succeq 0 \quad \Rightarrow \quad \bx^T {\bf P} \bx \ge
      0, \quad \forall \bx. \]  It turns out that we can generalize this to \[
      {\bf P}\succeq 0 \quad \Rightarrow \quad {\bf m}^T(\bx) {\bf P} {\bf
      m}(\bx) \ge 0, \quad \forall \bx, \] where ${\bf m}(\bx)$ is a vector of
      polynomial equations, typically chosen as a vector of <em>monomials</em>
      (polynomials with only one term).  The set of positive polynomials
      parameterized in this way is exactly the set of polynomials that can be
      written as a <em>sum of squares</em><elib>Parrilo00</elib>.  While it is
      known that not all positive polynomials can be written in this form, much
      is known about the gap.  For our purposes this gap is very small (papers
      have been written about trying to find polynomials which are uniformly
      positive but not sums of squares); we should remember that it exists but
      not worry about it too much for now.</p>

      <p>Even better, there is quite a bit that is known about how to choose
      the terms in ${\bf m}(\bx)$.  For example, if you give me the polynomial
      \[ p(x) = 2 - 4x + 5x^2 \] and ask if it is positive for all real $x$, I
      can convince you that it is by producing the sums-of-squares
      factorization \[ p(x) = 1 + (1-2x)^2 + x^2, \] and I know that I can
      formulate the problem without needing any monomials of degree greater
      than 1 (the square-root of the degree of $p$) in my monomial vector.  In
      practice, what this means for us is that people have authored
      optimization front-ends which take a high-level specification with
      constraints on the positivity of polynomials and they automatically
      generate the SDP problem for you, without having to think about what
      terms should appear in ${\bf m}(\bx)$ (c.f. <elib>Prajna04</elib>).
      This class of optimization problems is called Sums-of-Squares (SOS)
      optimization.</p>

      <p>As it happens, the particular choice of ${\bf m}(\bx)$ can have a huge
      impact on the numerics of the resulting semidefinite program (and on your
      ability to solve it with commercial solvers). <drake></drake> implements
      some particularly novel/advanced algorithms to make this work well
      <elib>Permenter17</elib>.</p>

      <p>We will write optimizations using sums-of-squares constraints as \[
      p(\bx) \sos \] as shorthand for the constraint that $p(\bx) \ge 0$ for all
      $\bx$, as demonstrated by finding a sums-of-squares decomposition.</p>

      <p>This is surprisingly powerful.  It allows us to use convex
      optimization to solve what appear to be very non-convex optimization
      problems.</p>

      <example><h1>Global minimization via
        SOS</h1>

        <p>Consider the following famous non-linear function of two variables
        (called the "six-hump-camel": $$p(x) = 4x^2 + xy - 4y^2 - 2.1x^4 + 4y^4
        + \frac{1}{3}x^6.$$  This function has six local minima, two of them
        being global minima <elib>Henrion09a</elib>.</p>

        <div  style="text-align:center"><img
                src="figures/six_hump_camel.svg" width="90%"/></div>

        <p>By formulating a simple sums-of-squares optimization, we can
        actually find the minimum value of this function (technically, it is
        only a lower bound, but in this case and many cases, it is surprisingly
        tight) by writing: \begin{align*} \max_\lambda \ \ & \lambda \\
        \text{s.t. } & p(x) - \lambda \text{ is sos.} \end{align*}  Go ahead
        and play with the code (most of the lines are only for plotting; the
        actual optimization problem is nice and simple to formulate). </p>

        <jupyter>examples/optimization.ipynb</jupyter>

        <p>Note that this finds the minimum value, but does not actually
        produce the $\bx$ value which mimimizes it.  This is possible
        <elib>Henrion09a</elib>, but it requires examining the dual of the
        sums-of-squares solutions (which we don't need for the goals of this
        chapter).</p>
      </example>

    </subsection>

    <subsection><h1>Solution techniques</h1>
      Interior point (Gurobi, Mosek, Sedumi, ...), First order methods
    </subsection>
  </section>

  <section id="nonlinear"><h1>Nonlinear programming</h1>

    <p>The generic formulation of a nonlinear optimization problem is     \[
    \min_z c(z) \quad \subjto \quad \bphi(z) \le 0, \]     where $z$ is a
    vector of <em>decision variables</em>, $c$ is a scalar <em>objective
    function</em> and $\phi$ is a vector     of <em>constraints</em>.  Note
    that, although we write $\phi \le 0$, this formulation     captures
    positivity constraints on the decision variables (simply multiply the
    constraint by $-1$) and equality constraints (simply list both $\phi\le0$
    and $-\phi\le0$) as well. </p>

    <p>The picture that you should have in your head is a nonlinear, potentially
    non-convex objective function defined over (multi-dimensional) $z$, with a
    subset of possible $z$ values satisfying the constraints.</p>

    <figure> <img width="80%"
    src="figures/nonlinear_optimization_w_minima.svg"/>
    <figcaption>One-dimensional cartoon of a nonlinear optimization problem. The
    red dots represent local minima.  The blue dot represents the optimal
    solution.</figcaption> </figure>

    <p>Note that minima can be the result of the objective function having zero
    derivative <em>or</em> due to a sloped objective up against a
    constraint.</p>

    <p>Numerical methods for solving these optimization problems require an
    initial guess, $\hat{z}$, and proceed by trying to move down the objective
    function to a minima.  Common approaches include <em>gradient descent</em>,
    in which the gradient of the objective function is computed or estimated, or
    second-order methods such as <em>sequential quadratic programming (SQP)</em>
    which attempts to make a local quadratic approximation of the objective
    function and local linear approximations of the constraints and solves a
    quadratic program on each iteration to jump directly to the minimum of the
    local approximation.</p>

    <p>While not strictly required, these algorithms can often benefit a great
    deal from having the gradients of the objective and constraints computed
    explicitly; the alternative is to obtain them from numerical
    differentiation.  Beyond pure speed considerations, I strongly prefer to
    compute the gradients explicitly because it can help avoid numerical
    accuracy issues that can creep in with finite difference methods.  The
    desire to calculate these gradients will be a major theme in the discussion
    below, and we have gone to great lengths to provide explicit gradients of
    our provided functions and automatic differentiation of user-provided
    functions in <drake></drake>.</p>

    <p>When I started out, I was of the opinion that there is nothing difficult
    about implementing gradient descent or even a second-order method, and I
    wrote all of the solvers myself.  I now realize that I was wrong.  The
    commercial solvers available for nonlinear programming are substantially
    higher performance than anything I wrote myself, with a number of tricks,
    subtleties, and parameter choices that can make a huge difference in
    practice.  Some of these solvers can exploit sparsity in the problem (e.g.,
    if the constraints operate in a sparse way on the decision variables).
    Nowadays, we make heaviest use of SNOPT <elib>Gill06</elib>, which now comes
    bundled with the binary distributions of <drake></drake>, but also <a
    href="http://drake.mit.edu/doxygen_cxx/group__solvers.html">support a large
    suite of numerical solvers</a>.  Note that while I do advocate using these
    tools, you do not need to use them as a black box.  In many cases you can
    improve the optimization performance by understanding and selecting
    non-default configuration parameters.</p>

    <subsection><h1>Second-order methods (SQP /
    Interior-Point)</h1></subsection>

    <subsection><h1>First-order methods (SGD / ADMM) </h1>

      <subsubsection id="penalty"><h1>Penalty methods</h1>

        <p>Augmented Lagrangian</p>

      </subsubsection>
      <subsubsection><h1>Projected Gradient Descent</h1></subsubsection>

    </subsection>

    <subsection><h1>Zero-order methods (CMA)</h1></subsection>

    <todo>My original list: Lagrange multipliers / KKT, Gradient descent, SQP (SNOPT, NLOPT, IPOPT), Global optimization</todo>

    <subsection><h1>Example: Inverse Kinematics</h1>
    </subsection>


  </section> <!-- nonlinear optimization -->

  <section><h1>Combinatorial optimization</h1>
    <subsection><h1>Search, SAT, First order logic, SMT solvers, LP interpretation</h1></subsection>
    <subsection><h1>Mixed-integer convex optimization</h1></subsection>
  </section>
  <section><h1>"Black-box" optimization</h1>
    Derivative-free methods.  Some allow noisy evaluations.
  </section>

  <todo>Polynomial root finding/homotopy, L1 optimization, QCQP, LCP/Variational inequalities, BMI, ..., warm-starting, ...</todo>


</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=multibody.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=underactuated.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=playbook.html>Next Chapter</a></td>
</tr></table>

<div id="footer">
  <hr>
  <table style="width:100%;">
    <tr><td><em>Underactuated Robotics</em></td><td align="right">&copy; Russ
      Tedrake, 2020</td></tr>
  </table>
</div>


</body>
</html>