
A secant update for the Hessian

I now return to the question of finding a secant update for the Hessian that preserves symmetry and positive definiteness. That is, given a symmetric and positive definite matrix $H_k$ and

\begin{displaymath}s^{(k)}=x^{(k+1)}-x^{(k)},\ y^{(k)}=\nabla f(x^{(k+1)})-\nabla f(x^{(k)}),
\end{displaymath}

I want to find a symmetric positive definite matrix $H_{k+1}$ that satisfies

 
\begin{displaymath}
H_{k+1}s^{(k)}=y^{(k)}.
\end{displaymath} (6)

Such an $H_{k+1}$ does not necessarily exist. If (6) holds with $H_{k+1}$ positive definite (and $s^{(k)}\neq 0$), then

\begin{displaymath}s^{(k)}\cdot y^{(k)}=s^{(k)}\cdot H_{k+1}s^{(k)}>0.
\end{displaymath}

Therefore, if $s^{(k)}\cdot y^{(k)}>0$ fails, it will be impossible to find such an $H_{k+1}$. The condition $s^{(k)}\cdot y^{(k)}>0$ is equivalent to

 \begin{displaymath}
\left(\nabla f(x^{(k+1)})-\nabla f(x^{(k)})\right)\cdot\left(x^{(k+1)}-x^{(k)}\right)>0.
\end{displaymath} (7)

A strictly convex function $f$ would satisfy (7) for any distinct points $x^{(k)}$ and $x^{(k+1)}$. The condition (7) therefore means that $\nabla f(x^{(k)})$ and $\nabla f(x^{(k+1)})$ are consistent with $f$ having positive curvature on the line segment between $x^{(k)}$ and $x^{(k+1)}$. If this condition were to fail, it would be impossible to find a positive definite matrix $H_{k+1}$ satisfying the secant equation.
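As an illustration (a standard example, not taken from the text above), consider the quadratic $f(x)=\frac{1}{2}x\cdot Ax-b\cdot x$ with $A$ symmetric. Then $\nabla f(x)=Ax-b$, so $y^{(k)}=As^{(k)}$ and

\begin{displaymath}s^{(k)}\cdot y^{(k)}=s^{(k)}\cdot As^{(k)}.
\end{displaymath}

In this case condition (7) holds for every nonzero step exactly when $A$ is positive definite. If $A$ has a negative eigenvalue, a step $s^{(k)}$ along a corresponding eigenvector gives $s^{(k)}\cdot y^{(k)}<0$.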

Next I wish to point out the following: No rank-one update can produce $H_{k+1}$ that is symmetric, positive definite, and satisfies the secant equation, at least not in every case. Indeed, to preserve symmetry and positive definiteness, a rank-one update would have to take the form

\begin{displaymath}H_{k+1}=H_k+uu^T.
\end{displaymath}

However, it is easy to show that, in some cases (even when $s^{(k)}\cdot y^{(k)}>0$ holds), no vector $u$ causes $H_{k+1}$ to satisfy the secant equation.$^4$
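One way to see this is the following sketch (the footnote may argue differently; the symbols $r$ and $\beta$ are introduced here only for this illustration). Writing $r=y^{(k)}-H_ks^{(k)}$, the secant equation for a matrix of the form $H_k+uu^T$ reads

\begin{displaymath}\left(u\cdot s^{(k)}\right)u=r.
\end{displaymath}

If $r\neq 0$, then $u$ must be a multiple of $r$, say $u=\beta r$, and the equation reduces to $\beta^2\left(r\cdot s^{(k)}\right)=1$, which has a real solution only when $r\cdot s^{(k)}>0$. For instance, with $H_k=I$, $s^{(k)}=(0,1)^T$, and $y^{(k)}=(1,1)^T$, I have $s^{(k)}\cdot y^{(k)}=1>0$, but $r=(1,0)^T$ and $r\cdot s^{(k)}=0$, so no such $u$ exists.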

Since a rank-one update cannot be found, I will try a different approach. To simplify the notation in the following derivation, I will write $H=H_k$, $H_+=H_{k+1}$, $s=s^{(k)}$, and $y=y^{(k)}$. The positive definite matrix $H$ has a Cholesky factorization $H=LL^T$. I will look for $H_+$ in the form $H_+=JJ^T$, where $J$ is nonsingular and close to $L$ in some sense. The reader should notice that, if $J$ is nonsingular, then $JJ^T$ is necessarily symmetric and positive definite.

If $H_+$ is to satisfy the secant equation, then

\begin{displaymath}JJ^Ts=y
\end{displaymath}

must hold. The matrix J will be found by a two-step process:
1.
Given any vector $v$, choose $J$ so that $Jv=y$ and $J$ is as close as possible to $L$. Broyden's method shows how to do this:

 \begin{displaymath}
J=L+\frac{\left(y-Lv\right)v^T}{v\cdot v}.
\end{displaymath} (8)

2.
Choose $v$ so that $J^Ts=v$ also holds (then $JJ^T$ will satisfy the secant equation (6)). Using (8),

\begin{eqnarray*}J^Ts=v&\Rightarrow&L^Ts+\frac{v(y-Lv)^Ts}{v\cdot v}=v\\
&\Rightarrow&L^Ts=\left(1-\frac{y\cdot s}{v\cdot v}+
\frac{v\cdot L^Ts}{v\cdot v}\right)v.
\end{eqnarray*}


This last equation is nonlinear in $v$, and to solve it directly would be difficult or impossible. However, it shows that the vectors $v$ and $L^Ts$ must be parallel. Therefore, there exists a scalar $\alpha$ such that

\begin{displaymath}v=\alpha L^Ts.
\end{displaymath}

Substituting this formula for v into (8) and simplifying yields

\begin{displaymath}J=L+\frac{ys^TL}{\alpha s\cdot Hs}-\frac{Hss^TL}{s\cdot Hs}
\end{displaymath}

(to arrive at this formula for $J$, I used the fact that $H=LL^T$).
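The intermediate steps of this substitution can be sketched as follows (they are filled in here and are not spelled out in the original derivation). With $v=\alpha L^Ts$, I have $v\cdot v=\alpha^2s\cdot LL^Ts=\alpha^2s\cdot Hs$, $Lv=\alpha Hs$, and $v^T=\alpha s^TL$, so the correction term in (8) becomes

\begin{eqnarray*}\frac{(y-Lv)v^T}{v\cdot v}&=&\frac{(y-\alpha Hs)\,\alpha s^TL}{\alpha^2s\cdot Hs}\\
&=&\frac{ys^TL}{\alpha s\cdot Hs}-\frac{Hss^TL}{s\cdot Hs}.
\end{eqnarray*}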

I now use the equation $J^Ts=v$, where $v=\alpha L^Ts$, to solve for $\alpha$:

\begin{eqnarray*}J^Ts=v&\Rightarrow&L^Ts+\frac{L^Tsy^Ts}{\alpha s\cdot Hs}-
\frac{L^Tss^THs}{s\cdot Hs}=\alpha L^Ts\\
&\Rightarrow&\alpha^2=\frac{y\cdot s}{s\cdot Hs}.
\end{eqnarray*}


There are two solutions for $\alpha$; the positive solution is the correct one to take, since then $\alpha=1$ and $J=L$ if $H$ already satisfies the secant equation. (The reader should notice that the requirement that $s\cdot y$ be positive appears here: otherwise $\alpha$ would not be a nonzero real number.)

Therefore, if

 \begin{displaymath}
J=L+\frac{ys^TL}{\alpha s\cdot Hs}-\frac{Hss^TL}{s\cdot Hs},
\end{displaymath} (9)

where

\begin{displaymath}\alpha=\sqrt{\frac{y\cdot s}{s\cdot Hs}},
\end{displaymath}

then $H_+=JJ^T$ is symmetric, positive definite, and satisfies

\begin{displaymath}H_+s=y.
\end{displaymath}
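As a small numerical illustration (the data are chosen here for convenience and do not come from the text), take $H=I$, so that $L=I$, together with $s=(0,1)^T$ and $y=(1,1)^T$. Then $y\cdot s=1$, $s\cdot Hs=1$, and $\alpha=1$, so (9) gives

\begin{displaymath}J=I+ys^T-ss^T=\left(\begin{array}{cc}1&1\\ 0&1\end{array}\right),\ \
JJ^T=\left(\begin{array}{cc}2&1\\ 1&1\end{array}\right).
\end{displaymath}

The product $JJ^T$ is symmetric with trace 3 and determinant 1, hence positive definite, and $JJ^Ts=(1,1)^T=y$. In this example $J$ is upper triangular even though $L=I$ is (trivially) lower triangular.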

Since $J$, as given by (9), is not necessarily lower triangular (even though $L$ is), it is more convenient to express the update in terms of $H$ and $H_+$ rather than in terms of $L$ and $J$. A tedious calculation, which simplifies nicely in the end, shows that

\begin{displaymath}JJ^T=H-\frac{Hss^TH}{s\cdot Hs}+\frac{yy^T}{y\cdot s}.
\end{displaymath}
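One way to organize this calculation is sketched here; the abbreviations $p=Hs$, $\sigma=s\cdot Hs$, and $w=(y/\alpha-p)/\sigma$ are introduced only for convenience and are not part of the original notation. Writing (9) as $J=L+ws^TL$ and using $LL^T=H$,

\begin{eqnarray*}JJ^T&=&\left(L+ws^TL\right)\left(L^T+L^Tsw^T\right)\\
&=&H+pw^T+wp^T+\sigma ww^T\\
&=&H+\frac{1}{\sigma}\left(\frac{py^T+yp^T}{\alpha}-2pp^T\right)
+\frac{1}{\sigma}\left(\frac{yy^T}{\alpha^2}-\frac{py^T+yp^T}{\alpha}+pp^T\right)\\
&=&H-\frac{pp^T}{\sigma}+\frac{yy^T}{\alpha^2\sigma}.
\end{eqnarray*}

Since $\alpha^2\sigma=y\cdot s$ and $pp^T=Hss^TH$, this agrees with the formula for $JJ^T$ stated above.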

The resulting update is known as the BFGS update:$^5$

 \begin{displaymath}
H_{k+1}=H_k-\frac{H_ks^{(k)}(s^{(k)})^TH_k}{s^{(k)}\cdot H_ks^{(k)}}+
\frac{y^{(k)}(y^{(k)})^T}{y^{(k)}\cdot s^{(k)}}.
\end{displaymath} (10)
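As a direct check (a short verification added here, independent of the derivation through $J$), multiplying (10) by $s^{(k)}$ gives

\begin{displaymath}H_{k+1}s^{(k)}=H_ks^{(k)}-H_ks^{(k)}\frac{s^{(k)}\cdot H_ks^{(k)}}{s^{(k)}\cdot H_ks^{(k)}}+
y^{(k)}\frac{y^{(k)}\cdot s^{(k)}}{y^{(k)}\cdot s^{(k)}}=y^{(k)},
\end{displaymath}

so the secant equation (6) holds whenever both denominators are nonzero. For example, with the illustrative data $H_k=I$, $s^{(k)}=(0,1)^T$, and $y^{(k)}=(1,1)^T$, formula (10) yields

\begin{displaymath}H_{k+1}=I-s^{(k)}(s^{(k)})^T+y^{(k)}(y^{(k)})^T=\left(\begin{array}{cc}2&1\\ 1&1\end{array}\right),
\end{displaymath}

which is symmetric, has determinant 1 and trace 3 (hence is positive definite), and maps $s^{(k)}$ to $y^{(k)}$.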



 
Mark S. Gockenbach
2003-02-17