Sunday, 28 July 2013

An alternative variational principle

In the previous post I talked about one of the less quoted results in the variational theory of Sturm-Liouville eigenvalue problems, which I owe to the book on differential equations by Erich Kamke and which I reworked for the purposes of my dissertation.

The other result that I used to complement the compulsory discussion of the Ritz method, and which I also found in Kamke's book, states an alternative variational principle which can be used to obtain estimates of the eigenvalues. This method gives a weaker upper bound than the classical Rayleigh quotient, but it takes less computational effort to evaluate. In return, one can widen the class of trial functions without severely complicating the calculations.

The expression of this alternative principle took some time to sink in, especially given the fact that I was running out of time to complete the thesis. I nearly posted a question on MSE to help me work out the motivation behind the new quotient. I am glad I did not, as the derivation turned out to be much easier than it seemed.

We start off with the Rayleigh quotient written in the abstract form:
$$J\left\{ \phi\right\}=\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }$$
Then, using the Cauchy-Schwarz inequality, we obtain
$$\left(\left\langle \phi,L\left[\phi\right]\right\rangle \right)^{2}\leq\left\langle \phi,\phi\right\rangle \left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle$$
and, dividing both sides by $\left\langle \phi,\phi\right\rangle \left\langle \phi,L\left[\phi\right]\right\rangle$ (assumed positive, as it is for a positive definite $L$),
$$\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\leq\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,L\left[\phi\right]\right\rangle }$$
Written explicitly
$$\frac{\int_{\Omega}\rho\phi L\left[\phi\right]d\boldsymbol{x}}{\int_{\Omega}\rho\phi^{2}d\boldsymbol{x}}\le\frac{\int_{\Omega}\rho\left(L\left[\phi\right]\right)^{2}d\boldsymbol{x}}{\int_{\Omega}\rho\phi L\left[\phi\right]d\boldsymbol{x}}$$
Hence we can replace the original problem with the following one
$$K\left\{ \phi\right\} =\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,L\left[\phi\right]\right\rangle }\to\min,\qquad\phi\in\mathfrak{A}$$
Consider the following eigenvalue problem
$$-y''=\lambda y,\quad y\left(0\right)=y\left(1\right)=0$$
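Take the following trial function, which satisfies both boundary conditions
$$y_{1}=\frac{x}{12}-\frac{x^{3}}{6}+\frac{x^{4}}{12},\qquad y_{1}''=-x+x^{2}$$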
First we perform the calculation using the Rayleigh quotient
$$J\left\{ y_{1}\right\} =\frac{\left\langle y_{1},L\left[y_{1}\right]\right\rangle }{\left\langle y_{1},y_{1}\right\rangle }=-\frac{\int_{0}^{1}y_{1}y_{1}''dx}{\int_{0}^{1}y_{1}^{2}dx}$$
$$\begin{aligned}\int_{0}^{1}y_{1}y_{1}''dx & =\int_{0}^{1}\left[\left(\frac{x}{12}-\frac{x^{3}}{6}+\frac{x^{4}}{12}\right)\left(-x+x^{2}\right)\right]dx\\
 & =\int_{0}^{1}\left(\frac{x^{6}}{12}-\frac{x^{5}}{4}+\frac{x^{4}}{6}+\frac{x^{3}}{12}-\frac{x^{2}}{12}\right)dx\\
 & =-\frac{17}{5040}\end{aligned}$$
The factor just evaluated will be common for both variational quotients
$$\begin{aligned}\int_{0}^{1}y_{1}^{2}dx & =\int_{0}^{1}\left(\frac{x^{8}}{144}-\frac{x^{7}}{36}+\frac{x^{6}}{36}+\frac{x^{5}}{72}-\frac{x^{4}}{36}+\frac{x^{2}}{144}\right)dx\\
 & =\frac{31}{90720}\end{aligned}$$
$$ \lambda_{1}\le J\left\{ y_{1}\right\} =\frac{17\cdot90720}{5040\cdot31}\approx9.871$$
Now we use the alternative variational principle
$$K\left\{ y_{1}\right\} =\frac{\left\langle L\left[y_{1}\right],L\left[y_{1}\right]\right\rangle }{\left\langle y_{1},L\left[y_{1}\right]\right\rangle }=-\frac{\int_{0}^{1}y_{1}''^{2}dx}{\int_{0}^{1}y_{1}y_{1}''dx}$$
Obviously the gain in computational efficiency comes from replacing $y^{2}$ with $\left(L\left[y\right]\right)^{2}$, which is a polynomial whose degree is lower by 4 than that of $y^{2}$. Here $\int_{0}^{1}y_{1}''^{2}dx=\int_{0}^{1}\left(x^{2}-x\right)^{2}dx=\frac{1}{30}$, so
$$\lambda_{1}\le K\left\{ y_{1}\right\} =\frac{5040}{17\cdot30}\approx9.882$$
$$K\left\{ y_{1}\right\} >J\left\{ y_{1}\right\} >\lambda_{1}=\pi^{2}$$
as expected; however, $K\left\{ y_{1}\right\}$ takes fewer operations to evaluate.
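
The two quotients above can be checked with exact rational arithmetic. The following is only a minimal sketch, not part of the original calculation: the helper names `poly_mul`, `integrate_01` and `second_derivative` are made up for this illustration, and polynomials are stored as coefficient lists with the lowest power first.

```python
from fractions import Fraction
from math import pi


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def integrate_01(a):
    """Exact integral over [0, 1] of the polynomial with coefficients a."""
    return sum(c / (k + 1) for k, c in enumerate(a))


def second_derivative(a):
    """Coefficients of the second derivative of the polynomial a."""
    return [k * (k - 1) * c for k, c in enumerate(a)][2:]


# Trial function y1 = x/12 - x^3/6 + x^4/12 from the example above.
y1 = [Fraction(0), Fraction(1, 12), Fraction(0), Fraction(-1, 6), Fraction(1, 12)]
Ly1 = [-c for c in second_derivative(y1)]  # L[y] = -y'' for this problem

y_Ly = integrate_01(poly_mul(y1, Ly1))    # <y1, L[y1]>
y_y = integrate_01(poly_mul(y1, y1))      # <y1, y1>
Ly_Ly = integrate_01(poly_mul(Ly1, Ly1))  # <L[y1], L[y1]>

J = y_Ly / y_y    # Rayleigh quotient
K = Ly_Ly / y_Ly  # alternative quotient

print("J =", J, "~", float(J))
print("K =", K, "~", float(K))
print("pi^2 ~", pi ** 2)
```

Both bounds lie above $\pi^{2}$, with $K$ the weaker of the two, in agreement with the hand calculation.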

Saturday, 27 July 2013

Krylov-Bogolyubov inclusion theorem

When I first got hold of a copy of Erich Kamke's classic book in 2002, I did not realise it would be the inspiration for two of my papers which to date I consider my best. One of them was an essay on lemniscate functions, extracts from which I published in the first post here.

The other one was my dissertation for the MSc degree at the Open University. While working on the dissertation I had to remain within the rather restrictive framework of the general topic, variational methods in eigenvalue problems with the emphasis on the Ritz method and asymptotic properties, and within the time constraints.

Faced with the task of producing material which is both rigorous and original in an area that is relatively well explored, and of meeting a tight deadline, I decided to work through some of the less frequently cited results (not easily googled) and try to give an original account of them. Two such results were given in Kamke's book without proof. In this post I discuss one of them.

The general setting for the dissertation focused on the asymptotic estimates for large eigenvalues. This topic was once brought to attention by the famous question of Mark Kac whether one can hear the shape of a drum. The estimates discussed in the course guide were all one-sided, derived using the Rayleigh quotient. Kamke gives some examples of the two-sided estimates which I decided to put at the core of my project.

In particular, there is a theorem due to N. Krylov and N. Bogolyubov, published in 1929, which gives a two-sided estimate of the eigenvalues. The Russian translation of Kamke refers, for the original paper, to a report for the Academy of Sciences of the USSR which I was unable to get hold of. Most of the online and other resources were not of much help either, until I found a book by Lothar Collatz. It essentially gives the proof; however, the notational framework employed there seemed a bit obscure to me and was not compatible with the more modern setting I chose for my report. I certainly could not afford to rewrite the report for the sake of the proof, so I decided to rewrite the proof.

Before I proceed I have to fix some definitions and cite some general results.

Let $\Omega$ be a domain with a piecewise smooth boundary $\partial\Omega$ and let $\phi\in L_{2}$. Consider the following functional $S:L_{2}\to\mathbb{R}$:

$$S\left\{ \phi\right\} =\int_{\Omega}\left(p\left|\nabla\phi\right|^{2}+q\phi^{2}\right)d\boldsymbol{x}+\int_{\partial\Omega}p\sigma\phi^{2}d\boldsymbol{s}$$

where $p>0$, $p\in C^{1}\left(\Omega\right)$ and $q\in C\left(\Omega\right)$. Now define the operator

$$L\left[u\right]=\frac{1}{\rho}\left(-\nabla\cdot\left(p\nabla u\right)+qu\right)$$

where $\rho\in C^{1}\left(\Omega\right)$, $\rho=\rho\left(\boldsymbol{x}\right)>0$ and demand that admissible functions satisfy the Robin boundary conditions
$$\left.\left(\frac{\partial u}{\partial n}+\sigma u\right)\right|_{\partial\Omega}=0$$

Define scalar product $L_{2}\times L_{2}\to\mathbb{C}$ as follows
$$\left\langle f,g\right\rangle =\int_{\Omega}\rho f\left(\boldsymbol{x}\right)g^{*}\left(\boldsymbol{x}\right)d\boldsymbol{x}$$

Denote the set of admissible functions

$$\mathfrak{A}=\left\{ \left. u \in L_{2} \right| \left. \left(\frac{\partial u}{\partial n}+\sigma u\right) \right|_{\partial\Omega}=0\right\}$$

and the subset of normed admissible functions
$$\mathfrak{N}_{1}=\left\{ \left. u\in\mathfrak{A}\right| \int_{\Omega}\rho u^{2}d\boldsymbol{x}=1\right\}$$

It can be shown that the problem of minimization of the functional $S\left\{\phi\right\} =\left\langle \phi,L\left[\phi\right]\right\rangle \to\min,\phi\in\mathfrak{N}_{1}$ is equivalent to the eigenvalue problem for the differential equation

$$L\left[u\right]=\lambda u$$
with boundary conditions
$$\left.\left(\frac{\partial u}{\partial n}+\sigma u\right)\right|_{\partial\Omega}=0$$

It can also be shown that $L$ is self adjoint

$$\left\langle v,L\left[u\right]\right\rangle =\left\langle L\left[v\right],u\right\rangle$$

and that there exists an infinite and complete sequence of orthogonal functions (eigenfunctions) which satisfy the boundary conditions and give rise to the corresponding sequence of eigenvalues.

We can now state the theorem in question.

Theorem. For an arbitrary admissible function $\phi$ which does not vanish identically, consider the following quantities

$$\alpha=\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\qquad\beta^{2}=\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }$$

Then $\beta^{2}\ge\alpha^{2}$ and in the interval between

$$\alpha-\sqrt{\beta^{2}-\alpha^{2}}\quad\text{and}\quad\alpha+\sqrt{\beta^{2}-\alpha^{2}}$$ there lies at least one eigenvalue.

First we prove that $\beta^{2}\ge\alpha^{2}$ using Cauchy-Schwarz inequality
$$\alpha^{2}=\left(\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\right)^{2}\le\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle \left\langle \phi,\phi\right\rangle }{\left\langle \phi,\phi\right\rangle ^{2}}=\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }=\beta^{2}$$

Now consider the following expression

$$\begin{aligned}T\left[\phi\right] & =\frac{A\left[\phi\right]}{B\left[\phi\right]}=\frac{\left\langle \alpha\phi-L\left[\phi\right],\alpha\phi-L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\\
 & =\frac{\alpha^{2}\left\langle \phi,\phi\right\rangle -2\alpha\left\langle \phi,L\left[\phi\right]\right\rangle +\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\\
 & =\alpha^{2}-2\alpha^{2}+\beta^{2}=\beta^{2}-\alpha^{2}\end{aligned}$$

Write $\psi=\alpha\phi-L\left[\phi\right]$, so that $A\left[\phi\right]=\left\langle \psi,\psi\right\rangle$. Then by Bessel's inequality

$$A\left[\phi\right]=\left\langle \psi,\psi\right\rangle \ge\sum_{i=1}^{\infty}\left\langle \psi,\phi_{i}\right\rangle ^{2}$$

where $\phi_{i}$ are the eigenfunctions of $L$, normalised so that $\left\langle \phi_{i},\phi_{i}\right\rangle =1$. Evaluating the coefficients

$$\begin{aligned}\left\langle \psi,\phi_{i}\right\rangle  & =\left\langle \alpha\phi-L\left[\phi\right],\phi_{i}\right\rangle =\alpha\left\langle \phi,\phi_{i}\right\rangle -\left\langle L\left[\phi\right],\phi_{i}\right\rangle \\
 & =\alpha\left\langle \phi,\phi_{i}\right\rangle -\left\langle \phi,L\left[\phi_{i}\right]\right\rangle =\left(\alpha-\lambda_{i}\right)\left\langle \phi,\phi_{i}\right\rangle \end{aligned}$$

where the second line uses the self-adjointness of $L$ and $L\left[\phi_{i}\right]=\lambda_{i}\phi_{i}$. Hence

$$A\left[\phi\right]\ge\sum_{i=1}^{\infty}\left(\alpha-\lambda_{i}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}$$

Now, since $\phi$ is an admissible function, the completeness of the eigenfunctions gives Parseval's identity
$$B\left[\phi\right]=\left\langle \phi,\phi\right\rangle =\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}$$

Recalling the expression for $T\left[\phi\right]$ we obtain
$$T\left[\phi\right]=\beta^{2}-\alpha^{2}\ge\frac{\sum_{i=1}^{\infty}\left(\alpha-\lambda_{i}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}}{\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}}$$

If $\lambda_{n}$ is the eigenvalue nearest to $\alpha$ (or one of two equidistant ones), then for all $i$

$$\left(\alpha-\lambda_{i}\right)^{2}\ge\left(\alpha-\lambda_{n}\right)^{2}$$

and therefore
$$\frac{\sum_{i=1}^{\infty}\left(\alpha-\lambda_{i}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}}{\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}}\ge\frac{\sum_{i=1}^{\infty}\left(\alpha-\lambda_{n}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}}{\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}}=\left(\alpha-\lambda_{n}\right)^{2}$$


Combining this with the estimate for $T\left[\phi\right]$ gives
$$ \left|\alpha-\lambda_{n}\right|\le\sqrt{\beta^{2}-\alpha^{2}}$$
that is,
$$ \alpha-\sqrt{\beta^{2}-\alpha^{2}}\le\lambda_{n}\le\alpha+\sqrt{\beta^{2}-\alpha^{2}}$$

which completes the proof.
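
As a numerical illustration, the theorem can be tried on the problem $-y''=\lambda y$, $y(0)=y(1)=0$ with the trial function $y_{1}=\frac{x}{12}-\frac{x^{3}}{6}+\frac{x^{4}}{12}$, for which $\left\langle y_{1},y_{1}\right\rangle =\frac{31}{90720}$, $\left\langle y_{1},L\left[y_{1}\right]\right\rangle =\frac{17}{5040}$ and $\left\langle L\left[y_{1}\right],L\left[y_{1}\right]\right\rangle =\frac{1}{30}$. This is a sketch under the assumption that the theorem carries over unchanged to Dirichlet boundary conditions; the function name `inclusion_interval` is made up for the example.

```python
from fractions import Fraction
from math import pi, sqrt


def inclusion_interval(phi_phi, phi_Lphi, Lphi_Lphi):
    """Return (lower, upper, beta^2 - alpha^2) for the Krylov-Bogolyubov interval.

    Arguments are the inner products <phi, phi>, <phi, L[phi]>, <L[phi], L[phi]>.
    """
    alpha = phi_Lphi / phi_phi
    beta_sq = Lphi_Lphi / phi_phi
    radius_sq = beta_sq - alpha ** 2  # non-negative by Cauchy-Schwarz
    radius = sqrt(radius_sq)
    return float(alpha) - radius, float(alpha) + radius, radius_sq


lower, upper, radius_sq = inclusion_interval(
    Fraction(31, 90720),  # <y1, y1>
    Fraction(17, 5040),   # <y1, L[y1]>
    Fraction(1, 30),      # <L[y1], L[y1]>
)

print("interval:", lower, upper)
print("pi^2    :", pi ** 2)
```

The interval is two-sided, so unlike the Rayleigh quotient it also bounds the eigenvalue from below; here it captures $\lambda_{1}=\pi^{2}$ and excludes $\lambda_{2}=4\pi^{2}$.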

Wednesday, 4 July 2012

Counting Counter-examples

Without much time to write, with the dissertation and the next assignment deadlines looming and my first article for Mathematics Today in preparation (fingers crossed that it is accepted), I still wanted to put down a few follow-up thoughts on the brilliant OU course in Functional Analysis I am currently doing.
Apparently the ability to give counter-examples, particularly in analysis, is a sign of a good understanding of the statements of theorems. These sometimes natural, sometimes pathological constructions reveal the significance of each individual assumption by showing that the conclusion fails when one of them is dropped.
One of the questions in the book asks to test the requirement of completeness of a linear space arising in the formulation of the Banach-Steinhaus theorem. It is the proof of incompleteness of the given space that is interesting. The following construction is suggested:
$$X=\left\{ \left. x=(x_k) \right| \sum \left| x_k\right| <\infty \right\}$$
endowed with the norm
$$\Vert x\Vert=\sup_k\left|x_k\right|$$
where $(x_k)$ stands for a sequence of complex numbers.
There is some sense in which the selected norm is "unnatural" for the given space, and that sense is precisely that the normed space $\left(X,\Vert .\Vert\right)$ is incomplete.
At risk of locking myself up in patterns I started thinking in the direction of the harmonic series which seems to be the source of a great number of various counter-examples. After some juggling around I considered the sequence
which is really a sequence of infinite sequences. Indeed the partial sum, after "telescoping" gives
$$\sum_{k=1}^n x_n=1-\frac{1}{1+k+n}$$
which tends to 1 and so $x_n\in X$. It turns out that $x_n$ is Cauchy in $\left(X,\Vert .\Vert\right)$, since
$$\Vert x_n - x_m \Vert = \sup_k\left|\frac{1}{k+n}-\frac{1}{k+m}+\frac{1}{1+k+m}-\frac{1}{1+k+n}\right|\to 0$$
Taking $y=\left(\frac{1}{k}\right)$ we note that
$$\lim _{n\to 0}\Vert x_n - y \Vert=0$$
Hence $x_n$ converges to $y$ in the sense of the given norm, however, $y\notin X$ so the space is incomplete.
The book gives a more concise constructive solution, working with the sequences of partial sums of the above sequence instead. However, I decided to stick with my ugly one. It is shooting straight into the heart of the problem and working backwards that often allows one to resolve more complicated cases, and I bet that's how the authors arrived at their solution.
Another example using some "off the shelf" sequences demonstrates the non-triviality of the result of Hahn-Banach extension theorem. The lesson is that it is easy to construct an extension for a functional, but giving a linear extension is far from obvious. Take $f(x)=\lim_n x_n$ where $x$ is a convergent sequence. Consider the space of convergent sequences $c$ as a subspace of bounded sequences $l_{\infty}$. Define $g$ on $l_{\infty}$ by $g(x)=f(x)$ for $x\in c$ and $g(x)=0$ for $x\in l_{\infty}\backslash c$. It can be checked that $g$ is a non-linear extension of $f$ to $l_{\infty}$. Indeed, consider
$x\in c$, hence $g(x)=f(x)=1$. $y\in l_{\infty}\backslash c$, hence $g(y)=0$. So $g(x)+g(y)=1$. On the other hand
Clearly $(x+y)\in l_{\infty}\backslash c$, so $g(x+y)=0 \ne g(x)+g(y)$, so $g$ is non-linear.

Saturday, 2 June 2012

From integrating multiplier to integrating operator

I thought about this trick and then found an example to apply it to.
$$\ln|1+x+y|=(y-x)+\ln C$$
I am still not sure if this is a one-off case or whether it can be generalised to a consistent method. It all spins around the understanding of $d$ as a linear operator when it comes up in equations and integrals. This is something I am discussing at greater length in my self-published book.

Welcome to StackExchange

For the past two weeks I have been addicted to the math.SE forum. Here is my profile with a rather modest track record attached to it. I found it a very easy-to-use and thoughtfully organised environment which encourages users to participate actively and remain fair to each other. Mathematicians can be an unexpectedly tough community, in fact. I found a lot of skilful and deep individuals and enjoyed some beautiful arguments even when they competed with mine. On the downside, I can see that there is a fair amount of showing off in some of the discussions (which I am myself far from being free of), when standard textbook exercises attract an avalanche of responses fighting for cheap reputation points. At the same time some of the more technically demanding questions are often left unattended, including one of my own. Let us see how far I can get before the end of the year and how much of my reputation is really worth a badge!

Speaking for the IMA

Two weeks ago I had the honour to speak at the 15th Early Career Mathematicians Conference at the University of Manchester, organised by the IMA. The subject of the talk was largely an aggregation of my past work on courses, exercises and these blog posts, centred around the core ideas of mathematical analysis.
I was not happy about my presentation, because even with quite a few technical examples I think it still lacks essence. The main reason for it, of course, is that I am not currently working on any specialist subject and am still completing my MSc at the Open University. Nevertheless, I got very positive feedback from the audience of graduate students, school teachers and university professors. The only part that let me down was when someone referred to my statements as the philosophy of mathematics. Although there was every reason for saying that, my aim remains to be a working mathematician. This year I am finishing my part-time MSc, which took me 3 years on top of a full-time job, and finally delving into one of my favourite branches of mathematics with the hope of delivering solid results.
My lecture will be published in the August issue of Mathematics Today, a quarterly magazine that the IMA distributes among its members.

Thursday, 2 February 2012

Finding a particular solution of a second order linear inhomogeneous recurrence equation

Approximation theory and methods did not really fit in the "big picture" of my study plan last year, and not only because I am notoriously bad at numerical calculations. Having invested a lot of effort in developing intuition for the behaviour of analytic functions, I was suddenly confronted with cubic splines, which seemed to have all those properties that well-mannered functions would never be allowed to possess.
Nicely but artificially glued together from several pieces of cubics, smooth only up to the second derivative, vanishing on entire intervals: now this is what seems really counter-intuitive.

The following is the kind of problem I got stuck with for a while. The task is to express a function, say $f(x)=x^2$, in terms of cubic B-splines on the entire real axis. I am omitting a lot of background material, focusing on one particular idea that arises in the solution.

$$x^2=\sum_{p=-\infty}^{\infty}\lambda_p B_p(x)$$
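Since $B^3_p(x)$ has a supporting interval $[p, p+3+1]=[p,p+4]$ of length 4 outside which it vanishes, we can start by expressing the function in terms of $B_{-3} , B_{-2}, B_{-1} ,B_{0}$ on $[0,1]$ and then try to extend the result. Calculating the expressions for the splines on $[0,1]$:
$$\begin{aligned}B_{-3}(x) & =\tfrac{1}{24}\left(1-x\right)^{3}, & B_{-2}(x) & =\tfrac{1}{24}\left(3x^{3}-6x^{2}+4\right),\\
B_{-1}(x) & =\tfrac{1}{24}\left(-3x^{3}+3x^{2}+3x+1\right), & B_{0}(x) & =\tfrac{1}{24}x^{3},\end{aligned}$$
multiplying by the respective coefficients, summing and equating powers of $x$ on each side we arrive at the following system of equations: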
$$\begin{cases}
-\lambda_{-3}+3\lambda_{-2}-3\lambda_{-1}+\lambda_{0} & =0\\
3\lambda_{-3}-6\lambda_{-2}+3\lambda_{-1} & =24\\
-3\lambda_{-3}+3\lambda_{-1} & =0\\
\lambda_{-3}+4\lambda_{-2}+\lambda_{-1} & =0\end{cases}$$

This system has the solution (whose existence is guaranteed by the Schoenberg-Whitney theorem): $(\lambda_{-3},\lambda_{-2},\lambda_{-1},\lambda_0)=(\frac{8}{3},\frac{-4}{3},\frac{8}{3},\frac{44}{3})$.
Now we want to find all coefficients on each of the intervals $[\xi_p,\xi_{p+4}]$ for the knots $\xi_i=ih$, $i=0,\pm1,\pm2,\dots$. From the general expression for the B-spline it can be deduced that
$$B_p(\xi_{p+1})=B_p(\xi_{p+3})=\frac{1}{24h},\qquad B_p(\xi_{p+2})=\frac{1}{6h}$$
which for $h=1$ leads to the following recurrence relation:
$$\lambda_{j-3}+4\lambda_{j-2}+\lambda_{j-1}=24j^{2}$$
or, after changing the index
$$\lambda_{j-1}+4\lambda_{j}+\lambda_{j+1}=24\left(j+2\right)^{2}$$
Now here is the trick that I came up with. The last expression can be thought of as a "second-order linear inhomogeneous recurrence relation". The advantage of this approach is that the structure of the solution instantly becomes clear.
The general solution of the corresponding homogeneous relation
$$\lambda_{j-1}+4\lambda_{j}+\lambda_{j+1}=0$$
is derived using the standard method of solving this type of recurrence and is given by the following expression:
$$\lambda_{j}=\alpha\left(-2-\sqrt{3}\right)^{j}+\beta\left(-2+\sqrt{3}\right)^{j}$$
It can also be found using generating functions. Not surprisingly it depends on 2 arbitrary constants, as it takes 2 initial terms, $\lambda_0$ and $\lambda_{-1}$, to reconstruct the whole sequence from the three-term recurrence. Applying the general ideas from linear systems we deduce that in order to obtain the general solution of the inhomogeneous recurrence we have to add a particular solution to the expression above.
Since the RHS is a quadratic polynomial it makes sense to look for the particular solution in the form:
$$\lambda_{j}=aj^{2}+bj+c$$
Substituting this into the original recurrence and gathering together the powers of $j$ we obtain:
$$6aj^{2}+6bj+\left(2a+6c\right)=24j^{2}+96j+96$$
which after equating powers gives the solution $(a,b,c)=(4,16,\frac{44}{3})$
Thus the general solution of the inhomogeneous equation is given by the following formula:
$$\lambda_{j}=\alpha\left(-2-\sqrt{3}\right)^{j}+\beta\left(-2+\sqrt{3}\right)^{j}+4j^{2}+16j+\frac{44}{3}$$
Now we can use the values of $\lambda_0$ and $\lambda_{-1}$ to determine the constants (bearing in mind that $\left(-2-\sqrt{3}\right)\left(-2+\sqrt{3}\right)=1$, so the two roots are reciprocals of each other):
$$\frac{44}{3}=\alpha+\beta+\frac{44}{3},\qquad\frac{8}{3}=\alpha\left(-2+\sqrt{3}\right)+\beta\left(-2-\sqrt{3}\right)+\frac{8}{3}$$
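which gives $\alpha=\beta=0$. Thus finally:
$$\lambda_{p}=4p^{2}+16p+\frac{44}{3},\qquad x^{2}=\sum_{p=-\infty}^{\infty}\left(4p^{2}+16p+\frac{44}{3}\right)B_{p}(x)$$
which is the solution of the original problem.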
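
The result can be sanity-checked with exact arithmetic. The sketch below rests on assumptions that are not stated explicitly in the text: unit knot spacing, and the truncated-power form $B_p(x)=\frac{1}{24}\sum_{m=0}^{4}(-1)^{m}\binom{4}{m}(x-p-m)_{+}^{3}$ for the cubic B-spline, which reduces to $x^{3}/24$ for $B_0$ on $[0,1]$ and so matches the factor 24 in the system above. The coefficients are taken to be $\lambda_p=4p^{2}+16p+\frac{44}{3}$, the particular solution with $(a,b,c)=(4,16,\frac{44}{3})$ and $\alpha=\beta=0$. The function names are made up for this illustration.

```python
from fractions import Fraction
from math import comb, floor


def bspline(p, x):
    """Cubic B-spline on the knots p, p+1, ..., p+4 (assumed 1/24 normalisation)."""
    total = Fraction(0)
    for m in range(5):
        t = x - (p + m)
        if t > 0:  # truncated power (x - p - m)_+^3
            total += (-1) ** m * comb(4, m) * t ** 3
    return total / 24


def coefficient(p):
    """lambda_p = 4 p^2 + 16 p + 44/3."""
    return 4 * p * p + 16 * p + Fraction(44, 3)


def spline_sum(x):
    """Sum of lambda_p * B_p(x); only four consecutive splines are non-zero at x."""
    first = floor(x) - 3
    return sum(coefficient(p) * bspline(p, x) for p in range(first, first + 4))


# The four coefficients found from the system on [0, 1].
print([coefficient(p) for p in range(-3, 1)])

# The expansion should reproduce x^2 exactly at any point.
for x in (Fraction(1, 2), Fraction(5, 2), Fraction(-7, 3), Fraction(4)):
    print(x, spline_sum(x), x * x)
```

Because everything is kept as `Fraction`, the comparison with $x^{2}$ is exact rather than subject to rounding.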