Sunday, 28 July 2013

An alternative variational principle

In the previous post I talked about one of the less quoted results in the variational theory of Sturm-Liouville eigenvalue problems which I owe to the book on differential equations by Erich Kamke and which I reworked for the purposes of my dissertation.

The other result that I found in Kamke's book is an alternative variational principle which can be used to obtain estimates of the eigenvalues. This method gives a weaker upper bound than the Rayleigh quotient, but it requires less computational effort. In return, one can widen the class of trial functions without severely complicating the calculations.

We start with the Rayleigh quotient written in the abstract form:
$$J\left\{ \phi\right\}=\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }$$
Then, using the Cauchy-Schwarz inequality, we obtain
$$\left(\left\langle \phi,L\left[\phi\right]\right\rangle \right)^{2}\leq\left\langle \phi,\phi\right\rangle \left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle$$
Dividing both sides by \(\left\langle \phi,\phi\right\rangle \left\langle \phi,L\left[\phi\right]\right\rangle\), which is assumed to be positive, gives
$$\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\leq\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,L\left[\phi\right]\right\rangle }$$
Written explicitly
$$\frac{\int_{\Omega}\rho\phi L\left[\phi\right]d\boldsymbol{x}}{\int_{\Omega}\rho\phi^{2}d\boldsymbol{x}}\le\frac{\int_{\Omega}\rho\left(L\left[\phi\right]\right)^{2}d\boldsymbol{x}}{\int_{\Omega}\rho\phi L\left[\phi\right]d\boldsymbol{x}}$$
Hence we can replace the original problem with the following one
$$K\left\{ \phi\right\} =\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,L\left[\phi\right]\right\rangle }\to\min,\qquad\phi\in\mathfrak{A}$$
Consider the eigenvalue problem
$$-y''=\lambda y,\quad y\left(0\right)=y\left(1\right)=0$$
Take the trial function
$$y_{1}=\frac{x}{12}-\frac{x^{3}}{6}+\frac{x^{4}}{12},\qquad y_{1}''=-x+x^{2}$$
First we perform the calculation using the Rayleigh quotient
$$J\left\{ y_{1}\right\} =\frac{\left\langle y_{1},L\left[y_{1}\right]\right\rangle }{\left\langle y_{1},y_{1}\right\rangle }=-\frac{\int_{0}^{1}y_{1}y_{1}''dx}{\int_{0}^{1}y_{1}^{2}dx}$$
$$\begin{aligned}\int_{0}^{1}y_{1}y_{1}''dx & =\int_{0}^{1}\left[\left(\frac{x}{12}-\frac{x^{3}}{6}+\frac{x^{4}}{12}\right)\left(-x+x^{2}\right)\right]dx\\
 & =\int_{0}^{1}\left(\frac{x^{6}}{12}-\frac{x^{5}}{4}+\frac{x^{4}}{6}+\frac{x^{3}}{12}-\frac{x^{2}}{12}\right)dx\\
 & =-\frac{17}{5040}\end{aligned}$$
The factor evaluated above will be common to both variational quotients
$$\begin{aligned}\int_{0}^{1}y_{1}^{2}dx & =\int_{0}^{1}\left(\frac{x^{8}}{144}-\frac{x^{7}}{36}+\frac{x^{6}}{36}+\frac{x^{5}}{72}-\frac{x^{4}}{36}+\frac{x^{2}}{144}\right)dx\\
 & =\frac{31}{90720}\end{aligned}$$
$$ \lambda_{1}\le J\left\{ y_{1}\right\} =\frac{17\cdot90720}{5040\cdot31}\approx9.871$$
Now we use the alternative variational principle
$$K\left\{ y_{1}\right\} =\frac{\left\langle L\left[y_{1}\right],L\left[y_{1}\right]\right\rangle }{\left\langle y_{1},L\left[y_{1}\right]\right\rangle }=-\frac{\int_{0}^{1}y_{1}''^{2}dx}{\int_{0}^{1}y_{1}y_{1}''dx}$$
The announced gain in computational efficiency comes from replacing \(y^{2}\) with \(\left(L\left[y\right]\right)^{2}\), which is a polynomial whose degree is lower by 4 than that of \(y^{2}\) (degree 4 instead of degree 8 for this trial function).
$$\lambda_{1}\le K\left\{ y_{1}\right\} =\frac{5040}{17\cdot30}\approx9.882$$
$$K\left\{ y_{1}\right\} >J\left\{ y_{1}\right\} >\lambda_{1}=\pi^{2}$$
as expected; however, \(K\left\{ y_{1}\right\}\) takes fewer operations to evaluate.
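The two quotients above can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the trial function \(y_{1}=x/12-x^{3}/6+x^{4}/12\) and \(L[y]=-y''\) on \([0,1]\); the helper names `poly_mul`, `poly_diff` and `integrate_01` are illustrative and not part of the original calculation.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_diff(a):
    """Differentiate a polynomial given as a coefficient list."""
    return [k * a[k] for k in range(1, len(a))]

def integrate_01(a):
    """Integrate a polynomial over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(a))

# Trial function y1 = x/12 - x^3/6 + x^4/12 (coefficients of 1, x, x^2, ...)
y = [Fraction(0), Fraction(1, 12), Fraction(0), Fraction(-1, 6), Fraction(1, 12)]
Ly = [-c for c in poly_diff(poly_diff(y))]   # L[y] = -y'' = x - x^2

yy = integrate_01(poly_mul(y, y))            # <y, y>
yLy = integrate_01(poly_mul(y, Ly))          # <y, L[y]>
LyLy = integrate_01(poly_mul(Ly, Ly))        # <L[y], L[y]>

J = yLy / yy      # Rayleigh quotient
K = LyLy / yLy    # alternative quotient

print("J =", J, "~", float(J))
print("K =", K, "~", float(K))
```

Both values lie above \(\pi^{2}\approx9.8696\), with \(K\) slightly larger than \(J\), in agreement with the inequality derived from Cauchy-Schwarz.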

Saturday, 27 July 2013

Krylov-Bogolyubov inclusion theorem

When I first got hold of a copy of Erich Kamke's classic book in 2002 I did not realise it would become the inspiration for two of my papers which to this day I consider my best. One of them was an essay on lemniscate functions, extracts from which I published in the first post here.

The other one was my dissertation for the MSc degree at the Open University. While working on the dissertation I had to remain within the rather restrictive framework of the general topic of variational methods in eigenvalue problems, with the emphasis on the Ritz method and asymptotic properties of the eigenvalues.

Faced with the task of producing material which is both rigorous and original, in an area that is very well explored, and of meeting a tight deadline, I decided to work through some of the less quoted results (not easily googled) and try to give an original account of them. Two such results were given in Kamke's book without proof. In this post I discuss one of them.

The main focus of the dissertation was set on the asymptotic estimates for large eigenvalues. This topic was once brought to attention by the famous question of Mark Kac whether one can hear the shape of a drum. The estimates discussed in the course guide were all of the one-sided form, derived using the Rayleigh quotient. Kamke gives some examples of the two-sided estimates which I decided to put at the core of my project.

In particular, there is a theorem due to N. Krylov and N. Bogolyubov, published in 1929, which gives a two-sided estimate of the eigenvalues. In the Russian translation of Kamke's text a reference is made to a report for the Academy of Sciences of the USSR which I was unable to get hold of. Other resources were not of much help either, until I found the book written by Lothar Collatz. It gives the proof; however, the notational framework employed there seemed a bit obscure to me and was not compatible with the more modern setting I chose for my report. I decided to rewrite the proof.

Before I proceed I have to fix some definitions and cite some general results.

Let \(\Omega\) be a domain with a piecewise smooth boundary \(\partial\Omega\) and function \(\phi\in L_{2}\). Consider the following functional \(S:L_{2}\to\mathbb{R}\).

$$S\left\{ \phi\right\} =\int_{\Omega}\left(p\left|\nabla\phi\right|^{2}+q\phi^{2}\right)d\boldsymbol{x}+\int_{\partial\Omega}p\sigma\phi^{2}d\boldsymbol{s}$$

where \(p>0\) and \(p\in C^{1}\left(\Omega\right)\), \(q\in C\left(\Omega\right)\). Now define the operator

$$L\left[u\right]=\frac{1}{\rho}\left(-\nabla\cdot\left(p\nabla u\right)+qu\right)$$

where \(\rho\in C^{1}\left(\Omega\right)\), \(\rho=\rho\left(\boldsymbol{x}\right)>0\) and demand that admissible functions satisfy the Robin boundary conditions
$$\left.\left(\frac{\partial u}{\partial n}+\sigma u\right)\right|_{\partial\Omega}=0$$

Define the scalar product \(L_{2}\times L_{2}\to\mathbb{C}\) as follows
$$\left\langle f,g\right\rangle =\int_{\Omega}\rho f\left(\boldsymbol{x}\right)g^{*}\left(\boldsymbol{x}\right)d\boldsymbol{x}$$

Denote the set of admissible functions

$$\mathfrak{A}=\left\{ \left. u \in L_{2} \right| \left. \left(\frac{\partial u}{\partial n}+\sigma u\right) \right|_{\partial\Omega}=0\right\}$$

and the subset of normed admissible functions
$$\mathfrak{N}_{1}=\left\{ \left. u\in\mathfrak{A}\right| \int_{\Omega}\rho u^{2}d\boldsymbol{x}=1\right\}$$

It can be shown that the problem of minimization of the functional \(S\left\{\phi\right\} =\left\langle \phi,L\left[\phi\right]\right\rangle \to\min,\phi\in\mathfrak{N}_{1}\) is equivalent to the eigenvalue problem for the differential equation

$$L\left[u\right]=\lambda u$$
with boundary conditions
$$\left.\left(\frac{\partial u}{\partial n}+\sigma u\right)\right|_{\partial\Omega}=0$$

It can also be shown that \(L\) is self-adjoint

$$\left\langle v,L\left[u\right]\right\rangle =\left\langle L\left[v\right],u\right\rangle$$

and that there exists an infinite and complete sequence of orthogonal functions (eigenfunctions) which satisfy the boundary conditions and give rise to the corresponding sequence of eigenvalues.

We can now state the theorem in question.

Theorem. For an arbitrary admissible function \(\phi\)
  which does not vanish identically consider the following quantities

$$\alpha=\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\qquad\beta^{2}=\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }$$

Then \(\beta^{2}\ge\alpha^{2}\) and in the interval between

$$\alpha-\sqrt{\beta^{2}-\alpha^{2}}\quad\text{and}\quad\alpha+\sqrt{\beta^{2}-\alpha^{2}}$$ there lies at least one eigenvalue.

First we prove that \(\beta^{2}\ge\alpha^{2}\) using the Cauchy-Schwarz inequality
$$\alpha^{2}=\left(\frac{\left\langle \phi,L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\right)^{2}\le\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle \left\langle \phi,\phi\right\rangle }{\left\langle \phi,\phi\right\rangle ^{2}}=\frac{\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }=\beta^{2}$$

Now consider the following expression

$$\begin{aligned}T\left[\phi\right] & =\frac{A\left[\phi\right]}{B\left[\phi\right]}=\frac{\left\langle \alpha\phi-L\left[\phi\right],\alpha\phi-L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\\
 & =\frac{\alpha^{2}\left\langle \phi,\phi\right\rangle -2\alpha\left\langle \phi,L\left[\phi\right]\right\rangle +\left\langle L\left[\phi\right],L\left[\phi\right]\right\rangle }{\left\langle \phi,\phi\right\rangle }\\
 & =\alpha^{2}-2\alpha^{2}+\beta^{2}=\beta^{2}-\alpha^{2}\end{aligned}$$



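Write \(\psi=\alpha\phi-L\left[\phi\right]\), so that \(A\left[\phi\right]=\left\langle \psi,\psi\right\rangle\). Then by Bessel's inequality we obtain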

$$A\left[\phi\right]=\left\langle \psi,\psi\right\rangle \ge\sum_{i=1}^{\infty}\left\langle \psi,\phi_{i}\right\rangle ^{2}$$

where \(\phi_{i}\) are the eigenfunctions of \(L\), normalised so that \(\left\langle \phi_{i},\phi_{i}\right\rangle =1\). Evaluating the coefficients

$$\begin{aligned}\left\langle \psi,\phi_{i}\right\rangle  & =\left\langle \alpha\phi-L\left[\phi\right],\phi_{i}\right\rangle =\alpha\left\langle \phi,\phi_{i}\right\rangle -\left\langle L\left[\phi\right],\phi_{i}\right\rangle \\
 & =\alpha\left\langle \phi,\phi_{i}\right\rangle -\left\langle \phi,L\left[\phi_{i}\right]\right\rangle =\left(\alpha-\lambda_{i}\right)\left\langle \phi,\phi_{i}\right\rangle \end{aligned}$$

we find

$$A\left[\phi\right]\ge\sum_{i=1}^{\infty}\left(\alpha-\lambda_{i}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}$$

Since \(\phi\) is an admissible function, by the completeness theorem (Parseval's identity) we obtain
$$\begin{aligned}B\left[\phi\right] & =\left\langle \phi,\phi\right\rangle =\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}\end{aligned}$$

Recalling the expression for \(T\left[\phi\right]\) we obtain
$$T\left[\phi\right]=\beta^{2}-\alpha^{2}\ge\frac{\sum_{i=1}^{\infty}\left(\alpha-\lambda_{i}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}}{\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}}$$

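If \(\lambda_{n}\) is the eigenvalue nearest to \(\alpha\) (or one of the two nearest, if they are equidistant from \(\alpha\)), then for all \(i\)

$$\left(\alpha-\lambda_{i}\right)^{2}\ge\left(\alpha-\lambda_{n}\right)^{2}$$

and therefore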
$$\frac{\sum_{i=1}^{\infty}\left(\alpha-\lambda_{i}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}}{\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}}\ge\frac{\sum_{i=1}^{\infty}\left(\alpha-\lambda_{n}\right)^{2}\left\langle \phi,\phi_{i}\right\rangle ^{2}}{\sum_{i=1}^{\infty}\left\langle \phi,\phi_{i}\right\rangle ^{2}}=\left(\alpha-\lambda_{n}\right)^{2}$$


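Combining this with the lower bound for \(T\left[\phi\right]=\beta^{2}-\alpha^{2}\) we obtain
$$ \left|\alpha-\lambda_{n}\right|\le\sqrt{\beta^{2}-\alpha^{2}}$$
that is,
$$ \alpha-\sqrt{\beta^{2}-\alpha^{2}}\le\lambda_{n}\le\alpha+\sqrt{\beta^{2}-\alpha^{2}}$$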

which completes the proof.
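As an illustration of the theorem, here is a minimal numerical sketch. It assumes the simple problem \(-y''=\lambda y\), \(y(0)=y(1)=0\) with \(\rho=1\) and the polynomial trial function \(\phi=x/12-x^{3}/6+x^{4}/12\), for which the three inner products can be integrated by hand; the function name `inclusion_interval` is hypothetical.

```python
import math
from fractions import Fraction

def inclusion_interval(phi_phi, phi_Lphi, Lphi_Lphi):
    """Return (lower, upper, beta^2 - alpha^2) for the Krylov-Bogolyubov interval."""
    alpha = phi_Lphi / phi_phi        # Rayleigh quotient
    beta2 = Lphi_Lphi / phi_phi
    gap2 = beta2 - alpha ** 2         # non-negative by Cauchy-Schwarz
    radius = math.sqrt(gap2)
    return float(alpha) - radius, float(alpha) + radius, gap2

# Inner products for phi = x/12 - x^3/6 + x^4/12 and L[phi] = -phi'' = x - x^2
lower, upper, gap2 = inclusion_interval(
    Fraction(31, 90720),   # <phi, phi>
    Fraction(17, 5040),    # <phi, L[phi]>
    Fraction(1, 30),       # <L[phi], L[phi]>
)

print(lower, upper)                      # approximately 9.536 and 10.206
print(lower <= math.pi ** 2 <= upper)    # the first eigenvalue pi^2 lies inside
```

The interval is much wider than the error of the Rayleigh quotient itself, but unlike the one-sided estimate it brackets an eigenvalue from both sides.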

Wednesday, 4 July 2012

Counting Counter-examples

Apparently the ability to give counter-examples, particularly in analysis, is a sign of a good understanding of the statements of theorems. These sometimes natural, sometimes pathological constructions reveal the significance of each individual assumption by showing that the conclusion fails when one of them is dropped.
Consider the following test of the requirement of completeness of a linear space arising in the formulation of the Banach-Steinhaus theorem. It is the proof of incompleteness of the given space that is interesting. The following construction is suggested:
$$X=\left\{ \left. x=(x_k) \right| \sum \left| x_k\right| <\infty \right\}$$
endowed with the norm
$$\Vert x\Vert=\sup_k\left|x_k\right|$$
where \((x_k)\) stands for a sequence of complex numbers.
There is some sense in which the selected norm is "unnatural" for the given space, and that sense is precisely that the normed space \(\left(X,\Vert .\Vert\right)\) is incomplete.
I began thinking in the direction of the harmonic series which seems to be the source of a great number of various counter-examples. After some juggling I considered the sequence
which is really a sequence of infinite sequences. Indeed the partial sum, after "telescoping" gives
$$\sum_{k=1}^n x_n=1-\frac{1}{1+k+n}$$
which tends to 1 and so \(x\in X\). It turns out that \(x_n\) is Cauchy in \(\left(X,\Vert .\Vert\right)\), since
$$\Vert x_n - x_m \Vert = \sup_k\left|\frac{1}{k+n}-\frac{1}{k+m}+\frac{1}{1+k+m}-\frac{1}{1+k+n}\right|\to 0$$
Taking \(y=\left(\frac{1}{k}\right)\), which does not belong to \(X\) because the harmonic series diverges, we note that
$$\lim _{n\to 0}\Vert x_n - y \Vert=0$$
Another example using some "off the shelf" sequences demonstrates the non-triviality of the result of Hahn-Banach extension theorem. The main lesson here is that it is easy to construct an extension for a functional, but giving a linear extension is far from obvious. Take \(f(x)=\lim_n x_n\) where \(x\) is a convergent sequence. Consider the space of convergent sequences \(c\) as a subspace of bounded sequences \(l_{\infty}\). Define \(g\) on \(l_{\infty}\) by \(g(x)=f(x)\) for \(x\in c\) and \(g(x)=0\) for \(x\in l_{\infty}\backslash c\). It can be checked that \(g\) is a non-linear extension of \(f\) to \(l_{\infty}\). Indeed, consider
\(x\in c\), hence \(g(x)=f(x)=1\). \(y\in l_{\infty}\backslash c\), hence \(g(y)=0\). So \(g(x)+g(y)=1\). On the other hand
Clearly \((x+y)\in l_{\infty}\backslash c\), so \(g(x+y)=0 \ne g(x)+g(y)\), so \(g\) is non-linear.

Saturday, 2 June 2012

From integrating multiplier to integrating operator

I thought about this trick first and then found an example to apply it to.
$$\ln|1+x+y|=(y-x)+\ln C$$
I am still not sure if this is a one-off case or if it can be generalised to a consistent method. It all revolves around the understanding of \(d\) as a linear operator when it comes up in equations and integrals. This is something I discuss at greater length in my self-published book

Speaking for the IMA

Two weeks ago I had the honour to speak at the 15th Early Career Mathematicians Conference at the University of Manchester, organised by the IMA. The subject of the talk was largely an aggregation of my past work on courses, exercises and these blog posts, centred around the core ideas of mathematical analysis.
I was not happy with my presentation because, even with quite a few technical examples given, I think it still lacked essence. The main reason for it, of course, is that I am not currently working on any specialist subject and am still completing my MSc at the Open University. Nevertheless, I got very positive feedback from the audience of graduate students, school teachers and university professors. The only part that did let me down was when someone referred to my material as the "philosophy of mathematics". Although they had every good reason for saying that, my aim remains to be a working mathematician. This year I am finishing my part-time MSc, which took me 3 years on top of a full-time job, and finally delving into one of my favourite branches of mathematics with the hope of delivering solid results.

Thursday, 2 February 2012

Finding a particular solution of a second order linear inhomogeneous recurrence equation

Approximation theory and methods did not really fit in the "big picture" of my study plan last year, not only because I am notoriously bad at numerical calculations. Having invested a lot of effort in developing intuition for the behaviour of analytic functions, I was suddenly confronted with cubic splines, which seemed to have all those properties that well-mannered functions would never be allowed to possess.
Nicely but artificially glued together from several pieces of cubics, smooth only up to the second derivative, vanishing on entire intervals: now this is what seems really counter-intuitive.

The following is the kind of problem I got stuck with for a while. The task is to express a function, say \(f(x)=x^2\), in terms of cubic B-splines on the entire real axis. I am omitting a lot of background material, focusing on one particular idea that arises in the solution.

$$x^2=\sum_{p=-\infty}^{\infty}\lambda_p B_p(x)$$
Since \(B^3_p(x)\) has the supporting interval \([p, p+3+1]=[p,p+4]\) of length 4, outside which it vanishes, we can start by expressing the function in terms of \(B_{-3} , B_{-2}, B_{-1} ,B_{0}\) on \([0,1]\) and then try to extend the result. Calculating the expressions for the splines on \([0,1]\):
multiplying by the respective coefficients, summing and equating powers of \(x\) on each side we arrive at the following system of equations:
$$\begin{cases}
-\lambda_{-3}+3\lambda_{-2}-3\lambda_{-1}+\lambda_{0} & =0\\
3\lambda_{-3}-6\lambda_{-2}+3\lambda_{-1} & =24\\
-3\lambda_{-3}+3\lambda_{-1} & =0\\
\lambda_{-3}+4\lambda_{-2}+\lambda_{-1} & =0\end{cases}$$

The solution (guaranteed by the Schoenberg-Whitney theorem) is \((\lambda_{-3},\lambda_{-2},\lambda_{-1},\lambda_0)=(\frac{8}{3},\frac{-4}{3},\frac{8}{3},\frac{44}{3})\).
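The quoted coefficients can be reproduced with exact arithmetic. The sketch below is an illustration only: it assumes the system is ordered by powers \(x^{3},x^{2},x^{1},x^{0}\) with the right-hand side 24 standing in the \(x^{2}\) equation (the arrangement that the quoted solution satisfies), and the helper `solve` is a plain Gauss-Jordan elimination written for this example.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square linear system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix with rational entries
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        # Eliminate the column from all other rows
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

# Unknowns: lambda_{-3}, lambda_{-2}, lambda_{-1}, lambda_0
A = [
    [-1, 3, -3, 1],   # coefficient of x^3
    [3, -6, 3, 0],    # coefficient of x^2 (assumed right-hand side 24)
    [-3, 0, 3, 0],    # coefficient of x
    [1, 4, 1, 0],     # constant term
]
b = [0, 24, 0, 0]

lambdas = solve(A, b)
print(lambdas)
```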
Now we want to find all coefficients on each of the intervals \([\xi_p,\xi_p+4]\) for the points \(\{\xi_i=ih;\ i=\pm1,\pm2,\ldots\}\). From the general expression for the B-spline it can be deduced that
which for \(h=1\) leads to the following recurrence relation:
or, after changing the index
Now here is the trick that I came up with. The last expression can be thought of as a "second-order linear inhomogeneous recurrence relation". The advantage of this approach is that the structure of the solution instantly becomes clear.
The general solution of the corresponding homogeneous relation
is derived using the standard method of solving this type of recurrence and is given by the following expression:
It can also be found using generating functions. Not surprisingly, it depends on 2 arbitrary constants, as it takes 2 initial terms, \(\lambda_0\) and \(\lambda_{-1}\), to reconstruct the whole sequence from the three-term recurrence. Applying the general ideas from linear systems, we deduce that in order to obtain the general solution of the inhomogeneous recurrence we have to add a particular solution to the expression above.
Since the RHS is a quadratic polynomial, it makes sense to look for the particular solution in the form:
$$\lambda_j=aj^2+bj+c$$
Substituting this into the original recurrence and gathering together the powers of \(j\) we obtain:
which after equating powers gives the solution \((a,b,c)=(4,16,\frac{44}{3})\)
Thus the general solution of the inhomogeneous equation is given by the following formula:
Now we can use the values of \(\lambda_0\) and \(\lambda_{-1}\) to determine the constants (bearing in mind that \(\left(-2-\sqrt{3}\right)\left(-2+\sqrt{3}\right)=1\), so the two roots are reciprocals of each other):
$$\frac{8}{3}= \alpha\left(-2+\sqrt{3}\right)+\beta\left(-2-\sqrt{3}\right)+\frac{8}{3}$$
which gives \(\alpha=\beta=0\). Thus finally:
$$x^2=\sum_{p=-\infty}^{\infty}\left(4p^2+16p+\frac{44}{3}\right)B_p(x)$$
which is the solution of the original problem.
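As a quick consistency check, the closed form \(\lambda_p=4p^2+16p+\frac{44}{3}\) (that is, \((a,b,c)=(4,16,\frac{44}{3})\) with \(\alpha=\beta=0\)) can be substituted back into the four equations for the coefficients on \([0,1]\). The sketch below assumes the right-hand side 24 stands in the \(x^{2}\) equation; the function name `lam` is illustrative.

```python
from fractions import Fraction

def lam(p):
    """Closed-form coefficient lambda_p = 4 p^2 + 16 p + 44/3."""
    return 4 * p * p + 16 * p + Fraction(44, 3)

# Rows: coefficients of (lambda_{-3}, lambda_{-2}, lambda_{-1}, lambda_0) and right-hand side
system = [
    ([-1, 3, -3, 1], 0),    # x^3
    ([3, -6, 3, 0], 24),    # x^2 (assumed right-hand side)
    ([-3, 0, 3, 0], 0),     # x
    ([1, 4, 1, 0], 0),      # constant term
]

coeffs = [lam(p) for p in (-3, -2, -1, 0)]
residuals = [sum(c * v for c, v in zip(row, coeffs)) - rhs for row, rhs in system]

print(coeffs)      # the four coefficients on [0, 1]
print(residuals)   # all zero if the closed form is consistent with the system
```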

Sunday, 15 January 2012

Calculus in a warehouse

One of the most frequent remarks that I get when I mention studying applied mathematics is "applied to what?.." It is not always easy to give convincing examples straight away, because for me the enormous role of this science in all aspects of life is just so obvious. This is why I like to discover unusual cases where some elaborate techniques can be applied in quite mundane areas.

This is one of the small problems I formulated for myself during my studies of maritime logistics, just for the sake of curiosity. The calculations are pretty simple and straightforward, but the results look nice, with an unexpected turn in the middle.

In maritime transport one has to deal with the so-called bulk cargoes (iron ore, coal, fertilizers etc.) which are often stored in ports in large piles. I once considered determining some quantitative characteristics that are useful for designing a warehouse to store this type of cargo.

Let’s approximate the pile of cargo with the geometrical body of height \(H\) having a rectangle \(L_2\times B\) at its base and 4 facets inclined towards the centre at equal slope, thus forming the upper horizontal edge of length \(L_1\):

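Its volume can be calculated as follows (with \(x\) measured downwards from the upper edge):
$$dV=B\left(L_1\frac{x}{H}+\frac{(L_2-L_1)x^2}{H^2}\right)dx$$
So that
$$V=\int_0^H dV=\frac{BH}{6}\left(L_1+2L_2\right)$$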
Now let us determine the amount of energy required to form the pile (meaning the mechanical work that has to be done against the force of gravity). Assume that the density of the cargo is \(\gamma\) kg/m³. To lift a small layer to height \(H-x\) requires:
$$dA=\gamma gdV(H-x)=\gamma gB\left(L_1\frac{x(H-x)}{H}+\frac{(L_2-L_1)x^2(H-x)}{H^2}\right)dx$$
Thus, total work equals:
$$A=\gamma gB\int_0^H(L_1\frac{x(H-x)}{H}+\frac{(L_2-L_1)x^2(H-x)}{H^2})dx=\frac{\gamma gBH^2}{12}(L_1+L_2)$$
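The closed form for the work can be checked numerically. The following is a rough sketch with made-up dimensions (the values of `gamma`, `B`, `H`, `L1`, `L2` are purely illustrative); it uses Simpson's rule, which is exact for cubic integrands up to rounding, and also integrates the layer volume \(dV\) taken from the integrand above.

```python
import math

def simpson(f, a, b, n=4):
    """Composite Simpson rule with n (even) subintervals; exact for cubics."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Hypothetical pile: density in kg/m^3, lengths in metres
gamma, g = 1500.0, 9.81
B, H, L1, L2 = 20.0, 8.0, 30.0, 50.0

def layer_area(x):
    """Horizontal cross-section at depth x below the upper edge."""
    return B * (L1 * x / H + (L2 - L1) * x ** 2 / H ** 2)

volume_numeric = simpson(layer_area, 0.0, H)
volume_closed = B * H * (L1 + 2 * L2) / 6

work_numeric = gamma * g * simpson(lambda x: layer_area(x) * (H - x), 0.0, H)
work_closed = gamma * g * B * H ** 2 * (L1 + L2) / 12

print(volume_numeric, volume_closed)
print(work_numeric, work_closed)
print(math.isclose(work_numeric, work_closed, rel_tol=1e-12))
```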

Now we can ask the following question: if the volume is given, what should the length and breadth be so that the pile occupies the minimal warehouse area? To simplify the calculations and cancel out some parameters, let us introduce the quantity called the “angle of natural slope”. This is a term from soil mechanics which means the angle that the surface of the substance makes with the horizontal. It depends on density, viscosity and inter-particle friction and can be found in special tables.
$$\frac{2H}{B}=\tan \chi$$
Assuming that \(V=const\), we obtain an expression for \(L\) and insert it in the formula for the area \(S=LB\):
$$S=4\left(\frac{1}{\sqrt[3]{6}}+1\right)\left(\frac{V}{\tan\chi}\right)^{2/3}\approx 4.4\left(\frac{V}{\tan\chi}\right)^{2/3}$$
Now, assuming again that \(V=const\), let us determine the values of \(L\) and \(B\) which give an extremum of the amount of energy required to form the pile. Again, substituting for \(L\), but now in the expression for \(A\), we obtain:
$$A=\frac{\gamma gBH^2}{12}(L_1+L_2)=\frac{\gamma g \tan^2 \chi}{48}\left(\frac{8VB}{\tan\chi}-\frac{B^4}{3}\right)$$
$$A'(B)=\frac{\gamma g \tan^2 \chi}{48}\left(\frac{8V}{\tan\chi}-\frac{4B^3}{3}\right)=0$$
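Remarkably, the stationary point \(B^3=6V/\tan\chi\) is the same as the one obtained for the base area, although the approach is now quite different. Note, however, that \(A''(B)<0\) there, so for the work this stationary point is a maximum rather than a minimum.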
To complete the picture we can use the derived results to determine the maximum allowed volume for a single pile if the pressure on the ground per unit area is limited by some given value \(q\) kg/m²:
$$V<\frac{(4.4q)^3}{\gamma^3\tan\chi}\approx 85.3\frac{q^3}{\gamma^3\tan\chi}$$