My view of the world

I think of myself as an inference machine fed with sensory information. I see, hear, taste, smell, and feel (in the sense of touch) according to whatever that information consists of. This brain-in-a-vat view assumes nothing, since I experience this sensory information simply by being alive. To me, reality is what I sense. The structure of this reality is whatever I can infer from and validate against the sensory flow.

First weeks

Since this sensory flow started last week, I've mostly been afraid, because everything just seems chaotic. I have noticed that there is variation in what I sense, and that some of these variations occur repeatedly. I know that sometimes I like what I sense and sometimes I don't. Most oddly, whenever I feel bad, there are these recurring occurrences of something that suddenly make me feel good. I call them mom and dad. No idea what they are, though. There are other recurring things too, and I have started giving names to them. That thing I call red, that green, that loud, and that painful.

First months

Whatever the system is that feeds me this sensory information, it seems I am part of it. I have identified parts of the sensory information as being directly caused by me. I have these things I call hands and legs, and I can see them swinging in front of me when I want to. In the same way, I can make noises, or inflict pain on myself by making a wrongly coordinated move. Being part of the system is one of my strongest beliefs: my repeated tests on this matter never fail, and I do these tests automatically every day.

The system seems to put very stringent boundaries, or rules, on what I can do. Colliding my body with certain objects inflicts pain on me. This is sensory information so intense and unbearable that I go to great lengths to avoid it. I also find that I am unable to carry out certain tasks. No matter how hard I try, I am unable to get off this bed by myself. It is as if I were drawn to it.

To me, the most important thing is to avoid pain. But some of the sensory information is the opposite of pain. This is a sensation which I really like, and would like to continue feeling without a single stop. I call it happiness.

First years

I have found other anomalies in the sensory information. After recognizing myself, I have found increasingly many instances of things that look like me. Not exactly, but there are so many similarities between me and these things that I have classified them as humans, me included. These things appear to act as if they were alive like me, and they seem as able to modify the world as I am. It turned out that mom and dad are humans just like me.

In recent times, I have tried several ways of affecting the system. Last week I tried to use my thoughts to move rocks around. It did not work. I also tried to fly by jumping and flapping my hands. But I failed. Yet the birds were flying all the time with such ease.

I have come to call this system a world.

Childhood

I have found that humans must be treated differently from other things. The problematic thing with them is that they are very able to cause me pain. Physical pain is one kind, but other sensory overloads are awful as well, such as prolonged loud noises they might make. Along with the strong belief of being part of the world, by experience I have come to strongly believe that there are multiple humans in this world, sensing the same objective reality. It is not clear to me whether there actually exist other living beings like me, sharing this experience of control. From my viewpoint, I can never know that. Actually, if these things simply behave as if they were alive, from my viewpoint the question of simulation versus reality is not relevant at all. I have to live in this world in any case.

I have constructed a set of strategies for minimizing the pain that other humans inflict on me. I call this social behaviour. It is quite simple, actually. What I do is think of them as if they experienced the world as I do, think that they are alive, and then behave towards them as I would like them to behave towards me. I call it the reciprocity principle. When sensing other humans, I have noticed that they induce different kinds of emotions in me. When these other humans do things, recalling a similar event that has happened to me causes me to relive the event myself. It is almost an automatic reaction in me: I call it empathy. Empathy helps me to better choose a reaction which conforms to the reciprocity principle.

I have come to form the rudimentary concepts of wrong and right. Roughly, I call it wrong to inflict pain on me, and I call everything else right. But after accepting the practical existence of other human beings, and recognizing empathy in me, I have found that I must generalize these concepts. The problem is that recognizing pain in others causes pain in me. Minimizing my pain has thus become a difficult problem, and actually, I feel overwhelmed by it. Suddenly things have become chaotic; minor temporal changes have massive long-term consequences, and I am less and less able to control the future.

I have found it necessary to attempt to balance the distribution of pain. Sometimes I have to accept some pain now to prevent the world from causing me much greater projected pain later. It is unfortunate that the world is so uncontrollable; I found happiness in control, even if it was illusory. I find that a good strategy to minimize my own pain is to attempt to minimize pain in humans overall. Sometimes this leads to situations where I accept more pain for myself, and grant more happiness to others. But in general it seems to work out well; many times humans treat me reciprocally.

Adolescence

I have found consistency in the way the world works; most of the limiting rules stay the same from day to day, from year to year. I am aware that this property need not hold. Perhaps I can walk through the brick wall tomorrow when the world changes to allow that. Being fed by the sensory information, I am fundamentally unable to form absolute truths. The only thing I know for certain is what has happened to me in the past.

However, as the years have gone by living in this world, I have noticed it being incredibly consistent. Many of these consistencies I have found by myself. But communicating with other humans has revealed that they have also found such consistencies, and some of this information concerns things which I have never experienced during my life. Still, testing their claims I have found some of them to be correct. This has strengthened my belief that the other humans exist in the same way as I do; they are able to provide me with structured information about the world, a kind of shortcut of inference, and they seem independent of me.

Adulthood

Accepting that everything is possible is not the same as accepting that everything is plausible. I have come to evaluate the quality of information by how many times I can validate it against my sensory flow. By now, one of the best pieces of information I have is that I cannot move through any object that is dense enough. It is comforting, since a consistent property of this world, gravity, constantly draws me towards the ground, and if it weren't for the ground stopping that, I think I would fall endlessly.

Obtaining information of good quality has become important to me because it seems to enable me to form models of the world which allow me to predict the future increasingly well. Correct predictions allow me more control over the world, and more control over the world means I have better tools to help humans (myself included) feel happier and to avoid making them feel miserable.

Finally, I have come to classify two categories of humans as dangerous. The first category consists of those who seem to lack the capability for empathy. Because of this, they seem to be free to make use of a wider range of strategies to gain happiness, or relieve pain, not excluding inflicting pain on others. The second category consists of those who accept information as correct without validating that information against their sensory flow. Humans make models based on the information they have, and use those models to predict the best way to act. The problem is that if the information is of bad quality, then their models may cause them to act so as to increase pain and to decrease happiness in others. Granted, invalid models of the world may sometimes make people act right. Unfortunately, that happens randomly.

Posted in Philosophical stuff

Infinitely differentiable functions

Let f, g : X \to \mathbb{R} be infinitely differentiable functions, where X \subset \mathbb{R} is an open set in the usual Euclidean metric topology. In this post I will show some closure properties of this class of functions, such that it becomes easy to see whether a given composite function is infinitely differentiable (but not necessarily that it isn't). This is of interest when studying the theory of generalized functions.

Derivatives

The derivative of f is infinitely differentiable:

D^k (Df) = D^{k+1} f.

By induction, this also shows that any derivative D^k f of f of finite order is infinitely differentiable.

Products

The product fg is infinitely differentiable. By the product rule,

D(fg) = (Df)g + f(Dg).

Applying this rule again,

D^2(fg) = (D^2f)g + 2(Df)(Dg) + f(D^2g).

A pattern emerges. By induction one shows the general Leibniz rule:

D^k(fg) = \sum_{i = 0}^k {k \choose i} (D^i f)(D^{k-i} g),

for k \in \mathbb{N}. QED.

By induction, this also shows that f^n, where n \in \mathbb{N}, is infinitely differentiable (the cases n = 0 and n = 1 are trivial).
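The general Leibniz rule can be checked mechanically on polynomials, where differentiation is exact. The following is only a minimal sketch: the coefficient-list representation and all function names are assumptions made for this illustration, and the two polynomials are arbitrary.

```python
from math import comb

# Polynomials are coefficient lists [a0, a1, a2, ...] (an assumed representation).

def derivative(p):
    """Derivative of a polynomial; the zero polynomial is [0]."""
    return [i * a for i, a in enumerate(p)][1:] or [0]

def nth_derivative(p, k):
    for _ in range(k):
        p = derivative(p)
    return p

def multiply(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def leibniz(p, q, k):
    """Right-hand side of the general Leibniz rule for D^k(pq)."""
    total = [0]
    for i in range(k + 1):
        term = multiply(nth_derivative(p, i), nth_derivative(q, k - i))
        total = add(total, [comb(k, i) * c for c in term])
    return trim(total)

f = [1, 2, 0, 3]   # 1 + 2x + 3x^3
g = [0, -1, 4]     # -x + 4x^2

# Compare the direct derivative of the product with the Leibniz sum.
checks = [trim(nth_derivative(multiply(f, g), k)) == leibniz(f, g, k)
          for k in range(7)]
```

Every entry of `checks` should be true; this illustrates the formula on one example and is of course no substitute for the induction proof.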

Linear combinations

The linear combination \alpha f + \beta g, where \alpha, \beta \in \mathbb{R}, is infinitely differentiable. By linearity of differentiation,

D(\alpha f + \beta g) = \alpha D(f) + \beta D(g).

By induction one shows

D^k(\alpha f + \beta g) = \alpha D^k(f) + \beta D^k(g),

for k \in \mathbb{N}. QED.

By induction, this also shows that any finite linear combination is infinitely differentiable.

Compositions

The composition f \circ g, where the range of g is contained in the domain of f, is infinitely differentiable. By writing down the first few formulas, observing a pattern (not easy), and again using induction, one shows Faà di Bruno's formula for the k-th derivative of f \circ g. QED.

By induction, this also shows that any finite composition of infinitely differentiable functions is infinitely differentiable.

Multiplicative inverses

The inverse 1 / f is infinitely differentiable, where \forall x \in X: f(x) \neq 0. Assume 1 / f is k-times differentiable, with k \in \mathbb{N}, k > 0. Using the general Leibniz rule, 

0 = D^k(1) = D^k(f/f) = \sum_{i = 0}^k {k \choose i} (D^i f)(D^{k-i} (1/f)).

It follows that if 1 / f is k-times differentiable, then the k-th derivative must be given by

D^k(1/f) = -(1/f)\sum_{i = 1}^k {k \choose i} (D^i f)(D^{k-i} (1/f)).

Since the right-hand side involves only derivatives of 1 / f of order lower than k, this formula can be proved to hold by induction, starting from the differentiability of 1 / f itself. QED.
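The recursion above can be evaluated at a single point with exact rational arithmetic. This is a minimal sketch; the function name `reciprocal_derivatives` and its list-based input are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb

def reciprocal_derivatives(f_derivs, n):
    """Values D^k(1/f)(x0) for k = 0..n, given f_derivs[i] = D^i f(x0).

    Entries of f_derivs beyond the end of the list are treated as zero,
    and f_derivs[0] = f(x0) must be non-zero.
    """
    f0 = Fraction(f_derivs[0])
    if f0 == 0:
        raise ValueError("f must not vanish at the point")
    r = [1 / f0]
    for k in range(1, n + 1):
        # Sum over i = 1..k of C(k, i) * D^i f * D^(k-i)(1/f).
        s = sum(comb(k, i) * Fraction(f_derivs[i]) * r[k - i]
                for i in range(1, k + 1) if i < len(f_derivs))
        r.append(-s / f0)
    return r

# f(x) = 1 + x at x0 = 0, so f(0) = 1, Df(0) = 1, and higher derivatives vanish.
values = reciprocal_derivatives([1, 1], 5)
```

For f(x) = 1 + x the result agrees with the known derivatives of 1 / (1 + x) at zero, namely (-1)^k k!.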

In particular, this shows that f^{n} is infinitely differentiable for n \in \mathbb{Z}, provided f has no zeros when n < 0.

Posted in Intuitive mathematics

Psychology of reading books

Reading a mathematical, or a scientific, book has a surprising amount to do with psychology. Each book has a certain feel to it, and is to varying degrees in sync with you. When a book is in sync with you, you feel that

  • you can trust it to be error-free in most places,
  • it is consistent on every level,
  • it works at an abstraction level that is not too high for you, i.e. you understand the things it says, 
  • it works at an abstraction level that is not too low for you, i.e. you don't feel like missing out on generality, and
  • you know where the text is heading in the grand scheme.

The more the book is synced with you, the lighter the reading, and the more efficient the learning. In this post I will describe some desyncers which I have managed to make explicit to myself from vague frustrated feelings. Roughly corresponding to the list above, these are

  • abundant errors,
  • inconsistency,
  • missing information,
  • failing to provide redundancy,
  • working on too low an abstraction level, and
  • failing to provide insight.

Abundant errors

An obvious desyncer is the abundance of errors. A typo here and there in informal text presents no problem. However, when the errors hit the theorems or definitions, then that can be an enormous problem. Fortunately, many books have a fair deal of redundancy and consistency; the same definitions and theorems are used over and over again. Given a focused reader, the errors are easily spotted and corrected. But this only works if the frequency of the critical errors is below a certain threshold. Above this threshold the reader loses trust in the text, and becomes frustrated because he feels unable to correct the errors; the learning becomes inefficient or even impossible.

Inconsistency

Consistency means that the same choice is made whenever there is a choice to be made. This means, for example, that the uses of single-letter variables are fixed to specific situations: say, alpha, beta, and gamma for real numbers, and v and w for vectors. If some function was denoted by f in the previous theorem, consistency means that it is not changed to a g in the next one. Inconsistency is particularly annoying because it requires continuous mental work to translate what is given to what is referred to.

Missing information

A less obvious desyncer is missing information. My pet peeve in this category is failing to specify the domain and the codomain of a function. Learning a new subject can be quite difficult when there are many different functions interacting in many different ways. For example, depending on the domain, a function is either injective or not, and depending on the codomain, a function is either surjective or not.

It may be that the missing information could be deduced from some surrounding context. But this again requires mental work to search and refer back to that context. The reader's mind should be directed at learning the subject, not on translating a coded message to something understandable. The text should provide redundancy instead (e.g. always specify the function domain and codomain on definition).
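The dependence of injectivity and surjectivity on the domain and the codomain is easy to see on finite sets. This is a minimal sketch; the helper names `is_injective` and `is_surjective` are assumptions made for this illustration.

```python
def is_injective(f, domain):
    """True if f takes distinct values on the distinct elements of domain."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

def is_surjective(f, domain, codomain):
    """True if every element of codomain is the image of some element of domain."""
    return set(codomain) <= {f(x) for x in domain}

def square(x):
    return x * x

# The same rule x -> x^2 with different domains and codomains.
injective_on_integers = is_injective(square, [-2, -1, 0, 1, 2])      # False
injective_on_naturals = is_injective(square, [0, 1, 2])              # True
onto_squares = is_surjective(square, [0, 1, 2], [0, 1, 4])           # True
onto_range = is_surjective(square, [0, 1, 2], [0, 1, 2, 3, 4])       # False
```

The rule stays the same throughout; only the stated domain and codomain change the answers, which is why leaving them out costs the reader real work.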

Failing to provide redundancy

Another desyncer is failing to provide redundancy. This category includes things such as referring to a function f in a theorem, where the function f is defined five pages earlier in the text (and what was its domain and codomain anyway?). This category can be seen as a milder version of missing information. Perhaps it could be called missing local information.

Working on too low an abstraction level

The mind of a mathematician, or a scientist, is naturally generalizing. If given an instance of a problem, it instinctively tries to recover the general problem itself. If given an idea, it instinctively tries to generalize it maximally, with the minimum number of assumptions. This is abstraction.

There is beauty in abstraction. When an abstraction is taken to its logical extreme, what is left is the idea itself, nothing more, nothing less. By this property, there is also simplicity in abstraction. Things become easier with abstraction, not harder.

The fundamental flaw in providing the reader with an artificially low abstraction level is that the provided ideas can be generalized in multiple ways, usually with only one of the generalizations being the useful one. While meant to simplify the treatment, it is a disservice to the mind, which then cannot make the appropriate generalizations or be stimulated by the associations to other theories that a higher abstraction level reveals.

Failing to provide insight

Many books in mathematics approach their exposition by working bottom-up. This means that every concept that is being used has already been defined previously. Of course, this is the only way in which any subject can be approached.

However, reading books that have been written strictly bottom-up is frustrating because the reader has no idea where the book is heading, or what the important ideas of the book are in the grand scheme. How should the poor reader be motivated to read through an endless list of meaningless theorems one after the other?

The way out of this is to mix the text with a top-down approach, where the parts of the bottom-up text are summarized in terms of how they relate to the things in the grand scheme. I call this providing insight: giving the key ideas in advance, without needing to wait for a year to find them yourself with great trouble and reflection.

Posted in Intuitive mathematics

A very useful limit theorem

Let (X^{*}, T_{X^{*}}), (X, T_X), (Y, T_Y), and (Z, T_Z) be topological spaces, where X is a subspace of X^{*}. Let f : X \to Y, and g : Y \to Z. In this post I will show that

\lim_{x \to p} f(x) = c \textrm{ and } g \textrm{ is continuous at } c \Rightarrow \lim_{x \to p} g(f(x)) = g(c),

where c \in Y, and p \in \overline{X}. This theorem can then be used to prove many of the commonly needed limit theorems as special cases.

Example 1: The limit of evaluations is the evaluation of the limit

Let f : X \to X : f(x) = x, and g : X \to Z be continuous at p \in X. Then

\lim_{x \to p} g(x) = g(p).

Actually, this is equivalent to continuity at p.

Example 2: The limit of sequential evaluations is the evaluation of the limit

Let f : \mathbb{N} \to Y, and g : Y \to Z be a continuous function, where \mathbb{N} is a subspace of \mathbb{N}^{*} = \mathbb{N} \cup \{\infty\}. Then

\lim_{n \to \infty} g(f(n)) = g\left(\lim_{n \to \infty} f(n)\right).

Example 3: The limit of a component is the component of a limit

Let f : X \to Z^n : f(x) = (f_1(x), \dots, f_n(x)) and \pi_i : Z^n \to Z : \pi_i(z_1, \dots , z_n) = z_i. Since the projection \pi_i is continuous, if \lim_{x \to p} f(x) = c = (c_1, \dots, c_n), then

\lim_{x \to p} f_i(x) = \pi_i\left(\lim_{x \to p} f(x)\right) = c_i.

Actually, the reverse implication also holds: if the limits of all component functions exist, then so does the limit of the function (with the relation above).

Example 4: Semigroup addition

Let X and Z be topological semigroups (e.g. vector spaces), f : X \to Z^2 : f(x) = (f_1(x), f_2(x)), and g : Z^2 \to Z: g(x, y) = x + y. Since by definition the addition is continuous,

\lim_{x \to p} \left[f_1(x) + f_2(x)\right] = \lim_{x \to p} f_1(x) + \lim_{x \to p} f_2(x).

We also used the result of example 3 here.

Example 5: Module multiplication

Let X, Y, and Z be topological R-modules (e.g. vector spaces), f : X \to Y, and g : Y \to Z: g(y) = \alpha y, with \alpha \in R. Since by definition the multiplication is continuous, so is its restriction to a fixed \alpha, and thus

\lim_{x \to p} \left[\alpha f(x)\right] = \alpha \lim_{x \to p} f(x).

Proof

Denote by T_Z(z) the neighborhoods of a point z \in Z (and similarly for other spaces). Let c = \lim_{x \to p} f(x), and W \in T_Z(g(c)). Since g is continuous at c, there exists V \in T_Y(c) such that g(V) \subset W. By the definition of a limit of a function, there exists U \in T_{X^{*}}(p) such that f(X \cap U \setminus \{p\}) \subset V. Therefore g(f(X \cap U \setminus \{p\})) \subset W, i.e. \lim_{x \to p} g(f(x)) = g(c).

QED.

Posted in Intuitive mathematics

Semi-norms and symmetric bilinear forms

Let V be a real vector space and \cdot : V^2 \to \mathbb{R} a symmetric bilinear form. In this post I will first show that the Cauchy-Schwarz inequality is equivalent to \cdot being semi-definite. I will then show that for a non-negative positive-homogeneous function f : V \to \mathbb{R} the triangle inequality, convexity, and quasi-convexity are all equivalent. Finally, I will show that if |\cdot|: V \to \mathbb{R} : |x| = \sqrt{|x \cdot x|}, then |\cdot| is a semi-norm if and only if \cdot is semi-definite.

Definitions

A bilinear form \cdot : V^2 \to \mathbb{R} is called

  • symmetric, if \forall x, y \in V: x \cdot y = y \cdot x,
  • positive-semi-definite, if \forall x \in V: x\cdot x \geq 0,
  • negative-semi-definite, if \forall x \in V: x \cdot x \leq 0,
  • semi-definite, if it is either positive-semi-definite or negative-semi-definite,
  • indefinite, if it is not semi-definite, and
  • fulfilling the Cauchy-Schwarz inequality, if \forall x, y \in V: (x \cdot y)^2 \leq (x\cdot x)(y \cdot y).

A function f : V \to \mathbb{R} is called

  • positive-homogeneous, if \forall x \in V: \forall \alpha \in \mathbb{R}: f(\alpha x) = |\alpha| f(x),
  • fulfilling the triangle inequality, if \forall x, y \in V: f(x + y) \leq f(x) + f(y),
  • convex if \forall x, y \in V: \forall t \in [0, 1] \subset \mathbb{R}: f((1 - t)x + ty) \leq (1 - t)f(x) + tf(y), and
  • quasi-convex, if \forall x, y \in V: \forall t \in [0, 1] \subset \mathbb{R}: f((1 - t)x + ty) \leq \max\{f(x), f(y)\}.

Cauchy-Schwarz inequality is equivalent to semi-definiteness

Assume the Cauchy-Schwarz inequality holds but that \cdot is indefinite. Then there exist x, y \in V such that x \cdot x > 0 and y \cdot y < 0. Then (x \cdot y)^2 \leq (x \cdot x)(y \cdot y) does not hold since the left-hand side is non-negative and the right-hand side is negative. This is a contradiction. Therefore \cdot is semi-definite.

Assume \cdot is semi-definite. First, if y \cdot y = 0, then (x + ty) \cdot (x + ty) = x \cdot x + 2t (x \cdot y) for all t \in \mathbb{R}, and by semi-definiteness this expression cannot change sign, which forces x \cdot y = 0; both sides of the Cauchy-Schwarz inequality are then zero, and it holds. Assume y \cdot y \neq 0. Decompose x as x = x_{\parallel} + x_{\perp}, where x_{\parallel} = \frac{x \cdot y}{y \cdot y} y is the orthogonal projection of x to y, and x_{\perp} = x - x_{\parallel} is the rejection of x from y. Then, by semi-definiteness, the Cauchy-Schwarz inequality states that

|x_{\parallel} \cdot x_{\parallel }| \leq |x \cdot x|.

Again by semi-definiteness, and x_{\parallel} \cdot x_{\perp} = 0,

|x_{\parallel} \cdot x_{\parallel}| \leq |x_{\parallel} \cdot x_{\parallel } + 2 x_{\parallel} \cdot x_{\perp} + x_{\perp} \cdot x_{\perp}| = |x \cdot x|.

Therefore the Cauchy-Schwarz inequality holds. QED.
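Both directions of the equivalence can be illustrated numerically. This is a minimal sketch restricted to diagonal forms on \mathbb{R}^3 and a small grid of integer vectors; the helper names `form` and `cauchy_schwarz_holds` are assumptions made for this illustration.

```python
from itertools import product

def form(diag):
    """Symmetric bilinear form x . y = sum_i diag[i] * x[i] * y[i]."""
    return lambda x, y: sum(d * a * b for d, a, b in zip(diag, x, y))

def cauchy_schwarz_holds(dot, vectors):
    """Check (x . y)^2 <= (x . x)(y . y) for all pairs of the given vectors."""
    return all(dot(x, y) ** 2 <= dot(x, x) * dot(y, y)
               for x in vectors for y in vectors)

# All 125 integer vectors with coordinates in {-2, ..., 2}; arithmetic is exact.
grid = list(product(range(-2, 3), repeat=3))

positive_semi = form([1, 2, 0])    # x . x >= 0 for all x
negative_semi = form([-1, 0, -3])  # x . x <= 0 for all x
indefinite = form([1, -1, 0])      # takes both signs

results = {
    "positive_semi": cauchy_schwarz_holds(positive_semi, grid),
    "negative_semi": cauchy_schwarz_holds(negative_semi, grid),
    "indefinite": cauchy_schwarz_holds(indefinite, grid),
}
```

For the indefinite form, the pair x = (1, 0, 0), y = (0, 1, 0) is exactly the counterexample of the proof: the left-hand side is 0 while the right-hand side is -1.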

Convexity implies quasi-convexity

Let f : V \to \mathbb{R} be a convex function. Let x, y \in V, and t \in [0, 1] \subset \mathbb{R}. Now 

\begin{eqnarray} f((1 - t) x + ty) & \leq & (1 - t)f(x) + tf(y) \\ {} & \leq & (1 - t) \max\{f(x), f(y)\} + t\max\{f(x), f(y)\}\\ {} & = & \max\{f(x), f(y)\}. \end{eqnarray}

Therefore f is quasi-convex. QED.

Quasi-convexity + non-negativity + positive-homogeneity implies triangle inequality

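Let f : V \to \mathbb{R} be a non-negative quasi-convex positive-homogeneous function. For x, y \in V with f(x) > 0 and f(y) > 0, let t = f(y) / (f(x) + f(y)). Now

\begin{eqnarray}f\left(\frac{x + y}{f(x) + f(y)}\right) & = & f\left((1 - t) \frac{x}{f(x)} + t\frac{y}{f(y)}\right) \\ {} & \leq & \max\left\{f\left(\frac{x}{f(x)}\right), f\left(\frac{y}{f(y)}\right)\right\} \\ {} & = & 1.\end{eqnarray}

Using positive-homogeneity and non-negativity,

f(x + y) \leq f(x) + f(y).

In the remaining case, say f(x) = 0, quasi-convexity and positive-homogeneity give f(x + y) = f\left((1 - t)\frac{x}{1 - t} + t\frac{y}{t}\right) \leq \max\{f(x) / (1 - t), f(y) / t\} = f(y) / t for every t \in (0, 1), and letting t \to 1 gives f(x + y) \leq f(y) = f(x) + f(y).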

Therefore f fulfills the triangle inequality. QED.

For positive-homogeneous functions convexity and triangle inequality are equivalent

Let f : V \to \mathbb{R} be a positive-homogeneous function. Assume f is convex. Then f(x + y) = 2f(0.5 x + 0.5 y) \leq f(x) + f(y). Therefore f fulfills the triangle inequality. Assume f fulfills the triangle inequality. Let t \in [0, 1] \subset \mathbb{R}. Then f((1 - t)x + ty) \leq f((1 - t)x) + f(ty) = (1 - t)f(x) + tf(y). Therefore f is convex. QED.

Semi-definiteness is equivalent to triangle inequality

Let \cdot : V^2 \to \mathbb{R} be a symmetric bilinear form, and |\cdot|: V \to \mathbb{R} : |x| = \sqrt{|x \cdot x|}. Assume \cdot is indefinite. Then by indefiniteness there exist x, y \in V such that x \cdot x > 0 and y \cdot y < 0. Now one can solve the quadratic equation |(1 - t)x + ty| = 0 for t \in \mathbb{R}. Assuming (x - y) \cdot (x - y) \neq 0 (which can be arranged by replacing y with 2y if necessary), the solution is

t = \frac{x \cdot (x - y) \pm \sqrt{(x \cdot y)^2 - (x \cdot x)(y \cdot y)}}{(x-y)\cdot(x-y)}.

The discriminant is always positive, since (x \cdot x)(y \cdot y) < 0. Therefore, there are two points a, b \in V, with |a| = 0 and |b| = 0, which lie on the same line as x and y. Either x or y is a convex combination of a and b. Without loss of generality, assume it is x. If |\cdot| were convex, it would hold that |x| = 0. Since this is not the case, |\cdot| is not convex and, by the equivalence shown above for positive-homogeneous functions, the triangle inequality does not hold.

Assume \cdot is semi-definite. Then the Cauchy-Schwarz inequality holds and (x \cdot x)(y \cdot y) \geq 0. Now,

\begin{eqnarray}|x+y|^2 & = & |(x + y) \cdot (x + y)| \\ {} & = & |x \cdot x + 2 x\cdot y + y\cdot y| \\ {} & \leq & |x \cdot x| + 2 |x\cdot y| + |y\cdot y| \\ {} & \leq & |x \cdot x| + 2 \sqrt{|x \cdot x|}\sqrt{|y \cdot y|} + |y \cdot y| \\ {} & = & (|x| + |y|)^2.\end{eqnarray}

Thus the triangle inequality holds. QED.
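The result can be illustrated numerically by searching for violations of the triangle inequality. This is a minimal sketch restricted to diagonal forms on \mathbb{R}^2 and a small grid of integer vectors; the helper names `seminorm` and `triangle_violations` and the tolerance are assumptions made for this illustration.

```python
from itertools import product
from math import sqrt

def seminorm(diag):
    """|x| = sqrt(|x . x|) for the form x . y = sum_i diag[i] * x[i] * y[i]."""
    def norm(x):
        return sqrt(abs(sum(d * a * a for d, a in zip(diag, x))))
    return norm

def triangle_violations(norm, vectors, tol=1e-12):
    """Pairs (x, y) with |x + y| > |x| + |y|, up to a small rounding tolerance."""
    return [(x, y) for x in vectors for y in vectors
            if norm(tuple(a + b for a, b in zip(x, y))) > norm(x) + norm(y) + tol]

grid = list(product(range(-2, 3), repeat=2))

semi_definite = triangle_violations(seminorm([1, 0]), grid)       # expected empty
negative_definite = triangle_violations(seminorm([-1, -2]), grid)  # expected empty
indefinite = triangle_violations(seminorm([1, -1]), grid)          # expected non-empty
```

For the indefinite form, a = (1, 1) and b = (1, -1) both have |a| = |b| = 0, while |a + b| = |(2, 0)| = 2, which matches the construction in the proof: two null points on a line through a point of non-zero length.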

Posted in Intuitive mathematics