Friday, September 12, 2014

The Monty Hall Problem

Continuing the discussion about probabilities and their intuition, here is a classical problem: the Monty Hall problem.

The setting is as follows: you are presented with three doors. Behind each door there is either a goat or a car; there are two goats and only one car. You pick a door, and the host, who knows where the car is, opens one of the two remaining doors and reveals a goat. Now there are two doors left: the one you picked and one other. Behind one of them is the car, and behind the other a goat.

Then you are given a choice: switch the door, or stay with the original one. What should you do? 

Now there are two schools of thought: 
  • stay because it makes no difference, your new odds are 50/50.
  • switch because it increases your odds
Before answering the question, to build up the intuition on the correct answer, let's consider a similar problem:
Instead of 3 doors, consider 1 million doors, 999,999 goats and one car. You pick one door at random, and the chances to get the car are obviously 1 in a million. Then the host of the game, knowing the car location, opens 999,998 doors revealing 999,998 goats. Sticking with your original choice, you still have a 1/1,000,000 chance to get the car, while switching increases your chances to 999,999/1,000,000. There is no such thing as 50/50 in this problem (or in the original problem). For the original problem switching increases your odds from 1/3 to 2/3. Still not convinced? Use 100 billion doors instead. You are more likely to be killed by lightning than to find the car on the first try. Switching doors is a near-certain way of getting the car.
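If you would rather check this than take my word for it, here is a minimal Python sketch of the game with any number of doors. The function names (`exact_win_probabilities`, `simulate`) are my own and purely illustrative. The simulation assumes the host always knows where the car is and opens every other door except one.

```python
import random
from fractions import Fraction


def exact_win_probabilities(n_doors):
    """Exact (stay, switch) odds when the host knowingly opens all but one other door."""
    stay = Fraction(1, n_doors)   # you win by staying only if the first pick was right
    switch = 1 - stay             # otherwise the single remaining door hides the car
    return stay, switch


def simulate(n_doors, trials, seed=0):
    """Play the game many times; return the (stay, switch) winning frequencies."""
    rng = random.Random(seed)
    stay_wins = 0
    switch_wins = 0
    for _ in range(trials):
        car = rng.randrange(n_doors)
        pick = rng.randrange(n_doors)
        if pick == car:
            # Host leaves one arbitrary goat door closed.
            other = rng.choice([d for d in range(n_doors) if d != pick])
        else:
            # Host must avoid the car, so the car's door is the one left closed.
            other = car
        stay_wins += (pick == car)
        switch_wins += (other == car)
    return stay_wins / trials, switch_wins / trials


if __name__ == "__main__":
    print(exact_win_probabilities(3))
    print(exact_win_probabilities(1_000_000))
    print(simulate(3, 100_000))
```

The simulated frequencies for three doors should land close to 1/3 for staying and 2/3 for switching, in line with the exact values.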

The incorrect solution of 50/50 comes from a naive and faulty application of Bayes' theorem of information update. Granted the 1/3-2/3 odds are not intuitive and there are a variety of ways to convince yourself this is the correct answer, including playing this game with a friend many times. 

One thing to keep in mind is that the game show host (Monty Hall) is biased: he knows where the car is and he always avoids it. If the host were unbiased and only by luck did not reveal the car, then the odds would indeed be 50/50. An unbiased host would sometimes reveal the car accidentally. It is the host's bias that tricks our intuition into believing in a fair 50/50 solution. The answer is not a fair 50/50 because the host's bias spoils the overall fairness.
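The difference between the two kinds of host can be checked by brute-force enumeration of the three-door game. This is only a sketch: the function name and the `host_knows` flag are my own, and I assume that rounds in which a blind host accidentally reveals the car are simply thrown away.

```python
from fractions import Fraction
from itertools import product


def switch_win_probability(host_knows):
    """P(switching wins | the host revealed a goat) for the three-door game."""
    doors = range(3)
    goat_revealed = Fraction(0)
    switch_wins = Fraction(0)
    for car, pick in product(doors, doors):
        weight = Fraction(1, 9)  # car position and first pick are uniform and independent
        others = [d for d in doors if d != pick]
        if host_knows:
            options = [d for d in others if d != car]  # biased host: only goat doors
        else:
            options = others                           # unbiased host: opens blindly
        for opened in options:
            w = weight / len(options)
            if opened == car:
                continue  # car revealed by accident, round discarded
            goat_revealed += w
            remaining = next(d for d in others if d != opened)
            if remaining == car:
                switch_wins += w
    return switch_wins / goat_revealed


if __name__ == "__main__":
    print("biased host:  ", switch_win_probability(True))
    print("unbiased host:", switch_win_probability(False))
```

With the biased host the enumeration gives 2/3 for switching. With the blind host, conditioned on a goat being revealed, it gives exactly 1/2.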

The amazing thing is that despite all explanations, about half of the population strongly defends one position, and half strongly defends the other position. If you think the correct answer is 50/50, please argue your point of view here and I'll attempt to convince you otherwise. 

Next time we'll continue discussing probabilities with another problem: the sleeping beauty problem. Unlike the Monty Hall problem, the sleeping beauty problem lacks consensus even among experts.  

Sunday, September 7, 2014

Casualties of modern society mathematical illiteracy:

Lucia de Berk, Colin Norris, Ben Geen, Susan Nelles

A good friend of mine, Michael Slott, once made the following observation: claiming illiteracy carries a social stigma, but stating that "I don't know math" is perfectly acceptable. Granted, math deals with abstractions you don't really learn until you assimilate the content and pure memorization is not enough.  But given the proper motivation, it is not that hard to understand it.

In his last interview, Carl Sagan noted: "people read the stock market quotations, and financial papers." Also: "people are able to look at sports statistics. look how many people can do that."

The trouble comes when emotions enter the picture and under the right circumstances it can lead to really bad consequences. 

Statistics can be counter-intuitive. For example, after rolling a pair of dice and getting 1-1 six times in a row, you may feel that you are on a lucky streak and that the chance of getting the same result a seventh time is really high. In fact it is still 1 in 36. Casinos know very well how to take advantage of our emotions and lack of statistical intuition, and so do the state lottery systems. 

Now in statistics there is a common mistake: the prosecutor's fallacy. To explain this we first need to explain Bayes' theorem:

\( P(A|B) = \frac{P(B|A) P(A)}{P(B)} \)

The simplest way to understand this is to derive it as follows:

P(A|B) is the probability to have A given B
P(B|A) is the probability to have B given A


P(A|B) * P(B) = the probability to get both A and B = \( P(A \cap B) \) 


P(B|A) * P(A) = the probability to get both B and A = \( P(B \cap A) \)

Because \( P(A \cap B)  = P(B \cap A)\) we have:  P(A|B) * P(B) = P(B|A) * P(A) q.e.d.

Now let A be the evidence (E) of guilt, and B the innocence (I). Prosecutor's fallacy is ignoring Bayes' theorem and concluding that:

P(I|E) is small because P(E|I) is small

This is best illustrated with a lottery example. Winning the lottery has a very small probability, but does winning it make you a cheater? Prosecutor's fallacy implies that:  

P(honest person given that you won the lottery) = P(honest | won the lottery) = P(won the lottery | honest) = P(an honest person winning the lottery) = tiny percentage

Therefore you are a cheater.
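To see how badly this goes wrong, here is a small sketch that applies Bayes' theorem properly. All the numbers are hypothetical and chosen only for illustration: a 1 in 10 million chance for an honest player to win, a cheater who always wins, and one cheater per billion players. The function name is my own.

```python
from fractions import Fraction


def posterior_honest(p_win_given_honest, p_win_given_cheater, p_cheater):
    """P(honest | won) from Bayes' theorem with two hypotheses: honest or cheater."""
    p_honest = 1 - p_cheater
    # Total probability of winning, summed over both hypotheses.
    p_win = p_win_given_honest * p_honest + p_win_given_cheater * p_cheater
    return p_win_given_honest * p_honest / p_win


if __name__ == "__main__":
    # Hypothetical inputs, not real lottery statistics.
    result = posterior_honest(
        p_win_given_honest=Fraction(1, 10**7),
        p_win_given_cheater=Fraction(1),
        p_cheater=Fraction(1, 10**9),
    )
    print(float(result))
```

Under these assumed numbers P(won | honest) is tiny, yet P(honest | won) comes out at about 99%. The two quantities are simply not the same thing.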

Now onto the Lucia de Berk case. She was a nurse in a children's hospital and some of the kids there died while under her care. Some doctors called the police to investigate and she was ultimately convicted based on the prosecutor's fallacy, which yielded a 1 in 342 million chance that a nurse would be present purely by coincidence at the scene of the unexplained deaths. In fact the correct calculation gave 1 in 25. Fortunately the conviction was ultimately overturned, but other similar cases were subsequently discovered.

In the Netherlands there was Lucia de Berk, in the UK Colin Norris and Ben Geen, in Canada Susan Nelles.

Are there any unknown US cases?

So how does this happen? In a stressful hospital situation something out of the ordinary occurs. Then someone who stands out from the crowd is associated with the incidents. 

The situation snowballs and all past unexplained events are associated in an "aha moment" with the "serial killer". 

Side note: I recall a personal experience with this pattern-matching psychological effect. In 2002 in the Washington DC area there were two snipers who randomly shot people. Police were clueless and at some point they announced on the radio that the snipers were firing from a white van. The next day I was amazed to realize how many white vans were on the road, and probably all the white vans were pulled over by the police that week. In the end it turned out that the white van information was incorrect. Now if you did not experience this kind of police drama, recall the last time you bought a new car. You tend to notice so many cars on the road identical to yours.

The case goes to trial and faulty prosecutor reasoning is not properly countered by the defense due to sheer math ignorance and/or incompetence. The "serial killer" is unfairly convicted and only the dedication of honest experts and the public pressure can sometimes overturn the conviction. 

When emotions run high, logic takes a back seat. In the US there were racially motivated nurse cases; in France there was the famous Dreyfus case. We should "J'accuse" elementary mathematical ignorance. 

Wednesday, September 3, 2014

DICE 2014

This is a very short post. Soon I will attend the DICE2014 conference in Castiglioncello (Tuscany), Italy. One presentation I am looking forward to is "History of electroweak symmetry breaking" by Tom Kibble, one of the co-discoverers of the Higgs mechanism.

I applied for this conference well past the deadline and there were no more presentation slots available and I am only presenting a poster. Here it is below (please zoom in the browser to see it better: CTRL + does the trick.)

After the conference I'll have plenty of material to cover, including interesting tidbits on gauge theory. In the meantime I am preparing an interesting statistical post which includes dramatic real-life implications. Please stay tuned.

Friday, August 29, 2014

What if electromagnetism were a SU(2) Yang-Mills gauge theory?

Let us ignore for a moment the Standard Model, the weak force, and spontaneous symmetry breaking, and try to imagine what the world would look like if electromagnetism were absent and replaced by an SU(2) Yang-Mills theory.

To get a good grip on this, let's do a very quick review of Lie groups and algebras. A Lie group is a continuous group, and is so named after Sophus Lie, a Norwegian mathematician (no, he was not Chinese).  Like any group, a Lie group has a unit element, and because of the continuity, we can define a tangent space for this element. This tangent space is a Lie algebra. A Lie group can be recovered from the Lie algebra by exponentiation, and the elements of the Lie algebras are called generators.

For SU(2), there are three generators:

\( F_i = \frac{1}{2} \sigma_i\)

where \( \sigma_i\) are the Pauli matrices:

\( F_1 = \frac{1}{2}\left( \begin{array}{cc} 0 &1 \\ 1 &0 \end{array}\right) \), \( F_2 = \frac{1}{2}\left( \begin{array}{cc} 0 &-i \\ i &0 \end{array}\right) \), \( F_3 = \frac{1}{2}\left( \begin{array}{cc} 1 &0 \\ 0 & -1 \end{array}\right) \)
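As a quick sanity check, the three matrices above can be verified to satisfy the su(2) commutation relations \( [F_i , F_j] = i \epsilon_{ijk} F_k \). The sketch below uses plain Python lists so that it has no dependencies. The helper names (`matmul`, `commutator`, `scale`, `is_close`) are my own.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    """Multiply every entry of the matrix a by the number c."""
    return [[c * x for x in row] for row in a]


def is_close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


# Generators F_i = sigma_i / 2 built from the Pauli matrices.
F1 = scale(0.5, [[0, 1], [1, 0]])
F2 = scale(0.5, [[0, -1j], [1j, 0]])
F3 = scale(0.5, [[1, 0], [0, -1]])

if __name__ == "__main__":
    print(is_close(commutator(F1, F2), scale(1j, F3)))
    print(is_close(commutator(F2, F3), scale(1j, F1)))
    print(is_close(commutator(F3, F1), scale(1j, F2)))
```

The non-vanishing commutators are the whole point: they are what makes the theory non-abelian.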

[side note - I think I was a bit overly ambitious to present the derivations of the equations in this post and I will instead restrict to just stating the results]

The field tensor \( F_{\mu \nu}\) has the usual definition in terms of the "electric" and "magnetic" fields \( E \) and \( B \):

\( F_{\mu \nu} = \left( \begin{array}{cccc} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & -B_3 & B_2 \\ -E_2 & B_3 & 0 & -B_1 \\ -E_3 & -B_2 & B_1 & 0 \end{array} \right) \)


\( F_{\mu \nu} = \partial_\mu A_\nu - \partial_\nu A_\mu -i q (A_\mu A_\nu - A_\nu A_\mu ) \)

Now the essential thing is that instead of \( A_\mu = (\phi, A_x , A_y , A_z ) \) meaning the electric potential and the magnetic vector field, the four components are no longer scalars, but linear combinations of the generators. And the generators can be naively interpreted as rotations in a 3D space. Naively because for each SO(3) rotation there are two SU(2) elements, what mathematicians call a "double cover". A better physical interpretation comes from quantum mechanics where two linear combinations of \(F_1 \) and \(F_2 \) are interpreted as "raising" \( I_+ \) and "lowering" \( I_- \) operators. To get to physics, in the original isospin Yang-Mills paper where SU(2) was applied to protons and neutrons, \( I_+ \) corresponded to a transformation of a neutron into a proton, while \( I_- \) corresponded to the reverse operation. In other words, the quanta of the SU(2) interaction carry (an isospin) charge. For ordinary electromagnetism, the photon is not electrically charged, but non-abelian gauge interactions are no longer charge neutral.
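Here is a small sketch of the raising and lowering operators acting on an isospin doublet. The conventions are my assumptions and are not fixed by anything above: \( I_\pm = F_1 \pm i F_2 \), with the proton represented as the column (1, 0) and the neutron as (0, 1).

```python
# Generators F_1 and F_2 (half the first two Pauli matrices).
F1 = [[0, 0.5], [0.5, 0]]
F2 = [[0, -0.5j], [0.5j, 0]]


def combine(a, b, coeff):
    """Return a + coeff * b for 2x2 matrices given as nested lists."""
    return [[a[i][j] + coeff * b[i][j] for j in range(2)] for i in range(2)]


def apply(m, v):
    """Apply the 2x2 matrix m to the two-component state v."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


I_PLUS = combine(F1, F2, 1j)    # assumed convention: I_+ = F_1 + i F_2
I_MINUS = combine(F1, F2, -1j)  # assumed convention: I_- = F_1 - i F_2

PROTON = [1, 0]   # assumed: upper component of the doublet
NEUTRON = [0, 1]  # assumed: lower component of the doublet

if __name__ == "__main__":
    print(apply(I_PLUS, NEUTRON) == PROTON)   # neutron raised to proton
    print(apply(I_MINUS, PROTON) == NEUTRON)  # proton lowered to neutron
    print(apply(I_PLUS, PROTON) == [0, 0])    # nothing above the proton
```

With these conventions \( I_+ \) turns the neutron into the proton and annihilates the proton, which is the sense in which the gauge quanta carry isospin charge.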

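Let us work out what \( E_x\) would look like:

\( E_x = E_1 = F_{01} = \partial_0 A_1 - \partial_1 A_0 -i q (A_0 A_1 - A_1 A_0 )  = \frac{\partial A_x}{\partial t} - \frac{\partial \phi}{\partial x} - i q [\phi , A_x] \)

The first two terms are the same as in standard electromagnetism; the commutator vanishes only when \( \phi \) and \( A_x \) commute, as they always do in the abelian case.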

Let us also work out what \( B_x\) would look like:

\( B_x = B_1 = F_{32} = \partial_3 A_2 - \partial_2 A_3 -i q ( A_3 A_2 - A_2 A_3 ) \)

\( B_x =  \frac{\partial A_y}{\partial z} - \frac{\partial A_z}{\partial y} - i q [A_z , A_y] \)

Here we pick up an additional term due to the non-commutativity. On top of all this, at each space-time point \( A's \) are no longer scalars, but they are vectors in an internal space which can carry SU(2) charges.
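To make the extra term more concrete, here is a short sketch in components. The expansion \( A_\mu = A_\mu^a F_a \) (sum over \( a = 1, 2, 3 \)) is my notation, and I use the commutation relations \( [F_a , F_b] = i \epsilon_{abc} F_c \) of the generators:

\( -i q [A_\mu , A_\nu] = -i q A_\mu^b A_\nu^c [F_b , F_c] = q \epsilon_{abc} A_\mu^b A_\nu^c F_a \)

so that, writing \( F_{\mu \nu} = F_{\mu \nu}^a F_a \), each internal component reads:

\( F_{\mu \nu}^a = \partial_\mu A_\nu^a - \partial_\nu A_\mu^a + q \epsilon_{abc} A_\mu^b A_\nu^c \)

The last term is a cross product in the internal space. It has no counterpart in U(1), where there is only one generator.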

Besides the internal space motion, for space-time motion, the Lorentz force law is the same as in the electromagnetic case, but the inhomogeneous Maxwell's equation:

\( \partial^\mu F_{\mu \nu} = j_\nu \)

generalizes to:

\( \partial^\mu F_{\mu \nu} - iq [A^\mu , F_{\mu \nu}]= j_\nu \)

The (Dirac) current itself generalizes from:

\( j_\nu = q \bar{\psi} \gamma_\nu \psi \)

(with \( \bar{\psi} = \psi^\dagger \gamma^0 \)) to 3 currents corresponding to the 3 generators \( F^k \) of SU(2):

\( {(j_\nu )}^k = q \bar{\psi} \gamma_\nu F^k \psi \)

So overall, the Yang-Mills theory is considerably more complicated due to the non-commutativity of the gauge group. But one thing should be clear: the magnetic field is just a very naive, simplistic picture of what is going on, and this mental picture only works for electromagnetism because U(1) is a commutative Lie group. The real physical objects are the "vector potentials" \( A_\mu \). Then the Bohm-Aharonov effect, where measurable changes are produced by changes in the vector potential while the net magnetic field is zero, is no longer counter-intuitive. The real explanation of this effect is geometrical. 

Yang-Mills is quite an interesting model. Its original intention proved not to agree with reality, but physicists kept studying it, and it turned out that the SU(2) gauge theory does describe a physical interaction: the weak force responsible for particle decays, although the story is a bit more complicated due to symmetry breaking. The generalization to SU(3) is straightforward, one simply changes the group generators, but then brand new physics arises in the form of asymptotic freedom and confinement, the latter explaining why we do not see free quarks in nature.

Yang-Mills theory was only accepted by everyone after the proof of renormalizability was obtained in the 1970s, showing that a quantum field theory based on Yang-Mills does produce sensible finite predictions and that all the infinities can be cured in a mathematically consistent way.

Friday, August 22, 2014

Yang's Matrix Trick

It is time to come back to the math series. Today I want to talk about a remarkable similarity spotted by Chen-Ning Yang, the same Yang from Yang-Mills theory.

Yang-Mills gauge theory is a generalization of electromagnetism when the gauge group is non-abelian. 

In electromagnetism the field tensor is defined in terms of the potential as:

\( F_{\alpha \beta}= \partial_\alpha A_\beta - \partial_\beta A_\alpha \)

where \( F\) is the electromagnetic tensor and \( A \) is the electromagnetic four-potential.

From Maxwell to Yang-Mills, the generalization is simply by adding the commutator of the potentials:

\( F_{\alpha \beta}= \partial_\alpha A_\beta - \partial_\beta A_\alpha + A_\alpha A_\beta - A_\beta A_\alpha \)

Now here is the magic: if you recall from a prior post the Riemann curvature tensor is:

\( R^{\delta}_{\alpha \beta \gamma} = \partial_\alpha \Gamma^{\delta}_{\beta \gamma} - \partial_\beta \Gamma^{\delta}_{\alpha \gamma} + \Gamma^{\delta}_{\alpha \mu} \Gamma^{\mu}_{\beta \gamma} - \Gamma^{\delta}_{\beta \mu} \Gamma^{\mu}_{\alpha \gamma} \)

we have the following identification which makes the Yang-Mills equation identical with the Riemann curvature:

\( A_\alpha = \Gamma^{\delta}_{\alpha \gamma}\)
\( F_{\alpha \beta} = R^{\delta}_{\alpha \beta \gamma } \)

This hints at a deeply geometrical interpretation of the gauge theory because both the Riemann curvature and Yang-Mills equations are nothing but Cartan's structural equations in disguise: 

\( F = d A + A \wedge A\)
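To see how the component formula follows from this compact form, here is a short sketch. The coordinate basis \( dx^\alpha \) and the normalization \( F = \frac{1}{2} F_{\alpha \beta} \, dx^\alpha \wedge dx^\beta \) are my assumptions. With \( A = A_\alpha dx^\alpha \):

\( dA = \partial_\alpha A_\beta \, dx^\alpha \wedge dx^\beta = \frac{1}{2} ( \partial_\alpha A_\beta - \partial_\beta A_\alpha ) \, dx^\alpha \wedge dx^\beta \)

\( A \wedge A = A_\alpha A_\beta \, dx^\alpha \wedge dx^\beta = \frac{1}{2} ( A_\alpha A_\beta - A_\beta A_\alpha ) \, dx^\alpha \wedge dx^\beta \)

Adding the two and reading off the coefficient of \( \frac{1}{2} dx^\alpha \wedge dx^\beta \) gives:

\( F_{\alpha \beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha + A_\alpha A_\beta - A_\beta A_\alpha \)

For U(1) the potentials commute, \( A \wedge A \) vanishes, and only the Maxwell part \( F = dA \) survives.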

There are 4 fundamental forces in our universe: gravity (SL(2,C)), electromagnetism (U(1) gauge theory), weak force (SU(2) gauge theory), and strong force (SU(3) gauge theory) and all four can be expressed in the form above proving that in nature curvature = force. This is easiest to understand in general relativity, but even there there is a very surprising fact requiring a big conceptual leap: even empty space can curve.

Next time we'll slowly start exploring gauge theory in depth starting with Maxwell's equations. Then all those abstract equations will become much more intuitive.

Wednesday, August 13, 2014

Quantum vs. Classical Mechanics

The search for a distinguishing principle

This is the last post in this discussion before I resume my prior math series.

After boiling down the essentials of quantum and classical mechanics and extracting the common algebraic structure, the question becomes "what is quantum"?

The standard answer from Dirac is that in quantum mechanics we add amplitudes, not probabilities. Even earlier, Schrödinger identified superposition. More modern takes on this, starting with Hardy, hold that pure states are linked by continuous transformations. 

A pure state is a state which cannot be decomposed into a sum of other states. Because state spaces are convex spaces, this means that pure states reside on the boundary of the state space. In classical physics pure states form a discrete set while in the quantum world pure states form a continuous surface. What does this mean? It means that a measurement in classical physics reveals an intrinsic property of the system, but in quantum mechanics even pure states can collapse from one into another. 
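For a concrete picture, here is a sketch using a single qubit, where states can be labeled by a Bloch vector \( r \) with \( |r| \le 1 \) and the purity is \( Tr(\rho^2) = (1 + |r|^2)/2 \). The qubit, the Bloch-vector parameterization, and the function names are my choices for illustration. Pure states sit on the surface \( |r| = 1 \), mixtures fall inside, and a continuous rotation carries one pure state into another without leaving the surface.

```python
import math


def purity(r):
    """Tr(rho^2) for the qubit state rho = (I + r . sigma) / 2."""
    r_squared = sum(x * x for x in r)
    if r_squared > 1 + 1e-12:
        raise ValueError("a Bloch vector has length at most 1")
    return (1 + r_squared) / 2


def mix(r1, r2, p):
    """Bloch vector of the mixture p * state1 + (1 - p) * state2."""
    return tuple(p * a + (1 - p) * b for a, b in zip(r1, r2))


def rotate_about_y(r, theta):
    """Continuous reversible transformation: rotate the Bloch vector about the y axis."""
    x, y, z = r
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    right = (1.0, 0.0, 0.0)
    print(purity(up))                        # pure state on the surface
    print(purity(mix(up, right, 0.5)))       # mixture lies strictly inside
    print(purity(rotate_about_y(up, 0.7)))   # rotated state is still pure
```

A classical bit, by contrast, has only two pure states (the two ends of a line segment) and nothing continuous connects them.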

But is this intuitive? Can we really claim that we understand the distinction between the classical and the quantum world? No, No, No.

Because quantum and classical physics are completely separated domains, first one cannot explain one in terms of the other, and second, there is no outside bird's eye view to introduce the concepts needed to explain them. 

For the first part, imagine a world of triangles trying to grasp the concept of a circle. This is basically what various quantum interpretations actually attempt to do: explain the weirdness of the quantum world in terms of classical concepts: a futile approach. Each interpretation has intuitive parts, but also craziness baked in. 

For the second part, to intuitively grasp the distinction between quantum and classical physics, you need to extract yourself from this quantum universe and explain both quantum and classical physics in terms of the laws of a meta-universe where both quantum and classical mechanics are valid. No such thing exists.

A comparison with the special theory of relativity is helpful here. To really understand Lorentz transformations, one first needs to free himself/herself from the concept of aether. One does not attempt to understand the constant speed of light using notions of unbounded speeds in a Galilean framework. Just consider how silly a theory of relativity "interpretation" would be along those lines:

Light appears to have a constant propagation speed because there is a "Lorentzian potential" which acts contextually in a particular reference frame measurement. However, in reality light does not have a constant propagation speed.

Now if this is silly, why is the Bohmian interpretation not silly? 

To really understand quantum mechanics weirdness we need to let go of our classical prejudices. Relativity gave up the concept of aether. Nature is quantum mechanical, no ifs, ands, or buts. Isn't it time for quantum mechanics to give up attempts to search for a natural distinguishing principle? It is a futile attempt.

Sunday, August 10, 2014

It is what can generate a bit (part 1)


It takes two to tango (part 2)

Sorry for the delay, I was on vacation and although I brought my laptop with me, I forgot the charger and I could not use it. I got internet access through my cell phone, but typing on a tiny screen is not suitable for generating a blog post.

So with a bit of delay, today I present part two: it takes two to tango.

Let me start by listing the axioms used in other approaches to derive quantum mechanics:

  1. individual systems are Jordan algebras, 
  2. composites are locally tomographic,
  3. at least one system has the structure of a qubit

Dakic and Brukner: "Quantum Theory and Beyond: Is Entanglement Special?":
  1. (Information capacity) an elementary system has the information carrying capacity of at most one bit. All systems of the same information carrying capacity are equivalent; 
  2. (Locality) the state of a composite system is completely determined by local measurements on its subsystems and their correlations
  3. (Reversibility) between any two pure states there exists a reversible transformation; 
  4. (Continuity) between any two pure states there exists a continuous reversible transformation.
Lluis Masanes and Markus Muller: "A derivation of quantum theory from physical requirements": 
  1. in systems that carry one bit of information, each state is characterized by a finite set of outcome probabilities; 
  2. the state of a composite system is characterized by the statistics of measurements on the individual components;
  3. all systems that effectively carry the same amount of information have equivalent state spaces; 
  4. any pure state of a system can be reversibly transformed into any other; 
  5. in systems that carry one bit of information, all mathematically well-defined measurements are allowed by the theory.

Chiribella, D’Ariano, and Perinotti: "Informational derivation of Quantum Theory":
  1. Causality: the probability of a measurement outcome at a certain time does not depend on the choice of measurements that will be performed later. 
  2. Perfect distinguishability: if a state is not completely mixed (i.e. if it cannot be obtained as a mixture from any other state), then there exists at least one state that can be perfectly distinguished from it, 
  3. Ideal compression: every source of information can be encoded in a suitable physical system in a lossless and maximally efficient fashion. Here lossless means that the information can be decoded without errors and maximally efficient means that every state of the encoding system represents a state in the information source, 
  4. Local distinguishability: if two states of a composite system are different, then we can distinguish between them from the statistics of local measurements on the component systems, 
  5. Pure conditioning: if a pure state of system AB undergoes an atomic measurement on system A, then each outcome of the measurement induces a pure state on system B. (Here atomic measurement means a measurement that cannot be obtained as a coarse-graining of another measurement).
Hardy: "Quantum Theory From Five Reasonable Axioms":
  1. Probabilities, 
  2. Simplicity (K is determined by a function of N and for each given N, K takes the minimum value consistent with the axioms), 
  3. Subspaces, 
  4. Composite systems rules ( \( N_{A \otimes B} = N_A N_B \) and \( K_{A \otimes B} = K_A K_B \) ), 
  5. Continuity (there exists a continuous reversible transformation on a system between any two pure states of that system)
Sure, there are other axiomatization approaches which do not use composition, but why is system composition appearing so often? The answer is in quantum correlations which by Bell's theorem cannot be causally explained. 

More importantly, for the spin 1/2 case, Bell produced an exact hidden variable model which reproduces all quantum mechanics predictions for one particle. This means that considering only single particles (systems) is not enough to distinguish between classical and quantum mechanics! Hence the need to consider COMPOSITE systems. 

What this shows is that composition considerations are extremely powerful because they constrain the algebraic properties. The best analogy is that of a fractal, with the key difference that invariance under tensor composition implies that the self-similarity pattern is IN PLACE. This is how system composition constrains the dynamics. 

Composition demands either:
  • Quantum mechanics (elliptic composability)
  • Classical mechanics (parabolic composability)
  • Split-complex quantum mechanics (hyperbolic composability)
"It is what can generate a bit" kills the hyperbolic case.

Then how can we separate the quantum from the classical case? Some quantum mechanics reconstruction proposals (starting with Hardy's) talk about "a continuous reversible transformation" between any two pure states. Next time I'll address this issue and its relationship with quantum mechanics interpretations.