Information gain in quantum theory Faghfoor Maghrebi, Mohammad 2008

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2008_fall_faghfoor_maghrebi_mohammad.pdf [ 346.16kB ]
JSON: 24-1.0066769.json
JSON-LD: 24-1.0066769-ld.json
RDF/XML (Pretty): 24-1.0066769-rdf.xml
RDF/JSON: 24-1.0066769-rdf.json
Turtle: 24-1.0066769-turtle.txt
N-Triples: 24-1.0066769-rdf-ntriples.txt
Original Record: 24-1.0066769-source.json
Full Text

Full Text

Information Gain in Quantum Theory

by Mohammad Faghfoor Maghrebi

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of Graduate Studies (Physics)

The University of British Columbia (Vancouver)
July, 2008
© Mohammad Faghfoor Maghrebi 2008

Abstract

In this thesis I address the fundamental question of how information gain is possible in the realm of quantum mechanics, where a single measurement alters the state of the system. I study in detail an ensemble of particles in some unknown (but product) state, suggest an optimal way of gaining the maximum information, and quantify the corresponding information exactly. We find a rather novel result which is quite different from other well-known definitions of information gain in quantum theory.

Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Probability
  2.1 Derivation of Born Probability
  2.2 Ensemble of Particles in a Product State
3 Information Gain
  3.1 Best Information Gain
  3.2 A Bound on the Information Gain
  3.3 Information Gain
4 Conclusion and Discussion
Bibliography
Appendices
A Shannon Information
B Law of the Large Numbers
C Central Limit Theorem
D Lyapunov Condition

List of Figures
3.1 The ε-distinguishable region for the Hilbert space of a) a single particle, b) multi-particles.
3.2 The information gain divided by N. Note that this function tends to infinity logarithmically at the extremes.
3.3 a) The information corresponding to p = p_n̂. b) The total information.

Acknowledgements

I would like to thank my supervisor, Dr. Gordon Semenoff, who first and warmly welcomed me as a prospective student. While I learnt a lot from his amazingly wide knowledge of the various topics on which I had the benefit of collaborating with him, I wish also to thank him for being patiently open to my ideas and for giving me much room to explore and accomplish this project. I would also like to thank Dr. Robert Raussendorf, with whom I had some very inspiring and valuable discussions. He helped me a lot to come to a better understanding of some parts of the project and provided me with some valuable references. I would like to thank Rogayeh, my wife, with whom I have shared the memories and ups and downs of the last few years, especially since we came to Canada. She helped me a lot in preparing my notes and made me look more intelligible. I have always been encouraged by her presence in all my explorations, and I loved the way we learnt things together. I would also like to thank Jennifer Godfrey, who helped me a lot in the final editing of this thesis.

Dedication

To my parents,

To my Mother, who has been with me where nobody else was.
To my Father, who has seemed extremely patient in every step we went forward while he was extremely concerned.

To their unbounded care and love for us.

Chapter 1
Introduction

Laws of nature should make no distinction between reality and information.
Anton Zeilinger

The emergence of the macroscopic world from quantum mechanics is not well understood. What the wavefunction represents in quantum mechanics is still a matter of ongoing debate. One of the most promising approaches in quantum mechanics is Quantum Information theory, which argues that the wavefunction is related to information in some way [1, 2]. The many-worlds interpretation of quantum mechanics also regards the wavefunction as the complete representation of many parallel worlds, and thus assigns some physical interpretation to the wavefunction [3, 4]. There is also the Bayesian approach [5], which argues that quantum mechanics should be viewed as a Bayesian system in which all statements are regarded as an agent's degree of belief and have nothing to do with a pre-existing reality. The mere existence of multiple approaches illustrates the bizarre situation. There have even been suggestions in the past that the wavefunction could be measured under certain circumstances [6], but it has also been argued that there is no ontological sense of the wavefunction beyond its epistemological meaning [7]. That is, the wavefunction is not real and thus cannot be measured or determined [8]. However, we are certainly measuring something in the lab. Measuring the quantum state is actually an entire field of study in itself. This usually means that there are many particles available in the same state. The quantum state can then be determined by various methods, such as tomography [9]. Therefore, knowing certain information (in this case, many particles in the same state) about the wavefunction, we can determine more (find the state).
As an interesting example in which we can actually determine whether some state is entangled or not, suppose we have two identical systems, say A and B [10]. The state is thus |ψ⟩ ⊗ |ψ⟩. Each system has two particles in some arbitrary state (entangled or not),

|ψ⟩ ∈ H_{1_s, 2_s},

where s = A, B and H_{1_s, 2_s} is the Hilbert space of the two particles. We can then determine whether these two particles are entangled by a single measurement. It might seem that this is possible because we have two copies. However, the fact that there are two copies is not relevant, because we can think of the two systems as a single system in which the total state (as a product state) |ψ⟩_{A+B} = |ψ⟩_A ⊗ |ψ⟩_B is an eigenstate of the exchange operator P_{A,B}, which exchanges the two subsystems. Knowing this, we can determine whether particles 1 and 2 are entangled. There is also a nice discussion that considers more carefully some of the assumptions of the original reference [11]. Another, and perhaps the most famous, example is Quantum Information theory, which assumes an ensemble E = {ρ_x, p_x}, where p_x is the probability that the state ρ_x is sent on a communication channel. The problem is then to find the maximum information obtainable by performing some measurement on the states received from the first party. The von Neumann entropy and the accessible information are central in this framework. In all these examples, we need some a priori knowledge to get more information from the system. If there is no a priori knowledge, it seems we cannot extract any information whatsoever about the system. However, as an example, there is a huge body of knowledge about distant galaxies and their extensive properties. How is this information gained? What is the a priori knowledge? And how much is the information gain? A preliminary answer can be the following. Let's think of a mechanism of continuous measurement which projects the state of the system onto a special basis [12].
Knowing this basis, we can find the state of the system (by a projective measurement in this basis). For example, suppose that we are living in some environment in which the particles are continuously measured in the position basis.¹ We can determine the state by measuring the position. However, this process (of continuous measurement) works only if there is a special dynamics. For example, the dynamics should localize the particles. In general, there might be no such dynamics. As an example, consider the polarization of a photon (or a beam of photons). There is no preferred direction in space along which the state must collapse; the rotational symmetry of the world might lead to a collapse of the photon state in any direction. So while this answer could be partially true, it certainly does not tell the whole story.

¹ We are assuming that the measurement frequency is slow enough that the wavefunction actually evolves (as opposed to the Zeno paradox case), but fast enough with respect to some macroscopic time scale [12].

A simple case of interest is when there are many particles almost in the same state. For example, radiation from a distant galaxy might contain subsystems, each consisting of many particles in an (almost) identical state. Can we justify that they are truly identical? The answer is affirmative. For instance, in his book, Asher Peres suggests dividing the particles into many subgroups of large numbers and repeating a series of measurements on these subgroups [13]. If the ensemble is truly identical, we must obtain consistent results. Note that it is assumed that the overall state is a product state of the particles. This partially answers the problem we posed; however, it is not quite clear what the answer is if not all the particles are in the same state, or why one may assume that the overall state is a product state.
Also, the amount of information acquired is not quantified, and it is not obvious whether this is the most efficient way to maximize the information gain. In this thesis, we generalize this to the case where the particles might be in non-identical states, and find the maximum information gain in an efficient way which, to high precision, does not disturb the system. We then find the amount of the information gain which, to some surprise, does not follow any of the usual definitions of information gain in quantum mechanics. In order to reach the answer, we must re-examine some ideas about the emergence of probability statements in quantum mechanics. The definition and the derivation of probabilities in quantum mechanics are also unclear. There is a question as to whether we must derive the probabilities from first principles, or accept them as part of the principles of quantum mechanics. There have been various attempts to derive the probability laws [3, 14-16]. Part of the problem lies in the definition of probability itself. Is probability interpreted as a frequency law [14, 15] (how often some state would result), or is it merely a decision-making rationale [16]? We will argue that at least some notion of probability is actually derived from first principles in quantum mechanics. The result is not novel, even though the derivation may be. Once we have the necessary terms to express the probability, we can generalize them to something which gives us the technology to find the information gain. This generalization deals with an ensemble of non-identical states, as opposed to the more conventional one in which all particles are assumed to be in the same state. The structure of this thesis is as follows. In section 2.1, we derive the Born rule of probability. We generalize the Born rule to non-identical states in section 2.2. In sections 3.1 and 3.2, we find the best information gain and derive its upper bound.
We then derive the information gain rigorously in section 3.3. Finally, in the last chapter, we make some concluding remarks on the information gain in quantum mechanics and the role of the measurement.

Chapter 2
Probability

2.1 Derivation of Born Probability

In this section we present a simple proof of the Born rule of probability. We demonstrate this for a two-level system; the arguments may be generalized immediately. Suppose we have N systems in an identical state, where N is assumed to be large. The quantum state is then

|ψ⟩ ⊗ |ψ⟩ ⊗ ··· ⊗ |ψ⟩ = |ψ⟩^⊗N.

Let's consider the following operator:

P̂ = Σ_{∀ψ′ ∈ Hilbert space} f(ψ′) |ψ′⟩^N ⟨ψ′|^N.

In the limit of N → ∞, |ψ⟩^N is an eigenstate of this operator. The reason is that if |ψ′⟩ ≠ |ψ⟩, then |⟨ψ′|ψ⟩| < 1 and |⟨ψ′|ψ⟩|^N → 0 as N → ∞. Assuming |ψ⟩ = α|↑⟩ + β|↓⟩, we find

|ψ⟩^N ⟨ψ|^N = (α|↑⟩ + β|↓⟩)^⊗N (α*⟨↑| + β*⟨↓|)^⊗N
            = Σ_{n,m, all orderings} α^n β^{N−n} α*^m β*^{N−m} |↑↑↓…⟩⟨↑↓↑…|.

We then project this operator onto a subspace in which n particles point upward and the rest, N − n, point downward. Let's call this projector P̂_{n,N−n}. Then

Tr(P̂_{n,N−n} |ψ⟩^N ⟨ψ|^N) = Σ_{all orderings of n ↑'s and N−n ↓'s} (|α|²)^n (|β|²)^{N−n} Tr(|↑↓↑…⟩⟨↑↓↑…|)
                          = C(N,n) (|α|²)^n (|β|²)^{N−n},

where C(N,n) denotes the binomial coefficient. This is a very localized function of n. The maximum of this function occurs at n*/N = |α|² and (N − n*)/N = |β|² for N → ∞. The standard deviation in n is δn ~ √N. Defining

P̂_p^ε = Σ_{(p−ε)N ≤ n ≤ (p+ε)N} P̂_{n,N−n},

we find that for sufficiently large N, the state |ψ⟩^N ⟨ψ|^N lies completely in the subspace given by the projection operator P̂^ε_{|α|²}, where ε can be taken as small as desired for N → ∞; that is,

lim_{N→∞, ε→0} Tr(P̂_p^ε |ψ⟩^N ⟨ψ|^N) = { 0, if p ≠ |α|²;  1, if p = |α|². }

Now suppose that we perform some measurement on the individual spin-1/2 particles. We can measure the spin along the z axis of each particle. The measurement yields |↑⟩|↑⟩|↓⟩…, for example.
The last equation would then imply that

lim_{N→∞, ε→0} (⟨↑|⟨↑|⟨↓| …) |ψ⟩^N = 0, unless the number of ↑'s in the outcome lies in N(|α|² ± ε).

Therefore, some version of the Born rule is derived here. Our only assumption was that the measurement outcome of some wavefunction will never be a state which has no overlap with the wavefunction. To interpret these results as probability, we must make some justifications concerning the standard deviation of the results. Let's review the Central Limit Theorem.² Assume X₁, X₂, X₃, … are a set of N independent and identically distributed random variables having mean value µ and variance σ². The Central Limit Theorem then implies that as the sample size N increases, the distribution of the sample average approaches the normal distribution with mean µ and variance σ²/N. In our problem,

E(X) = p·0 + (1 − p)·1 = 1 − p,
E(X²) = p·0² + (1 − p)·1² = 1 − p,

so σ² = E(X²) − E(X)² = p(1 − p) = |α|²|β|².

² See Appendix C.

Let's see if we would get the same result through our earlier arguments. Defining

X_n = C(N,n) (|α|²)^n (|β|²)^{N−n},

we find

X_n ≈ (1 / (√π |α||β| √N)) exp(−N(p − p*)² / (2|α|²|β|²)),

where p* = |α|². So the standard deviation is exactly what it should be. We can also find the error probability in the case of finite N. We must sum up all X_n with |n − n*| ≥ εN:

P_error = Σ_{|n−n*| ≥ εN} X_n ≈ ∫ dn X_n = (2/√π) ∫_{√N ε/(|α||β|)}^∞ dñ exp(−ñ²/2),

so

P_error < (2/√π) (|α||β| / (√N ε)) exp(−N ε² / (2|α|²|β|²)),   (2.1)

where we have used ∫_{x₀}^∞ dx exp(−x²/2) = ∫_{x₀²/2}^∞ dy (1/√(2y)) exp(−y) < (1/x₀) exp(−x₀²/2). In order to get a good approximation, we must have³

ε ≫ |α||β| / √N.

Now let's partition the N-particle unity as follows:

1 = Σ_p P̂_p^ε,

where we are summing over all p in {0, ε, …, iε, (i+1)ε, …, 1}. Consider a projective measurement defined by {P̂_p^ε}.
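The concentration of the weights X_n and the tail bound (2.1) are easy to check numerically. The following is a small sketch; the values of N, |α|² and ε are arbitrary demonstration choices, and the binomial weights are evaluated in log space (via lgamma) to avoid overflow:

```python
# Numerical check of the Born-rule concentration: the weights
# X_n = C(N, n) (|alpha|^2)^n (|beta|^2)^(N - n) pile up in the window
# |n - n*| < eps * N around n* = N |alpha|^2, and the leftover tail mass
# respects the analytic bound (2.1). N, p_star and eps are demo choices.
from math import lgamma, log, exp, sqrt, pi

N = 10_000
p_star = 0.3                 # |alpha|^2 ; |beta|^2 = 1 - p_star
eps = 0.02                   # resolution, chosen >> |alpha||beta| / sqrt(N)

def log_X(n):
    # log of the binomial weight X_n, computed stably
    return (lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)
            + n * log(p_star) + (N - n) * log(1 - p_star))

X = [exp(log_X(n)) for n in range(N + 1)]
n_star = round(N * p_star)
window = sum(X[n] for n in range(N + 1) if abs(n - n_star) < eps * N)
p_error = 1.0 - window

# Tail bound (2.1): P_error < (2/sqrt(pi)) (|a||b| / (sqrt(N) eps))
#                             * exp(-N eps^2 / (2 |a|^2 |b|^2))
ab = sqrt(p_star * (1 - p_star))
bound = 2 / sqrt(pi) * ab / (sqrt(N) * eps) * exp(-N * eps**2 / (2 * ab**2))

print(f"mass inside window : {window:.6f}")
print(f"tail probability   : {p_error:.2e}  (bound: {bound:.2e})")
```

With these numbers the window is about 4.4 standard deviations wide, so essentially all of the weight falls inside it, consistently with (2.1).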
If the result of the measurement turns out to be p, we find

| |α|² − p | ≤ ε,

where the error probability is as given before, and ε can be taken as small as desired for N → ∞.

³ Note that this relation doesn't mean that ε could be zero when α = 0, since the best we can do is find |α| and |β| within some approximation. There is always some uncertainty ε. In the extreme case, we find |α|² ≤ ε. Plugging back into the last equation, this gives ε ≫ 1/N.

The whole procedure is equally applicable to mixed states; the proof is exactly the same. We can find the overlap of ρ^⊗N with P^z_{n,N−n}, which projects n particles onto |↑⟩ and the rest onto |↓⟩:

Tr(ρ^⊗N P^z_{n,N−n}) = Tr(ρ_z^⊗N P^z_{n,N−n}).

Therefore we can practically substitute ρ by the reduced density operator ρ_z, which is defined as

ρ_z = ( p₁ 0 ; 0 p₂ ).

So everything also applies to a mixed state if we substitute p₁ for |α|² and p₂ for |β|². Then the probability is found by

Prob(i) = Tr(ρ P̂_i),

where i is a certain outcome of an experiment and P̂_i is the corresponding operator. There have been some proofs of the Born rule in the frequency interpretation [14, 15] in the literature. However, they have been objected to by some physicists. Wallace [17], for example, argues that infinity never occurs in real life or in any finite-event scenario. He points out that in any statistically finite-numbered experiment we cannot neglect the tail of the distribution merely because it is very small. He also argues that the most natural framework in which to derive the probability is the many-worlds interpretation of quantum mechanics, and that some improvements of what has already been developed by Deutsch [16] will do that.

2.2 Ensemble of Particles in a Product State

The statistics of the outcomes of a quantum measurement obey a probabilistic pattern, so we can use the full strength of probability theory. We wish to generalize the previous results to an ensemble of particles in an arbitrary (but product) state.
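Before setting up the formalism, the phenomenon this section is after can be previewed with a short simulation (a sketch; N, the distribution of the p_i, the number of trials and the resolution are arbitrary demo choices): even when every particle has its own outcome probability p_i, the ensemble mean still concentrates, with an error probability controlled by a Chebyshev-type bound of the form σ²/(Nε²).

```python
# Sketch of the non-identical ensemble: particle i gives outcome 1 with its
# own probability p_i, yet the sample mean concentrates around
# p_bar = (1/N) sum_i p_i, with Var = (sum_i p_i (1 - p_i)) / N^2.
# All numbers are arbitrary demo choices.
import random
random.seed(1)

N = 2000
p = [random.uniform(0.1, 0.9) for _ in range(N)]    # one p_i per particle
p_bar = sum(p) / N
var_mean = sum(q * (1 - q) for q in p) / N**2       # variance of the sample mean

eps = 0.03            # resolution, chosen >> sqrt(var_mean)
trials = 1000
hits = sum(
    abs(sum(random.random() < q for q in p) / N - p_bar) < eps
    for _ in range(trials)
)

chebyshev = var_mean / eps**2    # bound on the miss probability
print(f"p_bar = {p_bar:.4f}")
print(f"empirical miss rate = {1 - hits / trials:.4f} (bound: {chebyshev:.4f})")
```

The simulation only previews the statistics; the text below makes the statement precise via Lyapunov's version of the Central Limit Theorem.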
After all, we are interested in knowing how, and by how much, we can gain information from the state of a macroscopic system where all the particles may not be in the same state. Lyapunov's Central Limit Theorem will prove to be useful. Let X_n be a sequence of independent random variables. Suppose that the third central moments

r_n³ := E[ |X_n − µ_n|³ ]

are finite and satisfy the Lyapunov condition⁴

lim_{N→∞} (Σ_{n=1}^N r_n³)^{1/3} / (Σ_{n=1}^N σ_n²)^{1/2} = 0.

Let the random variable S_N := X₁ + ··· + X_N denote the sum of the random variables X_n. For "large" N, S_N is normally distributed⁵ with the expected value

E[S_N] ≈ Σ_{n=1}^N E[X_n]

and the variance

Var[S_N] ≈ Σ_{n=1}^N Var[X_n].

⁴ See Appendix D.
⁵ See Appendix C.

Let's define the total variance as Σ_{n=1}^N Var[X_n] = Nσ², where σ² is the average of the variances. In our problem, however, there are only two possible outcomes, which can be chosen as 0, 1 (standing for |↑⟩ and |↓⟩, for example). Therefore we have the following identity (since |X_n − µ_n| ≤ 1):

r_n³ := E[ |X_n − µ_n|³ ] ≤ E[ |X_n − µ_n|² ] =: σ_n²,

or Σ_{n=1}^N r_n³ ≤ Nσ². The ratio of the third moment to the second one is then

lim_{N→∞} (Σ_{n=1}^N r_n³)^{1/3} / (Σ_{n=1}^N σ_n²)^{1/2} ≤ 1/(Nσ²)^{1/6}.

The Lyapunov condition is then satisfied for σ ≫ 1/N. In other words, this means that we cannot find the expected value (i.e. the average of the random variables) with a resolution beyond 1/N. The requirements of the theorem are satisfied in our problem. The total variance is then the sum of all the variances,

σ²_TOT = Σ_{n=1}^N Var[X_n] = Nσ²,

and the probability of error is given by⁶

P_error ≤ σ² / (N ε²),

where ε is the resolution. As a result, ε must be chosen such that⁷

ε ≫ σ / √N.

In conclusion, we find that, up to a negligible error, the average of the density operators of the states can be found. We can also deduce the same result in the more familiar operator language. Consider the state |ψ⟩ = α|↑⟩ + β|↓⟩.
⁶ See Appendix B.
⁷ In the extreme case where the distribution is very close either to 1 or to 0, we have ε ≫ 1/N.

For a large ensemble of identical states, we have already shown that |ψ⟩^⊗N is almost an eigenstate of P̂_p (we drop the superscript ε) with eigenvalue 1 if p ≈ |α|², and 0 otherwise. The N-particle state can be grouped as

|Ψ⟩ = |ψ₁⟩^⊗n₁ ⊗ |ψ₂⟩^⊗n₂ ⊗ ··· ⊗ |ψ_M⟩^⊗n_M,

where n₁ particles are almost in the same state |ψ₁⟩, n₂ particles are almost in the same state |ψ₂⟩, and so forth. It is then true that P̂_{p_i}^{(n_i)} |ψ_i⟩^⊗n_i = |ψ_i⟩^⊗n_i if p_i ≈ |α_i|², and it is 0 otherwise. The superscript n_i represents the number of particles in the same state. Now consider the operator P̂_p^{(N)}, which acts on the N-particle space. We can expand this operator as

P̂_p^{(N)} = Σ_{p′₁,p′₂,…,p′_M} P̂_{p′₁}^{(n₁)} ⊗ ··· ⊗ P̂_{p′_M}^{(n_M)},

where the summation is constrained by (1/N)(n₁p′₁ + n₂p′₂ + ··· + n_M p′_M) = p. Applying this operator to |Ψ⟩, only those terms in which p′₁ = p₁, …, p′_M = p_M will contribute. In conclusion, |Ψ⟩ is almost an eigenstate of the operator P̂_p^{(N)} with eigenvalue 1 if p ≈ (1/N)(|α₁|²n₁ + |α₂|²n₂ + ··· + |α_M|²n_M), and 0 otherwise. We can also write the last condition in a more suggestive way:

p ≈ (1/N) Σ_{i=1..N} |α_i|².   (2.2)

So for a large number of particles, we can find the mean probability distribution to high precision. These results are readily generalized to a product of mixed states, and they become exact for N → ∞. That is, we find

lim_{N→∞, ε→0} Tr(P̂_p^{(N),ε} ρ₁ ⊗ ρ₂ ⊗ ρ₃ ⊗ …) = { 0, if p ≠ p̄;  1, if p = p̄, }

where p̄ is defined as

p̄ = (1/N) Σ_i ⟨↑|ρ_i|↑⟩.

The interesting feature here is that the mean probability p̄ can be determined by a collective measurement that doesn't alter the state (because it is an eigenstate of the measuring operators). So we can perform more measurements (a spin 1/2, for example, can also be measured along the x and y axes in addition to the previously measured spin z).
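The three-axis idea can be sketched with a small Monte Carlo (it uses numpy; the ensemble size and the state distribution are arbitrary choices). For simplicity, the sketch uses single-particle projective measurements on three identically prepared copies of the ensemble, one copy per axis, rather than the non-disturbing collective measurement described above; statistically, the recovered averages are the same.

```python
# Sketch: estimating the average Bloch vector of a *product-state* ensemble
# from measurement frequencies along z, x and y. One fresh copy of the
# ensemble is assumed per axis. N and the state distribution are demo choices.
import numpy as np
rng = np.random.default_rng(0)

N = 200_000
# Random product state: particle i is a pure state with Bloch vector n_i,
# drawn uniformly on the sphere.
theta = np.arccos(rng.uniform(-1, 1, N))
phi = rng.uniform(0, 2 * np.pi, N)
bloch = np.stack([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])                 # shape (3, N)

# Measuring spin along axis a on particle i gives +1 with prob (1 + a.n_i)/2.
est = []
for axis in range(3):
    p_up = (1 + bloch[axis]) / 2
    outcomes = rng.random(N) < p_up               # one shot per particle
    est.append(2 * outcomes.mean() - 1)           # estimated mean component

true_mean = bloch.mean(axis=1)
print("true  average Bloch vector:", np.round(true_mean, 3))
print("estimated from frequencies:", np.round(est, 3))
# The average density operator is rho_bar = (1 + r.sigma)/2 with r the
# average Bloch vector; the estimate converges at the 1/sqrt(N) rate.
```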
In this way we can find the average of the density operators of the individual particles⁸:

ρ̄ = (1/N) Σ_{i=1..N} ρ_i.   (2.3)

⁸ This is also in agreement with [18], which argues that in order to perform the optimal measurement, we should consider the whole ensemble as a single system rather than a sum of its components.

Chapter 3
Information Gain

3.1 Best Information Gain

So far we have proved that, given an ensemble of particles in a product state, we can find the average of the density operators of the individual particles. Since we are interested in the best information gain, we should ask if there is any information available to the observer beyond this. Can we, for example, find the "n-th moment" (with a slight abuse of the word "moment") of the density operators of the particles, defined as

µ_n = (1/N) Σ_{i=1..N} ρ_i^n

(ρ_i is the density operator of the i-th particle)? We argue that finding anything beyond the first moment (i.e. the average of the density operators) would contradict the no-signalling theorem. The following thought experiment demonstrates this point. Suppose we have a large ensemble of particles which are entangled in pairs. The two particles in each pair are taken far apart. We then have two groups of particles, in which each particle from one group is entangled with another from the other group. We would like to perform some measurement on the second group of particles. Consider some generalized measurement, the POVM {M_m}, acting on one-particle states in the second group. The operators satisfy

Σ_m M_m† M_m = 1,

where 1 is the unity operator in the one-particle Hilbert space. Suppose that some pair is in the state |Ψ₁₂⟩, where the subindices 1, 2 refer to the two particles. For a large ensemble of particles, we can assume that there are very many pairs almost in the same state.⁹

⁹ If this is not the case for a subset of particles, we can safely neglect them.

Then as we perform
the measurement, a fraction ‖1 ⊗ M_m |Ψ₁₂⟩‖² of the pairs in this state will collapse to

(1 ⊗ M_m |Ψ₁₂⟩) / ‖1 ⊗ M_m |Ψ₁₂⟩‖.

As we studied in detail, the average of the density operators can be measured. So after the measurement has been performed on the second group, the local observer at the first group decides to measure the average density operator and finds

(1/N) Σ_states n Σ_m ‖1 ⊗ M_m |Ψ₁₂⟩‖² Tr₂[ (1 ⊗ M_m |Ψ₁₂⟩⟨Ψ₁₂| 1 ⊗ M_m†) / ‖1 ⊗ M_m |Ψ₁₂⟩‖² ]
  = (1/N) Σ_states n Tr₂[ |Ψ₁₂⟩⟨Ψ₁₂| (1 ⊗ Σ_m M_m† M_m) ]
  = (1/N) Σ_states n ρ,

where ρ = Tr₂(|Ψ₁₂⟩⟨Ψ₁₂|) and n is the number of particles which were initially almost in the state |Ψ₁₂⟩. Note that we are also summing over all the states. However, this is also the average of the density operators right before any measurement was performed. So it is independent of which specific POVM has been chosen; any other POVM would result in the same average density operator. There is no way to determine which POVM was used given only the average density operator in the first group; that is, there is no way for signalling at a distance to occur. On the contrary, any knowledge beyond the average of the density operators would convey some information about which POVM had been chosen, and would then lead to superluminal communication. As an example, we can find the second moment of the reduced density operators of the first group (after the measurement has been done on the second group). It is

(1/N) Σ_{m, states} n ‖1 ⊗ M_m |Ψ₁₂⟩‖² ρ_m²,  with  ρ_m = Tr₂(1 ⊗ M_m |Ψ₁₂⟩⟨Ψ₁₂| 1 ⊗ M_m†) / ‖1 ⊗ M_m |Ψ₁₂⟩‖²,

which vividly depends on the choice of the POVM. This can be proved in general as follows. Assume that initially all the particles are in the same state, |Ψ₁₂⟩ = |Ψ⟩. After performing the measurement, a fraction ‖1 ⊗ M_m |Ψ⟩‖² of the pairs in this state will collapse to (1 ⊗ M_m |Ψ⟩) / ‖1 ⊗ M_m |Ψ⟩‖. We can then trace over the second group of the particles.
The reduced state is then

Tr₂(|Ψ⟩⟨Ψ| 1 ⊗ M_m† M_m) / Tr₁,₂(|Ψ⟩⟨Ψ| 1 ⊗ M_m† M_m).

Now we look for a function of the reduced density operators which is indifferent to the POVM chosen. Note that one special POVM is {1}, which does not do anything. The problem is then to find the most general function with the property

f(ρ₁, ρ₂, …, ρ_N) = f(ρ, ρ, …, ρ),

where ρ = Tr₂(|Ψ⟩⟨Ψ|) and the ρ_i's are the reduced density states as defined above, constrained by

(1/N) Σ ρ_i = ρ.   (3.1)

It follows that f(ρ₁, ρ₂, …, ρ_N) must be a function only of the average of its arguments:

f(ρ₁, ρ₂, …, ρ_N) = F( ρ = E(ρ_i) ).

That is, any such function carries information only about the average density operator. To give an honest proof, we must show that the {ρ_i} are constrained only by (3.1); that is, they are not constrained any further by the measurement, i.e. by the POVM {M_m}. In other words, we must show that for any {ρ_i} subject to (3.1), there is always some POVM which results in the same set of states. We choose an alternative route here, as follows. As we have said, a fraction ‖1 ⊗ M_m |Ψ⟩‖² of the pairs will collapse to

Tr₂(|Ψ⟩⟨Ψ| 1 ⊗ M_m† M_m) / Tr₁,₂(|Ψ⟩⟨Ψ| 1 ⊗ M_m† M_m).

Let's introduce the (non-normalized) operators ρ̃_m as

ρ̃_m = Tr₂(|Ψ⟩⟨Ψ| 1 ⊗ M_m† M_m).

Normalizing this operator, we find the reduced state, and its norm also gives us the fraction of the particles reduced to this state. The problem can then be reformulated as finding a function

f(ρ̃₁, ρ̃₂, …, ρ̃_m, …)

which does not depend on the POVM M_m's. Since the M_m's are only constrained by (3.1), we find¹⁰

∂f/∂(M_m)_ij − λ_k (M_m)*_ik = 0,

where the λ_k's are constants. It follows that

∂f/∂(M_m† M_m)_ij = const,

as we wished to show. Note that this relation would not hold if f were defined as the "n-th moment" of the density operator for any n > 1. The proof is then complete.
¹⁰ Here we exploit the advantage of using the generalized measurement POVM instead of a projective measurement. The M_m's can be chosen arbitrarily, subject only to (3.1).

Note that although we can only find the average density operator, the knowledge of the average density operator alone constrains the higher moments, because it is a positive trace-class operator. The extreme scenario is when the average density operator is a one-dimensional projection operator, ρ = ρ². In this case, the knowledge of the average density operator also determines all the higher moments.

3.2 A Bound on the Information Gain

Before discussing the information gain in full generality, it is interesting to explore whether there is any theoretical limit on the information gain from a system of N particles in a two-level system (where we also assume that the particles are in a product state). By general considerations, we can find a bound on the maximum information. This bound comes from the fact that non-orthogonal states cannot be discriminated perfectly [1]. The discrimination of two states is related to their overlap, i.e. how orthogonal they are. We can discriminate two non-orthogonal states up to some error comparable to their overlap. As long as there is only a tiny overlap, we can discriminate the states almost reliably. To get some insight, let's define two states to be "ε-distinguishable" if (the absolute value of) their inner product is less than ε. Let's consider a single spin-1/2 particle in some state |↑⟩. The fraction of the Hilbert space (in this case the Bloch sphere) that is ε-distinguishable from this state is small, of the order of ε: the tiny area near the south pole of the Bloch sphere. However, for a large number of particles, this could be dramatically larger. Figure 3.1 illustrates the point. Each sphere is a Bloch sphere.
The big dots on top of the spheres represent the standard state, which in the first case is |↑⟩ and in the second case is |↑↑↑…⟩. The dark area represents the portion of the Hilbert space which is ε-distinguishable from the standard state.

Figure 3.1: The ε-distinguishable region for the Hilbert space of a) a single particle, b) multi-particles.

We can very easily calculate the minimum number of states which can be distinguished from the rest of the Hilbert space reliably.¹¹ Take a standard state such as |↑↑…↑⟩ and another one of the form

|n̂⟩ |n̂⟩ ··· |n̂⟩,

where the subindex n̂ is the direction of the spin. In order to have a small error probability, the overlap should be small:

|⟨↑↑…↑ | n̂ n̂ … n̂⟩| = |⟨↑|n̂⟩|^N = cos^N(θ/2).

¹¹ A similar line of thought is also taken in [19], where coarse-graining is considered necessary in order to find some sort of reality in quantum mechanics.

For the inner product to be small, we must have Nθ² ≥ 1, or

θ ≥ 1/√N.

Any state such as

|n̂₁⟩ |n̂₂⟩ ··· |n̂_N⟩,

where all the n̂'s are confined to the small region θ < 1/√N, will not be distinguishable from the standard state, while any other state will be "almost" distinguishable. The corresponding volume (of the Hilbert space that cannot be distinguished from the standard state) is then (we have normalized the total volume to 1)

V ~ (θ²)^N ~ (1/N)^N.

So from this point of view, we find the bound¹²

I_max ≤ N log N.   (3.2)

Note that this bound was found merely by imposing the indistinguishability theorem. There is no reason to expect to gain this much information, and most of the time one will not. However, we will show that this bound may be saturated.

3.3 Information Gain

In this section, we finally go back to our initial motive and find the information gain. The Shannon information is not appropriate for our purpose, for the following reason.
¹² We will define the information rigorously in the next section.

The Shannon information (and its immediate quantum-mechanical generalization, the accessible information) requires the a priori probability p_x that some state ρ_x may occur. However, there are no such a priori probabilities in the problem we are interested in in this thesis. We will argue more extensively in the next chapter why this is the case. We will seek a definition which is applicable to any sort of a priori information about the system. We will state our definition in the physicist's language of measurement and state, but it is meant to be general. Consider a system which is going to be measured. Prior to the measurement, the system could be in any of N_total (likely) states (the notion of a state here doesn't necessarily refer to the quantum state). After the measurement has been performed, we find that the system was initially most likely in some subspace (of the total space) that contains only N_measured states. Note that "likely" means the error due to neglecting the rest of the states is small. The information is then defined as

I_gain = log(N_total) − log(N_measured).   (3.3)

This formula can be interpreted in the following way. When there are N different possible outcomes and we don't know which one is true, the uncertainty (entropy) is log(N). After the measurement, the number of possibilities is reduced. The information gain is defined as the decrease of the uncertainty (entropy) from before to after the measurement. This definition of the information gain is additive in the following sense: the information in two independent (and non-entangled) systems is the sum of the information in each system,

I_total = I₁ + I₂.

The crucial feature of this notion of information is that it is not limited to the a priori knowledge that is usually assumed in applications of the Shannon information, i.e.
we have not assumed that the states are drawn from an ensemble E = {ρ_x, p_x}. It is easy to see that in the case of a (classical) ensemble E = {x, p_x} we recover the Shannon information; this is shown in Appendix A.

Note that the information gain as defined here seems to depend on the error that can be allowed, so we should also include the error probability in the definition of the information, I_gain(P_error). A sensible definition of the information gain had better not depend too sensitively on the error. We will come back to this point later.

Now we are in a position to find the information gain. Suppose we are given a large number of spin-1/2 particles in a product state. We measure each of them along the z axis, and suppose that n = Np of the spins end in the state |↑⟩ after the measurement. According to the (generalized) law of large numbers, we can thus estimate p̄ = Tr(ρ̄ |↑⟩⟨↑|), where ρ̄ is the average of the density operators of the individual particles (as defined earlier in equation (2.3)). To be more accurate, we find that

|p − p̄| < ε  with error probability  P_error < σ²/(N ε²).    (3.4)

In order to define the number of states we must discretize the Hilbert space. This procedure must preserve the natural symmetries of the Hilbert space. For a two-level system this is quite easy because there is a geometrical picture of the Hilbert space, the Bloch sphere. The natural measure on the Bloch sphere is d cos θ dφ. This can be rewritten as

d cos θ dφ ∼ d cos²(θ/2) dφ ∼ dp dφ,

where p is the probability of finding the particle in the state |↑⟩. So the appropriate measure is dp (measuring the spin along z reveals no information about φ). The number of states in some interval is then given by (1/δ)∫dp, where δ represents the size of a discretized cell. The total number of states of N particles is then

N_total = (1/δ)^N.

We also find N_measured as

N_measured = (1/δ)∫′dp₁ (1/δ)∫′dp₂ · · ·
(1/δ)∫′dp_N, where the prime indicates that the integral is constrained by |p̄ − p| < ε. Note that p̄ = (1/N) Σᵢ pᵢ, in which pᵢ is the probability of finding the i-th particle in the state |↑⟩, and p is defined by n↑ = pN (n↑ being the number of spins found in |↑⟩ after the measurement). So

N_measured = (1/δ)^N ∫_{|(1/N)Σᵢ pᵢ − p| < ε} dp₁ dp₂ ... dp_N.    (3.5)

Before trying to evaluate this integral, let us get a bit of insight into its geometrical meaning. The probability space is an N-dimensional (super)cube, 0 ≤ pᵢ ≤ 1. We want to find the volume of the portion of the cube which lies between the two parallel (super)planes

p₁ + p₂ + · · · + p_N = N(p ± ε).

Let us denote by V_n the volume bounded by the planes p₁ + · · · + p_N = n and p₁ + · · · + p_N = n + 1. The number of measured states is then

N_measured = (1/δ)^N Σ_{N(p−ε) ≤ n ≤ N(p+ε)} V_n.

Since the distance between two consecutive planes is 1/√N, the volume is approximately

V_n = (1/√N) A_n,    (3.6)

where A_n is the (N−1)-dimensional area of the cross-section (the intersection of the supercube with the superplane p₁ + · · · + p_N = n). The area is given by a well-known series [20]:

A_n = (√N/(N−1)!) Σ_{k=0}^{[n]} (−1)^k C(N,k) (n − k)^{N−1},    (3.7)

where C(N,k) is the binomial coefficient. This can also be written as an integral,

A_{Np} = (2√N/π) ∫₀^∞ du (sin u / u)^N cos(2N(p − 1/2)u).    (3.8)

This is, however, a difficult integral to evaluate, even numerically, and to the best of our knowledge there is no general closed form in the literature. Yet we can consider some special cases. For small |p − 1/2| we find

A_{Np} = √(6/π) exp(−6N(p − 1/2)²).    (3.9)

The standard deviation around p = 1/2 is then of order 1/√N. But our error estimate (3.4) also requires ε ≳ 1/√N. So we have13

I_gain(p = 1/2) ≈ 0.

13 To be more precise, I_gain(p = 1/2) ∼ log N; however, this is negligible compared to the information gain for p ≠ 1/2, which turns out to be proportional to N.
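The series (3.7) and the Gaussian approximation (3.9) can be checked against each other numerically. The following Python sketch (illustrative, not part of the thesis; function names are our own) evaluates both. Exact integer arithmetic is used for the alternating sum, since its terms are enormous and cancel almost completely, which would destroy a floating-point evaluation.

```python
from math import comb, factorial, sqrt, pi, exp

def cross_section_area(N, n):
    """(N-1)-dimensional area of the slice {sum p_i = n} of the unit N-cube,
    equation (3.7). Exact integer arithmetic avoids catastrophic cancellation
    in the alternating sum; n must be an integer here."""
    s = sum((-1)**k * comb(N, k) * (n - k)**(N - 1) for k in range(n + 1))
    return sqrt(N) * s / factorial(N - 1)

def gaussian_approx(N, p):
    """Small-|p - 1/2| approximation of A_{Np}, equation (3.9)."""
    return sqrt(6 / pi) * exp(-6 * N * (p - 0.5)**2)

N = 40
for n in (20, 22, 24):                     # p = n/N = 0.5, 0.55, 0.6
    print(f"p={n/N}: series={cross_section_area(N, n):.4f}  "
          f"gaussian={gaussian_approx(N, n/N):.4f}")
```

At p = 1/2 the series and the Gaussian already agree to a few parts in a thousand for N = 40, and the agreement degrades away from the center, as expected from the range of validity of (3.9).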
We can also consider the case in which p ≈ 0, 1. That is, we find the volume V bounded between the origin and the surface p₁ + p₂ + · · · + p_N = Nε. This gives the maximum amount of information gain. It is easy to see that

V ≤ (Nε)^N / N!.

So the maximum information gain satisfies I = −log V ≥ −N log(εe), where e is the base of the natural logarithm. In fact, we can show that I = −N log(εe). Let us compare (the absolute values of) the first and second terms of the series (3.7):

t₁/t₀ = N(n − 1)^{N−1}/n^{N−1} = N(1 − 1/n)^{N−1} = N(1 − 1/(Nε))^{N−1} ≈ N exp(−1/ε).

We can choose ε ∼ 1/N. The second term can then be neglected (of course only in this case, i.e. for p = ε), and the same holds for the rest of the series14. So we have

I_max = −N log(εe).    (3.10)

For ε ∼ 1/N (which is legitimate when p ≈ 0 or 1), the information gain will be

I ≈ N log N,

up to corrections of order N. This is identical to the bound on the maximum information found in equation (3.2). Indeed, we see that the bound is actually saturated in this case.

14 We can do the same for the m-th term: t_m/t₀ < ((eN/m) e^{−1/ε})^m. Again, for ε ∼ N^{−1} each such ratio is smaller than eN e^{−1/ε}, so the sum of all the terms other than the first, divided by the first term, is smaller than N·eN·exp(−1/ε) → 0 as N → ∞. Geometrically this means that almost all the contribution to the volume comes from the tiny (super)cube whose side is of order ε.

So far we have examined the problem in some special cases, but were limited by the difficult integral in equation (3.8). In general, we can find the answer in the following way. Assume that we partition the probability space into M (= 1/δ) tiny cells. For N particles, there are M^N ways to distribute the particles into the different cells.
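This counting can be made concrete in a few lines of Python (an illustrative sketch, not from the thesis): the M^N assignments of N distinguishable particles to M cells decompose into multinomial coefficients over the occupation numbers (n₀, ..., n_{M−1}), which is the starting point of the maximum-entropy argument below.

```python
from math import factorial, prod

def compositions(total, parts):
    """All ways to write `total` as an ordered sum of `parts` non-negative
    integers, i.e. all occupation-number vectors (n_0, ..., n_{parts-1})."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

N, M = 8, 3
# Summing the multinomial coefficients N!/(n_0! ... n_{M-1}!) over all
# occupation vectors recovers the total count M^N of assignments.
total = sum(factorial(N) // prod(factorial(n) for n in c)
            for c in compositions(N, M))
print(total, M**N)
```

The point of the decomposition is that for large N the sum is dominated by a single occupation vector, which is what the maximum-entropy method picks out.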
To find the information gain, we must determine how many different ways there are to distribute the particles subject to

Σ_{i=0}^{M−1} nᵢ pᵢ ∈ (Np − Nε, Np + Nε),

where nᵢ is the number of particles in the i-th cell and pᵢ = i/M is, as already defined, the probability of finding these particles in the state |↑⟩. Let us reformulate the problem in the following terms: we would like to find the number of states constrained by

Σ_{i=0}^{M−1} nᵢ pᵢ = N p̄,    (3.11)

where p̄ ∈ (p − ε, p + ε); we will later sum over all these p̄'s. In the following we use the method of maximum entropy [21]. The number of different ways of choosing sets of n₁, n₂, ..., n_M particles from N particles is

N! / (n₁! n₂! ... n_M!).

The method of maximum entropy instructs us to maximize this expression subject to the constraint (3.11). It is easier to maximize its logarithm,

−N Σ_{i=0}^{M−1} xᵢ log xᵢ,

where xᵢ = nᵢ/N. We should then maximize

−Σᵢ xᵢ log xᵢ + A (Σᵢ xᵢpᵢ − p̄) + B′ (Σᵢ xᵢ − 1),

where A and B′ are Lagrange multipliers. Taking the derivative with respect to xᵢ and setting the result to zero, we find

xᵢ = e^{Apᵢ + B},

where B (= B′ − 1) and A are constants (we drop the prime in the following). Plugging this back into the previous relations, we find

−Σᵢ xᵢ log xᵢ = −Σᵢ xᵢ (Apᵢ + B) = −(A p̄ + B).

The information gain is then

log(M^N / 𝒩) = N log M + N(A p̄ + B),

where 𝒩 is the number of constrained states. So the problem reduces to finding the coefficients A and B. Applying the two constraints, we have

Σ_{i=0}^{M−1} e^{Apᵢ + B} = 1,   Σ_{i=0}^{M−1} pᵢ e^{Apᵢ + B} = p̄.

Note that pᵢ = iδ, where δ = 1/M. The first constraint is a geometric sum:

Σᵢ e^{Apᵢ + B} = e^B Σᵢ (e^{Aδ})^i = e^B (e^A − 1)/(e^{Aδ} − 1) = 1.

The left-hand side of the second constraint is just the derivative of the left-hand side of the first with respect to A (at fixed B). Then

e^B ∂/∂A [(e^A − 1)/(e^{Aδ} − 1)] = p̄.

Taking the ratio of the last two relations eliminates B and gives an equation for A alone:

∂/∂A log(e^A − 1) − ∂/∂A log(e^{Aδ} − 1) = p̄.

We are confronted by three possibilities: 1. A ≪ 1, 2. A ∼ 1, and 3. A ≫ 1. It turns out that only the second possibility is self-consistent. With Aδ ≪ 1, the last relation becomes

1/(1 − e^{−A}) = 1/A + p̄.    (3.12)

This relation holds for the whole range (−∞, +∞) of the parameter A. We can then solve for B in terms of A:

B = log δ + log(A/(e^A − 1)).

The information gain is then

I_gain := N K(p̄) = N (A p̄ + log(A/(e^A − 1))),    (3.13)

where A is to be solved from (3.12) in terms of p̄ and substituted here. We have also defined the function K = I/N. Note that in finding 𝒩 we did not sum over all p̄ in (p − ε, p + ε), because the number of possibilities grows exponentially in N as p̄ gets closer to 1/2; it is therefore safe to simply plug in p + ε when p < 1/2, or p − ε when p > 1/2. Taking p < 1/2, we find

I_gain = N K(p + ε) = N K(p) + Nε K′(p),

where we have Taylor expanded. The error probability is given by P_error := γ ∼ 1/(Nε²), so we should choose ε ≫ 1/√(Nγ) to enforce a small error. The second term in the last equation is then of order √(N/γ) and is negligible with respect to the first term (for very large N). So the information gain is

I_gain = N K(p).    (3.14)

It is encouraging that the information gain almost does not depend on the error, as long as ε is much smaller than 1 while larger than 1/√N (or 1/N in the extreme cases). We can then find the information gain purely in terms of p = n↑/N. The information per particle is plotted in Figure (3.2).

Let us examine this expression in a number of limits.

1. p close to 1 (1 − p ≪ 1), for which A → ∞. In this limit (3.12) gives 1 = p̄ + 1/A. The information gain is then

I_gain = −N log((1 − p)e),

where e is the base of the natural logarithm. Note that the uncertainty of p is ε, and the largest value one can consider for p is 1 − ε.
In this extreme case, this exactly reproduces the result of equation (3.10). We get a similar result when p is close to 0:

I_gain = −N log(pe).

2. The other extreme is when p is close to the middle of the distribution, i.e. |p − 1/2| ≪ 1. In this case we have

I = 6N(p − 1/2)².

At first sight there seems to be some discrepancy between this and the result from (3.9). However, the latter is only true when |p − 1/2| ∼ 1/√N, while the former holds when the usual corrections of order log N are negligible, which requires

|p − 1/2| ≫ √(log N / N),

so they are consistent.

Figure 3.2: The information gain divided by N. Note that this function tends to infinity logarithmically at the extremes.

To summarize, we have derived the information gain in terms of the mean probability obtained by measuring the spins of the particles along the z axis. The best information we can get from an ensemble of particles in a product state is

I = sup_{n̂} I_gain(p_{n̂}).    (3.15)

So the maximum information is gained for the axis maximizing |p_{n̂} − 1/2|. Suppose that the average density operator is diagonalized as

ρ̄ = diag(p*, 1 − p*);    (3.16)

then the maximum of |p_{n̂} − 1/2| is obviously |p* − 1/2|, and the information content is

I = N K(p*).    (3.17)

Note that the maximum information is always available to us: although in the above argument we assumed that we measure the individual spins along a specific axis, we can do better, since the average density operator ρ̄ can be determined by a collective measurement, as we showed in section 2.2.

It might seem that the maximum information should be larger than (3.17). The argument can be illustrated by Figure (3.3). Knowing "p = p*" geometrically means that the average density operator lies somewhere in the dark region of Figure (3.3a), while knowing (3.16) (i.e. |ρ̄_estimate − ρ̄_actual| < ε) corresponds to the dark spot in Figure (3.3b).
However, we now show that the excess information is negligible. The proof again lies in the fact that we need only concern ourselves with the region closer to the center of the Bloch sphere; that is, the number of states decreases as we move away from the center. Suppose that the number of states in the small region of size ε² in Figure (3.3b) is V(p). Then the number of states in Figure (3.3a) is certainly less than V(p)·(1/ε²). So the difference in information is

log(1/V(p)) − N K(p) < log(1/ε²).

Since ε ∼ 1/√N at worst, the difference turns out to be only logarithmic in N. So perhaps the most important result of this work can be stated as

I_gain = N K(p*),  where  p* = 1/2 + (1/2)√(2 Tr(ρ̄²) − 1)    (3.18)

for a two-level system. Note that the range of p is (1/N, 1 − 1/N), because p can only be determined with some resolution, which cannot be smaller than 1/N.

This equation is not very similar to the well-known definitions of information gain. For example, the information gain per particle (for a two-level system) can be I(ρ̄)/N ≫ 1; in the extreme case, I_max/N = log N. However, the fact that the information gain per particle is much more than 1 does not contradict quantum information theory in any way. In the context of information theory, information is mostly concerned with coding, specifically with the number of different letters we can use to encode some message. Here we can easily calculate the number of states that can be discriminated: we can determine the average density operator with resolution |ρ̄_estimate − ρ̄_actual| ≲ 1/√N, so we can discriminate roughly

(1/(1/√N))^α = N^{α/2}

different states (with α ∼ 1 counting the relevant parameters). The logarithm of this number is of order log N, which is (much) less than N bits. Therefore there is no contradiction.
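The central result is easy to evaluate numerically. The Python sketch below (illustrative, not from the thesis) solves (3.12) for A by bisection and evaluates the information per particle; it assumes the closed form K(p̄) = A p̄ + log(A/(e^A − 1)), which follows from substituting B = log δ + log(A/(e^A − 1)) back into N log M + N(A p̄ + B), and it can be checked against the two limits discussed above.

```python
from math import exp, expm1, log, log1p

def solve_A(pbar, lo=1e-9, hi=1e4, iters=100):
    """Bisect 1/(1 - e^{-A}) - 1/A = pbar for A > 0 (i.e. pbar > 1/2).
    The left-hand side increases monotonically from 1/2 to 1."""
    g = lambda A: 1.0 / (-expm1(-A)) - 1.0 / A   # expm1 avoids cancellation
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < pbar else (lo, mid)
    return 0.5 * (lo + hi)

def K(p):
    """Information gain per particle: K(p) = A p + log(A / (e^A - 1)),
    with A solved from (3.12). K is symmetric about p = 1/2."""
    if abs(p - 0.5) < 1e-12:
        return 0.0
    pbar = max(p, 1.0 - p)                       # use K(p) = K(1 - p)
    A = solve_A(pbar)
    # stable form: log(A / (e^A - 1)) = log A - A - log(1 - e^{-A})
    return A * pbar + log(A) - A - log1p(-exp(-A))

for p in (0.51, 0.7, 0.99):
    print(f"K({p}) = {K(p):.6f}")
print("asymptote -log((1-p)e) at p = 0.99:", -log(0.01) - 1)
```

Near p = 1 the numerical K(p) reproduces −log((1 − p)e), and near p = 1/2 it reproduces the quadratic behavior 6(p − 1/2)², confirming the two limits.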
In the next chapter, we summarize the results and investigate the implications.  28  Chapter 4 Conclusion and Discussion The information gain that we found here is different from Shannon information or its close cousin von Neumann information (entropy). In this chapter we argue more extensively why we formulated a different notion of the information gain and didn’t apply the usual definitions of the information from Quantum Information theory(QI). First of all, the problem that we studied in this letter involved infinitely many states (all states of a Bloch sphere, for example). In the context of Quantum Information theory, however, we usually consider finite number of states (an ensemble of states {ρx , px } where x is usually assumed to take finite different values). In Appendix A, we argue that the Shannon Information would not be straightforward for infinite number of states. In the following, we will give a brief summary that how the accessible information is defined in the context of the Quantum Information theory[22] and consider the preassumptions more carefully. Assume that we have an ensemble of states E = {ρx , px } where px is the a priori probability that the state ρx is chosen or sent on a communication channel. The receiver can then collect some information by performing a generalized measurement, the POVM {My }. If some state ρx is sent, he will find the outcome y with the following conditional probability p(y|x) = Tr(My ρx ). The conditional probabilities determine the amount of information that he can gain on the average, the mutual information I(X : Y ) of the preparation and the measurement outcome. The maximum information gain is then Acc(E) = max I(X : Y ). {My }  It might seem that we studied a similar problem except that we considered any (possibly mixed) state in the Hilbert space, i.e. all ρx ∈ Hone−particle and the a priori probabilities were assumed to be all equal. Discretizing the 29  Chapter 4. 
Conclusion and Discussion Hilbert space, we then have E = {ρx , px = 1/N } where N is the number of the states in the discretized Hilbert space. This line of thought has been pursued, for example, by Asher Peres. In his book [13], he gives the following example “Suppose that the only information prior to a test of σz is that the initial state was pure. It satisfies σ.n|ψ = |ψ with equal probabilities for all directions of the unit vector n...” Peres discretizes the Hilbert space to N small intervals and assumes the a priori probabilities to be p = 1/N . He defines the posteriori information by Bayes’s theorem in accordance with the result of the measurement. The information gain he finds is then Iave = 0.19. In our viewpoint, this approach is not appropriate. There is a major difference between the Quantum-Information theoretic approach and ours. In the former, the probabilities px ’s are the actual probabilities. That is, a fraction px of the states (that are drawn from the ensemble) are really in the state ρx . This allows us to consider only a subspace (the typical subspace) of the whole Hilbert space while discarding the rest of it. Specifically this means that for a large sample, the average of the density operators of the states ( ρi divided by the total number of them) should be px ρx . The problem that we studied in this paper is totally different. Here we assumed that all states are equally probable because we did not know any better. px = 1/N doesn’t mean that the states are truly distributed with such probability or the average density operator is d1 1; a total section was actually devoted to find the average density operator. In QI, one assumes the probabilities a priori and then tries to find the information conditioned to these probabilities. In our problem we don’t know the a priori probabilities and, as we showed, we can only find the average of the density matrix while it’s already taken for granted in the context of QI. 
As we have already pointed out there is no way to get any information about the quantum state unless we’ve got some a priori knowledge of the system. In this letter we assumed that the state of the system is a product state of pure states. It would be very interesting to ask the same question given different a priori knowledge. It’s very curious to observe that the information is gained only if we know something in advance. As discussed in the first chapter, we might detect a new star far away and gain some information about it. As we outlined in 30  Chapter 4. Conclusion and Discussion the first chapter, one explanation can be the continuous measurement. That is, we may assume that the star’s position is continuously measured by its environment. What the measurement does is simply to project down the system onto some proper subspace of the Hilbert space. In this example, this is the position basis. The question is then whether we always need some sort of measurement to provide us with some a priori knowledge. Can the unitary evolution (as opposed to the measurement) do this task? We argue that it cannot. Suppose we could evolve any state (of the Hilbert space) into a strict subspace of the Hilbert space. Defining an invariant measure of the Hilbert space, the volume of this subspace is V = Ω D[ψ]. Since the measure of the volume is the same under unitary evolution, the volume must remain the same. So V = V , where V is the total volume of the Hilbert space. That is we can’t project the whole Hilbert space onto any strict subspace of it. This is rather trivial and, in fact, is similar to the argument that the volume of the phase-space remains the same in classical mechanics (that is dpdqf (p, q) = const). So it seems that (non-unitary) measurement-like processes are essential in order to gain any sort of information in quantum mechanics. 
In this thesis we showed that, even for a system of many particles in some unknown (but product) state, we can gain much information, and we hope this partially answers the question of how information gain is possible from an unknown environment.

Bibliography

[1] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000.

[2] C. Brukner and A. Zeilinger, Phys. Rev. Lett. 83, 3354 (1999).

[3] H. Everett, Rev. Mod. Phys. 29, 454-462 (1957).

[4] B. S. M. De Witt, Quantum mechanics and reality, Physics Today 23, No. 9, pp. 30-35 (1970).

[5] C. M. Caves, C. A. Fuchs, and R. Schack, Subjective probability and quantum certainty (2006), arXiv:quant-ph/0608190.

[6] Y. Aharonov, J. Anandan, and L. Vaidman, Phys. Rev. A 47, 4616 (1993).

[7] W. G. Unruh, Reality and measurement of the wave function, Phys. Rev. A 50, 882 (1994).

[8] O. Alter and Y. Yamamoto, Quantum Measurement of a Single System, Wiley, New York, 2001.

[9] U. Leonhardt, Measuring the Quantum State of Light, Cambridge University Press, Cambridge, England, 1997.

[10] S. P. Walborn et al., Nature 440, 1022 (2006).

[11] S. J. van Enk, e-print arXiv:quant-ph/0606017.

[12] T. Bhattacharya, S. Habib, and K. Jacobs, Phys. Rev. Lett. 85, 4852 (2000).

[13] A. Peres, Quantum Theory: Concepts and Methods, Kluwer Academic, Dordrecht, 1993.

[14] J. Hartle, Quantum mechanics of individual systems, Am. J. Phys. 36, 704-712 (1968).

[15] E. Farhi, J. Goldstone, and S. Gutmann, How probability arises in quantum mechanics, Ann. Phys. 192, 368-382 (1989).

[16] D. Deutsch, Proc. R. Soc. London A 455, 3129 (1999).

[17] D. Wallace, Quantum probability from subjective likelihood: Improving on Deutsch's proof of the probability rule, Stud. Hist. Philos. Modern Phys. 38, 311-332 (2007).

[18] S. Massar and S. Popescu, Phys. Rev. Lett. 74, 1259 (1995).

[19] J. Kofler and C. Brukner, Phys. Rev. Lett. 99, 180403 (2007).

[20] G.
Polya, Berechnung eines Bestimmten Integrals, Math. Ann. 74 (1913) 204-212. [21] E. T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, 2003. [22] J. Preskill, Lecture notes for physics 229: Quantum information and computation, URL http://www.iqi.caltech.edu.  33  Appendix A Shannon Information In the following, we derive Shannon information in the usual case where there are only finite number of outcomes and argue the generalization to the infinitely many states is not straightforward. We will then show Shannon information is a special case of the information gain as defined in this letter. Consider a random variable which might assume some value from a set of L different possibilities. It might assume the first one with probability p1 , the second one with probability p2 , etc. For large N number of events, p1 N of the outcomes will assume the first value, p2 N assume the second value, etc. (typical case). There is a small probability that this might not be the case that is called the atypical case. We can ignore the atypical states for large enough N . There are Ntypical =  N! . (N p1 )!(N p2 )! . . . (N pL )!  typical states that may occur with equal probability. The uncertainty of the outcomes is then log Ntypical ≈ N H(p1 , p2 , . . . , pL ) where H is the Shannon entropy defined as H(p1 , p2 , . . . , pL ) = −  pi log pi .  It might seem that this procedure is also applicable to the continuous case in which there are infinitely many states (L = ∞). However the typical subspace arises for N large enough so that the mean number of events per state will be much larger than 1, i.e. N/L 1. For L = ∞, this would be never satisfied. In the following, we show that the definition of the information gain, as presented in this letter, would result in the Shannon information for an ensemble {x, px } where x is the (classical state) and px is the corresponding a priori probability. 
The “likely” space, as defined in this letter, consists of 34  Appendix A. Shannon Information the “typical sequences” in this case. Before the “measurement” the number N of typical sequences is 2n(H+δ) ≥ N ( , δ) ≥ (1 − )2n(H−δ) where Perror = is the probability of error. In our language, this number is the same as Ntotal . That is, we have Ntotal = N ( , δ). After performing the measurement (reading the sequence of the letters), we find one sequence. Therefore Nmeasured = 1. Note that and δ can be assumed as small as desired for large enough N . Following our definition of the information gain, we then find Igain = N H(x). where the error probability tends to zero for N → ∞.  35  Appendix B Law of the Large Numbers Assume that X1 , X2 , . . . , XN are N random variables having possibly different distributions. In the following, we prove an extension of the law of large numbers15 . The argument is almost along the same lines as the standard law. Define the operator E which takes the average of a random variable N  2 E(SN )=  =  E(Xi Xj ) N2 i,j=1 1 N2  E(Xi Xj ) + i=j  1 N2  E(Xi2 ). i  The first term could be simplified as E(Xi Xj ) =  E(Xi ))E(Xj )  ( j  i=j  i=j N 2 2 E(Xj )2 = N 2 E¯N − N E¯N  2 (N E¯N − E(Xj ))E(Xj ) = N 2 E¯N −  = j  j=1  where we defined 1 E¯N = N  N  i=1  1 2 E(Xi ), E¯N = N  N  E(Xi )2 i=1  Plugging this back in the first equation, we find 2 E(SN )  15  1 2 + 2 = E¯N N  N  (E(Xj2 ) − E(Xj )2 ) j=1  The law of large numbers applies to N identically distributed random variables.  36  Appendix B. Law of the Large Numbers If the random variable may assume only finite possible outcomes, then the quantity in the bracket is bound from above. Defining σ 2 :=  1 N  E(Xj2 ) − E(Xj )2 ,  we have 1 2 2 + σ2. ) = E¯N E(SN N We can rewrite this as E (SN − EN )2 =  1 2 σ N  This can be also defined in terms of the probability measure E (SN − EN )2 =  dP (SN − EN )2  where dP is the probability measure dP (SN − EN )2 ≥  2  p(|SN − EN | > ).  
Here p(ε) stands for the probability of the event in its argument. So we have

p(|S_N − E_N| > ε) ≤ σ²/(N ε²).    (B.1)

Therefore, for N → ∞, we find S_N → E_N.

We can obtain a stronger result as follows. For the sake of simplicity, suppose that X may assume only two values, 0 and 1. Since ∫dP (X − E(X)) = 0, the contributions below and above the mean balance:

∫_{X=0} dP (E(X) − X) = ∫_{X=1} dP (X − E(X)).

We then find

∫dP (X − E(X))² ≤ ∫dP |X − E(X)| = 2 ∫_{X=0} dP (E(X) − X) ≤ 2 min(E(X), 1 − E(X)).

Since E(X) is only determined up to some uncertainty ε, we have

∫dP (X − E(X))² ≤ 2 min(S(X), 1 − S(X)) + ε.

This is especially useful when the average is very close to one extreme of [0, 1]. Putting all this together, we find

p(|S_N − E_N| > ε) ≤ 2 min(S(X), 1 − S(X))/(N ε²) + 1/(N ε).    (B.2)

In the extreme case that all the outcomes (but possibly a few of them) turn out to be 0 or 1, we find

P_error ∼ 1/(N ε).

For the probability to be reasonably small, we must then have ε ≫ 1/N.

Appendix C

Central Limit Theorem

Let X₁, X₂, X₃, ... be a set of n independent and identically distributed random variables with finite mean µ and variance σ² > 0. The central limit theorem (also known as the second fundamental theorem of probability) states that as the sample size n increases, the distribution of the sample average approaches the normal distribution with mean µ and variance σ²/n, irrespective of the shape of the original distribution. Let the sum of the random variables be S_n = X₁ + ... + X_n. Then, defining

Z_n = (S_n − nµ)/(σ√n),

the distribution of Z_n converges towards the standard normal distribution N(0, 1) as n approaches ∞. This can also be reformulated as

Z_n = (X̄_n − µ)/(σ/√n),

where X̄_n = S_n/n = (X₁ + · · · + X_n)/n is the average of the outcomes.

Appendix D

Lyapunov Condition

Let X_n, n ∈ ℕ, be a sequence of independent random variables. Suppose that each X_n has finite expected value E[X_n] = µ_n and finite variance Var[X_n] = σ_n². Suppose also that the third central moments

r_n³ := E[|X_n − µ_n|³]

are finite and satisfy the Lyapunov condition

lim_{N→∞} (Σ_{n=1}^N r_n³)^{1/3} / (Σ_{n=1}^N σ_n²)^{1/2} = 0.

Let the random variable S_N := X₁ + · · · + X_N denote the N-th partial sum of the random variables X_n. Then the normalized partial sum

Z_N := (S_N − Σ_{n=1}^N µ_n) / (Σ_{n=1}^N σ_n²)^{1/2}

converges in distribution to a standard normal random variable as N → ∞. Less formally, for "large" N, S_N is approximately normally distributed, with expected value

E[S_N] ≈ Σ_{n=1}^N E[X_n]

and variance

Var[S_N] ≈ Σ_{n=1}^N Var[X_n].
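The concentration bound (B.1) is easy to probe numerically for independent but non-identically distributed variables, as in the generalized law of large numbers of Appendix B. The Python sketch below (illustrative; the particular distributions and parameters are our own choices) draws Bernoulli variables with varying p_i and compares the empirical failure probability with the Chebyshev-type bound σ²/(N ε²).

```python
import random
from math import sqrt

random.seed(0)

def sample_mean(ps):
    """One realization of S_N for independent Bernoulli(p_i) variables."""
    return sum(random.random() < p for p in ps) / len(ps)

N = 1000
ps = [0.1 + 0.8 * i / N for i in range(N)]   # non-identical distributions
EN = sum(ps) / N                             # the mean E_N of the average
sigma2 = sum(p * (1 - p) for p in ps) / N    # average variance, sigma^2

eps, trials = 0.05, 2000
fails = sum(abs(sample_mean(ps) - EN) > eps for _ in range(trials))
print("empirical P(|S_N - E_N| > eps):", fails / trials)
print("bound sigma^2/(N eps^2):       ", sigma2 / (N * eps**2))
```

The empirical failure rate sits far below the bound, as expected: Chebyshev-type bounds are loose, and the actual tail is Gaussian-small by the Lyapunov central limit theorem above.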

