Pure inductive logic

Pure inductive logic (PIL) is the area of mathematical logic concerned with the philosophical and mathematical foundations of probabilistic inductive reasoning. It combines classical predicate logic and probability theory (Bayesian inference). Probability values are assigned to sentences of a first-order relational language to represent degrees of belief that should be held by a rational agent. Conditional probability values represent degrees of belief based on the assumption of some received evidence.

PIL studies prior probability functions on the set of sentences and evaluates the rationality of such functions through principles that they should arguably satisfy. Each principle requires the function to assign probability values and conditional probability values to sentences in a way that is rational in some respect. Not all desirable principles of PIL are mutually compatible, so no prior probability function satisfies them all. Some prior probability functions, however, are distinguished by satisfying an important collection of principles.

History

Inductive logic started to take a clearer shape in the early 20th century in the work of William Ernest Johnson and John Maynard Keynes, and was further developed by Rudolf Carnap. Carnap introduced the distinction between pure and applied inductive logic, and the modern Pure Inductive Logic evolves along the lines of the pure, uninterpreted approach envisaged by Carnap.

Framework

General case

In its basic form, PIL uses first-order logic without equality, with the usual connectives $\wedge, \vee, \neg, \to$ (and, or, not and implies respectively), quantifiers $\exists, \forall$, finitely many predicate (relation) symbols, and countably many constant symbols $a_1, a_2, a_3, \ldots$.

There are no function symbols. The predicate symbols can be unary, binary or of higher arities. The finite set of predicate symbols may vary while the rest of the language is fixed. It is a convention to refer to the language as $L$ and write

$$L = \{R_1, R_2, \ldots, R_q\}$$

where the $R_i$ list the predicate symbols. The set of all sentences is denoted $SL$. If a sentence is written with the constants appearing in it listed, then it is assumed that the list includes at least all those that appear. $\mathcal{T}L$ is the set of structures for $L$ with universe $\{a_1, a_2, a_3, \ldots\}$ and with each constant symbol $a_i$ interpreted as itself.

A probability function for sentences of $L$ is a function $w$ with domain $SL$ and values in the unit interval $[0,1]$ satisfying the following conditions:

– any logically valid sentence $\theta$ has probability $1$: $w(\theta) = 1$
– if sentences $\theta$ and $\phi$ are mutually exclusive then $w(\theta \vee \phi) = w(\theta) + w(\phi)$
– for a formula $\psi(x)$ with one free variable the probability of $\exists x\, \psi(x)$ is the limit of the probabilities of $\psi(a_1) \vee \psi(a_2) \vee \ldots \vee \psi(a_n)$ as $n$ tends to $\infty$.

This last condition, which goes beyond the standard Kolmogorov axioms (for finite additivity), is referred to as Gaifman's Axiom and is intended to capture the idea that the $a_i$ exhaust the universe.

For a probability function $w$ and a sentence $\phi$ with $w(\phi) > 0$, the corresponding conditional probability function $w(\,\cdot \mid \phi)$ is defined by

$$w(\theta \mid \phi) = \frac{w(\theta \wedge \phi)}{w(\phi)} \quad (\theta \in SL).$$

Unlike belief functions in many-valued logics, the probability value of a compound sentence is not determined by the probability values of its components. Probability respects the classical semantics: logically equivalent sentences must be given the same probability. Hence logically equivalent sentences are often identified.

A state description for a finite set of constants is a conjunction of atomic sentences (predicates or their negations) instantiated exclusively by these constants, such that for any eligible atomic sentence either it or its negation (but not both) appears in the conjunction.

Any probability function is uniquely determined by its values on state descriptions. To define a probability function, it suffices to specify nonnegative values of all state descriptions for $a_1, \ldots, a_n$ (for all $n$) so that the values of all state descriptions for $a_1, \ldots, a_n, a_{n+1}$ extending a given state description for $a_1, \ldots, a_n$ sum to the value of the state description they all extend, with the convention that the (only) state description for no constants is a tautology and has value $1$.
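The extension condition above can be checked mechanically on a small language. The following is a minimal sketch, not a canonical construction: it assumes a language given as a hypothetical `arities` dictionary (predicate name to arity), represents a state description as a dictionary from atomic sentences to truth values, and uses an illustrative assignment (`uniform_value`) that gives every state description for $a_1, \ldots, a_n$ the same weight.

```python
from fractions import Fraction
from itertools import product

def atomic_sentences(arities, n):
    """All atomic sentences R(a_i1, ..., a_ik) over the constants a_1..a_n."""
    return [(name, args)
            for name, k in sorted(arities.items())
            for args in product(range(1, n + 1), repeat=k)]

def state_descriptions(arities, n):
    """Yield each state description for a_1..a_n as {atomic sentence: truth value}."""
    sentences = atomic_sentences(arities, n)
    for signs in product([True, False], repeat=len(sentences)):
        yield dict(zip(sentences, signs))

def uniform_value(arities, n):
    """Illustrative assignment: equal weight for every state description for a_1..a_n."""
    return Fraction(1, 2 ** len(atomic_sentences(arities, n)))

def extends(phi, theta):
    """True if phi agrees with theta on every atomic sentence that theta decides."""
    return all(phi[s] == v for s, v in theta.items())

def check_extension_condition(arities, max_n):
    """Check that the extensions to a_{n+1} sum to the value of the description extended."""
    for n in range(max_n):
        for theta in state_descriptions(arities, n):
            total = sum(uniform_value(arities, n + 1)
                        for phi in state_descriptions(arities, n + 1)
                        if extends(phi, theta))
            if total != uniform_value(arities, n):
                return False
    return True

# One binary predicate R; the empty state description (n = 0) has value 1.
print(check_extension_condition({"R": 2}, 2))
```

Exact rational arithmetic (`Fraction`) is used so that the sums can be compared with `==` rather than with a floating-point tolerance.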

If $\Theta$ is a state description for a set of constants including $a_i, a_j$ then $a_i, a_j$ are said to be indistinguishable in $\Theta$, written $a_i \sim_\Theta a_j$, just when upon adding equality to the language (and axioms of equality to the logic) the sentence $\Theta \wedge a_i = a_j$ is consistent. $\sim_\Theta$ is an equivalence relation.

Unary case

In the special case of Unary PIL, all the predicates $R_1, \ldots, R_q$ are unary. Formulae of the form

$$\beta(x) = \pm R_1(x) \wedge \pm R_2(x) \wedge \ldots \wedge \pm R_q(x)$$

where $\pm R$ stands for one of $R$, $\neg R$, are called atoms. It is assumed that they are listed in some fixed order as $\beta_1, \beta_2, \ldots, \beta_{2^q}$.

A state description specifies an atom for each constant involved in it, and it can be written as a conjunction of these atoms instantiated by the corresponding constants. Two constants are indistinguishable in the state description if it specifies the same atom for both of them.
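As a small illustration of atoms and unary state descriptions, the sketch below enumerates the $2^q$ atoms and tests indistinguishability. The representation is an assumption made for the example: an atom is a tuple of signs, a state description is a dictionary from constant indices to atoms, and the names `atoms`, `show` and `indistinguishable` are hypothetical helpers. The order in which `atoms` lists them is just one possible fixed order.

```python
from itertools import product

def atoms(q):
    """The 2**q atoms as sign tuples: True stands for R_i(x), False for its negation."""
    return list(product([True, False], repeat=q))

def show(atom, var="x"):
    """Render an atom as text, writing ~ for negation and & for conjunction."""
    return " & ".join(("" if sign else "~") + f"R{i}({var})"
                      for i, sign in enumerate(atom, start=1))

def indistinguishable(theta, i, j):
    """theta maps each constant index to its atom; equal atoms mean indistinguishable."""
    return theta[i] == theta[j]

betas = atoms(2)
# A state description for a_1, a_2, a_3: one atom per constant.
theta = {1: betas[0], 2: betas[3], 3: betas[0]}
for k, beta in theta.items():
    print(show(beta, f"a{k}"))
print(indistinguishable(theta, 1, 3), indistinguishable(theta, 1, 2))
```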

Central question

Assume a rational agent inhabits a structure in $\mathcal{T}L$ but knows nothing about which one it is. What probability function $w$ should the agent adopt when $w(\theta)$ is to represent the agent's degree of belief that a sentence $\theta$ is true in this ambient structure?

Rational principles

General rational principles

The following principles have been proposed as desirable properties of a rational prior probability function $w$ for $L$.

The constant exchangeability principle, Ex. The probability of a sentence $\theta(a_1, a_2, \ldots, a_m)$ does not change when the $a_1, a_2, \ldots, a_m$ in it are replaced by any other $m$-tuple of (distinct) constants.

The principle of predicate exchangeability, Px. If $R, R'$ are predicates of the same arity then for a sentence $\theta$,

$$w(\theta) = w(\theta')$$

where $\theta'$ is the result of simultaneously replacing $R$ by $R'$ and $R'$ by $R$ throughout $\theta$.

The strong negation principle, SN. For a predicate $R$ and sentence $\theta$,

$$w(\theta) = w(\theta')$$

where $\theta'$ is the result of simultaneously replacing $R$ by $\neg R$ and $\neg R$ by $R$ throughout $\theta$.

The principle of regularity, Reg. If a quantifier-free sentence $\theta$ is satisfiable then $w(\theta) > 0$.

The principle of super regularity (universal certainty), SReg. If a sentence $\theta$ is satisfiable then $w(\theta) > 0$.

The constant irrelevance principle, IP. If sentences $\theta, \phi$ have no constants in common then $w(\theta \wedge \phi) = w(\theta) \cdot w(\phi)$.

The weak irrelevance principle, WIP. If sentences $\theta, \phi$ have neither constants nor predicates in common then $w(\theta \wedge \phi) = w(\theta) \cdot w(\phi)$.

Language invariance principle, Li. There is a family of probability functions $w^J$, one on each language $J$, all satisfying Px and Ex, and such that $w^L = w$ and if all predicates of $J$ belong also to $K$ then $w^J$ and $w^K$ agree on sentences of $J$.

The (strong) counterpart principle, CP. If $\theta, \theta'$ are sentences such that $\theta'$ is the result of replacing some constant/relation symbols in $\theta$ by new constant/relation symbols of the same arity not occurring in $\theta$ then

$$w(\theta \mid \theta') \geq w(\theta).$$

(SCP) If moreover $\theta''$ is the result of replacing the same and possibly also additional constant/relation symbols in $\theta$ by new constant/relation symbols of the same arity not occurring in $\theta$ then

$$w(\theta \mid \theta') \geq w(\theta \mid \theta'') \geq w(\theta).$$

The Invariance Principle, INV. If $F$ is an isomorphism of the Lindenbaum-Tarski algebra of sentences of $L$ supported by some permutation $\mu$ of $\mathcal{T}L$ in the sense that for sentences $\theta, \phi$,

$F([\theta]) = [\phi]$ just when $M \models \theta \iff \mu(M) \models \phi$

then $w(\theta) = w(\phi)$.

The Permutation Invariance Principle, PIP. As INV except that $F$ is additionally required to map (equivalence classes of) state descriptions to (equivalence classes of) state descriptions.

The Spectrum Exchangeability Principle, Sx. The probability $w(\Theta)$ of a state description $\Theta$ depends only on the spectrum of $\Theta$, that is, on the multiset of sizes of equivalence classes with respect to the equivalence relation $\sim_\Theta$.
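In the unary case, where two constants are indistinguishable exactly when the state description gives them the same atom, the spectrum is easy to compute. The sketch below assumes a state description is encoded as a sequence of atom indices, one per constant; the function names are hypothetical.

```python
from collections import Counter

def spectrum(theta):
    """Sizes of the indistinguishability classes of theta, largest first.

    theta is a sequence giving the index of the atom chosen for each constant
    (unary case only: same atom means same equivalence class).
    """
    return sorted(Counter(theta).values(), reverse=True)

def same_spectrum(theta1, theta2):
    """True if the two state descriptions have the same multiset of class sizes."""
    return spectrum(theta1) == spectrum(theta2)

print(spectrum((1, 2, 1, 3, 1)))                        # [3, 1, 1]
print(same_spectrum((1, 2, 1, 3, 1), (4, 4, 2, 4, 1)))  # True
```

Under Sx, the two state descriptions in the last line would have to receive the same probability, even though they use different atoms and assign them to different constants.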

Li with Sx. As the Language Invariance Principle but all the probability functions in the family also satisfy Spectrum Exchangeability.

The Principle of Induction, PI. Let $\Theta$ be a state description and $a_k$ a constant not appearing in $\Theta$. Let $\Phi$, $\Psi$ be state descriptions extending $\Theta$ to include (just) $a_k$. If $a_k$ is $\sim_\Phi$-equivalent to some and at least as many constants as it is $\sim_\Psi$-equivalent to then $w(\Phi \mid \Theta) \geq w(\Psi \mid \Theta)$.

Further rational principles for unary PIL

The Principle of Instantial Relevance, PIR. For a sentence $\theta$, atom $\beta$ and constants $a_k, a_m$ not appearing in $\theta$,

$$w(\beta(a_k) \mid \beta(a_m) \wedge \theta) \geq w(\beta(a_k) \mid \theta).$$

The Generalized Principle of Instantial Relevance, GPIR. For quantifier-free sentences $\psi(a_k), \phi(a_m), \theta$ with constants $a_k, a_m$ not appearing in $\theta$, if $\psi(x) \models \phi(x)$ then

$$w(\psi(a_k) \mid \phi(a_m) \wedge \theta) \geq w(\psi(a_k) \mid \theta).$$

Johnson Sufficientness Principle, JSP. For a state description $\Theta$ for $n$ constants, atom $\beta$ and constant $a_k$ not appearing in $\Theta$, the probability

$$w(\beta(a_k) \mid \Theta)$$

depends only on $n$ and on the number of constants for which $\Theta$ specifies $\beta$.

The Principle of Atom Exchangeability, Ax. If $\tau$ is a permutation of $\{1, 2, \ldots, 2^q\}$ and $\Theta$ is a state description expressed as a conjunction of instantiated atoms then $w(\Theta) = w(\Theta')$ where $\Theta'$ is obtained from $\Theta$ by replacing each $\beta_i$ by $\beta_{\tau(i)}$.

Reichenbach's Axiom, RA. Let $\beta_{h_i}$ for $i = 1, 2, 3, \ldots$ be an infinite sequence of atoms and $\beta$ an atom. Then as $n$ tends to $\infty$, the difference between the conditional probability

$$w(\beta(a_{n+1}) \mid \beta_{h_1}(a_1) \wedge \beta_{h_2}(a_2) \wedge \ldots \wedge \beta_{h_n}(a_n))$$

and the proportion of occurrences of $\beta$ amongst the $\beta_{h_1}, \beta_{h_2}, \ldots, \beta_{h_n}$ tends to $0$.

Principle of Induction for Unary languages, UPI. For a state description $\Theta$, atoms $\beta_i, \beta_j$ and constant $a_k$ not appearing in $\Theta$, if $\Theta$ specifies $\beta_i$ for at least as many constants as $\beta_j$ then

$$w(\beta_i(a_k) \mid \Theta) \geq w(\beta_j(a_k) \mid \Theta).$$

Recovery. Whenever $\Psi(a_1, a_2, \ldots, a_n)$ is a state description then there is another state description $\Phi(a_{n+1}, a_{n+2}, \ldots, a_h)$ such that $w(\Phi \wedge \Psi) \neq 0$ and for any quantifier-free sentence $\theta(a_{h+1}, a_{h+2}, \ldots, a_{h+g})$,

$$w(\theta(a_{h+1}, a_{h+2}, \ldots, a_{h+g}) \mid \Phi \wedge \Psi) = w(\theta(a_{h+1}, a_{h+2}, \ldots, a_{h+g})).$$

Unary Language Invariance Principle, ULi. As Li, but with the languages restricted to the unary ones.

ULi with Ax. As ULi but with all the probability functions in the family also satisfying Atom Exchangeability.

Relationships between principles

General case

Sx implies Ex, Px and SN.

PIP + Ex implies Sx.

INV implies PIP and Ex.

Li implies CP and SCP.

Li with Sx implies PI.

Unary case

Ex implies PIR.

Ax is equivalent to PIP.

Ax + Ex implies UPI.

Ax + Ex is equivalent to Sx.

ULi with Ax implies Li with Sx.

Important probability functions

General probability functions

Functions $V_M$. For a given structure $M \in \mathcal{T}L$ and $\theta \in SL$,

$$V_M(\theta) = \begin{cases} 1 & \text{if } M \models \theta, \\ 0 & \text{otherwise}. \end{cases}$$

Functions $\omega^\Psi$. For a given state description $\Psi(a_1, a_2, \ldots, a_K)$, $\omega^\Psi$ is defined via specifying its values for state descriptions as follows. $\omega^\Psi(\Theta(a_1, a_2, \ldots, a_n))$ is the probability that when $a_{h_1}, a_{h_2}, \ldots, a_{h_n}$ are randomly picked from $\{a_1, \ldots, a_K\}$, with replacement and according to the uniform distribution, then $\Psi(a_1, \ldots, a_K) \models \Theta(a_{h_1}, a_{h_2}, \ldots, a_{h_n})$.

Functions ${}^\circ(\omega^\Psi)$. As above but employing a non-standard universe (starting with a possibly non-standard state description $\Psi$) to obtain the standard ${}^\circ(\omega^\Psi)$.

• The ${}^\circ(\omega^\Psi)$ are the only probability functions that satisfy Ex and IP.
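For a standard, finite $\Psi$ the sampling definition of $\omega^\Psi$ can be evaluated by brute force. The sketch below is restricted to the unary case as a simplifying assumption: $\Psi$ and $\Theta$ are encoded as tuples of atom indices, so that $\Psi \models \Theta(a_{h_1}, \ldots, a_{h_n})$ holds exactly when the atom of each picked constant in $\Psi$ matches the atom that $\Theta$ gives the corresponding constant. The names `omega` and `omega_closed_form` are hypothetical; the second function is the product of atom proportions that the sampling description reduces to under this encoding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def omega(psi, theta):
    """Brute-force omega^psi(theta): enumerate every way of picking with replacement."""
    K, n = len(psi), len(theta)
    hits = sum(all(psi[h] == atom for h, atom in zip(picks, theta))
               for picks in product(range(K), repeat=n))
    return Fraction(hits, K ** n)

def omega_closed_form(psi, theta):
    """Unary shortcut: product over theta of the proportion of each atom in psi."""
    counts = Counter(psi)
    value = Fraction(1)
    for atom in theta:
        value *= Fraction(counts[atom], len(psi))
    return value

psi = (0, 0, 1)                  # a_1, a_2 satisfy atom 0 and a_3 satisfies atom 1
print(omega(psi, (0, 1)))        # 2/9
print(omega(psi, (0, 1)) == omega_closed_form(psi, (0, 1)))
```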

Functions $u^{\overline{p}}$. For a given infinite sequence $\overline{p} = \langle p_0, p_1, p_2, p_3, \ldots \rangle$ of non-negative real numbers such that

$$p_1 \geq p_2 \geq p_3 \geq \ldots \geq 0 \quad \text{and} \quad \sum_{i=0}^{\infty} p_i = 1,$$

$u^{\overline{p}}$ is defined via specifying its values for state descriptions as follows:

For a sequence $\vec{c} = \langle c_1, c_2, \ldots, c_n \rangle$ of natural numbers and a state description $\Theta(a_1, a_2, \ldots, a_n)$, $\Theta$ is consistent with $\vec{c}$ if whenever $c_s = c_t \neq 0$ then $a_s \sim_\Theta a_t$. $C(\vec{c})$ is the number of state descriptions for $a_1, a_2, \ldots, a_n$ consistent with $\vec{c}$. $u^{\overline{p}}(\Theta)$ is the sum, over those $\vec{c}$ with which $\Theta$ is consistent, of

$$C(\vec{c})^{-1} \prod_{s=1}^{n} p_{c_s}.$$

• The $u^{\overline{p}}$ are the only probability functions that satisfy WIP and Li with Sx. (The language invariant family witnessing Li with Sx consists of the functions $u^{\overline{p},J}$ with fixed $\overline{p}$, where $u^{\overline{p},J}$ is as $u^{\overline{p}}$ but defined with language $J$.)

Further probability functions (unary PIL)

Functions $w_{\vec{c}}$. For a vector $\vec{c} = \langle c_1, c_2, \ldots, c_{2^q} \rangle$ of non-negative real numbers summing to one, $w_{\vec{c}}$ is defined via specifying its values for state descriptions as follows:

$$w_{\vec{c}}(\Theta) = \prod_{j=1}^{2^q} c_j^{m_j}$$

where $m_j$ is the number of constants for which $\Theta$ specifies $\beta_j$.

• The $w_{\vec{c}}$ are the only probability functions that satisfy Ex and IP (they are also expressible as ${}^\circ(\omega^\Psi)$).
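The product formula for $w_{\vec{c}}$ is short enough to check directly. This is a minimal sketch under an assumed encoding: a state description is a tuple of atom indices (0-based here, unlike the 1-based $\beta_j$ of the text), and `w_c` is a hypothetical helper name. The checks cover Ex and IP only on state descriptions, which is where the function is specified.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod

def w_c(c, theta):
    """w_c(theta): multiply c_j once for every constant to which theta gives atom j."""
    return prod((c[j] for j in theta), start=Fraction(1))

# q = 2, so there are four atoms; the weights are non-negative and sum to one.
c = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(0)]
theta = (0, 0, 1, 2)

# Ex: reassigning the same atoms to the constants in another order changes nothing.
ex_holds = all(w_c(c, perm) == w_c(c, theta) for perm in permutations(theta))

# IP: state descriptions for disjoint sets of constants multiply.
ip_holds = w_c(c, theta) == w_c(c, theta[:2]) * w_c(c, theta[2:])

# The values on all state descriptions for three constants sum to one.
total = sum(w_c(c, t) for t in product(range(4), repeat=3))

print(w_c(c, theta), ex_holds, ip_holds, total)
```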

Carnap continuum functions $c_\lambda$. For $\lambda > 0$, the probability function $c_\lambda$ is uniquely determined by the values

$$c_\lambda(\beta_j(a_{n+1}) \mid \Theta) = \frac{m_j + \lambda 2^{-q}}{n + \lambda}$$

where $\Theta$ is a state description for $n$ constants not including $a_{n+1}$ and $m_j$ is the number of constants for which $\Theta$ specifies $\beta_j$.

Furthermore, $c_\infty$ is the probability function that assigns $2^{-nq}$ to every state description for $n$ constants, and $c_0$ is the probability function that assigns $2^{-q}$ to any state description in which all constants are indistinguishable and $0$ to any other state description.

• The $c_\lambda$ are the only probability functions that satisfy Ex and JSP.

• They also satisfy Li: the functions $c_\lambda^J$ with fixed $\lambda$, where $c_\lambda^J$ is defined as $c_\lambda$ but for the language $J$, provide the unary members of the language-invariant family.
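The value of $c_\lambda$ on a state description follows from the conditional probabilities by the chain rule, taking the constants one at a time. The sketch below assumes $0 < \lambda < \infty$ and encodes a state description for $a_1, \ldots, a_n$ as a tuple of 0-based atom indices; `carnap` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations, product

def carnap(lam, q, theta):
    """c_lambda(theta), built up one constant at a time by the chain rule."""
    lam = Fraction(lam)
    counts = [0] * (2 ** q)   # m_j: how many earlier constants received atom j
    value = Fraction(1)
    for n, j in enumerate(theta):
        value *= (counts[j] + lam / 2 ** q) / (n + lam)
        counts[j] += 1
    return value

# q = 1 and lambda = 2: the conditional probabilities are (m_j + 1) / (n + 2).
print(carnap(2, 1, (0, 0)))   # 1/3

# Ex on state descriptions: the order in which the atoms are observed is irrelevant.
theta = (0, 1, 1, 3)
print(all(carnap(Fraction(3, 2), 2, p) == carnap(Fraction(3, 2), 2, theta)
          for p in permutations(theta)))
```

Because each conditional probability depends only on `n` and on `counts[j]`, the computation also makes the content of JSP visible: no other feature of the earlier constants enters the formula.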

Functions $w^\delta$. For $-(2^q - 1)^{-1} \leq \delta \leq 1$, $w^\delta$ is the average of the $2^q$ functions $w_{\vec{c}}$ where $\vec{c}$ has all but one coordinate equal to each other with the odd coordinate differing from them by $\delta$, so

$$w^\delta = 2^{-q} \sum_{i=1}^{2^q} w_{\vec{e_i}}$$

where $\vec{e_i} = \langle \gamma, \gamma, \ldots, \gamma, \gamma + \delta, \gamma, \ldots, \gamma \rangle$ (with $\gamma + \delta$ in the $i$th place) and $\gamma = 2^{-q}(1 - \delta)$.

For $0 \leq \delta \leq 1$, the $w^\delta$ are equal to $u^{\overline{p}}$ for

$$\overline{p} = \langle 1 - \delta, \delta, 0, 0, 0, \ldots \rangle$$

and as such they satisfy Li.
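The identity between $w^\delta$ and $u^{\overline{p}}$ can be spot-checked numerically on small unary languages. The sketch below is an illustration, not a proof, and rests on stated assumptions: state descriptions are tuples of 0-based atom indices, $\overline{p}$ has finite support and is passed as a list `[p_0, p_1, ..., p_k]`, and in the unary case $C(\vec{c})$ is computed as $(2^q)^{z + d}$, where $z$ is the number of zero entries of $\vec{c}$ and $d$ the number of distinct non-zero entries (each zero-labelled constant is free, each non-zero class shares one atom). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

def w_delta(delta, q, theta):
    """Average of the 2**q product functions w_{e_i}."""
    size = 2 ** q
    gamma = (1 - delta) / size
    total = Fraction(0)
    for i in range(size):
        total += prod((gamma + delta if j == i else gamma for j in theta),
                      start=Fraction(1))
    return total / size

def nonzero_classes(c, theta):
    """Number of distinct non-zero labels in c if theta is consistent with c, else None."""
    label_atom = {}
    for label, atom in zip(c, theta):
        if label != 0 and label_atom.setdefault(label, atom) != atom:
            return None  # two constants share a non-zero label but not an atom
    return len(label_atom)

def u_p(p, q, theta):
    """u^p(theta) for finitely supported p = [p_0, ..., p_k], unary languages only."""
    size = 2 ** q
    total = Fraction(0)
    for c in product(range(len(p)), repeat=len(theta)):
        classes = nonzero_classes(c, theta)
        if classes is None:
            continue
        count = size ** (c.count(0) + classes)   # C(c) in the unary case
        total += prod((p[label] for label in c), start=Fraction(1)) / count
    return total

delta = Fraction(1, 3)
print(w_delta(delta, 1, (0, 0)), u_p([1 - delta, delta], 1, (0, 0)))
```

Exact fractions are used so that the two values can be compared with `==`; the enumeration over label sequences grows exponentially in the number of constants, so this is only practical for very small cases.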

• The $w^\delta$ are the only functions that satisfy GPIR, Ex, Ax and Reg.

• The $w^\delta$ with $0 \leq \delta < 1$ are the only functions that satisfy Recovery, Reg and ULi with Ax.

Representation theorems

A representation theorem for a class of probability functions provides means of expressing every probability function in the class in terms of generic, relatively simple probability functions from the same class.

Representation Theorem for all probability functions. Every probability function $w$ for $L$ can be represented as

$$w = \int_{\mathcal{T}L} V_M \, d\mu(M)$$

where $\mu$ is a $\sigma$-additive measure on the $\sigma$-algebra of subsets of $\mathcal{T}L$ generated by the sets

$$\{\, M \in \mathcal{T}L \mid M \models \theta \,\} \quad (\theta \in SL).$$

Representation Theorem for Ex (employing non-standard analysis and Loeb Integration Theory). Every probability function $w$ for $L$ satisfying Ex can be represented as

$$w = \int_A {}^\circ(\omega^\Psi) \, d\mu(\Psi)$$

where $A$ is an internal set of state descriptions for $a_1, a_2, \ldots, a_\nu$ (with $\nu$ a fixed infinite natural number) and $\mu$ is a $\sigma$-additive measure on a $\sigma$-algebra of subsets of $A$.

Representation Theorem for Li with Sx. Every probability function $w$ for $L$ satisfying Li with Sx can be represented as

$$w = \int_{\mathbb{B}} u^{\overline{p}} \, d\mu(\overline{p})$$

where $\mathbb{B}$ is the set of sequences

$$\overline{p} = \langle p_0, p_1, p_2, p_3, \ldots \rangle$$

of non-negative reals summing to $1$ and such that $p_1 \geq p_2 \geq p_3 \geq \ldots \geq 0$, and $\mu$ is a $\sigma$-additive measure on the Borel subsets of $\mathbb{B}$ in the product topology.

de Finetti's Representation Theorem (unary). In the unary case (where $L$ is a language containing $q$ unary predicates), the representation theorem for Ex is equivalent to:

Every probability function $w$ for $L$ satisfying Ex can be represented as

$$w = \int_{\mathbb{D}} w_{\vec{x}} \, d\mu(\vec{x})$$

where $\mathbb{D}$ is the set of vectors $\vec{x} = \langle x_1, x_2, \ldots, x_{2^q} \rangle$ of non-negative real numbers summing to one and $\mu$ is a $\sigma$-additive measure on $\mathbb{D}$.
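When $\mu$ is concentrated on finitely many points of $\mathbb{D}$, the integral is a finite weighted sum of product functions. The sketch below illustrates this with a discrete measure chosen for the example: equal weight $2^{-q}$ on each of the $2^q$ unit vectors. The encoding (state descriptions as tuples of 0-based atom indices) and the names `mixture` and `unit_vector` are assumptions of the sketch. With this choice the mixture gives $2^{-q}$ to a state description in which all constants share an atom and $0$ otherwise, which is the description of $c_0$ given earlier.

```python
from fractions import Fraction
from math import prod

def mixture(weighted_points, theta):
    """w(theta) = sum of mu({x}) * w_x(theta) for a finitely supported measure mu."""
    return sum(mu * prod((x[j] for j in theta), start=Fraction(1))
               for mu, x in weighted_points)

def unit_vector(i, size):
    """The point of D that puts all of its mass on atom i."""
    return [Fraction(int(i == j)) for j in range(size)]

q = 2
size = 2 ** q
points = [(Fraction(1, size), unit_vector(i, size)) for i in range(size)]

print(mixture(points, (0, 0, 0)))   # all three constants share atom 0
print(mixture(points, (0, 1)))      # two distinguishable constants
print(mixture(points, (1, 0)))      # same atoms in the other order
```

The last two values agree, as Ex requires of any such mixture. The mixture does not inherit IP from its components: the value on `(0, 0)` equals the value on `(0,)`, not its square.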

Notes

  1. Carnap, R., A Basic System of Inductive Logic, in Studies in Inductive Logic and Probability, Volume 1, 1971, pp. 69-70.
  2. Cutland, N.J., Loeb measure theory, in Developments in Nonstandard Mathematics, Eds. N.J. Cutland, F. Oliveira, V. Neves, J. Sousa-Pinto, Pitman Research Notes in Mathematics Series, Vol. 336, Longman Press, 1995, pp. 151-177.
