Misplaced Pages

Dyck language

Article snapshot taken from[REDACTED] with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
(Redirected from Dyck word) Language consisting of balanced strings of brackets
This article relies largely or entirely on a single source. Relevant discussion may be found on the talk page. Please help improve this article by introducing citations to additional sources.
Find sources: "Dyck language" – news · newspapers · books · scholar · JSTOR (April 2015)

In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of brackets. The set of Dyck words forms a Dyck language. The simplest, Dyck-1, uses just two matching brackets, e.g. ( and ).

Dyck words and language are named after the mathematician Walther von Dyck. They have applications in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions.

Formal definition

Let Σ = { [ , ] } {\displaystyle \Sigma =\{\}} be the alphabet consisting of the symbols . Let Σ {\displaystyle \Sigma ^{*}} denote its Kleene closure. The Dyck language is defined as:

{ u Σ |  all prefixes of  u  contain no more ]'s than ['s  and the number of ['s in  u  equals the number of ]'s } . {\displaystyle \{u\in \Sigma ^{*}\vert {\text{ all prefixes of }}u{\text{ contain no more ]'s than 's}}\}.}

Context-free grammar

It may be helpful to define the Dyck language via a context-free grammar in some situations. The Dyck language is generated by the context-free grammar with a single non-terminal S, and the production:

Sε | "" S

That is, S is either the empty string (ε) or is "", and an element of the Dyck language.

An alternative context-free grammar for the Dyck language is given by the production:

S → ("")

That is, S is zero or more occurrences of the combination of "", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.

Alternative definition

In yet other contexts it may instead be helpful to define the Dyck language by splitting Σ {\displaystyle \Sigma ^{*}} into equivalence classes, as follows. For any element u Σ {\displaystyle u\in \Sigma ^{*}} of length | u | {\displaystyle |u|} , we define partial functions insert : Σ × N Σ {\displaystyle \operatorname {insert} :\Sigma ^{*}\times \mathbb {N} \rightarrow \Sigma ^{*}} and delete : Σ × N Σ {\displaystyle \operatorname {delete} :\Sigma ^{*}\times \mathbb {N} \rightarrow \Sigma ^{*}} by

insert ( u , j ) {\displaystyle \operatorname {insert} (u,j)} is u {\displaystyle u} with " [ ] {\displaystyle } " inserted into the j {\displaystyle j} th position
delete ( u , j ) {\displaystyle \operatorname {delete} (u,j)} is u {\displaystyle u} with " [ ] {\displaystyle } " deleted from the j {\displaystyle j} th position

with the understanding that insert ( u , j ) {\displaystyle \operatorname {insert} (u,j)} is undefined for j > | u | {\displaystyle j>|u|} and delete ( u , j ) {\displaystyle \operatorname {delete} (u,j)} is undefined if j > | u | 2 {\displaystyle j>|u|-2} . We define an equivalence relation R {\displaystyle R} on Σ {\displaystyle \Sigma ^{*}} as follows: for elements a , b Σ {\displaystyle a,b\in \Sigma ^{*}} we have ( a , b ) R {\displaystyle (a,b)\in R} if and only if there exists a sequence of zero or more applications of the insert {\displaystyle \operatorname {insert} } and delete {\displaystyle \operatorname {delete} } functions starting with a {\displaystyle a} and ending with b {\displaystyle b} . That the sequence of zero operations is allowed accounts for the reflexivity of R {\displaystyle R} . Symmetry follows from the observation that any finite sequence of applications of insert {\displaystyle \operatorname {insert} } to a string can be undone with a finite sequence of applications of delete {\displaystyle \operatorname {delete} } . Transitivity is clear from the definition.

The equivalence relation partitions the language Σ {\displaystyle \Sigma ^{*}} into equivalence classes. If we take ϵ {\displaystyle \epsilon } to denote the empty string, then the language corresponding to the equivalence class Cl ( ϵ ) {\displaystyle \operatorname {Cl} (\epsilon )} is called the Dyck language.

Generalizations

Typed Dyck language

There exist variants of the Dyck language with multiple delimiters, e.g., Dyck-2 on the alphabet "(", ")", "". The words of such a language are the ones which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter on the stack, and whenever we reach a closing delimiter then we must be able to pop the matching opening delimiter from the top of the stack. (The counting algorithm above does not generalise).

For example, the following is a valid sentence in Dyck-3:

( { } ] ( ) { ( ) } )

Finite depth

A Dyck language sentence can be pictured as a descent and ascent through the levels of nested brackets. As one reads along a Dyck sentence, each opening bracket increases the nesting depth by 1, and each closing bracket decreases by 1. The depth of a sentence is the maximal depth reached within the sentence.

For example, we can annotate the following sentence with the depth at each step:

0 ( 1 2 { 3 } 2 ] 1 ( 2 ) 1 { 2 ( 3 ) 2 } 1 ) 0 0

and the entire sentence has depth 3.

We define Dyck-(k, m) as the language with k types of brackets and maximal depth m. This has applications in the formal theory of recurrent neural networks.

Properties

  • The Dyck language is closed under the operation of concatenation.
  • By treating Σ {\displaystyle \Sigma ^{*}} as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient Σ / R {\displaystyle \Sigma ^{*}/R} , resulting in the syntactic monoid of the Dyck language. The class Cl ( ϵ ) {\displaystyle \operatorname {Cl} (\epsilon )} will be denoted 1 {\displaystyle 1} .
  • The syntactic monoid of the Dyck language is not commutative: if u = Cl ( [ ) {\displaystyle u=\operatorname {Cl} ([)} v = Cl ( ] ) {\displaystyle v=\operatorname {Cl} (])} then u v = Cl ( [ ] ) = 1 Cl ( ] [ ) = v u {\displaystyle uv=\operatorname {Cl} ()=1\neq \operatorname {Cl} (][)=vu} .
  • With the notation above, u v = 1 {\displaystyle uv=1} but neither u {\displaystyle u} nor v {\displaystyle v} are invertible in Σ / R {\displaystyle \Sigma ^{*}/R} .
  • The syntactic monoid of the Dyck language is isomorphic to the bicyclic semigroup by virtue of the properties of Cl ( [ ) {\displaystyle \operatorname {Cl} ([)} Cl ( ] ) {\displaystyle \operatorname {Cl} (])} described above.
  • By the Chomsky–Schützenberger representation theorem, any context-free language is a homomorphic image of the intersection of some regular language with a Dyck language on one or more kinds of bracket pairs.
  • The Dyck language with two distinct types of brackets can be recognized in the complexity class T C 0 {\displaystyle TC^{0}} .
  • The number of distinct Dyck words with exactly n pairs of parentheses and k innermost pairs (viz. the substring [   ] {\displaystyle } ) is the Narayana number N ( n , k ) {\displaystyle \operatorname {N} (n,k)} .
  • The number of distinct Dyck words with exactly n pairs of parentheses is the n-th Catalan number C n {\displaystyle C_{n}} . Notice that the Dyck language of words with n parentheses pairs is equal to the union, over all possible k, of the Dyck languages of words of n parentheses pairs with k innermost pairs, as defined in the previous point. Since k can range from 0 to n, we obtain the following equality, which indeed holds:
C n = k = 1 n N ( n , k ) {\displaystyle C_{n}=\sum _{k=1}^{n}\operatorname {N} (n,k)}

Examples

Lattice of the 14 Dyck words of length 8 - interpreted as up and down

We can define an equivalence relation L {\displaystyle L} on the Dyck language D {\displaystyle {\mathcal {D}}} . For u , v D {\displaystyle u,v\in {\mathcal {D}}} we have ( u , v ) L {\displaystyle (u,v)\in L} if and only if | u | = | v | {\displaystyle |u|=|v|} , i.e. u {\displaystyle u} and v {\displaystyle v} have the same length. This relation partitions the Dyck language: D / L = { D 0 , D 1 , } {\displaystyle {\mathcal {D}}/L=\{{\mathcal {D}}_{0},{\mathcal {D}}_{1},\ldots \}} . We have D = D 0 D 2 D 4 = n = 0 D n {\displaystyle {\mathcal {D}}={\mathcal {D}}_{0}\cup {\mathcal {D}}_{2}\cup {\mathcal {D}}_{4}\cup \ldots =\bigcup _{n=0}^{\infty }{\mathcal {D}}_{n}} where D n = { u D | u | = n } {\displaystyle {\mathcal {D}}_{n}=\{u\in {\mathcal {D}}\mid |u|=n\}} . Note that D n {\displaystyle {\mathcal {D}}_{n}} is empty for odd n {\displaystyle n} .

Having introduced the Dyck words of length n {\displaystyle n} , we can introduce a relationship on them. For every n N {\displaystyle n\in \mathbb {N} } we define a relation S n {\displaystyle S_{n}} on D n {\displaystyle {\mathcal {D}}_{n}} ; for u , v D n {\displaystyle u,v\in {\mathcal {D}}_{n}} we have ( u , v ) S n {\displaystyle (u,v)\in S_{n}} if and only if v {\displaystyle v} can be reached from u {\displaystyle u} by a series of proper swaps. A proper swap in a word u D n {\displaystyle u\in {\mathcal {D}}_{n}} swaps an occurrence of ']'. For each n N {\displaystyle n\in \mathbb {N} } the relation S n {\displaystyle S_{n}} makes D n {\displaystyle {\mathcal {D}}_{n}} into a partially ordered set. The relation S n {\displaystyle S_{n}} is reflexive because an empty sequence of proper swaps takes u {\displaystyle u} to u {\displaystyle u} . Transitivity follows because we can extend a sequence of proper swaps that takes u {\displaystyle u} to v {\displaystyle v} by concatenating it with a sequence of proper swaps that takes v {\displaystyle v} to w {\displaystyle w} forming a sequence that takes u {\displaystyle u} into w {\displaystyle w} . To see that S n {\displaystyle S_{n}} is also antisymmetric we introduce an auxiliary function σ n : D n N {\displaystyle \sigma _{n}:{\mathcal {D}}_{n}\rightarrow \mathbb {N} } defined as a sum over all prefixes v {\displaystyle v} of u {\displaystyle u} :

σ n ( u ) = v w = u ( ( count of ['s in  v ) ( count of ]'s in  v ) ) {\displaystyle \sigma _{n}(u)=\sum _{vw=u}{\Big (}({\text{count of 's in }}v){\Big )}}

The following table illustrates that σ n {\displaystyle \sigma _{n}} is strictly monotonic with respect to proper swaps.

Strict monotonicity of σ n {\displaystyle \sigma _{n}}
partial sums of σ n ( u ) {\displaystyle \sigma _{n}(u)} P {\displaystyle P} P 1 {\displaystyle P-1} P {\displaystyle P} Q {\displaystyle Q}
u {\displaystyle u} {\displaystyle \ldots } ] [ {\displaystyle \ldots }
u {\displaystyle u'} {\displaystyle \ldots } [ ] {\displaystyle \ldots }
partial sums of σ n ( u ) {\displaystyle \sigma _{n}(u')} P {\displaystyle P} P + 1 {\displaystyle P+1} P {\displaystyle P} Q {\displaystyle Q}
Difference of partial sums 0 2 0 0

Hence σ n ( u ) σ n ( u ) = 2 > 0 {\displaystyle \sigma _{n}(u')-\sigma _{n}(u)=2>0} so σ n ( u ) < σ n ( u ) {\displaystyle \sigma _{n}(u)<\sigma _{n}(u')} when there is a proper swap that takes u {\displaystyle u} into u {\displaystyle u'} . Now if we assume that both ( u , v ) , ( v , u ) S n {\displaystyle (u,v),(v,u)\in S_{n}} and u v {\displaystyle u\neq v} , then there are non-empty sequences of proper swaps such u {\displaystyle u} is taken into v {\displaystyle v} and vice versa. But then σ n ( u ) < σ n ( v ) < σ n ( u ) {\displaystyle \sigma _{n}(u)<\sigma _{n}(v)<\sigma _{n}(u)} which is nonsensical. Therefore, whenever both ( u , v ) {\displaystyle (u,v)} and ( v , u ) {\displaystyle (v,u)} are in S n {\displaystyle S_{n}} , we have u = v {\displaystyle u=v} , hence S n {\displaystyle S_{n}} is antisymmetric.

The partial ordered set D 8 {\displaystyle D_{8}} is shown in the illustration accompanying the introduction if we interpret a as going down.

See also

Notes

  1. Hewitt, John; Hahn, Michael; Ganguli, Surya; Liang, Percy; Manning, Christopher D. (2020-10-15). "RNNs can generate bounded hierarchical languages with optimal memory". arXiv:2010.07515 .
  2. Kambites, Communications in Algebra Volume 37 Issue 1 (2009) 193-208
  3. Barrington and Corbett, Information Processing Letters 32 (1989) 251-256

References

Category:
Dyck language Add topic