8.1 Extra Arguments

In the previous chapter we introduced basic DCG notation. But DCGs offer more than we’ve seen so far. For a start, DCGs allow us to specify extra arguments. Extra arguments can be used for many purposes; we’ll examine three.

Context free grammars with features

As a first example, let’s see how extra arguments can be used to add features to context-free grammars.

Here’s the DCG we worked with in the previous chapter:

   s  -->  np,vp.
   
   np  -->  det,n.
   
   vp  -->  v,np.
   vp  -->  v.
   
   det  -->  [the].
   det  -->  [a].
   
   n  -->  [woman].
   n  -->  [man].
   
   v  -->  [shoots].

Now, suppose we wanted to deal with sentences like “She shoots him”, and “He shoots her”. What should we do? Well, obviously we should add rules saying that “he”, “she”, “him”, and “her” are pronouns:

   pro  -->  [he].
   pro  -->  [she].
   pro  -->  [him].
   pro  -->  [her].

Furthermore, we should add a rule saying that noun phrases can be pronouns:

   np  -->  pro.

Is this new DCG any good? Well, up to a point, it works. For example:

   ?-  s([she,shoots,him],[]).
   yes

But there’s an obvious problem. The DCG will also accept a lot of sentences that are clearly wrong, such as “A woman shoots she”, “Her shoots a man”, and “Her shoots she”:

   ?-  s([a,woman,shoots,she],[]).
   yes
   
   ?-  s([her,shoots,a,man],[]).
   yes
   
   ?-  s([her,shoots,she],[]).
   yes

That is, the grammar doesn’t know that “she” and “he” are subject pronouns and cannot be used in object position; thus “A woman shoots she” is bad because it violates this basic fact about English. Moreover, the grammar doesn’t know that “her” and “him” are object pronouns and cannot be used in subject position; thus “Her shoots a man” is bad because it violates this constraint. As for “Her shoots she”, this manages to get both matters wrong at once.

Now, it’s pretty obvious what we have to do to put this right: we need to extend the DCG with information about which pronouns can occur in subject position and which in object position. The interesting question is: how exactly are we to do this? First, let’s look at a naive way of correcting the problem, namely adding new rules:

   s  -->  np_subject,vp.
   
   np_subject  -->  det,n.
   np_object    -->  det,n.
   np_subject  -->  pro_subject.
   np_object    -->  pro_object.
   
   vp  -->  v,np_object.
   vp  -->  v.
   
   det  -->  [the].
   det  -->  [a].
   
   n  -->  [woman].
   n  -->  [man].
   
   pro_subject  -->  [he].
   pro_subject  -->  [she].
   pro_object  -->  [him].
   pro_object  -->  [her].
   
   v  -->  [shoots].

Now this solution “works”. For example,

   ?-  s([her,shoots,she],[]).
   no

But neither computer scientists nor linguists would consider this a good solution. The trouble is, a small addition to the lexicon has led to quite a big change in the DCG. Let’s face it: “she” and “her” (and “he” and “him”) are the same in a lot of respects. But to deal with the property in which they differ (namely, in which position they can occur in sentences) we’ve had to make big changes to the grammar: in particular, we’ve doubled the number of noun phrase rules. If we had to make further changes (for example, to cope with plural noun phrases) things would get even worse. What we really need is a more delicate programming mechanism that allows us to cope with such facts without being forced to add rules all the time. And here’s where the extra arguments come into play. Look at the following grammar:

   s  -->  np(subject),vp.
   
   np(_)  -->  det,n.
   np(X)  -->  pro(X).
   
   vp  -->  v,np(object).
   vp  -->  v.
   
   det  -->  [the].
   det  -->  [a].
   
   n  -->  [woman].
   n  -->  [man].
   
   pro(subject)  -->  [he].
   pro(subject)  -->  [she].
   pro(object)  -->  [him].
   pro(object)  -->  [her].
   
   v  -->  [shoots].

The key thing to note is that this new grammar contains only one new noun phrase rule. In fact, it is very similar to the first grammar that we wrote, except that now the symbol np is associated with a new argument, either subject , object , _ or X . A linguist would say that we’ve added features to distinguish various kinds of noun phrase. In particular, note the four rules for the pronouns. Here we’ve used the extra argument to state which pronouns can occur in subject position, and which can occur in object position. Thus these rules are the most fundamental, for they give us the basic facts about how these pronouns can be used.
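
We can check these basic facts directly by querying the pronoun rules on their own; for example, we would expect the following behaviour:

   ?-  pro(X,[he],[]).
   X  =  subject
   yes
   
   ?-  pro(X,[her],[]).
   X  =  object
   yes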

So what do the other rules do? Well, intuitively, the rule

   np(X)  -->  pro(X).

uses the extra argument (the variable X ) to pass these basic facts about pronouns up to noun phrases built out of them: because the variable X is used as the extra argument for both the np and the pronoun, Prolog unification will guarantee that they will be given the same value. In particular, if the pronoun we use is “she” (in which case X=subject ), then the np will (through its extra argument X=subject ) be marked as a subject np. On the other hand, if the pronoun we use is “her” (in which case X=object ), then the extra argument for np will be marked X=object too. And this, of course, is exactly the behaviour we want.

On the other hand, although noun phrases built using the rule

   np(_)  -->  det,n.

also have an extra argument, we’ve used the anonymous variable as its value. Essentially this means “can be either subject or object”, which is correct, for expressions built using this rule (such as “the man” and “a woman”) can be used in both subject and object position.

Now consider the rule

   vp  -->  v,np(object).

This says that to apply this rule we need to use a noun phrase whose extra argument unifies with object . Such a noun phrase can either be built from an object pronoun, or be one like “the man” or “a woman”, which has the anonymous variable as the value of the extra argument. Crucially, pronouns marked as having subject as the value of the extra argument can’t be used here: the atoms object and subject don’t unify. Note that the rule

   s  -->  np(subject),vp.

works in an analogous fashion to prevent noun phrases made of object pronouns from ending up in subject position.

This works. You can check it out by posing the query:

   ?-  s(X,[]).

As you step through the responses, you’ll see that only acceptable English is generated.
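
Moreover, the strings we objected to earlier are now rejected; for example, we would expect:

   ?-  s([a,woman,shoots,she],[]).
   no
   
   ?-  s([her,shoots,a,man],[]).
   no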

But while the intuitive explanation just given is correct, what’s really going on? The key thing to remember is that DCG rules are just a convenient abbreviation. For example, the rule

   s  -->  np,vp.

is really syntactic sugar for

   s(A,B)  :-
           np(A,C),
           vp(C,B).

That is, as we learned in the previous chapter, the DCG notation is a way of hiding the two arguments responsible for the difference list representation, so that we don’t have to think about them. We work with the nice user-friendly notation, and Prolog translates it into the clauses just given.

Ok, so we obviously need to ask what

   s  -->  np(subject),vp.

translates into. Here’s the answer:

   s(A,B)  :-
           np(subject,A,C),
           vp(C,B).

The name “extra argument” is a good one: as this translation makes clear, the subject symbol really is just one more argument in an ordinary Prolog rule. Similarly, our noun phrase DCG rules translate into

   np(A,B,C)  :-
           det(B,D),
           n(D,C).
   np(A,B,C)  :-
           pro(A,B,C).

Note that both rules have three arguments. The first, A , is the extra argument, and the last two are the ordinary, hidden DCG arguments (the two hidden arguments are always the last two arguments).
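
The lexical rules are translated in much the same way. Conceptually, a rule such as pro(subject)  -->  [he] becomes a fact along the lines of

   pro(subject,[he|W],W).

though some Prolog systems handle terminal symbols via an auxiliary predicate rather than this direct encoding. If you want to see what your system actually does, many Prologs (SWI-Prolog, for example) will show you the translated clauses via the built-in listing/1; for instance, after loading the grammar you could pose the query

   ?-  listing(np/3).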

Incidentally, how do you think we would use the grammar to list the grammatical noun phrases? Well, if we had been working with the DCG rule np  -->  det,n (that is, a rule with no extra arguments) we would have made the query

   ?-  np(NP,[]).

So, in view of what we have just learned about extra arguments, it’s not too surprising that we need to pose the query

   ?-  np(X,NP,[]).

when working with our new DCG. And here’s what the response would be:

   X  =  _2625
   NP  =  [the,woman]  ;
   
   X  =  _2625
   NP  =  [the,man]  ;
   
   X  =  _2625
   NP  =  [a,woman]  ;
   
   X  =  _2625
   NP  =  [a,man]  ;
   
   X  =  subject
   NP  =  [he]  ;
   
   X  =  subject
   NP  =  [she]  ;
   
   X  =  object
   NP  =  [him]  ;
   
   X  =  object
   NP  =  [her]  ;
   
   no

One final remark: don’t be misled by the simplicity of our example grammar. Extra arguments can be used to cope with some complex syntactic problems. DCGs are no longer the state-of-the-art grammar development tools they once were, but they’re not toys either. Once you know about writing DCGs with extra arguments, you can write some fairly sophisticated grammars.

Building parse trees

So far, the programs we have discussed have been able to recognise grammatical structure (that is, they could correctly answer yes or no when asked whether the input was a sentence, a noun phrase, and so on) and to generate grammatical output. This is pleasant, but we would also like to be able to parse . That is, we would like our programs not only to tell us which sentences are grammatical, but also to give us an analysis of their structure. In particular, we would like to see the trees the grammar assigns to sentences.

Well, using only standard Prolog tools we can’t actually draw nice pictures of trees, but we can build data structures which describe trees in a clear way. For example, corresponding to the tree

             s
           /   \
         np     vp
        /  \     |
      det   n    v
       |    |    |
       a  woman  shoots

we could have the following term:

   s(np(det(a),n(woman)),vp(v(shoots))).

Sure: it doesn’t look as nice, but all the information in the picture is there. And, with the aid of a decent graphics package, it would be easy to turn this term into a picture.

But how do we get DCGs to build such terms? Actually, it’s pretty easy. After all, in effect a DCG has to work out what the tree structure is when recognising a sentence. So we just need to find a way of keeping track of the structure that the DCG finds. We do this by adding extra arguments. Here’s how:

   s(s(NP,VP))  -->  np(NP),vp(VP).
   
   np(np(DET,N))  -->  det(DET),n(N).
   
   vp(vp(V,NP))  -->  v(V),np(NP).
   vp(vp(V))        -->  v(V).
   
   det(det(the))  -->  [the].
   det(det(a))      -->  [a].
   
   n(n(woman))  -->  [woman].
   n(n(man))      -->  [man].
   
   v(v(shoots))  -->  [shoots].

What’s going on here? Essentially we are building the parse trees for the syntactic categories on the left hand side of the rules out of the parse trees for the syntactic categories on the right hand side of the rules. Consider the rule vp(vp(V,NP))  -->  v(V),np(NP) . When we make a query using this DCG, the V in v(V) and the NP in np(NP) will be instantiated to terms representing parse trees. For example, perhaps V will be instantiated to

   v(shoots)

and NP will be instantiated to

   np(det(a),n(woman)).

What is the term corresponding to a vp made out of these two structures? Obviously it should be this:

   vp(v(shoots),np(det(a),n(woman))).

And this is precisely what the extra argument vp(V,NP) given in the rule vp(vp(V,NP))  --> v(V),np(NP) returns to us: a term whose functor is vp , and whose first and second arguments are the values of V and NP respectively. To put it informally: it plugs the V and the NP terms together under a vp functor.
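
We can also watch this structure-building at work by querying the verb phrase category on its own; given the grammar above, we would expect:

   ?-  vp(T,[shoots,a,woman],[]).
   T  =  vp(v(shoots),np(det(a),n(woman)))
   yes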

To parse the sentence “A woman shoots” we pose the query:

   ?-  s(T,[a,woman,shoots],[]).

That is, we ask for the extra argument T to be instantiated to a parse tree for the sentence. And we get:

   T  =  s(np(det(a),n(woman)),vp(v(shoots)))
   yes

Furthermore, we can generate all parse trees by making the following query:

   ?-  s(T,S,[]).

The first three responses are:

   T  =  s(np(det(the),n(woman)),
               vp(v(shoots),np(det(the),n(woman))))
   S  =  [the,woman,shoots,the,woman]  ;
   
   T  =  s(np(det(the),n(woman)),
               vp(v(shoots),np(det(the),n(man))))
   S  =  [the,woman,shoots,the,man]  ;
   
   T  =  s(np(det(the),n(woman)),
               vp(v(shoots),np(det(a),n(woman))))
   S  =  [the,woman,shoots,a,woman]

In short, we have just seen an elegant (and useful) example of how to build structure using unification.

Extra arguments can also be used to build semantic representations. Now, we did not say anything about what the words in our little DCG mean. In fact, nowadays a lot is known about the semantics of natural languages, and it is surprisingly easy to build semantic representations which partially capture the meaning of sentences or even entire discourses. Such representations are usually expressions of some formal language (for example first-order logic, discourse representation structures, or a database query language) and they are usually built up compositionally. That is, the meaning of each word is expressed in the formal language; this meaning is given as an extra argument in the DCG entries for the individual words. Then, for each rule in the grammar, an extra argument shows how to combine the meaning of the two subcomponents. For example, to the rule s  -->  np,  vp we would add an extra argument stating how to combine the np meaning and the vp meaning to form the s meaning. Although somewhat more complex, the semantic construction process is quite like the way we built up the parse tree for the sentence from the parse trees of its subparts.
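
To give at least the flavour of this, here is a tiny hypothetical grammar (a sketch for illustration only, not a grammar developed in this book) in which the “meaning” of a sentence is simply a Prolog term assembled from the meanings of the words; a serious grammar would build expressions of a proper semantic formalism, but the technique of threading meanings through extra arguments is the same:

   %  lexical entries carry their meanings as extra arguments
   np(woman)  -->  [a,woman].
   np(man)  -->  [a,man].
   
   %  the rules say how the meanings of the subparts are combined
   vp(Subj,shoots(Subj,Obj))  -->  [shoots],np(Obj).
   vp(Subj,shoots(Subj))  -->  [shoots].
   
   s(Sem)  -->  np(Subj),vp(Subj,Sem).

With this sketch, the query s(Sem,[a,woman,shoots,a,man],[]) would yield Sem  =  shoots(woman,man).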

Beyond context free languages

In the previous chapter we introduced DCGs as a useful Prolog tool for representing and working with context free grammars. Now, this is certainly a good way of thinking about DCGs, but it’s not the whole story. For the fact of the matter is: DCGs can deal with a lot more than just context free languages. The extra arguments we have been discussing (and indeed, the extra goals we shall introduce shortly) give us the tools for coping with any computable language whatsoever. We shall illustrate this by presenting a simple DCG for the formal language aⁿbⁿcⁿ\{ϵ}.

The formal language aⁿbⁿcⁿ\{ϵ} consists of all non-null strings made up of a’s, b’s, and c’s which consist of an unbroken block of a’s, followed by an unbroken block of b’s, followed by an unbroken block of c’s, all three blocks having the same length. For example, abc, aabbcc, and aaabbbccc all belong to aⁿbⁿcⁿ\{ϵ}.

The interesting thing about this language is that it is not context free. Try whatever you like: you will not succeed in writing a context free grammar that generates precisely these strings. Proving this would take us too far afield, but the proof is not particularly difficult, and you can find it in many books on formal language theory.

On the other hand, as we shall now see, it is very easy to write a DCG that generates this language. Just as we did in the previous chapter, we shall represent strings as lists; for example, the string abc will be represented using the list [a,b,c] . Given this convention, here’s the DCG we need:

   s(Count)  -->  ablock(Count),bblock(Count),cblock(Count).
   
   ablock(0)  -->  [].
   ablock(succ(Count))  -->  [a],ablock(Count).
   
   bblock(0)  -->  [].
   bblock(succ(Count))  -->  [b],bblock(Count).
   
   cblock(0)  -->  [].
   cblock(succ(Count))  -->  [c],cblock(Count).

The idea underlying this DCG is fairly simple: we use an extra argument to keep track of the length of the blocks. The s rule simply says that we want a block of a’s followed by a block of b’s followed by a block of c’s, and all three blocks are to have the same length, namely Count .

What should the values of Count be? The obvious answer is: 1 , 2 , 3 , 4 , and so on. But as yet we don’t know how to mix DCGs and arithmetic, so this isn’t very helpful. Fortunately, as this DCG shows, there’s an easier (and more elegant) way. Represent the number 0 by 0 , the number 1 by succ(0) , the number 2 by succ(succ(0)) , the number 3 by succ(succ(succ(0))) , and so on, just as we did in Chapter 3 (where we said that you can read succ as “successor of”). This choice of notation enables us to count using unification.
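
For instance, the ablock rules on their own already count in this way: the length of a list of a’s comes back encoded in the succ notation. We would expect:

   ?-  ablock(Count,[a,a,a],[]).
   Count  =  succ(succ(succ(0)))
   yes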

And this is precisely what our new DCG does. For example, suppose we pose the following query:

   ?-  s(Count,L,[]).

which asks Prolog to generate the lists L of symbols that belong to this language, and to give the value of Count needed to produce each item. Then the first four responses are:

   Count  =  0
   L  =  []  ;
   
   Count  =  succ(0)
   L  =  [a,  b,  c]  ;
   
   Count  =  succ(succ(0))
   L  =  [a,  a,  b,  b,  c,  c]  ;
   
   Count  =  succ(succ(succ(0)))
   L  =  [a,  a,  a,  b,  b,  b,  c,  c,  c]

The value of Count clearly corresponds to the length of the blocks. (Notice, by the way, that the first response shows that this DCG also accepts the empty string; to generate precisely aⁿbⁿcⁿ\{ϵ} we would have to rule out the case Count  =  0 for the string as a whole.)
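
We can also use the DCG the other way round, to check particular strings and to reject strings whose blocks differ in length; for example, we would expect:

   ?-  s(Count,[a,a,b,b,c,c],[]).
   Count  =  succ(succ(0))
   yes
   
   ?-  s(_,[a,b,b,c],[]).
   no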

So: DCGs are not just a tool for working with context free grammars. They are strictly more powerful than that, and (as we’ve just seen) part of the extra power comes from the use of extra arguments.
