The advent of generative grammar with the publication of Chomsky’s monograph Syntactic Structures in 1957 was a scientific revolution. From the beginning, this revolution was directed almost exclusively against the taxonomic conception of syntax. Since the generative revolution was essentially syntactic, our starting point must be syntax. We therefore need to examine taxonomic syntax briefly, indicating also some of the problems that it is inadequate to solve.
The Constituent Structure of Sentences
The Word
We will take the word as a basic unit of syntactic structure. By doing this, we are assuming that there is a set of procedures that, when applied to sequences of morphemes, will define a class of constituents consisting of one or more morphemes that may suitably be named ‘words’. In other words, the word is not an intuitively given unit but a procedurally defined classificational construct.
We will further assume that there is a set of procedures through which words can be classified into well-known classes of ‘nouns’, ‘verbs’, ‘adjectives’, ‘adverbs’, ‘pronouns’, ‘conjunctions’, etc.
Tree Diagram
At this point, let us introduce some technical terms from the generative theory of syntax. The terms are not used in taxonomic grammars, but the notions underlying them are implicit in the analysis of constituent structure. Consider the following abstract tree diagram.
The dots in the diagram are called nodes. The capital letters are node labels. The lower-case letters represent words. The relationship between the higher and lower nodes in the tree is one of domination. One node immediately dominates another if there is no intervening node between them. In the tree, A immediately dominates B and C; it dominates all other nodes. C immediately dominates F and G; it dominates z and q. There is no relation of domination between C and B, K and G, etc.
Conversely, if we move up the tree, the relationship is one of “is a”. Thus, m “is a” K, K + L “is a” B, etc. Besides these relations, the tree specifies the relation of precedence. For example, B precedes C, F precedes G, etc.
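The relations just defined can be made concrete in a short sketch. It rebuilds the abstract tree from the relations stated above and checks immediate domination, domination and precedence. The class and function names are our own, and the word under L is not given in the text, so the placeholder "x" is an assumption made only to complete the tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by label
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def immediately_dominates(a: Node, b: Node) -> bool:
    """True if b hangs directly below a, with no intervening node."""
    return any(child is b for child in a.children)


def dominates(a: Node, b: Node) -> bool:
    """True if b can be reached by moving down the tree from a."""
    return any(child is b or dominates(child, b) for child in a.children)


def leaves(node: Node) -> List[Node]:
    """Terminal nodes below (or equal to) node, in left-to-right order."""
    if not node.children:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def precedes(root: Node, a: Node, b: Node) -> bool:
    """True if a lies wholly to the left of b and neither dominates the other."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    position = {id(n): i for i, n in enumerate(leaves(root))}
    last_of_a = max(position[id(n)] for n in leaves(a))
    first_of_b = min(position[id(n)] for n in leaves(b))
    return last_of_a < first_of_b


# The abstract tree as described in the text.
m, x, z, q = Node("m"), Node("x"), Node("z"), Node("q")  # "x" is a placeholder word
K, L, F, G = Node("K", [m]), Node("L", [x]), Node("F", [z]), Node("G", [q])
B, C = Node("B", [K, L]), Node("C", [F, G])
A = Node("A", [B, C])

print(immediately_dominates(A, B))  # A immediately dominates B
print(dominates(C, q))              # C dominates q, though not immediately
print(dominates(C, B))              # no domination between sister nodes
print(precedes(A, F, G))            # F precedes G
```

Precedence is computed here from the left-to-right order of the words each node dominates, which is one common way of stating the relation; the text itself only illustrates it by example.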
Immediate Constituent Analysis
To study the structure of a sentence, structural linguists divided it into its immediate constituents (or ICs). The principle involved was cutting a sentence into two parts, cutting each of these parts into two again, and continuing the segmentation until the smallest unit, the morpheme, was reached. The concept of constituent structure is based on the observation that units that occur next to each other tend to belong together, in the sense that they are structurally closely related.
Consider the following sentence:
(1) The nice scouts who were camping in the wood have gone home.
This sentence consists of 12 words. These form the ultimate constituents of the sentence. As the first step in our analysis of the constituent structure of (1), we attempt to group the words in pairs. Likely candidates are ‘nice’ and ‘scouts’, ‘were’ and ‘camping’, ‘the’ and ‘wood’, and ‘have’ and ‘gone’. Once we have grouped these words, they are considered functional units or constituents. An operational test for the correctness of the analysis is substitution. If the groups are indeed constituents, it should be possible to substitute single words for them without affecting the basic syntactic pattern of the sentence.
(2) The women (nice scouts) who worked (were camping) in there (the wood) went (have gone) home.
We proceed like this through the sentence until all words are paired with a constituent. Let us assume that the final result of the analysis can be represented in terms of the following bracketed string:
(3) (((The (nice scouts)) (who ((were camping) (in (the wood))))) ((have gone) home))
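A bracketed string of this kind can be read mechanically. The following is a minimal sketch, not a procedure taken from taxonomic practice: it turns the bracketing into nested lists and recovers the words in their original order. The function names are our own, and the bracketing is written so that every pair of parentheses encloses exactly two constituents, with "wood" as in (1).

```python
def parse_brackets(text):
    """Turn a parenthesised string into nested lists of words."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # stack[-1] is the constituent currently being filled
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    if len(stack[0]) != 1:
        raise ValueError("expected one outermost constituent")
    return stack[0][0]


def words(tree):
    """Ultimate constituents (words) of a parsed tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for part in tree for w in words(part)]


BRACKETED = (
    "(((The (nice scouts)) (who ((were camping) (in (the wood)))))"
    " ((have gone) home))"
)

tree = parse_brackets(BRACKETED)
subject, predicate = tree  # the two ICs of the whole sentence

print(" ".join(words(subject)))
print(" ".join(words(predicate)))
print(len(words(tree)))
```

Unpacking the result into exactly two parts succeeds only because the outermost construction is binary; a bracketing with three constituents at the top level would make that line fail.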
We may also proceed in the opposite direction. The first step would be to segment or cut the sentence into two parts; the next is to segment each of these into two parts, and so on, until the rank of the word is reached. Again, the analysis would have to be controlled by environmentally determined substitution tests, studies of the distributional range of the units established by segmentation, etc.
We can now illustrate the procedure in the following way: 0 indicates where the first cut was made, 1 shows where the second cuts were made, etc.
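The numbering of cuts can be sketched as follows, assuming the binary grouping given in (3). Each boundary between two adjacent words receives the number of the step at which the top-down segmentation separates them. The tuple encoding and the function names are our own.

```python
# Sentence (1) with the grouping of (3), written as nested pairs.
TREE = (
    (("The", ("nice", "scouts")),
     ("who", (("were", "camping"), ("in", ("the", "wood"))))),
    (("have", "gone"), "home"),
)


def segment(tree, step=0):
    """Return (words, cuts): cuts[i] is the step that separates word i from word i + 1."""
    if isinstance(tree, str):
        return [tree], []
    left, right = tree  # every construction has exactly two ICs
    left_words, left_cuts = segment(left, step + 1)
    right_words, right_cuts = segment(right, step + 1)
    return left_words + right_words, left_cuts + [step] + right_cuts


def show(words, cuts):
    """Interleave the words with their cut numbers on one line."""
    parts = [words[0]]
    for cut, word in zip(cuts, words[1:]):
        parts.append(f"|{cut}|")
        parts.append(word)
    return " ".join(parts)


sentence_words, cut_numbers = segment(TREE)
print(show(sentence_words, cut_numbers))
```

The single cut numbered 0 falls between "wood" and "have", and the two cuts numbered 1 fall between "scouts" and "who" and between "gone" and "home", which is what a first and a second round of segmentation would give under this grouping.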
Let us now convert (3) and (4) into a tree diagram.
There are 11 nodes in the tree. Each of them immediately dominates two constituents: the immediate constituents (ICs) of the construction represented by the immediately dominating node. The tree shows a hierarchical layering of structures, the principle of togetherness-by-ranks. In other words, syntactic structure is not solely a matter of linearity but also a matter of depth.
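The count of nodes and the layering can be checked with a small sketch, again assuming the binary grouping of (3) and counting only the nodes that dominate a construction, not the words themselves. The encoding and the function names are our own.

```python
# Sentence (1) with the grouping of (3), written as nested pairs.
TREE = (
    (("The", ("nice", "scouts")),
     ("who", (("were", "camping"), ("in", ("the", "wood"))))),
    (("have", "gone"), "home"),
)


def constructions(tree):
    """Every node that dominates at least two constituents, top-down, left to right."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for part in tree:
        found.extend(constructions(part))
    return found


def count_words(tree):
    """Number of ultimate constituents below a node."""
    if isinstance(tree, str):
        return 1
    return sum(count_words(part) for part in tree)


def height(tree):
    """Number of layers between a node and its most deeply embedded word."""
    if isinstance(tree, str):
        return 0
    return 1 + max(height(part) for part in tree)


nodes = constructions(TREE)
print(len(nodes))                         # number of constructions
print(all(len(n) == 2 for n in nodes))    # each has exactly two ICs
print(count_words(TREE))                  # number of words
print(height(TREE))                       # depth of the layering
```

With strictly binary cuts, the number of constructions is always one less than the number of words, which is why 12 words yield 11 nodes.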
We said that substitution was one of the checks on the analysis. The two encircled nodes represent the last possibilities of substitution. At those nodes, we can substitute, say:
(6) Jack left
The sentence, then, can be viewed as successive expansions of this basic structure.
We may now try to omit some of the constituents to see what happens in terms of grammaticality (ungrammatical sentences are marked by *):
(7) The scouts who were camping in the wood have gone home
(8) * The nice who were camping in the wood have gone home
(9) * The nice scouts were camping in the wood have gone home
(10) * The nice scouts who have gone home
(11) The nice scouts have gone home
(12) * The nice scouts who were camping the wood have gone home
(13) * The nice scouts who were camping in have gone home
(14) * The nice scouts who were camping in the wood home
(15) The nice scouts who were camping in the wood have gone
The omission of ‘nice’ in (7) does not make the sentence ungrammatical. In (8), ‘scouts’ has been omitted, resulting in ungrammaticality. Consequently, ‘scouts’ must be syntactically more important than ‘nice’ in the sentence. (9), (10) and (11) show that ‘who’ and ‘were camping in the wood’ must either both be present in the sentence or both be omitted. (12) and (13) show that ‘in’ and ‘the wood’ are interdependent in the same way. Finally, (14) and (15) reveal that ‘have gone’ is more important in the sentence structure than ‘home’.
The pattern of (un)grammaticality revealed by (7) through (15) suggests that we are dealing with two fundamentally different types of construction. In one type, one of the ICs is the head or centre of the construction. In the other type, neither of the ICs constitutes the head: both are equally important.
Constructions with a head are referred to as endocentric, whereas those with no such head are termed exocentric. It will be apparent that the defining criterion of endocentricity and exocentricity is distribution: a construction is endocentric only if one of the ICs has the same (or roughly the same) distribution as the whole construction, whereas, in an exocentric construction, neither of the ICs has the same distribution as the entire construction.
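The omission judgments in (7) through (15) can be used as a rough stand-in for the distributional criterion: if one IC can be left out without loss of grammaticality, the remaining IC can occupy the place of the whole construction. The sketch below applies that reasoning to the four constructions tested above. It is only as good as the judgments fed into it, and the names and data layout are our own.

```python
# For each construction (left IC, right IC): can the left IC be omitted?
# Can the right IC be omitted? Judgments are those of examples (7) to (15).
OMISSION_OK = {
    ("nice", "scouts"): (True, False),                     # (7) ok, (8) *
    ("who", "were camping in the wood"): (False, False),   # (9) *, (10) *
    ("in", "the wood"): (False, False),                    # (12) *, (13) *
    ("have gone", "home"): (False, True),                  # (14) *, (15) ok
}


def classify(left, right, left_omissible, right_omissible):
    """Return (type, heads) for one construction.

    An IC counts as a head if it can stand alone in place of the whole
    construction, that is, if its sister can be omitted.
    """
    heads = []
    if right_omissible:
        heads.append(left)
    if left_omissible:
        heads.append(right)
    kind = "endocentric" if heads else "exocentric"
    return kind, heads


for (left, right), (left_ok, right_ok) in OMISSION_OK.items():
    kind, heads = classify(left, right, left_ok, right_ok)
    head_note = f", head: {heads[0]}" if heads else ""
    print(f"{left} + {right}: {kind}{head_note}")
```

On these judgments, ‘nice scouts’ and ‘have gone home’ come out as endocentric with ‘scouts’ and ‘have gone’ as heads, while ‘who were camping in the wood’ and ‘in the wood’ come out as exocentric.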
Each node is related to the two symbols on the branches immediately below. The structure is endocentric if one arrow points towards the node and the other away from the node. If both arrows point away from the node, the structure is exocentric.
Each integer from 13 to 23 specifies a construction. The two lines branching from each node represent the function of the ICs of the construction. Each of the constructions of which the sentence is made up, and ultimately each word, is a member of a form class. Suppose we adopt the following notational convention: F1 = function of left-branching constituent from a given node, F2 = function of right-branching constituent, and C = class/construction. In that case, we can assign the following structural description to (16):
A tree diagram with labelled nodes is called a phrase-marker, abbreviated P-marker.