Collocations, Creativity and Constructions

Cordula Glass

2019

978-3-8233-9171-5

Gunter Narr Verlag

Approaching collocations from a usage-based perspective, this study investigates how the development of collocational proficiency in first and second language attainment could be explained. Against the background of recent approaches in cognitive linguistics, such as construction grammar and Complex Adaptive Systems, it argues that collocations should not be regarded as idiosyncratic phraseological items which, depending on their degree of fixedness and semantic opaqueness, can be classified along a gradient of idiomaticity. Thus, this study regards collocations as dynamic linguistic phenomena which could be seen as subject to constant change, rather than as more or less static combinations with an additional level of syntagmatic and paradigmatic restrictions. Furthermore, it explores how creative changes and alternations of collocations can be used to learn more about a speaker’s cognitive processing of these phraseological phenomena and how this process might be influenced by language-external factors such as ‘age’, ‘education’ or ‘context’.

Multilingualism and Language Teaching
Edited by Thorsten Piske (Erlangen), Silke Jansen (Erlangen) and Martha Young-Scholten (Newcastle)
Volume 6

Cordula Glass
Collocations, Creativity and Constructions
A Usage-based Study of Collocations in Language Attainment

Bibliographic information of the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2019 · Narr Francke Attempto Verlag GmbH + Co. KG
Dischingerweg 5 · D-72070 Tübingen

This work, including all of its parts, is protected by copyright. Any use outside the narrow limits of copyright law without the consent of the publisher is prohibited and punishable by law. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic systems.

Internet: www.narr.de
E-Mail: info@narr.de

Typesetting: pagina GmbH, Tübingen
CPI books GmbH, Leck

ISSN 2199-1340
ISBN 978-3-8233-9171-5

For Hannelore and Manfred Glass

Contents

Figures and Tables
Acknowledgements
1 Three sides of the same coin? - Collocations, Creativity, Constructions
2 Collocations as Constructions
2.1 Context-Oriented Approaches
2.2 Significance-Oriented Approaches
2.3 A Construction-Oriented Approach
3 Collocations and Creativity
3.1 Creative Variation of Collocations
3.2 Creativity and Cognition
4 Creating Linguistic Creativity
4.1 Nativist Approaches
4.1.1 Interpretive Semantics
4.1.2 Conceptual Semantics and Parallel Architecture
4.2 Constructionist Approaches
4.2.1 Emergentist Approaches
4.2.2 Social-pragmatic Learning
4.3 Phraseology and Language Acquisition
4.3.1 Construction Grammar
4.3.2 A Stage Model
4.3.3 Complex Adaptive Systems
4.4 The DMCDC-Model: A usage-based model of collocations
4.5 Summary and Implications
5 Measuring Collocations - Methodological Considerations
5.1 Online Production Tasks - Corpus Data and Statistical Association Measures for Collocations
5.1.1 Traditional Association Measures
5.1.2 Corpus Data in Cognitive Linguistic Research
5.2 Offline Perception Tasks - Experimental Data in Usage-based and Constructionist Studies
5.3 Methodological Limitations and Shortcomings
5.3.1 Corpus Data
5.3.2 Judgement Tasks
5.4 Methodology of this Study
5.4.1 Instruments
5.4.2 Participants and Procedure
6 CollMatch
6.1 Native Speakers - Adult
6.2 Native Speakers - Children
6.3 Native Speakers - Patterns
6.3.1 Pattern 1: Gradual Acceptance
6.3.2 Pattern 2: Peaked Acceptance
6.3.3 Pattern 3: Steady Acceptance
6.3.4 Pattern 4: Receding Positive Evaluation
6.3.5 Distractors
6.3.6 Summary
6.4 Non-Native Speakers
6.4.1 Pattern 1: Gradual Acceptance
6.4.2 Pattern 2: Peaked and Dented Acceptance
6.4.3 Pattern 3: Steady Acceptance
6.4.4 Pattern 4: Receding Positive Evaluation
6.4.5 Distractors
6.4.6 Summary
6.5 Effects of Schooling
6.6 Summary and Implications
7 CollJudge
7.1 Native Speakers
7.1.1 Pattern 1: Preference of Established Variants
7.1.2 Pattern 2: Overall Acceptance
7.1.3 Pattern 3: Contextual Acceptance
7.1.4 Other Patterns
7.1.5 Summary
7.2 Non-Native Speakers
7.2.1 Pattern 1: Preference of Established Variants
7.2.2 Pattern 2: Overall Acceptance
7.2.3 Pattern 3: Contextual Acceptance
7.2.4 Pattern 4: Contextual Influence
7.2.5 Other Patterns
7.2.6 Summary
7.3 Comparing Corpus Data and Evaluations from Judgement Tasks
7.4 Summary and Implications
8 Main Results and Implications
8.1 Main Results of this Study
8.2 Limitations and Further Research
8.3 Implications for a Usage-based Approach Towards Language
8.3.1 First Language Acquisition
8.3.2 Second Language Acquisition and Learning
Appendices
Appendix I: Questionnaire
Appendix II: CollMatch (Acceptance Scores)
Appendix III: REC items vs. alternate combinations
Appendix IV: Raw Frequency Rankings and Association Measures for CollJudge
Appendix V: CollJudge (z-transformed acceptance scores)
References
Dictionaries
Corpora

Figures and Tables

Box 2.1: Entry for the lemma do from Thousand-Word English (Palmer / Hornby 1937: 43)
Box 2.2: Lexical functions of phraseological phenomena according to Mel’čuk (1995: 186)
Box 5.1: Test item from CollJudge - creative / simple variant of cook the tea / meal
Box 5.2: Code and example from the non-linguistic distractor task (Appendix I)
Box 6.1: Items of Gradual Acceptance (L1)
Box 6.4: Items with an overall tendency of receding positive evaluation (L1)
Box 6.5: CollMatch’s pseudo-collocations according to pattern (L1)
Box 6.7: Items of Gradual Acceptance (L2)
Box 6.8: Items of Peaked Acceptance and Dented Acceptance (L2)
Box 6.10: Items with an overall tendency of receding positive evaluation (L2)
Box 6.11: CollMatch’s pseudo-collocations according to pattern (L2)
Box 6.13: Items of Academic Rejection (L2)
Figure 1.1: Scope of this Study
Figure 2.1: Mel’čuk’s (1989, 1995) process of text production compared to de Saussure’s (1916 / 1967: 76-79) linguistic sign
Figure 2.2: Typology of word combinations (based on Hausmann 1984 and Bartsch 2004: 38)
Figure 2.3: Form and function levels of collocational construction, taking scornful tone as an example
Figure 4.1: The productivity cline according to Barðdal (2008: 172)
Figure 4.2: A stage model for the acquisition of formulaic language (Wray / Perkins 2000)
Figure 4.3: A dynamic model for the cognitive development of collocations (DMCDC)
Graph 6.1: Schematic overview of CollMatch’s collocations and pseudo-collocations evaluated by adult native speakers of English (see Appendix II for a detailed overview)
Graph 6.2: Example for Gradual Acceptance (GA) - L1 acceptability rating for drop hints
Graph 6.3: Example for Peaked Acceptance (PA) - L1 acceptability rating for meet a need
Graph 6.4: Example for Steady Acceptance (StA) - L1 acceptability rating for pull a face
Graph 6.5: Example for Academic Acceptance (AcA) - L1 acceptability rating for adopt an approach
Graph 6.6: Example for Dented Acceptance (DA) - L2 acceptability rating for make a move
Graph 7.1: Example for Preference of Established Variants (PREF) - L1 acceptability rating for pull a face
Graph 7.2: Example for Overall Acceptance (OA) - L1 acceptability rating for lend support / advice
Graph 7.3: Example for Contextual Acceptance (CONTEXT) - L1 acceptability rating for commit a crime / mistake
Table 2.1: Lexical and grammatical relations (based on Halliday 1966: 152-153)
Table 2.2: Corpus-based association measures for “Humpty Dumpty’s collocations”
Table 2.3: Categorisation of lexical word combinations according to Cowie (1983: xii-xiii) with examples from Howarth (1996: 15-16)
Table 2.4: Categorisation of open collocations (Cowie / Howarth 1995: 83)
Table 2.5: List of noun collocates for scornful according to the BNC
Table 5.1: Typology of linguistic methodologies (partly based on Siyanova / Schmitt 2008)
Table 5.2: Basic contingency table for observed and expected frequencies (based on Evert 2009: 1231)
Table 5.3: Ten most frequent verb collocates for the lemma crime according to MI, z-score, t-score and log-likelihood (BNC)
Table 5.4: Distribution of participants across languages and age groups
Table 6.1: Overview of group results from CollMatch based on Gyllstad (2007)
Table 6.2: Patterns identified in CollMatch based on each L1 group’s acceptance scores
Table 6.3: Overview of items with the same acceptance pattern or a similar acceptance score in adult L1 and L2 evaluations
Table 6.4: Patterns identified in CollMatch based on each L2 group’s acceptance scores
Table 6.5: Overview of group results from CollMatch in year 5 (Germany) and year 7 (Great Britain)
Table 6.6: Comparison of group results from CollMatch in year 5 (L2) and year 7 (L1)
Table 7.1: Items in CollJudge in their four different variants (original version of each item in bold print)
Table 7.2: CollMatch results according to the four test variants of CollJudge for adult L1 and L2 participants
Table 7.3: L1 speakers’ qualitative evaluation of items with the pattern of Preference of Established Variants sorted according to age and variant
Table 7.4: L1 speakers’ qualitative evaluation of items with the pattern of Overall Acceptance sorted according to age and variant
Table 7.5: L1 speakers’ qualitative evaluation of items with the pattern of Contextual Acceptance sorted according to age and variant
Table 7.6: L1 speakers’ qualitative evaluation of items with an unclear pattern sorted according to age and variant
Table 7.7: L2 speakers’ qualitative evaluation of items with the pattern of Established Variants sorted according to age and variant
Table 7.8: L2 speakers’ qualitative evaluation of items with the pattern of Overall Acceptance sorted according to age and variant
Table 7.9: L2 speakers’ qualitative evaluation of items with the pattern of Contextual Acceptance sorted according to age and variant
Table 7.10: L2 speakers’ qualitative evaluation of items with the pattern of Contextual Influence sorted according to age and variant
Table 7.11: L2 speakers’ qualitative evaluation of items with an unclear pattern sorted according to age and variant
Table 7.12: Raw frequencies and association measures for CollJudge items
Table 7.13: Contextual influence (Δp) on the evaluation of native and non-native speakers of English for established and creative variants within CollJudge

Acknowledgements

Very much like collocations, this book was part of a steady process of ‘creativity’, ‘construction’ and, of course, ‘change’. Many people helped me to pass through the different (and sometimes difficult) stages that applied research involves. There are too many to name them all, but I am grateful for their support and feedback. Some, however, deserve a very special mention: First, I would like to thank Professor Dr. Thomas Herbst and Professor Dr. Thorsten Piske for their valuable and constructive suggestions during all stages of this project. I also wish to express my sincere gratitude to Dr. Christina Schelletter for helping out when help was needed the most. Furthermore, heartfelt thanks go to Dr. Susanne Dyka, Dr. Carolin Ostermann, Eva Scharf and Thomas Binder for their feedback, encouragement and friendship. This book would not exist without them.
I am also grateful to my colleagues as well as to all students, teachers and parents on both sides of the Channel who, quite literally, provided the foundations for this project. Finally, I wish to thank my family for their patience, support and encouragement. They are my ‘base’ and inspiration.

1 Three sides of the same coin? - Collocations, Creativity, Constructions

Putting together novel expressions is something that speakers do, not grammars. It is a problem-solving activity that requires a constructive effort on the part of a speaker and occurs when he puts linguistic convention to use in specific circumstances. (Langacker 1987: 85)

Speakers, as Langacker (1987: 85) points out, form one of the key elements of a language. Not only are they the force that brings a language to life, but they are also the motor that shapes conventions and creates new expressions, phrases, or even grammatical constructions. Even aspects of language which are traditionally defined through their invariability or at least partial fixedness, such as idioms or collocations, are not immune to creative alternations and change, as sentences¹ (1) to (3) show.

(1) BNC K51 1683
    Politicians seem to work on the assumption that the early bird catches the voter.
(2) BNC CJA 2253
    […] and Tabitha was captivated despite herself, watching the pretty man play and wondering how he would end it, how he could ever resolve the disagreement between the rush and the ebb […]
(3) BNC H8H 1277
    Now I have, and I'm telling you that if you marry him then you'll be committing the biggest mistake in your brief little life.

This conflict between fixedness and change forms the basis for this study. The following pages focus on the apparent tension between established, idiomatic items and creative alternations, which ultimately cause a language to change and evolve. Since idioms², like the early bird catches the worm, are usually considered to function more or less as one complex, rather invariant, unit of meaning, which is made up of several words³ but expresses a unified concept, it might seem rather surprising to find creative alternations as in (1). However, collocations, like pretty woman or commit a crime⁴, play an even more interesting role. Most definitions (see chapter 2) would, in fact, agree that the essence of this phraseological phenomenon is a strong, partly inexplicable, bond which seems to link all items within a collocation. In fact, these collocates have often been argued to be so closely associated that the thought or perception of one collocate seems almost automatically to activate the other(s)⁵: commit would trigger a word such as crime, while pretty is likely to elicit woman. Therefore, talking about creative, “novel expressions” in the context of collocations seems counterintuitive at first. Nevertheless, for most definitions a second decisive feature of collocations is a certain degree of flexibility within their components, as in pretty woman, with alternations like pretty girl or pretty face, or commit, which, amongst others, can also be found with offence or act. In some cases, these collocational combinations can then even extend to rather unlikely or even novel, yet decodable, combinations like pretty man or commit a mistake.

¹ The examples at hand are taken from the British National Corpus (BNC), with the relevant elements underlined for the purpose of this study. In terms of frequency, sentences (1) and (2) are the only instances of the early bird catches the voter and pretty man respectively, while phrases containing the lemma mistake in combination with the lemma commit (with a span of +/-4) occur more frequently, with a total of eight acceptable hits within the corpus.
² For a more detailed discussion of idioms and their relationship to collocations, compare Wulff (2010: 11-12); Howarth (1996: 47); Cruse (1986); and Cowie / Mackin (1983: viii-ix).
³ As the examples above show, the notion of word can, at times, be misleading, since it is not a clearly defined concept. It might apply to a sequence of letters or phonemes separated from other sequences by a space or a pause (such as bear and bird, or bears and birds), but also to a sequence of these items which constitutes one concept, such as teddy bear or the early bird catches the worm. Hence, Matthews (1974) suggests using the term lexeme for a unit referring to “fundamental” concepts while keeping the term word for a lexeme’s instantiations. According to this distinction, teddy bear could then also be regarded as a multi-word lexeme.
⁴ These examples have been repeatedly used to discuss collocations and collocative meaning (compare for example Leech 1974; 1998: 20; Palmer 1976: 95-97; Klotz 1998: 92-95).
⁵ Herbst (1996), for example, showed that native speakers of English tend to agree on similar solutions in a completion task, but also that, compared to non-native participants, native speakers give a smaller range of alternative collocates. Hoffmann and Lehmann (2000) furthermore compared native as well as non-native speakers’ intuition against corpus data for low-frequency collocations from the BNC and found that most of the native speakers scored above chance and were able to predict most of the missing collocates in a fill-in-the-gap task. In a reproduction task, Conklin and Schmitt (2008) focused on the cognitive processing of formulaic sequences; most of their native as well as non-native participants were able to process formulaic sequences faster in a self-paced reading task, which further supports the assumption that formulaic combinations like collocations are, to a certain extent, cognitively linked.

However, as the examples in (2) and (3) show, a pretty man is not necessarily the same as the linguistically much more frequent⁶ handsome man. Likewise, if one commits a mistake, this is very unlikely to be the same kind of act as making a mistake. A pretty man tends to be associated with rather female features or behaviour, and the phrase is often used in a derogatory or objectifying way, as for the suitor of the emancipated and self-confident protagonist⁷ in sentence (2). A mistake which is committed, on the other hand, is very likely to be a euphemistic phrasing referring to a serious offence or, as in (3), a similarly life-changing, yet wrong, decision. In these cases pretty and commit seem to coerce their respective noun phrases (NP) into a reading which is much closer to their established collocational meaning than to the more common combinations with handsome or make.

In past publications on collocations, these examples have been treated as separate phenomena. They have either been classified as some kind of deliberate, creative, literary form of language use (cf. Hausmann 1984 on counter-creations), more or less brushed aside as lexical idiosyncrasies or peripheral phenomena⁸ (Chomsky 1965, Palmer 1976, Klotz 1998), or not been mentioned at all. One notable exception is Mackin (1978), who advocates a lexicographical description of phraseological language which not only focuses on the prototypical form of an idiom or collocation but also takes into account creative alternations, which he calls nonce uses (Mackin 1978: 163-164). Also, comprehensive studies⁹ on collocations tend to focus on high-frequency or highly associated collocational pairs. This leaves more creative versions out of the picture, since, if viewed individually, for example in a large corpus like the British National Corpus (BNC¹⁰), they are often a low-frequency phenomenon.

⁶ The frequencies for the respective lemma combinations from the BNC (span +/-4) around the noun collocate: pretty+man (2), handsome+man (151), commit+mistake (8), make+mistake (2002).
⁷ Sentence (2) is taken from the award-winning science fiction novel Take Back Plenty (Greenland 1990: 81), whose main protagonist is an extraordinary, strong woman who is independent in every aspect of her life.
⁸ Approaches within the framework of generative grammar tend to avoid phraseological phenomena in general and collocations in particular. This is largely because, in the generative tradition, semantic phenomena and their distributional properties are regarded as peripheral. In Aspects of the Theory of Syntax, for example, Chomsky (1965) suggests two sets of rules: “strict subcategorisation rules” and “selectional rules” (Chomsky 1965: 95). For Chomsky, a violation of selectional rules, for example commit a mistake instead of make a mistake, would still lead to a decodable syntactic structure and can thus be regarded as a less central aspect within a linguistic system. Rögnvaldsson (1993), on the other hand, argues that even though collocations are usually not explicitly accounted for in universalistic publications, later versions of Chomsky’s conception of language in particular, such as The Minimalist Program (Chomsky 1995), can provide a suitable framework to explain collocational combinations.
⁹ See, for example, Bahns (1996) and Steinbügl (2005) with a focus on lexicography; Lehr (1996), Nesselhauf (2004) and Bartsch (2004) with a focus on corpus linguistic methods; or Howarth (1996), Gyllstad (2007) and Jehle (2007) with a focus on language attainment and testing.
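The span-based lemma counts cited in footnotes 1 and 6 (span +/-4) can be illustrated with a minimal sketch. It assumes a text that has already been tokenised and lemmatised; the toy input and the function name `count_cooccurrences` are hypothetical and are not taken from the BNC or from the tools used in this study. Corpus query systems may also count hits differently, so the figures such a sketch produces need not match published frequencies.

```python
from typing import List


def count_cooccurrences(lemmas: List[str], node: str, collocate: str, span: int = 4) -> int:
    """Count tokens of `collocate` found within +/- `span` positions of each `node`."""
    hits = 0
    for i, lemma in enumerate(lemmas):
        if lemma != node:
            continue
        # Window of up to `span` tokens to the left and right, excluding the node itself.
        left = lemmas[max(0, i - span):i]
        right = lemmas[i + 1:i + span + 1]
        hits += (left + right).count(collocate)
    return hits


# Hypothetical, hand-lemmatised toy input (not BNC data).
toy = "you will be commit the big mistake in your brief little life".split()

print(count_cooccurrences(toy, node="mistake", collocate="commit"))
```

Narrowing the window, for example with `span=2`, would miss this pair, which is one reason why reported collocation frequencies always need to be read together with the span that was used.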
Even in association measures like mutual information (MI), these combinations lose out against rare collocates, like cottage-residence or hara-kiri. At the same time, these non-standard alternations of collocations are particularly interesting, because they are infrequent yet not incomprehensible, which shows that, to a certain extent, creativity and change need a base of established linguistic structures to be interpreted against. The most basic level which could be assumed would, of course, be simple syntactic rules in a traditional grammar-lexicon model (Chomsky 1965). Here, it would be argued that the examples of pretty man and commit a mistake are merely combinations of two lexical items which are formed ad hoc on the basis of established syntactic structures, such as adjective plus noun, [Adj+N]¹¹, or verb phrase plus noun phrase in object position, [VP+NP]. Still, this does not explain why these words are then interpreted against the background of a related, more established, actual collocation. Therefore, these instances raise the question of whether the interpretation of these creative word combinations might not indeed be cognitively supported by more common, entrenched collocational pairs. This connection, however, would imply that the reason different collocations operate on a gradient spectrum of fixedness is not just a linguistic fact or even coincidence, but that the degree of variability might depend on other factors, like the frequency of input or cognitive entrenchment¹². Moreover, as our first interpretation of combinations like pretty man or commit a mistake demonstrated, these creative alternations might support approaches which suggest that not only traditional lexical items, such as words or compounds, but also more abstract constructions, such as [pretty+N] or [commit+NP], could have their own level of meaning.
In the last decades, several cognitive and constructionist¹³ approaches have developed which have explicitly or implicitly committed themselves to this idea and thus regard language as a network of elements consisting of a formal as well as a functional side. Furthermore, they also see a speaker as the user as well as a source of linguistic innovation, and thus no longer distinguish between more or less normative language competence and a speaker’s actual performance. Instead, they see both the language and its speakers as inseparable parts of a dynamic system¹⁴. One prominent branch is usage-based theories, which have already been able to show specific effects of linguistic input and frequency on language attainment (Bybee 2010; Ellis 2006; Tomasello 2005; Bybee / Hopper 2001). Another school of thought, construction grammar, developed in recent decades and focuses explicitly on language as a system of form-function pairings, so-called constructions (Ziem / Lasch 2013; Goldberg 2006, 1995). In both these approaches, the dual role of a language user as the recipient as well as the source of linguistic conventions and change is central.

¹⁰ Data cited with “BNC” has been extracted from the British National Corpus, distributed by Oxford University Computing Services on behalf of the BNC Consortium. All rights in the texts cited are reserved.

¹¹ In this study, square brackets will be used to indicate that the respective sequence could be regarded as the formal representation of a construction (cf. Stefanowitsch / Gries 2003).

¹² Entrenchment in this chapter is used independently from any approach or school of thought. The discussion of entrenchment vs. pre-emption will later be part of chapter 4.

¹³ See Fischer / Stefanowitsch (2006), Ziem / Lasch (2013) or Hoffmann / Trousdale (2013) for a detailed overview of constructionist approaches.
More recently, work on constructionalisation (Traugott 2015; Traugott / Trousdale 2013; Hilpert 2008) and cognitive sociolinguistics (Hollman 2013; Grondelaers / Speelman / Geeraerts 2007) has fruitfully applied the methods and concepts of constructionist approaches to diachronic language research as well as to sociolinguistic studies. At the same time, these advances show that, while rather comprehensive at every level of a linguistic system, construction grammar has a tendency to focus on more or less isolated constructions. Thus, it needs to remind itself that contextual factors like time in general, as well as a speaker’s age, education or social class, might influence the outcome of a study. Still, as has been mentioned previously, creative alternations of collocations tend to be the exception rather than the norm; hence, not every speaker might use or even tolerate the same level of creative language. For the comprehension of complex syntactic constructions, for example, Dąbrowska¹⁵ (1997) was able to show that native speakers’ acceptance diverged drastically, based on the educational background of the participants. In Chipere (2003) as well as Dąbrowska and Street (2006), non-native speakers even outperformed their native speaker counterparts in a task on the comprehension and recall of complex sentences and the acceptability of plausible and implausible sentences. Thus, the connection between a collocation and its more creative alternations might be able to tell us something about the nature of collocations as such, and also serve as an indicator for potential stages of mental processing in language attainment. As they are generally constructed according to common syntactic patterns (like [Adj+N] or [VP+NP]) but apparently restricted combinatorially, collocations can be seen neither as a purely syntactic nor as a clearly lexical phenomenon. They operate on an in-between level, which makes them a challenging subject for any comprehensive model of language. However, since to date most studies have focused either on highly frequent combinations or on collocations which were approved by academically trained evaluators¹⁶, the status of more creative alternations ranges from deviant exceptions to creative instances of language use. Only rarely has creative language use been seen as a potential next step in an ongoing language development process. But in recent years, advances in usage-based theories and construction grammar have resulted in a new perspective on language acquisition and development.

¹⁴ Based on his conception of an ideal speaker-hearer, Chomsky (1965: 4) argues that language falls into a normative - as Chomsky would argue - partly inborn system of rules, a speaker’s competence, and his / her actual performance. This is not to be confused with Coseriu’s (1973) dichotomy of system and norm, which distinguishes between potentially possible and actually used combinations of linguistic elements. Thus, system and norm could also be applied in a more usage-based and / or constructionist context, while competence and performance would contradict a view of language attainment where a speaker’s system develops out of continuous language use.

¹⁵ In a later study, Dąbrowska and Street (2010) were able to obtain similar results in a comparison of participants with “LowAcademicAchievement (LAA)” and “HighAcademicAchievment (HAA)”. In addition, they also showed that training resulted in a better performance for LAA test takers. Furthermore, in Dąbrowska (2010), participants’ evaluations did not only vary depending on a participant’s linguistic expertise, but also according to the degree of syntactic complexity.
Seeing language as a continuum of ongoing change, several linguists began to approach language as a dynamic, complex adaptive system (Ellis / Larsen-Freeman 2009; Larsen-Freeman / Cameron 2008), which forms structures and abstractions from the input of its environment but also contributes to a steady change by feeding new and sometimes novel or creative utterances into the system. Thus, these approaches share the assumption that the human brain shapes structures, and potentially even rules, based on the input it receives. These structures are then combined to form new utterances. Here, studies in morphology (Bybee 1995) and syntax (Ambridge / Goldberg 2008; Tomasello 2005) have already been able to show that analogy plays a crucial role when it comes to the (re)combination of established items and structures, which might also be responsible for the development of novel and creative combinations. Yet, while morphology and syntax are traditionally regarded as a rule-based system with only a few idiosyncrasies, collocations are often defined by their idiomatic and unpredictable nature. As mentioned before, they could, therefore, be seen as in-between phenomena, which would make them interesting structures through which to fathom the interaction and effects of linguistic innovation and convention. Therefore, this study seeks to investigate the constructional potential of collocations and also whether creative alternation within a collocational combination could lead to its manifestation as a construction. While context- and significance-oriented approaches alike have treated collocations as descriptively interesting exceptions, this text approaches the phraseological phenomenon from a (language) attainment point of view.

¹⁶ Compare for example Howarth (1996), Nesselhauf (2004), Bartsch (2004) or Gyllstad (2007).
Partly inspired by previous studies on semantic prosody (Stewart 2010; Sinclair 2004, 1991; Partington 2004, 1998; Bublitz 1995; Louw 1993), it argues that, to a certain extent, collocations, creativity, and constructions could also be interpreted as different stages of linguistic development. The following pages are based on the idea that, not unlike coins, which, in fact, are three-dimensional objects, collocations have often been approached from two angles: as a partly opaque, phraseological phenomenon, or as a rather interesting frequency effect (Granger / Paquot 2008; Bartsch 2004; Herbst 1996). A third dimension, consisting of temporal and social context, which, like the rim of a coin, might link both sides, has thus far gained little attention. This missing link could be found by looking at the genesis of individual collocational phenomena. But a closer examination of collocational creativity against a usage-based, constructional background is not just interesting from a purely theoretical point of view. Since English today is widely regarded as a key competence, not only in terms of vocational expertise but increasingly also as an essential skill in a private sphere which is becoming ever more global and international, the need to use and understand English correctly is still growing. In the last decades, a plethora of studies has already shown that, because of their seemingly unpredictable character, collocations are one of the most challenging phenomena for non-native speakers of English (amongst others: Howarth 1996; Granger 1998; Nesselhauf 2004; de Cock 2004). With seemingly arbitrary syntagmatic as well as paradigmatic restrictions, they operate outside the traditional “slot-filler model” (Sinclair 1991: 109). Against this background, items such as collocations, which do not allow for any random lexical filling within a syntactic structure, have to be memorized.
Hence, even more advanced learners struggle with native-like English phrasing and might at times despair, since they might have been marked down for using combinations such as pretty man or commit a mistake, while native speakers seem to be allowed to use them, although admittedly only under certain circumstances. Thus, the three focal research questions (RQ) of this study are:

RQ 1: Are collocations a cognitively stored entity, and if so, how can this perspective be adequately described in a comprehensive model of collocational combinations?

RQ 2a: Is there a unified process underlying the attainment of collocational proficiency?

RQ 2b: Does the collocational proficiency of native (L1) and non-native (L2) speakers of English develop in the same way?

RQ 3: What role do the factors ‘creativity’ and ‘context’ play for the acceptability and analysis of collocational phenomena?

To approach these questions, the following pages deal with a more detailed discussion of the subject matter, collocations (> 2), as well as creativity and contextual change (> 3). Here the emphasis lies on the concepts themselves, and their theoretical background, as well as their position in modern (cognitive) language attainment research, to extrapolate which factors might contribute to the acceptance of creativity and whether these could be used to explain a collocation’s creative alternation. Together with chapter 4, they contribute to RQ 1. Introducing a first cognitive conception of collocation (> 2) and discussing how and why change and creativity can be seen as a cognitive faculty (> 3), they lay the foundations for a more comprehensive model of collocation as a cognitive phenomenon within the process of language attainment (> 4).
Thus, chapter 4 zooms in on the question of how approaches to language acquisition can contribute to RQ 1, and how these support a more comprehensive picture of the potential connections between collocations, creativity, and constructions. Combining different usage-based approaches towards phraseological phenomena (> 4.3), this Dynamic Model for the Cognitive Development of Collocations (DMCDC-model; > 4.4) is then put to an initial test in the subsequent chapters. In order to lay the methodological groundwork for RQs 2 and 3, chapter 5 is concerned with ways to operationalise the theoretical considerations from chapters 2 to 4 and discusses which measures need to be taken in order to be able to investigate differences between first and second language speakers as well as the role contextual factors might play. This methodology is then applied and discussed in chapters 6 and 7. Finally, chapter 8 lays out this study’s major findings and implications. Figure 1.1 illustrates the outline of the present study.

Figure 1.1: Scope of this Study

2 Collocations as Constructions

'When I use a word,' Humpty Dumpty said in rather a scornful tone, 'it means just what I choose it to mean - neither more nor less.' 'The question is,' said Alice, 'whether you can make words mean so many different things.' […] 'When I make a word do a lot of work like that,' said Humpty Dumpty, 'I always pay it extra.' 'Oh!' said Alice. She was too much puzzled to make any other remark. (Carroll 1871 / 2001: 224-225)¹

The debate about the defining properties of collocations seems to be as old as research within the field of lexical co-occurrences itself.
This is partly because studies on the character and value of collocations have been conducted for various reasons and purposes: the areas of research range from a very applied EFL context (de Cock 1999; Bahns 1997; Howarth 1996; Cowie / Howarth 1995; Hausmann 1984) to lexicography (Cowie 2012; Mel’cuk 1998; Benson / Benson / Ilson 1997; Bahns 1996; Hausmann 1985) and more theoretical, general linguistic description (Coseriu 1967; Firth 1951 / 1964, 1957 / 1968). However, a duality in conception might also have developed from the fact that interest in these special cases of lexical co-occurrence arose at roughly the same time within different schools of linguistics. British Contextualism, with its most prominent representative J. R. Firth at its centre, is frequently quoted as the cradle of the modern concept of collocation (Barnbrook / Mason / Krishnamurthy 2013: 36; Bartsch 2004: 30; Lehr 1996: 7), but within lexicography, too, authors like Palmer (1933), Hausmann (1984) or more recently Siepmann (2005) concerned themselves with lexical co-occurrences. Their focus still tends to lie more on the properties of collocations than on their contribution to the human linguistic system. Hence today, the field of collocational research seems to be split into context-oriented approaches and significance-oriented approaches² (Granger / Paquot 2008; Siepmann 2005; Herbst 1996). Based on John Sinclair’s research (Sinclair 1991, 1966; Sinclair / Jones / Daley 1970 / 2005) on lexical frequency, context-oriented approaches nowadays mostly come in the shape of corpus-based, frequency-oriented research, while representatives of the significance-oriented approach focus on typological aspects relevant for the non-native language learner, such as (non-)compositionality and variability. Today, however, both sides seem to have reached a point where they realise that they have more in common than they disagree on. Constructions, on the other hand, are in their broadest sense defined as “form and meaning pairings” (Goldberg 2006: 3). So, with the literal translation of collocation as a certain kind of “placing together” (Palmer 1933: 7), the term as such might suggest that the phenomenon of collocation is predominantly regarded as a formal or structural one, lacking an overall meaning dimension, which would be crucial for any kind of construction (> 4). But, as the following chapters will show, even very early accounts of collocation not only consider syntagmatic relations for the constituents of a collocation, but also discuss implications for meaning which stem from a contextual or paradigmatic level within an analysis.

¹ In this quote from Through the Looking-Glass, first published in 1871, Carroll uses the sharp tongue of the egg-shaped creature Humpty Dumpty to remark on the polysemous character of words. Later, in the article “The stage and the spirit of reverence”, the author picks up on the sociolinguistic dimension of this thought: “[…] no word has a meaning inseparably attached to it; a word means what the speaker intends by it, and what the hearer understands by it, and that is all … This thought may serve to lessen the horror of some of the language used by the lower classes, which it is a comfort to remember, is often a mere collection of unmeaning sounds, so far as speaker and hearer are concerned.” (Carroll 1871 / 2001: 224). It is interesting to note that this quote not only emphasises the crucial role of textual as well as individual context, but also that utterances which Carroll might have considered part of the “horror of some of the language used by the lower classes” might today be well established and accepted.
This suggests that collocations might be more than just formal, item-specific restrictions on word co-occurrences and that there is an inherent meaning dimension, which makes it possible to regard collocations as a form of construction in a construction grammar sense. Following this idea, this chapter will use a selection of prominent approaches towards collocations from both camps and investigate their understanding of lexical co-occurrences to shed some light on their potential constructional character. Chapters 2.1 and 2.2 will, therefore, outline the two basic views on collocation and highlight potential connections to a modern construction grammar approach. Chapter 2.3 then concludes this section, suggesting a working definition and addressing some critical issues which need to be dealt with once a study assumes cognitive features to be part of collocational phenomena.

² The following chapters will refer to these views as “context-oriented approaches” and “significance-oriented approaches”. Often, “significance-oriented” is associated with statistical tests for significance, which would make “significance-oriented” a term to be attributed to corpus-based studies of collocations. The taxonomy of this study, however, is based on the respective focus of the different approaches. Thus, researchers who consider the context as the source of collocational combinations are grouped under the headline of “context-oriented”, while definitions which focus on the prominence of an individual collocational combination are discussed under the label of “significance-oriented”.

2.1 Context-Oriented Approaches

After over 50 years of debate, the famous words “You shall know a word by the company it keeps!” (Firth 1957 / 1968: 179) still seem to sum up the quintessential character of any definition of collocation quite well.
But Firth was not the first to acknowledge the importance of context and lexical co-occurrence for the semantic interpretation of a word. In lexicographic description, authentic context and combinatorial restrictions were already a concern for lexicographers like Samuel Johnson³ (1747 / 1837) and later Harold Palmer (1933). Yet, their focus was an accurate and, in the latter’s case, learner-appropriate account of the English language (> 2.2), while Firth shifted the perception of collocation from an observable phenomenon to a linguistic principle, which he considered one of the basic relations within a linguistic system, alongside grammar:

Collocations of a given word are statements of the habitual or customary places of that word in collocational order but not in any other contextual order and emphatically not in any grammatical order. The collocation of a word or a ‘piece’ is not to be regarded as mere juxtaposition, it is an order of mutual expectancy. (Firth 1957 / 1968: 181)

This definition of collocation is deeply rooted in Firth’s firm belief in the importance of context, which he considers the main source of any kind of meaning. Unlike, for example, Ogden and Richards (¹⁰1956: 10-11) and modern cognitive linguists, he claims that meaning is not a “hidden mental process, but chiefly […] situational relations in a context of situation […]” (Firth 1951 / 1964: 19). Hence, to Firth, collocation is a lexical phenomenon which operates on a syntagmatic level to create meaning through (inter-)relation and “mutual expectancy” of words. Compared to most modern definitions of collocation, this seems to be a very restricted view, but at a time when words and their meaning were nothing more than a peripheral phenomenon within linguistic research, a statement like this was a first attempt to shift the focus from a grammar-centred slot-filler perspective to a conception of language which needs both lexicon and syntax to form meaningful utterances.

³ In his The Plan of a Dictionary of the English Language Johnson (1747 / 1837: 442-443) wrote: “Words having been hitherto considered as separate and unconnected, are now to be likewise examined as they are ranged in their various relations to others by the rules of syntax or construction, to which I do not know that any regard has been yet shewn in English dictionaries, and in which the grammarians can give little assistance. The syntax of this language is too inconsistent to be reduced to rules, and can be only learned by the distinct consideration of particular words as they are used by the best authors. Thus, we say according to the present modes of speech, The soldier died of his wounds, and the sailor perished with hunger: and every man acquainted with our language would be offended by a change of these particles, which yet seem to be originally assigned by chance, there being no reason to be drawn from grammar why a man may not, with equal propriety, be said to die with a wound, or to perish of hunger.” It is quite remarkable that many ideas which can still be found at the centre of many linguistic debates were already mentioned by Johnson within these few lines, like, for example, the idiosyncratic and usage-based character of a language or the importance of individual speakers and syntactic constructions.
One could even argue that Firth’s acknowledgement of meaning-building syntagmatic relations in language did in fact lay the ground for modern cognitive linguistics (> 4), since it is based on the idea that meaning arises through context, which is ultimately nothing but usage and therefore in its basic assumption quite similar to modern usage-based approaches (Bybee 2010; Goldberg 2006; Tomasello 2005). Unfortunately, Firth never gave a more detailed account of what he considers to be sufficient context for “meaning by collocation”, but it becomes apparent from his publications and a few analyses (Firth 1951 / 1964) that this does not simply refer to some kind of compounding:

Meaning by collocation is an abstraction at the syntagmatic level and is not directly concerned with the conceptual or idea approach to the meaning of words. One of the meanings of night is its collocability with dark, and, of dark, of course, collocation with night. (Firth 1951 / 1964: 196)

The example of dark night, however, seems to be somewhat unfortunate. First, because it suggests a certain spatial closeness of collocates, which, as the next example will show, is not a necessary prerequisite for the concept of “context” and “meaning by collocation”. Secondly, in this case, it is difficult to argue for “meaning by collocation”, because the co-occurrence of dark and night could also refer to real-life, extra-linguistic experience, since darkness is one of the predominant semantic components of night, even without its co-occurrence with dark (Herbst 1996: 384). Therefore, another of Firth's frequently quoted examples might be more suitable to explain the basic assumptions of this early contextualist approach: you silly ass. Here, Firth claims that the additional meaning which arises through collocation is some kind of “personal reference” (Firth 1951 / 1964: 194-195).
So, when encountering a sentence like (4) one would directly conclude that the referent of ass is a person and not a donkey, because of past experiences with similar cases, such as (5) to (8)⁴:

(4) BNC A0D 1316 I'm on in five minutes, that old ass slowed me down.
(5) An ass like Bagson might easily do that.
(6) He is an ass.
(7) You silly ass.
(8) Don’t be an ass.

Take another example from this chapter’s introductory quote: scornful tone. In this case, one can conclude that what the egg-shaped creature said was not very nice or polite even without necessarily being familiar with an adjective like scornful, since tone is often encountered with negatively marked premodifications such as dismissive or harsh⁵. However, these examples show that, even though Firth explicitly stresses the syntagmatic character of collocations, his conception nevertheless includes an inherent paradigmatic level through the notion of “mutual expectancy”, since any expectation is based on previous experience. Therefore, in order to know the company a word keeps, one needs to have encountered this very word in various contexts and processed it with the help of some kind of cognitive storage. Yet, whether Firth would agree with this observation or not is difficult to tell, since his description of collocations remains rather vague throughout his publications (Firth 1951 / 1964; 1968) and Lyons (1977: 612) is certainly right when he observes: “Exactly what Firth meant by collocability is never made clear.” Despite all criticism, Firth’s thoughts on collocation and its influence on the meaning of a lexical unit did inspire his students, John Sinclair and Michael Halliday.

⁴ Here, examples (5) to (8) are taken from Firth (1951 / 1964: 195).
Based on Firth’s first sketch of these special phenomena of partly restricted co-occurrence, two approaches towards collocation have emerged: a frequency-oriented approach with corpus linguists like John Sinclair and Göran Kjellmer at its centre and a text-oriented approach, which was primarily developed by Halliday. While Sinclair’s definition of collocations as an observable phenomenon of statistical significance within a linguistic system is still the basis for collocational research today (among others Hanks 2013; Moon 2009; Bartsch 2004; Stefanowitsch / Gries 2003; Hunston / Francis 2000), Halliday’s concept of collocation as a meaning-creating phenomenon is frequently rejected as a misnomer for a special form of cohesive tie within textlinguistics (Herbst 1996: 381). However, especially in his early publications, it was Halliday who explicitly added a paradigmatic level to a contextualist perception of lexical co-occurrences. He was, therefore, the first to acknowledge an experience-based or usage-based dimension within lexis. Like his approach to grammar, Halliday’s general conception of collocation is strongly influenced by Firth’s notion of the importance of context. Yet, employing a more systematic definition of basic principles within language, he tries to base the relations of lexical as well as syntactic linguistic elements on conceptually more solid ground. He introduces a two-dimensional model to illustrate the most prominent interdependencies of language (Halliday 1966: 152-153).

⁵ Also compare the discussion of semantic prosody (> 3.1).
            chain (syntagmatic)    choice (paradigmatic)
grammar     structure              system
lexis       collocation            set

Table 2.1: Lexical and grammatical relations (based on Halliday 1966: 152-153)

Following de Saussure (1916 / 1967: 147-148) he argues for a lexico-grammatical as well as a syntagmatic-paradigmatic level of analysis and positions collocation at the intersection of the syntagmatic and the lexical (table 2.1). By definition, a paradigmatic level is excluded here. This would make collocation a rather broad category for all things within one string of words (= chain), but not every linear co-occurrence would count as a collocation in Halliday’s book. Again, the notion of “mutual expectancy” or “significant proximity” plays a crucial role:

First, in place of the highly abstract relation of structure, in which the value of an element depends on complex factors in no sense reducible to simple sequence, lexis seems to require the recognition merely of linear co-occurrence together with some measure of significant proximity, either a scale or at least a cut-off point. It is this syntagmatic relation which is referred to as ‘collocation’. (Halliday 1966: 152)

Coming back to the example of Humpty Dumpty, what Halliday would presumably regard as collocations are expressions like neither more nor less, do work or make a remark, not so much because of their reoccurring character throughout a text or discourse, as would be the case for Firth’s concept of “meaning by collocation”, but rather because of their restricted nature. He therefore stresses that “[i]n lexical analysis it is the lexical restriction which is under focus: the extent to which an item is specified by its collocational environment.” (Halliday 1966: 156). Neither more nor less, for example, could not, or at most fairly rarely, be expressed as neither less nor more⁶, do work is much more usual than make work, and one makes a remark rather than does a remark.
The adjective scornful in a combination like scornful tone, on the other hand, might be mutually expected in terms of meaning or, to be more precise, the likelihood of a negatively marked adjective to occur with the nominal use of tone, but it is not an item which shows “significant proximity” with the noun, since the same concept could be equally well expressed with condescending or sarcastic⁷. To analyse likely candidates for lexical co-occurrence, Halliday establishes the concept of lexical sets, which is of course closely related to collocation, since its members are selected based on “the similarity of their collocational restriction” (Halliday 1966: 156). Later he continues: “If we say that the criterion for the assignment of items to sets is collocational, this means to say that items showing a certain degree of likeness in their collocational patterning are assigned to the same set.” (Halliday 1966: 158). The value of this distinction between paradigmatic and syntagmatic level is that it clearly shows two principles which apply to any word co-occurrence with an intuitively - at least for native speakers of English - close relationship: the fact that the constituents cannot be substituted by any random word with roughly the same semantic concept (collocation) and a certain underlying meaning value which words with similar collocation share (lexical set). This is strikingly similar to the principles of cognitive linguistics in general and construction grammar in particular, since more abstract constructions, like argument structure constructions (Goldberg 2006: 19-44), are at their heart very much like Halliday’s collocations and lexical sets.

⁶ In a BNC query neither more nor less scores 21 hits, while neither less nor more occurs only once.
They open up slots on a paradigmatic level, which influence word choice based on certain meaning restrictions, not unlike collocations, so ultimately this leads directly to a notion of collocations as constructions (> 2.3; 4). Admittedly, in later publications like his 1976 co-authored book on Cohesion in English (Halliday / Hasan 1976), Halliday focuses more strongly on Firth’s original concept of “meaning by collocation” and draws away from his general thoughts on the structure of lexical relations within language.

⁷ Both show a higher rate of mutual information in the BNC, while the results for scornful almost exclusively refer to quotes from Humpty Dumpty, which supports the perspective of Firthian collocations and their close relation to stylistics and idiolectic use (cf. Firth 1951 / 1964: 196-203).

The Collins Birmingham University International Language Database (COBUILD), the result of cooperation between the University of Birmingham and Collins Publishers, started in 1980. The aim of this project was to build a large corpus of contemporary text (100 million words) in order to analyse the lexical and grammatical foundations of the English language on a more systematic basis. The pilot project, the OSTI Report (Sinclair / Jones / Daley 1970 / 2005), had already yielded some interesting insights into the pervasive nature of collocational phenomena, so with this follow-up project Sinclair and colleagues concerned themselves once more with a detailed, corpus-driven⁸ analysis of lexical items. The result was, as intended, a comprehensive corpus-based dictionary of the English language, the Collins COBUILD English Language Dictionary (COBUILD 1, Sinclair 1987a), but the OSTI report and the conception of the COBUILD 1 also influenced research on collocations in general. Until then, analysis of collocations was based on a rather random sampling of examples and seemed to be somewhat eclectic at times. So, following Firth’s thoughts on the co-occurrence of words, Sinclair took further steps towards the operationalization of the concept of collocation. In order to do so, he labelled its components node, for the lexical item under investigation, and collocate, for “items in the environment” of the node which co-occur in a text or sentence within a certain distance, the span⁹ (Sinclair 1966: 415). Leaving introspection behind, all these elements can then be retrieved and correlated with the help of a corpus, which could be regarded as a kind of replica of adult human linguistic experience. In principle, this leads to a very simplified basic assumption: the more frequent, the more important. Unsurprisingly, the notion of frequency is also at the centre of Sinclair’s concept of collocation:

Collocation is the cooccurrence of two items in a text within a specified environment. Significant collocation is regular collocation between items such that they co-occur more often than their respective frequencies and the length of text in which they appear would predict. Standard statistical tests can be used to tell whether the association between word A and word B is a significant one. (Jones / Sinclair 1974: 19)

Once again, true to contextualist tradition, the term is defined against the background of co-occurrence and context (“specified environment”). A new aspect is the additional dimension of observable frequency and with it the distinction between significant and casual collocations at its centre (Sinclair 1966: 418). While basically any co-occurrence of lexical items could be considered a casual collocation, significant collocations are regarded as special, since the two items co-occur more often than expected by chance. This correlation, however, can only be established by employing, as Sinclair points out, statistical tests.

⁸ Sinclair retained his belief that a researcher should be led by a corpus and not vice versa. In a posthumously published preface for a special issue of the International Journal of Corpus Linguistics he affirms this conviction with the following words: “A recurrent theme in the papers is the attitude I have to corpus evidence; the corpus has things to tell me, and I try to work out where it is heading. I have been surprised at the confidence of so many scholars, who seem to think that they have something to tell the corpus.” (Sinclair 2007: 157)

⁹ In the OSTI report “span” is defined as follows: “A decision was made, based on its results, to set a limit of four words on either side of the node. Any word appearing within this span was to be considered as a collocate, and tested for significance.” (Sinclair / Jones / Daley 1970 / 2005: 13) Since Sinclair and colleagues report good results with this set-up, a span of ±4 is also used as a standard setting for the study at hand.

However, the nature of these tests and their results are, still today, a source of multifaceted debate (Evert 2005: 137-164; Sinclair / Jones / Daley 1970 / 2005: xxi). Hence, a point of criticism which eventually arose (Granger / Paquot 2008; Herbst 1996) is that corpus research, by definition, depends to a large extent on the mathematical methods applied. Since each statistical test yields different results and suggests different interpretations, a more or less subjective factor in the decision for or against a methodology cannot be denied. This ultimately means that whether a combination of lexical items is indeed significant or not still depends, to a certain degree, on the choice and intention of the author of a study¹⁰. Similar objections could be raised with regard to the corpus as such.
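The retrieval step behind these notions (a node, its collocates, and a span of four words on either side) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `collocates_in_span`, the whitespace tokenisation and the toy sentence are assumptions made for this example and are not taken from the OSTI report; only the span of ±4 comes from the text.

```python
from collections import Counter


def collocates_in_span(tokens, node, span=4):
    """Count all tokens occurring within `span` words on either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]    # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]   # up to `span` words after the node
        counts.update(left + right)
    return counts


# Toy input, invented for illustration (not corpus data).
text = "she answered in a scornful tone and then he made a remark in a friendly tone"
tokens = text.split()
print(collocates_in_span(tokens, "tone"))
```

Every word counted in this way is at first only a casual collocate in Sinclair’s sense; whether it is also a significant one is a separate question that requires a statistical test.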
Not every corpus is suitable for any kind of research question. Corpus size and design affect the outcome of most statistical tests, and some corpora might simply be too small to answer certain kinds of question, for example to establish a comprehensive collocational set for less frequent words such as scornful. Furthermore, any interpretation of corpus evidence and statistically significant collocations needs to address the question of relevance, because a significant correlation might also occur because of the topic and / or genre of a text and not because a certain combination is particularly salient (> 4.2.1). The same might be true for collocations which describe real world phenomena, like build, buy and sell as the most frequent verbal collocates for house (Herbst 1996: 388-389). However, Sinclair and his colleagues claim that “[…] at this moment it is impossible to prove or disprove this because the examination has been confined to text of one particular kind.” (Sinclair / Jones / Daley 1970 / 2005: 76) and therefore suggest that “[a]ll collocations must, therefore, be accepted at their face value, since they have all actually occurred more than three times in the sample of spoken language and have passed a fairly stringent significance test.” (Sinclair / Jones / Daley 1970 / 2005: 76). Table 2.2 lists just a small selection of statistical measures, namely Mutual Information (MI), z-score, t-score and log-likelihood, for Humpty Dumpty’s “collocations” to demonstrate their respective effect on a potential interpretation. The results are taken from the BNC and, based on Sinclair’s suggestion (Jones / Sinclair 1974: 21-22), the span was set at four, with a minimum frequency of three occurrences (Sinclair / Jones / Daley 1970 / 2005: 42).

¹⁰ The OSTI-Report, for example, only analyses collocations which occur three times or more in a large monitor corpus, the Bank of English (BoE).
Since the present study, on the other hand, seeks to investigate creative alternations of collocations which do not occur very frequently within a corpus, there are no limitations of overall frequency for either the combination as such or any potential collocates.

| | raw frequency | MI | z-score | t-score | log-likelihood |
|---|---|---|---|---|---|
| scornful tone | 4 (3) | 6.6256 | 17.1883 | 1.9797 | 28.9425 |
| neither more nor less | 21 | n / a | n / a | n / a | n / a |
| neither more | 64 | -0.7715 | -4.2817 | -5.6568 | -22.3180 |
| nor less | 61 | 0.9824 | 5.3316 | 3.8571 | 22.9026 |
| do (a lot of) work | 1948 (17) | 0.8176 | 25.3327 | 19.0934 | 526.6274 |
| do_VERB (a lot of) work | 5442 (70) | 1.3057 | 69.0586 | 43.9290 | 3412.2975 |
| make (any other) remark | 89 (0) | 2.5904 | 19.1776 | 7.8675 | 171.4859 |
| make_VERB (any other) remark | 471 (0) | 3.5492 | 67.8321 | 19.8486 | 1465.5181 |

Table 2.2: Corpus-based association measures for “Humpty Dumpty’s collocations”

What becomes apparent immediately is that the result for each of the items selected differs depending on the respective statistical measurement. In a ranking, scornful tone, for example, scores quite high when it comes to MI but has a relatively low t-score and log-likelihood value. This is, of course, due to the fact that each method focuses on a different parameter and, therefore, yields different results. While Mutual Information and z-score highlight collocations with a rather low frequency but strong likelihood to co-occur, t-score and log-likelihood show high-frequency pairs. For MI and t-score a result of two or three is considered to be statistically significant (Oakes 1998: 11-12; Hunston 2002: 69-75; McEnery / Xiao / Tono 2006: 56). Chapter 5 will provide a more thorough discussion of tests for statistical significance; the crucial point to make at this stage is that depending on the method chosen, scornful tone, do work and make a remark could be regarded as statistically significant collocations.
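How the four measures can pull apart is easiest to see when they are computed side by side. The following Python sketch uses common textbook formulations of MI, t-score, z-score and log-likelihood; it is an assumption that these match the formulas behind table 2.2, and the sketch deliberately ignores any correction for span size. The function name `association_scores` and all counts are hypothetical and invented for illustration; they are not BNC figures and do not reproduce the values in the table.

```python
import math


def association_scores(o, f_node, f_coll, n):
    """Return MI, t-score, z-score and log-likelihood for one word pair.

    o      -- observed co-occurrence frequency of node and collocate (must be > 0)
    f_node -- corpus frequency of the node
    f_coll -- corpus frequency of the collocate
    n      -- corpus size in tokens
    """
    # Expected co-occurrence frequency if both words were independent
    # (simplifying assumption: no correction for the size of the span).
    e = f_node * f_coll / n

    mi = math.log2(o / e)
    t_score = (o - e) / math.sqrt(o)
    z_score = (o - e) / math.sqrt(e)

    # Log-likelihood over the 2x2 contingency table of the pair.
    observed = [o, f_node - o, f_coll - o, n - f_node - f_coll + o]
    expected = [
        e,
        f_node * (n - f_coll) / n,
        (n - f_node) * f_coll / n,
        (n - f_node) * (n - f_coll) / n,
    ]
    log_likelihood = 2 * sum(
        obs * math.log(obs / exp)
        for obs, exp in zip(observed, expected)
        if obs > 0
    )
    return {"MI": mi, "t": t_score, "z": z_score, "LL": log_likelihood}


# Hypothetical counts, invented for illustration (not BNC figures).
rare_pair = association_scores(o=4, f_node=50, f_coll=60, n=1_000_000)
frequent_pair = association_scores(o=3000, f_node=50_000, f_coll=40_000, n=1_000_000)
print(rare_pair)
print(frequent_pair)
```

With these invented counts the rare pair receives the higher MI value, while the frequent pair receives the higher t-score, which is the kind of divergence in ranking described above.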
None of these methods, however, brought to light the most frequent verbatim word combination neither more nor less. Of course, one could argue in favour of neither more nor less as a word-like, lexically stored chunk. Yet, there are possible variations, like I neither knew nor cared or neither the TV nor the video (OALD 7: neither), which suggest an underlying constructional principle, like for example [neither X nor Y]. Scornful tone, on the other hand, might not be as relevant as the corpus analysis suggests, since, upon closer examination, three of the four hits are quotes of the very same passage, namely Humpty Dumpty’s and Alice’s conversation in Lewis Carroll’s Through the Looking-Glass. Despite the biased data for this co-occurrence, scornful tone could nevertheless qualify as a linguistically interesting word combination, or, to be more precise, the combination scornful and a noun phrase [scornful + N], since, according to the BNC, all nouns with the premodification scornful can be subsumed under the semantic headline of “visual and auditive senses” and tone, in this case, falls right in line. There seems to be a kind of semi-lexically filled collocation, like [scornful + N <visual and auditive senses>], supporting the interpretation of a word combination like scornful tone. This phenomenon of underlying semantic properties of certain, partially fixed word combinations has also been observed by John Sinclair (1996, 1998, 2004) and Bill Louw (1993), who coined the term semantic prosody, which Gavioli later describes as “[…] the way in which words and expressions create an aura of meaning capable of affecting words around them.” (Gavioli 2005: 46; > 3.1). Working with the COBUILD corpus, the Bank of English (BoE), Sinclair further realised that patterns like these are, in fact, more pervasive in the English language than might have been expected.
He calls this the idiom principle or principle of idiom:

The principle of idiom is that a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments. To some extent, this may reflect the recurrence of similar situations in human affairs; it may illustrate a natural tendency to economy of effort; or it may be motivated in part by the exigencies of real-time conversation. However it arises, it has been relegated to an inferior position in most current linguistics, because it does not fit the open-choice model. (Sinclair 1991: 110)

This observation stands in stark contrast to a traditional conception of language, which sees grammar and lexis as more or less detached from each other and lexical items as inferior to grammatical structures for which they simply serve as fillers for the slots they open up. In this respect, the notion of statistically significant collocations and the idiom principle can again be closely related to concepts within construction grammar, since both approaches take authentic language as a starting point and deduce more general linguistic principles from it. The purpose of the identification of these patterns and / or constructions in Sinclair’s research is, of course, very different from modern, cognitive-oriented corpus studies, which are mostly concerned with the mental representation of linguistic concepts and its consequences for language acquisition and learning (> 4). But the corpus-driven, inductive nature of corpus linguistics as a methodology makes it a valid tool for cognitive studies focusing on aspects like frequency of encounter, salience, pre-emption and entrenchment (Ambridge / Lieven 2011; Goldberg 2006; Stefanowitsch / Gries 2003).
Furthermore, comparing potential collocates of a node word, looking at collocational sets, and identifying the most likely patterns of co-occurrence yields very similar results to what construction grammar has to offer in terms of underlying, partly delexicalised constructions within the English language system (> 5.1.2). Moreover, the functional side of semi-abstract constructions, like [scornful + N], is also very similar to Sinclair’s notion of semantic prosody.

To sum up, most context-oriented approaches share the assumption that every dimension of meaning is context-derived. Therefore collocates should be regarded as meaning-creating context (syntagmatic relations). Furthermore, previous relations to other lexical items also shape the two dimensions of collocational meaning (paradigmatic relations). Here, syntagmatic relations refer to directly encountered co-occurrences, which do not necessarily need to be special or noteworthy in any way. It is rather the level of previously experienced context which makes a combination of two (or more) words significant and might hence potentially contribute to the constructional process of delexicalised or semi-delexicalised patterns. Moving away from sole introspection, more recent context-oriented approaches predominantly rely on results gained through corpus research or similar sources for extensive data collection, like large-scale surveys. In the analysis of corpora, syntagmatic relations become obvious through concordance lines, while reoccurring patterns throughout different texts and genres help with analysis of paradigmatic relations.
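The idea of a semi-delexicalised pattern with fixed lexical material and open slots, such as [neither X nor Y], can be illustrated with a small Python sketch. It is a deliberately crude string pattern and an assumption of this example, not a method used in the study: the regular expression and the function name `fill_slots` are hypothetical, and the pattern captures only the formal side of the construction, not any semantic restriction on the slots. The three test phrases are the variants quoted from the OALD in the discussion above.

```python
import re

# Fixed items "neither" and "nor" with two open slots, X and Y.
NEITHER_NOR = re.compile(r"\bneither\s+(?P<x>.+?)\s+nor\s+(?P<y>.+)", re.IGNORECASE)


def fill_slots(phrase):
    """Return the fillers of the X and Y slots, or None if the pattern is absent."""
    match = NEITHER_NOR.search(phrase)
    if match is None:
        return None
    return match.group("x"), match.group("y")


for phrase in ["neither more nor less",
               "I neither knew nor cared",
               "neither the TV nor the video",
               "do work"]:
    print(phrase, "->", fill_slots(phrase))
```

All three variants are matched by the same pattern although they share no lexical material beyond neither and nor, which is what distinguishes a slot-based pattern from a lexically stored chunk.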
2.2 Significance-Oriented Approaches

While British Contextualism at first focused on the intra- and intertextual relations which collocations establish within texts, significance-oriented approaches were from the very beginning more concerned with an adequate description of a collocation’s internal features and structures. This is because, unlike British Contextualists, these approaches are predominantly linked through the same purpose: EFL-oriented lexicography. Hence, Cowie, Howarth, Mel’čuk, Hausmann, and Siepmann might not be as closely linked by their academic vitae as Firth, Halliday, and Sinclair, but as lexicographers, they are all interested in finding a way to identify suitable language material which is relevant for learners of English and the compilation of a (learner’s) dictionary.

Among the pioneers in western European lexicography were Harold Palmer, A. S. Hornby, and A. P. Cowie. Towards the beginning of the twentieth century, Palmer had opened a school in Belgium and became increasingly interested in lightening students’ workload with the help of a restricted vocabulary: a list that contained the most important and productive words for everyday conversation. This was the start of what since then has often been referred to as the Vocabulary Control Movement. In the 1930s, authorities in Tokyo commissioned him to design a limited vocabulary list for Japanese schools. A number of well-established publications arose from this project, such as the Second Interim Report on English Collocation (Palmer 1933), the General Service List (West 1953 / 1983) and of course The Advanced Learner’s Dictionary of Current English (Hornby / Gatenby / Wakefield 1948)¹¹. From a very early stage, while working on the First Interim Report on Vocabulary Selection (Palmer 1930), Palmer and one of his young colleagues, A. S. Hornby, realised that it would not be sufficient to provide learners with a mere alphabetical word list.
Therefore, they collected 3879 collocational pairs and structured them according to word-class and internal structure. The introduction of the Second Interim Report on English Collocation outlines the main objective of this project as follows:

It is not enough to suggest in a haphazard way the inclusion or exclusion of any word, word-compound, phrase, proverbial expression, etc. that may occur to us. The work must start with collecting and classifying, and this must be done on a large scale and according to an organized plan - and we have been doing on a large scale and according to an organized plan this work of collecting and classifying those things that must be collected and classified. (Palmer 1933: 1)

Coming back to Humpty Dumpty’s phraseological chunks: scornful tone or neither more nor less do not occur at all. Do work and make a remark, on the other hand, are both part of a rather large and heterogeneous category called “verb collocations: combinations of verbs with specific nouns” (Palmer 1933: 50). Introducing notations like (x to x N 3) or x to x N 3, the authors tried to subdivide this group into members which take or can take an additional¹² object, like make a remark, and those which remain unaltered, such as do work. This is interesting since the idea of a lexically fixed word combination with a slot for potential alternation is quite similar to patterns in valency theory (Herbst / Heath / Roe / Götz 2004) or pattern grammar (Hunston / Francis 2000) and of course semi-lexical constructions like Goldberg’s argument structure constructions (Goldberg 2006: 19-44). However, since the verbs are listed in alphabetical order, this again resembles a rather long list of more or less random entries selected by proficient native speakers. Thus, in later publications, such as Thousand-Word English (1937), Palmer and Hornby included information on pronunciation, as well as morphological alterations and disambiguation for polysemous words.
The latter they achieved through linking, for example, the lemma with its potential antonyms and collocations, as can be seen in the example of do (box 2.1):

Box 2.1: Entry for the lemma do from Thousand-Word English (Palmer / Hornby 1937: 43)

¹¹ In its later versions this book became known as the Oxford Advanced Learner’s Dictionary (OALD).

¹² This fact is indicated by brackets in Palmer’s (1933) taxonomy.

In general Palmer’s own selection subsumes a rather wide range of word occurrences and syntactic patterns under the term collocation (Palmer 1933: 7); nonetheless, his work has without a doubt had “a profound and enduring influence on modern EFL dictionary-making” (Cowie 1999: 52). Since then lexicographers have struggled to develop a classification for “those things that must be collected and classified” (Palmer 1933: 1) which is comprehensive and universally applicable at the same time. Consequently, Palmer’s work did not only inspire scholars like Hornby¹³ and Cowie¹⁴ to enhance the lexicographical description of phraseological language in general and collocations in particular, but also laid the ground for further research on collocational phenomena within modern language teaching and language acquisition research (Ellis / Simpson-Vlach / Maynard 2008; Jehle 2007; de Cock 2003; Bahns 1997; Howarth 1996).

A. P. Cowie shares Palmer’s concern that a broad definition of partly fixed and opaque language could again lead to an unnecessary workload for language learners (Cowie 2012: 390). Therefore, he proposes a more clear-cut definition of the combinations within the spectrum of idioms, collocations, and free combinations.
Leaving Palmer’s more formalistic approach behind, he regards compositionality and variability as the two defining criteria of composite units (table 2.3). According to this classification, the common denominator for collocations is a certain level of variability with a distinction between open and restricted collocations on the compositionality level.

¹³ Together with Palmer, Hornby co-authored the word list Thousand-Word English (Palmer / Hornby 1937) and became one of the editors of the Idiomatic and Syntactic English Dictionary (Hornby / Gatenby / Wakefield 1942), which later would become The Advanced Learner’s Dictionary of Current English (Hornby / Gatenby / Wakefield 1948).

¹⁴ Amongst others, Cowie became one of the editors of the Oxford Dictionary of Current Idiomatic English (Cowie / Mackin / McCaig 1983).

| | compositional | non-compositional |
|---|---|---|
| variable | open collocation: blow a trumpet | restricted collocation: blow a fuse |
| not variable | figurative idiom: blow your own trumpet | pure idiom: blow the gaff |

Table 2.3: Categorisation of lexical word combinations according to Cowie (1983: xii-xiii) with examples from Howarth (1996: 15-16)

Blow a trumpet, for example, would be regarded as an open collocation because trumpet as well as blow occur in their literal sense (compositional) and are at the same time exchangeable with other lexemes within the same paradigm like horn or play (variable). Blow a fuse, on the other hand, also shows a certain degree of variability. While, admittedly, blow is relatively fixed, top or stack could also be used in place of fuse to express the concept of “to get very angry” (OALD 7: blow a fuse). In all cases, however, its meaning lies beyond the prototypical sense of the verb blow and the respective nouns.
Only via a chain of metaphorical extensions, for example, “anger is the heat of a fluid in a container” (Lakoff 1990: 383), could one construe a link between the literal meaning of this collocation and its figurative sense. This restriction makes blow a fuse less clear and comprehensible for the learner and, therefore, needs to be part of a comprehensive learner’s dictionary. For open collocations such as blow a trumpet, no further explanation is necessary, at least from a decoding point of view. But in many cases, the concept of (non-)compositionality at this stage is not very clear and precise. Blow a fuse, for example, could also be used in a more literal sense of ‘causing a fuse to melt’. Here the most opaque element is blow. This might suggest more compositionality and, therefore, would put blow a fuse under the category of open collocation, yet there are not many other items which can be destroyed through “blowing”, like gasket or cover, which then again would yield an analysis as a figurative idiom. Thus, as for statistically significant collocations, the question arises whether the limited variability of blow a fuse is due to linguistic restriction or rather because it belongs to a special vocabulary on a pragmatic basis which is restricted to the fields of electricity and insulation (Herbst 1996: 386-389; > 2.1). Furthermore, it is also interesting to note that the use of blow a fuse in its “enragement” sense is probably a metaphorical extension of its use to describe the destruction of a piece of wire or rubber. This concept of a mutual influence of meanings mirrors to a certain degree what construction grammar has termed inheritance relations within an inventory of constructions (Fischer / Stefanowitsch 2006: 4-5; Fried / Östmann 2004: 23-14), which today is regarded as a basic principle within language acquisition by most proponents of these linguistic approaches.
To refine his typology of composite units, Cowie, in a later publication with Peter Howarth, adds two more features to his characterisation of collocations and idioms (Cowie / Howarth 1995: 82), namely that phraseological units, independent from their variability and semantic opaqueness, are institutionalised and memorised. They conclude that “[i]t is best to think of a collocation as a familiar (or institutionalized), stored (or memorized) word-combination with limited and arbitrary variation.” (Cowie / Howarth 1995: 82). Similar to usage-based theories, this stresses the fact that collocations need to be repeatedly encountered and learned, an observation which has also been made by Pawley and Syder (1983):

Indeed, we believe that memorized sentences and phrases are the normal building blocks of fluent spoken discourse, and at the same time, that they provide models for the creation of many (partly) new sequences which are memorable and in their turn enter the stock of familiar usages. (Pawley / Syder 1983: 208)

Moving away from a clear categorisation based on semantic opaqueness in favour of the extent of limited choice and the number of choices a language user has within a collocation, this distinction is in fact very similar to Sinclair’s notion of casual and statistically significant collocations. Yet, the source of this significance is, of course, different from a frequency-oriented approach, since compositionality, in Cowie’s terminology, still lies in the eye of the (native speaker) beholder. With this definition, lexicographers and EFL teachers can of course fairly easily account for a certain feel of collocability, which might not be analysed as statistically significant but nevertheless seems to be an important contribution to a language learner’s lexicon. On the other hand, introspection and personal judgment still play a fairly important role in this definition, which makes it difficult to justify any classification on more general grounds.
| subtypes of collocation | Cowie / Howarth (1995) | “Humpty Dumpty” |
|---|---|---|
| a) invariable collocation | foot a bill | - |
| b) collocation with limited choice at one point | take / have / be given precedence (over NP) | scornful / dismissive / condescending tone; do / carry out / perform work; make / utter / offer a remark |
| c) collocation with limited choice at two points | get / have / receive a lesson / tuition / instruction (in NP) | neither more / big / TV nor less / small / radio |
| d) overlapping collocation | convey a point; communicate a view; *communicate regrets | make a remark; make an answer; utter a remark; *utter an answer; do work / research; carry out work / research; pursue research; ?pursue work |

Table 2.4: Categorisation of open collocations (Cowie / Howarth 1995: 83)

Again, Humpty Dumpty and his phraseology shall be used to exemplify this. In their study Cowie and Howarth explicitly focus on open collocation and claim that these phraseological units are most fitting for the investigation of proficiency levels amongst native and non-native speakers of English. They introduce four subcategories for collocation (Cowie / Howarth 1995: 83). At first glance, the attempt to assign scornful tone, do work, make a remark and neither more nor less to these categories seems to work quite well (table 2.4), until one considers closely related collocations, like make an answer and utter a remark. Here, the limited interchangeability, for example with utter an answer or also pursue work, would then put make a remark and do work into category d) and water down the precision of this classification. In fact, this applies to most members of b) and c), which is probably due to the fact that, whenever one allows for variability, a certain degree of overlap is one of its potential consequences. Thus, this typology seems to demonstrate the variability-side of collocations quite well and also accounts for more than one slot within a collocation.
But it struggles to provide a clear definition or explanation for restrictions, which then makes it seem rather arbitrary and vague. Cowie and Howarth’s subcategories are nevertheless remarkable, since, even though “invariable collocations” seem to resemble in fact what used to be called idioms, Cowie and Howarth’s typology now looks strikingly similar to semi-lexical constructions, with one or more open slots and fixed lexical items in between (> 4).

At the same time as Cowie developed his first framework on collocations, Russian linguists like Igor Mel’čuk studied collocational phenomena in a more formalistic way. Here the focus does not lie on the language learner him- or herself. Mel’čuk sees his approach rather as a comprehensive and structured way to describe phraseological phenomena in order to obtain thoroughly processed data for the compilation of a dictionary or computational linguistic purposes. Therefore, the native speaker’s viewpoint stands at the centre of his conception of text production (Mel’čuk 1995: 25-24). In general, Mel’čuk assumes that whenever a (native) speaker wants to express a certain concept (ConceptR), s / he needs a phonetic representation (PhonR) from a language (L) in order to realize a semantic representation (SemR). Lexical functions (LF) are then part of a comprehensive inventory of semantic representation (SemR) (figure 2.1); a conception which in fact looks rather similar to de Saussure’s (1916 / 1967: 76-79) notion of the linguistic sign and therefore also to any construction.

Figure 2.1: Mel’čuk’s (1989, 1995) process of text production compared to de Saussure’s (1916 / 1967: 76-79) linguistic sign

Based on this concept, Mel’čuk’s idea was to describe lexical units like set phrases and collocations on the basis of a general underlying meaning (Mel’čuk 1995: 186).
Since collocations, as Mel’čuk claims, “make up the lion’s share of the phraseme inventory […]” (Mel’čuk 1998: 24) and almost all collocations are covered by syntagmatic lexical functions, it seems useful to have a closer look at some examples of lexical functions. Box 2.2 shows the respective LFs for the set phrases cleanly shaven, lend support or pass an exam, as well as scornful look, make a remark, do work and neither more nor less:

Magn(shave N) = close, clean
Oper(support N) = to lend [~]
Real(exam N) = to pass [ART ~]
Magn(look N) = scornful, condescending, dismissive
Oper(remark N) = to make [ART ~]
Real(work N) = to do [~]
Syn(exactly) = neither more nor less

Box 2.2: Lexical functions of phraseological phenomena according to Mel’čuk (1995: 186)

Like a linguistic sign or a construction, these functions consist of a semantic component, which gives the general purpose of an expression: Magn expresses the meaning of intensification such as “intense(ly)” or “very”, Oper stands for support verbs for performing or doing something, and Real adds the meaning component of “realise” or “fulfil”. The form or phonological realisation is then the result of a lexical unit (argument) which selects, based on the lexical function, a number of lexical expressions in order to form a phrase. Since neither more nor less is similar in meaning to exactly, it might be regarded as a kind of synonym or Syn-function according to Mel’čuk’s terminology (Mel’čuk 1998: 32-33). Similar to Cowie and Howarth’s “collocations with limited choice”, lexical functions are a more operationalised way of expressing an underlying level of meaning. At the same time, this concept also suggests that there could be two levels of meaning, or function, at work: one coming from the word meaning of the constituents themselves, the other from the more abstract combination of a specific item with a certain semantic representation.
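Because a lexical function maps a keyword onto the expressions that realise a given general meaning for it, the entries of box 2.2 can be written down as a simple lookup table. The Python sketch below is only an illustration of this function-plus-argument structure; the dictionary layout and the helper name `apply_lf` are assumptions of this example and are not part of Mel’čuk’s formalism. The entries themselves are copied from box 2.2.

```python
# (lexical function, keyword) -> expressions realising the function for that keyword
LEXICAL_FUNCTIONS = {
    ("Magn", "shave"): ["close", "clean"],
    ("Oper", "support"): ["to lend [~]"],
    ("Real", "exam"): ["to pass [ART ~]"],
    ("Magn", "look"): ["scornful", "condescending", "dismissive"],
    ("Oper", "remark"): ["to make [ART ~]"],
    ("Real", "work"): ["to do [~]"],
    ("Syn", "exactly"): ["neither more nor less"],
}


def apply_lf(function, keyword):
    """Return the listed values of a lexical function for a keyword (empty if unlisted)."""
    return LEXICAL_FUNCTIONS.get((function, keyword), [])


print(apply_lf("Magn", "look"))
print(apply_lf("Oper", "remark"))
print(apply_lf("Oper", "tone"))  # not listed in box 2.2
```

The lookup makes the two levels visible: the function name carries the abstract meaning component, while the returned values are the lexically specific choices tied to one particular keyword.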
This two-level view resembles construction grammar, where, based on the assumption of inheritance relations, the general function of an argument structure construction influences the interpretation of individual constituents, such as the verb (Goldberg 2006: 19-44).

While Cowie and Howarth also explored the diagnostic potential of collocational proficiency and Mel’čuk focused on a more or less meta-lexicographic analysis, Franz-Josef Hausmann’s approach towards the co-occurrence of lexical items is more concerned with an adequate and comprehensive lexicographic description from an EFL point of view. With frequent learner mistakes and translational problems in mind, he regards institutionalisation and unpredictability as the central aspects of collocation. This emphasises a contrastive aspect and explicitly takes the native language of a learner into consideration. Accordingly, he defines collocations as “semi-preconstructed items within language, which are not a speaker’s ad hoc creation but retrieved from memory as a whole unit; the hearer would presumably perceive these as familiar.” 15 (Hausmann 1984: 398-399). To Hausmann, these “typical, specific and characteristic relations between two words” 16 (Hausmann 1985: 118) need to be given special attention within an EFL teaching context, since they enable the learner to communicate more fluently and in a natural, native-like way.

Figure 2.2: Typology of word combinations (based on Hausmann 1984 and Bartsch 2004: 38)

As can be seen in figure 2.2, Hausmann further distinguishes affine combinations from free combinations. However, to him these free combinations, or co-creations, are less interesting within an EFL context, for they are built spontaneously and based on extralinguistic meaning restrictions: they are, so to say, unrestricted combinations of ordinary words. Counter-creations, on the other hand, stand at the most creative end of the spectrum of non-fixed word combinations.
Combinations like racy tone are, according to Hausmann, predominantly part of the more or less idiolectal style of a particular author (Hausmann 1984: 399). So again, do work and make a remark for Hausmann would qualify as collocations, since, to learners of English, there seems to be no concept-inherent reason why it should not be *make work 17 or *do a remark. On the contrary, in German for example Arbeit machen (lit.: make work) would be as accepted as Arbeit tun (lit.: do work). As a result, this might cause confusion and represent a potential source of mistakes when transferring this concept from a German L1 to an English L2 context. Neither more nor less, on the other hand, consists of more than two words and would, therefore, be rejected on formal grounds, as within Hausmann’s framework there is only room for two constituents, a freely chosen base and its semantically dependent collocator. Both are linked via an arbitrary restriction on lexical choice. Other than word combinations, however, collocator and base, according to Hausmann, remain semantically independent (Hausmann 1984: 400-401). Scornful tone, on the other hand, might instead be considered a case of co-creation, since it could well be expected from an extralinguistic perspective that a concept like tone can be combined with a concept carrying a negative connotation, like scornful, and (near-)synonyms like condescending or dismissive are equally likely and possible.

15 German original: „Halbfertigprodukte der Sprache, welche der Sprecher nicht kreativ zusammensetzt, sondern als Ganzes aus der Erinnerung holt und der Hörer als bekannt empfindet.“
16 German original: „[…] typische, spezifische und charakteristische Zweierbeziehung von Wörtern […]“
A combination like racy or Mozartean tone 18, however, could be regarded as a counter-creation, since the constituents’ co-occurrence cannot be predicted based on either extralinguistic experience or linguistic institutionalisation. Interestingly enough, Hausmann seems to presuppose that a certain degree of selectional restrictions exists even within counter-creations, since amongst all the examples he gives, there are none which are not decodable or which seem to be absolutely wrong or gibberish. This raises the question of where to draw the line, not only between collocation and creative counter-creation but also regarding the point where a word combination should be regarded as wrong, rather than creative. Unfortunately, despite a clear notion of the elements which constitute a collocation, Hausmann’s concept as such remains rather vague. He does not explore the source of unpredictability, which might not only vary depending on the native language of a learner but also on very individual aspects like linguistic proficiency. This makes base and collocator useful terms for lexicographic purposes, but when it comes to learner-oriented cognitive aspects within the concept of collocation, the question remains whether it is indeed the collocator which needs to be remembered and therefore stored with the, rather free, base, or whether the collocator also contributes a kind of additional, more independent meaning perspective to the overall collocational meaning.

17 Examples which are considered to be not at all established are marked with an asterisk to indicate that the text is aware of the fact that this is very likely not a good, and may even be an incorrect, example.
18 These examples are taken from genres which are, in accordance with Hausmann, rather prone to creative use of language, like the headline of a newspaper, “New Madonna Tour Sets Racy Tone” (Zamost / Snead 1987), or an essay on one of Shakespeare’s comedies (Warren 1979: 84).

Admittedly, Hausmann does not reject the idea of a kind of reverse analysis - from collocator to base - completely. However, he regards this as a linguistic research question, which lacks a sense of the learner’s reality (Hausmann 2007: 218). But as table 2.5 shows, knowing about meaning restrictions which are caused by the collocator can just as well keep learners from producing non-native-like utterances.

            raw frequency   MI       z-score   t-score   log-likelihood
look        8               6.5697   25.5533   2.7987    57.104
eyes        6               4.7670   11.2471   2.3595    28.1301
glance      5               8.1990   34.3676   2.2285    46.9042
tone        4               6.9172   19.0573   1.9835    30.4448
laugh       4               8.2853   30.7963   1.9936    37.9964
voice       4               4.9429   9.3448    1.935     19.6867
laughter    3               7.4938   19.249    1.7224    25.2133

Table 2.5: List of noun collocates for scornful according to the BNC

[scornful+N], for example, as in scornful tone, seems to combine with a semantically rather limited set of bases. Table 2.5 shows a list of all nouns which occur, with a span of four, at least three times within the BNC and have scornful as a premodification. All of them express either visual or auditory means of perception. Hence, there might be a kind of semantic pre-selection for the N-slot in [scornful+N] which makes a combination like scornful smell seem more unusual and creative than scornful eyebrow (BNC AL0 233) or scornful accusation (BNC AAL 822), even though, through metaphorical extension, both could be expressed via either eyes or voice.
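The association measures in table 2.5 are related to one another by simple arithmetic. The sketch below assumes the common corpus-linguistic definitions MI = log2(O / E) and t = (O - E) / sqrt(O), where O is the observed (raw) and E the expected co-occurrence frequency. The text does not state which formulas underlie the table, so these definitions, like the function names used here, are assumptions made for illustration. Under these assumptions, E can be recovered from the raw frequency and the MI value, and the t-score can be recomputed from it.

```python
import math

# Rows of table 2.5: collocate -> (raw frequency O, MI, t-score as printed).
TABLE_2_5 = {
    "look": (8, 6.5697, 2.7987),
    "eyes": (6, 4.7670, 2.3595),
    "glance": (5, 8.1990, 2.2285),
    "tone": (4, 6.9172, 1.9835),
    "laugh": (4, 8.2853, 1.9936),
    "voice": (4, 4.9429, 1.935),
    "laughter": (3, 7.4938, 1.7224),
}


def expected_from_mi(observed, mi):
    """Recover the expected frequency E, assuming MI = log2(O / E)."""
    return observed / 2 ** mi


def t_score(observed, expected):
    """t-score under the assumed definition t = (O - E) / sqrt(O)."""
    return (observed - expected) / math.sqrt(observed)


def recomputed_t(observed, mi):
    """Recompute the t-score from the raw frequency and the MI value alone."""
    return t_score(observed, expected_from_mi(observed, mi))


if __name__ == "__main__":
    for noun, (o, mi, t_printed) in TABLE_2_5.items():
        e = expected_from_mi(o, mi)
        print(f"{noun:10s} E={e:.4f}  t={recomputed_t(o, mi):.4f}  (printed: {t_printed})")
```

With these definitions the recomputed t-scores agree with the printed ones to within 0.001 for every row, which suggests, but does not prove, that the table was generated with these formulas. The z-score and log-likelihood columns are not recomputed here, since several variants of both measures are in use. The very small expected frequencies also illustrate why MI tends to favour rare combinations such as scornful glance, whereas the t-score ranks by raw frequency more closely.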
For a cognitive account of the word combination scornful tone this could indicate that, even if there seems to be a free co-creation at first glance, there are certain restrictions on the combinatorial potential of these word combinations which might be cognitively stored in a native speaker's mind, and which any learner needs to be explicitly or implicitly aware of in order to use a language in as native-like 19 a way as possible.

19 As mentioned in the previous chapter (> 1) native-like proficiency seems to be a rather heterogeneous concept (compare Dąbrowska and Street 2010; Dąbrowska 1997). Therefore, chapters 5 to 7 will discuss and analyse different levels and concepts of nativeness in more detail.

An underlying relationship like this resembles, to some extent, Sinclair’s concept of semantic prosody (> 2.1; 3.2). However, a new aspect would be that a collocation can be analysed as a potentially bi-directional concept. One of the first authors within collocational research to explicitly advocate for this mutual influence between the constituents of a collocation is Dirk Siepmann (2005: 427). In his evaluation of linguistic co-occurrences, he not only takes into account that a word might be interrelated with a more abstract semantic concept but also that semantic concepts as such could frequently be observed together. He comes to the conclusion that “[…] collocational phenomena span the entire range of morphosyntactic constructions. The terms ‘collocation’ and ‘construction’ turn out to be almost synonymous, a clear indication of the fact that phraseology is at the centre of language rather than at the periphery.” (Siepmann 2005: 430). Taking into account that Firth (> 2.1) and Palmer (> 2.2) also more or less implicitly considered collocations as a pairing of selectional restrictions of form and meaning, this is a rather obvious conclusion to make.
What is innovative here are the grounds on which a constructional meaning can be based. Collocations with semantically rather vague words like do or make, as in do work or make a remark, are particularly difficult to grasp: even if it is possible to find a definition for the meaning of the [NP] in [do+NP], this is often hard to maintain throughout all instances of a corpus, let alone native speakers’ performances (Faulhaber 2011: 304-309). Any semantic explanation of a slot within a construction in general, and collocations in particular, can therefore only be a prototypical approximation. Nevertheless, knowing about these potential restrictions comes naturally to a native speaker and is valuable information for the L2 learner, or, as Siepmann stresses, “[h]owever, although semantic relationships can only be discerned post hoc, we should not forget that they may lighten the language learner’s task.” (Siepmann 2005: 432)

Thus, the basic assumptions of significance-oriented approaches can be seen as follows: overall, these approaches revolve around the question of the reasons why a combination of words should be regarded as a collocation. They therefore operate on a syntagmatic as well as a paradigmatic level but focus on a collocation’s restrictions as far as these two dimensions are concerned. Syntagmatic restrictions often come in the shape of a more confined meaning of the collocation itself, since in most cases it is rather specific and does not encompass the full semantic and / or functional spectrum of the individual lexical items which constitute this co-occurrence. This has been accounted for as (non)compositionality or semantic opaqueness. Paradigmatic restrictions, on the other hand, are combinatorial restrictions. They describe the fact that constituents of a collocation cannot be exchanged freely, even if they are substituted by close synonyms.
The actual degree of these constraints can vary considerably, and often results in a vague and, at times, subjective subdivision of the continuum between compound, collocation and free combination. In significance-oriented publications, these restrictions have been labelled variability or unpredictability. Furthermore, collocations can be subdivided into fully lexicalised and partly delexicalised structures. The latter are clustered, based on the identification of a joint underlying meaning and / or function dimension. They are multi-directional, which means that each part could be regarded as an influencing factor for the other. The issue of the syntagmatic and paradigmatic restrictedness of a collocation in particular poses problems for any kind of contrastive application, such as translation or foreign language learning.

2.3 A Construction-Oriented Approach

In comparison, context- as well as significance-oriented approaches share the view of a collocation being formed by syntagmatic and paradigmatic levels of meaning. This links in with construction grammar, which assumes that any (recurrent) structure within a linguistic system consists of a level of form but also a designated meaning perspective (> 4). Therefore, it will be argued here that collocations, like morphemes or words, are in fact form and meaning pairings and as such constructions in a construction grammar sense. To favour an approach from chapters 2.1 or 2.2 for a working definition, however, does not seem possible, since neither a statistically-oriented methodology nor a lexicographic list of a word combination’s restrictions is, at this point, able to provide a comprehensive definition of collocational phenomena.
As chapter 2.2 has shown, paradigmatic restrictions are particularly difficult to explain, since whenever there is a definition which seems to have grasped the exact nature of limited variability, an example can be found which proves the previous interpretation to be a misconception. Take, for instance, the often quoted word combination commit a crime. In terms of a limitation of interchangeability, commit as a collocate can be found with noun phrases such as adultery, burglary, crime, deed, error, impropriety, indiscretion, mistake, murder, offence, robbery, sin or suicide (OCD 2: commit). Semantically, all these potential collocates seem to have in common that they, roughly speaking, refer to an offence for which one can or will be held accountable by some kind of authority. Yet lie, which is semantically close to nouns of this kind, such as perjury, does not occur with commit either within the BoE or the BNC. Furthermore, even native speakers of English seem to reject the combination commit a lie as being wrong (Klotz 1998: 93-95). Examples like this have often led to an understanding of collocations as an idiosyncratic and more or less item-specific combination of lexical items (Herbst 1996; Hausmann 1984). But since collocations are regarded as constructions here, a construction grammar approach with the concept of inheritance relations at its centre might help us to understand this problem. Commit a lie, for example, is not completely non-existent within the English language. It can be found whenever a text or, to be more precise, its author, thinks that lying is not just something one does on a regular basis, but is in fact a kind of crime, which consequently will be judged by a moral authority. This is often the case in religious discourse, and therefore it seems that [commit+NP] does actually maintain its basic meaning.
Of course, commit a lie and the more accepted commit perjury are far from being synonymous. Even though both express an act of not telling the truth, perjury is restricted to the language of the law. So, if commit perjury is more likely to be found in a corpus, this might be due to the fact that in a rather secular world, a legally “proper” crime like perjury is much more noteworthy than an ordinary lie, but not because commit a lie is not a possible combination. The potentially underlying meaning of “an offence, for which one can or will be held accountable by some kind of authority” becomes even stronger after a closer look at collocations like commit an impropriety or commit a mistake. At first glance, it seems that neither the aspect of “offence” nor the feature of “accountability” is particularly strong within these word combinations. Under closer examination, however, it becomes apparent from examples (9) to (11) that, on the contrary, each impropriety or mistake is a somewhat euphemistic expression for a serious offence if it co-occurs with the verb commit. The printers in (9), for example, were severely punished by King Charles I for their seemingly blasphemous behaviour, while the mistakes in (10) are nothing less than kidnapping and charges for acting as an accessory to assault 20. Finally, the impropriety in example (11) is in legal terms a serious offence which usually is brought before a High Court and is therefore anything but a minor procedural lapse.

(9) BNC CCB 1053
His Majesties Printers, at or about this time, had committed a scandalous mistake in our English Bibles by leaving out the word NOT in the Seventh Commandment.

(10) BNC HSL 1335
Under these circumstances she was left open and vulnerable to committing mistakes which the enemy exploited.

(11) BNC EVK 1630
This enables the Court to review the decisions of government ministers, inferior courts, tribunals and other administrative bodies to ensure that they do not act illegally, irrationally, or commit some procedural impropriety (per Lord Diplock in C. C. S. U. v. The Minister for the Civil Service (H. L., 1984)

20 The source of this sentence is an article from a feminist magazine called Spare Rib. The person in question is Winnie Mandela. After this statement was published by the NEC (National Executive Committee of the African National Congress) in 1989, she was in fact convicted of kidnapping and being an accessory to assault.

Furthermore, sentences (12) and (13) below show how English native speakers use this additional meaning dimension of “offence” almost consciously. In (12), for example, the sentence describes a bill for a change in legislation, according to which a served penalty would be erased from a former convict’s record after a certain period. These words come from a supporter, and it is thus very likely that he purposely chose the much less alarming word indiscretion instead of crime to create the impression that these offences were excusable banalities and nothing to worry about. A similar case is slight indiscretion in sentence (13). It occurs as part of a statement within a crime novel and sounds rather cynical, given the fact that the referent here is the abuse of a girl, which in Oxford’s more conservative circles seems to be regarded less severely than a homosexual relationship.

(12) BNC HJ7 116
[…] If a man has committed an indiscretion that brings him before the courts and results in his being convicted and penalized, it must be right that after he has served the penalty and lived it down by a substantial period of good conduct thereafter, it should be without meaning for most people of good will.'

(13) BNC HTR 343
He came out of it as someone who might have committed a slight indiscretion, no more, and a heterosexual one at that.
These examples show that there are cases of paradigmatic restriction or supposed item-specificity which can be explained against a context-oriented background. For a construction-oriented perspective, this suggests that it is not enough to look at the individual constituents which make up a collocation, or at the overall, rather delexicalised construction such as [Adj+N]; there is an additional layer of meaning. This additional, semi-lexical construction then influences the dimension of lexical constructions, such as individual words. Thus it either restricts - as in commit a lie - or modifies, as in commit a mistake or commit an indiscretion, the interpretation of the whole word combination through inheritance relations (> 4). Moreover, it can be assumed that these semi-lexical constructions work multi-directionally, which means that every constituent is part of a semi-fixed construction and influences the selection and interpretation of all other elements involved. At the same time, it is crucial to note that a semi-lexical construction does not coincide with the function or meaning dimension of the lexical item as such. Other instances of commit follow a different set of interpretations and are not at all influenced by any aspects of meaning like “offence” or “accountable”. Hence, semi-lexical constructions can instead be described as the necessary circumstances which need to be available once a lexical item occurs within a certain overall construction, like, for example, scornful in [Adj+N]. Figure 2.3 illustrates the different levels of constructions which might be at work in a word combination.
Figure 2.3: Form and function levels of collocational construction, taking scornful tone as an example

An intermediate semi-lexical level is, however, not to be regarded as some kind of distributional rule, since the meaning dimension in a construction is rather something which is always there and applies to a certain extent. Without any further influence from other lexical items, the negative aspect in [Adj+tone] dominates even a quite neutral combination like conversational tone, as can be seen in sentence (14). Yet, with a modification such as pleasant, as in (15), the feature of “deviating negatively from the norm in terms of volume or emotion” fades into the background until a closer look at the context reveals that the message conveys rather sombre information, namely the possibility of someone dying, which again fits the negative aspect of [Adj+tone].

(14) BNC AEA 1218
Elisabeth smiled, hoping to lighten the conversational tone and distract the Colonel from his purpose.

(15) BNC A7J 858
‘I suppose Mike out there will be the next to go,’ he said in a pleasant conversational tone.

With an additional dimension of form and function, variability ceases in most cases to be unpredictable or inexplicable, which is in itself not a very remarkable observation. In his book Semantics, Frank Palmer had already noticed: “[…] one can, with varying degrees of plausibility, provide a semantic explanation for even the more restricted collocations, by assigning very particular meanings to the individual words.” (Palmer ²1981: 77) A constructional perspective on this matter is, however, different for two reasons: Firstly, within this approach, each lexical item retains its own meaning and / or function.
It is simply influenced by an inherited meaning from a more general construction, which, contrary to what Palmer suggests, does not result in an infinite number of more or less arbitrary word meanings but rather adds another construction to the mental lexicon, which then contributes the same, stable meaning dimension to any context it is used in. Secondly, a level of semi-lexical constructions within collocations seems to fit seamlessly into the usage-based approaches which claim that constructions develop through a constant analysis of lexical chunks based on the principles of analogy and similarity (Bybee 2010: 57-58). This would make an intermediate constructional dimension for collocations a useful unit, not only from a descriptive construction grammar point of view but also for language acquisition research. For, if semi-lexical constructions develop gradually, they could be useful indicators for various stages within the language acquisition process. Furthermore, foreign language learners would benefit from a more accurate description of collocations and their various dimensions of function and meaning, since it would not only enable them to produce more native-like language but also to better understand elements of irony, euphemism or sarcasm within a text. In the long run, this might also lead to a more profound understanding of the foreign language itself, as deliberate analysis of levels of meaning within an EFL classroom could help students to understand language as a system and free themselves from a superficial slot-filler perspective.

A working definition based on construction grammar assumptions, therefore, could read as follows (based on Goldberg 2006; Siepmann 2005):

A collocation is a construction, which consists of a lexical form and a functional or semantic meaning. The lexical form of a collocation can be rather fixed - and therefore closely related to a compound - semi-fixed or delexicalised. Semi-fixed collocational constructions open up one or more slots, which through inheritance relations influence the meaning of any constituent chosen to be used in this particular slot. The constituents of a collocation are interdependent; each can be regarded as a fixed point or an exemplary item within a slot. Depending on the number of constituents of a collocation, they are at least bi-directional.

This conception of a collocation’s form and meaning dimension amounts to the following hypotheses:

1) Depending on the functional and semantic properties of a constituent and the slot it is used in, the actual manifestation of a collocation can be regarded as usual (total fit), creative (only the major definition elements fit) or inappropriate (one or more defining elements contradict each other).
2) Evaluation and understanding of a collocational construction are the results of a usage-based acquisitional process and, therefore, depend on linguistic experience and proficiency.

Picking up on Siepmann’s caveat that “[…] semantic relationships can only be discerned post hoc […]” (Siepmann 2005: 432), this approach towards collocations of course needs to address the general question whether a semi-lexical constructional level of collocations is cognitive reality in the first place or rather an additional descriptive dimension for lexicographic purposes. Admittedly, the close interdependence of constructions makes it difficult to examine each and every level of an utterance separately, especially when the object under investigation is neither purely functional nor a simple sum of individual word meanings. Since lexical variability is by definition at the heart of any semi-lexical construction, it might, however, make sense to keep the investigated items constant and vary the cognitive reality.
In other words: if speakers of a language with a similar cognitive background are clearly influenced by the meaning of a semi-lexical construction while others are not, this might suggest that this additional dimension of meaning is cognitively real, at least for a certain group of speakers. In this context a comparison between native speakers of English and speakers of English with a non-native language background seems to be the obvious choice, but differences between the perception of children and adults or between non-native speakers of English with different learning experiences might also provide useful insights into the mental mechanisms which are at work once someone processes a collocation.

This preliminary working definition of collocation presupposes that all collocational combinations share the same constructional nature. It furthermore defines creativity as a natural aspect of human cognition. Yet, the degree to which creative readings are supported by a potential constructional meaning needs further clarification, as does whether contextual factors play a role in the perception of collocational combinations. Thus, to create a comprehensive model of collocations which accounts for creativity within collocations in general as well as potentially different levels of creative alternations, the cognitive dimension of creativity and its consequences for the concept of collocation need to be discussed in more detail (> 3). Chapter 4 will then combine these implications with more general thoughts on language acquisition, in order to form a comprehensive model, which then will be put to the test in subsequent chapters (> 6; 7).
3 Collocations and Creativity

Establishing and maintaining a balance between formulaicity and creativity seems to be essential for successful acquisition, but in taught adults, this is difficult to achieve, with the learner most often erring on the side of too much creativity. (Wray 2002: 148-149)

As the previous chapter showed, the observation that even phraseologically fixed items such as idioms or collocations can be subject to change was already part of early linguistic work on these aspects. Palmer and Hornby present different, at times antonymous, NP-collocates in their collection for Thousand-Word English; for a verb like do, for example, they include “do good, harm, etc.” (Palmer / Hornby 1937: 43). Later, Cowie even uses the number of lexical choices within a collocation as a defining feature of his categorization (> 2.2; Cowie / Howarth 1995). More context-oriented approaches, like most of Sinclair’s corpus-based studies, also take into account that there is more than one option for a collocate to be realised (> 2.1). So, while variation and a varying degree of fixedness have been part of collocational studies almost since the very beginning, it is interesting to note that the conclusions drawn from these observations have mostly been very pragmatic: the different forms of collocational variation were acknowledged as a defining, yet unpredictable, feature and hence not analysed any further.

Some publications, however, have gone beyond these rather straightforward observations. Concepts like lexical sets (Halliday 1966), semantic prosody (Sinclair 1996, 1998, 2004; Louw 1993), commutability (Howarth 1996), lexical priming (Hoey 2005) and exploitations (Hanks 2013) all account to varying degrees for the fact that variation in collocations is not only possible but can also lead to further implications. This chapter will discuss a selection of more traditional views on collocational variation (> 3.1).
It furthermore discusses how these observations can be interpreted against a cognitive linguistic background (> 3.2), showing in more detail how creative variability could support a constructional perspective on collocations.

3.1 Creative Variation of Collocations

Combinatorial unpredictability has frequently been regarded as a defining feature of collocations (> 2; Howarth 1996; Hausmann 1984; Palmer 1976). However, in allowing for variation, the question is, what exactly is an unpredictable combination? This is usually answered ex negativo by contrasting native speakers’ evaluations or corpus data with combinations which in terms of general semantic understanding of a lexeme and its semantically related units (such as hyponyms, co-hyponyms, hyperonyms, and synonyms) should be both possible and acceptable. If these combinations then do not or only rarely occur in a corpus or are rejected by a group of native speakers for no apparent reason, this is often referred to as unpredictable combinatorial behaviour. A variation, on the other hand, is then its positive equivalent, namely, semantically related words which are possible as well as accepted and hence can share the same collocate. To account for the fact that these variations belong to the same item, Halliday introduced the term lexical set, which refers to a group of these words (Halliday 1966: 152-153). Take for example the following instances of pretty:

(16) BNC JYA 2304
Lina was a pretty girl, with a totally natural smile and tangled dark curls.

(17) BNC BN1 644
The poor young woman, a pretty creature, flushed scarlet and said […]

(18) BNC CAD 3130
If you take away the image all that's left is a bunch of exceptionally pretty boys making some very ordinary music.

(19) BNC ANK 1635
You are a pretty boy, isn't he a pretty boy, Bob?
Girl, woman, and boy could be seen as co-hyponyms of the hyperonym human beings. As such, they share a considerable number of features; in fact, from a structural semantics point of view, their only distinguishing components would be ±ADULT, for girl / woman, and ±FEMALE, for girl / boy. Yet sentences (16) and (17) might be the only indisputably “natural” co-occurrences here, belonging to the lexical set of [pretty+N], while (18) and (19) seem to be somewhat special, if not strange. Therefore, Frank Palmer argues in the first edition of Semantics (1976: 97) that any phrases which contain a noun and pretty as its collocate should be seen as an idiosyncratic combination, as the adjective does not co-occur with every noun but is restricted in its combinatorial behaviour. Of course, including ‘+female’ in the word’s definition would easily explain why lexemes with a (potential) female reading are more likely to collocate with pretty, but Palmer dismisses this thought as “rather perverse” (Palmer 1976: 96). However, in the second edition he admits:

It would, however, be a mistake to attempt to draw a clear distinguishing line between those collocations that are predictable from the meanings of the words that co-occur and those that are not. […] For one can, with varying degrees of plausibility, provide a semantic explanation for even the more restricted collocations, by assigning very particular meanings to the individual words.
(Palmer ²1981: 77)

Furthermore, he suggests subdividing collocational restrictions into three types: combinations which are very unlikely to occur because of the items’ individual semantics, such as green cow (type 1); collocations which only admit a certain collocational range, such as pretty, which, in its attributive use, seems to be restricted to nouns with the inherent concept of femininity (type 2); and very strict restrictions which only allow a certain collocate in order to express a certain concept, such as rancid for bacon which is not fresh anymore (type 3) (Palmer ²1981: 79). This introspective approach, however, leads Palmer to a very conservative reading of a word’s meaning and its combinatorial properties. A BNC query (admittedly a tool which was not available at that time) for pretty or rancid yields a plethora of examples which Palmer would have classified as unacceptable, such as rancid Stilton (BNC CHA 4470), rancid words (BNC HNJ 309) or pretty boys. In fact, it is especially these more or less unexpected uses that demonstrate the scope of interpretative value which accompanies a combinatorial choice like in (18) or (19). For it is the implied aspect of femininity which gives boys in (18) a kind of androgynous flavour¹ and makes pretty boy in (19) a provocation, implying in this case unwanted homosexual tendencies, rather than a regular address. The same holds of course for rancid, which, as the Oxford Advanced Learner’s Dictionary (OALD) points out, can refer to any kind of fat which “[…] tastes or smells unpleasant because it is no longer fresh” (OALD 7: rancid). Therefore, one would expect rancid Stilton to have a different olfactory quality than mouldy or smelly Stilton, the latter being rather a default feature of this type of cheese.
¹ This sentence refers to the alternative rock band Birdland, who, with their slender figures and blond bobs, are indeed good examples of an androgynous image. Later on in the text they are even referred to as “blond bombshells” (BNC CAD 3116).

This study will therefore refer to accepted alternations, which, so to speak, belong to the same lexical set, as variation or established alternations, while combinations which might seem acceptable but potentially need some more interpretation will be called creative alternations. Of course, one could argue that these more creative uses of a collocate are, to a certain degree, just metaphorical extensions from the fully accepted collocation, but these readings seem to live on within more creative combinations. To a certain extent, this makes Palmer’s typology of collocational restrictions superfluous, first and foremost because, as Herbst stresses, an image such as green or purple cow might seem unusual but is in no way related to any language-inherent restrictions (Herbst 1996: 386). Especially in fields like literature, news, advertising or internet-talk one might come across the most unusual combinations. However, just because a green cow is a phenomenon one might not simply encounter on a trip to the countryside, this does not mean that, within the right context, it is not as acceptable as a red dragon or a blue unicorn. So the imaginary nature of the concept as such cannot and should not influence a linguistic analysis of restricted language use. Collocational restrictions like type 2 and 3, on the other hand, paint a more diverse picture than Palmer leads his readers to believe: first, because his supposed restrictions yield actual instances within a modern corpus like the BNC, but also because in some cases these more or less unconventional combinations add a new semantic dimension to the concept of collocation.
The mere fact that these interpretations exist shows that under certain circumstances collocates could contribute their own level of meaning. To some degree, this is of course something most words do, but what is striking here is that this additional reading seems to stem from more prototypical and generally accepted collocations a word has been used in before, like ‘+ female’ from pretty girl or ‘+ fat that is not fresh anymore’ from rancid bacon. This, however, raises two questions: first, whether these additional semantic dimensions are inherent knowledge which all native speakers of a language share, and second, whether they all share the same understanding and interpretative boundaries. Furthermore, it is also questionable whether, assuming this phenomenon of collocational meaning transfer exists, it is a feature of the collocation or rather a reinterpretation furthered by other factors such as context² or a kind of no-nonsense³ principle (Clark / Clark 1977: 72-73). Hausmann (1984), for example, also acknowledges a kind of more or less creative, even pun-like use of collocational structures.
However, here they stand in contrast to institutionalised co-occurrences like collocations, which, according to his typology, consist of one freely selected (base) and another restricted element (collocator), whereas counter-creations, as he calls the more creative, collocationally inspired co-occurrences of words, are freely selected and characteristic of a more individual, rather literary style (Hausmann 1984: 399). As has been mentioned before, Hausmann would presumably label examples (18) and (19) as counter-creation. But each of these sentences contains a euphemistic or even cynical tone, created by a somewhat creative use of [pretty+N]. Thus, it is questionable whether these instances of [pretty+N] are anything more than instances of one author’s unique and individual style. As so often, the choice of collocates within a collocation and their general acceptance by a language community might be subject to gradience, with highly frequent and well-established instances like pretty girl at one end of the spectrum and pretty man at the other (Klotz 1998: 88-95, Herbst 1996, Palmer ²1981: 75-79).

² Searle already questioned the concept of meaning without context (zero context) when he remarked on the question of literal meaning: “I shall argue that for a large class of sentences there is no such thing as the zero or null context for the interpretation of sentences, and that as far as our semantic competence is concerned we understand the meaning of such sentences only against a set of background assumptions about the contexts in which the sentence could be appropriately uttered.” (1979: 117)
³ Also compare Herbert Clark (1978) on meaning comprehension and interpretation as well as Eve Clark’s thoughts on the contrastive principle (Clark ²2009: 133-134; 1988).
Yet, the choice of one collocate alone already seems to guide the recipient’s thoughts towards a certain interpretation for the second part of a collocational pair. Thus even rather unusual fillings for the second collocate, like boy or man, which at first glance might not seem to fit the semantics of a collocation, can be interpreted against the backdrop of its more established reading. This raises the question whether readings such as [pretty+N <female (human) being>] or [rancid+N <s.th. which literally or metaphorically tastes or smells unpleasant because it is no longer fresh>] work with any filling for the [N] slot as long as it does not contain a semantic feature which actively prevents an even more creative reading (cf. Bybee / Eddington 2006). Dividing established and creative alternations into two different phenomena, however, disguises the fact that both cases might refer to the same, to a certain extent prototypical, reading (cf. Bybee 2013). This interpretation is similar to a phenomenon Sinclair (1991, 1987b) observed during his corpus research; studying lexical patterns such as set in or happen, he realized that they are almost exclusively connected to a negative NP such as impoverishment or accident (Sinclair 1991: 73-75, 111-112). This led him to the conclusion that there might be an additional semantic-pragmatic level, which requires a certain semantic quality for collocates of a node. In an analogy with phonology⁴, Louw (1993) later coined the term semantic prosody for this sort of “consistent aura of meaning with which a form is imbued by its collocates” (Louw 1993: 157).
Several publications have defined, analysed and re-defined semantic prosody since, amongst them Stubbs (2001, 1995) and Bublitz (1995), who contributed several sample analyses, Partington (2004, 1998), Tognini-Bonelli (2001) and Xiao and McEnery (2006), who approached the subject from a contrastive, partly pedagogic perspective, as well as Hunston (2007) and Stewart (2010), who tried to structure some of the terminological debate. But besides a general understanding which assumes that semantic prosody is a meaning-related phenomenon which can be observed across different established alternations of a collocation, there are also points of disagreement (Hunston 2007: 250). Since most of these aspects are also going to be relevant for the study at hand, the following paragraphs will outline them against the backdrop of established and more creative alternations of collocations. A first, fundamental question is at which level semantic prosody operates. Since it is a phenomenon which has predominantly been studied for collocates of a node word, it might work on a combinatorial and syntagmatic as well as a variable, paradigmatic level. Hence, it could either be part of the extended meaning, or function, of a lexical unit (Partington 2004: 132-133) or belong to the overall features of a collocational sequence (Sinclair 2004: 35). The methodology behind most analyses suggests semantic prosody to be a property of a single lexical unit, since collocates of a node are searched for mutual aspects of meaning in order to find an overarching reading for the companions of the word. Yet, if semantic prosody is part of a word’s meaning, it would follow that this reading is always applicable, independent of the pattern in which a word is used.

⁴ For one of the early yet quite comprehensive accounts on prosody see Firth (1948).
For pretty, for example, the consequence would be that ‘somewhat female features’ are always implied, but this is only true for NPs, while [pretty+Adj] instead triggers a reading of pretty as an intensifier similar to very, as in pretty obvious or pretty difficult. Of course, this reasoning only holds if [pretty+N] and [pretty+Adj] are seen as two different uses of the same lexical item. Thus, within an approach which regards both instances as separate entries in the mental lexicon, the implication of semantic prosody could well be interpreted as part of the word meaning. Chapters 4 and 7 will pick up on this question and discuss it against the background of a construction grammar dimension.

A second aspect in the scope of semantic prosody is its potentially binary nature. Initially, the phenomenon had been treated as the tendency of a node to co-occur with more positively or negatively charged collocates (Sinclair 1987b). Some approaches maintained this view, defining semantic prosody as a property with a binary quality (Partington 2004; Channell 2000; Louw 1993). Other researchers, such as Sinclair (1991: 110-115), argue for more diverse implications. Take again the example of [pretty+N]: assuming that the choice of pretty as a premodification does indeed influence the interpretation of a [N] as in sentences (1), and (16) to (19)⁵, a primary division into either a positive or a negative reading would imply that ‘somewhat female features’ are either good or bad. But, while examples (18) and (19) might have negative implications, neither (16) nor (17), nor even the more creative (1), conveys a bad or even rude meaning. So, unless one wishes to postulate an additional binary division into male / female, a more detailed description of an item’s semantic prosody might be more useful for a fruitful interpretation of this kind of additional level of meaning.

⁵ Compare the respective examples from chapter 1 and 3.1.
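The collocate-based procedure mentioned above (searching the companions of a node for shared aspects of meaning and then asking whether they lean towards a positive or negative evaluation) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the method applied in this study: the toy sentences, the hand-made evaluation lexicon POLARITY, the window size and the function names collocates and prosody_profile are all hypothetical, and a real analysis would draw its data from a corpus such as the BNC.

```python
from collections import Counter

# Hypothetical toy sentences; a real study would query a corpus such as the BNC.
TOY_CORPUS = [
    "a sense of gloom set in after the accident",
    "decay had set in and the roof collapsed",
    "panic set in when the storm arrived",
    "a pleasant routine set in over the summer",
]

# Hypothetical hand-made evaluation lexicon (an assumption, not taken from the study).
POLARITY = {
    "gloom": "negative", "accident": "negative", "decay": "negative",
    "collapsed": "negative", "panic": "negative", "storm": "negative",
    "pleasant": "positive",
}


def collocates(sentences, node, span=4):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    node_tokens = node.lower().split()
    n = len(node_tokens)
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == node_tokens:
                left = tokens[max(0, i - span):i]
                right = tokens[i + n:i + n + span]
                counts.update(left + right)
    return counts


def prosody_profile(collocate_counts, lexicon):
    """Sum collocate frequencies per evaluation label; unknown words are 'unlabelled'."""
    profile = Counter()
    for word, freq in collocate_counts.items():
        profile[lexicon.get(word, "unlabelled")] += freq
    return profile


counts = collocates(TOY_CORPUS, "set in")
profile = prosody_profile(counts, POLARITY)
print(profile)
```

A purely binary tally of this kind makes the limitation discussed above visible: a label set restricted to positive and negative has no place for a reading such as ‘somewhat female features’, so a richer set of labels would be needed to describe [pretty+N].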
The third and probably most complex question concerns the pervasiveness of semantic prosody. This aspect falls into two subcategories: pervasiveness within genre or discourse-related context and pervasiveness outside a certain context. In short, the question here is about the exact scale of semantic prosody. Within a certain context, Partington (2004), for example, reasoned there might be instances which are more prone to a reading along the lines of their semantic prosody than others, while Sinclair (2004) would argue that semantic prosody is an obligatory property which, if it can be observed, holds for all instances within a sequence. In this case, this feature would trigger a certain reading in any given context. Whitsitt (2005), however, challenged this position, observing that while [cause+NP] yields a negative interpretation for the filling of the [NP] in a genre like ‘news’, the same pattern has instead a neutral reading in academic publications. Applying these observations to the example of [pretty+N], this would imply that the reading of ‘somewhat female features’ applies to all kinds of nouns, irrespective of the [N] as such or the context in which it occurs. Interestingly, as the examples in chapter 1 showed, this interpretation carries into various contexts such as literature, media or everyday conversation. Yet, while applicable to quite a few, even less expected [N]s such as face, man or boy, not every noun seems to embrace the underlying reading of ‘somewhat female features’ with the same vigour. In connection with a noun-collocate referring to a human being, this relation holds quite well, and even for animals, as in (20), pretty and handsome appear to be used within a female or male context respectively. However, comparing the effect of the premodifications pretty and handsome on inanimate [N] referents, the picture becomes hazier.
For sentences (21) and (22) it already takes some imaginative effort to construe a more female or male reading; in these cases, a pretty house actually seems to be the more common and neutral expression, whereas a handsome house might have a kind of pretentious grandeur to it. It is highly questionable though whether this is because women are more frequently associated with the interior, domestic domain while men’s sphere is claimed to be the outside world of business and politics. Neither would it explain the derogatory tone in (23) nor why there are only very few hits for pretty + landscape and none for handsome + landscape in the whole of the BNC.

(20) BNC A17 1047 Competitions for the most handsome dog and prettiest bitch need no explanation.
(21) BNC H7H 505 It was a pretty house too; being built beside beautiful entrance gates, gates hinged to cut stone posts, dignified as pillars in a temple, gate lodges were designed with appropriate distinction.
(22) BNC HGY 371 Because although it's a handsome house, and the gardens are extensive, they in no way compare to those of the castle which is just up the road.
(23) BNC CHH 375 Each turn along the coast path reveals a beautiful sandy cove, or jagged rocks giving lie to the notion that this is just a pretty landscape.

In example (23), however, the choice of just also seems to contribute to the slightly negative reading of pretty landscape. This indicates that semantic prosody might also be influenced by other lexical as well as structural factors, as for example Klotz (1997) shows for the case of cause. Nevertheless, these examples show that semantic prosody can be regarded as a rather pervasive feature, although there are also items which refuse to be coerced into a certain interpretation. At the same time, the fact that phrases like pretty creature or pretty boy receive an implicit female interpretation through their analogy to pretty woman is difficult to deny.
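The comparison of pretty and handsome across human, animal and inanimate nouns can be made explicit by tabulating premodifier and noun pairs by noun class. The following Python sketch is only an illustration under stated assumptions: the list TOY_HITS stands in for the output of a corpus query and does not reproduce BNC frequencies, and the coarse noun classes as well as the function names class_distribution and share are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (modifier, noun) pairs standing in for corpus hits;
# the frequencies are invented and do not reproduce BNC counts.
TOY_HITS = [
    ("pretty", "girl"), ("pretty", "girl"), ("pretty", "woman"),
    ("pretty", "boy"), ("pretty", "house"), ("pretty", "landscape"),
    ("handsome", "man"), ("handsome", "man"), ("handsome", "dog"),
    ("handsome", "house"),
]

# Hypothetical coarse noun classes (an assumption for illustration only).
NOUN_CLASS = {
    "girl": "human", "woman": "human", "boy": "human", "man": "human",
    "dog": "animal", "house": "inanimate", "landscape": "inanimate",
}


def class_distribution(hits, noun_class):
    """Tabulate, per modifier, how often it premodifies nouns of each class."""
    table = defaultdict(Counter)
    for modifier, noun in hits:
        table[modifier][noun_class.get(noun, "other")] += 1
    return table


def share(table, modifier, cls):
    """Proportion of a modifier's hits whose noun belongs to class `cls`."""
    total = sum(table[modifier].values())
    return table[modifier][cls] / total if total else 0.0


table = class_distribution(TOY_HITS, NOUN_CLASS)
for modifier in ("pretty", "handsome"):
    print(modifier, dict(table[modifier]), round(share(table, modifier, "human"), 2))
```

Counts of this kind only show where a premodifier occurs; whether a given hit, such as pretty landscape in (23), carries a neutral or derogatory tone still has to be judged from its context.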
To shed some more light onto these apparently contradictory tendencies, it might be useful to explore where semantic prosody comes from in the first place. Louw, for example, speculates that the “prosodic aura of meaning” develops through a period of diachronic change (Louw 1993: 164) and Bublitz (1996) too believes that over time and through repetitive use, “the word adopts semantic features from an adjacent item” (Bublitz 1996: 11). Yet, the concept of a formal sequence of items, which is closely linked to a certain semantic or functional dimension of meaning, has also been one of the main topics within cognitive linguistics. In fact, the idea of form-function pairings which operate on every linguistic level constitutes part of the basis of cognitive, usage-based approaches, like emergentist approaches (> 4.2.1), Complex Adaptive Systems (> 4.3.3) or construction grammar⁶ (> 4.3.1). Like semantic prosody within corpus linguistics, attempts to view phraseological phenomena such as collocations as a combination of form and meaning were part of cognitive linguistic approaches from a very early stage on. Initial studies within this, then fairly new, field of linguistics, such as Lakoff’s (1990) survey of the different constructions of there, or Fillmore, Kay and O’Connor’s (1988) report on the construction let alone, quintessentially dealt with phraseological chunks and their function within a text or discourse. However, very often this meant that a sequence of words, like let alone, was assigned a rather idiomatic meaning. Basically, it resulted in treating these items as a kind of more or less fixed multi-word unit.

⁶ Not all construction grammar approaches concern themselves with cognitive considerations. Sign-Based Construction Grammar (Boas / Sag 2012), for example, focuses on the formal description of a linguistic system rather than on its cognitive plausibility or mental representations.
Later, research on phraseological items with respect to different groups of speakers was able to show that this kind of idiomatic meaning is not necessarily available to every native speaker at any time. In Nippold and Duthie’s (2003) study on idiom comprehension in school-age children and adults, for example, children aged 12 still experience difficulties in detecting all levels of idiomatic meaning, which indicates that the inventory of phraseological items might be subject to a kind of acquisitional process. In addition, Wray (1999: 222) believes that “[t]he increasing automatisation of language […] is marked by a switch from a preference for literal interpretations of standard formulaic sequences (e.g. she has him eating out of her hand) to their metaphorical counterparts, a process which is not complete until late teenage […]” (> 4.3.2). A combination of both approaches could then introduce a new level of reasoning to each conception and thus strengthen their respective explanatory value: incorporating a usage-based perspective into semantic prosody would lead to a lexically semi-fixed collocational construction, while a usage-based approach might be able to account for the partly restricted pervasiveness of semantic prosody. This would be the case if collocations, for example, enter a kind of life cycle, starting off as a rather fixed, almost compound-like expression, which through frequent encounter and concomitant entrenchment transitions into a more flexible, semi-lexicalised structure which takes its basic meaning from a more established prototype. On the one hand, this would make the definition of collocation fuzzier, since collocational combinations then do not represent examples on a gradient scale, but should instead be seen as early (for established pairs) or late (for more creative versions) snapshots of a collocational structure, depending on the collocation’s developmental stage.
On the other hand, identifying a kind of collocational meaning could not only facilitate EFL teaching but also improve teaching materials such as dictionaries and workbooks or even help to make the output of software, like online translations or other NLP applications, more fluent and/or native-like. An important prerequisite for this kind of usage-based model of collocations and creativity is a firm anchoring of creative processes as a part of human cognition. This is not self-evident, since creativity is often understood as the exceptional faculty of only a few outstanding individuals, such as professional authors, composers or artists in general. Thus, the next chapter will show that this conception does not do justice to the pervasive character of creativity.

3.2 Creativity and Cognition

The examples from previous chapters have shown that creative alternations of collocations are possible and that at least some of these variations semantically lean towards parts of the established meaning of the original collocation. A potential relationship between collocations, their creative alternations, and syntactic structures has been outlined. Since traditionally, however, the terms collocations, creativity, and constructions have been associated with quite different fields within linguistic research, this chapter will be dedicated to a more detailed introduction of the concept of cognitive creativity in language research and how it contributes to processes within human cognition. If we think of creativity, one of the first features which might come to mind is a notion of novel, innovative concepts. In this, it seems to differ quite dramatically from phraseological phenomena such as collocations, which are often associated with prefabricated, institutionalised or even partially fixed items (> 2).
But, if creativity consisted only of unique and novel aspects, it would border on nonsense, since it would clearly lack a basis against which it can be understood. Anyone who has ever been to a country where s/he was not familiar with the local language knows from experience that a language made up of completely unfamiliar words and structures, for example, would hardly be more than just a string of characters and sounds. For this reason, creativity, linguistic or otherwise, needs a generally accepted framework to operate or, as Sternberg and Lubart (1999: 3) point out: “Creativity is the ability to produce work that is both novel (i.e. original, unexpected) and appropriate (i.e. adaptive concerning task constraints).” Outside academic discourse, linguistic creativity is therefore commonly understood in one of two ways: either as the rather rare and ingenious trait of an author, whose creativity is the acclaimed source of extraordinary pieces of work, or, more recently, as a kind of key characteristic of any modern human being, who needs creative ways of thinking and communicating in order to have successful job interviews, lead fruitful discussions or simply be regarded as an interesting person to talk to. In the past, several approaches have sought to account for this dichotomy. Craft (2001: 45-49), for example, uses “high” or “big C” creativity for the former, while she refers to the latter as “little c creativity”. Kozbelt and colleagues even suggest a further subdivision, taking into account innovation which is only novel to an individual (“mini-c”) as well as a separate level for professional creatives, like journalists, who have not (yet) produced an outstandingly creative piece of work (“Pro-c”) (Kozbelt / Beghetto / Runco 2010: 23-24).
In addition to these four categories, they also point out six main factors within creativity research, known as the Six P’s of Creativity (Kozbelt / Beghetto / Runco 2010: 24-25): product, process, person, persuasion, potential, and place. For the study of language, these six factors mean that creative language use does not just consist of creative language per se, which might simply be explained through the process of language acquisition and its subsequent cognitive manifestation, but that it is also influenced by individuals who might be to different degrees capable (potential) as well as more or less inclined (person) to use language creatively. Furthermore, Kozbelt and colleagues explicitly include a potentially creativity-fostering setting (place) in this list. Hence, a focus on the creative product and process should only be a starting point for research into creativity, and person-specific factors, as well as the setting and context, should be considered too. This multifactorial approach has not always been part of the conception of creativity. Very early academic thought on linguistic creativity instead seems to focus on little c creativity. In Plato’s dialectic work Cratylus (Plato / Reeve 1998), for example, the author explores the relation between language and reality. There, the topic of creativity is taken up in the quite literal sense of creating something, in this case, words for people and everyday items, both of which Plato refers to as “names”. A discussion on the origin of these labels unfolds between Hermogenes, who claims that “[…] the correctness of names is determined by anything besides convention and agreement” (Plato / Reeve 1998: 2), and Socrates, who assumes that meaning and structure come from within an item. To illustrate his point, Socrates compares linguistic structures like phones, syllables and, with these, words, to a broken weaver’s shuttle and the process of (re)creating it⁷.
⁷ The dialogue between Socrates and Hermogenes reads as follows (Plato / Reeve 1998: 10-11):
Socrates: Come now, consider where a rule-setter looks in giving names. Use the previous discussion as your guide. Where does a carpenter look in making the shuttle? Isn’t it to that sort of thing whose nature is to weave?
Hermogenes: Certainly.
Socrates: Suppose the shuttle breaks while he’s making it. Will he make another looking to the broken one? Or will he look to the very form to which he looked in making the one he broke?
Hermogenes: In my view, he will look to the form.
Socrates: Then it would be absolutely right to call that what a shuttle itself is.
Hermogenes: I suppose so.
[…]
Socrates: And a carpenter must embody in wood the type of shuttle naturally suited for each type of weaving.
Hermogenes: That’s right.
Socrates: Because it seems that there’s a type of shuttle that’s naturally suited to each type of weaving. And the same holds of tools in general.
Hermogenes: Yes.
Socrates: So, mustn’t a rule-setter also know how to embody in sounds and syllables the name naturally suited to each thing? […]

From this conception of linguistic creativity, it follows that every item has an inherent meaning which influences its lexical shape and application. Therefore, if the meaning of an item precedes its name, all potential meaning dimensions should be an a priori part of its definition. A creative use of a word is then simply its correct application, for all a word does is to “divide things according to their natures” (Plato / Reeve 1998: 10). Since Plato focuses, however, more on labels and terms within a language, he does not elaborate on the consequences his approach might have for lexical items which are used in an idiomatic sequence, such as idioms and collocations. Only recently have cognitive linguistic approaches put the human need to conceptualise one’s environment at the centre of any linguistic activity. But because these surroundings change with experience and outside influences, concepts need to be re-combined, expanded or even transformed on a regular basis; these processes are very similar to a usage-based linguistic conception.

This becomes even clearer if one compares the three most common types of creative thinking, combinatorial, exploratory and transformational creativity (Boden 2001: 96-97), to general usage-based principles such as schematisation, analogy, entrenchment and competition (> 4). As will be shown below, all types of creative thinking can be defined and explained through these four basic cognitive processes, which suggests that creativity could equally well be interpreted as an inherent part of language acquisition. Combinatorial creativity arises from the use of established concepts like words or phrases in a new, unfamiliar way (Boden 2001: 96). It draws on entrenched items, like a parent’s “Juice gone” once there is no more juice to drink or the praise of “flowers pretty” or “Janie pretty” to express that something is pleasant to look at (Tomasello 2005: 95). Through analogy, a child might then produce creative solutions, such as cookies gone, to express that, in analogy to the juice, there are no cookies to be seen, or bird pretty, to point out a particularly interesting bird. The resulting schemata are also called pivot schemes, since items such as gone and pretty link the concepts like a pivot (Braine 1963). The link itself, however, is created through a creative combination of familiar items. It is important to note at this point that a creative solution does not necessarily have to be a generally accepted way of phrasing nor does it imply that the aspect of novelty needs to be new to everyone.
What is important here is that the child created an utterance which s/he had never used before by combining items s/he is familiar with, which makes most creative processes at the early stages of first language acquisition a case of “mini-c” creativity. The principles, however, remain the same for all creative levels, up to the masterpieces which are generally regarded as eminent creative work or “big C” creativity. Another aspect Boden points out is that not everyone dares to be combinatorially creative to the same extent. “That [kind of] mental flexibility can be inhibited by lack of self-confidence, as well as by having a sparse collection of ideas in the first place.” (Boden 2001: 96). Linking creative thinking with an individual’s character traits, it follows that there are not only different levels of creativity but also a certain predisposition. Furthermore, a collection of ideas and concepts grows through experience; thus, in the case of linguistic creativity it follows that the more creative language an individual uses, the more s/he knows about language in the first place. Therefore, linguistic creativity could be fostered through exposure to language, like reading, writing or communication in general.

Exploratory creativity, on the other hand, is produced within a concept or schema. Here the creative aspect is not formed through the combination of two established items but rather within the perimeters of a system (Boden 2001: 96-97). In language acquisition, morphological rules, such as {-ed} for past tense marking, or slots within a semi-fixed schema, like [pretty+N], can be subject to exploratory creativity. The limits are tested through extensive use of these patterns, which then might lead to over-regularisation (Bybee / Slobin 2007, Bybee 1995: 447-453) or language play (Crystal 1998: 159-182).
Children, for example, form analogies like walked, played and goed, but Bybee and Moder (1983) observe that highly frequent verbs which form a so-called irregular past tense can also be used as the prototype of an analogy, if the respective test item is similar in form yet less frequent or unknown to the test taker. Forms like pretty man would also fall into the category of exploratory creativity. While a child would most likely soon substitute this phrase for the more likely version of handsome man or be corrected by a parent or teacher, the examples above show that, if used by a seemingly more competent source, like an adult native speaker or even an author, human cognition falls back on the process of intention reading, assuming that there is some purpose behind this choice of words.

In comparison to combinatorial creativity, transformational creativity affects not just single items or concepts, but rather the whole conceptualisation, for example of a schema (Boden 2001: 97). While it could be argued that the uses of pretty doll, pretty bird or pretty woman are all analogies based on pretty girl, schematisation might later lead to the conclusion that all have one part in common and hence share the same pattern of [pretty+N]. This might shift the perception of this sequence from one item to a semi-lexicalised schema. Similar cases would be “cherry gone” into [NP is/are gone] or “more jelly” into [I want more N] (Tomasello 1992: 288-292). Like combinatorial and exploratory creativity, transformational creativity is based on analogy and entrenchment, but the result is a change in conceptualisation rather than a novel utterance.

The last paragraphs have argued that creative processes are an essential part of language.
Furthermore, it also seems as though the importance of creative language in linguistic research is one of the very few issues that even the two main branches of linguistic research, generativists and cognitivists, can agree on, or as Goldberg admits: “Constructional approaches share with mainstream generative grammar the goal of accounting for the creative potential of language.” (Goldberg 2006: 22). Chapter 4 will, therefore, provide a more detailed discussion of creativity with respect to these different branches of cognitive linguistics (> 4) and thus will try to bring together thoughts on collocation, creativity and language acquisition in order to suggest a model for the cognitive representation of collocations.

4 Creating Linguistic Creativity

What makes a theory that allows constructions to exist a “construction-based theory” is the idea that the network of constructions captures our grammatical knowledge of language in toto, i. e. it’s constructions all the way down. (Goldberg 2006: 18)

The last chapter has shown that creativity is a basic faculty of human cognition (> 3.2). At the same time, creative alternations of language seem to be part of every aspect of language, not just poetic works of art, but also everyday conversation (Carter 2004, Crystal 1998). Creativity is at work in any aspect of language, and even established sequences like collocations show not only a certain degree of flexibility but creative alternations as well. As with most creative innovation, new items gradually become familiar. In the fields of lexicology and grammatical structures, this is already quite an established phenomenon. A word like gay, for example, originally meant, very generally, being ‘happy and full of fun’ (OALD 7: gay). However, its creative, extended use to refer to ‘homosexual’ soon became so established that it could now be regarded as its only reading.
Not just single words but lexical sequences or phrases too can acquire a new or additional function, like the frequently quoted going to, which used to refer to a literal change of place, yet, through creative extension, came to acquire the additional function of expressing ‘future’ (Bybee / Pagliuca 1987). Examples of creative variants or sequences which have found their way into conventional language use can be found in every aspect of language. As in many other areas, creativity is one of the building blocks of change, which again represents one of the fundamental aspects of development and progress. Therefore, understanding collocations and their new creative alternations also contributes to any theory which seeks to provide a comprehensive framework of language. In order to do so, it is necessary to understand what linguistic change is and, even more importantly, how it is created. In linguistic research generally, two ways exist to investigate change: either through diachronic analysis of co-occurrences over time or through studies within a cognitive system like the human mind. There are several reasons why this study will focus on change from a cognitive point of view, analysing factors of change and creativity within the human mind over an individual’s lifecycle: First, the process of language acquisition, even if seen as a life-long enterprise, is confined to a relatively short time span. If creativity and change can be observed within the relatively short span of a lifetime, the results could be taken as a starting point to investigate whether this phenomenon prevails and develops or if it remains a short-lived whim.
A second aspect concerns the methodological issue of a suitable database; while, thanks to large corpora, it is no longer a difficult task to find enough instances of a single word such as gay or a sequence such as going to , identifying less frequent co-occurrences of a node and its collocates throughout different periods is still challenging from a methodological point of view, since comparable data for diachronic studies - that is, samples of language from different periods in time which share the same features such as genre, context or register - is notoriously hard to find. Especially in earlier periods, preserved language data is rare. Furthermore, there are often only a few authors included who could make a sub-corpus for diachronic purposes rather biased towards an individual style. Another aspect is the potential (inter)relation between the functional-semantic change of a single lexical item, which could then again influence the combinatorial behaviour of these items. Therefore, this study will focus on the construction of creativity and change in a language acquisition context. Still, diachronic and cognitive change remain closely related, since both, as Traugott (2015) emphasises, are observable through speakers’ usage. Thus, chapter 8 will come back to the notion of change in the shape of grammaticalisation and lexicalisation and discuss their implications for this study as well as the value a concept such as constructionalisation might have for language acquisition and learning (Traugott 2015; Bergs 2012; Traugott / Trousdale 2010). For the purpose of this study, corpora, as well as selected judgement tasks and elicitation methods provide a larger database, which at the same time is more manageable in terms of context and creativity (> 5). 
Investigating linguistic creative alternation against a language acquisition background, however, has two major implications: it shifts the focus from big C and ProC to little c or even mini c creativity, and it assumes that linguistic creativity is part of every individual’s linguistic system. As has been discussed before, both assumptions are supported by research into cognitive processes of creativity in general (> 3.2). As Boden (2013: 95-98) has outlined, creative thinking is based on combining and testing familiar concepts with a new level of application or interpretation. In language acquisition, this is essentially done from a very early stage, namely, when familiar concepts like hunger or desire are cognitively combined with new forms of articulation, like more cookies or gimmi. Since the basis of creative alternations of collocations also seems to be the acquisition of a new functional-semantic conception, this chapter will examine the most prominent approaches towards first language acquisition and discuss their perspective on the development of meaningful concepts and creativity. In the context of first language acquisition, there are often three major approaches which serve as a framework for most theories and models: Behaviourism, Nativism and Constructionism 1. Behaviourism, however, would not seem to be a suitable approach for a study which is partly concerned with creativity and creative alternations, since it suggests that language, like any other acquired behaviour, is learnt through operant conditioning. In principle, this means that as soon as an individual shows a required behaviour, s / he is rewarded - either positively, through receiving a treat, or negatively, by making a negative situation easier or more bearable. According to behaviouristic belief, this then motivates a participant or subject to repeat whatever s / he did to deserve a reward.
This reinforcement can be used to train required behaviour as well as to break somebody of unwanted habits (Skinner 1957). As a consequence, behaviourist approaches can only account for the acquisition of controlled and conditioned input, which makes operant conditioning as the sole process of language acquisition unsuitable for any take on creativity and change (Chomsky 1959). Thus, chapter 4.1 starts with a review of selected nativist approaches which suppose that language is a uniquely human, partly inborn faculty, while 4.2 presents constructionist perspectives which take language as a usage-based, predominantly emergent phenomenon. In 4.3, selected models will be discussed in order to propose a model (> 4.4) which will then be used as a theoretical basis for the results in chapters 6 and 7. A summary of the major implications and consequences which can be drawn from this chapter will be provided in 4.5.

4.1 Nativist Approaches

Despite their inability to account for linguistic creativity, behaviouristic approaches can indirectly take credit for being one of the foundations of modern, nativist approaches, since it was the lack of any discussion of creativity within language that, amongst other factors, inspired Noam Chomsky (1959) to write his ardent review of Skinner’s Verbal Behavior (1957) and thus lay the foundations for generative linguistics 2. Therefore, it comes as no surprise that generative linguists such as Noam Chomsky advocate the importance of analysis of creative language. In his 1964 publication Current Issues in Linguistic Theory, Chomsky even states the need to explain the source of novel, creative utterances as being one of the main objectives of modern linguistics:

The central fact to which any significant linguistic theory must address itself is this: a mature speaker can produce a new sentence of his language on the appropriate occasion, and other speakers can understand it immediately, though it is equally new to them. Most of our linguistic experience, both as speakers and hearers, is with new sentences; once we have mastered a language, the class of sentences with which we can operate fluently and without difficulty or hesitation is so vast that for all practical purposes […] we can regard it as infinite. (Chomsky 1964: 7)

According to Chomsky, any theory which attempts to draw a comprehensive picture of language needs to address the fact that, despite certain structural constraints, speakers of a language are able to produce novel yet generally accepted sequences of words by recombining a limited set of words within a limited set of sequential structures (Chomsky 1972: 5-7, Chomsky 1965: 15). However, the reference to “new sentences” makes clear that, in the beginning, Chomsky’s emphasis lay on syntactic structures. These, he claims, are part of an inborn linguistic capacity, which then allows the native speaker to produce novel sentences by filling syntactical slots with a finite set of lexical items. Later this model was altered and redefined from the initial idea of a Generative Transformational Grammar (Chomsky 1964, 1957) to a more refined concept in the shape of the Minimalist Program (Chomsky 1995).

1 As Ambridge and Lieven (2011: 1-3) point out, other labels such as generativist or Universal Grammar (UG) approaches and functionalist or usage-based approaches can be used respectively and with only slight shifts in application.

2 While his review of Skinner’s Verbal Behavior (Chomsky 1959) presents a very ardent defence of language faculty as an inherently human ability, a more comprehensive account of Chomsky’s syntactical theory can be found in Syntactic Structures (1957).
The basic assumptions regarding the process of language acquisition remain throughout these developments: the faculty of language is a uniquely human ability, and sentences are derived from a deeper, inborn linguistic universal grammar. This observation leads Chomsky to conclude that “[…] there is only one human language, apart from the lexicon, and language acquisition is, in essence, a matter of determining lexical idiosyncrasies.” (Chomsky 1992: 55) At first glance, this statement gives the impression that, at a later stage, Chomsky puts lexical items and even phraseological phenomena at the very centre of his theory; yet, quite the opposite is the case. Within a nativist framework, a linguist’s attention revolves around the uniquely human and absolutely universal structures which are regarded as the basis of human language. Lexical items, on the other hand, are to be regarded as secondary. Thus, a strict separation of lexicon and grammar seems to be the logical consequence of Universal Grammar. Against this background, creativity is first and foremost to be regarded as novel lexical sequences which have neither been heard nor uttered before by a speaker of the respective language. This is, of course, directly aimed at Skinner’s assumption that language, like other mental faculties, can be trained and learnt by operant conditioning. But while this conception might explain sufficiently well why a sentence like colorless green ideas sleep furiously (Chomsky 1957: 15), despite its initial novelty, would not be dismissed as “not English”, it does not help to understand why pretty man carries a certain connotation, nor why this combination is remarkable.
This is a question of how word meaning is shaped and created in the first place; a problem which Chomsky avoids by focusing on structural language universals which he regards as the core of language while dismissing idiosyncrasies or irregularities of any kind, like collocations or irregular verb forms, as periphery (1995: 19-20). There are, however, approaches within the nativist tradition which address word meaning from a generativist point of view. Among them are the structuralistically inspired field of Interpretive Semantics (> 4.1.1) and the almost constructionist perspective of Conceptual Semantics (> 4.1.2).

4.1.1 Interpretive Semantics

While Chomsky himself in his earlier works focuses on syntactic structures and their interrelation, Katz and Fodor consider a semantic branch for generative grammar. Their idea for Interpretive Semantics is quite similar to Plato’s early conception. Picking up on the idea of an inborn linguistic faculty, Katz and Fodor (1963) married Platonic, universalist thinking with a concept from structural semantics: componential analysis (Leech 1974: 95-125; Palmer 2 1981: 108-107) attempts to break lexical items into their smallest meaningful units, thus formulating the basic semantic components of a word. Katz and Fodor took these structures and established them as the mentally stored cornerstone of their Interpretive Semantics. They assume that, through a system of projection rules, these formal dictionary entries are transformed to fit into a given context. As a consequence, every meaning of a lexical entry needs to be accounted for in the first place to make a transformation and ultimately understanding as such possible.
Katz and Fodor give the example of bachelor which, according to the authors, has four different readings: 1) a human male who has never married, 2) a young human male knight serving under the standard of another king, 3) a human who has the first or lowest-level academic degree and 4) a young male fur seal (animal) without a mate during the breeding season. The adequate interpretation of bachelor in a sentence like The old bachelor finally died. would then be selected through the premodification old, since reading 1) is not blocked by the marker young and can, therefore, be used in connection with old without any interpretative clashes (Katz / Fodor 1963: 189-190). Unfortunately, Katz and Fodor do not comment on the reason why option 3), a person with a certain kind of academic degree, might not be possible either. There are cases where even reading 4) might be constructed, for example in a conversation between two zoo keepers. Furthermore, the actual age of the bachelor might also depend on the context. Prototypically, this sentence of course triggers the image of rather advanced age, maybe a gentleman in his nineties comes to mind, yet it might also be used in a situation where people are talking about the death of a generally unpleasant and ardently detested acquaintance who was single for most of his life and died after some years of illness at the age of 40. Here, old would hardly be regarded as ‘old age’ and would probably refer to the fact that this person had been a bachelor for most of his life. Nevertheless, this sentence within this slightly unusual context would presumably be both acceptable and interpretable if it were encountered within the right setting, for example as an ironic remark in a book.
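The marker-based selection described above can be made concrete in a small sketch. It is an illustration under simplifying assumptions, not Katz and Fodor’s formalism: the marker sets, the clash table and the function name compatible_readings are hypothetical and only paraphrase the four readings of bachelor listed above.

```python
# Simplified, hypothetical marker sets for the four readings of "bachelor"
# (paraphrased from the list above; not Katz and Fodor's original notation).
READINGS = {
    1: {"human", "male", "never married"},
    2: {"human", "male", "young", "knight"},
    3: {"human", "academic degree"},
    4: {"animal", "male", "young", "fur seal"},
}

# Assumed clash table: a modifier blocks every reading carrying one of these markers.
CLASHES = {"old": {"young"}}


def compatible_readings(modifier):
    """Return the numbers of all readings not blocked by the given modifier."""
    blocked = CLASHES.get(modifier, set())
    return [number for number, markers in READINGS.items()
            if not (markers & blocked)]


print(compatible_readings("old"))  # readings without the marker "young"
```

Under these assumptions, old blocks readings 2) and 4) but leaves both 1) and 3) available, which is exactly the gap noted above: the marker clash alone does not explain why the academic reading should be excluded.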
In order to provide for all potential readings of man , the potential interpretations would even be more diverse, reaching from the more prototypical adult, male human, to a very broad understanding along the line of ‘all human beings’, up to the somewhat seemingly contradictory reading of ‘male human, but with female characteristic’ as in the example of pretty man . Furthermore, like most approaches within the framework of Universal Grammar before it, Interpretive Semantics presupposes an ideal speaker-hearer who has total command of the whole of the English language. Especially in a meaning-related context, this seems to be a rather unrealistic conception, since not only does the size of a speaker’s vocabulary vary from individual to individual (Clark 1995, 1993; Anglin 1993), but the different readings of a lexical entry might also be only partially available to him / her, even for native speakers of a language. So, it could well be that not every adult native speaker of English is aware of the fact that bachelor could, amongst others, refer to a ‘knight’ or a ‘fur seal’. Of course, some of these examples sketch a very rare use of bachelor or man . But any description of linguistic processes needs to be able to account for less frequent or creative applications as well. At the same time, this is why an analysis of collocations and their more creative alternations not only has the potential to gain insight into the creative process at work but can also explain aspects of a speaker’s language processing in general. Admittedly, Katz and Fodor explicitly stress that what they introduce is a “characterization” rather than a “semantic theory of a natural language” (Katz / Fodor 1963: 170). 
Nevertheless, in their framework too, the idea of a contextual setting is already an important variable for the semantic interpretation of sentences (Katz / Fodor 1963: 176-181), even if it struggles to explain the interpretation of creative or even partly contradictory readings, as demonstrated above. Nonetheless, the question of the origin of meaning in general and creative readings in particular should be central to any semantic theory, irrespective of whether it is trying to explain naturally occurring utterances or a stylized linguistic system. Together with Postal, Katz also developed a theory which argues that meaning remains constant across generative transformations (Katz / Postal 1964). According to the Katz-Postal Hypothesis, semantic representations are allocated in a sentence’s deep structure 3, while actual sentences are a result of transformations and represent different surface structures, all of which share the same deep structure. Therefore, Katz and Postal (1963) also suggest treating sentences with an idiomatic reading, like to kick the bucket, as one possible surface structure which shares the same deep structure as the more literal to die (Prinz 1983). Similar 4 to the concept of Interpretive Semantics, this Generative Semantic approach also presupposes that potential semantic representations are already part of the deep structure, which makes it difficult to account for spontaneous, creative alternations and readings within this theory.

4.1.2 Conceptual Semantics and Parallel Architecture

Ray Jackendoff (2002, 1990), one of Chomsky’s students, takes a different approach to semantic structures within a Universal Grammar framework. Unlike Katz and Fodor’s advance in Interpretive Semantics, his Conceptual Semantics does not assume that aspects of meaning are derived from syntactic structures. Rather, he presupposes that there is a network of semantic primitives which form the components of semantic structures.
As a result, semantic structures exist as a generative entity alongside syntactic structures; both are correlated through interface rules. With this perspective, he moves away from the traditional Universalist perspective that syntax is primary to all other aspects of language but retains the Chomskyan spirit when he summarises the aim of his research as follows:

My purpose - the characterization of the mental resources that make possible human knowledge and experience in the world - is conceived as an extension of Chomsky’s goals. Accordingly, an important boundary condition on my enterprise is that it be in all respects compatible with the world view of generative linguistics. In particular, it is crucial to choose I-concepts rather than E-concepts as the focus for a compatible theory of knowledge. (Jackendoff 1990: 8)

Nevertheless, Jackendoff also acknowledges that he shares his pursuit for internal “mental resources” with researchers from fields like Cognitive Grammar or Cognitive Semantics, but stresses that, unlike Cognitive Grammar approaches, Conceptual Semantics is “committed to an autonomous level of syntactic representation rather than to its abandonment” as well as “to rigorous formalism” (Jackendoff 1990: 16). At the same time, he is also keen to find potential relations between perceptual psychology, language acquisition, and his new approach.

3 This concept also became part of Chomsky’s Standard Theory (for a schematic summary see Chomsky 1965: 16) in which he argued that semantic information is part of a sentence’s deep structure.

4 In a later publication Katz (1971) argues that Interpretive Semantics and Generative Semantics are conceptually rather similar and to a large extent only differ on a terminological basis. Thus, he stresses that “Interpretive semantics says that underlying phrase markers initiate derivations while generative semantics says that semantic representations do.” (Katz 1971: 320)
Hoping to find “the possibility of a strong, innate, formal basis for concept acquisition” (Jackendoff 1990: 16), Jackendoff then sets out to analyse various words and phenomena. Among these is the way-construction as in (24):

(24) Babe Ruth homered his way into the hearts of America. (Jackendoff 1990: 211)

Later, Jackendoff flags the analysis of this construction as one of the main reasons why he started to consider constructionist approaches as a potential explanation for the restricted, yet productive nature of a construction like “V X’s way PP” (Jackendoff 2013: 77). This observation led Jackendoff to the Parallel Architecture model which, in many respects, is very similar to most modern construction grammar approaches (> 4.3.1) in that it assumes that language is ultimately produced by the combination of more or less abstract lexical items. These lexical items again are in their prototypical form very much defined as constructions, namely as a pairing of phonological, syntactical and conceptual structures. Furthermore, Parallel Architecture also allows for lexical items which lack one of the three columns, like predominantly structural items without a concrete phonological representation 5 but which are still seen as a unit as long as they are stored in the long-term memory (Jackendoff 2002: 152-195). Still very much rooted in a generative framework is Jackendoff’s focus on the presumably unlimited productivity of language in general and lexical items in particular, which, of course, makes his approach relevant for the analysis of creativity and creative alternations of established items. Since his defining feature is storage in long-term memory, the cognitive make-up of language, according to Parallel Architecture, very much depends on which items are needed to produce the potentially infinite number of utterances.
Here, Jackendoff suggests two mechanisms, full productivity and semi-productivity, to account for lexical items which can be extended without many limitations and more restricted entries on the other side of the spectrum. For items with full productivity, Jackendoff postulates that only a more abstract, rather prototypical entry is enough, even though he admits that it might be likely that, especially in the early stages of language acquisition, some exemplar instances remain (Jackendoff 2002: 339-343). This, for example, might be the case for general structural relations like V+N. In semi-productive items, regular behaviour can be observed as well; however, as Jackendoff points out, “acceptable instances must be learned and stored individually” (Jackendoff 2013: 84). As a consequence, semi-productive items cannot be fully predicted as far as their potential variations and extensions are concerned, yet there might be sub-constructions, which can behave in a fully productive, open way. Analysis of collocations and their creative alternations according to Jackendoff could work in a similar way. While pretty or commit might not form accepted combinations with any noun, a construction like [pretty +N <human being>] or [commit +N <an offence one can be held responsible for before an authority>] could form an unlimited number of word pairs. The aspect of an additional reading, which Jackendoff assumes to be contextually decodable, is referred to as enriched composition in Parallel Architecture (Jackendoff 2002: 387-394). Therefore, even though Jackendoff’s research started out as an exclusively generativistic undertaking, the last few paragraphs have shown that it gradually developed into a framework which, in conceptualisation and layout, resembles most construction grammar approaches.

5 A similar concept can be found in Construction Grammar, for example in the shape of Goldberg’s argument structure constructions (Goldberg 2006: 23).
The fact that Jackendoff himself acknowledges that generativist linguistics is not comprehensive enough, since it is not able to explain phenomena like the way-construction, shows that, despite the central role which Generative Grammar assigns to creativity, it can only be applied to a quite limited set of creative language 6. Thus, other approaches, which also take actual language and language use into account, are needed to design a useful framework for any type of creative language. Jackendoff’s advances contain some interesting thoughts, yet they are still mostly based on introspective consideration. Therefore, the next chapter will present a family of approaches which, from the very beginning, were dedicated to basing their theories on more usage-based grounds.

6 In a rather interesting chapter on the polemics in Cognitive Linguistics, Taylor (2007) writes: “The aim of the generative enterprise has been from the very start the search for high-level generalizations. In this process, the idiosyncratic, the idiomatic, and the exceptional have been sidelined. The high-level generalizations define the “core” of the language system, while the idiomatic, and the peculiarities of individual constructions and lexical items were relegated to the periphery […]” (2007: 573). Taylor also points out that more recent advances in the field of Autonomous Linguistics might have their roots in a UG-based framework but developed an understanding of language which seems to defy Chomsky’s dualistic conception of an important linguistic core (i. e. syntactic structures) and its less relevant periphery (i. e. idiosyncrasies, like for example collocational restrictions).

4.2 Constructionist Approaches

The distinction between theoretical linguistic proficiency and a speaker’s actual performance becomes obsolete once language is regarded as a system of signs and symbols which develops through use.
Therefore, moving away from the notion of an ideal speaker-hearer and his / her competence, usage-based cognitive approaches do not regard grammatical structures and lexical content as separate entities. Today, various approaches like to think of themselves as cognitive linguistic approaches. The spectrum ranges from prototype theory (Rosch 1975, 1973; Labov 1973; Berlin / Kay 1969), to frame semantics (Fillmore / Atkins 1992; Fillmore 1985), and from more general concepts such as metonymy and metaphor (Croft 1993; Lakoff/ Johnson 1980), to fairly comprehensive approaches such as construction grammar (Ziem / Lasch 2013; Goldberg 2006; Fischer / Stefanowitsch 2006). Thus, despite their differences, most approaches share the basic assumption that language is not an inborn, cognitive faculty but rather a multi-layered network of gradually emergent patterns or constructions, which are primarily influenced by conceptualisation and experience (Croft / Cruse 2004: 1-4, Langacker 1983: 1-4). This has far reaching implications for the conception of meaning. It is no longer fixed and confined to single lexical units but seen as a continuously developing part of mental processes which can adapt to their semantic as well as pragmatic surroundings (Geeraerts 2009: 182-183). As a consequence, language in cognitive linguistics finds itself on the same level as other cognitive processes such as sight, hearing or locomotion. Following this argumentation, the same factors which influence the way human beings process a picture, listen to music or learn to dance should also be at work when it comes to language acquisition. Another consequence which follows from the fact that constructionist approaches are essentially usage-based approaches which take domain-general cognitive processes as the source of linguistic structures, is that language acquisition is a highly subjective process. 
Of course, the overall mechanisms are assumed to be fairly similar, but they can only process what they receive, which makes this approach not only useful to explain the development of a general language system but also suitable for accounting for variation and change. Research here falls into two general tendencies which share most of their basic assumptions but focus on slightly different mechanisms: emergentist approaches, which are more concerned with the brain’s internal mechanisms (> 4.2.1), and socio-pragmatic approaches, which in addition focus on the interaction of individuals and their learning through cues within their (linguistic) environment (> 4.2.2).

4.2.1 Emergentist Approaches

Like behaviouristic approaches, emergentist research assumes that a language is acquired through input. Within an emergentist framework, the mind operates as a kind of data structuring unit, which processes any linguistic input and abstracts categories and systematicity from it. Thus, the brain relies on nothing but its own, general mechanisms, which are not thought to be supported by any language specific, inborn mental faculty or device. In this, emergentist approaches are closely related to connectionist models, which, especially at the beginning of their implementation, were at times misconceived as mechanical or even as a “variety of confusions and irrelevances” (Fodor / Pylyshyn 1988: 6); emergentist studies were subject to similar scepticism. But while this criticism might be justified for behaviouristic models, which postulate that the acquisition of a language boils down to a mere link between an input stimulus and its effect, emergentist approaches assign a much more significant role to the mechanisms of input processing within the human mind.
This kind of cognitive grounding of language then, in fact, seems to share common ground with generative approaches (> 4.1), but unlike in (most) nativist models, these mechanisms are assumed to be domain-general. Emergentist models claim that a linguistic system can be learnt and explained based on the same principles as any other cognitive domain, such as sight, hearing or locomotion. Unlike most generative models, associative language acquisition can even account for constant change and variation, which would render any specific linguistic process unnecessary (Bybee 2010: 6-7; MacWhinney 2001: 448). The basis for the emergence of a language, according to these associative approaches, is frequency and processing effects (MacWhinney 2001: 462-465). Frequency effects with respect to language postulate that a morpheme, word or structure is more likely to be stored, retrieved and used the more often it has been encountered before. Bybee (1995, 1985) for example argues that irregular verb forms for the English past tense are retained because they occur with a high token frequency. Thus, they are encountered without much modification at a very early stage within the language acquisition process. Verbs which occur with a regular past tense form, on the other hand, occur with a high type frequency. As a consequence, their forms change ever so slightly, which makes them subject to cognitive analysis: the varying parts are contrasted and analysed and associations are formed, which results in patterns of reoccurring elements, like the past tense marker “-ed”. Therefore, verbs with a high token frequency for their past tense form, like went or took, preserve their form, while bust, dive or fit are also adapted to an accepted form with a regular -ed ending. For collocations, this could mean that their degree of variability might depend on the question of whether they occur as a stable item or in the shape of different alternations.
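The difference between token and type frequency can be illustrated with a short counting sketch. The sample below is invented purely for illustration and is not corpus data; the names SAMPLE, pattern_of and frequencies are likewise hypothetical, and the -ed check is a deliberately crude assumption about regular forms.

```python
from collections import Counter

# Invented toy sample of (lemma, past tense form) tokens; not taken from any corpus.
SAMPLE = [
    ("go", "went"), ("go", "went"), ("go", "went"), ("go", "went"),
    ("take", "took"), ("take", "took"), ("take", "took"),
    ("walk", "walked"), ("play", "played"), ("jump", "jumped"),
    ("call", "called"), ("bust", "busted"),
]


def pattern_of(lemma, past):
    """Label a form as the regular '-ed' pattern or by its own irregular form."""
    return "-ed" if past == lemma + "ed" else past


def frequencies(sample):
    """Map each pattern to (token frequency, type frequency)."""
    tokens = Counter()   # how often the pattern occurs in the sample
    lemmas = {}          # which distinct lemmas instantiate the pattern
    for lemma, past in sample:
        pattern = pattern_of(lemma, past)
        tokens[pattern] += 1
        lemmas.setdefault(pattern, set()).add(lemma)
    return {p: (tokens[p], len(lemmas[p])) for p in tokens}


for pattern, (token_freq, type_freq) in frequencies(SAMPLE).items():
    print(pattern, token_freq, type_freq)
```

In this toy sample, went and took are each repeated in one unchanging form (several tokens, a single type), whereas the -ed pattern is spread over five different verbs, which is the kind of variation that, on the account above, invites analysis into a reusable marker.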
A collocation, like pretty girl or commit a crime, is usually encountered in various alternations, but if, for example, a child exclusively hears pretty girl without any form of variation, as in look at you, what a pretty girl, you are a pretty girl or where is my pretty girl?, she might later store this reoccurring sequence of words as one item and use it as a fixed expression. But as MacWhinney (2001: 464-465) warns, frequency alone cannot account for the multiple layers and structures of natural language. To begin with, it does not explain how a system like the human brain can abstract from this frequency-based information; for example, how it can form analogous past tense forms such as busted, dived and fitted or why a child eventually produces variant phrases like pretty doll and pretty mummy along with pretty girl. Neither does frequency alone provide any reasonable explanation for change. It would, for example, be unable to explain why sequences which are encountered very often get shortened or can even lose their initial meaning, like go in going to or gonna. Therefore, emergentist approaches assume that there are several processing effects, which support the processing of the data input. One of the more comprehensive discussions of these processes comes from Bybee (2010), who lists rich memory, categorisation, chunking, analogy and cross-modal associates as the most important processes (Bybee 2010: 7-8).

4.2.1.1 Rich memory

One of the basic assumptions of emergentist models as suggested by Bybee is that the human brain is capable of storing a vast amount of data (Schneider / Bjorklund 2003: 370-371; Bod 1998: 2-7; Hintzman 1986: 422-424). Since constructionist approaches do not believe that a language-specific inborn structuring device exists, any linguistic information which is used in language production needs to stem from memory in one way or another.
Thus, rich memory could be regarded as a major prerequisite faculty for most cognitive processes rather than a process in itself. Estimates of the exact capacity of the human brain differ depending on the assumed quantity of synapses and bytes per synapse. Most neuroscientists who have attempted a rough estimate, however, agree that the capacity of the brain is, if not infinite, at least not the restricting factor for human cognitive abilities (Hawkins / Blakeslee 2004: 210; von Neumann ⁵1986: 61-66). The decisive factor when it comes to the ability of language acquisition and learning is instead a system’s performance and its mechanisms for storing and retrieving information. This is, in fact, also a relatively undisputed observation within usage-based approaches (Christiansen / Chater 2008: 501; Dąbrowska 2004: 58-61). Hence, the bulk of research is concerned with factors which might process and shape the received input into a more or less uniform linguistic system.

It does not automatically follow that, because of the capacity of the human brain to store and process a vast amount of data, every bit of input automatically develops into a processed piece of information. It instead seems that once faced with a task, the human mind tends to focus on the most salient aspects. In language attainment, this could happen through any social or situational highlighting, as, for example, through contextual priming (Bargh / Chen / Burrows 1996; Schacter 1992; Tulving / Schacter 1990) or explicit awareness raising, like through pragmatic cues (Tomasello / Akhtar 1995).
Ellis (2008: 379) defines this phenomenon of salience as “[t]he general perceived strengths of stimuli […]”, which, like anything which is subject to perception “[…] varies between individuals and between species.” Schmidt (1990: 138) even hypothesises that for L2 attainment “[…] subliminal language learning is impossible, and that intake is what learners consciously notice.“ Chapters 6 and 7 will come back to this assumption when they compare the collocational proficiency of L1 and L2 speakers of English. But in any case, it seems plausible that even less salient aspects within a language system are more likely to be internalised by native speakers, since their exposure to the target language is longer and more authentic than that of non-native learners, at least in a more formal classroom context. 4.2.1.2 Categorisation One of the most fundamental cognitive processes is categorisation . It refers to the ability of human beings to cluster the input they receive to form categories of the same or similar items. In language acquisition, these linguistic categories emerge at a very early stage. According to Braine (1963), the first signs of category formation can be observed within the first five months after a child utters his / her first word. Soon after first words, holophrases, like apple, Mommy or play-play (Tomasello 1992: 286-370; Nelson 1973) emerge and toddlers apparently start to explore the combinatorial potential of the structures they have at their disposal. Based on instances like more cookie or all gone, variations like more juice , more toast , more sing or all dressed and all wet are formed. Since these structures are constructed around a central point, Braine (1963: 4) coined the term pivot constructions . They show that without any prior verbatim input, sequences can be formed based on previously experienced input. How exactly these categories emerge is still an ongoing debate. 
Thus, with respect to their conceptualisation of categorisation, emergentist approaches can be further subdivided into prototype models and exemplar models (Reber 2009: 89-94; Bod 1998: 1-4). Both share the assumption that first categories are built through single sample instances like a word, phrase or sentence but differ in their understanding of the shape and make-up of a category. Prototype models assume that an abstract prototype, which captures the essence of a category, is formed based on input. Potential new members are then identified through a simple comparison with the category representation (Reber 2009: 89-94). The notion of an explicit formation of categories, for instance, is part of Goldberg’s (2006: 45-49) approach towards construction grammar (> 4.3.1). In most exemplar models, on the other hand, potential new members are compared to an average of all instances a system has received so far. Here, the category is formed ad hoc through statistical extrapolation. Thus, categorisation is the process of a constant calculation of an average likelihood and not - as in prototype models - the explicit forming of a cognitive category entry (Reber 2009: 89-94). Coming back to the example of pretty girl, categorisation via exemplars or prototypes would mean that through input data like pretty girl, pretty doll or pretty flower, the child formulates a pattern which could be described as [pretty +N]. In a prototype model, these abstractions are assumed to be stored as form-function pairs, similar to individual words or formulaic sequences. Exemplar models, on the other hand, would assume that this abstraction is made ad hoc, whenever the brain is triggered to form a sequence starting with pretty. Based on previous input it then compares similarities and differences between samples and then calculates the most likely option.
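The difference between the two model types can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the two-dimensional feature vectors are invented, the category labels are placeholders, and the distance-based similarity measure is one simple choice among many rather than the procedure of any particular model cited above.

```python
from math import dist, exp  # math.dist requires Python 3.8+

# Invented feature vectors for stored instances (an assumption for illustration).
stored = {
    "pretty+N": [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)],
    "commit+N": [(-1.0, -1.0), (-1.0, 0.0)],
}

def centroid(points):
    """Average a list of vectors into one abstract representation."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

# Prototype model: one abstraction per category is computed and stored in advance.
prototypes = {label: centroid(points) for label, points in stored.items()}

def classify_prototype(item):
    """Compare the new item with the stored prototype of each category."""
    return min(prototypes, key=lambda label: dist(item, prototypes[label]))

def classify_exemplar(item):
    """Compare the new item with every stored instance at the moment of use."""
    def mean_similarity(label):
        points = stored[label]
        return sum(exp(-dist(item, p)) for p in points) / len(points)
    return max(stored, key=mean_similarity)

new_item = (0.8, 0.6)
print(classify_prototype(new_item), classify_exemplar(new_item))
```

Both functions assign the new item to the same category here, which illustrates why input and output alone may not reveal which kind of model is at work; the models differ in what is stored, not necessarily in what is produced.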
From a linguistic point of view, it is, however, difficult to judge which model is at work when it comes to language acquisition, since, as Reber (2009: 105) points out, “observations based only on input and output may be incapable of distinguishing between competing theories.” However, the fact that associations seem to be formed ad hoc, as observable in priming effects (Schacter 1992; Tulving / Schacter 1990) and as the partly idiosyncratic development of variation and change in natural as well as modelled language (Larsen-Freeman / Cameron 2008; Bod 1998), seems to indicate that an exemplar model might be more likely when it comes to first language acquisition.

4.2.1.3 Chunking

Categorisation, in turn, can only emerge if similarities between individual instances are recognised and processed. To identify potential similarities, associations between input data need to be formed. If two items co-occur repeatedly, they gradually become more and more associated until they are perceived as one unified chunk. Therefore, linguistic frameworks often refer to this process as chunking (Bybee 2010: 34-37; Ellis 1996: 106-108). From a neuroscientific perspective, chunking can be regarded as an association process based on Hebb’s Law (Hebb 1949: 62-63). This has often been summarised by the phrase: elements which fire together, wire together. Thus, if two items like a sequence of movements or words co-occur often enough, they activate the same neurons until they become so closely related that they are perceived as one item. If this connecting ‘wire’ is strong enough, it is enough to activate one part to trigger the other. This process is the same logic which is used in operant conditioning, where a stimulus and an unconditioned behaviour are made to co-occur until they are cognitively linked. In fact, most activities which require practice are to a great extent based on chunking, like throwing a ball or dancing.
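The idea that repeated co-occurrence gradually strengthens an association until two items behave like one chunk can be sketched as follows. This is a deliberately simplified, Hebbian-style illustration: the learning rate, the threshold and the update rule are assumptions chosen for the example, and the utterances are the invented input from the pretty girl scenario above.

```python
from collections import defaultdict

LEARNING_RATE = 0.25    # assumed value, chosen for illustration only
CHUNK_THRESHOLD = 0.5   # assumed strength at which a pair counts as a chunk

def train(utterances):
    """Strengthen the link between adjacent words each time they co-occur."""
    weights = defaultdict(float)
    for utterance in utterances:
        words = utterance.split()
        for first, second in zip(words, words[1:]):
            # Each co-occurrence moves the weight a step closer to 1.0.
            weights[(first, second)] += LEARNING_RATE * (1.0 - weights[(first, second)])
    return weights

utterances = [
    "what a pretty girl",
    "you are a pretty girl",
    "where is my pretty girl",
    "a pretty doll",
]

weights = train(utterances)
chunks = {pair for pair, weight in weights.items() if weight >= CHUNK_THRESHOLD}
print(sorted(chunks))
```

With this input, the pair pretty girl is heard three times and crosses the assumed threshold, while pretty doll is heard once and does not. The threshold itself is arbitrary; the point of the sketch is only that the degree of association follows from the distribution of the input.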
Most usage-based models assume that chunking processes can be observed on all kinds of levels in language, from simple word associations to the association of a verb with a certain grammatical structure (Stefanowitsch / Gries 2003). In the case of collocations, chunking might, in fact, be regarded as one of the most important processes, since a collocational item could be interpreted as two individual words, which, through frequent co-occurrence, became strongly associated. This could also explain why some collocations are almost perceived as one idiomatic unit, like pull a face, while others, like pretty girl, seem less fixed, since they might be used with other items as frequently as together. There are, however, other voices, like Wray (2002), who instead argue that formulaic language, in fact, starts out as larger units, such as how do you do? or pretty girl, which then, through language input and cognitive processing, are eventually analysed and broken down into smaller elements. There are observations which favour the latter analysis. Thus, chapter 4.3 will come back to this debate. However, another area where chunking might serve as a more plausible explanation is the interpretation of semantic prosody as in (25) or (26):

(25) BNC A68 1495 His high-pitched southern university voice caused amusement to North Country clergy who came.
(26) BNC BN1 644 The poor young woman, a pretty creature, flushed scarlet and said […]

If a lexical item frequently or even exclusively co-occurs with other items which share a certain meaning dimension - for example, the feature of ‘negative effect’, as in cause a disease, or ‘female’, as in pretty girl - this meaning might through chunking become closely associated with any lexical item which is used together with cause or pretty. The consequence would then be that the reading of ‘negative effect’ or ‘female’ rubs off on other combinations such as in cause amusement or pretty creature.
4.2.1.4 Analogy

Analogy, on the other hand, relies on chunking and categorisation. As Bar (2007) explains, analogies are often regarded as a situation “[…] where a new exemplar of a certain object class is analogically mapped to the corresponding prototype” (Bar 2007: 281). Coming back to the distinction of prototype and exemplar models, analogy could then be seen as the allocation of input onto an existing prototypical category, while association and chunking would be observed as soon as incoming data triggers another, already stored entity, which makes association a fundamental process for any exemplar model. Bybee (2010: 57) further argues that “[a]nalogy is considered to contrast with rule-governed productivity because it is heavily based on similarity to existing items rather than on more general symbolic rules.” She then defines analogy as “[…] the process by which a speaker comes to use a novel item in a construction.” (Bybee 2010: 57) In describing analogy as a variant of an established construction or category, Bybee emphasises once more that there is no need for pre-set rules in an emergentist framework, since more abstract structures, like grammatical constructions, develop along the same line as multi-word units, all of which form the basis for analogous, creative variation. Bod (2009a) too stresses the importance of analogy in calling his approach towards language acquisition a “probabilistic, analogy-based model”. He even goes a step further when he claims that rules and exemplars could indeed be viewed as two ends of the same spectrum (Bod 2009a: 754-758). In this approach, categories and structures which emerge from exemplar-based theories are regarded as the rules which are seen as a prerequisite in rule-based approaches, like Universal Grammar. With this perspective on language acquisition, Bod actually comes very close to Jackendoff’s understanding of language (> 4.1.2).
To show that the identification of similarities is indeed a ubiquitous underlying feature of human cognition, Gentner and Markman (1995: 122-126) conducted a series of experiments using a one-shot mapping task. They asked their adult participants to point at the similarities between two pictures. All were able to identify similar aspects of both abstract shapes and more concrete scenarios, yet, depending on their prior tasks and experience, the focus of these associations varied from the identification of similar objects to the construction of similarities in the items’ relation to each other. This observation also holds for language. However, the fact that the detection of similarities can be controlled by previous tasks and input is striking, since this means that not only the input as such but also the circumstances under which it occurs might play a role in the language acquisition process (> 4.2.2). Finally, coming back to Braine’s (1963) study for an example, the mechanisms which lead a child to produce the analogous more sing are based on chunking and categorisation. Items like more milk and more cookies are associated on the basis that the pivot item more is a similar, reoccurring element which is used in a similar situation: expressing the desire for some kind of second helping. This forms a chunk like [more +N <something I want>], which ultimately might result in the category. Through analogy, an utterance like more sing is created. The same, of course, applies to collocational combinations like [pretty +N <female (human) being>] and the analogous creation of pretty creature or pretty man.

4.2.1.5 Cross-Modal Associates

Cross-modal associates, in Bybee’s terms, could be re-interpreted as chunking of a form and its meaning. With this term, she refers explicitly to the linking of different formal representations, like a sequence of phones, a string of words or a certain word order, with a specific meaning.
Even if the observation that language is based on form-meaning, or better, form-function pairings in modern linguistics dates back to Ferdinand de Saussure and the concept of the linguistic sign (de Saussure 1916 / 1967: 76-82), it is questionable whether this should really be treated as a separate process or rather as a result of analogy, categorization, and processing at a certain (linguistic) level. For, while it might be true that the development of form-function pairs, also referred to as constructions (Goldberg 2006: 3), are the basic building blocks of natural languages, it is debatable whether these are (rather) a result of several processes or (in fact) a process in itself. Taking again the example of pivot constructions: a child which encounters more milk and more cookie forms the analogy that in both instances more is a reoccurring item, s / he furthermore links a situation with more to the experience of getting a second helping, in this example of milk or cookies. Therefore, [ more +N <something I want> ] becomes a linguistic sign or construction for requesting another helping. This linking of auditory and physical experience is what Bybee refers to as cross-modal association . The term as such emphasizes the fact that learning a language is not restricted to a closed system of grammatical rules and lexical labels but happens across and is supported by all domains of human experience. In listing cross-modal association as an underlying process of first language acquisition, Bybee stresses that, according to emergentist approaches, language is not domain or faculty specific but subject to the same learning mechanisms as all other cognitive abilities. While this is understandable from the point of view of theory building, it does not necessarily follow that an additional process of cross-modal association is needed for the application of a framework per se. 
As shown above, this process simply emphasises a fundamental assumption of emergentist approaches, namely that first language acquisition is based on non-domain specific processes; hence, learning across different domains and modes is possible, since it follows the same rules and processes.

4.2.1.6 Connectionism and Neuroscientific Implications

Interestingly, findings from emergentist approaches are also echoed in neuroscientific research and connectionist models. With rich memory as a prerequisite and cross-modal associations as a basic, theory-implicit assumption, there are three major cognitive processes which remain necessary to acquire a first language: categorisation, chunking, and analogy. Compared to a complex system of inborn grammatical structures, as in Universal Grammar, emergentist approaches focus on very few factors. Yet in recent years, an increasing number of studies and simulations have shown that language in general and language acquisition in particular might indeed be based on this very restricted number of mental processes. The most prominent support comes from Connectionism. This term has been established for a series of computer-based simulations which are grounded on the assumption that the human brain is made up of a large number of neurons or nodes. Through input from the outside world or other neurons, these neurons can be activated and thus shape cognitive networks and patterns. As a result, connectionist models do not assume that language processing is based on rules and parameters. Rather, they see (linguistic) knowledge as a network of associations. As Elman summarises:

The exact choice of representation might vary dramatically. At one extreme, a word could be represented by a single, dedicated input unit (thus acting very much like an atomic symbol).
At the other extreme, the entire ensemble of input units might participate in the representation, with different words having different patterns of activation across a shared set of units. (Elman 2001: 297)

Most strikingly, connectionist simulations were able to show that a quasi-neural network could be self-programming. Thus, they do not rely on an explicit input of rules but rather develop patterns inductively. One of the first studies to show this phenomenon was Rumelhart and McClelland’s (1986) model for English past tense forms. They were able to train a network without feeding it any distributional rules. Nevertheless, the network was capable of “learning” correct English past tense morphology. The model not only developed the target proficiency without being taught any explicit rules, but it also produced the same u-shaped learning curve which could be observed for past tense morphology in human first language acquisition (Cazden 1968: 446-448; Berko 1958). The fact that an artificial network can simulate self-programmed acquisitional processes, however, does not necessarily mean that these structures are also to be found in the human mind. On the topic of past tense morphology, Bybee argues along the same line. In her paper on “Regular morphology and the lexicon” (Bybee 1995), she emphasises that irregular as well as regular past tense forms might be cognitively stored. The only difference is that irregular forms enter the network in the shape of a high token frequency, which makes them fossilize as one “single dedicated input unit”, while so-called regular forms are abstractions over input in the form of a high type frequency. Bybee points out that, in both cases, frequency seems to be the decisive factor, arguing that associations and patterns develop based on the type of input a system receives.
Therefore, when it comes to the role of pre-set rules, this network model is completely in line with connectionist reasoning, although Bybee stresses that her emergentist network model attributes a much larger role to token frequency, which she interprets as an input unit in its own right and not as a further mapping between a base and its derived form. Moreover, she assumes not only that patterns can be created from associations between individual types but also that there are more abstract structural patterns which are formed via structural analogies (Bybee 1995: 432-433). A further objection against an emergentist network perspective on language acquisition refers to the shape and make-up of the individual nodes. In connectionist models, these are often programmed based on simple, yet varying algorithms (Elman 2001: 296-300). Thus, if language acquisition seems to be similar to the processes within a network model, one might ask whether human neurons function like a node in a (connectionist) model. Support for this assumption comes from neuroscientific research. Martin and colleagues (1996), for example, found that naming objects like tools or animals simultaneously activates areas in the brain which are associated with movement or vision. This might indicate that input data, independent of its domain, is indeed mapped onto the same neurological network. Furthermore, taking Hebb’s law (1949) as the basis of his studies, Pulvermüller (1999) argues that word processing indeed seems to be distributed across cell assemblies. More recently, Miller (2003) and his colleagues have even suggested that the same mechanisms are traceable in a primate brain.

4.2.2 Social-pragmatic Learning

With a strong focus on internal, cognitive processes comes the danger of neglecting other aspects.
Since mental processes would not be able to work without data to process in the first place, socio-pragmatic approaches argue that data-generation should play as much of a role in language acquisition research as more internal cognitive abilities. Focusing on data retrieval, Tomasello argues that, like data analysis as such, the ability to access and record relevant data is an inborn faculty. The underlying claim of socio-pragmatic approaches is that the social context as well as the situation in which an utterance takes place provide the necessary cues for a language learner to extract the relevant information. Within this framework, Tomasello (2005: 295-305) describes “four basic sets of processes” as the foundation of first language acquisition (L1):

- Intention-reading and cultural learning
- Schematization and analogy
- Entrenchment and competition
- Functionally based distributional analysis

Intention-reading and cultural learning can be seen as the most fundamental (probably) inborn set. It helps a new member of a (linguistic) community to interpret utterances and discern the different functions which lie behind formal entities such as a string of sounds like /θæŋk ju/ and /ˈprɪti gɜːl/. They do so by drawing inferences from expressed intentions within a joint attention frame. Tomasello and Akhtar (1995), for example, tested children (average age: 2;3.13) in a situation where they were asked to present an object or repeat a process. As well as a control group, they set up two groups with different conditions. While one group was presented with a novel toy while perceiving a nonce word (“widget”), the other group received the same verbal input but witnessed an action, namely the same toy spinning on a turntable. The test phase showed that most children were influenced by the object- or action-highlighted condition.
In the object-focused group, seven out of twelve children chose to point out the novel toy, whereas nine out of twelve children from the action-focused setting repeated the action they had previously been shown. Like Gentner and Markman’s (1995) experiment on similarities and analogy, these experiments again underline the observation that, besides language input and cognitive processing, context and setting are other crucial factors in the language acquisition process, since they seem to influence the salience of an item or activity. Schematisation and analogy, as well as functionally based distributional analyses, on the other hand, instead refer again to actual methods of input processing, as described in chapter 4.2.1. Also, entrenchment and competition are in fact linked to the development of mental chunks and categories. A sequence of sounds such as / θæŋk ju/ or / ˈprɪti gɜːl/ , for example, would not be firmly associated with the general concept of ‘expressing someone’s gratitude’ or ‘attractive young female human being’ after just one encounter. It potentially takes some time before a child connects either utterance with a function and a context. Once a concept and its formal representation are firmly mapped together, this item or expression is said to be entrenched . But entrenchment is not just a process of conditioning in which a form becomes a preferred way to express a certain function or meaning and vice versa. 
Especially with larger units, such as argument structure constructions (ASC), like an intransitive or a transitive construction, entrenchment also entails that the more firmly a certain form is established with meaning, like for example an intransitive construction and the verb giggle, the less likely it is for this meaning to be expressed in any other way, for example, in the shape of a transitive construction, as described by Ambridge and Lieven (2011: 252-256), Tomasello (2005: 178-181) or Brooks and Tomasello (1999). Thus, creative or new uses of a verb like *She giggled me would be rejected by a speaker who has acquired a firmly entrenched pattern of giggle plus intransitive construction. Often, frequency of co-occurrence is regarded as the decisive factor, which is presumably why young learners tend to overgeneralise certain constructions and would, in fact, produce a sentence like She giggled me. Indeed, Brooks and her colleagues (1999) were able to show that highly frequent verbs are more likely to be restricted and rejected as unnatural or ill-formed than less frequent verbs. This implies that the more frequently a verb or lexical item occurs, the stronger is its tendency to be cognitively embedded into a set of more or less restricted ways of applying it. Coming back to the question of creativity and creatively formed new sentences, Goldberg (2006: 93-102) argues, however, that entrenchment alone does not explain why some highly frequent and thus probably highly entrenched items can be used in new constructions without being judged unnatural or non-native at all, while there are nevertheless combinations which would be rejected by most native speakers. Sentences (27) and (28) show two of Goldberg’s most frequently quoted examples (Goldberg 2006: 94):

(27) She sneezed the foam off the cappuccino.
(28) She danced her way to fame and fortune.
Drawing on Clark and Clark’s (1979) thoughts on the prerequisites for lexical innovation, Goldberg argues that the process at work might be best explained in terms of statistical pre-emption. Like entrenchment, pre-emption is also a process which might explain why certain creative alternations of constructions are rejected by the majority of a linguistic community while others are readily accepted. But while entrenchment relies on frequency alone, statistical pre-emption is based on the principle of blocking: if a specific way of expressing a concept or a situation already exists, a language user is more likely to reject a new way of verbalizing said concepts. Thus, creative alternations are only possible if they, so to say, fill a lexical gap⁷ (Goldberg 2006: 94-98; Tomasello 2005: 178-181).

⁷ For the field of word formation processes Burgschmidt (1973: 124) described this phenomenon as a kind of blocking due to an already existing concept („Regel der besetzten Stelle“, lit.: rule of the occupied slot). Therefore, certain potential words like *regularness or *happility are very unlikely to become more than ad-hoc formations for the sake of an example, because for these concepts the lexemes regularity and happiness already exist.

Apart from internalisation, the mind also analyses and dissects the input it receives. It is sensitive to change and thus, for example, accounts for variation in established sequences such as thank you very much or reads analogies as in pretty girl, pretty doll, pretty woman. The result is a schematisation, which is a more abstract version of several, related, entrenched sequences, like [thank you +quantifier] or [pretty +N <female (human) being>].
Functionally based distributional analyses then serve to identify items which are similar, for example, because they occur in the same paradigmatic slot in relation to the same sequence of words or sounds, like girl , doll or woman with respect to pretty . This analysis leads to the interpretation of a word’s meaning based on contrast. In combination with schematisation and analogy, functionally based distributional analysis can further explain how more comprehensive functional categories such as word classes or phrase types come into existence. At this point, it is important to point out that, as potentially inborn abilities, intention reading and cultural learning are unlikely to stop at a certain age. The implicit knowledge that people live in societies with fellow human beings who do everything they do for a reason or with a certain motivation remains an important factor in human cognition. This can be observed almost any time people interact with their surroundings, for example, when reading between the lines in a potentially ambiguous conversation, as in (29). (29) A: “Tom’s girlfriend is a very pretty and intelligent girl.” B: “She certainly is pretty.” The reason why B’s answer could be interpreted as derogatory or even rude is not because this person said an insulting word or phrase. In fact, the very opposite is the case; B even confirms A’s observation that Tom’s girlfriend is pretty. However, the established and accepted way of expressing agreement is an utterance like yes or a simple nod of the head. Hence, a deviation from this convention serves an additional or even different function. Here, intention reading, which, in this case, could also be called the interpretation of conversational implicature (Levinson 1983: 104; Grice 1975: 45), provides the basis for a more suitable analysis of the situation, namely that a deliberate affirmation of just one aspect of a statement implies that the other part does not meet someone’s approval. 
In other words, this set of processes initiates analysis and interpretation and could, therefore, be seen as a kind of motivation for learning in general and language acquisition in particular.

⁷ (continued) As Aronoff and Anshen (1998: 238) point out, children especially seem to be rather prone to this kind of creative affixation. For the study at hand, this would imply that creative alternations of collocations might also be more readily produced and accepted by children compared to adult native speakers.

4.3 Phraseology and Language Acquisition

The last pages argued that usage-based approaches towards first language acquisition provide convincing evidence to support the hypothesis that language is gradually acquired based on non-domain specific, cognitive processes which shape the linguistic input. These studies and processes present only a general outline of what might be going on in the human brain when it acquires its first language. To link language and variation with the production and reception of creative language in general and alternations of collocations in particular, the processes at work need to be viewed within a framework which provides a unified view of language in general and language acquisition specifically. With the aim of providing a more comprehensive model to explain collocational phenomena as well as their established and creative variations, the following chapters will present three more or less usage-based models which, to a certain degree, concern themselves with formulaic sequences. After a period of predominantly UG-based studies, phraseological phenomena were virtually non-existent in much research on the acquisition of the English language. Formulaic sequences like collocations were only analysed in order to describe language, for example for lexicographic reasons (> 2.2; for example Cowie / Mackin / McCaig 1983; Palmer / Hornby 1937; Palmer 1933).
About 30 years ago, some tentative advances were made not only to describe but also to understand the mechanisms underlying selected phraseological sequences such as let alone (Fillmore / Kay / O’Connor 1988) or there, which Lakoff (1990: 462-585) uses to demonstrate the concept of prototypicality and metaphorical extensions within constructions. While these are, of course, not collocational combinations in a narrow sense, these studies started to shift perception of phraseological phenomena from the periphery to the centre of attention in first language acquisition. The result today is a multitude of constructionist approaches; therefore, starting with early studies of phraseological phenomena, chapter 4.3.1 will outline the basic assumptions of construction grammar approaches. Another usage-based, yet not construction grammar approach towards the development of formulaic language in L1 comes from Wray and Perkins (2000). In their stage-model, they propose four stages of phraseological development (> 4.3.2). It is specifically designed to account for formulaic language but lacks a broader foundation within a more comprehensive model of language acquisition. Moreover, it only considers the general development of phraseological phenomena and formulae without paying much attention to more individual examples. The analysis of selected examples and their creative or even productive potential is an aspect fundamental to most construction grammar approaches, but only through the development of more applied models, such as Embodied Construction Grammar (Bergen / Chang 2005), Sign-based Construction Grammar (Boas / Sag 2012) or Fluid Construction Grammar (Steels 2011, 1998), have construction grammar principles been put to a more comprehensive test. Partially, this shifted the focus from rather example-based studies to more process-oriented simulations.
A third approach, which embraces the complex, multilayered nature of more creative instances of productivity, is the Complex Adaptive Systems (CAS) model, which, designed with the complex nature of language in mind, could also be regarded as a kind of meta-model (> 4.3.3). Furthermore, drawing on aspects from construction grammar, and Wray and Perkins’ stage model, this chapter leads to a suggestion for a usage-based model of first language acquisition which can potentially explain how collocations come into being, how semantic prosody develops and how creative alternation within collocations can be accounted for (> 4.4).

4.3.1 Construction Grammar

In Lakoff’s (1990: 462-585) thoughts on constructions containing there as well as Fillmore, Kay and O’Connor’s (1988) study on let alone and Kay and Fillmore’s (1999) publication on what is X doing Y?, the respective authors shifted their attention to phraseological phenomena at a time when these were widely regarded as peripheral exceptions within a rule-based framework. While still very much influenced by generative ideas - Fillmore especially had a generative background - they approached formulaic sequences to examine if and to what extent these phrases could be understood in terms of structure, but also traced their semantic and pragmatic contribution within an utterance. Therefore, today, Fillmore, Kay and O’Connor’s (1988) publication in particular is strongly linked with the advent of modern construction grammar, even though earlier research, for example within (British) Contextualism (among others Sinclair 1987b, 1966; Halliday 1966; Firth 1957 / 1968; Malinowski 1923 / 1956), had already been concerned with the meaning of formal sequences within a certain context since the first half of the twentieth century. In their analysis of let alone, Fillmore, Kay and O’Connor (1988: 504-506) suggest a classification which distinguishes four types of idiomatic constructions: decoding vs. encoding 8, grammatical vs.
extra-grammatical, and substantive vs. formal idioms, as well as idioms with and without a pragmatic function. While this typology is still very much focused on a specific subset of language, it already undermines one of the most fundamental assumptions within Universal Grammar, namely that grammatical structures and lexical entries operate on two different and clearly separate levels. In assigning certain functions to formulaic sequences, Fillmore and his colleagues lay the ground for one of the most fundamental assumptions within construction grammar: that language consists of items which have a formal as well as a functional side. Furthermore, in postulating a functional dimension for multi-item sequences, Fillmore, Kay and O’Connor break with the generativistic tradition that language is derived from a combination of multiple rules which form a hidden, underlying layer of transformational deep structure. In mapping a formal sequence of words directly onto a specific function, they instead assume the position of monostratal, 1:1 direct relations. Closely related to this conception is a further principle which most of today’s construction grammar models share: non-compositionality. This refers to the observation that items which cannot be split into smaller, meaningful units can occur in various shapes and with varying degrees of complexity and abstraction. This then results in conventionalised form-function pairings, the so-called constructions. These early studies, however, are very much rooted in a UG tradition. They assume primitive, semantic units (Katz / Fodor 1963) and analyse their items based on invented examples which lack context and at times even authenticity. Thus, these studies are still far from a usage-based perspective.

8 A distinction which had already been made earlier by Makkai (1972: 25).
Lakoff, on the other hand, comes from a cognitive background; he sees constructions as a kind of cognitively real, prototypical expression of form-function pairings (Lakoff 1990: 462-468). He is one of the early proponents of construction grammar and also the first to define constructions in the following way: “Each construction will be a form-meaning pair (F,M), where F is a set of conditions on syntactic and phonological form and M is a set of conditions on meaning and use.” (Lakoff 1990: 467). Similar to de Saussure’s terminology, this link is arbitrary, yet conventional. But unlike de Saussure, who predominantly concerned himself with lexical units and grammatical structures on a more theoretical level, constructionist approaches not only assume that these interrelations occur on every linguistic level from small units, like morphemes, or words, like [pre-] or [avocado], via complex words or partially filled idioms, as in [daredevil] or [jog N <someone’s> memory], to sentence spanning argument structure constructions as in the case of a ditransitive structure like [Subj V Obj1 Obj2] (Goldberg 2006: 5), but they also design very concrete, applied studies to put these assumptions to the test 9. Because of their considerable range, from small morphemes up to sentence-wide, rather abstract phenomena, constructions are, at times, based on their level of abstraction, subdivided into micro-, meso- and macro-constructions (Traugott 2008: 31-32). Furthermore, constructions are also seen as conventionalised pairings, whose usage is negotiated among a language community. In the case of the German adjective hübsch and the Italian bella for example, there is no other motivation except convention, why both forms /hʏpʃ/ and /ˈbɛlla/ refer to the same concept of ‘pretty’.

9 For a comprehensive overview compare for example Hoffmann and Trousdale (2013).
Differing from the de Saussurean concept, most constructionists, however, define constructions as non-compositional sequences of language; so they cannot be split up into smaller meaningful units without changing the meaning of the whole sequence. Take the example of let alone: let and alone, of course, are lexical items with their own level of meaning, yet in a sentence like he would not drive a car let alone buy one they function as a unit, expressing a concept along the reading of a coordinating conjunction with negative polarity, which contradicts or ascertains a preceding proposition (Fillmore / Kay / O’Connor 1988: 510-519). Frequency can also play a role in the creation of a construction, since, as Goldberg argues: “[p]articular languages are learned by generalizing over utterances that a learner has heard used, while language production and comprehension involve combining or decomposing an utterance into its more basic form-function correspondences.” (Goldberg 2013: 27) However, with the socio-pragmatic dimension (> 4.2.2) in mind, construction grammar, as represented, for example, by Goldberg (2006, 1995), is not so much a radically different approach but rather a supplement to the emergentist perspective on language acquisition. It shares its belief that language is a usage-based, non-domain-specific ability but focuses more on the description of cognitive representations as such than on the processes which created them. To talk about construction grammar as if it were a single unified school within linguistic research would not do justice to the plethora of constructionist concepts and approaches. The following pages, however, will focus on what has come to be known as Cognitive Construction Grammar (CCxG) (Goldberg 2006, 1995). At first glance, this choice might be surprising, since, as has been mentioned before, there are various constructionist frameworks which would provide an apt basis for a computer-based simulation.
However, the aim of this study is to understand and describe how different groups of native and non-native speakers handle collocations and their creative alternations. First and foremost, this approach calls for a model which provides a sound theoretical basis against which observations and findings can be discussed, before any pattern or hypothesis on development can be tested in further studies or simulations. Further reasons for the preference for CCxG above other construction grammar approaches are that, unlike more formalistic approaches like the Berkeley Construction Grammar (Kay / Fillmore 1999; Fillmore 1988) or Sign-based Construction Grammar (Boas / Sag 2012), it is explicitly committed to a usage-based understanding of language and language research. In addition, it is also an approach which, while introspective in the development of its basic principles, has been used and validated in many studies on first and second language acquisition, a path which Langacker’s Cognitive Grammar (Langacker 2009, 1983) has not yet embarked on. In order to apply the general considerations of CCxG to collocations and their potential to form creative variants, the following paragraphs will discuss four aspects which lie at the centre of attention with respect to the question of how collocations, semantic prosody and creative alternations of constructions come into being from a CCxG point of view. These four issues are the assumptions that constructions are non-compositional, based on inheritance relations, polysemous and productive.
4.3.1.1 Non-compositionality

As outlined in chapter 4.2, usage-based approaches in general and emergentist models in particular assume that frequency and statistical distribution are the most important features which play a role in the acquisition and development of a linguistic structure; some construction grammar approaches, on the other hand, argue that a sequence of items is only cognitively stored as a construction if it is non-compositional, which means that it cannot be broken down into smaller units of form and function without changing the meaning of the initial sequence. While this might be true for formulaic sequences such as let alone (Fillmore / Kay / O’Connor 1988), non-compositionality seems to be a more problematic feature for other constructions like [Ns] for regular plural marking (Goldberg 2006: 5) or even some supposedly collocational pairs like pretty girl. Specifically, instances of highly frequent, fully compositional yet potentially entrenched cases, like the formation of a regular plural for a noun by adding {S}, led Goldberg to the understanding that constructions can be created either because they form a unique combination of form and function or based on their frequency. This is particularly interesting for the definition of collocations; most significance-based approaches (> 2.2) assume that a collocation can only be regarded as a phraseological unit if it is at least partially semantically opaque, which in CCxG’s terms means non-compositional. From allowing for highly frequent combinations to be cognitively stored as one item, it follows that items which might not be regarded as very opaque, yet occur very often within a language can also be seen as a cognitive unit.
In the case of collocations, it would then follow that items like commit a crime and pretty girl can both be regarded as constructions, even if one might be stored as a single item due to its restricted meaning, while the other is instead compositional, yet very frequent.

4.3.1.2 Inheritance Relations

Inheritance relations refer to the question whether and how constructions can influence other constructions, for example, whether within a collocation, the constructions of individual items contribute parts of their meaning to the constructional meaning and, if they do, whether this can happen to selected aspects (partial inheritance) or if the whole meaning part of a construction has to be incorporated (complete inheritance). Unlike other construction grammar approaches, CCxG allows for complete as well as partial inheritance (Goldberg 1995: 73) and for inheritance links through metaphorical extension (Goldberg 1995: 81-96). These two links are relevant for the question whether and how established items such as collocations are influenced by other constructions. Often inheritance relations are described in terms of a top-down hierarchy, where more abstract constructions influence more concrete instantiations. For the example of Adj+N collocations, complete inheritance of the function (in this case ‘modification / description’) can be assumed in a phrase such as pretty creature in which pretty and creature both contribute their individual word meaning more or less completely, but there is another level of meaning which also contributes to the interpretation of pretty creature as a female being, as in sentence (30).

(30) BNC BN1 644 The poor young woman, a pretty creature, flushed scarlet and said […]

This additional reading could stem from a partially filled construction in the shape of [pretty +N], with a reading of ‘female’ for all realisations of [N].
As has been outlined before, this in-between construction might have developed through input such as pretty girl or pretty woman and, through partial, metaphorical extension, now transfers this meaning onto creature.

4.3.1.3 Polysemy

Polysemy presents another potential link between constructions. It is thus closely related to the question of inheritance relations. Whereas inheritance relations refer to the paradigmatic relationship of constructions, that is, whether one construction can be part of another or not, polysemy operates on a syntagmatic level. Constructions can either be regarded as polysemous or not, which means that the same form is associated either with only one meaning or with two or more meanings. If a collocational construction like pretty girl, for example, shows a tendency to have a female reading for its [N], [pretty +N] might be seen as a kind of semi-lexicalised construction. Yet, house and view as in pretty house or pretty view are as individual items generally regarded as rather gender-neutral concepts, which would have the consequence that not all Ns in the [pretty +N] construction show the same prosody. As a consequence, there would be two [pretty +N] constructions (polysemy). Alternatively, it could be argued that the female reading for the N is so strong that neutral Ns also acquire a female reading, or that [pretty +N] is not a construction in the first place (no polysemy). However, it should be mentioned that this problem arises only because CCxG regards language and constructions as a “highly structured lattice of interrelated information” within which “linguistic constructions display prototype structures and form networks of association.” (Goldberg 1995: 5). As a consequence, prototypical abstract constructions might develop potential overlaps in form.
In a more exemplar-based model, where no prototype-abstractions are presupposed, different expressions stem from ad hoc abstractions over a set of available examples (Bybee 2010, Reber 2009, MacWhinney 2001). Here too, partial inheritance could be assumed. But while CCxG addresses the need to explain why pretty creature might trigger a reading along the line of ‘female (human) being’ whereas pretty house does not, an exemplar-based model could simply argue that the instances of pretty or of [pretty +N] which create an ad hoc pattern during processing are those which are closest to the perceived input, in this case [pretty +N <female (human) being>].

4.3.1.4 Productivity and Creativity

If a construction opens up one or more slots which can be filled with any item (as long as it fits the construction’s requirements), this construction is called productive. If only a limited set of items is allowed, this construction is not productive or simply completely filled, as, for example, is the case with rather fixed idioms such as give the devil his due (Goldberg 2006: 5). [pretty +N], on the other hand, could be seen as quite a productive construction, because it allows various nouns to occur in the [N]-position. Since Bybee’s (1985) early studies on morphological productivity, the number of different types which can be found within a construction’s slot has usually been regarded as indicative of the productivity of a partially filled construction. The frequency of individual tokens, on the other hand, serves to determine how entrenched a certain type of word form is. The relationship between these two indices can also be taken as distinguishing productive constructions from creative alternations, as they are used in the scope of this study: within a productive construction, creative alternations are new fillers which, since they are not established uses of a construction, occur with a very low token frequency.
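The relationship between the two indices can be made concrete with a minimal sketch. The filler counts below are invented for illustration and are not BNC figures; the function name slot_profile and the reading of the per-filler ratio as one type divided by its number of tokens are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical observations of the [pretty + N] slot. These counts are
# invented for illustration and are NOT corpus frequencies.
observed_fillers = (
    ["girl"] * 40 + ["woman"] * 20 + ["face"] * 15 + ["house"] * 4 + ["man"] * 1
)


def slot_profile(fillers):
    """Return the slot's type frequency, token counts and per-filler ratios."""
    token_freq = Counter(fillers)
    # Type frequency: number of distinct fillers attested in the slot.
    type_freq = len(token_freq)
    # One type divided by its tokens: 1.0 for a filler attested only once,
    # approaching 0 for frequent, conventional fillers.
    ratios = {noun: 1 / count for noun, count in token_freq.items()}
    return type_freq, token_freq, ratios


type_freq, token_freq, ratios = slot_profile(observed_fillers)
print("type frequency of the slot:", type_freq)
for noun, count in token_freq.most_common():
    print(f"pretty {noun}: tokens={count}, ratio={ratios[noun]:.3f}")
```

In this toy data set, the slot's type frequency stands in for the productivity of the pattern, while a filler attested only once receives a ratio of one and would count as a candidate for a creative alternation.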
Therefore, the type-token ratio is close to one, whereas productive, yet conventional combinations occur more often and thus show a type-token ratio which is closer to zero. With a token frequency of 21.969 within the BNC, pretty woman could be seen as one example of the productivity of a pattern like [pretty +N], but it is certainly not a very creative expression. Pretty man, on the other hand, which occurs only once in the BNC, could instead be classified as a creative alternation. As proposed in chapter 1, creative alternations of collocations might be possible because collocational pairs are supported by an underlying level of additional meaning of semi-fixed constructions. This connection has already been pointed out by research (chiefly) within corpus linguistics, but only from a descriptive point of view. However, it does not suffice to focus merely on the rarest and most creative alternations; a model of first language acquisition needs to be able to account for established phraseological phenomena first, before it can think about including more creative alternations. Here, construction grammar in general and CCxG in particular provide a comprehensive framework for understanding the general process of construction building and the potential implications of a usage-based, constructionist view on language acquisition. However, collocations, like other formulaic sequences, seem to be slightly more special than individual words or even argument structure constructions, since they lie between fully productive and completely fixed constructions. The fact that constructions seem to operate on a kind of productivity cline has, for example, also been observed by Bybee and Eddington (2006). Barðdal (2008) too explicitly examines the interaction between type frequency and semantic coherence.
In a study on Icelandic argument structure constructions, she identifies type frequency and semantic coherence as the most influential factors for the development of open schemata (Figure 4.1).

Figure 4.1: The productivity cline according to Barðdal (2008: 172)

Semantic coherence in Barðdal’s study describes a feature which in phraseology is commonly referred to as opacity or idiomaticity. As the graph in figure 4.1 shows, the more idiomatic or opaque the meaning of a sequence, the less productive the item becomes and vice versa. Creative alternation with these most semantically coherent units can then only be formed through analogy but not via the creative filling of an open slot, as would be the case within a schema. As has been suggested before, collocations operate exactly along this line, with more transparent, less coherent items like pretty girl towards the high-type-frequency, low-semantic-coherence side of the spectrum and almost fixed combinations, like pull a face, on the low-type-frequency, high-semantic-coherence end. This observation links in with the stage model presented by Wray and Perkins (2000). They also identify different stages of more analytic or more holistic formulaic sequences, but, in addition, try to relate these to the process of language acquisition.

4.3.2 A Stage Model

The last two chapters discussed different approaches to language acquisition and their capacity to account for variation and change. They attempted to show that theories based on Universal Grammar often seem to be too static in their views to be able to explain the vast spectrum of linguistic alternation and change.
Furthermore, as Matthews (2001: 151) argues, “[i]f languages have properties that change independently of ‘grammars’ or ‘I-languages’, it is natural to ask how such abstractions can be justified.” Thus, usage-based advances seem to be much more suited to incorporating change, since they are based on the assumption that language is a cognitive ability which, like other cognitive abilities, is constantly shaped and reshaped based on the input the brain receives. While this might be convincing for the selected examples of the acquisition of words and phrases, the question remains whether this approach can also serve as a model to explain creative variation within collocations, or just collocational phenomena to begin with. As chapter 2 has shown, collocations operate on a level between traditional lexicon-grammar distinctions. They are neither words with one meaning within a certain context nor pure syntactic structures with the purpose of structuring relations within an utterance. Collocations are defined through a certain kind of limited variability, so a comprehensive approach to first language acquisition needs to be able to account for variation like pretty girl and pretty face as well as potential restrictions such as pretty man . To make matters worse, it should furthermore be able to explain why, under certain circumstances, these restrictions can be neutralised and, of course, what those circumstances might be. In an attempt to find a comprehensive model for first language acquisition of phraseological phenomena, Wray and Perkins (2000: 123) suggest a model which in principle is based on Locke’s theories on neurolinguistic development (Locke 1997). In this, he proposes two neural mechanisms, specialisation in social cognition and a grammatical analysis module . Such a conception is not unlike Universal Grammar, since it is also based on holistic storage of lexical items and the analytical aspect of structures. 
However, while Universal Grammar sees lexical input and grammatical structure as different entities operating on different levels, Wray and Perkins (2000) argue for a developmental model with phases of holistic and analytical language processing (figure 4.2).

Figure 4.2: A stage model for the acquisition of formulaic language (Wray / Perkins 2000)

Similar to more general, usage-based approaches towards L1, they argue that the first phase is dominated by simple imitation of unanalysed chunks, like, for example, thank you. These holistic chunks are only uttered in combination; they function as one item, serving a specific, pragmatic purpose. According to Wray and Perkins, the second phase then begins as soon as the holistically stored items reach a kind of critical mass. They speculate that this coincides with the successful end of the vocabulary spurt 10. Next, an analytical phase begins in which a small number of phraseological chunks are retained holistically, while the majority complete a process of structural analysis which originates from a kind of urge supported, or maybe even initiated, by the grammatical analysis module (GAM). Regarding the origin of this module, Wray (2002) later speculates:

Although the involvement of the GAM can be seen as a natural consequence of maturation, motivated by the child’s need to express ever more complex novel messages and interpret decontextualized linguistic input, the extent of the adoption of the analytic strategy may be affected by beginning literacy and the analytic method of formal education. (Wray 2002: 134)

10 The vocabulary spurt is generally defined as a fairly rapid increase in children’s vocabulary during their second year of life (for a more detailed description see Ganger / Brent 2004). However, not all children seem to have evidenced a vocabulary spurt by the age of three (Goldfield / Reznick 1990).
Yet, while Mervis and Bertrand (1995) claim to have identified “late spurters”, Goldfield and Reznick (1996: 242) argue that “[a]lthough a spurt will increase the size of the lexicon, continuous and gradually increasing growth will also produce a sizeable vocabulary.” Ganger and Brent (2004), who used statistical modelling to investigate the development of children’s vocabularies, also have doubts about the concept of a vocabulary spurt as a stage in early language acquisition.

As an indication of this rather analytical phase, Wray puts forward a child’s difficulty in dealing with pragmatic phenomena such as sarcasm, irony, idiom and metaphor (Wray 2002: 135). Unfortunately, neither Wray and Perkins (2000) nor Wray (2002) elaborate the exact cognitive processes any further. However, since this phase is restricted to ages two to eight, it can be assumed that the underlying processes are to be assigned to a particular ability which is not available to an adult speaker of a language. In phase three, the young native speakers restore parts of the formulaic sequences as holistic chunks for reasons of cognitive efficiency. Here, the frequently used sequences (tokens) especially are chunked together again. This phase continues until the age of eighteen. After this, stage four continues at adult proficiency with most formulaic sequences holistically stored and only a small percentage in a pre-analysed, easy-to-be-altered shape. Furthermore, Wray points out that formulaic sequences like too big a piece, if I were you or if I was younger are not irregular exceptions but rather quite regularly acquired chunks which - unlike other sequences - retain their holistic shape (Wray 2002: 130-132). She calls this the needs-only analysis, arguing that categories, or even rules, are not broken into more analytical pieces “unless there were a specific reason” (Wray 2002: 130).
In Wray’s framework, this would be the communicative context. Therefore, if an item serves a specific function which does not change through modification, form and function are kept as one whole unit. This is essentially what Tomasello suggests when he puts socio-pragmatic joint attention frames at the centre of first language acquisition (Tomasello 2005: 282-322). Through the directed interaction with its environment, the human brain receives (linguistic) input, which is immediately mapped with a situation and its context, but only further segmented if necessary. Wray’s “specific reason” seems to be a bit vague here, but the answer as to how the need to analyse a chunk might arise could be found within emergentist research: as Bybee argued, the very first instance of an item is already stored (Bybee 2010: 18), and this entry can then be strengthened through repetition and increasing token frequency or linked to similar items. This process of analogy might be the reason for analysis. Too big a piece, for example, is very likely to be perceived as one item, since it is mostly used in this exact shape (high token frequency). If I were you and if I was younger are similar cases, yet the encounter with instances like if I were him or if I was prettier leaves the reoccurring elements unanalysed, while between the altered parts you - him and younger - prettier analogies are formed; they occur in a similar position and hence might serve a similar function (high type frequency). Thus, gradually a pattern emerges which retains only one slot like [if I were NP] or [if I was AdjP] or opens up new areas of change such as, for example, [the Xer the Yer] in the more the merrier or the older the sillier. The result is a network of items which all started as very specific form-function pairs, but while some stuck to their initial shape, others through variation in input and analogy developed further categories and abstractions.
Quintessentially, this is the same conception as in most construction grammar approaches, where form-function pairs are the ubiquitous foundation of language. However, the view of language in (most) constructionist approaches is often aimed at a more comprehensive, systematic conceptualisation. This becomes clearest in the treatment of seemingly unmotivated restrictions, like the fact that some verbs like give can be found in a ditransitive as well as a prepositional dative construction like He gave her the letter. or He gave the letter to her. while others, like bring, prefer a construction with a prepositional dative (Gries / Stefanowitsch 2004). As mentioned before, the usual explanation follows the concept of pre-emption (> 4.2.2), which means that a more regular, non-formulaic form can be blocked by a more opaque, yet also more familiar - or even just more frequent - form; this would divide linguistic knowledge into a regular, rule-based part and an earlier acquired, irregular and more formulaic part. In consistently putting chunks first, Wray’s assumption of a needs-only analysis is not only able to account for item-specific irregularities, but also avoids an acquisitional process of pre-emption altogether. In fact, since forms like too big a piece are seen as one among many early acquired chunks, irregularities could instead be regarded as the most basic pieces of a gradually developing linguistic system. Another difference between Wray and Perkins’ model and usage-based construction grammar is the enormous amount of time which those who acquire a first language seem to need to analyse most of the holistically stored chunks in order to finally reach adult proficiency. Wray and Perkins estimate a span of roughly 15 years, while construction grammar studies were able to show that children as young as two years only need a couple of minutes (Tomasello / Akhtar 1995) to adapt a new word into an existing system.
Although this appears quite a dramatic difference in timescale (15 years vs. 15 minutes), a closer examination reveals that the two approaches view the same phenomenon from two different angles. The span of 15 years refers to overall performance and, indeed, there are studies which suggest that some sequences take a relatively long time to form a holistic as well as an analytical mental representation (Levorato / Cacciari 1992; Nippold / Martin 1989; Prinz 1983). This observation does not contradict research which postulates a much shorter time for the analysis and categorisation of individual items (Reber 2009; Tomasello / Akhtar 1995), yet the stage model as suggested by Wray and Perkins (2000) might be misleading here, since it does not visualise the ongoing process of association finding and category formation during the acquisition of a first language.
4.3.3 Complex Adaptive Systems
Another model, which, like Wray and Perkins’ stage model, is closely related to usage-based, emergentist approaches, is the conception of language as a Complex Adaptive System (CAS) (Ellis / Larsen-Freeman 2009; Larsen-Freeman / Cameron 2008). As the name suggests, CAS theory has its theoretical roots in complexity theory, which in its modern form was first introduced in areas of (natural) sciences like biology and physics. It is based on the idea that a system, such as an organism or a neural network, is not a static collection of linking rules and building blocks; rather, by adapting to its environment, a system changes itself and thus forms the basis for the next adaptation, the next change. Taking Waddington’s (1957) work on human genes as an example, van Geert (2003: 648-649) summarises this idea nicely when he writes: “[…] the form of the body is literally constructed by the construction process itself - and is not specified in some pre-existing full instruction set, design or building plan […]”.
Applied to cognitive processes and mental development, it follows that knowledge in a complex adaptive system does not consist of static units of information but is rather a process itself, which is shaped and re-shaped based on the input it receives and the developmental stage it is in. Ellis and Larsen-Freeman suggest the following seven major characterisations as being fundamental to any complex adaptive system (Ellis / Larsen-Freeman 2009: 14-18). Below, these features are listed accompanied by a short explanation with respect to language and language acquisition:
1) Distributed control and collective emergence: Language can be viewed as an individual system (idiolect) or a more global, community-dependent system (sociolect / a language), but global patterns always emerge through long-term interaction between individuals. In the case of semi-productive lexical items like collocations, this implies that it might be possible to identify collocational pairs which are more accepted than others. The use and acceptance of a collocation are still dependent on the individual’s system, especially for less frequent combinations. Thus, while one native speaker might accept a combination like pretty boy or commit a mistake, another might reject it.
2) Intrinsic diversity: As a consequence of collective emergence, there are no ideal representative speakers of a language, no ideal speaker-hearer as in Universal Grammar and no fixed benchmarks for full mastery of a language. This principle is particularly relevant for an approach which assumes a general system of syntactic rules or constructions for each native speaker of a language. According to CAS, these generalisations should be treated with caution, since they suggest a unified structure which might not do justice to reality.
3) Perpetual dynamics: Change and reorganisation are constantly ongoing processes which are based on the input a system receives.
In language, this input is not only formal data in the shape of phones and letters but also socio-pragmatic information and the context from a situation. The study of language should, therefore, encompass an analysis of the context and its potential influence on the item under investigation.
4) Adaptation through amplification and competition of factors: This aspect stresses once again the importance of setting and context, but while perpetual dynamics focus on the process as such, this criterion emphasises the relationship of components within a system and draws attention to the observation that factors can support but also suppress one another. In the acquisitional process for collocations, for example, input in the shape of the very same combination of items could lead to the fossilisation of an item as one chunk (token frequency), while the encounter with a collocation which shares only one collocate with a familiar collocational combination could lead to the expansion of a concept (type frequency).
5) Non-linearity and phase transition: Input and change do not develop in a linear fashion. A considerable amount of data might result in hardly any observable change, while one new utterance could serve as the metaphorical straw that breaks the camel’s back and thus trigger a new construction.
6) Sensitivity to and dependence on network structures: Despite the impression of randomness which complex adaptive systems at times might give, all systems are assumed to be based on underlying structures. In language, cognitive processes (as discussed in 4.2) could be regarded as a general network structure; evolutionary processes, such as brain development, could also play a role.
7) Change is local: Unlike a rule-based perception of language, CAS assume that change happens through interaction and is thus at first restricted to a very specific scenario, which might or might not be extended depending again on external circumstances and internal network structures.
Thus, change could be regarded as highly context- and / or salience-dependent.
Most of these characterisations are, as mentioned before, insights which might also stem from usage-based or neurolinguistic research, but what makes CAS interesting for a study on collocations and their creative alternations is an unreserved commitment to the consequences of a usage-based, emergentist view on language. For, if language emerges through the constant interaction of input and internal cognitive processing, the result is more likely to be an individual, constantly changing system which shows some general regularities but also a lot of item-specific, selective structures than a fixed set of rules with lexical filling. Therefore, a CAS point of view on language in general, and language acquisition in particular, has two far-reaching consequences for linguistic research. First, the distinction between competence and performance is superfluous, since the performance of a system like the human brain at a given point in time is also the competence it has at that same moment (Larsen-Freeman / Cameron 2008: 131-132). This is particularly interesting for methodological reasons. On the one hand, it might seem as if it facilitates a researcher’s work since all s / he has to do is ask an individual or even him / herself about the structure of a sentence or the acceptability of language, since the output equals the actual underlying competence. This method is more direct than calculating across several answers and extrapolating potential mechanisms. Nevertheless, it is only one individual’s system. So, on the other hand, this means that established tools such as large corpora are to be handled with great caution. They are built from different texts produced by different systems at various stages of their linguistic development, yet each entry is treated as if it were operating on the same, comparable level.
A second issue is the fact that CAS places variation and change at the centre of all language development. Thus, instead of being treated as peripheral anomalies, creative variation could provide valuable insight into the system, its mechanisms, and the stage it might be at.
4.4 The DMCDC-Model: A usage-based model of collocations
Since, like Wray and Perkins’ stage model, CAS and CCxG link in well with observations from usage-based studies (> 4.2), the three can be easily combined to form a framework which not only outlines the developmental process of formulaic language but also explains in more detail how this development, particularly in stages two and three, comes into being. Figure 4.3 shows a unified model of the potential development of formulaic sequences within a CAS framework.
Figure 4.3: A dynamic model for the cognitive development of collocations (DMCDC)
In principle, this dynamic model for the cognitive development of collocations (DMCDC) is very similar to Wray and Perkins’ stage model, but with two modifications which indicate its commitment to usage-based theories and make it applicable not only to established language but also to creative variation. The first addition which stands out is that of starting and end points in the shape of constructions. As argued in 4.2 and 4.3.1, construction grammar approaches provide rather convincing evidence that language is indeed cognitively structured as units of form and function. At the most basic stage, a construction is a simple 1:1 relationship between a distinct form, like a word, and a clearly defined meaning. As Wray explains, in a model which assumes a needs-only analysis every lexical item starts as this basic form-function pairing, but while some phrases, like too big a piece, never surpass this stage, other concepts become more flexible and develop the potential for variation at one point or another.
Most collocations and their creative alternations like commit a crime and commit a mistake would fall into this category. In general, they would still be regarded and retrieved as one unit, but slots for variation are also cognitively stored, and could be altered if the communicative needs require it. Finally, at the top of the diagram lie constructions which, through frequent use and variation in type input, have developed into several, equally productive constructions. This would apply to most first “words” (like gimmi) and sentences. These, as Tomasello (2005, 1992) suggests in his verb island hypothesis, are used almost exclusively within one pattern at a very early stage, but gradually develop into independent, fully productive words and syntactic structures. A further aspect which is introduced is the representation of the developmental processes as a spiral instead of a line. This has been done with CAS theory in mind and should emphasise the assumption that language acquisition is a perpetual process which creates new structures and entries based upon existing cognitive categories. Wray and Perkins’ model was initially designed with formulaic language in mind, but, in fact, the proposed changes now make it also applicable to more productive constructions. So, with respect to collocations, two questions remain: how can this model account for collocations as well as collocational variation, and which combinations can be regarded as collocations in the first place? The latter was addressed in chapter 2, which demonstrated that context- as well as significance-oriented approaches have in common a focus on specific combinations which show either paradigmatically or syntagmatically restricted variability. Therefore, they seem to stand in contrast to productively open free combinations.
Taking into consideration cognitive and neuroscientific implications from language acquisition, it is, however, debatable whether a completely free and context-independent combination exists within a usage-based framework in the first place. Nevertheless, in the case of collocational combinations, examples can certainly be found at various points on the formulaic spectrum, with examples like blow a fuse towards one end and pretty girl towards the other. Thus, it would be interesting to see whether these combinations are also cognitively processed in a different way, as the DMCDC-model would suggest. As for the question of how exactly collocations fit into this usage-based, emergentist model, the answer is: basically like all other linguistic phenomena. If, as Wray (2002) and Tomasello (2005) suggest, all language is at first stored as chunks with a certain function and / or meaning, collocations too start off as multi-word items. This seems to be close to a Universal Semantics’ conception of lexical and phraseological chunks, which supposedly have an underlying deep structure meaning (Katz / Postal 1963). However, it is important to emphasise that, unlike UG’s claim, any instances of language are at first stored as an item and its accompanying function. Moreover, taking the CAS perspective seriously, the genesis of a collocation with its variations and restrictions is not a prototypical generalisable process, but proceeds differently within each individual and would, therefore, be highly individual and subjective. The reason why there are then still areas of agreement in linguistic research on what might qualify as a collocation comes from the fact that the input a child receives stems from a (linguistic) community. Through social interaction and negotiation, this community shares a common linguistic framework.
Thus, even if processing and the acquisitional process are highly individual, there are patterns which emerge, although only gradually and through long-term interaction within a community. Therefore, a collocation might well start as various variations, depending on the individual and his / her immediate surroundings. Going back to the example of pretty girl: Lara, one of the children from the CHILDES corpus, uses pretty 39 times within a time span of about a year, starting at the age of 2;7. Among the first instances, the phrase pretty dress is the most frequent¹¹ combination. Gradually other [pretty + N] pairs occur, like pretty plant (2;10) or pretty one (2;10). However, in 15 cases, the [N]-slot is filled by words referring to clothes, like pretty dress (2;7), pretty jumper (2;8), pretty blue socks (2;11), pretty shoes (2;11), pretty shorts (3;0). Thus, it seems that Lara is gradually expanding this concept of [pretty + N], yet not, as suggested above, based on the concept of ‘female human being’ but apparently around her¹² prototypical phrase pretty dress. Generally speaking, these associations are then assumed to be strengthened or modified, depending on the kind of linguistic input this child receives within the following hours, days and even years.
¹¹ Within a month Lara uses pretty eleven times, seven times in combination with dress.
¹² Another child from the CHILDES database, Thomas, behaves quite differently. Between the ages of 2;3 and 4;6 he uses pretty three times: on its own (pretty, 2;3), with a noun (pretty stones, 2;10) and with an adjective (pretty good, 4;6). This indicates that, while it is quite likely that Thomas in general also used pretty more than three times during these two years of his life, he seems to associate this item with different words. Furthermore, the different output from these two children might support the assumption that language should be regarded as quite an individual complex adaptive system.
If, hypothetically speaking, the only combination s / he ever hears is pretty dress, the sequence will be stored as one item similar to too big a piece. A more likely scenario is that soon other instances like pretty socks or pretty shoes will follow. According to emergentist models, this is the time when through analogy new patterns are formed which then open up the initial item, allowing for variation but also a broadening of the concept, for example away from an utterance which only refers to clothes to an extension, which might read something like ‘applicable to anything used by women’ and eventually just retaining a slight female or womanish connotation. At the same time, structural perception is also shaped and reshaped, splitting the former one-item utterance into two constituents, which then again allows for other analogies like dress - socks - one as sharing some functional features. This attempt at drawing a rough sketch of the acquisition of a collocational item shows that cognitive processes result in a multitude of relations and categories and, even more importantly from a conceptual point of view, by zooming in on one process, other processes are pushed into the background, although this does not mean that they are not very active. So, the input of any utterance of pretty dress strengthens the combination as such, while a variation in input triggers analogies, formation and / or re-formation of categories and functional concepts, which again might broaden the concept of ‘items which go with pretty’, allowing for new analogies, categories and readings to be formed. After this stage of continuous (re-)analysis, novel input is likely to become less frequent. This, in turn, increases the ratio of familiar uses and combinations, which causes the system to stabilise at a certain point. Once again, this point is not necessarily the same for each individual; it might vary depending on the diversity or even poverty of the input.
Thus, what might seem to be a fairly restricted collocational combination to one person could well be a free or at least more open combination to another. At the same time, the fact that categories and constructions could level out at some point might not necessarily prevent the system from extending a concept or finding new analogies once novel input causes it to re-evaluate established structures. Frank Palmer could be seen as one of the more prominent examples of this mechanism. While he passionately argued for the restricted and fairly limited distribution of pretty in the first edition of his Semantics (Palmer 1976: 96), he conceded in the second edition that even though it is not normal, a combination such as pretty boy might be thinkable (Palmer ²1981: 76-77). Of course, one could argue here that Palmer’s change of mind did not come from day-to-day conversation but was rather the result of an academic discussion. The question is: does this make a difference for the model as such? As so often, the answer is “yes” and “no”. No, simply because it does not matter whether re-categorisation is triggered by a conscious or subconscious analogy, that is, whether the brain works out analogies through its own cognitive processes or because it has been pointed towards an analogy by its environment. Both processes result in the same mechanism; they link entities which have not been linked before and thus allow new categories to be formed and the system to be re-structured. This is, however, only possible if the system is able to internalise this novel input. Internally formed analogies are internalised expressions by definition, but this is not necessarily true for analogies an individual has been made aware of by his / her outside world.
The reason is that categories which are ultimately responsible for the restructuring of a system can be formed in two ways: subconscious information integration through analogies and frequency processing or conscious rule-based learning through memorisation (Reber 2009). While information-integration is one of the basic cognitive processes that seem to happen almost automatically, rule-based category learning relies on working memory with all its benefits and limitations. Therefore, unlike information integration, conscious category learning does not need an immediate response to test a categorisation hypothesis and file an item accordingly, but is restricted in its capacity, so not everything that has been committed to memory once stays within the system. This also seems to be the reason why the lyrics of a favourite song seem to stick, while a list of French vocabulary might not, and also why input in the form of an explicit formulation of a category might or might not influence an individual’s long-term performance and perception, as Frank Palmer’s example shows. In recent years, neuroimaging has found more and more indications that category learning might indeed be a multiple system (Reber 2013, 2009), which again could have far-reaching consequences for language acquisition and language learning. As demonstrated above, language as a CAS is based on the assumption that a language emerges based on cognitive processes - the most important of which with respect to language are analogies, chunking and categorisation (> 4.2.1) - as well as interaction and use (> 4.2.2). Furthermore, it seems that category learning is an ability which every human being is equipped with and which in itself consists of two processes. 
Since usage-based studies were able to show that categorisation processes take place from a very early stage on (Braine 1963) and presumably without conscious category learning (Tomasello / Akhtar 1995), and given the age of the participants and the fact that native speakers of a language are generally unaware of many structures and regularities in their mother tongue, it is very likely that the majority of category learning within first language acquisition, at least in the early years, is based on implicit information integration. Second language learning, on the other hand, often takes place in a classroom setting with a relatively small amount of input and a discontinuous setting, as well as preset categories in the shape of grammatical rules and dictionary entries. Hence, it is reasonable to suppose that foreign language learning is, in contrast, based on rule-based category learning, which might seem more efficient in the short run, but is less long-lasting without a phase of implicit practice and repetition, as Reber emphasises:
One of the challenges of identifying the contribution of implicit learning to complex skill learning domains is that the lack of awareness of the implicit information by experts may generally lead to an inordinate focus on the explicitly learned knowledge and a relative lack of attention to the contribution from implicit learning. (Reber 2013: 2039)
This “relative lack of attention to the contribution from implicit learning”, however, might be responsible for non-native speakers’ difficulties in obtaining native-like proficiency, let alone a native-like grasp of phraseological phenomena. An important note at this point is to acknowledge that, even from an emergentist, usage-based perspective, the learning of a second language cannot be the same as the acquisition of a mother tongue.
Yet, as cognitive researchers would argue, both abilities rely on the same cognitive processes and are dealt with within the same brain. So, it is very likely that it is the method which makes the difference: while first language acquisition is usually regarded as an example of implicit learning, second language learning is largely dominated by explicit learning methods, neglecting the need to commit the learnt input to memory via implicit learning mechanisms such as repetition. In the case of collocations this means that whereas native speakers dedicate several years of implicit day-to-day training to the acquisition of language patterns, language learners are given vocabulary lists which have to be learnt by heart (Jehle 2007; Hausmann 1984) within a short amount of time and often without any further chance for implicit learning through repetition and use. Furthermore, instead of seeing collocations as emergent and potentially flexible linguistic items, most EFL students still learn that collocations are fixed combinations.
4.5 Summary and Implications
On the basis of the cognitive character of collocations (> 2) and creativity (> 3), this chapter has outlined models of language acquisition which have the potential to account for phraseological phenomena as well as for creative language use. It has shown that universalist approaches either lack the ability to account for linguistic phenomena which lie between purely syntactic or lexical structures, such as collocations, or converge with usage-based thinking (> 4.1-2). Hence, a combination of usage-based approaches and the dynamic conception of language as a Complex Adaptive System (> 4.3) has been selected as the basis of the DMCDC-model, a cognitive model of collocations which is not only able to account for linguistic patterns and their creative alternations but is also strongly supported by recent findings in neuroscience (> 4.4).
Thus, this model shows that a cognitive perspective on collocations is, in fact, not only possible but also necessary to understand the full extent of the interrelation between established collocational structures and their creative alternations. Furthermore, this model tries to answer RQ 1 - which asked about the potentially cognitive character of collocations - at least on a theoretical level. Several implications can be drawn from this: First, semantic prosody emerges through constant organisation and reorganisation within a Complex Adaptive System. Hence, rather than an additional static level of meaning, it might be an emergent category which develops individually over time. According to the DMCDC-model, semantic prosody is thus more likely to be found in mature systems. Therefore, adult native speakers can rely on semantic prosodic patterns in their processing of creative collocational variation, while children are more likely to prefer a more conservative reading and tend to reject more creative collocational combinations. A second aspect is that, since language is assumed to be an individual, gradually emerging cognitive ability, generalisations such as semantic prosodic tendencies can only be made retrospectively and in the shape of an estimated abstraction of adult language. Since children are still developing language patterns, it is more difficult to establish one overarching pattern during the early stages¹³ of language acquisition. Corpus analyses of adult language, however, might yield some results. Even if semantic prosody does not account for every instance within a corpus, this does not automatically mean that this reading has to be rejected in general; such an instance might simply be a sentence produced by a speaker with a different internal pattern.
Finally, assuming that native speakers acquire their mother tongue through implicit mechanisms of analogy and categorisation, while in second language learning a phase of implicit learning is often neglected, L2 learners of English are more likely to show behaviour similar to that of children when it comes to formulaic sequences, since they have had less input. Also, the fact that input in second language learning tends to be more prestructured and that formulaic sequences are often memorised as one chunk - as would be done in the early stages of first language acquisition - supports a more conservative evaluation of creative language by non-native learners of English. At this stage, these assumptions are, of course, mere hypothetical implications based on a plausible, yet still rather theoretical conception. The second part of this study will therefore put this usage-based, dynamic model of collocations to the test. Starting with possible ways to operationalise and thus identify collocational patterns and constructions (> 5), the following pages will not only concern themselves with potential stages of collocational acquisition and the differences between the attainment of collocations in first and second language acquisition and learning (> 6). They will also test which role influencing factors such as ‘context’ and ‘creativity’ might play to find out how pervasive the assumed meaning level of a collocational construction, such as [pretty + N] or [commit + NP], really is (> 7).
¹³ Based on Reber (2013), who argues that the concept of subconsciously acquired and analysed data is, in fact, ubiquitous in everything a human being does, this study regards language acquisition as a life-long, ongoing process.
5 Measuring Collocations - Methodological Considerations
It is going to take us lots of tools to understand language.
We should try to appreciate exactly what each of the tools we have is good for, and to recognize when new and as yet undiscovered tools are necessary. (Jackendoff 2002: xiii)
As outlined before, collocations seem to be a linguistic phenomenon which cannot easily be defined from either a traditionally lexical or a purely structural point of view. On the one hand, as a combination, a collocation’s constituents often exceed the meaning of its parts and form an idiomatic unit of meaning; on the other hand, unlike lexemes or most idioms, the individual collocates remain flexible enough to be exchanged or recombined. Furthermore, they are seemingly able to create new combinations which, despite their rather creative character, are likely to be interpreted against the more common reading of established combinations. Thus chapters 2 and 3 have discussed collocations and linguistic creativity, while chapter 4 has examined how these phenomena could contribute to a more comprehensive conceptualisation of language in general and language attainment in particular. So far, however, most of these considerations, especially the models proposed for collocational constructions (> 2.3) and the DMCDC model of collocation attainment (> 4.4), are predominantly based on somewhat theoretical reasoning. Therefore, in order to find a suitable methodology to put these models to the test, this chapter will concentrate on potential ways of identifying and analysing collocations and their more creative alternations. It will focus particularly on potentially different groups of speakers (RQ 2a / b) as well as the role the contextual setting might play in the interpretation of collocations and their creative alternations (RQ 3). Chapter 2 has already introduced a range of views which approach these questions from different perspectives.
While significance-oriented approaches (> 2.2) see collocations as a phenomenon of lexical exception or special kind of multi-word combination (Howarth 1996; Hausmann 1985, 1984), others approach them from a more context-based perspective (> 2.1), arguing that the more frequently words co-occur, the more likely it is for this combination to be somewhat special. Of course, it has repeatedly been pointed out that the more often items are encountered and produced, the more likely they are to become cognitively stored units within the human mind (amongst others Bybee 2010; Ellis 2006; Tomasello 2005; MacWhinney 2001). So, there is strong evidence which suggests treating collocations like a kind of multiword lexeme, yet there is also a certain degree of flexibility within these word combinations. The task of the following chapters will be to discern how this flexibility can add to a more comprehensive understanding of collocations. However, as outlined before, like the definition of collocation, the appropriate approach towards a comprehensive study of these special kinds of multi-word co-occurrences is still a matter of debate. While most presumably agree that the era of the earlier contemplative introspection is over (> 2.2), research within this field is quite often based either on corpus data or elicitation methods, with the latter being predominantly used in the field of language acquisition and EFL research¹. Statistical analysis of corpus data, on the other hand, traditionally has more descriptive applications in mind, like the exploitation of linguistic data in order to adapt phraseological phenomena for stylistic, lexicographic, or NLP uses. To classify different methodologies within linguistic research, several dimensions have been established.
The distinction between elicitation methods which focus on language production as opposed to language perception, for example, can be regarded as fairly established. Siyanova and Schmitt (2008) further propose a second dimension, which subdivides linguistic research into online and offline methodologies. This dichotomy refers to the way in which linguistic data is obtained. Online tasks are performed under a certain time constraint and are thus assumed to reflect underlying cognitive processes. Offline methodologies, on the other hand, are not temporally limited and are therefore thought to be less clearly related to the cognitive processes involved in language production or perception. Table 5.1 visualises this typology and gives examples of potential tasks which fall under the respective classifications.

| | online | offline |
|---|---|---|
| productive | interviews, text production tasks, etc. | gap filling tasks, cloze tests, translations, etc. |
| receptive | eye movement studies, self-paced reading, etc. | matching tasks, multiple choice tests, judgement tasks |

Table 5.1: Typology of linguistic methodologies (partly based on Siyanova / Schmitt 2008)

¹ Amongst many others, for example, Ambridge / Pine / Rowland (2011), Kuska / Zaunbauer / Möller (2010), Conklin / Schmitt (2008), Cameron-Faulkner / Lieven / Tomasello (2003), Bybee / Eddington (2006), Akhtar / Tomasello (1996).

To investigate the scale of the dynamic DMCDC model as well as to find answers to RQs 2 and 3, there are, however, three central criteria which a methodological toolkit needs to meet. First and foremost, tasks within this toolkit must be capable of studying the variables “age of the participants” (‘age’), “creative variation of collocations” (‘creativity’), and “contextual factors” (‘context’).
While ‘age’, at least at first glance, can be easily controlled through the selection of participants within all four types, ‘creativity’ and ‘context’ are particularly hard to monitor in production methodologies. This is because even with a thematically controlled input (such as a given topic for a text production task) it is very likely that participants would stick to standard combinations, especially in a test situation, unless, perhaps, they have been explicitly told to use more creative language. Furthermore, it would be difficult to control which collocations participants use in online production tasks in the first place. Offline production tasks, on the other hand, need to specify a certain context, which, to a certain degree, makes them unsuitable for an analysis of contextual influence. Secondly, in order to analyse the behaviour of native and non-native speakers as well as different age groups, this study needs samples from at least six² different groups of participants. To collect a database from online data would then be, in any respect, a rather costly undertaking. A third prerequisite is the degree of difficulty: since the behaviour of participants from different age groups should be compared, the tasks have to be easy enough for younger or less advanced speakers to perform them too, but challenging enough for adults or advanced speakers to stay focused and not lose interest. These prerequisites suggest that offline perception tasks are most suitable for the study at hand. However, as has been mentioned before, data obtained from offline tasks might not be as directly related to cognitive processes as data from online tasks. Therefore, it will also be interesting to check the results of the offline perception part of this study against a large database of online produced data - that is, a corpus.
Moreover, as chapter 2 has already demonstrated, measuring the strength of association between two items based on corpus data could be considered one of the standard ways to approach collocations, and it will thus be interesting to see how these measures fare against a more carefully controlled setting. This is why this chapter will test corpus-based as well as more cognitive approaches towards a collocation’s operationalisation for their suitability for examining more creative alternations of collocations (> 5.1-2). Chapter 5.3 will then discuss potential shortcomings and pitfalls, before chapter 5.4 suggests an adequate methodology for testing the scale of the DMCDC model as well as answering RQ 2a / b and RQ 3.

² A minimum of six includes one native and one non-native group of adult speakers of English as well as two groups of younger and / or less advanced speakers with a native and a non-native background.

5.1 Online Production Tasks - Corpus Data and Statistical Association Measures for Collocations

In modern linguistics, corpus research is one of the most influential achievements within a linguist’s toolbox. Large online corpora especially enable the researcher to answer questions on the distribution and usage of words or phrases within minutes without the need to retreat to subjective introspection. In the case of collocational research, this is, of course, particularly useful for looking more closely at the company a lexeme keeps (cf. Firth 1968: 179). Aside from being independent from other, more time-consuming or less objective tools, corpus analysis offers a substantial range of further advantages.
Since large corpora like the BNC contain a selection of samples of text from different sources, an extensive corpus represents various aspects of a language which not even a native speaker might be able to recall ad hoc, for example different readings of a high-frequency word like have (amongst others Louw 2003: 1.1; Hunston 2002: 142; Stubbs 1995: 24). Furthermore, different hypotheses can be tested within the same, constant database. Moreover, the corpus does not get tired if asked too many questions, as human participants would, and it cannot be influenced by external factors such as mood, time of day or simply basic priming mechanisms.

With the help of statistical methods, corpus data enables the researcher to extrapolate certain tendencies and obtain a first, if not comprehensive, impression of a lexeme’s collocational profile. Since most of these methods focus on testing the association strength of two or more words, they are often referred to as association measures (Evert 2005: 20-21). Most of them share the same assumption: they take the corpus as a data sample and project its observed behaviour onto a population or, in this case, the whole of a language. Therefore, frequencies in a corpus could be seen as nominally scaled data, which function as a sample for testing the hypothesis that ‘[i]tem A and item B co-occur by chance’ ($H_0$).³ The basic method is then quite simple: frequency counts of single as well as co-occurring instances of item A and item B within a corpus are compared to expected frequencies, under the assumption that these items would occur by chance. This makes all measures in this chapter hypothesis-testing studies, testing the independence of two variables.

³ $H_0$, also known as the null hypothesis, is used as a kind of benchmark in hypothesis testing. Here, parameters of the respective hypothesis are set before statistical measures are used to test for the (non-)conformity of a sample. Depending on the result, $H_0$ is then confirmed or rejected (for a more detailed introduction compare Bortz / Schuster ⁷2010: 97-116).

For a better understanding, these observations of frequencies, trials, and their outcomes are often visualised in contingency tables (table 5.2), which contain the criteria for a decision ($w_1$, $w_2$)⁴ and the observations being made ($O$). In this case, $O_{11}$ refers to the times when criterion $w_1$ and criterion $w_2$ were both observed, while $O_{12}$ and $O_{21}$ stand for trials which only showed either criterion $w_1$ or criterion $w_2$. $O_{22}$ then refers to the absence of both $w_1$ and $w_2$ within a trial. Statistical calculations are then used to compare these observations with the frequencies which are to be expected if these criteria occurred and co-occurred on a random basis (table 5.2). Here again, $E_{11}$ represents the expected random co-occurrence of both items, while $E_{12}$ and $E_{21}$ are used for the expected occurrence of only $w_1$ or only $w_2$ respectively, with $E_{22}$ referring to the expected frequency of a scenario where neither $w_1$ nor $w_2$ occurs.

Observed frequencies:

| | $w_2$ | $\neg w_2$ | |
|---|---|---|---|
| $w_1$ | $O_{11}$ | $O_{12}$ | $= R_1$ |
| $\neg w_1$ | $O_{21}$ | $O_{22}$ | $= R_2$ |
| | $= C_1$ | $= C_2$ | $= N$ |

Expected frequencies:

| | $w_2$ | $\neg w_2$ |
|---|---|---|
| $w_1$ | $E_{11} = \frac{R_1 C_1}{N}$ | $E_{12} = \frac{R_1 C_2}{N}$ |
| $\neg w_1$ | $E_{21} = \frac{R_2 C_1}{N}$ | $E_{22} = \frac{R_2 C_2}{N}$ |

Table 5.2: Basic contingency table for observed and expected frequencies (based on Evert 2009: 1231)

This almost suggests that testing collocational pairs for statistical significance is an uncontroversial, straightforward undertaking, but the means of measuring statistical significance are as diverse and multi-layered as collocations and language themselves.
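The relation between the observed and the expected cells of table 5.2 can be illustrated with a minimal Python sketch. The counts used here are invented toy values rather than BNC data, and the function name `expected_frequencies` is purely illustrative.

```python
def expected_frequencies(o11, o12, o21, o22):
    """Expected cell frequencies under independence (notation of table 5.2)."""
    # Row totals: occurrences of w1 and of not-w1
    r1, r2 = o11 + o12, o21 + o22
    # Column totals: occurrences of w2 and of not-w2
    c1, c2 = o11 + o21, o12 + o22
    # Sample size N
    n = r1 + r2
    return {
        "E11": r1 * c1 / n,
        "E12": r1 * c2 / n,
        "E21": r2 * c1 / n,
        "E22": r2 * c2 / n,
    }


# Hypothetical counts: w1 and w2 co-occur 8 times in 1,000 observations.
toy_expected = expected_frequencies(8, 12, 42, 938)
# toy_expected["E11"] is 1.0: one co-occurrence would be expected by chance.
```

With these toy values the observed co-occurrence frequency (8) is eight times the expected one (1.0), which is the kind of discrepancy the association measures discussed in this chapter are designed to quantify.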
This is partly due to the fact that, even though most share the same $H_0$, the underlying assumptions of each measure differ - at times quite dramatically - and which association measure to choose for a corpus-based study on collocations is often a matter of choice, taste or even conviction. Over the past decades, computational linguists like Bartsch (2004), Evert (2009, 2005) or Manning and Schütze (1999) have provided useful overviews of and introductions to this topic. The following pages do not seek to repeat their work and re-list all the measures which they have already described and discussed far more exhaustively. Rather, they serve to contrast four of the best known and most frequently used association measures for collocational strength, in order to discern which might be most appropriate for the purpose of this study and to make clear where each measure’s shortcomings and benefits lie. These measures will be Mutual Information (MI), z-score, and t-score, as well as log-likelihood (> 5.1.1). The second part of this chapter (> 5.1.2) will then discuss more recent, explicitly cognition-based approaches within corpus research as suggested by Stefanowitsch and Gries (2003), Lüdeling and Bosch (2003), Zeldes (2012) and Wulff (2010).

⁴ The notation for these contingency tables as well as the subsequent formulae is the same as used by Evert (2009: 1231).

5.1.1 Traditional Association Measures

In mathematical terms, one of the simplest ways to express a relation between two entities is to calculate their quotient. Dividing the observed frequency ($O_{11}$) of a collocation by its expected frequency ($E_{11}$) would therefore serve as a logical starting point for a linguistic association measure. However, as Evert (2009: 1226) points out, this could result in extremely high values once $E_{11}$ is much smaller than $O_{11}$, which, in a corpus, can often be the case.
Therefore, Church and Hanks (1990: 23) suggested measuring collocability on a logarithmic scale called pointwise Mutual Information (MI). MI is part of a family of measures from information theory. Hence, its results are given in bits. If the expected frequency of a collocation matches its observed frequency, MI has a value of 0 bits, indicating pure chance. For absolute values between 0 and 3, a co-occurrence by chance is said to be still very likely, while an absolute value of 3 or more is usually seen as an indicator for rejecting the null hypothesis (McEnery / Xiao / Tono 2006: 56). Since it is mapped on a logarithmic scale, MI is an asymmetric test; it can also produce negative results which translate into “no attraction” or even “repulsion”. This makes MI a relatively easy measure to interpret, since a higher MI value, quite logically, indicates a higher attraction, which, unlike the result of a symmetrical measure, cannot be mistaken for repulsion.

$$\mathrm{MI} = \log_2 \frac{O_{11}}{E_{11}}$$ (Evert 2009: 1226)

Unlike the other measures in this chapter, MI is a measure of effect size. Unlike measures of significance, which only allow inferences about the certainty with which the null hypothesis can be rejected, it gives information about the degree of attraction of the items under investigation. Initially a calculation used in information theory, MI expresses the information content of two events. It shows the degree of information item A ($w_1$) gives about item B ($w_2$) in order to examine to what extent the informational content of B can be reduced while still maintaining the same degree of information (Evert 2005: 89). Sampling variation, for example across different parts of a corpus, is, however, not taken into consideration. Furthermore, researchers have often warned that MI shows a low-frequency bias, resulting in quite unique, rather low-frequency collocates at the top of an MI-sorted collocate list (Barnbrook / Mason / Krishnamurthy 2013: 67-68; Evert 2009: 1227).
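The MI formula and its low-frequency bias can be made concrete with a short Python sketch. All frequencies below are invented for illustration, and `pointwise_mi` is a hypothetical helper name rather than part of any established toolkit.

```python
import math


def pointwise_mi(o11, e11):
    """MI = log2(O11 / E11), in bits; both values must be positive."""
    if o11 <= 0 or e11 <= 0:
        raise ValueError("MI is only defined for positive frequencies")
    return math.log2(o11 / e11)


# Observed equals expected: 0 bits, i.e. co-occurrence at chance level.
chance_level = pointwise_mi(1, 1)

# Observed eight times the expected frequency: 3 bits.
attraction = pointwise_mi(8, 1.0)

# Observed a quarter of the expected frequency: negative value ("repulsion").
repulsion = pointwise_mi(1, 4)

# Low-frequency bias: a single co-occurrence with a tiny expected frequency
# (hypothetical hapax collocate) outranks a frequent, well-attested pair.
rare_pair = pointwise_mi(1, 0.001)
frequent_pair = pointwise_mi(200, 10)
```

In this toy comparison `rare_pair` exceeds `frequent_pair`, although the rare pair rests on a single observation, which is why a cut-off frequency for potential collocates is often advisable.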
Table 5.3 shows the top ten⁵ BNC verb collocates for crime according to MI, within a span of ± 4 and without any restriction for frequency. Even though this measure yields some collocates which can be found in most dictionaries, like commit or perpetrate, the results are predominantly very rare or even ad-hoc produced instances of verb collocates for crime. In general, many collocates which are calculated based on MI occur only once in the BNC in the first place, which is why, in most cases, it would be advisable to introduce a cut-off frequency for potential collocates. Considering its information theory history, this tendency is, however, unsurprising, since words which only occur once within a corpus (hapax legomena) exhibit the strongest - namely exclusive - connection to a node. This again leads to the question whether a collocational pair with a high MI is just a phenomenon of not enough corpus data (since if there were more instances of a lemma within a corpus, it might show a less exclusive distribution) or whether, if a high MI remains even with more than one corpus-wide hit, this combination should instead be considered a compound, a multi-word lexeme or at least a combination with highly repetitive informational content (pleonasm). Interestingly, Ellis and colleagues (2008) have shown that prefabs which native speakers identify more rapidly - maybe because they are stored as a whole unit - also obtain a relatively high MI, while the prefabs most available to non-native speakers correspond to a high raw frequency for the item. However, it remains to be seen whether this is also true for a node and its collocates.

| | MI | z-score | t-score | log-likelihood |
|---|---|---|---|---|
| 1 | abett | commit | commit | commit |
| 2 | under-emphasize | combat | rise | combat |
| 3 | de-politicise | convict | report | rise |
| 4 | expiate | tackle | investigate | convict |
| 5 | suborn | investigate | prevent | investigate |
| 6 | commit | rise | combat | tackle |
| 7 | incite | victimize | convict | report |
| 8 | combat | garot | tackle | prevent |
| 9 | incite | accuse | accuse | accuse |
| 10 | perpetrate | deter | be | punish |

Table 5.3: Top ten verb collocates for the lemma crime according to MI, z-score, t-score and log-likelihood (BNC)

⁵ The respective lists have been checked manually.

Z-score values (Dennis 1965; Berry-Rogghe 1973), on the other hand, are measures of significance, which means that they provide numerical cut-off points for rejecting $H_0$ but no further information about the degree of attraction. They are based on the assumption that language provides normally distributed data and thus calculate the distance of a certain value from the mean (Oakes 1998: 7-8). Typical thresholds for word pairs are |z| > 1.96 or |z| > 3.29 (Evert 2009: 1227).

$$z\text{-score} = \frac{O_{11} - E_{11}}{\sqrt{E_{11}}}$$

As table 5.3 shows, z-scores are less likely to favour very rare lexemes within a corpus but, just like MI, they still show a low-frequency bias. Furthermore, Evert (2009: 1227) warns that, if applied to measure the association of potential collocates, most bigrams score over the threshold of 1.96.⁶ Thus, these cut-off points prove to be highly problematic. For compiling rankings as in table 5.3, z-scores could still provide suitable lists, for example for corpus-internal comparison.

Another fairly established way of testing the association of two variables is the t-test. T-tests have been part of statistical hypothesis testing for more than 100 years. Originally employed to test small samples of barley or stout for quality management at the expanding Guinness Brewery, t-tests today are widely used to measure statistical differences of arithmetical means from two or more groups (Zabell 2008; Student 1908). Collocational studies adapted this approach in order to contrast the observed and expected frequency of a collocate (Manning / Schütze 1999: 163-166; Church / Gale / Hanks / Hindle 1991: 122-133).
Yet, as Evert (2005: 82-83) points out, these changes result in a quite considerable deviation from Student’s original t-test, which, from a mathematical point of view, makes the t-score a “heuristics variant of z-score” (Evert 2005: 83) rather than part of the t-test family.

⁶ In a study on the Brown Corpus, Evert (2009: 1227) found that 70 % of all “distinct word bigrams” show a z-score of over 3.29 and 80 % range above 1.96.

$$t\text{-score} = \frac{O_{11} - E_{11}}{\sqrt{O_{11}}}$$

Like z-scores, t-scores are a measure of significance, answering the question of the degree of confidence with which the null hypothesis can be rejected (Evert 2009: 1227). Accordingly, at a level of p = 0.005⁷, a t-score of 2.576 or higher is generally assumed to be statistical⁸, which indicates that it is less likely that the two collocates under investigation co-occurred by chance (Manning / Schütze 1999: 164). As opposed to z-score and MI, this measure shows a bias towards high-frequency combinations (high-frequency bias). Hence, consulting at least two measures for collocational analysis has repeatedly been suggested, since they represent different perspectives on a lexical item’s collocational behaviour (Barnbrook / Mason / Krishnamurthy 2013: 69; McEnery / Xiao / Tono 2006: 57).

MI, z-score, and t-score are often referred to as simple association measures, because they only make use of $E_{11}$ and $O_{11}$ within the contingency table. Thus, they focus on the mere co-occurrence of the items under investigation but, unlike contingency-based association measures, do not make use of the relation between these potential collocations and other, similarly likely combinations of one collocate with other items. Testing the null hypothesis on the basis of the complete contingency table, on the other hand, not only takes occurrences of the item under investigation into account but also instances where only one or neither of the two variables applies.
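The opposite frequency biases of z-score and t-score follow directly from their denominators, as the following Python sketch suggests. The two frequency pairs are hypothetical values chosen for illustration only, and the function names are not taken from any particular library.

```python
import math


def z_score(o11, e11):
    """Simple z-score: deviation from E11 scaled by sqrt(E11)."""
    return (o11 - e11) / math.sqrt(e11)


def t_score(o11, e11):
    """Simple t-score: deviation from E11 scaled by sqrt(O11)."""
    return (o11 - e11) / math.sqrt(o11)


# Hypothetical rare pair: two co-occurrences, almost none expected.
rare = (2, 0.01)
# Hypothetical frequent pair: 400 co-occurrences, 200 expected.
frequent = (400, 200)

z_rare, z_frequent = z_score(*rare), z_score(*frequent)
t_rare, t_frequent = t_score(*rare), t_score(*frequent)

# z-score ranks the rare pair higher (low-frequency bias),
# t-score ranks the frequent pair higher (high-frequency bias).
z_prefers_rare = z_rare > z_frequent
t_prefers_frequent = t_frequent > t_rare
```

Because the z-score divides by the square root of a very small expected frequency, the rare pair receives an inflated value, whereas the t-score divides by the square root of the observed frequency and therefore cannot exceed the square root of $O_{11}$.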
In the case of collocations, this means that a contingency-based measure not only compares the observed and expected frequency of the co-occurrence of node and collocate; it also uses the frequency with which the node occurs without a certain collocate and vice versa (for example, all instances of crime without commit as a collocate), as well as the frequency of word combinations which contain neither crime nor commit. Therefore, contingency-based approaches stand on mathematically firmer ground, since they partly substitute estimates with actual data. On the other hand, this is what makes these statistical measures to some degree more difficult to calculate.

⁷ This value indicates the probability (p) of an error within the respective measure. Therefore, p = 0.005 means that $H_0$ can be rejected with 99.5 % confidence.

⁸ Kline (2004: 86-89) argues quite convincingly that the label “significant” should be reserved “to describe something actually noteworthy or important” (Kline 2004: 86) and thus prefers to use only “statistical” in data analysis. The study at hand follows this suggestion and will refer to any rejection of $H_0$ as a “statistical result” rather than a “statistically significant result”.

At the same time, by including instances which are defined by the absence of an observation, these more complex association measures can account for asymmetric distribution between variables. In the example of commit + crime, a simple association measure, like MI or t-score, refers to the cases where both variables are present, that is, sentences which contain the noun crime which collocates with the verb commit and vice versa. The result would be the same, independent of whether crime or commit serves as the node.
Contingency-based measures, however, distinguish between both scenarios, thus providing information about the degree of interdependence of both variables, since, despite their co-occurrence, it could well be that one item almost exclusively triggers another, while the second variable is less dependent. In his very comprehensive overviews, Evert (2009, 2005) lists many contingency-based association measures but concludes:

The log-likelihood measure ($G^2$) and to some extent also simple-ll ($G^2_{\text{simple}}$) give an excellent approximation to Fisher’s test, as all data points are close to the diagonal. Chi-squared and z-score overestimate significance drastically (points far above diagonal), while t-score underestimates significance to a similar degree (points far below diagonal). (Evert 2009: 1237)

Therefore, unlike for example the chi-squared test⁹, a test based on log-likelihood (Dunning 1993) is not limited by quantitative restrictions on the data. It can be performed for small samples and is not dependent on corpus size (Oakes 1998: 174). Furthermore, it does not assume that the distribution of a lexical item within a language is based on a classical, bell-shaped normal distribution of the data. This is, in fact, a more realistic assumption, since, as Zipf (1949) observed, the distribution of words in English instead follows a linear distribution on a log-log graph: the most frequent word occurs twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.

$$\text{log-likelihood} = 2 \sum_{ij} O_{ij} \log \frac{O_{ij}}{E_{ij}}$$

Log-likelihood measures allow conclusions not only about the degree of certainty with which the null hypothesis can be rejected, but also about whether this means that the node and collocate attract or repel each other. Therefore, today they are one of the most widely used measures of lexical association. Furthermore, as has already been mentioned, they yield results similar to one of the statistically most accurate association measures, the Fisher-exact test¹⁰ (Evert 2009: 1237). However, as Stefanowitsch and Gries (2003: 217-218) point out, log-likelihood is still based on chi-square distribution, which might again make it rather unsuitable for less frequent occurrences of the phenomenon under investigation.

⁹ A chi-squared test for significance is only applicable if each value is at least five. For a study on more creative and hence less frequent alternations of established collocations, this measure cannot be used for all co-occurrences under investigation. Furthermore, the chi-squared test is a two-sided test, so initially, it can only help to make statements about the degree to which the null hypothesis can be rejected but not whether a potential relationship between two items is based on repulsion or association (cf. Manning / Schütze 1999: 172-176).

5.1.2 Corpus Data in Cognitive Linguistic Research

Corpus data plays a central role in cognitive linguistic research as well, predominantly in research concerned with data mining in the shape of the definition and description of partly or completely delexicalised constructions (Hilpert 2008; Stefanowitsch / Gries 2005, 2004, 2003). But there are also first approaches which explicitly take into account more dynamic aspects of language like gradient idiomaticity (Wulff 2010) or the productivity of constructional arguments (Zeldes 2012; Lüdeling / Bosch 2003). Stefanowitsch and Gries’ collostructional analysis (2003) is one of the prominent examples of corpus-based research in construction grammar. It combines theoretical, constructional considerations and corpus linguistic methodology. While their approach is still based on corpus linguistics, Stefanowitsch and Gries argue that traditional association measures (> 5.1.1) lack a consistent commitment to the constructional character of language.
As has been mentioned before, this results in calculations which take any instance of a potential collocate as a hit whether or not they occur in the same phrase, construction or even clause. Thus, Stefanowitsch and Gries suggest correlating a lexeme with the construction it occurs in, instead of two individual lexemes. In the case of collocations, this would imply that the strength of a potential association is measured based on a lexeme (≙ $w_1$), for example mistake, and a construction like [commit + NP] (≙ $w_2$). In analogy to collocations, these co-occurrences of a word and a construction are called collexemes.

¹⁰ For a comprehensive discussion of the Fisher-exact test (Fisher 1922) see Yates (1984).

When it comes to the choice of association measure, Stefanowitsch and Gries (2003: 218) too prefer the Fisher-exact test (Fisher 1922), since it seems the most precise association measure for the calculation of co-occurrences within a corpus. But, unlike Evert (2009: 1237), they argue that this test cannot be substituted by a less costly measure of log-likelihood, since this association measure is based on a chi-square distribution, which compares normal and Poisson distribution and is thus particularly useful for phenomena with a high number of occurrences (Stefanowitsch / Gries 2003: 217-218). This prerequisite might be met by most high-frequency phenomena but not, as they argue, by a study which wants to include rather rare co-occurrences. Therefore, the Fisher-exact test is used as the statistical basis for a collostructional analysis. This method enables the researcher not only to investigate the co-occurrence of lexemes per se but also to check whether they occur in the same structural environment.
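The two contingency-based measures contrasted here, log-likelihood and the Fisher-exact test, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the counts are invented, the natural logarithm is assumed for the log-likelihood formula, the Fisher-exact test is computed as a one-sided (attraction) test from the hypergeometric distribution, and all function names are hypothetical.

```python
from math import comb, log


def log_likelihood(o11, o12, o21, o22):
    """G2 = 2 * sum(O_ij * ln(O_ij / E_ij)); empty cells contribute 0."""
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    n = r1 + r2
    observed = (o11, o12, o21, o22)
    expected = (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def fisher_exact_attraction(o11, o12, o21, o22):
    """One-sided Fisher-exact p-value: probability of observing at least
    o11 co-occurrences if the row and column totals are held fixed."""
    r1 = o11 + o12
    c1, c2 = o11 + o21, o12 + o22
    n = c1 + c2
    # Hypergeometric tail: sum over all tables at least as extreme as observed.
    tail = sum(comb(c1, k) * comb(c2, r1 - k)
               for k in range(o11, min(r1, c1) + 1))
    return tail / comb(n, r1)


# Hypothetical lexeme/construction counts (not corpus data).
attracted = (8, 12, 42, 938)     # 8 observed co-occurrences, 1 expected
independent = (1, 19, 49, 931)   # observed equals expected in every cell

g2_attracted = log_likelihood(*attracted)
p_attracted = fisher_exact_attraction(*attracted)
g2_independent = log_likelihood(*independent)
p_independent = fisher_exact_attraction(*independent)
```

Because the Fisher-exact test sums exact probabilities instead of relying on an approximation to the chi-square distribution, it remains applicable to very small cell counts, which is the property Stefanowitsch and Gries emphasise for rare co-occurrences.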
In the case of [commit + NP] for example, a collostructional analysis would not simply yield a list of all co-occurrences within a set span, like crime or suicide, but rather narrow the scope, and focus on all potential collocates of commit, or also crime, depending on the perspective, in a [commit + NP] structure. While this seems a useful way to extract and operationalise collocations, this analysis has one major issue when it comes to its application: it needs the total number of constructions ($n$) within a corpus, since the calculation is based on contrasting instances of [commit + NP] ($w_1$) with constructions which are not [commit + NP] ($\neg w_1$). But the question is how these $\neg w_1$ constructions should be defined. An intuitive solution would, of course, be to see $n$ as the sum of all constructions in a corpus, yet, unlike lexemes, constructions can be found on different levels of abstraction (> 4.3.1). Thus, taking this perspective, the number of all constructions in a corpus would include all lexemes and semi-lexical meso-constructions, like [The Xer the Yer] (Goldberg 2006: 55), as well as argument structure constructions like ditransitive constructions.¹¹ Some authors, such as Booij (2010), would even include morphemes, as the smallest instance of form-function pairs. The fact that this would amount to a rather high number poses less of a problem here than the question of which semi-lexical constructions or ASCs to include in the first place. There have been attempts to create a comprehensive list of all constructions in a language, a so-called constructicon, but so far no such list or overview seems to be available.¹² Even with such a constructicon at hand, another question would be whether it makes sense to regard all constructions as the basis for an analysis of [NP + VP] collocations. Therefore, a second option could be to define $n$ as all constructions in a corpus which operate on the same level of abstraction.
Still, the task of identifying the right (that is, cognitively relevant) scope is challenging, since even on a narrower phrase level, potential candidates can range from all phrases to all phrasal combinations such as [VP + NP] or [NP + NP], to all [VP] with various accompanying elements, such as [VP + -ing clause].¹³ Bybee (2010: 98) also briefly comments on this problem. Stefanowitsch and Gries argue in favour of the latter option when it comes to constructions which operate on the level of a clause (2003: 218-219) to analyse the attraction between a noun like accident and [N waiting to happen]. This, of course, enables them to list potential [N]s sorted according to their attraction to the construction, but it does not help to identify any creative combinations, since the filler elements, such as accident or crime, have to be selected in advance. As a consequence, collostructional analysis¹⁴, like other non-constructionist association measures, can help to identify the associative link between two collocates or a collocate and its construction, but does not allow for any further inferences about the scope of a collocational construction’s potential meaning or semantic prosody. Also, further influencing factors, such as context or differences in the cognitive processing of individual recipients, cannot be investigated.

¹¹ For a list of potential constructions see for example Goldberg (2006: 5).

¹² First advances have been made in connection with the FrameNet Project at the University of California, Berkeley (Fillmore / Lee-Goldman / Rhodes 2012).

¹³ A verb like commit, for example, is listed with eleven different possible realisations of valency complements in the Valency Dictionary of English (VDE: commit), three of which refer to the reading of crime.

This problem remains for measures which explicitly focus on the productive potential of language.
Most of them stem from research on derivational morphology (for example Baayen 2009, 2001, 1992; Lüdeling / Evert 2003; Baayen / Lieber 1991; Aronoff 1976)¹⁵, but Lüdeling and Bosch (2003) also tried to apply this logic to collocations. Following Baayen’s (2001, 1992) reasoning, they suggest that the more variation a collocation allows, the more productive it might be. In their case study, they use the number of words which occur only once as collocates of a given node (hapax legomena) to calculate the rate at which creative combinations occur. A high number of hapax legomena would then also imply that these nodes are more prone to rare, more creative collocational alternations. Yet, as Zeldes (2012) concludes, even if productivity within a corpus might be measurable, this only states that variation is possible - a fact which is already part of most definitions of collocations (> 2) - but not which factors are responsible for productive variation in the first place (Zeldes 2012: 187-189). Therefore, he suggests a usage-based model, which, like the model in this study, is largely based on neurological implications drawn by Hebb (1949). His study does not go on to test these findings outside a corpus, but Zeldes (2012) emphasises the necessity of an investigation of productivity within a language attainment context:

As the data presented here has concentrated on a static view of adult-produced distributions, a developmental cognitive account could complement it by explaining how the way people learn language changes with time and how previously acquired knowledge builds up and affects further learning longitudinally. (Zeldes 2012: 242)

¹⁴ Other measures which analyse the relationship between potential collocates or lexical fillings of a construction are, for example, Biber’s (1993) Co-Occurrence Patterns or Behavioral Profiles (Gries 2010; Otani / Gries 2010; Berez / Gries 2009).

¹⁵ For a detailed discussion see Zeldes (2012: 48-95) or Bauer (2001).
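The hapax-based reasoning described above can be illustrated with a small Python sketch. It assumes one common formulation from the morphological productivity literature, namely the ratio of hapax legomena to all collocate tokens of a node; whether Lüdeling and Bosch (2003) use exactly this ratio is not stated here, so it should be read as an assumption. The collocate list is invented and `hapax_rate` is a hypothetical name.

```python
from collections import Counter


def hapax_rate(collocate_tokens):
    """Share of collocate tokens whose type occurs exactly once with the node."""
    counts = Counter(collocate_tokens)
    n_tokens = sum(counts.values())
    if n_tokens == 0:
        raise ValueError("at least one collocate token is required")
    n_hapaxes = sum(1 for freq in counts.values() if freq == 1)
    return n_hapaxes / n_tokens


# Hypothetical verb collocates observed with the node "crime".
observed_collocates = ["commit"] * 6 + ["combat", "combat", "expiate", "suborn"]

# Two of the ten tokens ("expiate", "suborn") are hapax legomena.
rate = hapax_rate(observed_collocates)
```

A higher rate would, on this reading, point to a node that is more open to rare and potentially creative combinations, but, as Zeldes notes, it says nothing about the factors behind that variation.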
Wulff (2010) too calls for a combination of corpus linguistic findings and experimental data. In her study, she concerns herself with the gradient cline between idiomaticity and flexibility. But, unlike Zeldes, she models her factors around the evaluations of 39 academically trained native speakers of English. She thus identifies different intralinguistic parameters such as tree-syntactic flexibility 16, lexico-syntactic aspects, morphological flexibility, compositionality, and corpus frequency. In doing so, however, she does not take into consideration that the native speakers themselves, as well as the way in which the items were presented 17, might also be factors worth considering. Thus, taking into account experimental data seems to be necessary in order to form a more comprehensive understanding of potentially influencing factors for constructions in general and collocations in particular, since indices of productivity or idiomaticity might show that different items operate on a respective spectrum. Yet the methods presented here do not include information about the source or contextual setting of constructions in general or collocations in particular.

5.2 Offline Perception Tasks - Experimental Data in Usage-based and Constructionist Studies

As mentioned before, offline perception data is needed to analyse potential influencing factors of collocations and their creative alternations. This chapter will present methods of offline data elicitation which were designed with a clear commitment to a cognitive dimension of language in mind. In early studies such as Fillmore and Kay (1988), Goldberg (1992) or Morgan (1997), this often meant that research mainly drew on the introspective evaluations of its authors. It has, however, been mentioned earlier (> 1, 4) that this is problematic for two reasons.
First, native speaker proficiency as one uniform entity might not exist after all (Dąbrowska 2012; Pakulak / Neville 2010; Dąbrowska / Street 2006; Widdowson 2000); second, language experts such as linguists might be in danger of approaching language and its systematicity from a slightly different perspective compared to linguistically untrained speakers (Dąbrowska 2010; Spencer 1973). Hence, more recent research uses methods from neighbouring (cognitive) disciplines such as psychology or neuroscience to form a more reliable methodological basis for its analyses. For the study at hand, it also implies that the chosen methodology needs to be able to address the general level of proficiency of different groups of test takers when it comes to collocations, as well as their opinion on collocations and their creative variations in different contexts.

16 In Wulff’s (2010) analysis, “tree-syntactic flexibility” comprises different versions of syntactic structures in which the item under investigation might occur, such as “declarative active”, “declarative passive” or “relative cl. passive”. Aspects like “attr. adjective for NP” or the use of a “time adverbial”, for example, are part of “lexico-syntactic aspects”, while morphological flexibility includes the potential use of affixes describing “person”, “tense”, “aspect”, etc. (Wulff 2010: Appendix F)
17 The questionnaire contained 39 sentences which mostly consisted, apart from the item under investigation, of lexically more or less empty words such as pronouns.
In general, offline perception tasks fall into three subcategories: matching tasks, which ask participants to combine potential collocates from a given list of candidates (for example Granger 1998: 152-154); multiple-choice tasks, which give the participant several collocational combinations to choose from (for example Eyckmans 2009; Gyllstad 2007: 178-199, COLLEX 5); and judgement tasks, which ask test takers to evaluate the acceptability of a collocation (for example Siyanova / Schmitt 2008: 439-452; Leśniewska / Witalisz 2007: 34-38; Gyllstad 2007: 178-199, COLLMATCH 3; Jaén 2007; Bonk 2001). To test participants’ general collocational proficiency, all three categories should provide suitable data. Yet, there are not many comprehensive, validated tests for collocational proficiency to choose from. Eyckmans’ (2009) Discriminating Collocations Test (DISCO) is one of them. This test consists of 50 test items which present the participant with three VP+NP combinations, two of which are used in the target language, while the third is not. In order to score, a test taker needs to identify both established collocations correctly. Gyllstad (2007) suggests a combination of two tests, CollLex and CollMatch. CollLex is in fact quite similar to DISCO. It also comes in the shape of a multiple-choice task, with three options within an item. However, here the three options of each of the 50 items share one collocate, and participants should identify the most likely option. CollMatch, on the other hand, is a simple judgement task, containing 100 collocations and pseudo-collocations which can either be accepted or rejected. At first glance, the DISCO test might seem the most suitable for the task at hand, since its design accounts for the comparison of three different combinations. But this test is designed to measure a participant’s ability to distinguish between collocations and free combinations.
Therefore, the original test contains three combinations, which look like three more or less unrelated items. In order to compare collocations and their creative alternations, combinations within an item would, however, look fairly similar. Namely, they would all share one collocate, as in CollLex. If test takers are then allowed to choose two out of three options, this might yield the least accepted combination within a set, but it would not be possible to tell which of the other two combinations is in fact the preferred phrasing or if they might even be considered to be equally possible. Furthermore, to alter the test items of either of these tests would mean that they are in danger of losing their predictive potential, since different items might result in a drop in validity. Therefore, a judgement task seems the more suitable test for studying the collocational proficiency of different groups of L1 and L2 speakers (> 5.4.1.1). This is even more the case for any analysis which seeks to investigate the effect of ‘context’ on the acceptance of creative variations within phraseological phenomena. In a matching task or a multiple-choice test, creative alternations of collocations are very likely to lose out against their more established counterparts, so that, in a direct comparison between more established and creative combinations, creative variations are likely to be rejected and thus not evaluated at all. To find out whether contextual factors influence the evaluation of creative alternations, it might therefore be more appropriate to present creative versions of collocations alongside their established counterparts in judgement ratings with varying levels of accompanying context. This would also make it possible to control the test setting carefully, so that the two variables ‘creativity’ and ‘context’ are the only two to change.
So far there is, however, no test which incorporates these considerations, which is why the present study uses a second task which was designed to meet the needs of this analysis (> 5.4.1.2).

5.3 Methodological Limitations and Shortcomings

The last two chapters presented several methods of data elicitation, focusing on online production as well as offline perception methodologies, which were identified as being most suitable for the study at hand. Yet, as already mentioned throughout sections 5.1 and 5.2, there are limitations to each of these tools and tasks. This study will try to limit these effects by applying a toolkit of several methods to answer the underlying research questions. Therefore, the following pages will outline the major limitations of the two main methods discussed in chapters 5.1 and 5.2: corpus data (> 5.3.1) and judgement tasks (> 5.3.2).

5.3.1 Corpus Data

As chapter 5.1 has already demonstrated, even association measures with a high level of statistical accuracy, like log-likelihood, focus by definition on established combinations. They are thus only ex negativo able to account for more creative use of language, since these combinations occur less often and thus fall, for example, under a certain threshold. As a consequence, the observed frequency then probably lies below the level of expected co-occurrence, and creative variants of collocations are likely to have rather low values for most association measures, such as z-score, t-score or log-likelihood. MI, on the other hand, might be the notable exception since this measure favours rare combinations. However, this is only true for collocates which in a corpus are rare or even unique per se. Lexemes which occur more frequently throughout a corpus but only occasionally in a certain collocational combination are not accounted for by MI either.
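The contrast between MI and the other measures can be illustrated with a small Python sketch. It uses common textbook formulations (MI as the binary logarithm of observed over expected frequency, t-score and z-score as the difference between observed and expected frequency scaled by the square root of the observed or expected value, and log-likelihood over the full 2x2 contingency table). These are assumptions for illustration and may differ in detail, for example in span adjustments, from the variants presented in chapter 5.1. All frequencies are invented.

```python
import math


def association_scores(o11, f_node, f_coll, n):
    """Textbook association scores for one node-collocate pair.

    o11    = observed co-occurrence frequency (must be > 0)
    f_node = corpus frequency of the node
    f_coll = corpus frequency of the collocate
    n      = corpus size
    """
    e11 = f_node * f_coll / n                      # expected under independence
    mi = math.log2(o11 / e11)
    t = (o11 - e11) / math.sqrt(o11)
    z = (o11 - e11) / math.sqrt(e11)

    # Remaining cells of the 2x2 contingency table.
    observed = [o11, f_node - o11, f_coll - o11, n - f_node - f_coll + o11]
    rows = [f_node, n - f_node]
    cols = [f_coll, n - f_coll]
    expected = [rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n]
    # Cells with an observed frequency of 0 contribute nothing to the sum.
    ll = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return {"mi": mi, "t": t, "z": z, "ll": ll}


CORPUS_SIZE = 1_000_000  # invented
# An established pair: both words frequent, many co-occurrences.
established = association_scores(200, 1000, 2000, CORPUS_SIZE)
# A collocate that is unique in the corpus and occurs once next to the node.
rare = association_scores(1, 1000, 1, CORPUS_SIZE)
# A frequent lexeme that occurs only once next to the node.
occasional = association_scores(1, 1000, 2000, CORPUS_SIZE)

for label, scores in [("established", established), ("rare", rare), ("occasional", occasional)]:
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

In this toy setting the unique collocate receives the highest MI value but a low t-score and log-likelihood, while the frequent lexeme in a single occasional combination scores low on every measure, including MI.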
Thus, if creative variations of collocations are likely to yield low values for most association measures, one way to identify these combinations might be to simply look at the bottom of a ranking of collocates for any given node, for example, sorted according to their log-likelihood values. Such a list, however, would not only produce potential candidates for an analysis of creative variations of a collocational combination. Since a low value of any association measure essentially means that there is not enough evidence to reject H0, lexemes which only occur in the vicinity of a node by chance would also be part of this list. As has already been demonstrated in chapter 2, this shows, once again, that low frequency of co-occurrence is not the only criterion for a creative alternation of a collocation. The fact that these combinations are apparently interpreted against the background of an established collocation, like pretty boy and pretty girl, is important as well. Thus, it seems that, in order to extract collocations as well as their creative variations from a corpus, statistical extrapolation is not enough. Once extracted, the measures presented in 5.2.1 can help to distinguish between more established and more creative combinations, but they cannot be used as a tool to identify a list of potential candidates in the first place. Another aspect which, as the previous chapters indicated, proves difficult for most association measures is the set of theoretical assumptions about language and linguistic structures made by most calculations. One of the most frequent misconceptions is to see language as normally distributed data. But, as Zipf’s Law (Zipf 1949) has already observed, the distribution of lexemes in a language in general and a corpus in particular takes a linear distribution in a log-log graph, with some items occurring with high frequency, while the remaining vocabulary decreases gradually.
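What a linear distribution in a log-log graph amounts to can be sketched in a few lines of Python. The example fits a least-squares line to the logarithm of frequency against the logarithm of rank. The frequency list is an invented, idealised distribution in which frequency is inversely proportional to rank; real corpus counts would only approximate this pattern, and the function name is hypothetical.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Expects at least two positive frequencies; ranks are assigned after
    sorting the frequencies in descending order.
    """
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Invented, idealised rank-frequency list: frequency = 1200 / rank.
ideal_zipf = [1200 / rank for rank in range(1, 51)]
slope = loglog_slope(ideal_zipf)
print(f"log-log slope: {slope:.3f}")
```

A slope close to -1 indicates a few very frequent items followed by a long tail of rare ones, which is far from the symmetric shape a normal distribution would predict.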
Connected to Zipf’s observation, a second aspect concerns the assumed independence of items. Many statistical calculations, including MI, t-score, z-score and log-likelihood, take an independent, random distribution of items as one of their prerequisites. In language, however, words are combinatorially restricted due to set structures of clauses or phrases. Furthermore, even the most statistically sound association measure does not automatically result in a more reliable list of candidates. Authentic results need to rely on authentic data as well. In most cases, this means that the corpus at hand should present a full and realistic picture of linguistic performance in general or, in the case of specialised corpora, within a certain field. Questions about the fundamental nature of a language, like statistically significant co-occurrences, are therefore best described with a corpus which could be seen as a representative sample of English. It follows that a reliable corpus needs to be well balanced with respect to genre, medium, and the authors’ socio-cultural characteristics 18. While most (corpus) linguists would agree on the fact that a corpus needs to be balanced, the actual parameters are often a source of debate (Hunston 2002: 28-32; Aston / Burnard 1998: 21-24). Unsurprisingly, therefore, corpus studies can only operate within the realm of their respective corpus. Despite their considerable size, ranging from 100 million (BNC) to more than 450 million (CoCa 19) words, even large corpora can never be fully balanced. Economic or financial constraints often result in a bias towards a certain genre - like a tendency to use publications by one of the project’s partners - or a specific mode - the BNC and CoCa contain at most about 10 % (BNC) to 20 % (CoCa) of spoken language.
So it could be argued that some corpus analyses indeed tell us more about the characteristics of newspaper articles than the language system as such. Furthermore, a corpus, despite its size, can hardly represent the whole of a human’s proficiency; it cannot predict how s/he might process novel utterances, nor how a linguistic system as such operates. Corpus queries can only support or reject a hypothesis but not provide information on their own. Apart from this, differences among speakers of a language make it highly unlikely that one single corpus can paint a representative picture of the linguistic input an individual receives on a daily basis. Therefore, even though they contain a large amount of data, corpora never present the whole of a language. Thus, the fact that an item or a collocational pair does not occur in the BNC does not imply that these are not acceptable words or phrases among native speakers of a language. Taking as an example the compilation of a database which mirrors a sample of authentic input for an adult native speaker, it is highly debatable which ratio literary texts and articles should make up, or if a ratio as little as 10 % for spoken language, as for example in the BNC, is representative. The datedness of texts within larger corpora especially could be seen as a further cause of less authentic corpus data. This includes the age of a corpus as such, but also the different types and styles of writing which might have been prevalent at the time of compilation. The BNC, for example, was created in the early 1990s, which not only means that the language it contains is over 20 years old but also that sub-genres like e-mails, internet articles or blogs did not play any major role at this time - unlike today, where these texts are ubiquitous in everyday life.

18 For a more detailed discussion compare Atkins and colleagues (Atkins / Clear / Ostler 1992), but also Hunston (2002) or Aston and Burnard (1998).
19 Corpus of Contemporary American English compiled by Davies and colleagues (2008) at Brigham Young University.

5.3.2 Judgement Tasks

The last paragraphs have shown that, while the size of a corpus might be able to make up for a certain level of imbalance when it comes to highly frequent phenomena and established lexical or grammatical patterns, it cannot remain the sole source of information for creative, non-normative language. Even methods such as collostructional analysis are not able to show any correlations between frequent and rare co-occurrences beyond a very general measurement of acceptability through statistical extrapolation. Furthermore, they cannot take into account other factors such as ‘age’ or, to a certain extent, ‘context’. This is, of course, possible with carefully designed questionnaires or tasks, but more experimental data, such as offline perception tasks, has its own limitations. Compared to corpus data, judgement data can only refer to a rather limited set of items, which makes it less comprehensive than most corpus-based studies. Moreover, in order to contrast different instances of a phenomenon which only differ with respect to one variable, test takers would need to judge items which look very similar. This entails the risk that participants might be able to discern the actual intention of the test and adapt their behaviour accordingly. Especially within items which would be altered by only one word (as would be the case for a comparison of collocations and their more creative counterparts), this might potentially skew a test taker’s judgement. Countermeasures can be taken either by introducing distractor items or through the distribution of an item’s different variants across different tests (Cowart 1997: 46-53).
Furthermore, the aim to keep all factors stable (ceteris paribus) while only altering one or two variables is not trivial either. Apart from the test as such, other elements include the instructions and the way in which they are presented, as well as the general setting (which includes general factors such as the room a test is taken in or temporal aspects, like the time of day). Also, participants themselves contribute potentially unwanted variables such as a lack of motivation or a tiring effect during the test. Some of these aspects can be controlled through prepared instructions and a predefined process which is then applied to every test session. The length of a test as well as the order the test items occur in - if they are flexible - can also counterbalance potential motivational issues. Yet, especially with younger test takers, there is always the danger of them losing interest in the task, which then means that data from these participants needs to be eliminated from the database. At the same time, this decreases the number of evaluations per item. Therefore, participant groups should have a certain size to ensure that the data obtained remains representative. In general, studies with a sample size of 30 or fewer are considered small (Gries 2013: 319-322; Larson-Hall 2012: 243-249; Albert / Marx 2010: 155-157). Thus, while the application of judgement tests seems the most suitable way to investigate the degree of influence of ‘creativity’ and ‘context’ on collocational combinations, the results of such a study should also be compared to corpus-based association measures to find out whether one of the measures scores close to the participants’ evaluations.
However, any attempt to approach mental representations of language from a linguistic point of view must be aware that most methods are only able to describe the output of a cognitive system and thus might be able to rule out models which are not able to explain a certain phenomenon, but can hardly make inferences about the actual, true nature of language and language processing (Sandra 1998; Croft 1998). As a consequence, a cognitive linguistic, usage-based study might want to utilise a combination of different methods to approach the object under investigation from different angles. If different methods then produce a similar outcome, the result stands on firmer methodological grounds 20.

20 This so-called mixed methods approach can often be found in applied linguistic research. As Labov points out, “[t]he most effective way in which convergence can be achieved is to approach a single problem with different methods, with complementary sources of error in each study” (Labov 1972: 118). Other studies on collocations which also use several methods to triangulate their research questions are, for example, Ellis, Simpson-Vlach and Maynard (2008) or Siyanova and Schmitt (2008).

5.4 Methodology of this Study

When analysing the relationship between normative and creative, non-normative language, corpus research provides a useful basis for telling the norm from creative cases. Yet in order to zoom in on the collocational items under investigation and compare established co-occurrences with their less frequent counterparts as well as different degrees of contextual influence, these items need to be evaluated ceteris paribus, that is, under comparable circumstances. Therefore, the study at hand consists of a questionnaire which should serve as the basis for analysis of a speaker’s proficiency as far as collocations are concerned, as well as his/her evaluation of normative and creative co-occurrences. In addition, a corpus-based analysis of selected items will be used to compare the limited database from the questionnaire against a larger collection of language. The questionnaire itself comprises four sections: a general test on collocational knowledge; a sentence judgement task, specially designed to find answers to RQ 3; a form to collect information on the participant’s (linguistic) background; and a set of distractor tasks (Appendix I). Furthermore, in order to investigate the implications the variable ‘age’ might have, this questionnaire was answered by test takers from different age groups and with different linguistic backgrounds. This pseudo-longitudinal character makes some tentative inferences about the development of collocational proficiency in L1 and L2 attainment possible. But it should be stressed that a full longitudinal study over five to ten years would, of course, yield more reliable results. Thus, this study was designed not only to test the DMCDC model as well as RQs 2a / b and 3 but also to find out whether the chosen methodology would be a suitable procedure for a more complex and costly longitudinal study. All methodological instruments used for this study will be outlined in chapter 5.4.1, while chapter 5.4.2 presents the different groups of participants as well as the setting and procedure for this part of the study.

5.4.1 Instruments

As has been indicated before, this study chose to use two separate tests to compare different groups of L1 and L2 speakers (> 5.4.1.1) as well as further influencing factors such as ‘creativity’ and ‘context’ (> 5.4.1.2). The main reason for this decision was that incorporating all variables into one questionnaire would have made it more difficult to separate them and their effects clearly. As an additional bonus, this also implies that there is one dedicated test each for RQ 2a / b and RQ 3.
Furthermore, the results of this last test will be compared to the association measures presented under 5.1, in order to see whether its findings are also mirrored in corpus-based data. Since these association measures have already been presented and discussed (> 5.1), this section will focus on the two perception tasks: CollMatch (> 5.4.1.1) and CollJudge (> 5.4.1.2).

5.4.1.1 CollMatch

CollMatch 21 (Gyllstad 2007) was chosen to provide a first structured overview of the collocational proficiency of different groups of native as well as non-native speakers of English (> 5.3.2) in order to find out how consistent different age groups of L1 and L2 speakers are with respect to their evaluation of (pseudo-)collocations. The test uses yes / no-evaluations to test receptive collocational knowledge. It consists of 100 verb-noun pairs which fall into 70 statistically significant collocations and 20 pseudo-collocations (Appendix I). The items were selected based on their z-score within a span of +3 from the verb at a cutoff point of 2.58, which translates into a 1 % chance of receiving a result which does not contradict H0 (Gyllstad 2007: 107; Oakes 1998: 8-9). CollMatch, like its sister-test CollLex, was designed with L2 proficiency testing in mind (Gyllstad 2007: 2-4). Therefore Gyllstad selected each item on the basis of the BNC as well as the JACET 8000 list 22 for basic vocabulary to ascertain that an item is not rejected because a test taker is not familiar with one or both collocates. The general aim of CollMatch is to generate interval data which can then be used as a kind of placement test. It is highly reliable with a Cronbach’s alpha of .82 (Gyllstad 2007: 237).

21 The test used here is, in fact, called CollMatch 3, since it is the third version of Gyllstad’s CollMatch test design. However, this is the only version used in this study, so it will simply be referred to as CollMatch.
Furthermore, the test correlates (Pearson r) well with both vocabulary size (r = .90) and vocabulary depth (r = .85-.90) tests 23 (Gyllstad 2013: 22-25, Gyllstad 2007: 208-211), which, despite Gyllstad’s own reluctance to call it a depth test (Gyllstad 2013: 22), at least means that students who perform well on CollMatch are also likely to have a high level of lexical proficiency. Thus, it comes as no surprise that native speakers of English with an academic background often show ceiling effects (Gyllstad 2007: 238). However, as has been argued before, it would be a mistake to assume that every native speaker can or must reach the same level of proficiency (Dąbrowska 2015, 2010; Dąbrowska / Street 2006). This implies that a test with close to ceiling results from academically educated native speakers could also be used as a benchmark test for native speakers who, because of social or age-related differences, might sit at a different level of proficiency. The fact that CollMatch - as a test essentially designed for non-native learners of English - deliberately uses a range of more basic lemmata further facilitates its application for younger age groups. Furthermore, since CollMatch’s test items are selected on a mainly statistical basis, this helps to operationalise the comparison and moves it away from a more subjective test format, which considers collocations based on a rather introspective selection process (> 2.2); this is particularly useful since, as mentioned before, often not even native speakers can be trusted when it comes to finding a variety of examples or applications for a specific phenomenon (amongst others Louw 2003: 1.1; Hunston 2002: 142; Stubbs 1995: 24).
Thus, even among native speakers, an overall correct evaluation of all items with a CollMatch score 24 of 100 is rare. Furthermore, in order to compare evaluations from different groups, item-related results were calculated as an overall acceptance score (%) within the respective group. Therefore, if an item had been accepted by all test takers within a group, it received an acceptance score of 100 %. Tests which showed a clear alternating pattern of answers were assumed to come from unmotivated participants. Thus, the questionnaires of these particular test takers were removed from the database.

22 The JACET list (Ishikawa / Uemura / Kaneda / Shimizu / Sugimori / Tono 2003) is a word list of 8000 items, initially designed for learners of English in Japan (for a detailed description compare Uemura / Ishikawa 2004).
23 Gyllstad (2007) uses Schmitt’s (2000) Vocabulary Levels Test (VLT) to test vocabulary size and Read’s (1993) Word Associates Test (WAT) for vocabulary depth.

5.4.1.2 CollJudge

CollJudge is a test which was specially designed to fit the purpose of this study. Like CollMatch it is also based on judgement tasks; however, it tests not just collocational combinations but whole sentences. It can be regarded as an accompanying test for CollMatch, since it takes more than half of its items from this more general test. Even its name, CollJudge, is created in analogy to CollMatch. CollJudge’s items comprise nine items from CollMatch 25 and a further six additional collocations. These were selected based on a pre-test of CollMatch and have been expanded with respect to paradigmatic (‘context’) as well as syntagmatic (‘creativity’) parameters. The underlying principle behind the selection of these fifteen test items is to test those contextual factors which were not included in CollMatch.
Thus, the focus of this analysis is to see whether these factors influence the perception of collocational combinations or even vary according to age, L1, or schooling. To ensure that all factors were tested as individual variables with an otherwise stable context, the items were presented in four different variations. Each variant was a grammatically correct sentence of English; the collocation was embedded either in a more complex sentence structure or in a simplified, semantically similar version. These simple and complex variants were then paired with either a more established or a more creative variation of one collocate. As a result, each test item consists of a set of four variants:

E / S (established / simple): the item under investigation is presented in its normative shape within a relatively short sentence.
E / C (established / complex): the item under investigation with the same shape as E / S but with a longer version of the sentence as context.
C / S (creative / simple): the item under investigation in an expression which shows a non-normative deviation and within the same sentence as in E / S.
C / C (creative / complex): the item under investigation in an expression which shows a non-normative deviation and within the same sentence as in E / C.

24 A participant-related CollMatch score is calculated by counting all collocations and pseudo-collocations which had been classified correctly by the respective test taker. For misclassified items no points were taken off, which translates into a spectrum of 0 to 100 possible points.
25 Since the majority of CollJudge’s items were selected based on first results from a pre-test with CollMatch, it seems necessary to discuss CollMatch’s results and implications (> 6) before a more detailed explanation of CollJudge’s items as well as reasons for their selection, which will then be part of chapter 7.

The task as such was then quite similar to CollMatch.
Participants were asked to evaluate the acceptability of each sentence on a four-point scale from “okay” to “not English” (box 5.1). A scale without a clear middle had been chosen deliberately to force test takers to give at least a tendency, even if they were generally undecided. Moreover, to prevent any associations with any kind of marking scheme 26, numbers were deliberately omitted. To make sure each participant understood the task at hand, a short introductory text with a model demonstration of the task was provided (Appendix I). Furthermore, participants had the chance to suggest a better, more English, wording for a sentence once they had judged it.

Box 5.1: Test item from CollJudge - creative / simple variant of cook the tea / meal

Presenting the participants with all variants from a set would, of course, have been highly problematic since this might point them towards the intended purpose of a test (> 5.3.2). However, a number of measures can be taken to prevent this effect (Cowart 1997: 46-53). The first was to include filler items - a very common practice throughout all kinds of tests. Thus, more than half of the sentences were in fact sentences which were not in any way related to the items under investigation. These filler items consisted of sentences which were either directly taken from Cowart (1997: 172-173), the OALD or the BNC. To allow for a control, some of these sentences were modified so they contained wrong or at least problematic aspects which were either of a phraseological or morpho-grammatical nature. Another, less common precaution was the distribution of an item’s four test sentences across four different variations of the questionnaire. Thus, every test taker evaluated the same filler sentences but only one of the four variants of an item.

26 In Germany marks stretch from 1, “very good”, to 6, “unsatisfactory”.
This resulted in four different sets of questionnaires, which each consisted of 15 test items and 21 filler items. Like the sentences from the sets, these fillers also varied with respect to length and context. Therefore, the scores for each variant of an item come from different evaluators. In terms of methodology in general and statistical analysis in particular, the test benefits from this design, since no influence by similar items is possible, and the results can be compared with a one-way Analysis of Variance ( ANOVA ) 27 . But, of course, this procedure has some theoretical implications which need further consideration. The underlying principle of splitting similar sentences, as was suggested by Cowart (1997: 79-84) almost 20 years ago, is that native speakers of the same group behave in a very similar way when asked to judge linguistic phenomena. Cowart even suggests that, under certain circumstances, it is enough to ask only one native speaker per variant to be able to obtain reliable results (Cowart 1997: 83). He bases his claims on a set of studies he conducted with a fairly small and a considerably larger group of students for which he found no significant differences between their evaluations. But, as it has been pointed out before, while educated, adult native speakers of English indeed seem to agree to a fair extent, this cannot be said for either young native speakers or people who learn English as a foreign language. Especially against the background of constructionist research, as well as within the framework of Complex Adaptive Systems, results from a small number of speakers should be treated with caution, since less experienced speakers in particular might show a considerable amount of variation. For this reason, each set of questionnaires was given to several test takers within each group to ensure a broader and more reliable range of answers. 
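The distribution of an item’s four variants across four questionnaire versions can be sketched as follows. The sketch assumes a simple rotation (a Latin-square-style assignment), which is only one way of achieving such a distribution; the chapter does not state which assignment scheme was actually used. Item and filler labels are placeholders, and the ordering of sentences within a questionnaire is not modelled.

```python
VARIANTS = ["E/S", "E/C", "C/S", "C/C"]


def build_versions(n_items=15, n_fillers=21):
    """Build one questionnaire version per variant by rotating the variants.

    Every version contains each test item exactly once (in one variant)
    plus the same set of filler sentences.
    """
    versions = []
    for shift in range(len(VARIANTS)):
        test_items = [
            (f"item{i + 1:02d}", VARIANTS[(i + shift) % len(VARIANTS)])
            for i in range(n_items)
        ]
        fillers = [(f"filler{j + 1:02d}", None) for j in range(n_fillers)]
        versions.append(test_items + fillers)
    return versions


versions = build_versions()
for number, version in enumerate(versions, start=1):
    variants_used = [variant for _, variant in version if variant is not None]
    print(f"version {number}: {len(version)} sentences, item01 as {variants_used[0]}")
```

With this rotation no test taker sees two variants of the same item, and every variant of every item appears in exactly one version, so that the scores for the four variants of an item come from different evaluators.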
However, since the tests were distributed randomly, and some tests had to be deleted from the sample due to unmotivated test takers, some data sets were evaluated by fewer people than others. For the adult group, this variation was not very high, but in some younger groups, it reduced an already small group of, for example, five participants to only four or three. Therefore, results from smaller groups in particular, like the German native-speaking teenagers, where, at times, there were fewer than 20 students in one class, should be treated with caution. The groups of English native speakers and adult learners of English, on the other hand, are large enough to form a more robust database.

27 The scale used for the evaluation of CollJudge’s items is a so-called Likert scale, which uses different points on a spectrum and has clearly defined endpoints (compare box 5.1). Strictly speaking, this type of elicitation method would yield discrete, non-continuous data. Using ANOVAs and z-transformations in the data analysis, however, presupposes continuous data. In the case of rating scales like the Likert scale, especially if the scale has clearly defined endpoints but no labels for the points in between, it could, however, be argued that these ratings are perceived like continuous, metric data (Schütze / Sprouse 2013: 33-34; Bortz / Schuster ⁷2010: 22-23).

To control the between-group variation in the four test sets, a one-way ANOVA across the participants’ CollMatch scores was used to check whether all ranged on a comparable level of collocational proficiency (> 7.1-2). Furthermore, delta-p (Ellis 2006: 11; Allen 1980) was calculated to see whether the level of constructional meaning or more context had a bigger effect on the participants’ evaluation (> 7.3). In order to be able to compare the participants’ evaluations in the first place, their individual scores were z-transformed 28 .
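A per-participant z-transformation can be sketched as follows. The sketch uses the conventional form (rating minus the participant's mean, divided by the participant's standard deviation); footnote 28 states the subtraction in the opposite order, which only reverses the sign of the result. The use of the sample standard deviation and the numeric coding of the four scale points as 1 to 4 are assumptions made for illustration.

```python
from statistics import mean, stdev


def z_transform(ratings):
    """Standardise one participant's ratings against his / her own mean and s. d."""
    m = mean(ratings)
    s = stdev(ratings)  # sample standard deviation (assumption)
    if s == 0:
        raise ValueError("identical ratings throughout: z-scores are undefined")
    return [(r - m) / s for r in ratings]


# Hypothetical raters: one uses the whole scale, one only the middle of it.
wide = z_transform([1, 4, 1, 4])
narrow = z_transform([2, 3, 2, 3])
print([round(z, 3) for z in wide])
print([round(z, 3) for z in narrow])
```

Both hypothetical raters end up with the same standardised values, although their raw ratings differ.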
This step was necessary, since individually scored data cannot automatically be regarded as comparable. For example, while one test taker uses the whole scale (1-4), another might be more cautious and stick to a certain part of the spectrum instead (for example 2 and 3). Both might nevertheless express the same evaluation, but this would not be evident, because what is a 4 to one might be a 3 to the other. Transforming these evaluations would, however, only make sense if the test included a sentence which was either so correct or so badly wrong that participants would, in fact, need to make use of the whole of their spectrum. Indeed, the sentence “Studying a the subject, we realise that Rob depended on people they being able think like.” in CollJudge (Appendix I) presents precisely such a case. There are a number of issues with this distractor, and therefore all participants would be expected to rate it as “not English”. If one, however, chose not to do so, it would be very likely that s / he would not use this extreme rating for any other item either. Here, z-transformations ensure that the most extreme evaluations of each participant lie on the same level and are thus comparable.

Furthermore, the data presented some rare cases where a test taker judged an item based on factors other than ‘creativity’ or ‘context’. The comments, which were collected for each of CollJudge’s items, made it possible to identify these evaluations and exclude them from the list of evaluated items.

Depending on the evaluations of each item’s variants, the data might then support the following claims: If only complex sentences are favoured over their simpler alternations, this might indicate that participants rely on the assumption that a more complex sentence is automatically correct.
Yet, if only one variation is accepted, independent of the sentence’s complexity, a collocation could be seen as more restricted and less open to extensions within a similar semantic spectrum. If, however, in a third scenario all variants receive the same degree of acceptance, this suggests that this combination is very flexible with respect to context as well as in its variability.

28 A z-transformation is calculated as follows: individual mean score minus the participant’s evaluation of an item, divided by the test taker’s individual standard deviation (s. d.). It might also be worth noting that z-transformations assume continuous data. But, as has been pointed out before, this is assumed anyway once an ANOVA is performed (Schütze / Sprouse 2013: 42-44; Bortz / Schuster ⁷2010: 35-36).

To further support a consistent interpretation of CollJudge, the participants in this test were the same as for CollMatch. In fact, the two tests were taken in one session to ensure that neither age nor any kind of experience or input skewed the data. However, since nine items occur in both tests, a distractor task was used to function as a cognitive barrier between CollMatch and CollJudge and to prevent participants from identifying reoccurring collocations (Appendix I). This task was deliberately chosen to be non-linguistic as well as easy enough for students at all levels to master, in order to make sure that every test taker engaged in the exercise (Schütze / Sprouse 2013: 39; Cowart 1997: 51-52). The participants were presented with ten sequences of three symbols as demonstrated in box 5.2. Each symbol represented a number. In addition to these sequences, they received a code which showed which number had been encoded in which symbol. The task was to decode all ten sequences into three-digit numbers. As in the CollMatch and CollJudge tests, students were also provided with a sample solution for one additional item.
Box 5.2: Code and example from the non-linguistic distractor task (Appendix I)

In the evaluation, only sequences with a total of three correctly translated digits were counted. Each correctly decoded sequence scored one point. If a participant scored fewer than eight points in this task, all of his / her tests were eliminated from the sample.

5.4.2 Participants and Procedure

A total of 510 participants took part in this study. The questionnaire was distributed to all students in a class or a course. All participants answered anonymously. ID numbers were used to record each participant’s group affiliation as well as the set type of the questionnaire. In order to obtain data from different age groups, the questionnaire was distributed to four different age groups (pseudo-longitudinal): teenagers around the ages of 11, 14 and 16, as well as young adults with an average age of 21. In total there were 110 British university students from the University of Hertfordshire (Hatfield) and 99 students from the German Friedrich-Alexander University (Erlangen); both groups had English as one of their core subjects. Furthermore, the German students had all passed a compulsory entrance test 29 prior to their studies. The groups of younger test takers consisted of 130 children from a local school in Hatfield as well as 114 children from three southern German secondary schools. In order to limit any influences from another first language, only data from participants with a clear English or German native-speaker background was used for this study. Thus, only 417 questionnaires remained. For the purpose of this study, a native speaker of English (L1) is defined as any participant with at least one English native-speaking parent who indicated that s / he actively uses English in private communication.
Similarly, every participant with at least one German native-speaking parent who indicated that s / he actively uses German in private communication was counted as a learner of English (L2). This information was obtained through the questionnaire, which contained general as well as linguistic background information (Appendix I). As a consequence, the participants’ distribution across the different age and language groups looks as follows:

                        adult           children   year 7 / 5*           year 9         year 11
average age (L1 / L2)   21.17 / 21.87   -          11.67 / 10.74         13.72 / 14.9   15.35 / 16.81
L1 speakers             86              130        47                    43             40
L2 speakers             87              114        21 / 19 / 15 / 18**   20             21

* for the group of L1 speakers the youngest group are students from year 7, while the youngest group of L2 speakers comes from year 5
** in order to contrast different classroom situations three additional classes were tested (> 6.5)

Table 5.4: Distribution of participants across languages and age groups

As table 5.4 shows, the two adult groups contain almost the same number of participants, while the groups of L1 children are at least twice as large as their non-native counterparts. This is due to the fact that the data was collected from all the students in a year in the United Kingdom, while in Germany it was only possible to test one class within a year. Despite this rather big difference, the L1 data from teenage participants was not scaled down, since the acceptability scores were calculated in percentages, and between L1 and L2 speakers only the resulting patterns but not the evaluation scores as such were compared.

29 The EFV-test consists of a C-test as well as several tasks within the bands listening comprehension, content, structure and reading comprehension. Each band is equally weighted (for further information compare: FAU Sprachenzentrum 2007).
Furthermore, while the older teenagers, as well as the adult participants, had spent roughly the same time in formal education, their ages differ due to differences in the schooling systems. The two youngest teenage groups do not correspond as far as year and age are concerned. There are two reasons for this choice: first, L2 children from year 5 were tested towards the end of the term, which means that, with an average age of 10.74, they are much closer in age to the youngest group of native speakers than the respective classes would suggest. A second reason lies within the German schooling system. Children usually start school at the age of six or seven and attend elementary school for the first four years; then, depending on their grades and academic ambition, they change to one of three different types of secondary school. The most academic option is the Gymnasium. Up to then, children’s exposure to English varies. Depending on the respective elementary school, a child experiences English lessons from year 1 or 3 onwards. These sessions are, however, often rather playful and based on spoken interaction, which means that when the children start learning English in year five, they make up a very heterogeneous group of learners. Therefore, it was necessary to wait until the end of their first year to ensure a more homogeneous linguistic background.

This heterogeneity will also be accounted for in this study. In order to see whether the type of schooling has an effect on EFL learners’ collocational proficiency, the results of children within different classroom settings will be contrasted as well (> 6.5). Since the DMCDC-model hypothesises that the kind of input a learner receives might play an important role in his / her development, and also because there seems to be a current trend towards multilingual and bilingual language learning, these settings are immersion based (IM). They will then be compared to regular language-as-a-subject classes (LS).
Since these programmes are still relatively rare or just about to start, most schools have only progressed as far as year five or six. Hence, in order to be able to compare bilingual and regular schooling, the youngest group of L2 participants were all recruited from the same year.

The distribution of male and female participants for the children’s data is quite even, but within the adult groups, only 12 L1 and 29 L2 participants were male. For the study at hand, this implies that adult data especially, despite its quantity, cannot be regarded as fully representative with respect to gender. Thus, a follow-up study to investigate potential differences between adult male and female participants could help to clarify this question.

Questionnaires were distributed per class or course during a regular session. All instructions concerning the tests were presented in written form (Appendix I). There was no time constraint, but participants were encouraged to work as quickly as possible and to hand in their questionnaire as soon as they had completed all tasks.

6 CollMatch

We cannot simply assume that what is true of one native speaker of a language will also be true of others: to make general statements about speakers of a particular language or language variety, we need to collect data from a range of speakers of different backgrounds. (Dąbrowska 2015: 663)

The following chapter presents results from the judgement test CollMatch. This test focuses on the general development of collocational knowledge within adult and teenage test takers as well as native and non-native speakers of English. CollMatch, however, only features isolated items, which is why a second test (CollJudge) will be used to zoom in on the parameters ‘creativity’ and ‘context’ (> 7).

Table 6.1 shows results from Gyllstad (2007: 168) compared to different groups of test takers from this study.
The numbers given under mean, maximum, minimum as well as the three quantiles 1 are calculated based on each group’s CollMatch scores; that is, correct evaluations within the test. As already described in chapter 5.4, CollMatch consists of 70 collocations and 30 pseudo-collocations. Participants scored one point whenever they identified an item correctly. For misclassifications, no points were awarded, but none were deducted either. This translates into a maximum of 100 possible points. As the mean scores for each group show, adult native speakers of English achieve results which at 90.2 come close to the maximum possible score of 100. Furthermore, the kurtosis and skewness 2 of this sample lie close to 1 and -1 respectively.

1 Unlike the mean, which only gives the average score of a group, quantiles give the highest value within a certain part of the group, for example the highest score achieved among the bottom 25 % of a class (quantile 25). Thus quantiles are able to show whether results within a group are evenly distributed throughout the sample. The quantile 50 is often also referred to as the median.

2 Kurtosis and skewness are statistical measures which are used to describe how data behaves compared to a normally distributed dataset. Kurtosis refers to the flatness of a graph. Normally distributed data would have a kurtosis of 0, while a kurtosis > 0 indicates that the dataset’s distribution is peaked. A kurtosis of < 0 means that the graph tends to be flat. The skewness of a dataset describes the data’s symmetry (= 0). A negative skewness value (< 0) indicates that the data is skewed towards the left (here: higher scores), positive skewness (> 0) that it leans towards the right (here: lower scores). (Paltridge / Phakiti 2010: 45; Joanes / Gill 1998: 183-185)
This indicates that the distribution tends to be peaked (positive kurtosis value) and leans towards higher test scores (negative skewness value) but would still be considered normally distributed.

              Gyllstad  native                                  non-native
Value         (2007)    adult     yr. 7     yr. 9     yr. 11    adult     yr. 5     yr. 9     yr. 11
Participants  25        86        47        43        40        87        21        20        21
Mean          81.8      90.2      64.8      68.5      69        68.7      43.8      51.35     58.5
s. d.         7.9       4.9       11        15.1      17.6      9.8       14.1      6.3       5.5
Maximum       98        98        90        95        94        95        56        62        74
Minimum*      65        75        44        32        25(41)    41        4(21)     40        49
Kurtosis      .19       .83       -.62      -.27      -.63      -.08      1.93      -1.02     2.26
Skewness      -.35      -.91      -.02      -.32      -.44      -.12      -1.60     -.04      1.03
Quantile 25   n / a     87        57        59        56        62        27        45        56
Quantile 50   n / a     91        64        71        73        69        48        51        58
Quantile 75   n / a     94        73        78        82        76        53        56        61

* In cases where the lowest score lies more than 10 points below the next highest number of points, this second lowest score is given in brackets.

Table 6.1: Overview of group results from CollMatch based on Gyllstad (2007)

Advanced learners of English, on the other hand, have a mean score of 68.7, close to the results of L1 teenagers. These observations are, in fact, supported by an ANOVA and a post-hoc Games-Howell 3 test (Games / Howell 1976), which yield highly significant differences (p < .0001) between the adult groups but not between L2 advanced adult learners and English native-speaking teenagers from years 7, 9 and 11. Thus, it seems that, from a merely quantitative point of view, advanced adult learners are on the same level of collocational proficiency as teenage L1 speakers. Gyllstad’s (2007) participants, Swedish undergraduate students, on the other hand, obtain average results which are well above this teenage level but still not quite native-like.

3 Since the eight groups vary quite considerably in size (table 6.1), a post-hoc Games-Howell test (Games / Howell 1976) was chosen instead of the more traditional Tukey HSD test (Tukey 1949).

Among the teenagers’ evaluations, younger L2 learners display the lowest results of all eight groups in this study. On average, their scores lie consistently below the evaluations of their English native-speaking peers. Looking at the standard deviation (s. d.) as well as the maximum and minimum scores within each group, the results, however, indicate that German learners of English can also achieve very good results which lie at an average L1 teenager’s level. Still, not even 25 % of L2 teenagers in years 9 and 11 reach a score which comes close to native-like proficiency. But the maximum and the minimum values within the regularly schooled learners at least show more or less steady growth. This suggests that even though they are not generally at a very high level of collocational proficiency, L2 speakers of English at least seem to gain proficiency throughout their language training. The standard deviation also decreases from year 5 to year 11, which means that on average the spread of scores is less broad among older L2 teenagers than within a younger learner group. Non-native speakers of English from year 5 seem to be a more heterogeneous group, even after almost one year of language learning in the same group.

Surprisingly, teenage native speakers of English also show a rather high standard deviation, which indicates that, among participants from these groups too, some scored quite high and others comparatively low. This is mirrored by the minimum and maximum values of these groups’ test scores. Furthermore, while even the lowest results among the adult native speakers of English are still close to the average adult learners’ score, about 25 % of each group of native-speaking teenagers seem to be less successful at identifying collocations than the average teenage learner of English from year 11.
Furthermore, a post-hoc Games-Howell test yields statistically significant differences between all three groups of regularly schooled German learners, while the three groups of English native-speaking teenagers do not differ significantly in their evaluation of collocational combinations. This, however, only indicates that they all score quantitatively on the same level. Only a qualitative analysis of the individual test items would show whether these test takers also agree in their evaluation of individual collocations.

Overall, this first overview seems to confirm that, as Gyllstad suggested (2007: 159), native speakers of English score higher than their non-native counterparts. At the same time, it also seems as though adult speakers with an academic background outperform younger speakers within the respective groups of native and non-native speakers. On the other hand, a closer look at the distribution of test scores within the groups revealed that, while learners gradually gain greater collocational proficiency, native speakers of English include a relatively weak group throughout all teenage groups. This suggests that in L2 learning even weak students might benefit from explicit training, whereas in L1 language acquisition, students with a comparatively low level of collocational proficiency seem not to improve.

Yet, even for groups with similar quantitative results, the question remains whether all collocations are accepted in the same way. Thus, focusing on the individual test items, the following pages zoom in on participants’ acceptance of CollMatch’s collocations and distractors. It is important to note that this also means a shift in measurements. While the data in table 6.1 is calculated based on the number of correct answers within CollMatch, chapters 6.3 and 6.4 take the number of positive evaluations per item as their point of reference (acceptance score).
This is due to the fact that, when an analysis is concerned with the attitude of different groups of test takers towards an item, it is less relevant whether an individual was right or wrong in his / her evaluation than how this item was evaluated in the first place and whether this evaluation remains the same for all participant groups. Therefore, the following chapters first take a more quantitative look at the acceptability ratings of CollMatch’s items by adult (> 6.1) and teenage (> 6.2) native speakers of English, before chapter 6.3 then compares the different ratings from each L1 group to identify whether the degree of acceptability is the same for all collocations and distractors throughout the respective age groups. These results are then compared to non-native speakers’ data in chapter 6.4. Another interesting result that has not been mentioned thus far concerns the potential influence of different classroom situations, such as regular and bilingually taught classes. Average scores indicate that both approaches are able to achieve similar results. Chapter 6.5 picks up on this observation and discusses the potential implications of classroom situation in more detail. Finally, chapter 6.6 sums up this section’s findings and observations.

6.1 Native Speakers - Adult

From the data collected in Great Britain, adult English native speakers were selected to provide a benchmark for all other - younger as well as non-native - participants. In this group’s evaluation, most combinations fall into two groups: items which are accepted by the majority of evaluators and combinations which are rejected (Appendix II). These evaluations are also reflected in graph 6.1, which provides a relatively clear picture: most accepted collocations can be found in the topmost part of the graph, while almost all pseudo-collocations range just above the x-axis.
While the x-axis serves as a mere indicator of position within the test, the y-axis represents the frequency of positive evaluation (answer “yes”) in percent. Since CollMatch works with a simple yes-no scale, all items which show a low positive rating, such as turn a reason or pick a glance, were either negatively evaluated or not evaluated at all. The average acceptance of collocations is 66.72 % with a standard deviation of 11.77, while pseudo-collocations were only accepted by 3.51 % (s. d. 11.54). A more detailed overview can be found in Appendix II, which also shows that within this group most of the items in CollMatch are, in fact, correctly evaluated as collocations or pseudo-collocations.

Graph 6.1: Schematic overview of CollMatch’s collocations and pseudo-collocations evaluated by adult native speakers of English (see Appendix II for a detailed overview)

Some items, however, range between 40 % and 60 % acceptability, which seems to make them less suitable indicators. These items are supply one’s assistance, afford an opportunity and lay pressure. Another intended pseudo-collocation, express a worry, lies above 90 % and thus even outperforms some of the test’s collocations. In the case of afford an opportunity, it is interesting to see that this collocation was already rejected by two of Gyllstad’s native speaker participants (Gyllstad 2007: 170). Gyllstad speculates that this might be because this combination is “a fairly formal phrase” (Gyllstad 2007: 170), but a look at the BNC reveals another possible explanation. The combination afford an opportunity is usually accompanied by some kind of recipient or beneficiary, for example in the shape of a ditransitive construction or a [for_NP] as in examples (31) or (32) 4 .
(31) BNC HTE 1670 There are many clubs and societies within the University which afford students and staff opportunities to perform in every kind of vocal and instrumental music.

(32) BNC ANA 421 Such short-term care can provide the parents with children who are difficult to handle with an essential break, and affords an opportunity for older mentally handicapped people to gain more independence.

Since CollMatch usually includes “someone” (Appendix I) to indicate that a collocation needs this kind of complement, participants might have rejected this item because it fails to account for the frequent realisation of a recipient and / or beneficiary. Thus, afford someone an opportunity might have yielded a very different picture.

Supply one’s assistance, express a worry and lay pressure are interesting cases for another reason: they are classified as pseudo-collocations (Appendix II) but achieve, compared to most distractor items in the test, a rather high acceptance score. As mentioned before, express a worry even scores higher than collocations like break news or draw a breath. A query for collocates of the lemmata supply, express and lay within a span of ±4 reveals that instances of all three intended distractors can be found within the BNC. While lay pressure and supply one’s assistance only occur once or twice respectively, a query for express a worry provides 35 hits. This might explain why native speakers felt that these combinations would actually be acceptable phrases within the English language. The question, of course, is then why they became distractor items in the first place. For lay pressure and supply one’s assistance, the answer is simply that their respective z-scores 5 of -1.95 and 0.91 did not reach statistical significance. The z-score of express a worry, however, is 27.2 and thus exceeds the benchmark of 2.58 by some distance. So, it seems as if express a worry should be treated as a collocation rather than a distractor item 6 . For the calculation of a participant’s CollMatch score, however, the intended ratio of 70 : 30 items is maintained, since this misclassification affects all participants.

4 In a BNC query for collocates of the noun opportunity (span: ±4), 57 out of 123 times this collocation occurs with some kind of beneficiary or recipient (for a suggestion of semantic roles see for example Herbst / Schüller 2008: 126-134).

5 Initially, the query yields a total of 3 and 4 instances of lay pressure and supply one’s assistance, but, on closer inspection, in one case only one and in the other two sentences remain as true hits. The z-scores given, however, are based on these initial, automatically generated scores. Taking the actual, lower co-occurrences would result in z-scores of -1.95 (lay pressure) and 0.65 (supply one’s assistance).

The fact that adult native speakers are, nevertheless, able to see past this seemingly non-statistical result is comforting and disturbing at the same time. Comforting, of course, because it shows that in this kind of setting, native speakers are a fairly reliable source when it comes to the evaluation of collocational phenomena, but, at the same time, cases like lay pressure or supply one’s assistance also demonstrate that any tool, like a corpus, is only as good as the question it is approached with. Despite these four problematic items, CollMatch could, however, still be regarded as a highly indicative test for the receptive collocational proficiency of a speaker of English, since it still contains 96 items which work comparatively well. Hence, comparing the 70 actual collocations with all items positively evaluated by more than 50 % of the speakers (above chance) results in a ratio of 69 : 70 or simply 98.6 % of correctly identified items.
6 Gyllstad himself suggests changing the ratio of collocations and pseudo-collocations to 69 : 31 (personal correspondence).

6.2 Native Speakers - Children

The relatively clear evaluations from native speakers of English can only be observed for adult native speakers with an academic background. The picture changes with the age of the participants. The above-chance identification of collocational items, for example, drops from 98.6 % to 70 % for English native-speaking children around the age of 12. It then rises for year 9 (91.4 %), before, like the mean scores for correctness (table 6.1), decreasing again in year 11 (85.7 %). This pattern is, however, not retained for all items. Thus, there is apparently no overall process which can be observed, such as steady, gradual growth in acceptance the older a test taker gets. Therefore, in order to obtain a more comprehensive picture of the different degrees of acceptability as far as the individual items are concerned, the percentage of positive evaluations for each item was computed and then compared against the adult data (Appendix II). However, it is important to note that, unlike those of the adult L1 speakers, the children’s answers were less unanimous, with a mean acceptance for items of 43.28 (year 7), 50.77 (year 9) and 47.19 (year 11) and a standard deviation of 20.55, 15.22 and 17.22 respectively. Particularly striking is the fact that children around the age of 14 seem to be able to make a more accurate evaluation of CollMatch’s test items than their younger and older schoolmates. This tendency could be an indicator of greater variability or diversity among native-speaking teenagers’ language perception in years 7 and 11, while students in year 9 obtain more stable results. At the same time, children’s evaluation of pseudo-items lies below chance for all but two items across all age groups.
But, while in year 7 no pseudo-items except for one example were accepted by more than 60 % of the group (acceptability rate below 40 %), in year 9 only 19 out of 30 pseudo-items were firmly rejected. One pseudo-collocation - restore a favour - is even accepted by 70 % of the group. In year 11, again, none of the pseudo-items is accepted as a suitable combination within the English language by more than 40 % of the group. Once more, children from year 9 seem to be more variable in their ratings.

This picture becomes even more apparent when comparing the number of true collocations which range on a similarly low level of acceptance as the test’s pseudo-items. Here, the good result of the children at the age of 12 is somewhat diminished: while they seem to be able to identify pseudo-items relatively well, they also rate 15 of the actual items below 40 %. This means that although these children might already know which items do not qualify as suitable combinations within the English language, they are, at the same time, not familiar with one fifth of the test’s collocations. Children aged about 14, on the other hand, are less certain in their overall evaluation of pseudo-collocations, but only four collocations receive a positive evaluation by 40 % or less of the group. So, while they seem to be more tolerant in their overall acceptance, students from year 9 achieve better results in telling a true from a pseudo-collocation than their younger schoolmates. In year 11 the number of misjudged collocations rises again to seven, which might indicate that, even at a quite late stage, a clear idea of what qualifies as a suitable combination within the English language has not yet settled. This observation is partly in line with Wray and Perkins’s (2000) model, which postulates a transitional stage between more holistic and more analytical knowledge for children between 8 and 18 years.
Yet, according to this model, pupils at the age of 16 are already well on their way to a predominantly holistic stage, which would almost suggest somewhat adult-like proficiency and indeed less variation compared to the younger groups. So, there might be other influencing factors at work which cause this heterogeneous picture: one might be that collocations, unlike more fixed formulaic language like nice to meet you or thank you very much, are fused at a relatively late stage (Wray 2002: 123). A second option could be that the group as such is not as homogeneous as the adult sample. This would be supported by the relatively high values of standard deviation for the correctness scores in CollMatch, but the kurtosis and skewness both still lie within the range of -1 to 1, which would usually still be taken to indicate normally distributed data. Thus, the next chapter compares all four sets of native speaker data in order to determine whether there are any consistent patterns which would support Wray’s claim that different phraseological combinations might in fact range on different levels of fusion.

6.3 Native Speakers - Patterns

As indicated above, there is no overall tendency which holds for all CollMatch items across all native speaker data sets. Therefore, the three sets of children’s evaluations were more closely examined in order to see whether recurring patterns could be observed throughout the different subsets. Four general patterns can be identified, which, based on the development of acceptability throughout the groups, were labelled: Gradual Acceptance (GA), Peaked Acceptance (PA), Steady Acceptance (StA), and Receding Positive Evaluation (REC). In the next step, these four patterns were compared to the adult data in order to find out whether their observed tendencies prevail or not (> 6.3.1-6.3.4).
For the sake of clarity the evaluation of the 30 pseudo-collocations will then be summarised in a separate chapter (> 6.3.5) in order to keep a consistent line of interpretation and avoid unnecessary leaps. Furthermore, only one item per pattern is pictured with a bar chart. However, a list of all items’ overall acceptance scores can be found in Appendix II.

6.3.1 Pattern 1: Gradual Acceptance

Acquisitional processes are often thought to take the shape of gradually developing mastery of abilities. Thus, Gradual Acceptance (GA) is a pattern which is presumably to be expected as a result of analysis of (language) acquisitional processes. As depicted in graph 6.2, here acceptance rises gradually from age group to age group. This could be interpreted as reluctance in young native speakers of English to accept a combination with which they are unfamiliar. Positive evaluation then increases with age, which could indicate that the more experienced a speaker is, and - very likely - the more often s/he has encountered or even used an item, the more likely s/he is to accept this combination. This process then continues until the item seems to be part of a native speaker’s language inventory and thus is completely accepted whenever it is encountered. So, while native speakers at the age of 12 might know individual words but are hesitant to accept their combination, older native speakers have already encountered these collocations more often and are more confident in accepting established collocational pairs.
Graph 6.2: Example for Gradual Acceptance (GA) - L1 acceptability rating for drop hints

Since the data obtained by CollMatch only allows for inferences about an individual’s total acceptance (“yes”) or rejection (“no”), it could also be argued that the percentage of positive judgement shows that not all children have encountered a collocational pair, rather than that the children who rejected an item had encountered this combination, but not frequently enough. While this would not change much regarding the overall gradient character of the acquisitional process, it certainly has implications for a cognitive model of language development. As has been mentioned before, the data from CollMatch does not support a more detailed analysis of this aspect, yet a glance at the items which show this pattern of gradual acquisition might help to decide how likely it is that, rather than frequency of encounter, the first encounter with a collocational pair could be a decisive factor. For this reason box 6.1 lists all items with gradual age-related acceptance.

have a say*, catch fire*, drop hints, clear one's throat, settle a dispute, grant permission*, launch a campaign*, spread one's wings, blow one's nose
5 % over: do justice, say grace, bear witness, serve a sentence, reach a conclusion*, realise potential, strike a blow, commit a sin, steal someone's thunder, pursue a career, dismiss an idea*
5 % under: break news, acquire a skill, jump a queue
*) Even in year 5 these items start at an acceptance level of equal to or over 60 %.
Box 6.1: Items of Gradual Acceptance (L1)

As far as individual lexemes are concerned, most elements within the GA pattern could be seen as rather basic vocabulary. With the exception of say (noun), hint and dispute, all lemmata are rated as part of the Oxford 3000™⁷, and almost all can be found in data from early language acquisition, like CHILDES (MacWhinney ³2000).
In the Thomas Corpus, for example, caretakers and child alike use all of the above lexemes, except dispute , grant , and campaign. Thus, it could be argued that it is very likely that children who rejected a certain combination did so, not because they were not familiar with one of its constituents, but because they rejected the combination per se. This assumption is further supported by the fact that some of the combinations with supposedly quite familiar words like have a say , catch fire or drop hints receive similar acceptability ratings as less common combinations like grant permission or launch a campaign . It indicates that, rather than first encounter with the individual constituents, it is indeed the item’s type frequency, and with it familiarisation, which results in higher acceptance of these pairs the older the native speakers get. Furthermore, one third of the items contain a one’s-construction, so it could be argued that while children might be generally familiar with the collocation as such, the more schematic presentation might influence their judgement. Yet, only three out of ten items with a one’s or someone’s-slot occur with a pattern of gradual acceptance; within the other seven, three items share the same steady level of acceptability across all age groups, one with an average rating of over 70 % (> 6.3.3). However, word pairs which strictly behave according to a GA pattern are not very pervasive; only 13 % of CollMatch’s items display this pattern. Since one student’s evaluation causes a difference of about 2.5 %, items which showed a deviation of up to 5 % were also included as Gradual Acceptance. Thus, collocations which share the same overall trend of a rise in positive evaluation from youngest to oldest teenage group but were given an evaluation by students from year 9 which either exceeds year 11 evaluations by up to 5 % or lies about 5 % below evaluations from year 7, were also labelled as gradually accepted. 
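The labelling rule just described can be illustrated with a short Python sketch. It is not the procedure actually used in the study: the function name `label_gradual`, the parameter `tol` and the sample percentages are hypothetical, and the treatment of exact ties (year 9 equal to year 7 or year 11) is an assumption. Only the ordering conditions and the 5 % tolerance are taken from the text.

```python
def label_gradual(y7: float, y9: float, y11: float, tol: float = 5.0) -> str:
    """Label one item from its acceptance percentages in years 7, 9 and 11.

    Returns "GA" for a strict rise, "GA (5 % over)" or "GA (5 % under)" for
    the tolerated deviations of year 9, and "other" for everything else.
    """
    if not y7 < y11:              # no overall rise from youngest to oldest group
        return "other"
    if y7 < y9 < y11:             # strict Gradual Acceptance
        return "GA"
    if y11 <= y9 <= y11 + tol:    # year 9 exceeds year 11 by up to `tol` points
        return "GA (5 % over)"    # (a tie with year 11 is counted here: assumption)
    if y7 - tol <= y9 <= y7:      # year 9 lies up to `tol` points below year 7
        return "GA (5 % under)"   # (a tie with year 7 is counted here: assumption)
    return "other"


# Hypothetical percentages, not taken from Appendix II
print(label_gradual(40.0, 55.0, 70.0))   # GA
print(label_gradual(40.0, 72.5, 70.0))   # GA (5 % over)
print(label_gradual(40.0, 37.5, 70.0))   # GA (5 % under)
```

Since a single student's answer shifts a group's percentage by about 2.5 %, the 5 % tolerance corresponds to roughly two students' evaluations.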
A further 14 items fall into this category, which means that 33 % of CollMatch’s items could be regarded as gradually accepted. For the extended pattern as well, the number of words which are not included in the Oxford 3000™ remains rather low at four lemmata (grace, sin, thunder, queue).

⁷ A list of 3000 keywords, selected by Oxford University Press; it should be used as vocabulary for all definitions within learners’ dictionaries such as the Oxford Advanced Learner’s Dictionary (OALD) but could also be regarded as a kind of basic vocabulary for English language learning and teaching. According to the publisher, most words are selected because they range among the most frequent words in the English language (based on the BNC). Furthermore, a lemma needs to occur within a variety of different genres to be granted the status of an Oxford 3000 word. In addition, it contains words which are less common but were considered to be “useful for learning as well as relevant for defining purposes” (Hornby 2005: viii).

At the same time, nine out of 28 lexical constituents do not occur in the Thomas Corpus. However, the absence of justice, grace, conclusion, potential, commit, sin, pursue, dismiss and acquire in the Thomas Corpus, and thus any potential unfamiliarity, is not reflected in the evaluation by the youngest group of children. Of course, it has to be kept in mind that the Thomas Corpus only represents a fairly small amount of data compared to the whole of the linguistic output a child produces on a daily basis. Furthermore, with one child as the sole provider of child language, it is also far from representative. Nevertheless, the words which are neither used by a caretaker nor Thomas himself are all rather abstract, adult concepts, which might not be very common words for 12-year-olds either.
Therefore, to a certain extent these words could be expected to start off at a relatively low level of overall acceptance, but the range from close to 40 % up to almost 70 % can be observed for more general vocabulary as well, like strike a blow or break news . This suggests that, while not all items within this pattern enter the evaluation at the same level of acceptability, it is difficult to predict whether more abstract or quite concrete concepts obtain higher scores. The adult evaluations confirm the pattern of steady growth in acceptability throughout the different age groups. This then translates into a close-to-ceiling acceptance for adult native speakers for all GA items. In addition, a second trend can be observed: the majority of the items, 16 out of 23, show a distinct difference between teenage and adult evaluators, which is a discrepancy equal to or over 20 % compared to the highest positive evaluation among the teenage participants. Leaps of this size occur almost exclusively between the evaluations of children and adults. In fact, there are only six instances where acceptance scores from children from years 7 and 9 show a similar gap. Interestingly enough, if this earlier gap occurs, there is then no jump in acceptance score between years 9 and 11. This allows for two interpretations: either children, even if they are in their late teens, are still to gain confidence and proficiency concerning their own mother tongue, or there is a visible discrepancy between the proficiency of undergraduate university students and older L1 teenagers, which might be connected to linguistic differences between secondary and tertiary education. If the latter holds true, collocations which might be considered to belong to more academic vocabulary could be particularly prone to displaying this pattern with quite strong adult acceptance. 
Since this effect might also occur in other patterns, chapter 6.3.6 will come back to this observation and review its value and implications against the background of all CollMatch items.

6.3.2 Pattern 2: Peaked Acceptance

Another common pattern in children’s evaluation of CollMatch items is Peaked Acceptance (PA), which could also be regarded as a sub-pattern of Gradual Acceptance, since both patterns share the same trend: a gradually growing acceptance of word combinations between the youngest and the oldest age group. However, within the pattern of Peaked Acceptance, children from year 9 behave slightly differently. Not only do their evaluations reach above year 7, but they also surpass the acceptability ratings from year 11 (graph 6.3). This is the case for 26 % of the test’s items for which children around the age of 14 seem to be the most positive non-adult evaluators.

Graph 6.3: Example for Peaked Acceptance (PA) - L1 acceptability rating for meet a need

The pattern as such is again difficult to explain in terms of predictability. It occurs with items which overall achieve a relatively low acceptance score, like adopt an approach, or initially positively evaluated items like justify one’s existence, all of which might be considered to describe more abstract concepts, but also with more concrete, everyday vocabulary like beat eggs, bend a rule or snap one’s fingers. Moreover, the discrepancy ranges from 6 % to 22 % points, so it is very unlikely that the very positive ratings from the age group of 14-year-olds are caused by the easiness of the items’ constituents. Furthermore, since half of the items display a difference of 10 % points or more between the evaluations from years 9 and 11, chance or coincidence are also unsatisfactory answers to the question of why students around the age of 14 seem to be generally more accepting in their evaluation.
A factor which can often be found, especially when a group shares a pool of similar experiences like a family or a class, is priming⁸. Thus, it could be argued that students from year 9 were exposed to the items in box 6.2 prior to the test and that this is why they are more ready to accept these co-occurring items. There are, however, two aspects which make this very unlikely. First, while all students came from the same year, they were divided into different groups for most of their subjects. Thus no previously experienced classroom situation could have influenced the majority of the students at the same time. A glance at the items as such provides a second reason: the range of concepts seems to be too vast to justify any claims of priming, since it is very rare that a situation occurs where, for example, fit the bill, dress a wound and shift gear might be used in the same session. Another explanation might be that students from this particular age group at this particular school are linguistically simply more advanced and thus more proficient users of English. While this could explain why these 14-year-old teens are almost as certain in their evaluations as the adult speakers, this certainty is not reflected in their overall evaluational behaviour. These students do not score close to adult proficiency across all items, nor, as outlined in 6.2, is this age group particularly prone to over- or under-evaluation of pseudo-items and other collocations respectively. In fact, the tendency of more positive scores from year 9 can also be found among the evaluations of some distractors (> 6.4.5).
draw a breath, raise objections*, meet a need*, fit the bill*, gain ground, adopt an approach, beat eggs, employ a technique, assess damage*, exercise discretion, dress a wound*, challenge a view*, shift gear*, justify one's existence*, bend a rule, snap one's fingers, grab a hold*, file a report
*) Difference of 10 % points or over between evaluations from year 9 and 11.
Box 6.2: Items of Peaked Acceptance (L1)

⁸ In a study on the effects of priming on the social behaviour of their participants, Bargh, Chen and Burrows define priming as follows: “Priming refers to the incidental activation of knowledge structures, such as trait concepts and stereotypes, by the current situational context.” (1996: 230). However, in a replication study of this experiment Doyen et al. (Doyen / Klein / Pichon / Cleeremans 2012) were able to show that priming worked particularly well when the experimenters themselves believed in priming effects. Since experimenter and teacher were not the same person, this type of immediate priming can be excluded from the list of potential reasons here.

These observations lead to the conclusion that, despite a seemingly positive trend within this pattern, teenagers around age 14 might be considered to be less concerned with actual phraseological acceptability and instead focus on the fact that a combination consists of two acceptable English words which occur in an appropriate, grammatically correct way. Since this is the case for all of CollMatch’s combinations, the difference between acceptable and unacceptable combinations becomes less distinct compared to other groups. With respect to the DMCDC model, this could be further explained as a rather analytic phase within L1 language perception, which, as Wray and Perkins (2000) suggest, still does not seem to be in place for some collocational combinations at the age of about 14 years.
6.3.3 Pattern 3: Steady Acceptance

The third pattern, Steady Acceptance (StA), shows very evenly distributed and often very high or even close to ceiling acceptance of an item throughout all three groups. Steady Acceptance can be observed in 17 % of CollMatch’s items. Graph 6.4 shows an exemplary pattern for pull a face, while box 6.3 gives an overview of all CollMatch items which share this kind of evaluation. Here, items like make a move or pull a face reach adult-like scores of positive evaluation even from the youngest age group. The majority of these items reach an initial acceptance score of over 80 %.

Graph 6.4: Example for Steady Acceptance (StA) - L1 acceptability rating for pull a face

Therefore, it could be argued that these collocations are either stored or even acquired as a whole unit from a very early age, or that they are so frequently co-occurring that participants, independent of their age, are able to identify them immediately as an acceptable unit. In the Thomas Corpus, for example, collocations like make a move, pull a face, run a bath and pay attention can be found in early recordings of the caretakers’ tier, when Thomas is about 2;02 years old. This suggests that these lexical combinations are used on a regular basis when interacting with the child. He is introduced to these items within the first years of his life; an observation which, in itself, is not really helpful for either theory, since it could be seen as evidence for the more nativist assumption that these collocations function like multi-word lexemes and are acquired after only a few sequences of input. On the other hand, it could also support any entrenchment hypothesis which proposes a continuous acquisitional process in which cognitive associations are strengthened with the overall frequency of input. However, according to nativist tradition, children acquire all of a language’s lexical aspects at a relatively early stage.
This then would imply that a test like CollMatch would yield more or less the same close to ceiling acceptance pattern for all non-pseudo items or - if one assumes that the “remarkable rapidity” (Chomsky 1959: 57) in the language acquisition process extends well into a native speaker’s teens - that all items display more or less the same pattern in general. At least for the data at hand, this is not the case. Throw a party, set an example, give a speech and cast a vote reach an acceptability rate of over 80 % as well but do not occur in the Thomas Corpus. However, as has been pointed out before, just because a combination is not used during a recorded session by one child and his caretakers does not necessarily mean that this collocation has not been used at all or at least within the acquisitional process of other children⁹. Thus, in this case, an argumentation ex negativo is difficult to maintain. This is especially so since the items as such might be used in contexts young teenagers are familiar with, like school, where all four collocations are likely to occur fairly regularly either among peers or in the classroom. Furthermore, there is hardly any difference between children’s and adult evaluations, which suggests that adult-like proficiency for these items is obtained at a very early stage. Interestingly, the only collocation which shows a more pronounced difference between teenage and adult acceptability scores is cast a vote, which is also the item within this pattern that contains one of the two words which cannot be found at all in the Thomas Corpus.

⁹ See Tomasello and Stahl (1999) for further discussion of the amount of production data needed for a representative study of child language acquisition.

Other items like lose sleep, push one’s luck or sustain an injury are more difficult to imagine featuring in a child’s daily share of communicative input.
Even though all the lexemes except sustain can be found in the Thomas Corpus, none can really be called a classroom phrase or a very common expression in teenager-talk. This might also explain why only 60 % to 80 % of all teenage evaluators across the three groups accept these items as “used in the English language” (Appendix I), compared to the adult native speakers, for whom this score rises again close to ceiling. It is surprising that the young evaluators seem to agree in their evaluations. This effect could again indicate a particularly academic choice of words. Thus, an acceptability rate of about 60 % might represent the percentage of children with a more advanced linguistic competence, who might continue their education at university. However, there are no statistical differences between children from higher and lower educational backgrounds as far as this group of test takers is concerned¹⁰.

lose sleep, make a move*, give a speech*, pull a face*, run a bath*, throw a party*, set an example*, pay attention*, push one's luck, cast a vote*, kick one's heels, sustain an injury
*) Scores of all groups lie above 80 %.
Box 6.3: Items of Steady Acceptance (L1)

The remaining item, which shows a steady acceptability rate among teenage evaluators, is kick one’s heels. It scores under 60 % throughout all children’s groups; nevertheless, the trend of a clear academic preference prevails. This suggests that, while the concept of being bored is surely something children as well as adult evaluators experience, children might be used to different methods of linguistic conceptualisation.

6.3.4 Pattern 4: Receding Positive Evaluation

Unlike the Gradual and Peaked Acceptance patterns, the following items do not show an overall tendency towards an age-dependent rise in acceptance of the respective combinations. On the contrary, they all have in common that the evaluations of the oldest group of teenagers lie below the assessment by year 7 pupils.
¹⁰ See chapter 6.5 for a more detailed discussion on the influence of parents’ or caretakers’ educational backgrounds on participants’ performance in CollMatch.

Items with this overall negative trend can, like items with increasing positive evaluation, be subdivided into two sub-patterns: Gradual Decline (DEC) and Peaked Recession (PR). In addition, there is a third sub-pattern, Dented Recession (DR), which, apart from a decreasing acceptability score, reveals a more reluctant year 9. To find these patterns within the evaluations of CollMatch items is rather surprising, since a decrease in positive assessment would instead be expected for the test’s pseudo-items.

DEC: play a trick, press charges*, afford an opportunity, clean windows*
PR: keep pets*, hold meetings*, assume responsibility, cut a corner*, fly a flag, perform a miracle*, deliver a speech*, ride a storm, lend support*, cease fire, shrug one's shoulders*
DR: suffer damage, abandon ship*
*) Scores of all age groups lie above 60 %
Box 6.4: Items with an overall tendency of receding positive evaluation (L1)

In this case, however, the items are statistically significant co-occurrences within the English language, most of which are also well accepted by the majority (over 80 %) of adult native speakers. At first glance, two possible explanations come to mind: either older teenagers forgot about these combinations because they do not use them (anymore), or there is something about these combinations which makes them somehow less appealing to a certain age group. Since very common concepts such as clean windows or shrug one’s shoulders are among these items, it is, however, very unlikely that these collocations fell out of fashion. A closer look at the items which show a pattern of Gradual Decline reveals that, even if a negative trend throughout the three age groups can be observed, three out of four combinations still range above 80 % of acceptance.
Adult data again yields close to ceiling effects, so a slight downward trend in children’s data might be considered marginal fluctuation. However, play a trick, press charges, and clean windows share this pattern with an item which has been mentioned previously: afford an opportunity. This collocation, despite a z-score above 2.58, scores rather low even among academically trained native speakers of English. As has been pointed out before, one reason might be the overall low frequency of the combination in connection with its rather unnatural phrasing, since afford someone an opportunity might sound more common. In order to find out whether effects of frequency or the items’ construction could be responsible for this negative trend among young native speakers, all items displaying a pattern of overall decline were checked against corpus data for potential alternate combinations (Appendix III). Especially in the sub-patterns of Gradual Decline and Peaked Decline, most items can either be found with a more frequent collocate (paradigmatic level) or occur more often in a construction which differs from that presented in CollMatch (syntagmatic level). Particularly for items within the pattern of Peaked Decline, it appears that the less prototypical portrayal of these combinations triggers a more reluctant acceptability in some of the younger age groups. Year 11 in particular seems to be least sure about the items’ appropriateness, while year 9 and at times even year 7 have fewer reservations about the acceptability of these items. This raises the question of why for some items older teenagers have more issues accepting less frequent and/or constructionally deviant combinations than adult native speakers or even their younger peers. Similar to items which show a pattern of Gradual Acceptance, the increase in acceptability score from year 7 to year 9 could be explained by an age-related increase in input and frequency.
Year 11 students, however, seem to shift their focus. While younger teenagers instead focused on a generally acceptable co-occurrence of a sequence of words, children at the age of 16 might be more aware of further factors, like a general frequency of co-occurrence, or an item’s preference for certain constructions. Adults, on the other hand, then master both high-frequency input of individual tokens as well as the association of more varied, but related tokens to one more abstract type. This results in knowledge about what is combinatorially possible even if it is less frequent, as well as the ability to use more abstract lexemes in order to attribute these to actual, acceptable constructions. The two items which do not seem to have a more frequent, alternate construction or another semantically similar, yet more likely, collocate to occur with, are suffer damage and abandon ship. At the same time, these two collocations are the only two to share the Dented Recession pattern, peaking in year 7. Thus, in this case, the youngest teenagers are those who score closest to the close-to-ceiling adult evaluation. In fact, abandon ship is rated as acceptable by over 80 % of the 12-year-olds, while for suffer damage all teenage groups respond above chance, but only in a range between 51 % and 66 %. Furthermore, neither of the two collocations is very common, at least in teenagers’ linguistic environment. However, all four lexemes occur in the Thomas Corpus, which might explain why children are at least familiar with these items. The question is: why are older children not familiar with them, or at least not to the same extent? The reason could still be a matter of focus, even though combinatorial and constructional likelihoods do not seem to be affected here. Nevertheless, these two items allow two observations.
First, they once again support the hypothesis that for some items, children focus more on the existence of the actual words and far less on their distribution or patterns. Therefore, if two familiar English words co-occur in a sensible way they tend to be accepted, while the older a native speaker gets, the more attention s/he pays to potential structural restrictions and frequency effects. Furthermore, the fact that this peak in acceptability can also occur at an earlier stage than year 9 indicates that this process, while often similar in shape, does not necessarily take place at the same age.

6.3.5 Distractors

As described in section 6.1, adult native speakers perform relatively well on all CollMatch’s combinations. So, while all but two collocations score over 60 % of acceptance, pseudo-collocations are evaluated as acceptable combinations of English by under 20 % of the adults (graph 6.1). Young native speakers, on the other hand, are less clear in their judgement and it thus comes as no surprise that 23 out of the 30 distractor items are more likely to be accepted by any of the children’s groups than by the adults. This indicates once more that there is a gradual familiarisation process which happens between the early teenage years and adulthood. Young native speakers seem to gradually become more aware of the restrictions within their own mother tongue and reject pseudo-collocations more readily. However, even for the youngest age group, there are items which already score quite low, like turn a reason or drag a limit. The majority of items range between 20 % and 40 % of acceptance in the youngest age group, which indicates that most items are indeed rejected, but not as firmly as among adult native speakers. However, like the collocations in CollMatch, the distractor items in most cases do not occur in a single linear pattern of gradually declining acceptability.
They display patterns with peaks or steady acceptance, very much like the test’s actual collocations. In fact, the patterns of pseudo-collocations are quite similar to those described earlier. But, as expected, the overall percentage of pseudo-items which fall into the respective categories is reversed. So, while Gradual Acceptance (> 6.3.1), Peaked Acceptance (> 6.3.2) and Steady Acceptance (> 6.3.3) account for the majority of collocations, the family of Receding Positive Evaluations (> 6.3.4) are the patterns which can be found for 70 % of the distractor items.

GA: turn a reason
PA: stretch a regard, express a worry, rush rank, lay pressure, pack an affair, stand an occasion
StA: supply one's assistance, pick a glance
REC:
DEC: fetch an illness, sink speed, knock a concern, hit approval
PR: claim trade, restore a favour, shake a smile, drag a limit, gather a matter, win one's memory, impose success, swing a secret, rule an award, stick one's mood, score problems, roll a look, bind blood, charge respect, fill an aim, sit seed, fall a failure
Box 6.5: CollMatch’s pseudo-collocations according to pattern (L1)

Zooming in on the different sub-patterns of Receding Positive Evaluation, a pattern of Gradual Decline (DEC) is presumably the pattern which might be expected of an evaluation of pseudo-collocations across different age groups. Yet only 13 % of all distractors show this tendency of steadily growing rejection of pseudo-items. The vast majority of cases fall into the pattern of Peaked Recession (PR). These 56 % also show a general decrease in positive evaluations from year 7 to year 11, but with an overly positive peak in year 9. This group of young native speakers has a strong tendency to judge most pseudo-collocations - 24 out of 30 - more positively than all other native speakers from the remaining data sets.
In half of the cases this discrepancy lies within the range of just 5 % to 10 % points, but, in some dramatic cases, such as claim trade, swing a secret, express a worry, stick one’s mood or lay pressure, the margin to the other groups reaches over 20 % points. As mentioned before, this tendency cannot be caused by a general lack of motivation among some participants, since all tests with an answering pattern which indicated little or no motivation were removed from the database in advance, for example tests with answers of only “yes” or “no” for all items or with a clear alternation between positive and negative evaluations. Therefore, the over-positive acceptance of pseudo-collocations which predominantly occurs in the age group of 14-year-olds might mean that at the age of about 14, young native speakers are less sensitive to lexical restrictions and more readily accept combinations which are formed of English words but do not usually co-occur (low z-score). This distribution is very similar to the results of the collocation evaluations, which yielded a growing acceptance of items but also showed a peak in year 9. It suggests that, while collocations and distractor pseudo-collocations generally diverge towards adult-like results, young native speakers around the age of 14 display the same over-positive evaluations for collocations and distractors alike. The six items within the pattern of Peaked Acceptance could be counted as further evidence of this trend (> 6.3.2), not only because they show the same peak in year 9, but also because most of them do not differ from the pattern of Peaked Decline by more than 5 %, which translates into only one or two students who make the difference between a tendency to accept or refuse a combination.
In a similar vein, the only pseudo-items which occur with a pattern of Gradual Acceptance or Steady Acceptance are more closely related to Gradual Decline and Peaked Decline than to their equivalents among the test’s collocations. Pick a glance in particular distributionally resembles the pattern of Peaked Decline, but since the groups’ scores do not diverge by more than 5 %, it is primarily listed as a pattern of Steady Acceptance. Furthermore, across all teenage groups this combination does not reach more than 20 % acceptance, which, once again, shows how superficial a strict classification under Steady Acceptance would be. Turn a reason is also not a very convincing example of Gradual Acceptance. A gradually growing acceptability score can clearly be observed, though again, the status of “acceptance” is rather weak since the teenagers’ acceptance does not exceed the 15 % mark.

As mentioned in chapter 6.1, supply one’s assistance and express a worry have to be treated as exceptions. They are the only two items which receive evaluations above chance from adult L1 speakers. Express a worry, with an acceptance score of 93 %, even ranges above collocations which would be considered quite established, such as have a say or draw a breath. As mentioned before, it could therefore be considered a collocation. But even this item shows receding acceptance in year 11 compared to a rather high 74 % acceptance in year 9.

Overall, this peak in acceptance for distractors is consistent with Wray and Perkins’ (2000) as well as Bybee’s (1995: 447-448) observation that after a period of successfully using an entrenched structure (such as formulaic sequences or irregular past tense forms) children fall back on more rule-based and therefore often less established use of these items. In their model Wray and Perkins (2000: 20) hypothesise that this phase of more analytic processing of formulaic sequences lasts from the age of two up to eight.
The fact that young native speakers around the age of 14 still partly seem to ignore conventional, frequency-based restrictions, however, indicates that, at least for some collocational pairs, an analytic phase might instead be observed between the ages of 11 and 16.

6.3.6 Summary

Summing up the results of chapter 6.3, there are four general observations. First, even for native speakers of English, the acquisition of collocations is still an ongoing process throughout most of their teens. In addition, the relatively broad variation in acquisitional patterns shows that the acquisition of collocational combinations is not a uniform process but rather differs in onset, duration, and shape. However, a stage of over-positive evaluation seems to occur for a large number of CollMatch’s combinations. Furthermore, there is an observable difference between native speakers with and without an academic background.

The first of these observations in particular might at first glance appear fairly uncontroversial, but since some approaches (> 4.1) assume that L1 acquisition is completed within a native speaker’s early years, it is rather striking that most collocations and even distractor items pass through a stage of stabilisation. In fact, apart from items of Steady Acceptance (17 %), where the youngest group of test takers perform in a similar way to their older peers, all combinations apparently need several years to be fully stored and evaluated with close to adult proficiency. Thus, at least for some phraseological items (like certain collocations) it can be assumed that language acquisition is not completed until the age of 16 or even older. Of course, data for this study is derived from different participants from different age groups, so variation in acceptance scores could also be caused by general variation among participants.
Yet, as the previous chapters explained, the evaluations across the different age groups are not completely arbitrary, and there is an observable tendency towards age-related, growing acceptance of most items. Nevertheless, a longitudinal study with a consistent group of participants would surely help to support the claims made in this analysis.

The second implication of CollMatch’s evaluation is the variation in patterns which was brought to the fore by the detailed analysis of data from different age groups. Table 6.2 gives an overview of the four main patterns identified in this section.

Patterns                        Collocations (n=70)    Pseudo-collocations (n=30)
Gradual Acceptance              33 %                   3 %
Peaked Acceptance               26 %                   20 %
Steady Acceptance               17 %                   7 %
Receding Positive Evaluation    24 %                   70 %
Table 6.2: Patterns identified in CollMatch based on each L1 group’s acceptance scores

This variation is interesting, since, as has been pointed out before (> 5.4.1.1), from a corpus linguistic point of view all CollMatch items share the same statistical significance and are made up of lexemes which are generally regarded as basic vocabulary. Therefore, a difference in acquisitional patterns indicates that the status of collocations in the first language acquisition process is indeed rather complex and, to a certain extent, item-specific. Most patterns, for example, pass from a stage of non-recognition to being well accepted by most adult native speakers within the observed age frame, but some collocations have either been stored as one holistic unit from an early stage on or have already completed this process at an earlier age (> 6.3.3). Also, acquisition as such does not seem to be a consistent development for all items. While some actually display gradually growing acceptance (> 6.3.1), many collocations appear to have a peak in acceptability ratings some time before they are retrieved with adult proficiency.
Since this overly positive evaluation within a certain age group occurs for many of CollMatch’s items, it is very likely that this phenomenon is in fact part of a collocation’s acquisitional process. The reason why some items show a smoother gradual pattern could be either that they belong to a cognitively different group of linguistic information, or that data collection for most children took place before and after the item’s period of over-evaluation. A test design with shorter intervals and perhaps even a longitudinal study could shed more light on this question. This, however, carries the risk of entrenchment through test taking, since asking the same questions within a fairly short span of time could itself make test takers more accustomed to the items under investigation. Regardless of the scope of this peak effect, this result has major implications for first and possibly even second language acquisition and learning. Even though collocations result in similar patterns of distribution and significance within an adult language system, the teenagers’ evaluations of CollMatch’s items showed that these phrases could still have been acquired at different stages and maybe even in different ways. Thus, any model of first language acquisition needs to be able to account for this multilayered process as well as the fact that similar items are not necessarily acquired in a similar way or at the same time. Furthermore, combining this observation with the fact that the acquisition of collocations appears to continue well after the age of 12 puts any approaches which claim that first language acquisition is a relatively fast and grammar-centred process (Chomsky 1959: 57) in a difficult position. With its dynamic background and different levels of idiomaticity and fusion, the DMCDC-model (> 4.4) seems to be adequate to account for these implications.
Since it assumes collocational structures to be subject to continuous change, according to this model different collocations would in fact even be expected to be processed on different levels, even if they share some formal features such as a general VP + NP structure or a similar statistical expectancy of co-occurrence. Furthermore, like Wray and Perkins’ model (2000: 20) or Bybee’s (1995; 1985) research on irregular past tense forms, the DMCDC-model postulates that the acquisition of collocational combinations includes a more analytic stage in which more structural factors like a correct phrase structure or the combination of suitable word classes seem to be the focus of a young native speaker’s reasoning. During this phase, phraseological restrictions to a certain degree fade into the background. The finding that especially young native speakers around the age of 14 seem to have fewer reservations in their acceptance of collocations and pseudo-collocations alike suggests that this stage is indeed part of the process of collocational language attainment. Yet the data from this section indicates that, for collocations, this phase might last throughout a native speaker’s teens.

The third observation also refers to a kind of variation across the different age groups, but this time it is the comparison of teenage and adult evaluators which reveals an interesting tendency. In recent years, researchers like Ewa Dąbrowska (2012; 2004) and James Street (Dąbrowska/Street 2006) have stressed that linguists need to be cautious about viewing native speakers as a unified group in which each member has the same linguistic competence. Thus, “the native speaker” could rather be seen as a construct while reality is much more diverse and variable. Evidence for this claim can also be found in this chapter’s data.
As has been mentioned before, most items reach close to ceiling acceptance scores from educated adult native speakers, while even the oldest group of teenagers rarely show this effect. Some items are only hesitantly recognised by all young native speakers, which then causes a rather drastic jump in acceptance score between year 11 and academic adults. Collocational pairs such as adopt an approach or justify one’s existence, for example, are readily accepted by almost all adult native speakers, while even the older teenagers are reluctant to judge these items as a regularly recurring combination (graph 6.5).

Graph 6.5: Example for Academic Acceptance (AcA) - L1 acceptability rating for adopt an approach

Of course, it could be argued that 16-year-olds simply need the remaining four to five years to internalise these patterns. In fact, as section 6.4 will discuss in more detail, success in CollMatch is, for example, not as dependent on the parents’ or caretakers’ educational background as on age. Yet a share of 60 % to 70 % of participants who reject combinations like adopt an approach, exercise discretion or justify one’s existence seems rather high compared to other items, which on average lie only about 18 % points from adult proficiency. Box 6.6 lists all items for which the teenage evaluators’ score lies 30 % below the adults’ evaluation. This list contains five items with a (some)one’s-construction, which equals 50 % of all items which occur within this construction. Thus, there is a slight tendency that those items with more abstract wording might actually be more difficult for the younger test takers. However, structural aspects are not the only reason why teenage evaluators might struggle with these collocations.
Most of these items, like realise a potential, adopt an approach, employ a technique, acquire a skill, dismiss an idea or justify one’s existence, are rather likely to be used and at the same time encountered more often in a more academic context or at least a certain, rather academic discourse, like the more legal terminology in bear witness, serve a sentence, settle a dispute or exercise discretion.

lose sleep, bear witness, serve a sentence, realise a potential, adopt an approach*, clear one’s throat, strike a blow, employ a technique, settle a dispute, acquire a skill, spread one’s wings, exercise discretion*, steal someone’s thunder, dismiss an idea, justify one’s existence*, kick one’s heels, sustain an injury
*) Acceptance scores from all teenage groups score below chance (<50 %)
Box 6.6: Items of Academic Acceptance (L1)

On the other hand, collocational pairs which do not seem very concerned with predominantly academic concepts also occur with this kind of subpattern, like lose sleep, clear one’s throat, strike a blow, spread one’s wings, steal someone’s thunder, kick one’s heels or sustain an injury. Even though these collocations are not classic examples of academic or discourse specific language, they are, however, very literary examples of language use, which all have a more common way of phrasing, like for example (not) worry, cough, support, become more independent, get attention, having nothing to do or suffer 11.

11 These potential paraphrases are taken from the respective entries in the OALD 7 (Hornby 2005). From a lexicographic point of view, it is interesting to note that only two are listed under the entry of their noun-collocate (lose sleep, kick one’s heels), while the rest occur under the entry for their verb-collocate (clear one’s throat, spread your wings, steal someone’s thunder, sustain an injury). Strike a blow, however, is listed under strike in the OALD 7, while the current online version shows it as a subentry for blow. To a certain extent this distribution seems rather arbitrary, since, for example, kick could be seen to be as closely associated with heels as spread is with wings. Especially for the intended target group - learners of English - this might be an unfortunate and at times confusing choice, since, as Hausmann (1985: 121-122) already pointed out over 30 years ago, for the purpose of decoding a text, an entry under the collocator (here the verb-collocate) is vital.

Therefore, this discrepancy in positive evaluation between teenagers and academically educated adults might, in fact, originate in the observation that a more diverse, discourse specific vocabulary is not available to all native speakers and only develops with further education and/or a certain linguistic experience throughout adulthood. A similar phenomenon has been expressed by Herbst and Klotz (2003: 145-147) with the concept of probabemes. In general, they describe probabemes as a preferred way of phrasing by native speakers of a language compared to the linguistic realisation of the same concept in another language. A time frame of about 180 days, for example, is usually referred to as six months in English, while German native speakers would rather use ein halbes Jahr (lit.: “a half year”); the very similar half a year, of course, is used in English as well but it occurs with a much lower frequency 12.

12 In the BNC the string half a year occurs 46 times, while six months is used in 3866 cases.

The data at hand shows that in language acquisition too, native speakers of English appear to be aware of these distributions. Therefore, some collocational pairs are accepted at a relatively late stage within the acquisitional process. This not only suggests that the process of language acquisition continues well beyond the age of 16 but also that probabemes, in the shape of a preferred way of phrasing, can be observed not only within lexicographic or corpus data but also through contrasting different stages within first language acquisition. So, while CollMatch works well for the assessment of advanced vocabulary proficiency, it has to be treated carefully when applied to other purposes: if even young native speakers of English are unable to identify acceptable English collocations correctly, it is questionable whether a student at an intermediate level should be treated as less proficient if s/he struggles with certain, more academic collocational pairs.

A second implication concerns the exclusiveness of collocational expressions. As some of the examples from this group have shown, collocations need not only to be evaluated in terms of their associational strength but also to be checked against other, linguistically more likely ways of expressing a concept before they gain a high status, such as inclusion in a dictionary or course book.

6.4 Non-Native Speakers

Non-native speakers’ level of proficiency differs from native speakers’ performance. This observation is, however, neither surprising nor necessarily new.
Gyllstad’s pilot study (2007: 168) yielded similar results, and it has repeatedly been pointed out by researchers from various backgrounds that, when it comes to phraseological phenomena, even advanced learners of English are less competent than native speakers. Among the first to explicitly comment on this phenomenon were Pawley and Syder (1983) with their description of the “puzzle of native-like selection and native-like fluency”. Howarth (1996: 160) too found that, compared to native speakers of English, non-native speakers struggle to understand the fine line between acceptable deviation and restrictions on commutability. He assumes that a learner’s approach towards collocations differs from that of the native speaker, who, furthermore, has of course spent considerably more time immersed in the target language. Granger (1998: 158) likewise finds that learners of English produce a plethora of non-native combinations and are furthermore in most cases not aware of salience among different variants of a collocation (Granger 1998: 152-154). In the same vein, Nesselhauff (2004) concludes that over 50 % of the collocations L2 participants used in a corpus study were either wrong or deviant. She also stresses that learners seem to be less aware of restrictions and use collocations in a more creative way (Nesselhauff 2004: 237-24). However, to conclude that advanced learners of English are simply not very aware of collocational restrictions would be too simplistic.
For, as de Cock (2004) points out, closer analysis of phraseological data could reveal that, like other prefabricated items, the phenomenon of collocation also “[…] displays a complex picture of overuse, underuse, misuse of target language NS [native speakers’] sequences and use of learner idiosyncratic sequences.” (de Cock 2004: 243) Therefore, this chapter will take a closer look at the development of a non-native speaker’s collocational proficiency in order to determine whether this variation prevails in learners’ receptive knowledge about collocations.

As far as group make-up is concerned, the group of non-native adult learners of English is fairly similar to the adult native speakers from the previous chapters. The data set consists of 87 participants with an average age of 21.87. They were all studying English at undergraduate level at the time of testing and have German as their mother tongue. All participants had passed an entrance level test 13 and could thus be regarded as advanced learners of English.

13 The data was collected at the Friedrich-Alexander University Erlangen-Nürnberg. An entrance level test is compulsory for anyone who wishes to study English there. (FAU Sprachenzentrum 2007)

Unlike the results from the adult native speakers of English, the adult learners’ overall CollMatch performance is less clear (Appendix II). However, a comparison of individual items with the native evaluation shows that some collocational pairs are indeed accepted on a native speaker level (table 6.3). Allowing for a variation of plus-minus five percent (which equals about four to five participants who differed in their evaluations), there are only 13 combinations, six items and seven distractors, which receive similar scores in both adult groups. This number increases to one-third once the level of deviation is raised to plus-minus 10 %.
Then, 34 phrases, 17 items and 17 distractors, are roughly evaluated in the same way by adult native and non-native speakers of English.

± 5 %
  items: break news, pay attention, suffer damage, spread one’s wings, clean windows, grab a hold
  distractors: claim trade, stretch a regard, shake a smile, impose success, swing a secret, roll a look, fill an aim
± 10 % (additional)
  items: make a move, supply one’s assistance, give a speech, catch fire, clear one’s throat, grant permission, express a worry, launch a campaign, acquire a skill, justify one’s existence, snap one’s fingers
  distractors: restore a favour, gather a matter, sink speed, win one’s memory, score problems, rush rank, knock a concern, pack an affair, sit seed, fall a failure
same pattern
  items: fit the bill, clear one’s throat, grant permission, express a worry, spread one’s wings, jump a queue
  distractors: pick a glance, sink speed, stick one’s mood, knock a concern, charge respect
Table 6.3: Overview of items with the same acceptance pattern or a similar acceptance score in adult L1 and L2 evaluations

Interestingly, this group of items does not correspond with any of the groups which belong to one of the acceptance patterns identified in chapter 6.3; instead they range from items which are readily identified by all native speakers independent of their age, like make a move, supply one’s assistance, give a speech or pay attention, to gradually developing items, like catch fire, clear one’s throat, grant permission, launch a campaign or spread one’s wings, up to items with a Peaked Acceptance, such as express a worry, justify one’s existence, snap one’s fingers or grab a hold. Furthermore, only eleven phrases - six items and five distractors - share the same acquisitional pattern as the teenagers’ evaluation. Additionally, 19 items 14 were evaluated below chance by the advanced learners, while adult native speakers identify them as collocations with a close-to-ceiling acceptance score.
They are also distributed throughout all patterns except Gradual Acceptance. This result corresponds with Granger’s (1998) findings that even advanced learners of English are not able to identify the most salient items. Overall, this first overview further suggests that, for most of CollMatch’s items, L2 speakers do not seem to share the same patterns of acquisition when it comes to collocational proficiency. In general, advanced adult learners also seem more reluctant to accept collocations, since all collocational items except three 15 score lower average acceptance compared to the native speakers. This tendency, however, changes with the pseudo-collocations. Here, the advanced learners accept phrases more easily. Combinations like pick a glance, fetch an illness, drag a limit, stick one’s mood, lay pressure, charge respect or hit approval show a difference of 20 % or more. Overall, these divergent evaluations suggest that even relatively advanced learners of English have a more restricted knowledge compared to native speakers of the same age and educational level. These findings are in line with previous research by Howarth (1996), Granger (1998), Nesselhauff (2004) and de Cock (2004), who already demonstrated that non-native speakers of English seem to be less accurate and more creative in their use of collocational combinations. At the same time, a graph of the CollMatch items sorted according to their level of positive acceptance by advanced non-native learners shows that, as in the adult native speakers’ data, most collocations in general range higher than the pseudo-collocations (Appendix II). However, only 50 of the 70 collocations were accepted by more than half of the L2 adult participants, so, as a consequence, about 30 % of the collocations seem to be accepted by chance.
This compares with the corresponding native speaker data set, where only one item - the rather problematic afford an opportunity (> 6.1) - was not accepted by the majority of test takers.

14 These items are: lose sleep, draw a breath, say grace, serve a sentence, pull a face, run a bath, assume responsibility, cut a corner, fly a flag, fit the bill, perform a miracle, beat eggs, assess damage, ride a storm, jump a queue, steal someone’s thunder, dress a wound, lend support, sustain an injury.
15 These items are: do justice, gain ground, grab a hold.

Overall, the results from the data set of adult non-native speakers show that the L2 learning process of collocational items only partially seems to operate along the same lines as for native speakers of English. This tendency manifests itself within the L2 pseudo-longitudinal data. A comparison between the different L2 age groups shows a more heterogeneous distribution of different levels of acceptance across the groups. The analysis still yields patterns of Gradual Acceptance, Peaked Acceptance, Steady Acceptance and Receding Positive Evaluation within the data set of non-native speakers of English, but items which show one pattern within the group of native speakers only rarely display the corresponding pattern when evaluated by non-native speakers (compare table 6.4 for items with the same pattern in both groups). Furthermore, L2 groups produced one additional pattern: Dented Acceptance. As with Peaked Acceptance, there is a general upward tendency in acceptance scores, but while year 9 lies above year 5 and year 11 in items of Peaked Acceptance, for Dented Acceptance the year 9 evaluations fall below the other teenagers’ scores. Furthermore, unlike adult native speakers of English, L2 advanced learners are at times outperformed by one, or even more, younger age groups.
Hence, there seems to be not only a trend of Academic Acceptance but also a kind of Academic Decline at work: items which have been more or less gradually accepted by young learners but are then strongly rejected by the adult advanced learners (> 6.4.6). Similar to the analysis of L1 evaluations, the following chapters will first present the L2 speakers’ evaluational patterns (> 6.4.1-6.4.5), before chapter 6.4.6 provides a short summary of these findings.

6.4.1 Pattern 1: Gradual Acceptance

Within the data for second language learners, Gradual Acceptance (GA) seems to be the most frequent pattern. About 41 % or a total of 29 collocations generate this type. Of these, just four also occur in the L1 group: clear one’s throat, grant permission, spread one’s wings and drop hints. An overview of all items which show a pattern of Gradual Acceptance can be found in box 6.7. Here as well, a tolerance of plus-minus 5 % was allowed for. However, the majority of items are rather clear cases of Gradual Acceptance.

do justice, raise objections, give a speech, pay attention, reach a conclusion*, assume responsibility*, suffer damage, realise potential, gain ground, perform a miracle*, clear one’s throat, employ a technique*, grant permission, acquire a skill, deliver a speech*, spread one’s wings, steal someone’s thunder*, clean windows*, shift gear, justify one’s existence, cast a vote*, snap one’s fingers*, file a report
5 % over: fit the bill*, press charges, kick one’s heels
5 % under: lose sleep*, drop hints
*) At least one teenage group outperforms the adult evaluations.
Box 6.7: Items of Gradual Acceptance (L2)

A closer look at these items further suggests that, while similar in pattern, the group of items with Gradual Acceptance in second language acquisition varies a lot from the collocations which share this pattern in the native speaker data.
Some of the differences are almost to be expected, while others yield interesting results about the status of collocations in second language learning. First of all, most items from box 6.7 show a pattern of Peaked Acceptance within native speakers’ data, which is not very surprising since this pattern is the most frequent one in chapter 6.3. But there are also five items 16 which were evaluated with the same steady acceptance score by all native speakers. In the native teenagers’ data this pattern is comparatively small, so these five items make up almost half of the whole pattern. Compared to non-native learners, native speakers nonetheless have a head start of at least six years, with daily exposure to the target language. Thus, it might not be very surprising that learners are not as familiar with these items as native speakers would be. There is, however, one collocation which scores with a close to ceiling acceptance throughout all L2 groups except year 5: pay attention. This brings it close to a native-like pattern of Steady Acceptance. The fact that pay attention yields relatively high acceptance levels is remarkable, because this collocation could be regarded as a kind of classroom phrase. Essentially, these are phrases which are used quite often in a classroom context in order to structure and organise a lesson, give instructions or simply interact within a classroom situation. High positive acceptance throughout all datasets, with the exception of the youngest group of learners, supports the frequency of input hypothesis, since this is one of the phrases students might experience on an almost daily basis; yet reluctance among non-native speakers in year 5 indicates that it takes over one year to fully internalise and recognise these phrases.
Of course, it might also be possible that the performance of students in year 5 is skewed by the predominantly spoken input they received throughout their first years of English language training and that, were CollMatch an oral test, this group might have performed better on (usually) familiar items such as pay attention. But even then, the pattern of evaluation for this item shows that, at least to a certain extent, students towards the end of year 5 still struggle to combine oral and written input. This might be an argument in favour of a combination of both auditory and visual exposure to a language from an early stage onwards.

16 These items are: give a speech, pay attention, cast a vote, kick one’s heels and lose sleep.
17 These items are: perform a miracle, spread one’s wings, steal someone’s thunder, clean windows, shift gear, kick one’s heels and lose sleep.

A second difference is the accumulation of fairly abstract, almost academic data in box 6.7. With the exception of eight combinations 17, all items which become increasingly accepted by non-native learners either serve some kind of discourse or discussion-structuring function, like raise objections and reach a conclusion, or could be seen as a rather elaborate way of describing some kind of action, like employ a technique or file a report. However, only eight of these items can be found with a tendency of Academic Acceptance among the native speakers, which suggests that there might be a growing focus on this kind of vocabulary, if not in general, then at least at this particular school. Last but not least, this group of adult learners performs in a more heterogeneous way than their native-speaking counterparts.
While in chapter 6.3 this pattern always continued with a clear, often close to ceiling, evaluation by the adult native speakers, here, for twelve items at least, teenagers from year 11 score with higher accuracy than students of English at university level. This suggests that the attainment of collocational proficiency in L2 is a more heterogeneous process than in L1. While native speakers seem to progress more or less gradually towards adult proficiency, advanced learners of English should not necessarily be regarded as the most proficient group among all L2 groups, at least not for every item. Furthermore, items which are (with a difference of over 30 %) more readily accepted by students of English at university level are very rare within this data set. Only raise objections, shift gear and kick one’s heels display this pattern of Academic Acceptance (> 6.3.6). Hence, it could be argued that while there seems to be slow but continuous development for learners within the same institution (such as a school), not all institutions progress along the same lines, which then might lead to a rather mixed picture at university level.

6.4.2 Pattern 2: Peaked and Dented Acceptance

With a total of four items (about 6 %), collocations with an acquisitional pattern of Peaked Acceptance (PA) make up only a small fraction of non-native speakers’ patterns. One of these items, shrug one’s shoulders, occurs with a peak in year 9 within both datasets. But since peaked patterns (PA and PR) account for 50 % of the collocations among native speakers, one match could also be expected to occur by chance. Furthermore, unlike in the native speakers’ evaluation, not every item’s score increases further with the adult data. Catch fire and throw a party show a slight decrease in non-native speakers’ acceptance.
In addition, none of the items with a pattern of Peaked Acceptance present a clear case of Academic Acceptance, but three out of four items score close to ceiling among year 9 and the adult group of non-native speakers. This indicates that the items from the pattern of Peaked Acceptance, even if similar in general shape, behave very differently to items with the same pattern in the native speakers’ dataset.

PA: break news, catch fire*, throw a party*, shrug one's shoulders*
DA: have a say, make a move*, hold meetings*, run a bath*, adopt an approach*, settle a dispute*, commit a sin*, jump a queue*, exercise discretion*, pursue a career*, dismiss an idea*, sustain an injury*, cease fire*
*) Difference of 10 % points or over between evaluations from year 9 and 11.
Box 6.8: Items of Peaked Acceptance and Dented Acceptance (L2)

Moreover, learners of English produce a pattern which, despite an overall gain in general acceptance, shows declining acceptance scores in year 9. This pattern of Dented Acceptance (DA) does not occur in the native speaker data, whereas 19 % of CollMatch’s collocations show this pattern in the non-native teenagers’ evaluations (graph 6.6).

Graph 6.6: Example for Dented Acceptance (DA) - L2 acceptability rating for make a move

Here, the discrepancy between year 9 and year 11 reaches up to 47 % points and is thus considerably larger than for the items with a positive peak. Yet the overall acceptability scores tend to be smaller; including adult evaluation, only two items, make a move and hold meetings, reach scores of over 80 %. In addition, only half of the items show a peaked pattern in the native dataset, and similar to the pattern above, three items were evaluated with a steady acceptance score by native speakers of English. Thus, it almost seems as if young learners in year 9 react in precisely the opposite way to their native speaker counterparts.
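The pattern labels used in this section can be illustrated with a small sketch that sorts one item's teenage acceptance scores (years 5, 9 and 11) into a shape. This is not the procedure used in the study: the function name, the two thresholds and all example scores are assumptions made for illustration only, and the adult data is deliberately left out.

```python
def classify_pattern(y5, y9, y11, steady_tol=10.0, bump=2.0):
    """Label the shape of one item's teenage acceptance scores (in %).

    steady_tol and bump are hypothetical thresholds chosen for this sketch;
    the study itself does not state them in this form.
    """
    scores = (y5, y9, y11)
    if max(scores) - min(scores) <= steady_tol:
        return "StA"  # roughly flat across all three groups
    rising = y11 > y5  # overall trend from the youngest to the oldest teenagers
    if y9 - max(y5, y11) > bump:
        # year 9 clearly above both neighbouring groups: peaked
        return "PA" if rising else "PR"
    if min(y5, y11) - y9 > bump:
        # year 9 clearly below both neighbouring groups: dented
        return "DA" if rising else "DR"
    return "GA" if rising else "DEC"


# Invented scores, not taken from the CollMatch data.
print(classify_pattern(40, 30, 77))  # dent in year 9 with overall gain
print(classify_pattern(50, 70, 60))  # peak in year 9 with overall gain
```

With these assumed thresholds, a year 9 score that differs from a neighbouring group by only two points does not count as a peak or a dent, so such an item falls into one of the gradual patterns.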
But evaluations also vary considerably within the adult group, resulting in seven items for which the adults outperform the younger native speakers, but also two collocations which receive about the same score from year 11 and adults, as well as a further three where year 11 scores considerably higher 18 than the grown-up evaluators. Compared to the previous pattern, the items in box 6.8 are also interesting from a semantic point of view. While Gradual Acceptance seemed to attract items with a more abstract, academic application, collocations like break news, throw a party, have a say or jump a queue could be regarded as more common in daily conversation. This would also explain why a comparatively high percentage of items could be found which develop with Gradual, Peaked or even Steady Acceptance in first language acquisition. The relatively high variation among non-native speaker data suggests that learners of English struggle with these items or are at least less certain when it comes to their correct evaluation.

6.4.3 Pattern 3: Steady Acceptance

6 % of CollMatch’s collocations occur with a pattern of Steady Acceptance (StA), which, compared to patterns of GA, is again a relatively small ratio. Furthermore, none of them are rated above 60 %. Ride a storm and abandon a ship are even evaluated below chance by all of the young learners (box 6.9).

launch a campaign, afford an opportunity, ride a storm, abandon a ship
Box 6.9: Items of Steady Acceptance (L2)

For students in year 5 this is, of course, not very surprising, since the youngest group of teenage learners who have undergone a total of three to five years of very limited exposure to English can only be expected to have a more restricted vocabulary and thus more difficulties in identifying reoccurring phrases, such as collocations. But the fact that the acceptance scores remain steady throughout the years suggests that neither input nor formal teaching increased the learners’ familiarity with these items.
Thus, students who accept these items have encountered or learned them elsewhere, for example on TV or through other media. In the case of the items which scored below average, this could of course also simply imply that some participants had a lucky guess. Especially for ride a storm this seems to be the case, since adult non-native speakers as well remain under 40 % in their evaluation. Afford an opportunity ranges even lower among the most advanced learners of English, which, as has already been pointed out earlier, is very much in line with native speakers’ tendency to reject this item (> 6.1). Launch a campaign and abandon ship, however, gain with the age of the test taker. But, while abandon ship only increases by a further 20 %, launch a campaign soars up to 91 % of adult acceptance, a plus of 34 % points compared to the item’s evaluation in year 11. This makes launch a campaign a further example of Academic Acceptance. Overall, the low acceptance scores suggest that in contrast to the L1 group, this pattern does not stand for basic, early fused collocations, but rather subsumes items which in most cases were difficult for all learners irrespective of their age or educational level.

18 Up to 30 % for the evaluation of run a bath.

6.4.4 Pattern 4: Receding Positive Evaluation

Among CollMatch’s items, a total of 30 % show a general decline throughout all groups of German native speaker teenagers. Furthermore, as described before, the trend towards more items with a dent in year nine continues (Dented Recession, DR). As pointed out in chapter 6.4.2, this development is precisely the opposite of the English native speakers’ data, where acceptance scores peaked in year 9 (PR).
While in the L2 dataset, there is only one item with this pattern (meet a need) and only four items with a general Gradual Decline (DEC) (say grace, fly a flag, dress a wound and lend support), 15 collocations are least accepted by the age group of 15-year-olds. Meet a need, however, has a difference of only 2 % points between year 5 and year 9, so it could actually be regarded as a fifth item of Gradual Decline, which then makes peaked patterns (PA and PR), the most frequent sub-pattern from chapter 6.3, virtually nonexistent. This, once again, indicates that unlike English native speakers, learners with a German background tend to be more cautious where native speakers are more confident in their positive evaluations. In general, there is not a single item in box 6.10 which shares a similar pattern in both the L1 and the L2 datasets.

DEC: say grace*, fly a flag**, dress a wound, lend support*
PR: meet a need, bend a rule
DR: draw breath, bear witness*, serve a sentence**, keep pets**, pull a face*, set an example*, play a trick*, cut a corner**, push one's luck*, strike a blow*, beat eggs**, assess damage**, blow one's nose*, challenge a view, grab a hold*
*) Adult acceptability scores jump back to or even exceed the initial score from year 5
**) Acceptability scores from year 5 remain the highest evaluation for this pattern
Box 6.10: Items with an overall tendency of receding positive evaluation (L2)

Likewise, adult data does not continue with consistent decline or growth. In fact, there are two general tendencies to be observed. The first one presents itself as a jump in adult data, back to a similar or even higher level than the initial score from year 5, after year 9 or year 11 showed a rather reluctant acceptance. At times, this rather drastic increase happens in year 11.
Thus, three instances of Academic Acceptance can be found within this constellation: bear witness, blow one’s nose and grab a hold. Meet a need is clearly favoured by adult non-native speakers, while younger learners evaluate this item well below chance. Two other items, set an example and play a trick, reach close to ceiling acceptance in year 5 and the adult group. Yet it is difficult to see why blow one’s nose and grab a hold are not better known among young learners of English. Set an example and play a trick, on the other hand, might again be regarded as classroom phrases. Thus, it is not unexpected that they receive rather high scores straight away. More surprising is the fact that year 9 is hesitant to accept these phrases as English combinations. Since this behaviour can be observed for patterns with generally growing as well as declining acceptance scores, this could indicate that there is a certain stage of second language learning in which students tend to accept only the items they are sure about or were explicitly taught, like the phrases from 6.4.1, while other combinations tend to be rejected.

Furthermore, there are some instances in which adult data remains at a rather low score or at least below year 5’s evaluations. In these cases, the youngest students seem to be the most confident. Some of these evaluations, however, could be regarded as a lucky guess, since none of the groups score above chance (as for example with serve a sentence, pull a face, fly a flag or assess damage). Challenge a view only displays a slight variation as well, with a maximum of 7 % points discrepancy between groups. Therefore, it could instead be considered an item which is accepted steadily, though not particularly well. In their evaluation of keep pets, cut a corner and beat eggs, on the other hand, students from year 5 seem to be more confident.
A potential reason might be that these phrases are closely related to their everyday experience, or that these collocations exist as a direct translation in German. In fact, at the time of test taking, the classroom of this youngest group of English learners was covered with posters, titled “My pet”, a project this class did as part of their English lessons. The task was to design a poster about the habits, preferences and pastime activities of their own or a fictional pet. It is very likely that the collocation keep a pet was used quite often throughout this sequence. So, it is not surprising that keep pets was recognised by the majority of the class, which also suggests that this recent exposure raised the item’s salience and frequency. As for cut a corner and beat eggs , these phrases incorporate both features: they could be regarded as basic vocabulary and have a direct equivalent in German. This makes them very likely to be recognised by learners of English at a very early stage of their formal language training, while advanced learners tend to reject them because they seem too German to be correct (Glass 2010). This reservation among older L2 speakers can also be observed for draw a breath . Here as well, most students from year 5 are ready to accept this combination, while acceptance scores stay below chance for the rest of the nonnative speakers’ groups. However, a literal translation does not correspond to the German “einatmen” (literal translation: to in breath), although the combination ‘draw’ plus ‘breath’ occurs in the compound “Atemzug” (literal translation: breath draw). If most advanced learners of English tend to reject this combination, it might be due to the fact that, despite the difference in word class and combinatorial order, draw breath is still considered to be too close to the German “Atemzug” and thus rejected in order to avoid a potentially unidiomatic, German translation. 
Say grace, on the other hand, could easily be regarded as another classroom phrase, but it scores below chance for all non-native groups except the youngest learners. The reason could be that the tradition of a prayer at the beginning of a school day 19 is less practiced at one school or simply introduced using another, synonymous phrase like say a prayer or let’s pray. The fact that this particular group of young students seems to be familiar with this phrase might be because their English teacher uses this exact phrase and, even more importantly, taught this class in at least one of the first sessions throughout the school year.

19 Even if this might contradict the idea of secularism, to begin a school day with a short prayer is still a practiced, yet voluntary, tradition in many German schools.

6.4.5 Distractors

In general, the picture box 6.11 presents is not very different from native speakers’ distribution of pseudo-collocations across all patterns. In the L2 data as well, most distractors (63.3 %) tend to be gradually rejected, though admittedly with less clear scores compared to L1 (> 6.3.6). 36.7 % of the pseudo-collocations show an evaluation which suggests an overall rising acceptance across the teenage groups. Yet, three out of these eleven items are supply one’s assistance, lay pressure and express a worry. As the discussion of these items in section 6.1 has demonstrated, these combinations should actually be classified as collocations rather than distractors. Among the remaining eight items with overall rising acceptance, a further five do not, or only barely, reach an acceptance score above chance within any of the groups, which therefore could be regarded as cases of general rejection. Restore a favour, fetch an illness and pick a glance, on the other hand, score above chance, at least among the oldest teenagers (year 11).
The level of acceptance is not particularly high for any of these items, yet it might be that some non-native speakers confused restore with return, fetch with catch and pick with take. This would in fact support Granger’s (1998) and Nesselhauf’s (2004) findings, which claim that learners of English are less aware of collocational restrictions. But, since participants were not encouraged by the test’s design to explain their answers, this implication would need further investigation 20.

20 It would also be interesting to see whether phonetic similarity (fetch vs. catch) or semantic similarity (pick vs. take) causes more confusion among learners of English.

GA: supply one's assistance, lay pressure, fill an aim
DA: turn a reason, restore a favour, fetch an illness, express a worry, hit approval, fall a failure
StA: pick a glance, claim trade
REC
DEC: stretch a regard, shake a smile, gather a matter, sink speed, swing a secret, score problems, roll a look, rush (a) rank, knock a concern, pack an affair
PR: stick one's mood, charge respect
DR: drag a limit, win one's memory, impose success, rule an award, bind blood, stand an occasion, sit seed
Box 6.11: CollMatch’s pseudo-collocations according to pattern (L2)

Still, the majority of distractors show an overall tendency towards declining acceptance scores. Furthermore, it is interesting to see that two of the main tendencies identified throughout the L2 evaluations of collocations pertain to the pseudo-collocations. One is the observation that learners seem to develop their collocational proficiency gradually, since also among the distractor items, the pattern of Gradual Decline is the most prevalent with 33 % of the items.
Compared to the L1 data set which had Peaked Recession as the predominant pattern for most pseudo-collocations, this suggests that while both L1 and L2 are in most cases able to distinguish collocations from distractors, L2 learners do not seem to pass a more analytical stage which makes them more susceptible to any structurally and logically correct combinations. In fact, quite the opposite is the case: while the majority of items in the L1 dataset show a peak of acceptance for participants from year 9, L2 test takers at roughly the same age are quite reluctant and more cautious in their evaluations. This is true for collocations as well as pseudo-collocations. The number of distractors which are misclassified by the majority of the group (acceptance score above chance at >50 %) is also lowest for year 9. Here, only six pseudo-items receive above chance acceptance scores from the group of 15-year-old L2 evaluators, two of which are in fact the misclassified collocations supply one’s assistance and express a worry. For years 5 and 11, this number lies at eleven and nine respectively.

6.4.6 Summary

In her 1998 study on the phraseological skills of French EFL learners, Granger reaches the rather bleak conclusion that “[…] learners’ phraseological skills are severely limited […]” (Granger 1998: 158). As chapter 6.4 has demonstrated, it is certainly true that L1 speakers of English outperform their non-native peers when it comes to their general collocational proficiency (CollMatch scores). Yet, compared to earlier studies which concerned themselves with collocational proficiency and phraseological language production from a more general point of view, this chapter was also able to show that the collocational proficiency of non-native speakers of English develops as well and that, furthermore, this development seems in certain aspects similar to native speakers’ attainment patterns.
As table 6.4 indicates, the patterns of Gradual, Peaked, and Steady Acceptance, as well as items with Receding Positive Evaluation, re-occur in the pseudolongitudinal L2 evaluations. But, while Peaked Acceptance has been one of the most frequent patterns among L1 speakers, young learners in year 9 seem to be more cautious in their evaluations. Only 6 % of CollMatch’s collocations show this pattern. On the contrary, learners in year 9 tend to be the most reluctant group of young EFL learners. This results in a new pattern: Dented Acceptance. Similar to the pattern of Peaked Acceptance among L1 speakers, Dented Acceptance occurs equally frequently among collocations and pseudo-collocations alike, which indicates that this reluctance is a general trait of non-native collocation attainment and not simply a sequence of misclassifications by one particular group.

Patterns                        Collocations (n=70)    Pseudo-collocations (n=30)
Gradual Acceptance              41 %                   10 %
Peaked Acceptance               6 %                    n/a
Dented Acceptance               19 %                   17 %
Steady Acceptance               6 %                    3 %
Receding Positive Evaluation    30 %                   70 %
Table 6.4: Patterns identified in CollMatch based on each L2 group’s acceptance scores

The pattern of Academic Acceptance can also be observed in the L2 dataset. Another 8.6 % of the items reach a greater acceptance within the group of academically trained adults than among younger learners. Thus, they fall within the pattern of academic acceptance (box 6.12). But only adopt an approach and dismiss an idea are collocations which show the same pattern among native speakers. As has been observed for the other patterns so far, the remaining five items are either initially accepted by all native speakers or gradually gain more positive acceptance. From this, it could be deduced that, over time, items like have a say or raise objections become part of most native speakers’ language use, while remaining advanced or even specialised items within the L2 learning process.
have a say*, raise objections*, meet a need*, adopt an approach*, launch a campaign, dismiss an idea*, shift a gear*
*) acceptance from all teenagers below chance (<50 %)
Box 6.12: Items of Academic Acceptance (L2)

On the other hand, the items of academic acceptance in the L2 learning process are often paired with a slight tendency towards gradual acceptance within the teenagers’ data. So, the high acceptance among advanced learners of English for these items could be less an indicator for the predominantly academic use of collocations, like have a say or raise objections, and instead indicate that advanced learners of English are more experienced language users. But as the pattern indicates, the distance between young learners at secondary school level and advanced adult learners at university is quite striking for some items. One year before they leave school, most of the L2 learners of English are not very confident about collocations like have a say, raise objections, meet a need or shift a gear. This changes at university level. Of course, raise objections or meet a need are phrases most learners are confronted with once they start discussing issues on a more advanced level or have to write term papers or essays, but neither of these tasks is restricted to academia. On the contrary, the respective curricula require dialectic analyses and text production of non-fictional texts from as early as year 9 onwards (ISB 2004). Thus, an explanation of this restricted familiarity with more ordinary items might be that students of English tend to focus on the target language inside and maybe even outside academic seminars and lectures. On the other hand, items which yielded the pattern of academic acceptance within the native speakers’ datasets range from gradual acceptance to a score below chance in the non-native speakers’ evaluation.
This shows, once again, that there is no real correspondence between the type of patterns in the L1 and L2 data, which might, of course, be due to the fact that even though the two groups are similar in age and set-up, they do not have the same exposure to the English language, nor do they receive the same input. While it is relatively difficult to measure the actual extent of exposure to the English language for each and every participant, the relatively different distribution of patterns at least suggests that it is not at all authentic for some items. This view is further supported by another pattern, Academic Rejection, which can only be found in the non-native data set of this study. It suggests that items like assume responsibility, perform a miracle and sustain an injury - while accepted by the teenage test takers more or less gradually - display a strong tendency to be rejected 21 by advanced EFL learners (box 6.13).

21 For all items within the pattern of Academic Rejection the evaluation from academically trained adult non-native speakers of English lies below chance. Furthermore, a discrepancy of at least ten percentage points lies between the adult L2 test takers and the oldest group of teenage EFL learners.

Collocations: lose sleep, run a bath, assume responsibility, fit the bill, perform a miracle, jump a queue, sustain an injury
Pseudo-collocations: turn a reason, restore a favour, fetch an illness, fill an aim, fall a failure
Box 6.13: Items of Academic Rejection (L2)

This pattern can be observed in 12 % of the items. Interestingly, it mixes collocations and pseudo-collocations. Thus, while the rejection of pseudo-collocations, like turn a reason, restore a favour and fill an aim, is absolutely justified, advanced learners of English also appear to reject collocations with which the younger learners became more and more familiar.
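The two adult-related criteria mentioned in this chapter can be written down as simple checks. The following is a sketch under assumptions rather than the study's own procedure: the function names are hypothetical, the 30-point gap for Academic Acceptance is taken from the description in 6.4.1, the conditions for Academic Rejection follow footnote 21, and the direction of the ten-point discrepancy (adults below year 11) is an interpretation.

```python
CHANCE = 50.0  # chance level in %, as used for "below chance (<50 %)"


def is_academic_acceptance(year11, adult, min_gain=30.0):
    """Adults accept the item over 30 % points more readily than year 11."""
    return adult - year11 > min_gain


def is_academic_rejection(year11, adult, min_gap=10.0):
    """Adult score below chance and at least ten points under year 11.

    Assumes the discrepancy in footnote 21 means adults score lower.
    """
    return adult < CHANCE and year11 - adult >= min_gap


# launch a campaign: 91 % adult acceptance, 34 % points above year 11,
# which implies a year 11 score of 57 %.
print(is_academic_acceptance(57, 91))

# Invented scores for a hypothetical item, not taken from the data.
print(is_academic_rejection(62, 41))

# Box 6.13 lists 7 collocations and 5 pseudo-collocations out of 100 items.
print(100 * (7 + 5) / 100)
```

The last line reproduces the share of 12 % reported for Academic Rejection from the item counts in box 6.13.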
In a longitudinal study, this would mean that learners seem to forget some of the collocational items. Within a pseudo-longitudinal study, inferences like this are of course problematic. Furthermore, this pattern does not occur in the native speakers’ dataset, which suggests that this phenomenon is largely not caused by some kind of language attrition. It might instead be furthered by the fact that even though the four groups of non-native speakers are to a certain extent fairly similar in terms of regional and educational background, the data still comes from different speakers. This is, however, also true for the native speaker data sets, so the fact that a pattern like this exists in one group but cannot be traced in the other might indicate that while the L1 language acquisition process to a certain extent follows a consistent pattern, the affiliation to an individual learner group or cohort seems to play a more important role in second language learning. This might also explain why, even within the more consistent pattern of gradual acceptance, there are individual data sets which from time to time stand out and do not conform to the pattern’s initial shape. Just like their native-speaking counterparts, the group of advanced non-native speakers correctly rejects most of the pseudo-collocations. Apart from the items which already have been discussed as gradually accepted or (academically) rejected, most pseudo-collocations score below chance with 22 out of 30 items even showing a pattern of Gradual Decline. However, again, the pattern is not as consistent as within the native speakers’ data sets. Only seven items 22 conform to the initial definition of a gradually declining acceptance from age group to age group. 
The remaining 15 items feature either a rather high - even though mostly still below chance - evaluation from students in year 11 or a strong rejection by students in year nine which lies not only below the acceptance level of year five but also below the evaluation of year 11. Thus, this is a further indication that, even though general learning effects occur within the L2 learning process, they seem to depend much more on individual groups than the general age of the test takers. To a certain degree, this effect is to be expected, since English is learnt within formal education 23 in the German educational system, while the possibilities to encounter authentic language are much more diversified in an English native-speaking environment. This might result in a more balanced input as opposed to L2 learning, where the input depends to a large degree on a small number of people - teachers - and often restricted media in the shape of course books and some selected books or films. Therefore, in order to test how much influence the degree of immersion in an L2 might have, the next chapter will contrast the year 5 students from this chapter, who receive a rather traditional English-as-a-subject education, to students who are part of an immersion programme at school and would thus be expected to be exposed to the target language more often and in a more authentic way.

22 These items are: stretch a regard, shake a smile, sink speed, swing a secret, score problems, rush rank, knock a concern and pack an affair.

6.5 Effects of Schooling

The findings of the last chapter suggested that to a certain extent native speakers and learners of English might show similar developmental patterns when it comes to the acquisition of collocational proficiency, though not for the same items. One of the reasons for this discrepancy could be a difference in input.
Of course, the prerequisites for language attainment for L1 and L2 speakers of a language are traditionally very different. While native speakers are literally immersed in their mother tongue, the traditional classroom for English as a foreign language (EFL) in German schools consists predominantly of a teacher and an average of 24.3 students 24 who learn a language as a subject in its own right (LS). As a consequence, these learners often experience the English language as a subject matter rather than a medium of communication. Immersion programmes (IM), on the other hand, try to recreate a more authentic learning experience by taking students’ L2 as the language of instruction for all (total immersion) or selected subjects (partial immersion) (Wode 1995: 60-65; McLaughlin 1978: 150-153).

23 Unlike in other European countries, German media is relatively English-free, since all movies and TV programmes, even if not produced in German, tend to be dubbed. However, students are able to encounter a certain amount of authentic English language through radio, internet or original versions of movies or TV programmes.
24 This ratio is taken from the German education report (Bildungsbericht 2014: 265). It refers to the average number of students per class in German secondary schools. According to this report, the estimated student-teacher ratio lies at only 14.3 (Bildungsbericht 2014: 83). This number is however based on the allowed average number of children per class, the general number of lessons taught by all teachers, and the number of lessons taught to all classes within the respective stage of education. Thus, this ratio also includes smaller, more specialised groups which occur for example in subjects such as physical or religious education. Since English can be regarded as one of the main subjects, it is very likely that groups within an EFL classroom are closer to the average class size and therefore considerably larger. This is also supported by students’ reports, such as Plötzgen (2003: 42-43). For primary schools the average class size is 20.8 while the student-teacher ratio is 16.6 (Bildungsbericht 2014: 83 and 265).

One of the notable pioneering studies in this field was Lambert and Tucker’s (1972) pilot study at St. Lambert, Canada. Reacting to the needs of English-speaking parents in Montreal, they designed a concept which was radically based on the idea of total immersion. The project was highly successful, and soon other total immersion programmes followed; first in North America and later in Europe 25. In Germany, the Claus-Rixen elementary school in Altenholz, Kiel was the first state school to run an early English-German bilingual programme. In their report, Lambert and Tucker (1972: 204) claim that their experimental group reached native speaker level for the receptive competence of their L2 after only three to four years of early 26 total immersion. Yet it was not until year 4 that this group was also able to perform on a native-speaker level in a vocabulary test (Lambert / Tucker 1972: 148). Similarly, Swain and Lapkin (1982: 82) confirm that immersed children can reach native-like L2 proficiency in reading and listening and even out-perform their native-speaking peers after three to four years of total immersion. But they also admit that “[…] late immersion students appear to remain well below those of francophone [the target language’s] comparison groups, even after several years of immersion.” (Swain / Lapkin 1982: 82)

25 For a comprehensive overview compare for example Garcia (2009: 159-216), Wode (1995: 90-127), Genesee (1987: 1-26, 116-131), Cohen and Swain (1976).
26 The children at St. Lambert started their total immersion programme in Kindergarten (one year) and would therefore be regarded as early immersed (age 3-7), whereas programmes starting between the age of 7 and 10 would be considered delayed immersed, and immersion programmes from the age of 11 onwards are referred to as late immersed (age 11-13) or secondary immersion (age 14 or later) (Wode 1995: 60-61).

In a more recent set of studies, Zaunbauer and Möller (2008) were able to demonstrate that bilingually and monolingually trained students in their first and second years of elementary school progress equally well as far as their L1 reading and writing skills are concerned. Yet these students achieve better results when it comes to mathematical skills and English L2 vocabulary. However, in their third and fourth year, immersion students also outperform their regularly schooled peers in a test of general learning competence (Möller / Zaunbauer 2008). These findings, however, focus on children’s general ability to comprehend or produce the target language and not on phraseological phenomena like collocations.

According to Lambert and Tucker (1972) or Swain and Lapkin (1982), children who have been immersed early into a language can reach L1-like proficiency after only 3 to 4 years when it comes to L2 reception. Therefore, children who attended immersion programmes from an early age on (for example in Kindergarten) should have reached near-native competence in their L2 by the time they attend year 5. As has been mentioned earlier (> 5.4.2), data from partially immersed children was collected for this study in order to determine whether immersion programmes might also have a positive influence on a learner’s collocational proficiency. To compare children who have been taught English via regular English-as-a-subject lessons with partially immersed learners as well as native speakers of English, data from CollMatch was collected in two additional classes: one with a concept of partial immersion from year 5 on (IM 2) 27 and another which attended a so-called preparation programme in order to prepare for a subsequent immersion class from year 7 onwards (IM 1) 28.
However, the IM 2 group falls into two subgroups with 12 children who started the immersion programme in year 5 and a further three German native speakers who had already attended an immersion class during their time at elementary school. These children spent at least three successive years in partial immersion programmes and are therefore the only ones who could be expected to show considerable effects as far as their receptive L2 skills are concerned (Swain / Lapkin 1982: 82). In addition, in this particular class, a further eight students whose L1 is not German came from the same elementary school and therefore spent the same time in a partial immersion programme. Thus, a total of eleven children form the group used to compare the effects of early partial immersion (IM 2b) with the results of students after one year of late immersion (IM 2a). In order to find out whether immersion programmes have any considerable effect on the native-like collocational proficiency of young learners of English, data from GB L1 children was compared to these groups’ data.

[27] The immersion programme at this school is based on a partial immersion concept. The class still has regular language-as-a-subject lessons, with two additional lessons per week and two other subjects (for example Biology, Science or Geography) taught exclusively in English. On this programme the school accepts students who have either been part of early immersion programmes in their elementary school or who feel capable of following English-only lessons for biographical reasons.
[28] At this school years 5 and 6 are regarded as preparatory years in which students receive one additional English lesson. Furthermore, they focus on biological and historical topics. From year 7 onwards there are additional immersion lessons in Biology and History, but the subjects as such are still taught in German.

In addition, CollMatch scores from two regular LS classes (REG 1, REG 2) were
added. Here, REG 1 is in fact the same year 5 as in the previous chapter (> 6.4), while REG 2 consists of a group of German native speakers from another year 5 class from the same school as IM 2. All L2 classes are situated in the south of Germany. Table 6.5 provides a more detailed overview.

Value | REG 1 | IM 1 | REG 2 | IM 2a | IM 2b | Year 7 (L1)
Participants | 21 | 18 | 19 | 12 | 11 | 47
Mean | 43.8 | 34.83 | 31.3 | 35.3 | 46 | 64.8
s. d. | 14.1 | 14.13 | 11.7 | 9.4 | 9.8 | 11
Maximum | 56 | 65 | 58 | 46 | 59 | 90
Minimum* | 4 (21) | 15 | 18 | 19 | 28 | 44
Kurtosis | 1.93 | -.05 | -.22 | -.61 | -.66 | -.62
Skewness | -1.60 | .82 | .93 | -.85 | -.50 | -.02
Quantile 25 | 27 | 26 | 23 | 21 | 40 | 57
Quantile 50 | 48 | 30 | 27 | 39 | 49 | 64
Quantile 75 | 53 | 44 | 42 | 40 | 53 | 73
* In cases where the lowest score lies more than 10 points below the next highest number of points, this second lowest score is given in brackets.
Table 6.5: Overview of group results from CollMatch in year 5 (Germany) and year 7 (Great Britain)

As expected, the young native speakers of English produce by far the best results. Their average score, as well as their scores in all three quantiles, lies above the results of all EFL learner groups from year 5. IM 2b, the group of students who took part in an early immersion programme, achieves the highest CollMatch scores among young L2 test takers. Yet REG 1, one of the groups with a regular English-as-a-subject setting, scores equally well. IM 1 and IM 2a, on the other hand, achieve results which lie about eight percentage points below this class. A one-way ANOVA reveals that these groups are indeed statistically different, but as table 6.6 shows, a post hoc Games-Howell test indicates that this difference (marked with “yes” in table 6.6) refers particularly to the results of the GB group compared to the GER groups.

Group | GER yr. 5 (REG 1) | GER yr. 5 (IM 1) | GER yr. 5 (REG 2) | GER yr. 5 (IM 2a) | GER yr. 5 (IM 2b) | GB yr. 7 (L1)
GER yr. 5 (REG 1) | x | | | | |
GER yr. 5 (IM 1) | no | x | | | |
GER yr. 5 (REG 2) | yes | no | x | | |
GER yr. 5 (IM 2a) | no | no | no | x | |
GER yr. 5 (IM 2b) | no | no | yes | no | x |
GB yr. 7 (L1) | yes | yes | yes | yes | yes | x
Table 6.6: Comparison of group results from CollMatch in year 5 (L2) and year 7 (L1)

Here, British students perform statistically differently compared to their German counterparts. While this might be less surprising for the children from regular English-as-a-subject classes (REG 1 and REG 2), early partially immersed children do not reach a native-like level either, at least as far as receptive collocational proficiency is concerned. At the same time, children who learn English in a non-immersed setting (REG 1) seem to be able to reach a level which is similar to that of their partially immersed peers when it comes to the correct identification of collocational pairs in their L2. Interestingly, the REG 2 group scores statistically lower than the other three groups for the German year 5 participants. This result suggests that when it comes to receptive collocational competence, regular English-as-a-subject lessons can be as successful as some immersion concepts. Of course, further research on participant-specific as well as contextual factors could shed more light on potential influencing factors. A promising line of research might therefore be to investigate the didactic as well as the methodological quality of teaching. Since, especially in early years, language can hardly be the only source of instruction, most immersion classes tend to be structured according to the principles of task-based learning, multimodal approaches, and differentiated instruction. Therefore, an LS classroom which follows the same principles might be as successful, at least in terms of collocational proficiency.
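The pairwise comparisons summarised in table 6.6 rest on an unequal-variance (Welch-type) statistic, which the Games-Howell procedure then refers to the studentized range distribution. As a rough illustration, the following sketch recomputes only that statistic and its approximate degrees of freedom from the rounded means, standard deviations and group sizes printed in table 6.5. It is not the study's own analysis: the function names are hypothetical, no p-values are derived, and results based on rounded summary values can only approximate those obtained from the raw scores.

```python
from math import sqrt

# Summary statistics (mean, standard deviation, group size) copied from table 6.5.
GROUPS = {
    "REG 1": (43.8, 14.1, 21),
    "IM 1": (34.83, 14.13, 18),
    "REG 2": (31.3, 11.7, 19),
    "IM 2a": (35.3, 9.4, 12),
    "IM 2b": (46.0, 9.8, 11),
    "GB yr. 7 (L1)": (64.8, 11.0, 47),
}


def welch_from_summary(a, b):
    """Return (t, df) for an unequal-variance comparison of two groups.

    Hypothetical helper: each argument is a (mean, sd, n) tuple.
    """
    (m1, s1, n1), (m2, s2, n2) = a, b
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of the means
    t = (m1 - m2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pairwise(groups):
    """Compute the statistic for every unordered pair of groups."""
    names = list(groups)
    return {
        (x, y): welch_from_summary(groups[x], groups[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


if __name__ == "__main__":
    for (x, y), (t, df) in pairwise(GROUPS).items():
        print(f"{x} vs {y}: t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions, the comparison of REG 1 with IM 2b gives an absolute t below 1, whereas REG 1 against the British group gives an absolute t above 6, which matches the direction of the “no” and “yes” entries for these pairs in table 6.6.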
Furthermore, data from this pseudo-longitudinal study indicates that, even in an English-as-a-subject classroom, German learners of English gradually develop a better understanding of L2 collocations, even if, overall, older teenagers in year 11 still remain below an age-adequate native speaker level. Nevertheless, formal language learning seems to have a general effect on collocational proficiency, even though, as Nesselhauf (2004), Bahns (1997) or de Cock (2003) point out, there is no particular focus on native-like phraseology in most L2 classrooms. In contrast to more general studies on student assessment (OECD 2014, 2010), the educational background of the test takers does not seem to be very influential here. The DESI study (DESI-Konsortium 2008) also found that parents’ level of education seems to be less influential for a student’s English skills than for his / her mathematical competence. For the study at hand, a Welch’s test (Welch 1947) has been conducted which suggests that children of parents with a higher educational level [29] do not score statistically better than children from a lower educational background. This is the case for all but two groups of participants: adult, advanced learners and LS children from year 5 (REG 1). It indicates that, especially at the beginning of secondary and tertiary education, parents’ educational background seems to play a statistical role. Later this influence seems to level off, but it is difficult to tell whether this is because similar input through formal education eventually produces more similar results or because students from less educated family backgrounds drop out of the system.

6.6 Summary and Implications

This chapter sought to demonstrate that when it comes to language attainment, the level of acceptance of collocational proficiency differs not only from age group to age group, but also depends on the collocational combination in question.
So, while it might be possible for all collocations to be acquired through the same processes of language attainment, not every collocation enters a speaker’s inventory at the same acquisitional stage. For some, this process might even last up to early adulthood (Academic Acceptance). However, in general, the level of collocational proficiency of adult advanced learners of English seems able to reach that of L1 teenagers, but, while there is only a slight improvement across the three groups of L1 teenagers, the proficiency of young EFL learners improves in the five-year span of this analysis. This might, of course, be due to the fact that unlike native speakers of English, these young L2 learners in year 5 are just about to start learning English, while even the youngest L1 test takers from year 7 have been immersed in the English language for over ten years. In fact, data from early immersed EFL learners (> 6.5) suggests that even students on a partial immersion programme (IM) can produce statistically better results than their peers who have only experienced regular language-as-a-subject classes (LS). However, it is important to stress that similar results can be obtained in a regular LS setting if lessons are taught according to modern, student-centred methods.

[29] This information was obtained through a form on linguistic background which was part of this study’s questionnaire (> 5.4; Appendix I). Participants were asked about the highest level of education completed by one of their parents or caretakers. Based on these answers, students were grouped into participants with at least one parent with a higher level of education (i.e. any university degree including medical and law degrees as well as doctorates) or participants with parents with a lower educational background (i.e. GCSE, A-level or any equivalent qualification).
As far as acquisitional patterns in the L1 and L2 data set are concerned, it is interesting to note that most patterns can be observed in both groups. There is, however, one notable exception when it comes to teenage test takers in year 9: at this age, native speakers seem to be willing to accept a VP + NP combination more readily, even if it is not an established collocational combination, as long as both parts form a semantically plausible combination (Peaked Acceptance). This trend is almost in line with the DMCDC-model, which - based on Wray and Perkins’ (2000) stage model - suggests a stage of rather synthetic, item-focused language attainment for collocations. EFL learners of the same age, on the other hand, display precisely the opposite behaviour: for one fifth of the items, they are the most reluctant group of L2 test takers, which indicates that L2 learners differ with respect to this stage from their native-speaking peers. There could be two reasons for this: either a stage of general reluctance is also part of the L1 attainment of collocations, but at an earlier stage, or, L1 and L2 speakers simply behave differently during one stage within the acquisition process. The first assumption is, however, rather unlikely. Collocations from CollMatch seem to represent all three stages of the DMCDC -model, from more holistically stored combinations (Steady Acceptance) to Gradual Acceptance and a kind of mixed pattern consisting of a relatively open acceptance of combinations in year 9 and Gradual Acceptance in years 7 and 11 (Peaked Acceptance). Furthermore, overgeneralisation, like the tendency to accept and produce more regular, seemingly rule-governed structures, is a phenomenon which can be observed in other areas of first language acquisition, like past tense morphology or lexical concepts (Bybee 1995; Plunkett / Marchman 1991; Rescorla 1980; Anglin 1977). 
Thus, one might speculate that L1 and L2 speakers treat a similar stage differently: while native speakers tend to accept VP + NP combinations for structural and semantic but not phraseological reasons, EFL learners are more reluctant the more they learn about the lexis and structure of a language. Since the data from this study comes from a pseudo-longitudinal set-up, an implication like this is of course highly speculative, but it would be interesting to conduct a longitudinal follow-up study which includes EFL learners with various L1s to see whether this observation holds and is part of any process of collocational L2 attainment, irrespective of the linguistic background of the learner. Moreover, even though CollMatch’s patterns seem to be similar to a certain extent, the individual items which fall under the respective patterns differ. Thus, it seems that while the process of collocational attainment might be similar in L1 and L2, the input which is responsible for early acquisition (Steady Acceptance) or a later, gradual rise (Gradual Acceptance or Academic Acceptance) is not. This raises the question of whether more authentic, native-like input would result in more native-like proficiency, but also whether more native-like and less academically focused input should be a desirable aim for EFL teaching in the first place. Of course, if the preparation for an academic career is the foremost goal of secondary education, a focus on elaborate, academic vocabulary and phraseology should be one of the main objectives of any EFL classroom. But as a modern lingua franca, the purpose of English for most students will presumably lie in everyday conversation for personal and business reasons.
Thus, a curriculum with a strong focus on academic application might partly defy the purpose of English for the majority of EFL learners, especially since even native speakers seem to struggle with more academic VP + NP combinations (Academic Acceptance). Therefore, as Dąbrowska (2012, 2004) as well as Dąbrowska and Street (2006) point out, the native speaker as a unified concept might not exist, which would imply that current EFL teaching adheres to a goal that only a specific group of native speakers achieve.

7 CollJudge

People communicate via sentences, seldom via isolated words. Consequently, people’s understanding of the meaning of sentences is far more reliable than their understanding of the meaning of words. Their intuitions about the definitions of the words they utter and understand are fragmentary at best. (Miller 1999: 4-5)

One of the defining criteria of collocations is their in-between status on the spectrum of flexibility and fixedness. They are neither fully fixed compounds nor completely arbitrary but display certain combinatorial restrictions which, under certain circumstances, can be interpreted more freely (> 1; 3). As the previous chapters showed, these restrictions depend not only on the relative frequency of co-occurrence but also on age-related factors (> 6). Hence, it could be argued that collocations display a level of synchronic as well as diachronic variability. This chapter intends to explore the role contextual factors play in a speaker’s perception of established collocations and their more creative alternations. Like chapter 6, it also includes age and linguistic background as further variables. In order to see whether these factors prevail under different circumstances, the items under investigation were modified with respect to ‘creativity’ and ‘context’. To make comparison possible, the focus will, once again, lie on language perception tested through item evaluation.
In contrast to CollMatch, the items in this second test consist of whole sentences (table 7.1). On the one hand, this makes a clear focus on collocations more difficult since a sentence might be rejected because of phrases or word choices which lie outside the combination under investigation. On the other hand, this extension is one of the few ways of including context as a factor. Several precautions were taken to avoid as much deviation from the actual focus of the test as possible, such as the distribution of test items across four versions of the test, distractor items and distractor tasks (> 5.4.2). Furthermore, CollJudge should not be regarded as an independent test, but rather, as its name suggests, as an add-on to CollMatch’s results (> 6).

Table 7.1: Items in CollJudge in their four different variants (original version of each item in bold print)

As has been mentioned before, test items from CollJudge take the shape of sentences. There is a total of fifteen items. Nine of these sentences contain collocations from CollMatch, which, based on a pre-test, were selected with a maximum amount of pattern variety in mind. So, of course, all four patterns from chapter 6 and also the meta-pattern of Academic Acceptance are represented by at least one item. More than half of these items, however, contain collocations which showed a pattern of Peaked Acceptance. One reason for this is that Peaked Acceptance turned out to be one of the most frequent patterns among CollMatch’s items within English native speakers’ evaluations; a result which was unexpected but which at the same time seems to be in line with the DMCDC-model (> 4.4). Furthermore, these five items resulted in five different patterns within the data set of German native speakers. Thus, they also serve the purpose of examining whether a change in context might actually also lead to a different evaluation by non-native speakers of English.
Commit a sin has deliberately been changed to commit a crime to make the contrast with its less frequent variant commit a mistake clearer and less religiously marked. In addition, two new VP + NP combinations, lose one’s job and cook meal, were included. Lose one’s job was chosen because, in its abstract form, it contains a one’s-construction and could also occur in the variation lose one’s work. In this combination, it would then form a direct translation of the German “Arbeit verlieren”. Since some of these direct equivalents were rejected by more advanced learners of English (> 6.4.4), this item might help us to find out whether this effect prevails once more context is added. The second addition also serves the purpose of finding out more about one-to-one German translations, since the process of preparing tea in German is often expressed with “Tee kochen”, which directly translates into cook tea. However, make tea would be the more idiomatic phrasing. Of course, cook tea is also used by some native speakers of English, but only if tea does not simply refer to a hot beverage but is extended to cover a whole meal. So, while lose work and cook tea might be combinations which are both rejected in either context by advanced learners of English because of their similarity to German phrases, the evaluation of adult native speakers of English could be expected to be based on different factors. For lose work, as for all the other less frequent variations of collocations, a level of semi-fixed constructions or context might influence native speakers’ evaluations. With cook tea, on the other hand, it might also be a question of an individual’s sociolinguistic background. Thus, it would be interesting to see if a rather homogeneous group, like teenagers from a local school, either clearly rejects or predominantly accepts this combination, or whether more educated speakers of English from different backgrounds fall into specific groups.
Up to this point, the test was only concerned with [VP + NP] constructions, but, as outlined in chapters 1 and 2, collocations can take the shape of various word class combinations. Hence, to see if the observed patterns and tendencies prevail throughout different formal combinations, four [Adj+N] constructions were included as test items. Their selection was largely based on combinations which have already been used as examples for Adj+N collocations in different publications (Herbst 2010: 128-133; Herbst 1996: 379-389; Palmer ²1981: 75-79; Leech 1974: 19-20). To test whether all four variants of CollJudge (> 5.4.1.2) indeed come from statistically similar groups, table 7.2 gives the respective means and standard deviations for the test’s two adult subgroups in terms of correct CollMatch scores. All groups share a similar mean and standard deviation. An ANOVA across the four variants for each group confirms that the differences between them are not statistically significant. This indicates that, as Cowart (1997: 79-84) suggested, all groups lie on a similar level when it comes to their collocational proficiency. Therefore, the results from all four variants of CollJudge, even if they are obtained from different speakers, can be regarded as comparable.

Group | GB adult (L1) | GB adult (L1) | GB adult (L1) | GB adult (L1) | GER adult (L2) | GER adult (L2) | GER adult (L2) | GER adult (L2)
Variant of CollJudge | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4
Participants | 22 | 21 | 19 | 24 | 25 | 20 | 22 | 20
Mean | 91.2 | 89.5 | 90.7 | 89.8 | 68.8 | 68.4 | 66.8 | 71.3
s. d. | 5.5 | 5.2 | 4.7 | 4.4 | 8.1 | 10.4 | 10.5 | 10.4
Table 7.2: CollMatch results according to the four test variants of CollJudge for adult L1 and L2 participants

Thus, similarly to chapter 6, the subsequent pages (> 7.1-2) will present results from English and German native speakers and analyse them against the background of the previous chapters.
Chapter 7.3 then compares these results with corpus data and tests whether one of the frequently used association measures (> 5.1) corresponds to these findings, which could then be regarded as a cognitively relevant index. Chapter 7.4 will bring together the most central results from these three chapters and discuss their implications for a more general conception of first and second language acquisition and learning.

7.1 Native Speakers

As has been outlined above, for this study the group of 86 adult English native speakers falls into four subgroups, which, due to randomisation on the day of test taking, are of irregular size. Thus, the largest subgroup consists of 24 participants, while the smallest only contains 19. From a statistical point of view, this is, however, a large enough sample to be able to extrapolate statistical results (table 7.2). Since there was only half the number of participants in each children’s group, this does not hold for the developmental data from the younger test takers. Subgroups here vary from 18 to seven students per group. This unequal distribution was not intended, but because L1 students come from various linguistic backgrounds and - as has been outlined previously - the definition of native speaker chosen for this study was deliberately quite a narrow one (> 5.4.2), the number of participants was reduced when the data was prepared. Since only children with at least one English native speaking parent counted as native speakers, tests had to be removed from the database, which resulted in quite a drastic reduction of answers for some subgroups. Therefore, only the adult data can be presented here with statistical, quantitative analysis, while data from the younger groups of native speakers can only be interpreted as general tendencies. For both datasets, however, qualitative feedback was also collected (> 5.4.1.2) which will be used to put the quantitative data on firmer and more comprehensive ground.
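The comparability check across the four test variants (table 7.2) can be retraced approximately from the published means, standard deviations and subgroup sizes alone. The sketch below computes the one-way ANOVA F ratio from such summary values. It is an illustration under stated assumptions rather than the analysis actually run for this study: the function and variable names are hypothetical, the input values are the rounded figures from table 7.2, and no p-value is derived.

```python
def anova_from_summary(means, sds, ns):
    """Return (F, df_between, df_within) for a one-way ANOVA on summary data.

    Hypothetical helper: takes per-group means, standard deviations and sizes.
    """
    k, total_n = len(means), sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Variation of the group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Variation inside the groups, recovered from the standard deviations
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, total_n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Rounded values copied from table 7.2 (CollJudge variants 1 to 4).
GB_ADULT = dict(means=[91.2, 89.5, 90.7, 89.8],
                sds=[5.5, 5.2, 4.7, 4.4],
                ns=[22, 21, 19, 24])
GER_ADULT = dict(means=[68.8, 68.4, 66.8, 71.3],
                 sds=[8.1, 10.4, 10.5, 10.4],
                 ns=[25, 20, 22, 20])

if __name__ == "__main__":
    for label, data in (("GB adult (L1)", GB_ADULT), ("GER adult (L2)", GER_ADULT)):
        f_value, df1, df2 = anova_from_summary(**data)
        print(f"{label}: F({df1}, {df2}) = {f_value:.2f}")
```

For both adult groups the F ratio computed this way stays below 1, that is, the variation between the four variants is smaller than the variation within them, which is consistent with the reported absence of a statistical difference.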
In general, the adults’ evaluations produce three patterns which can be found for more than one item: Preference of Established Variants (EST), Overall Acceptance (OA) and Contextual Acceptance (CONTEXT) [1]. In fact, most items fall under either of these patterns, except for two: serve a sentence and heavy rain. The following chapters will first discuss these four groups based on the English L1 test takers (> 7.1.1-7.1.4). A review will then summarise these results and consider their implications (> 7.1.5).

[1] See Appendix V for the z-transformed values of each item within the respective age group.

7.1.1 Pattern 1: Preference of Established Variants

If collocations are regarded as “typical, specific and characteristic relations between two words” [2] (Hausmann 1985: 118), it is to be expected that in a test like CollJudge otherwise structurally acceptable sentences which contain an established collocation are preferred over rarer or even creative variations. In fact, Glass (2010) showed that native speakers can not only detect whether a native or non-native speaker of a language wrote a text, but also the level of this person’s command of a language based on phraseological cues alone. Thus, it comes as no surprise that one-third of CollJudge’s items were only accepted in their established form (like pull a face) while a more creative alternation, like pull a smile, is rejected by the majority of adult English evaluators. As presented in graph 7.1, the z-transformed acceptance scores [3] of these items are fairly high for the simple as well as the more complex sentences which contain an established version of a collocation. Sentences with a more creative alternation, however, score relatively low, independent of the context in which they occur.

[2] German original: „[…] typische, spezifische und charakteristische Zweierbeziehung von Wörtern […]“ (Hausmann 1985: 118)
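To make the notion of z-transformed acceptance scores more concrete, the following sketch standardises a set of ratings and adds the constant of 1.5 which is used for the pattern graphs in this chapter. Everything beyond that constant is an assumption made for illustration: the ratings are invented, the function names are hypothetical, and the source does not specify here whether the sample or the population standard deviation was used or over which exact set of ratings the transformation was computed.

```python
from statistics import mean, stdev

GRAPH_OFFSET = 1.5  # constant added to the z-scores for the pattern graphs


def z_transform(ratings):
    """Express each rating as its distance from the mean in standard deviations.

    Assumption: the sample standard deviation is used.
    """
    m, s = mean(ratings), stdev(ratings)
    if s == 0:
        raise ValueError("ratings without any variation cannot be z-transformed")
    return [(r - m) / s for r in ratings]


def shifted_for_graph(ratings, offset=GRAPH_OFFSET):
    """Shift all z-scores by the same constant; their relative order is unchanged."""
    return [z + offset for z in z_transform(ratings)]


if __name__ == "__main__":
    # Invented ratings of one evaluator for eight sentences (hypothetical data).
    ratings = [5, 4, 2, 1, 5, 4, 1, 2]
    for raw, shifted in zip(ratings, shifted_for_graph(ratings)):
        print(f"rating {raw} -> {shifted:.2f}")
```

Because the same constant is added to every value, the shape of a pattern is preserved and only its position on the vertical axis changes.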
The five items which display this effect are raise objections, meet the need, run a bath, pull a face and cook the meal. Table 7.3 contains an overview of all suggested changes which have been made by English native speakers, sorted according to age. It furthermore indicates which evaluations proved to be statistical in a one-way ANOVA.

Graph 7.1: Example for Preference of Established Variants (PREF) - L1 acceptability rating for pull a face

[3] In all pattern related graphs in chapter 7, z-transformed acceptance scores were raised by 1.5. This does not change the overall pattern, but makes interpretation of the graph more intuitive.

It is interesting to see that items which establish a CollMatch pattern of Steady Acceptance and therefore seem to be associated at quite an early stage in language development, like run a bath and pull a face, retain a rather close relationship and do not allow for much change or variation. Of course, one could argue that there are not many options for variation in either case apart from those given, which puts run a bath and pull a face on the more restricted end of the spectrum of phraseological phenomena. This, however, does not mean that a comparison between the four variations is not possible. Furthermore, the more creative alternations of both collocations do not form ungrammatical or indecipherable sentences; on the contrary, there are positive evaluations for all variants. The example sentence for run a bath is even taken from an online forum and features in its original phrasing as run a tub (table 7.1). For pull a face / smile, one adult participant even explicitly refers to the fact that the use of pull in pull a smile “makes it sound fake” (table 7.3), which shows that this combination is seemingly interpreted against the background of its more established counterpart.
This can already be observed among the teenage evaluators, who also only comment on the creative variants of both items or at least mark [4] the respective combination.

[4] The cases where participants underlined or circled established or creative variants of a collocational combination but did not provide a more suitable alternative are still regarded as comments and therefore marked “[phrase]”.

Item | Age | e / s | e / c | c / s | c / c
Raise objections / reservations** | Adult | 1x questions, 1x concerns | - | 3x concerns, 2x questions, 1x doubts | 2x questions, 2x concerns, 1x have
 | Year 7 | - | 1x [phrase] | 1x concerns | -
 | Year 9 | - | - | - | -
 | Year 11 | - | - | - | -
Meet the need / want** | Adult | - | - | 7x need, 2x demand, 1x desire | 7x need, 1x want
 | Year 7 | - | - | 1x demand, 1x target, 1x [phrase] | 1x want
 | Year 9 | - | - | 1x want, 2x [phrase] | 2x need, 1x [phrase]
 | Year 11 | 1x needs | - | 3x need, 1x qualifications | 2x need
Run a bath / tub* | Adult | - | - | 6x bath, 1x fill | 2x bath
 | Year 7 | - | - | 2x bath, 1x [phrase] | 1x bath
 | Year 9 | - | - | 2x bath | 2x bath
 | Year 11 | - | - | 1x fill | 1x bath
Pull a face / smile** | Adult | - | - | 5x smiled, 2x gave, 1x face, 1x “pulled makes it sound fake” | 5x smiled, 2x face, 1x shot, 1x gave
 | Year 7 | - | - | 1x [phrase] | -
 | Year 9 | - | - | 1x showed | 1x cracked, 1x smile
 | Year 11 | - | - | 1x smiled | -
Cook the meal / tea** | Adult | 1x prepare | - | 5x make, 2x brew, 1x tea, 1x [phrase] | 3x make, 1x drink, 1x dinner, 1x “meaning dinner ok, used in some regions”, 1x [phrase]
 | Year 7 | - | - | 2x make, 1x [phrase] | -
 | Year 9 | - | - | 1x dinner | 1x boil
 | Year 11 | - | - | 3x make, 1x brew, 1x dinner | 1x make, 1x brew
* = p < 0.1; ** = p < 0.5
Table 7.3: L1 speakers’ qualitative evaluation of items with the pattern of Preference of Established Variants sorted according to age and variant

In general, however, the quantitative results are retained throughout the participants’ additional comments.
While the sentences containing the established variation receive virtually no comments from adult native speakers of English, sentences with creative alternations are corrected by most age groups. But even if these two collocations seem similar in structure and idiomaticity, the proposed changes concern different positions within the respective phrases. For run a bath / tub, the majority of suggestions affect the noun tub, which should be altered to bath in the eyes of most evaluators. Only two native speakers think of a change along the lines of fill a tub. Pull a smile, on the other hand, receives suggestions for the verb which result in a simple exchange, like gave or shot, or a rephrasing as in smiled. While run a bath and pull a face might be regarded as rather idiomatic and hence quite restricted in their potential variation of combinations, other items with a broader range of collocates also behave in a similar way. Raise objections and meet the need as well as their variants raise reservations and meet a want show a pattern of acceptance scores which clearly favours the established combinations independent of the complexity of the accompanying sentence. The differences between the evaluations are again statistical. Both items showed a pattern of Peaked Acceptance in CollMatch, which indicates that, apart from adult native speakers, especially children around the age of 14 judge these collocations rather positively. In their evaluation of different contexts and variations, participants of all ages agree that sentences with an established variation of a collocation are to be preferred over creative alternations. For raise objections, however, students from year 7 are insecure about an established combination in a more complex setting. This, once again, might indicate that the concept of ‘raising objections’ is a rather adult one, which comes with a fair amount of more elaborate context and is thus generally more difficult for younger speakers of a language.
In CollMatch the item also occurs with a pattern of Academic Acceptance. From a qualitative point of view, however, comments on sentences with established collocations very much resemble the feedback for the more restricted items, pull a face / smile and run a bath / tub. There are hardly any comments at all. Only the simpler version of the item, raise objections, receives two suggestions for an alternative noun, compared to the creative versions of this item, with six and four comments respectively. But, interestingly enough, almost all comments revolve around the same two lemmas: concern and question. So the corrections do not point to the established item objections (as was the case for run a tub, which got corrected to bath) but to a completely different combination, which seems to be the preferred phrasing. This suggests that, even though raise objections is a statistically significant combination 5, it does not appear to be the most salient combination that could have been chosen, at least for some of the adult English native speakers. Children do not explicitly express concerns about the creative variants, even though they express a degree of reluctance in their evaluations. One reason for this might be that, again because of the more abstract concept which raise objections / reservations covers, they feel a certain discomfort with the sentences without explicitly knowing why. Meet the need, on the other hand, behaves very much like run a bath and pull a face in all aspects of the evaluation. Quantitatively it shows an EST-pattern throughout all age groups, and its qualitative evaluation too resembles the feedback more restricted items received. Similar to run a bath or raise objections, the noun is the element which gets corrected in most of the comments. Furthermore, the item displays a tendency which can also be traced in the other items: the majority of comments refers to the simple, creative version of the item.
Thus, apart from the dominant pattern of Preference of Established Variants, there is also a slight tendency towards a more positive evaluation of creative variants if they occur with a more complex context. This suggests that context as well might play a role when it comes to the evaluation of more creative alternations of collocations. The last item within this pattern is cook the meal / tea. As with the other items, there is a statistically significant difference between the four variants. Moreover, the qualitative evaluation for this item is very similar to that of the other collocations which occur within this pattern. Here as well, sentences which contain established variants scored higher than their more creative counterparts and the item received all but one comment for sentences which contain a creative variation of an established collocation. Furthermore, this item is a good example of a collocation where native speakers suggested a correction by altering the verb or the noun. One participant even offered two solutions, while another explicitly explained that “tea meaning dinner ok, used in some regions”. Also, younger test takers exclusively comment on the creative variants of this item. Thus, the more region-specific meaning of tea as a synonym for meal does not seem to play a role here, nor do more educated test-takers seem very aware of this particular case of potential polysemy.

5 The item has a z-score of 110.07, a t-score of 12.26, a log-likelihood value of 442.52 and an MI of 6.33 (> 7.3).

7.1.2 Pattern 2: Overall Acceptance

Like CollMatch, CollJudge yields a pattern which shows general acceptance across all expressions. This time, however, this acceptance does not represent different age groups, but rather different variations of one and the same item. Therefore, this result can be interpreted as Overall Acceptance (OA) of an item, independent of context or even a variation within the collocation as such.
It could thus be argued that items which display this pattern might either be stored in both their more established and their creative variants, or have a level of collocational semi-fixed construction which supports the interpretation of more creative items, even within a less contextually rich environment. Data sets with this result are lend support, say grace, lose job, pretty girl and weak link. Only one of these sets yields a statistically significant p-value in an accompanying ANOVA, which means that despite slight fluctuations in evaluations, only the evaluations of the four variants from say (a) prayer / grace differ significantly from one another. Graph 7.2 shows again the z-transformed mean evaluation scores for the variants of an item, this time lend support / advice, which, since it represents a case of Overall Acceptance, does not show any significant differences across the four different sentences.

Graph 7.2: Example for Overall Acceptance (OA) - L1 acceptability rating for lend support / advice

At first glance, this might indicate that these four sentences are perceived as “the same”, which suggests that some kind of underlying constructional tier might exist. A look at table 7.4 indeed reveals a larger share of sentences for which no comment came from any particular age group. In fact, the comments as such are more scattered and, apart from pretty girl, which is commented on only once in its simple yet creative variant, all items received several corrections in different (or even all) parts of the data set. In the case of lend support, the more creative and contextually more complex variant received nearly twice as many comments as the other groups. These tendencies appear in the quantitative evaluation as well, but they are not as indicative of the examples as in the previous chapter.
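
As a rough illustration of the two statistics mentioned above, the following Python sketch z-transforms a list of ratings and computes a one-way ANOVA F statistic across the four sentence variants. The ratings are invented for illustration only. Applying the z-transformation to whatever list is passed in (for instance, all ratings of one participant) and treating the four variants as independent groups are both assumptions, since this section does not spell out the exact procedure used in the study.

```python
from statistics import mean, stdev


def z_transform(ratings):
    """Rescale ratings to mean 0 and sample standard deviation 1."""
    m, sd = mean(ratings), stdev(ratings)
    return [(r - m) / sd for r in ratings]


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ratings for the four variants of one item (not study data)
variants = {
    "e/s": [5, 6, 6, 5],
    "e/c": [6, 5, 6, 6],
    "c/s": [2, 3, 2, 3],
    "c/c": [3, 4, 3, 3],
}
f_value = one_way_anova_f(list(variants.values()))
print(f"F = {f_value:.2f}")
```

A p-value would additionally require the F distribution with (k - 1, n - k) degrees of freedom, which a statistics package can supply. Note that z_transform fails if all ratings in the list are identical, because the standard deviation is then zero.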
Interestingly, most items which were clearly rejected in their creative variant (> 7.1.1) did receive feedback on the noun of a combination while for lend support / advice changes are only suggested for the verb. Thus, it might be that even though the noun within a VP + NP or an Adj+N combination is not necessarily the semantically more autonomous element, as for example Hausmann (1984: 401) suggests, it could quantitatively lead to a more negative evaluation. However, this does not hold for the other items within this pattern.

Item Age e / s e / c c / s c / c
Lend support/ advice
Adult 1x provide 2x give, 1x offer 2x give, 1x offer 6x give
Year 7 - 1x give - 1x give
Year 9 - - - -
Year 11 - - 1x give -
Say (a) prayer/ grace*
Adult 1x pray - - -
Year 7 - 1x pray - 1x the grace
Year 9 1x pray, 1x grace 1x pray - -
Year 11 1x pray 1x grace - -
Lose one’s job/ work
Adult - - - -
Year 7 - - 2x job, 1x it was hard work -
Year 9 - - - -
Year 11 - - 1x Don't lose your job -
Pretty girl/ boy
Adult - - 1x handsome -
Year 7 - - - -
Year 9 - - - -
Year 11 - - - -
Weak link/ connection
Adult - 1x students 1x link -
Year 7 1x weakest - - -
Year 9 2x weakest - - -
Year 11 1x weakest - 1x link, 1x this is weak 1x individuals
* = p < 0.1; ** = p < 0.5

Table 7.4: L1 speakers’ qualitative evaluation of items with the pattern of Overall Acceptance sorted according to age and variant

While pretty girl seems to demonstrate that this observation can be extended to Adj+N combinations - even if it only receives one comment for the adjective choice in its simple, creative version - lose job and weak link get suggestions for an alternative noun as well. Admittedly, with lose job, none of the adult English native speakers find fault with any of the item’s variants and two of the four comments from the teenage evaluators could be regarded as not very convincing attempts at correcting the respective sentence.
Yet job, the more established variant of this collocation, is mentioned twice by young native speakers of English whereas the verb is not commented on at all. Weak link, as an example of an Adj+N combination, behaves in a similar way. Here, adults as well as children prefer to suggest alternatives for the noun. The only exception is the simple, established version of the item, where weakest is the only, recurring comment. This might look surprising at first glance, but the picture which is painted here suggests that cultural input influences language as well, particularly that of younger native speakers. “The weakest link” is a popular TV programme in the UK. Produced by the BBC, it aired nationwide between 2000 and 2012 (Culpeper 2005: 49-52; OALD online 6: ‘The Weakest Link’). Even though it is no longer part of the BBC’s programming, the show’s most popular catchphrase “You are the weakest link, goodbye!” is still embedded in Britain’s popular culture today and, of course, most of the 12 to 16-year-olds grew up with this show.

6 Quoted from the webpage of Oxford Learner’s Dictionaries.

Therefore, the premodification of link with the lemma weak might be much more established in its superlative, which could then cause a feeling of unfamiliarity with any other form of this combination. More context, on the other hand, might trigger a more specific, less TV-related picture, which would explain why this effect is only traceable in the less contextually-rich version of weak link. Initially, this item was not selected to test or demonstrate a media-related effect, but it is an interesting observation which, at the same time, indicates that the fact that an item had not been commented on at all could be regarded as an indicative factor, since even slight irregularities are spotted by the English native speaker participants.
Furthermore, it shows that even the oldest teenagers linguistically seem to take a less general perspective than older native speakers. This might suggest that younger native speakers tend to have a less comprehensive, abstract linguistic proficiency than their adult counterparts. Although this pattern has shown so far that not all perceived irregularities automatically lead to a decrease in native speakers’ acceptance, it has not answered whether it automatically follows that the items discussed above are proof of the existence of an underlying collocational construction with its own implications and meaning. Since the more creative alternations of each item occur in a large corpus like the BNC significantly less frequently (Appendix IV) than the respective established collocational combinations, it is not very likely that they are encountered as often or are even similarly entrenched. Thus, the relatively high overall acceptance scores for all variants of these items are possibly supported by some kind of semi-fixed construction. Only the data set for say (a) prayer / grace might be a notable exception. Like all other items from CollJudge, the test includes an alternative and, statistically speaking, less established combination, in this case: say grace. This combination, however, is not only slightly different since, compared to say a prayer, it does not contain the determiner a. In most cases, say grace would, in fact, be considered to be a veritable synonym rather than a creative alternation. Independent of its perception, the effect for these items remains the same: there is no difference between the evaluations of a more frequent and a statistically less established combination.
Cognitively, however, this might have a different implication: while a creative alternation could be regarded as a direct derivation from an established collocation, which is cognitively and / or constructionally linked to its origin, two equally accepted phrases which happen to mean the same thing, on the other hand, should instead be regarded as two individually stored collocations.

7.1.3 Pattern 3: Contextual Acceptance

To a certain extent, a slight tendency to favour sentences with more context can be observed in other items as well (> 7.1.1). There is, however, a subset of three collocations which show a very clear context effect, even for the creative variant of a collocation: a true pattern of Contextual Acceptance (CONTEXT). For commit a crime, drop hints and false teeth, all sentences containing an established combination are readily accepted while the simple, creative version is rejected. The more complex, creative version, on the other hand, then receives scores which are about as high as the evaluations of the established variants (graph 7.3).

Graph 7.3: Example for Contextual Acceptance (CONTEXT) - L1 acceptability rating for commit a crime / mistake

This indicates that, even if a combination alone tends to be rejected by most participants, it could still be accepted if the contextual setting is right. As pattern 1 has demonstrated, this observation does not hold for every more creative combination or any context. Yet, this pattern shows that there might be cases where it is not enough to simply consider a combination of an item per se, since context seems to be able to influence the evaluation of an item to such an extent that it regains complete acceptance.
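
The three patterns described so far can be restated as a small decision rule over the four mean scores of an item (established / simple, established / complex, creative / simple, creative / complex). The Python sketch below is only meant to make the verbal definitions concrete. The function name, the tolerance value and the use of a fixed threshold are hypothetical; the study itself relies on significance testing, not on a cut-off of this kind.

```python
def classify_pattern(e_s, e_c, c_s, c_c, tol=0.3):
    """Label four mean scores (e.g. z-transformed) with a rating pattern.

    tol is a hypothetical tolerance: scores that differ by no more
    than tol are treated as equally acceptable.
    """
    scores = [e_s, e_c, c_s, c_c]
    if max(scores) - min(scores) <= tol:
        return "OA"       # all four variants are rated alike
    others = [e_s, e_c, c_c]
    if max(others) - min(others) <= tol and c_s < min(others) - tol:
        return "CONTEXT"  # only the simple, creative variant is rejected
    if min(e_s, e_c) - max(c_s, c_c) > tol:
        return "EST"      # both established variants beat both creative ones
    return "OTHER"        # no clear pattern


# Invented scores for illustration (not values from the study)
print(classify_pattern(0.5, 0.4, -0.9, -0.8))
print(classify_pattern(0.4, 0.3, -1.0, 0.3))
```

With these invented scores, the first call falls under Preference of Established Variants and the second under Contextual Acceptance.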
Test takers’ feedback further supports this observation (table 7.5). Similar to items with a pattern of Preference of Established Variants, none of the participants commented on sentences which either represented a simple or complex variation of the item in its established form. The creative variants, on the other hand, receive suggestions for their simple as well as their complex realisations, but the corrections on the simple, creative sentence exceed the contextually more complex ones. However, not every sentence receives the same amount of correction or the same range of suggestions. Drop a hint is certainly the most conservative example. Not only do the only comments the item receives come from adult test takers, the correction as such consists of only one item as well, hint. Furthermore, this one suggestion corresponds with the more established collocate for drop.

Item Age e / s e / c c / s c / c
Commit a crime/ mistake**
Adult - - 13x made, 1x crime 1x make
Year 7 - - 3x made -
Year 9 - - 2x made, 1x crime -
Year 11 - - 3x made -
Drop a hint/ clue*
Adult - - 5x hint 3x hint
Year 7 - - - -
Year 9 - - - -
Year 11 - - - -
False teeth/ hair**
Adult - - 1x eyelash, 1x word, 1x teeth, 2x teeth 1x fake, 1x [phrase]
Year 7 - - - -
Year 9 - - - -
Year 11 - - - 1x wig
* = p < 0.1; ** = p < 0.5

Table 7.5: L1 speakers’ qualitative evaluation of items with the pattern of Contextual Acceptance sorted according to age and variant

Thus, it almost looks as if only academically trained native speakers of English find fault with the more creative combination of drop clues, but students at the age of 16 evaluate this item with the exact same pattern of Contextual Acceptance. Year 7 rejects both versions of drop clues and only regards sentences with an established combination as “English”.
So, even though not all of them seem to be able to articulate their doubts, adult as well as young speakers of English feel more comfortable with their evaluations if they come with sufficiently strong contextual backup. At first glance, this might appear somewhat logical and foreseeable, but at the same time this observation also shows that semantic prosody or an underlying semi-fixed construction is not always independent but might affect the interpretation of an item if it is supported by enough context or backed by a concrete setting. This also supports a more exemplar-based conception of L1 attainment (> 4.2.1). False teeth / hair presents a similar case. As with drop a hint / clue, the majority of comments come from adult test takers, but while for drop the only readily available collocate seems to have been hints, false has a wider range of potential collocational partners, so this effect of Contextual Acceptance can occur in more restricted but also rather open collocations. Moreover, it is not limited to one construction, since it is observable in VP + NP as well as Adj+N combinations. Last but not least, the clearest example of contextually dependent acceptance is represented by the data set of commit a crime / mistake: not only do the evaluations show the largest discrepancies between simple and complex sentences with a creative variation of the item, it is the simple, creative variant which receives the most comments, in stark contrast to the single comment on the creative, complex sentence. Furthermore, all age groups made suggestions for the potential correction of the simple, creative variation, which contained alternatives for the verb as well as for the noun collocate. As far as the question of the potentially stronger weight of certain collocates is concerned, there does not seem to be a preference for this pattern. Verbs as well as nouns are corrected, but there seems to be an item-specific focus.
While false hair and drop clues tend to be commented on with respect to their noun collocates, commit a mistake receives most (though not all) comments on its verb.

7.1.4 Other Patterns

Out of the 15 items within CollJudge, there are two which elude a classification according to any of the three patterns introduced in the previous pages: serve a sentence / apprenticeship and heavy rain / wind. In both cases, the longer version containing the more established form of the item seems to be less accepted than its shorter counterpart. This is particularly surprising for heavy rain / wind, since the majority of comments among adult and teenage evaluators alike focus on the adjective-collocate in the two versions containing the more creative version heavy wind (table 7.6). All L1 comments prefer to see it corrected to strong. Yet this is not mirrored in the adult participants’ quantitative evaluations, in which they seem to rate the more established combination with a contextually richer setting as low as the two sentences with the creative alternation.

Item Age e / s e / c c / s c / c
Serve a sentence/ apprenticeship**
Adult - 1x long sentences, 1x [phrase], 1x “doesn’t make much sense” 1x completed, 1x ["served" underlined] -
Year 7 - - - -
Year 9 - - - -
Year 11 - - 1x done -
Heavy rain/ wind
Adult - - 2x strong 2x strong
Year 7 - - - -
Year 9 - 1x it was very cold and wet 1x strong -
Year 11 - 1x strong -
* = p < 0.1; ** = p < 0.5

Table 7.6: L1 speakers’ qualitative evaluation of items with an unclear pattern sorted according to age and variant

In a similar vein, this is also true for serve a sentence / apprenticeship, where this established, contextually richer version is outperformed by sentences containing a less established combination. The explanation for these sentences’ low scores might lie in the test sentences themselves.
In the case of serve a sentence / apprenticeship, the sentence as it occurs in the corpus is in the shape of (33).

(33) BNC B7L 1520 It will also cut training needs at a time when people no longer want to serve long apprenticeships.
(34) He had served his sentence in London.

This intended wording thus seems to read more naturally in either a long or a shortened version. But while the shorter version gives less reason for the supposedly more established combination of serve a sentence to collide with the context (as in (34)), the discrepancies between collocational combination and context become most obvious once serve an apprenticeship is exchanged for serve a sentence in (33). So, in a way, this item seems to work and not work at the same time. On the one hand, similarly to the Contextual Acceptance items, it demonstrates that adult native speakers are indeed sensitive to contextual factors. On the other hand, since the supposedly most established combination is, in this case, the least acceptable one, the idea of contrasting established collocations and their more creative alternations does not catch on in this particular case. This explanation might also hold for heavy rain / wind, which was also taken from the BNC in its creative form (table 7.1). Yet, here, both the combination itself as well as the phrasing seem to be too odd for most L1 evaluators’ taste. Nevertheless, while it would of course have been desirable to have 15 fully functioning items in terms of test design, these two slightly skewed items are still important for the test because they indicate that adult native speakers are able to spot malfunctioning items, which in a way makes the results from the remaining three patterns (> 7.1.1-7.1.3) more reliable.
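
The established variants discussed in this section are treated as statistically significant corpus combinations; for raise objections, for instance, a z-score, a t-score, a log-likelihood value and an MI value are reported. As a hedged illustration of what such association measures capture, the following Python sketch applies common textbook formulas to a 2x2 contingency table. The frequencies are invented, the function name is hypothetical, and corpus tools may define the expected frequency differently (for example with respect to a collocation window), so the sketch is not meant to reproduce the values reported in this study.

```python
import math


def association_measures(o11, f_node, f_colloc, n):
    """Return MI, t-score, z-score and log-likelihood for one word pair.

    o11: co-occurrence frequency of node and collocate (must be > 0)
    f_node, f_colloc: overall frequencies of the two words
    n: corpus size
    All values passed in below are hypothetical.
    """
    # Observed 2x2 contingency table
    observed = [
        o11,
        f_node - o11,
        f_colloc - o11,
        n - f_node - f_colloc + o11,
    ]
    # Expected frequencies if the two words were independent
    expected = [
        f_node * f_colloc / n,
        f_node * (n - f_colloc) / n,
        (n - f_node) * f_colloc / n,
        (n - f_node) * (n - f_colloc) / n,
    ]
    e11 = expected[0]
    mi = math.log2(o11 / e11)
    t_score = (o11 - e11) / math.sqrt(o11)
    z_score = (o11 - e11) / math.sqrt(e11)
    # Empty cells contribute nothing to the log-likelihood sum
    log_likelihood = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
    return {"MI": mi, "t": t_score, "z": z_score, "LL": log_likelihood}


# Invented frequencies: 50 co-occurrences in a corpus of one million words
measures = association_measures(50, 1000, 500, 1_000_000)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

All four measures compare the observed co-occurrence frequency with the frequency expected by chance, but they weight rare and frequent pairs differently, which is one reason why several of them tend to be reported side by side.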
7.1.5 Summary

This chapter has attempted to demonstrate that, even though the established versions of CollJudge’s items could be considered statistically significant from a corpus-based point of view, they display a range of different behaviours when it comes to contextual factors and more creative alternations. While some items, like raise objections / reservations or pull a face / smile, seem to be deeply entrenched and resist change - at least in the form of the creative alternation chosen for this study - others, like lend support / advice or lose one’s job / work, might be more flexible. This would support the hypothesis that collocational constructions work on a meso-level but only for some collocational combinations. However, a third pattern suggests that contextual factors can play a role as well. Thus, in cases such as commit a crime / mistake or drop a hint / clue it seems as if a more creative combination is not (yet) supported by a semi-abstract construction, such as for example [commit + NP], but in fact needs further backup from its context in order to be accepted as “English” by adult native speakers of English. These patterns also seem to hold for the Adj+N combinations of this test. Furthermore, qualitative evaluation of the participants’ comments showed that there does not seem to be a collocate within a collocation which could be considered the more dominant element simply because of its word class. So, at least for L1 speakers of English, the noun within a VP + NP or Adj+N combination - the base as Hausmann (1984: 401) calls it - is not necessarily the most autonomous item or that which dictates the choice of verb or adjective collocate. In fact, in the case of CollJudge, seven 7 out of 15 items received corrections on the noun-collocate, while the noun-collocate was regarded as the point of reference in only four 8 cases. A further two 9 were corrected in both positions.
Of course, here as well, context might be one of the decisive factors, but the fact that the element which is supposed to function as the base of a collocation tends to be substituted too shows that at least in the case of native speakers’ perception, this distinction between collocator and base does not reflect cognitive processes in L1.

7.2 Non-native speakers

Like the group of native speakers, the group of non-native speakers is subdivided into four groups for this task. In this case, a total of 87 German university students who are studying English fall into one group of 25, two groups of 21 and another group of 20 participants. Thus, while again slightly different in size, the distribution is similar to the set up of the native speakers’ data. In terms of evaluations from young learners, the data set is very similar to that of the UK-based teenagers. Although classes from the German schools which teach languages as a subject (LS) are slightly more homogeneous with respect to the students’ mother tongue, the group sizes are smaller with only about 22 students per class. Furthermore, to prevent frustration at the evaluation of too complex items, students in year 5 were asked to focus on a selection of sentences but were encouraged to continue their evaluation if they felt confident enough to do so. Therefore, the number of test takers for each version of CollJudge varies from seven to three students. This, again, is a sample which is seemingly too small to produce statistically relevant quantitative information. Moreover, a relatively small group of participants makes the evaluation of qualitative feedback more meaningful, since here three comments could represent 50 % or even 100 % of the test takers’ feedback. Data from immersion students (IM) also comes from a rather small group of participants.
As described in chapter 5.4.2, schools with two different approaches towards bilingual education participated in this study. In both bilingual classes the ratio of students with German as their L1 is about 60 %, while 80 % to 95 % of regular LS test takers are German native speakers. Focusing on bilingual students with German as their L1, the number of participants for each version of CollJudge again ranges from six to three, similar to the distribution of their LS peers. Thus, here too the qualitative analysis of the test takers’ responses will be the focus of the analysis of the two immersion classes, while the quantitative aspect of the evaluations needs to be treated with care. As far as judgement patterns are concerned, to a certain extent German native speakers behave fairly similarly to their English native-speaking counterparts. In fact, six out of 15 items cause the same patterns across all adult evaluations. Therefore, chapters 7.2.1 to 7.2.3 deal again with Preference of Established Variants, Overall Acceptance and Contextual Acceptance as the three main patterns. Furthermore, there is a fourth pattern, which seems to reveal a certain reliance on length as a factor for evaluation, which can only be observed in the German data set (> 7.2.4). Like in the native speakers’ evaluations, there were also items which painted a less clear picture. These combinations are discussed in chapter 7.2.5.

7 Items with comments on the noun-collocate are: drop a hint / clue, raise objections / reservations, run a bath / tub, meet a need / want, lose one’s job / work, pretty girl / boy, heavy rain / wind.
8 Items with comments on the verb or adjective collocate are: lend support / advice, pull a face / smile, false teeth / hair, weak link / connection.
9 Items with comments on both collocates are: commit a crime / mistake, cook the meal / tea.
7.2.1 Pattern 1: Preference of Established Variants

A clear rejection of any use of a more creative variation can be observed in three items among the adult German learners; two items 10 fewer than in the data set of the adult UK-based native speakers of English. However, it is striking that all of these items occur in both groups (table 7.7). This holds for the combinations raise objections, meet the need and cook the meal. Thus, it seems that when it comes to the correct identification of restricted items, the judgement of advanced learners of English to a certain extent resembles native speakers’ evaluation.

10 Pull a face as well as run a bath are regarded as rather restricted, idiomatic combinations by adult native speakers of English. In the German adult L2 speaker data, they occur with a pattern of Contextual Acceptance (> 7.2.3) and Contextual Influence (> 7.2.4) respectively.

Item Age e / s e / c c / s c / c
raise objections/ reservations*
Adult 3x questions 2x [phrase] 4x questions, 2x concerns, 2x doubts 2x [phrase], 1x doubts, 1x questions, 1x [“reservations” underlined] 1x suspicions, 1x have
Year 5 (LS) 1x are - - -
Year 5 (IM) 1x [phrase] - - -
Year 9 1x ["raise" underlined] - - -
Year 11 - - 1x questions
meet the need/ want*
Adult 1x get, 1x requirements, 1x acquirements, 1x ends 1x ["to meet the" underlined] 1x to meet, 7x need, 2x demand, 1x fulfill, 6x need, 1x demand, 1x requirements, 1x interest, 1x wish, 1x [phrase] 1x ["want" underlined]
Year 5 (LS) - - - -
Year 5 (IM) 1x they did this for health care 1x service to need the meet for health care 1x They did to meet and do want for health -
Year 9 - - 1x to realise 1x spread
Year 11 1x to reach 1x [phrase] 1x for health care being wanted 1x expectations, 1x to recognize the necessarity
cook the meal/ tea**
Adult 3x prepare, 1x make, 1x started cooking - 17x made, 1x boil 9x make, 3x boil, 1x prepare
Year 5 (LS) - - 1x make -
Year 5 (IM) 1x She began to cook - 1x make, 1x drink, 1x She beginned to make a tea -
Year 9 - - 1x boil -
Year 11 - - 2x make 1x taking
* = p < 0.1; ** = p < 0.5

Table 7.7: L2 speakers’ qualitative evaluation of items with the pattern of Established Variants sorted according to age and variant

This pattern can also be observed in the test takers’ qualitative evaluations. Unlike the native speakers, who in most cases only commented on the rejected, more creative versions of an item, all versions received comments from adult German learners. However, the amount of feedback on sentences which contain a more creative combination is often at least twice as high. At the same time, these items suggest that even advanced learners of English are more ready to accept sentences with more context or at least do not think it necessary to correct the respective sentences. In the case of raise objections, for example, the sentence which received the fewest comments was the more complex version containing a more established variant of the collocation. Here, two adult German L2 participants showed general uncertainty concerning this combination. With respect to the whole group, this represents about 10 % of all evaluators. The simpler version of the same variation was pointed out three times. All three comments suggested questions as a replacement for objections and thus concerned the base. This trend continues for both versions of the more creative variant raise reservations. However, with nine and six comments respectively, the simpler and the more complex sentence are commented on more often by the test takers. But except for three comments which either suggest have instead of raise or are insecure about the whole combination, all statements refer to the base.
Unlike raise objections, a more creative combination like raise reservations seems to trigger not only more comments but also a wider variety; interestingly, none of which is objections (table 7.7), indicating that raise objections might not be part of the evaluators’ passive vocabulary. Compared to the adult evaluators, most teenage participants refrain from comments for this item. There is only one comment per age group. Thus neither IM students nor LS classes seem to find much fault with either of these sentences. Of course, this might be because they outperform their adult counterparts, but a more likely explanation would be that raise objections / reservations is quite an abstract concept. It displays a pattern of Academic Acceptance for the evaluation of CollMatch, which indicates that it might be used more often in more academic discourse. Thus, younger, less academically trained learners might be less familiar with this combination and tend to leave it uncommented, simply because they are not sure. Meet the want, the creative variant of meet the need, is rejected even more clearly. Every second adult participant points out potential alternatives for the simple as well as the complex sentence. Here as well, the creative, simple version receives the most comments. This again suggests a slight preference for combinations in a longer, more complex sentence. As in the case of raise objections, the noun collocate remains the element with the most comments, while the verb was corrected only twice. In general, the range of alternatives is much broader. Furthermore, in the case of meet a want, the more established need ranges among the most frequent suggestions, which might indicate that, unlike raise objections, meet the need is also part of most participants’ active vocabulary.
Comments from teenage test-takers, on the other hand, are similarly few, though at least in year 9, their distribution resembles the adult pattern of Preference of Established Variants. Furthermore, in three out of seven comments, it is the verb collocate which is corrected, and a further two contain a suggestion for a complete rephrasing of the item. Compared to their LS peers, who do not comment at all on either version of this item, students from year 5 with a bilingual education seem to feel at least confident enough to suggest an alternative, even though it is not always a plausible rephrasing. From a qualitative point of view, the third item with a pattern of Preference of Established Variants, cook the meal / tea, is also the clearest case of quite harsh evaluation. More than half of the adult L2 speakers correct the more creative versions of this combination and almost all find fault with cook the tea if it occurs in a simpler sentence. This once again confirms the observation that to a certain extent a more complex sentence structure can make a creative variation of a collocation more acceptable. Unlike the other items within this pattern, cook the meal / tea is the only item with suggestions for alternatives which refer exclusively to the verb collocate with make tea being the preferred phrasing. Moreover, young evaluators also show a slight preference for the more established cook the meal while commenting on make the tea, at least if it occurs in a simpler sentence. In combination with even more tentative feedback on the previously discussed items, this might indicate that young learners tend to accept combinations with which they are unfamiliar, while more ordinary combinations which comprise words that even less advanced learners of English might recognise receive comments from all age groups.
As mentioned before, the fact that “Tee kochen” (literal translation: tea cook) is used in German to refer to the preparation of this hot beverage might also contribute to the clearer rejection of cook tea. This trend continues among bilingually trained students who comment most on the simpler, more creative version of the item.

7.2.2 Pattern 2: Overall Acceptance

Compared to the evaluations from adult native speakers of English, a steady level of acceptance of all versions within an item occurs less frequently among adult learners. Only one collocation, drop a hint / clue, produces an equal amount of positive evaluations for all four sentences within the adult non-native speakers’ data. Furthermore, L1 and L2 speakers do not seem to agree on their evaluation of this item. While L2 speakers accept simpler and more complex as well as established and more creative versions of this collocation equally well, L1 speakers prefer the established use as well as a sentence with more context to the simpler, more creative version. However, the overall acceptance of all test sentences is quite high (Appendix V), including for native speakers.

Item | Age | e / s | e / c | c / s | c / c
Drop a hint / clue | Adult | 2x gave, 1x [phrase] | 1x [phrase] | 2x a hint, 1x gave | 1x a clue, 1x [phrase]
 | Year 5 (LS) | - | - | - | -
 | Year 5 (IM) | - | 1x [“hints” underlined] | - | 1x little clues
 | Year 9 | 1x gave | - | - | -
 | Year 11 | - | - | - | -
* = p< 0.1; ** = p < 0.5
Table 7.8: L2 speakers’ qualitative evaluation of items with the pattern of Overall Acceptance sorted according to age and variant

This tendency is further confirmed by the general amount of feedback all sentences within this item receive. The comments range from one to three per sentence and could thus be seen as very scarce, compared to up to 18 comments on cook the tea, for example. But here too a slight preference for more complex sentences can be observed.
The feedback as such is more or less equally distributed across all parts of the collocation, ranging from a more abstract underlining of all collocates to suggestions for the verb collocate and noun collocate. However, it is interesting that all three comments on the noun collocate occur in sentences which contain a more creative variant of the collocation. Data from younger learners paints a similar picture. There is only one student who points out that drop should be substituted by gave in the simple sentence which contains an established version of the item. Since this item was not part of the obligatory sentences for participants in year 5, feedback from the youngest group of learners has to be treated with caution, because only fast and confident students commented on these sentences. Even so, it could be said that there is only one student per school who commented on this item. Since in this case advanced learners of English accept a version which not all native speakers agree on, this indicates that EFL learners might be more confident in allowing combinations which are grammatically acceptable but lack idiomatic, native-like phrasing. Yet to assume that the only inference which could be made from this pattern is a slight tendency of L2 speakers to accept items more readily (presumably because they were not made aware of any restrictions) would not do justice to the fact that there is a total of four items that fall under the pattern of Overall Acceptance in the native speakers’ data set but are not equally positively judged by non-native speakers. Collocations like say a prayer , lose one’s job , pretty girl or lend support are all equally accepted in all versions by L1 speakers of English, while L2 learners evaluate most of these combinations according to a pattern of Contextual Acceptance (> 7.3.3). 
This means that they would be hesitant to accept a sentence with a more creative, simpler version of the item in it but could be convinced to accept this combination if the context is large and / or complex enough. Thus, in the majority of cases, advanced learners could also be seen as less confident. There are two possible reasons for this tendency. First, it could be argued that the creative alternations of items which receive an evaluation as high as the established combinations occur as frequently as their more established counterparts in real language but simply do not appear in the BNC with a representative number of instances. If this is the case, then five out of 15 randomly selected lexemes, which are not portrayed with an authentic distribution of their collocates, appears to be a very high number and would make corpus research a questionable tool for collocational research. Another factor could be that, while L1 speakers have in these cases a partially lexically filled construction which allows interpretation of less common fillers or at least a certain level of semantic prosody to fall back on, L2 learners rely in most cases on the complexity of the context.

7.2.3 Pattern 3: Contextual Acceptance

Apart from the aforementioned say (a) prayer / grace, lose one’s job / work or pretty girl / boy, commit a crime / mistake, pull a face / smile, and false teeth / hair are also part of a pattern where test takers accept all but the simpler, more creative version of an item (table 7.9). Thus, this pattern is the most frequently reoccurring pattern for the adult non-native speakers’ evaluations of CollJudge’s items. Furthermore, a slight tendency towards more positive evaluation of more complex sentences could also be observed in the amount of feedback for items which quantitatively show a pattern of Overall Acceptance or a Preference of Established Variants.
This suggests that context and syntactic complexity might be one of the influencing factors for advanced L2 learners’ judgment. Compared to the German adult L2 learners, the data of adult UK-based native speakers of English reveals only three cases in which the context could be seen as an indicative factor. Two of these, commit a crime and false teeth, are therefore analysed in a similar way by native as well as non-native speakers of English. But, while commit a crime remains a rather clear case from a qualitative point of view, false teeth shows a less distinct feedback pattern. Thus, with 16 comments, the simpler sentence containing commit a mistake is by far the version which receives the most feedback. All suggestions refer to the verb collocate; most would prefer the more idiomatic make and only three would recommend did instead. Also for the more complex, creative version of the item, making remains the favoured correction, even if, with only three comments, it scores very close to the complete acceptance of the two sentences featuring the more established combination. This trend can already be observed in the teenage data. Since commit a crime as an item is not part of the obligatory CollJudge sentences for year 5, it comes as no surprise that feedback from year 5 is rather scarce. Except for two not very convincing comments from immersion students on the nature of the verb collocate in the item’s established, simpler version, no student from the youngest group of test takers commented on either version of the item. Feedback is also hesitant in years 9 and 11, but, here as well, the only comments - both suggest made - are on the simpler, more creative version of the item. False teeth, on the other hand, shows a less distinct qualitative evaluation.
Item Age e / s e / c c / s c / c
commit a crime / mistake**
  Adult: - - 13x made, 3x did 3x making
  Year 5 (LS): - - - -
  Year 5 (IM): 1x comed, 1x come oned - - -
  Year 9: - - 1x made -
  Year 11: - - 1x made
pull a face / smile*
  Adult: 1x made, 1x ["pull" underlined], 1x frowned, 1x made. 1x looked, 1x [phrase] 4x gave, 1x cracked, 1x flashed, 1x gave, 4x smiled, 1x ["pulled" underlined] 1x [phrase] 1x face, 8x smiled
  Year 5 (LS): 1x washed - 1x took, 1x maked 1x have
  Year 5 (IM): 1x make, 1x pushed, 1x saw, - 1x maked - 1x [phrase]
  Year 9: 1x made 2x [phrase] 4x smiled, 1x started to smile -
  Year 11: 1x hit - 1x made, 3x smiled
say (a) prayer / grace
  Adult: 3x pray, 1x make, 1x grace, 2x grace. 2x praying 1x express, 3x prayer, 1x prayers, 2x praying, 1x ["grace" underlined], 1x [phrase] 1x [phrase] 2x pray, 1x thank the Lord, 1x [phrase]
  Year 5 (LS): 2x pray 1x grace, 1x praying 1x do -
  Year 5 (IM): - - - -
  Year 9: 1x have, 1x a pray 1x pray - 1x [phrase]
  Year 11: - - - -
lose one’s job / work*
  Adult: - - 3x job, 1x employment, 1x when you get fired -
  Year 5 (LS): - - 1x works -
  Year 5 (IM): - - 1x do -
  Year 9: - - 1x job -
  Year 11: - - 1x job
pretty girl / boy**
  Adult: - - 9x handsome -
  Year 5 (LS): - - - -
  Year 5 (IM): - - - -
  Year 9: 1x nice - 1x sweet -
  Year 11: - - -
false teeth / hair*
  Adult: 1x [phrase] 1x artificial 2x fake, 1x artificial, 1x wig 1x artificial, 1x tooth, 1x wig
  Year 5 (LS): - - - -
  Year 5 (IM): - - - -
  Year 9: - - - -
  Year 11: - - - -
* = p< 0.1; ** = p < 0.5
Table 7.9: L2 speakers’ qualitative evaluation of items with the pattern of Contextual Acceptance sorted according to age and variant

While it could be argued that the items’ established versions give almost no reason for correction, the comments on the simpler and more complex sentence containing more creative versions of the item lie, with four and three suggestions respectively, very close together.
Furthermore, in both, artificial was suggested once as an alternative for the adjective collocate false. One student would have preferred wig over false hair. Yet, the more complex version of the item also received a comment on the noun collocate, while a simpler variant of the sentence seems to trigger more corrections for the adjective collocate in the shape of fake. With no comments at all, qualitative feedback from teenage test-takers is even less revealing. This might be because false teeth represents a different type of collocation in terms of word class combination. Therefore, these items might be seen as less central within a sentence, which could explain why test takers pay less attention to alternations and thus comment less on potential issues.

However, as the next paragraphs will show, another Adj+N combination, pretty girl, behaves more clearly, including with respect to the participants’ qualitative feedback. Here the pattern of Contextual Acceptance is visible in both the qualitative and the quantitative evaluations by German adult learners of English. True to the pattern, the simpler, more creative version of the item receives all of the nine comments, all of which suggest handsome as a substitution for pretty in combination with pretty boys. Feedback from the teenage groups, on the other hand, is again very similar to the qualitative evaluation for false teeth / hair, with only two comments on both versions with a simpler sentence structure, both from year 9 and both with a suggestion for the adjective collocate. In fact, evaluations from the younger test takers resemble the feedback in the adult L1 data, which places all variations of pretty girl / boy on the same level of acceptance.
This might indicate that a restriction on the collocation pretty girl can be observed in advanced EFL learners - presumably, because advanced students are made explicitly aware of a collocation like pretty girl - while younger learners do not seem to be very conscious of these restrictions in Adj+N combinations. Further items which yield a pattern of Contextual Acceptance among German adult L2 learners of English but are judged equally positive in all variations by adult L1 speakers are say grace and lose one’s job . Both are particularly interesting because they were also part of year 5’s obligatory test set. But while lose one’s job / work paints a rather unanimous picture throughout all age groups, evaluations of say (a) prayer / grace seem to be slightly varied. Lose one’s job / work displays a very clear, if not very distinct, pattern of Contextual Acceptance. The only four comments refer again to the simpler, more creative sentence and either give alternatives for the noun collocate - with three out of four suggestions being the initial, more established phrasing - and one rephrasing. In the same vein, but with only one comment each, teenage groups comment on the noun collocate as well. But, while one student in year 5 suggests a plural instead of the singular form, year 9 and 11 are more accurate in suggesting job . Among the group of children in an immersion programme, there is again only one comment, yet here it refers to the verb collocate. Say (a) prayer / grace , on the other hand, behaves unlike any of the other items that fall under this pattern. While teenage evaluators were quite reluctant in their comments on the previous items and either judged the four versions of an item similarly to their adult counterparts or not at all, say (a) prayer / grace shows a clear discrepancy between adult and teenage test-takers. 
In fact, feedback from the German adult learners seems to favour both more complex variants of the item (since both receive four comments each) followed by the simpler version containing the established version of the collocation (with six comments) and the simpler and more creative sentence (nine comments). The suggestion of rephrasing the sentence and transforming the collocation into the verb pray is prevalent in all four versions. But what is unusual is the fact that each version received at least one comment which suggests using the other noun collocate instead. So, in this case it seems as if the adult feedback is rather mixed but with a slight tendency towards say a prayer. Teenagers from years 9 and 11, on the other hand, might be more familiar with say grace, since the two versions of say a prayer receive more comments. In year 5 none of the groups made any corrections. This suggests that, while all other items discussed so far were either seen as acceptable or deviant versions of one potentially correct combination, the more established and the less frequent version both have their supporters in this case. Thus, the highly idiomatic nature of both versions can also be observed in the qualitative evaluation of non-native speakers’ feedback.

Furthermore, several conclusions could be drawn on the nature of this kind of silence from students in year 5. First of all, it might of course simply be that traditional classroom phrases like Let’s say a prayer! or Let’s say grace! are not used anymore in younger classes and that students therefore become accustomed to these phrases in later years. Especially in a school where older classes are seemingly familiar with one version of this item, it is unlikely that students did not come into contact with the one phrase or the other. Thus, another option could be that it takes more than one school year for these phrases to become entrenched.
As an explanation, this is even less plausible, since other elements like keep pets in CollMatch (> 6.4.4) showed that even very young L2 learners are able to memorise a collocation after an exposure of only one sequence in class if it is salient enough. A third explanation might be that - while it is very likely that students have heard one version or the other in class before - they are used to spoken input and have difficulties recognising this item in its written form.

Feedback on pull a face / smile again comes from all age groups. This not only indicates that, even at a very early stage, students are able to comment on items from CollJudge, but also that a pattern like Contextual Acceptance can prevail across groups. Furthermore, there is a slight tendency among young learners to comment only on simpler sentences, which again supports the hypothesis that young learners, if they are not familiar with a sentence, usually refrain from comment. Thus, at first glance, pull a face / smile seems an item which shows a clear tendency towards accepting the more creative version of this collocation only if it occurs in a more complex sentence. In all other cases, a rephrasing from an NP + VP combination to simply smile was the most frequent suggestion. The remaining comments focused on the verb collocate. As has been indicated before, this pattern continues in all LS classes, while the majority of comments for IM students refer to the simpler, more established version of pull a face. Interestingly, in the native speakers’ evaluation, this item falls under the category of Preference of Established Variants, which means that L1 speakers of English regard this item as a rather restricted, highly idiomatic combination. Pull a face / smile is therefore the second item where non-native speakers seem to be more willing to accept a more creative version of an item, if the context is right.
This indicates that language learners rely more on complexity or context than on the collocational combination itself.

7.2.4 Pattern 4: Contextual Influence

The hypothesis that learners of English rely on sentence complexity and context when it comes to items with which they are less familiar is further supported by a fourth pattern, which does not occur in the analysis of the L1 data. Contextual Influence (CI), rather than Contextual Acceptance, refers to items which display the influence of complexity and / or context on sentences which contain the established version of an item (table 7.10). Lend support / advice, for example, receives the highest acceptance score for the established, more complex version of the item, while all other sentences are judged at the same, somewhat lower level of acceptability. A general tendency to evaluate more complex sentences more positively, as with run a bath / tub, also falls under general Contextual Influence. Moreover, the preferred version does not necessarily have to be the more complex sentence; in the case of heavy rain, the simpler versions of established and creative combinations received the highest acceptance scores. Thus, it seems as if under certain circumstances the greater complexity of a sentence could also have a negative effect. In this case, it is presumably the sentence’s general syntactic structure which confuses many L2 participants.

Since this pattern does not occur in the native speakers’ evaluation of CollJudge’s sentences, of course, none of these items are treated in a similar way by L1 and L2 speakers of English. But neither is there one pattern which consistently occurs in the native speakers’ data set instead.
On the contrary, the three items from this group show the maximum spread of L1 patterns, ranging from run a bath / tub, which is evaluated rather strictly as an item with a Preference for the Established Version (> 7.1.1), to lend support / advice, which receives quite positive evaluations from native speakers across the different versions (> 7.1.2). Furthermore, heavy rain / wind is one of the items which was difficult to identify as belonging to one specific pattern (> 7.1.4).

Qualitative feedback from German adult learners of English mirrors the quantitative evaluations for lend support / advice and run a bath / tub. With only three comments, the more complex sentence containing an established version of the collocation receives by far the smallest number of corrections compared to the other three versions. However, the complexity of the sentence only seems to influence the sentence containing the established version of the collocation, while the same version with a more creative alternation received comments from almost all participants within this group.
Item Age e / s e / c c / s c / c
lend support / advice*
  Adult: 7x give, 1x offer, 1x supply, 1x give a hand 2x give, 1x support 11x give, 1x lend some advise [SIC] 1x lend you advice 17x give, 1x give and advice, 1x give you
  Year 5 (LS): - - - -
  Year 5 (IM): - 1x learn - -
  Year 9: - - 2x give 1x give you
  Year 11: - 1x [phrase] 3x give 1x give
run bath / tub
  Adult: 5x take, 2x fill, 1x have, 1x ["run" underlined] 1x take, 1x have 3x full, 4x bath, 1x tab, 1x what the bath tub is going to cost every month
  Year 5 (LS): 1x had, 1x make - - -
  Year 5 (IM): 3x take, 1x make, 1x have - - -
  Year 9: 1x make, 1x do - 3x fill 1x what a tub cost every month
  Year 11: 1x have, 1x take - 1x to fill a tube -
heavy rain / wind
  Adult: 1x hard / strong, 1x ["heavy" underlined] 1x [phrase] 1x strong 2x strong
  Year 5 (LS): - - 1x strong -
  Year 5 (IM): 1x ["heavy" underlined], 1x we have rain and ice temperatures 1x raining and cold 1x have ["heavy" underlined] -
  Year 9: - - 1x strong -
  Year 11: - - 1x strong -
* = p< 0.1; ** = p < 0.5
Table 7.10: L2 speakers’ qualitative evaluation of items with the pattern of Contextual Influence sorted according to age and variant

Across all versions, adult German test takers predominantly comment on the verb collocate and suggest give as an alternative. This indicates that the majority of adult L2 speakers are apparently unfamiliar with the collocation lend support and have stored the (in both cases highly associated; see footnote 11) combinations of give support and give advice instead. The latter, however, seems to be more entrenched, since here not even more context or a more complex sentence structure changes evaluators’ opinion. Thus, it could be argued that lend support and lend advice are perceived as two equally deviant variations of the more established give support and give advice rather than as two versions of one item.
Students from years 9 and 11 also prefer give advice over lend advice, but they tend to refrain from any comment on lend support.

Footnote 11: While lend, as intended by the test’s design, is strongly associated with support but not advice, give reaches high values throughout all of the BNC’s association measures for both noun-collocates.

For run a bath / tub, both versions seem to be more connected again, since bath is used as an alternative in half of the comments on the c / s-version. But independent of the combination, both sentences with a simpler sentence structure receive more comments, as they receive less positive evaluations. Therefore, it seems as if even advanced learners of English are not particularly familiar with the collocation run a bath as such. This can also be observed throughout the younger age groups, where in years 9 and 11 both sentences with a simpler structure obtain a similar amount of feedback. A more complex sentence structure, on the other hand, seems to be able to raise the level of acceptance throughout all groups. In all groups from year 5, the only version with any suggestions is the simpler sentence containing an established form of the collocation. Since the equally simple sentence with a more creative version of the item does not receive any corrections, this again might suggest that very young learners only comment on sentences which contain words and phrases they are familiar with.

One variant, however, seems particularly prone to a higher acceptance of the simpler sentence in general: heavy rain / wind. The qualitative feedback from German adult learners, on the other hand, shows a more or less equal distribution of comments for all versions, with only one or two suggestions and strong being the preferred alternative for heavy. In the evaluations of younger learners, the simpler, more creative version is, in fact, the only sentence to receive any comments at all.
Here again, the unanimous suggestion is strong. Even comments from bilingually trained young learners of English show no clear preference for simpler sentence structures; yet, once again they make slightly more suggestions, some containing an acceptable rephrasing (raining and cold), while another produces rather odd phrasing (we have rain and ice temperatures). Overall, there does not appear to be a specific tendency as far as qualitative analysis of the evaluations is concerned. Thus, it might be that the preference for a simpler sentence structure, in this case, is not based on the item itself but rather on the rest of the sentence. In fact, even in the L1 adult data, this item displayed a slightly odd distribution of acceptance scores. Here as well, the version with the highest acceptance was the simpler sentence structure containing an established version of the collocation, while all other variants received comparatively low evaluations. Therefore, this item has not been counted as representative of any of the more frequently occurring patterns. This indicates that the more complex sentence, even though it occurs in the BNC, is a rather unusual way of phrasing. The fact that L1 speakers are also more reluctant with the simpler, more creative version of this item, while in the quantitative evaluation of the L2 data both simpler sentence structures score equally well, further indicates that heavy rain would be considered a rather restricted combination, while learners accept unusual combinations more readily.

7.2.5 Other Patterns

The last chapters showed that, at least for the L2 speakers, there might be a relationship between acceptance and contextual factors.
But while a more complex sentence structure results in a better evaluation in most cases, there seem to be instances where the overall structure of a sentence is less convincing, dampening the positive impression irrespective of the actual acceptability of the item under investigation. As the case of heavy rain in the last chapter demonstrated, the transition between the evaluation of an item and feedback which is instead based on other aspects of a sentence is fluid. This chapter, however, focuses, like chapter 7.1.4, on items which appear to be strongly influenced by the phrasing of the test sentence and thus should not be used for a detailed analysis of the collocation they contain. Interestingly enough, only two items fall under this category. One of these is serve a sentence / apprenticeship, which already caused problems for the L1 test takers. Similar to L1 speakers’ evaluation, advanced learners of English accepted all versions except the established combination in a more complex sentence. At first glance, this might seem confusing, but considering the history of the item itself - which occurs in the BNC in the form of a complex sentence containing serve an apprenticeship - it becomes clear why test takers tend to accept both sentences featuring this combination. For the more established collocation serve a sentence, on the other hand, a plausible context could be construed, but this takes a certain amount of imagination and creativity. Once again, the qualitative analysis paints a clearer picture item-wise. Based on the amount of feedback a version receives, serve a sentence / apprenticeship might be regarded as an item within the Contextual Acceptance pattern, with most comments on the verb collocate of the combination serve an apprenticeship in a simpler sentence structure (table 7.11).
Item Age e / s e / c c / s c / c
serve a sentence / apprenticeship*
  Adult: 1x had, 1x [phrase] 1x meet, 1x write, 1x ["serve" underlined] 2x done, 1x had done, 1x had, 1x do 1x ["served" underlined], 1x ["apprenticeship" underlined], 1x [phrase]
  Year 5 (LS): - - - 1x ["apprenticeship" underlined]
  Year 5 (IM): - - - -
  Year 9: 1x explained - - 1x solve
  Year 11: - - - -
weak link / connection
  Adult: 1x bad, 2x [phrase] 1x rules, 1x limits, 2x ["links" underlined], 1x poor, 2x link, 1x point, 1x [phrase] 1x links, 1x spots, 1x ["weak" underlined] 1x pupils who are not that good at school, 1x [phrase]
  Year 5 (LS): 1x of link - - -
  Year 5 (IM): 1x This is the link weak 1x lights, 2x week - -
  Year 9: - - - -
  Year 11: - - - 1x links
* = p< 0.1; ** = p < 0.5
Table 7.11: L2 speakers’ qualitative evaluation of items with an unclear pattern sorted according to age and variant

With this kind of discrepancy between quantitative and qualitative evaluations, this item can hardly be used to make many useful inferences about the relationship of a collocation with its more creative alternations or the context in which it occurs. Yet this item can still be considered important for the analysis of CollJudge from a methodological point of view. The fact that awkward phrasing stood out to both groups of adult speakers of English shows that all adult participants were aware of irregularities like this and would have taken them into consideration had they been a problem for the other items as well. Thus, it can be deduced that the skewed evaluation of items like serve a sentence / apprenticeship makes the evaluations of other items more reliable.

The second item, which shows a skewed pattern in the German adult learners’ evaluation, is weak link / connection.
Unlike serve a sentence / apprenticeship, this collocation receives unanimously positive evaluations in all its four versions from adult native speakers of English, while adult L2 speakers struggle, particularly with the established version of the collocation in a more complex sentence. All other versions are at an equally high level of acceptance. From a qualitative point of view, the simpler version containing a more creative combination receives a similar amount of feedback, but its level of acceptance is nonetheless higher. Thus, neither sentence complexity nor the degree of idiomaticity of the collocation appears to be an underlying aspect in the evaluations of most German adult L2 test takers. The teenagers’ evaluations also do not provide a potential explanation for the distribution of either quantitative or qualitative feedback. In fact, most comments from the younger age groups can be regarded as not very suitable corrections. In years 9 and 11 there is only one person who correctly points out that link would be a more idiomatic noun collocate in the more complex variant of weak connection.

7.2.6 Summary

Similar to the group of native speakers, EFL learners’ quantitative evaluations produced a variety of patterns, most of which can be found in the L1 data set as well. Furthermore, six items (which translates into a total of 40 %) show the same pattern in both groups of adult evaluators. Therefore, it could be claimed that advanced non-native speakers of English do not fare too badly, considering that, on average, they had less exposure to the English language as well as fewer years of language experience compared to the native speaker participants. Interestingly, these six items - commit a crime / mistake, raise objections / reservations, meet the need / want, cook the meal / tea and false teeth / hair - only occur in the patterns of Preference of Established Variants and Contextual Acceptance.
For the pattern of Overall Acceptance, none of the items are equally well accepted in all their variants by the group of advanced learners. In fact, L2 speakers seem to be more reluctant and only accept a VP + NP or Adj+N combination if it occurs in longer, contextually richer variations. This leads to a more conservative evaluation of creative alternations from adult L2 test takers for another 40 % of the items. Vice versa, the percentage of items which are more positively evaluated by L2 speakers of English is rather small. Only 13 % (or a total of two items) are accepted by non-native speakers where native speakers would be more reluctant. These items are drop a hint / clue and pull a face / smile. This rather unequal distribution of under- and overaccepted items could be interpreted as L2 learners’ more tentative or even conservative attitude when it comes to the perception of collocations. This tendency has already been observed in the evaluation of CollMatch (> 6.4.6). Furthermore, this might lead to the conclusion that learners are less aware of the flexibility of the English language, which could be explained by a lack of exposure to authentic language. But the fact that some collocations might have been explicitly taught to learners could also lead to non-native speakers’ more restricted perception compared to their native-speaking counterparts, as for example in the case of pretty girl / boy. If an explicit focus on some collocations was part of a learner’s biography, this could also explain why some items were perceived as less restricted by L2 speakers of English: these items would then be the combinations which tend not to be part of focus sessions or lists of collocations in a more traditional EFL classroom (see footnote 12).
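The proportions reported in this summary follow from simple item counts, which can be retraced with a short calculation. The sketch below is purely illustrative and not part of the study's method. It assumes a total of 15 CollJudge items (implied by six items corresponding to 40 %) and infers a count of six for the more conservatively judged items from the reported 40 %; the category labels and the helper `share` are hypothetical names.

```python
from fractions import Fraction

# Assumption: 15 items in total, inferred from "six items ... 40 %".
TOTAL_ITEMS = 15

# Item counts per category; the second count is inferred from the reported 40 %.
counts = {
    "same pattern in both groups": 6,
    "judged more conservatively by L2 speakers": 6,
    "judged more positively by L2 speakers": 2,
}


def share(n, total=TOTAL_ITEMS):
    """Return n out of total as a percentage rounded to the nearest integer."""
    return round(100 * Fraction(n, total))


shares = {label: share(n) for label, n in counts.items()}

# Items not covered by the three categories above.
unaccounted = TOTAL_ITEMS - sum(counts.values())

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL_ITEMS} items = {pct} %")
print(f"items not covered by these categories: {unaccounted}")
```

Under these assumptions the three categories cover 14 of the 15 items, so one item remains outside this three-way comparison.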
Thus, de Cock’s observation that “[…] advanced learners’ use of frequently recurring sequences of words displays a complex picture of overuse, underuse, misuse of target language NS [native speakers’] sequences and use of learner idiosyncratic sequences.” (2004: 243) could, in fact, be extended to “over- and underacceptance”. Complete misjudgement in the shape of unacceptable suggestions for combinations, however, only occurs among younger learners. Furthermore, there is a rather clear tendency among L2 test takers to accept items more readily if they feature within a contextually richer setting. Unlike for L1 speakers, there are even items where the length and complexity of the sentence seem to be a decisive factor (> 7.2.4). In terms of overall pattern frequency, Contextual Acceptance is the most prevalent pattern for advanced learners of English (40 %), while Preference for Established Variants and Overall Acceptance are, with one-third each, the dominant evaluational distributions for the native speakers’ data set. This suggests that non-native speakers rely more heavily on context, whereas native speakers show either clear reservation or tend to accept a more creative variation irrespective of its setting. The latter phenomenon could also be interpreted as the influence of a semi-fixed construction which might cognitively support the reading and thus also the acceptance of more creative, less established variations of the respective collocation. Since there is only one item which yields a pattern of Overall Acceptance among learners of English but which is also not completely accepted by the native-speaking participants, it would then follow that L2 speakers have fewer constructional patterns to support their evaluations. Chapter 7.3 will come back to this aspect and investigate its claim from a more statistical point of view. As pointed out before, these observations can only be made for adult learners of English.
This is not only because the database from the younger L2 speakers is too small to make any quantitative claims, but also because teenage test-takers tend to be rather reluctant in their qualitative evaluations. Especially for the children taught in regular LS programmes, it almost seems as if they only comment on sentences which are not too complex in their structure and contain familiar vocabulary as well. Children in immersion programmes, on the other hand, appear to be a little more confident in their qualitative evaluation, which, however, does not always result in an acceptable correction. Yet overall, the explicit teaching of selected collocations might have effects on advanced learners’ collocational proficiency, since, as explained above, the tendency to underaccept certain items is greater than to overaccept others. As far as comments on a collocation’s constituents are concerned, the picture painted by the L2 speakers’ evaluations is less clear than that for the L1 data set. For most items, some L2 participants focus on the verb or adjective collocate, while others comment on the noun. Again, to a certain extent this seems to conflict with Hausmann (1985: 121), who would regard the noun in both cases as the more dominant part. Thus, it would follow that verb and adjective collocates would potentially have to be adapted to fit the noun and not vice versa. Of course, Hausmann’s claim refers to the productive side of a collocation, but the fact that verb and adjective collocates can be strong enough to trigger a change in the noun-collocate demonstrates that context is at least another influencing factor which should be taken into consideration.

¹² Since the adult participants of this study come from various schools, the group tested here is too diverse to make any precise claims about the influence of this kind of explicit input. Nevertheless, it is an interesting question and would be worth further investigation.
This is especially the case since CollJudge’s items were only modified in the noun-slot, which means that the context supports the more fitting verb or adjective slot rather than the potentially less established choice of noun.

7.3 Comparing Corpus Data and Evaluations from Judgement Tasks

So far, data from this chapter has been able to show that, despite the fact that the established variants of all 15 CollJudge items could be regarded as collocations, their evaluation differs as soon as the factors ‘context’ and ‘creativity’ are added. While some collocations, like meet the need / want, appear rather robust against any of these modifications, for others, like commit a crime / mistake, ‘context’ could be regarded as an influencing factor. Lend support / advice is also accepted in a more creative variant by most of the L1 test takers. L2 evaluators also produced similar patterns but not necessarily for the same items. These findings could have implications for research on language attainment, as well as for EFL teaching (> 7.4), but collecting data from L1 and L2 speakers for a study proves to be quite costly and time-consuming. Thus, this chapter contrasts results from previous chapters (> 7.1-2) with association measures¹³ (> 5.1) in order to find out whether corpus data¹⁴ might be able to provide similar information. For an initial overview, table 7.12 lists all 30 noun-collocates sorted according to t-score, MI, z-score and Collocational Strength.

¹³ A more detailed overview, including raw frequencies as well as all association measures discussed in chapter 5.1, can be found in Appendix IV.

¹⁴ In order to receive reliable scores, query results were checked manually for every combination.
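How such association scores arise from raw corpus counts can be illustrated with a small sketch. It assumes the common textbook definitions, with the expected co-occurrence frequency E = f(node) · f(collocate) / N, t-score = (O − E) / √O, z-score = (O − E) / √E and MI = log2(O / E). The exact variants and thresholds used in this study are those described in chapter 5.1 and may differ in detail, for instance through a span correction. The function name and all counts below are hypothetical and are not taken from Table 7.12 or the BNC.

```python
import math


def association_scores(observed, f_node, f_collocate, corpus_size):
    """Return expected frequency, t-score, z-score and MI for one word pair.

    observed     -- co-occurrence frequency O of node and collocate
    f_node       -- corpus frequency of the node (e.g. the verb)
    f_collocate  -- corpus frequency of the collocate (e.g. the noun)
    corpus_size  -- total number of tokens N
    """
    if min(observed, f_node, f_collocate, corpus_size) <= 0:
        raise ValueError("all frequencies must be positive")
    # Expected frequency under independence; no span correction is applied
    # here (a simplifying assumption of this sketch).
    expected = f_node * f_collocate / corpus_size
    return {
        "expected": expected,
        "t_score": (observed - expected) / math.sqrt(observed),
        "z_score": (observed - expected) / math.sqrt(expected),
        "mi": math.log2(observed / expected),
    }


if __name__ == "__main__":
    # Hypothetical counts: one pair observed more often than expected,
    # one pair observed less often than expected.
    print(association_scores(100, 1000, 2000, 1_000_000))
    print(association_scores(1, 1000, 4000, 1_000_000))
```

In this sketch all three scores are positive whenever a pair is observed more often than chance predicts (O > E) and negative whenever it is observed less often (O < E), which is the sense in which a combination can be called statistically associated or dissociated.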
Table 7.12: Raw frequencies and association measures for CollJudge Items

The degree of robustness of the respective combination in the adult native speakers’ evaluations is indicated through a simple colour scheme. Red stands for items which, in the native speakers’ evaluation, seem rather robust with respect to the factors of ‘context’ and ‘creativity’ and thus yield a pattern of Preference for Established Variants (> 7.1.1). Yellow is for collocational combinations which are at least receptive to contextual influences (> 7.1.3), and finally green is for items which seem to be accepted in any of their four variants (> 7.1.2). The more established variant of an item is underlined. By definition, these items reach rather high levels on the list of z-scores, but for t-scores and MI as well as the combinations’ raw frequency and collocational strength these established variants also make the top of the respective lists¹⁵. Log-likelihood values and the raw frequency of the NP or N collocate do not seem to correspond with CollJudge’s general distinction between established and creative variants. In general, the raw frequencies of the collocational combinations, as well as their t-scores and z-scores, seem to work best to distinguish established collocations from their more creative variations. But there are also two inconsistencies to be observed: one is meal, the established variant of the item cook the meal / tea, which appears rather low on these lists. It is, in all three cases, even outperformed by its alternation tea. This result would suggest that the degree of confidence is, in fact, higher for cook the tea than for cook the meal, yet native speakers clearly rejected cook the tea as a potential combination within the given context.

¹⁵ In table 7.12 the respective thresholds are marked with an asterisk. See chapter 5.1.1 for a more detailed description of the individual thresholds and their statistical implications.
A possible reason might be that make+tea is strongly entrenched and thus pre-empts a combination like cook+tea for native speakers of English who are not familiar with the polysemous meaning of tea, even if the concept of preparing a meal called “tea” is acceptable in some varieties of English. This observation is supported by the fact that tea lies above the respective thresholds for t-score, z-score, and MI; Collocational Strength also indicates a strong attraction. In the case of serve+apprenticeship, the problem seems to be reversed: apprenticeship has been selected as the creative variant of the item serve sentence / apprenticeship, because it occurs less frequently with serve than sentence. But as all measures of association show, serve is also statistically associated with apprenticeship, a result most L1 evaluators agree with, which is presumably why this item did not yield a clear pattern for the CollJudge analysis (> 7.1.4) in the first place. However, as has been mentioned before, despite the fact that this item as such should not be reconsidered for any follow-up study, the evaluations of the native-speaking participants show that their judgements can be regarded as fairly accurate and reliable. Furthermore, boy and mistake, the two creative variants of the items pretty girl / boy and commit a crime / mistake, reach scores above the respective t-score and z-score thresholds, which simply indicates that these combinations might be regarded as collocations. Yet, as Evert stresses, t-score and z-score especially are not very precise measures for collocational combinations and should thus be used as a scale rather than an index (Evert 2009: 1216-1218). For the patterns as such, none of these lists seem to be able to sort the 30 combinations according to the respective pattern they occur in.
Yet, here as well, t-score, z-score and, to a certain extent, MI too indicate an interesting trend: most creative variants not only lie below the threshold of significance for the respective score, but there also seems to be a distribution depending on the pattern they occur in. Contrary to expectation, however, this distribution does not place the most rejected variants - creative variants within a pattern of Preference for Established Items - toward the bottom of the list and creative yet accepted variants rather close to the respective threshold. On the contrary, creative alternations like advice, grace, connection or work, which were readily accepted by adult native speakers in either ‘context’-variant, actually produce negative results and would therefore usually be regarded as dissociated¹⁶ from their verb or noun collocates. Context-dependent variants, on the other hand, still score positively and range between 0 and the respective positive value of the threshold. Variants which have been rejected by native-speaker evaluators are scattered below this threshold, with the notable exception of tea. This suggests that, to be eligible as a creative alternation of a collocation, a lemma needs to be dissociated from its potential collocate in order to be accepted. In other cases, where this dissociation is less clear, context might be needed to support a certain reading and thus acceptance by native speakers. Therefore, a semantically similar collocate seems to be accepted not only under certain contextual circumstances, but also if the combination as such is indeed fairly new or uncommon (Bybee / Eddington 2006). To a certain extent, this observation contradicts the basic assumptions of pre-emption, which in these cases would suggest that, if an item has never been encountered in a certain combination, a speaker would initially reject it (> 4.2.2).
But, as a comparison of native speakers’ evaluation patterns of CollJudge with the statistical t-score, MI, and z-score measures indicates, this does not seem to be the case for creative alternations of collocations. Moreover, Collostructional Analysis correctly predicts “attraction” for grace, connection and boy. However, for the other two creative alternations, advice and work, a Collostructional Analysis suggests “repulsion”, which, in fact, does not appear to be the case, at least as far as the adult native speakers in this study are concerned. In a similar fashion, Collocational Strength, in agreement with the native-speaking participants, suggests “repulsion” for smile and tub but would predict “attraction” for raise reservations, which tends to be rejected by L1 test takers. Therefore, while it makes rather accurate predictions for most items within this study, there is a total of three items (or 20 %) where a Collostructional Analysis arrives at a different conclusion than academically trained, adult native speakers of English. Of course, this could be due to corpus design or the definition of what counts as the total number of constructions in a corpus (> 5.1.2). Thus, these factors, as well as a thorough check of Collostructional Analysis against elicitation data, might help to identify relevant settings and parameters for this type of analysis. Nevertheless, at least for t-score and z-score and in parts MI, association measures seem to be able to shed further light on the statistical relationship between collocations and their creative alternations.
This is interesting, since Evert and Krenn (2001) report that, despite its mathematical shortcomings (> 5.1.1; 5.3.1), t-score performs surprisingly well for German Adj+N as well as Preposition+N+Verb combinations, while Ellis and colleagues obtain a good correspondence between native speakers’ evaluations and MI for high-frequency multi-word formulae (Ellis / Simpson-Vlach / Maynard 2008). Log-likelihood and Collocational Strength, however, seem to make assumptions about the likelihood of an item occurring as a creative alternation within a collocation which do not correspond well with the L1 data set of this study.

¹⁶ Especially lose work, which within t- and z-score falls under the measures’ respective threshold and thus can be seen as negative attraction, since both tests are asymptotic (Evert 2005: 80-84).

The influence of the factor ‘context’, on the other hand, is difficult to test with corpus-based association measures. However, ANOVAs across CollMatch results from adult evaluators (table 7.2) for all four CollJudge variants of an item showed that items for which the factors ‘context’ or ‘creativity’ do not play any major role statistically do not seem to come from different samples. Yet they are not able to discriminate between influences from contextual or constructional factors. Here, Ellis (2006: 11) suggests using Δp to test the influence of a cue on an outcome. In general, these values can be interpreted as an index for the degree of influence a cue, in this case ‘context’, has on the outcome; here a participant’s evaluation of a sentence. Table 7.13 gives an overview of the Δp-values, which indicate whether the reading of a sentence containing either the established (est.) or more creative (creat.) variant of an item is influenced by a contextually richer setting. The higher the Δp-value for a condition is, the more likely it is that contextual factors supported native or non-native speakers’ evaluations.
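As a contingency measure, Δp is usually written as ΔP = P(outcome | cue) − P(outcome | no cue). The following is a minimal sketch of that calculation, assuming that the cue is the contextually richer sentence and the outcome is a participant accepting the sentence. The function name and the counts are invented for illustration and are not taken from this study’s data; the study’s own computation may differ in detail.

```python
def delta_p(accepted_with_cue, rejected_with_cue,
            accepted_without_cue, rejected_without_cue):
    """Delta-P = P(outcome | cue) - P(outcome | no cue).

    Here (by assumption) the cue is a contextually richer sentence and the
    outcome is that a participant accepts the sentence.
    """
    n_with = accepted_with_cue + rejected_with_cue
    n_without = accepted_without_cue + rejected_without_cue
    if n_with == 0 or n_without == 0:
        raise ValueError("both conditions need at least one evaluation")
    return accepted_with_cue / n_with - accepted_without_cue / n_without


if __name__ == "__main__":
    # Hypothetical example: 18 of 20 participants accept the variant in the
    # richer context, 9 of 20 accept it in the simpler sentence.
    print(round(delta_p(18, 2, 9, 11), 2))
```

A value near 0 would mean that the richer context makes no difference to acceptance, a positive value that acceptance is higher with the richer context, and a negative value that it is lower.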
Table 7.13 lists all 15 CollJudge items with their Δp-values for the evaluation of the established as well as the creative variants. Once again, this overview is subdivided into adult native and non-native speakers.

| CollJudge item | L1 pattern | L1 est. (Δp) | L1 creat. (Δp) | L2 pattern | L2 est. (Δp) | L2 creat. (Δp) |
|---|---|---|---|---|---|---|
| commit a crime / mistake | CONTEXT | -0.01 | 0.49 | CONTEXT | -0.20 | 0.35 |
| drop hint / clue | CONTEXT | 0.11 | 0.25 | OA | -0.04 | -0.11 |
| lend support / advice | OA | -0.04 | -0.005 | CI | 0.32 | -0.19 |
| raise objections / reservations | EST | 0.06 | 0.00 | EST | -0.02 | 0.11 |
| meet the need / want | EST | -0.05 | 0.07 | EST | -0.08 | -0.10 |
| run a bath / tub | EST | -0.02 | 0.26 | CI | 0.24 | 0.13 |
| pull a face / smile | EST | -0.04 | 0.21 | CONTEXT | 0.09 | 0.22 |
| say (a) prayer / grace | OA | 0.05 | -0.02 | CONTEXT | -0.04 | 0.23 |
| serve sentences / apprenticeships | OTHER | -0.36 | -0.18 | OTHER | -0.47 | -0.14 |
| lose one's job / work | OA | 0.05 | 0.12 | CONTEXT | 0.04 | 0.19 |
| cook the meal / tea | EST | -0.22 | 0.19 | EST | 0.22 | 0.09 |
| pretty girls / boys | OA | 0.07 | -0.01 | CONTEXT | -0.05 | 0.50 |
| heavy rain / wind | OTHER | -0.13 | -0.04 | CI | -0.08 | -0.22 |
| false teeth / hair | CONTEXT | -0.02 | 0.22 | CONTEXT | -0.14 | 0.06 |
| weak link / connection | OA | 0.03 | 0.02 | OTHER | -0.17 | 0.20 |

Table 7.13: Contextual influence (Δp) on the evaluation of native and non-native speakers of English for established and creative variants within CollJudge

In the case of native speakers of English, the established variants of an item do not yield high Δp-values. But, as expected, the numbers rise over Δp ≥ 0.20 for items whose creative variant is only accepted in a contextually richer setting, like commit a crime / mistake, drop a hint / clue or false teeth / hair (CONTEXT). Three of the four items with a pattern of Overall Acceptance (OA), on the other hand, display a negative Δp. This indicates that Δp can indeed contribute statistical information on the influence of a contextually richer setting.
Thus, it is particularly interesting to see that some items, like run a bath / tub, pull a face / smile or cook the meal / tea, seem to show a context-effect; a tendency which was already apparent in native speakers’ qualitative evaluations (> 7.1). Furthermore, there are two items among the established variants which yield a rather high negative Δp-value: serve sentences / apprenticeships and cook the meal / tea. In the case of serve sentences / apprenticeships, this negative Δp-value indicates once again that this measure could indeed be appropriate for testing the influence of contextual factors, since, as mentioned before, this item does not seem to work well, not least because the initial sentence features serve apprenticeships and the supposedly more established item serve sentences does not really fit into this context (OTHER) - a fact which also seems to be supported by a Δp-value of -0.36. The dissociation of cook the meal and its context, however, might have a different reason. Unlike serve sentences / apprenticeships, cook the meal / tea as an item shows a pattern of Preference of Established Items (EST). Thus, the combination of cook the meal can be regarded as fairly well accepted. The dissociation of collocation and context could therefore indicate that cook the meal is understood independently of the context it occurs in, and thus behaves rather like a single unit of meaning, which might also explain why cook the tea, despite its general association, was not evaluated with similarly high acceptance scores, nor interpreted within the context of the meaning of the established variant by adult native speakers of English.

Context, on the other hand, seems to have a stronger impact on the L2 speakers. While the established variants of only four items seem to have been influenced by contextual factors (|Δp| ≥ 0.1) for native speakers, this number almost doubles to seven for the adult EFL learners.
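The count of context-sensitive established variants can be checked against the est. (Δp) columns of Table 7.13. A short sketch: the values are copied from the table, while the variable and function names are only illustrative.

```python
# est. (Δp) values for the established variants, copied from Table 7.13.
L1_EST = {
    "commit a crime / mistake": -0.01, "drop hint / clue": 0.11,
    "lend support / advice": -0.04, "raise objections / reservations": 0.06,
    "meet the need / want": -0.05, "run a bath / tub": -0.02,
    "pull a face / smile": -0.04, "say (a) prayer / grace": 0.05,
    "serve sentences / apprenticeships": -0.36, "lose one's job / work": 0.05,
    "cook the meal / tea": -0.22, "pretty girls / boys": 0.07,
    "heavy rain / wind": -0.13, "false teeth / hair": -0.02,
    "weak link / connection": 0.03,
}
L2_EST = {
    "commit a crime / mistake": -0.20, "drop hint / clue": -0.04,
    "lend support / advice": 0.32, "raise objections / reservations": -0.02,
    "meet the need / want": -0.08, "run a bath / tub": 0.24,
    "pull a face / smile": 0.09, "say (a) prayer / grace": -0.04,
    "serve sentences / apprenticeships": -0.47, "lose one's job / work": 0.04,
    "cook the meal / tea": 0.22, "pretty girls / boys": -0.05,
    "heavy rain / wind": -0.08, "false teeth / hair": -0.14,
    "weak link / connection": -0.17,
}


def context_sensitive(delta_p_values, threshold=0.1):
    """Return the items whose absolute Δp reaches the threshold."""
    return [item for item, dp in delta_p_values.items() if abs(dp) >= threshold]


if __name__ == "__main__":
    print(len(context_sensitive(L1_EST)), context_sensitive(L1_EST))
    print(len(context_sensitive(L2_EST)), context_sensitive(L2_EST))
```

With the threshold |Δp| ≥ 0.1, this selects four established variants for the L1 group and seven for the L2 group, in line with the figures given above.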
Yet, for the adult EFL learners, the Δp-values do not correspond equally well with the patterns identified in chapter 7.2. However, all but one of the items which generated a pattern of Contextual Acceptance work comparatively well. They show a Δp of over or close to 0.2 for the creative variants of the respective items, which suggests that context indeed plays a major role in accepting these variants. Raise objections / reservations and meet a need / want - two out of the three items which created a pattern of Preference for Established Items - display, true to pattern, only a low Δp-value for their established variants. Nonetheless, context seems to play a bigger role in non-native speakers’ evaluations, especially when it comes to judging the more creative alternations of established collocations. Yet, while association measures as well as Δp-values seem to be able to yield or support trends and observations, they can be more fruitfully interpreted against the background of native speakers’ evaluations and thus should not be the sole source of information. They might, therefore, instead be applied in connection with native speakers’ evaluations or other methods of elicitation (> 5).

7.4 Summary and Implications

Similarly to the results from CollMatch, CollJudge, a test with a focus on ‘creativity’ and ‘context’, yielded a spectrum of different evaluation patterns. These patterns indicate that some collocations are indeed restricted when it comes to selected creative variations (> 7.1.1; 7.2.1) while others might be more open and susceptible to alternations (> 7.1.2; 7.2.2). But in some cases, a higher level of acceptance for sentences containing a creative variation of an established collocation only appears possible with the support of sufficient and adequate context (> 7.1.3; 7.2.3; 7.2.4).
Furthermore, substitutes which could be regarded as semantically similar to an established collocate but are at the same time statistically dissociated from the second collocate within a collocation seem to be accepted more readily by adult native speakers of English (> 7.3). For the cognitive representation of collocations and their more creative alternations, this might suggest that despite the fact that all established variants within CollJudge could be regarded as statistically relevant, not all share the same degree of fixedness. While more restricted combinations could be seen as rather set units, collocations which are also accepted in their more creative variant might have developed an additional cognitive construction which supports the interpretation of even less likely collocates. As mentioned above, a third group then does not seem to have developed an additional, semi-fixed constructional level, but is flexible enough to allow for alternations if contextual factors support a more creative reading. This could also indicate that context triggers the development of analogies from exemplars (> 4.2.1) and is thus one of the driving forces when it comes to the development of semi-fixed constructions in the first place, because it broadens the potential applications of a collocational construction and thus might lead to a more schematic representation, or as Wray (1999: 222) observes:

The increasing automatisation of language during phase three is marked by a switch from a preference for literal interpretations of standard formulaic sequences (e.g. she has him eating out of her hand) to their metaphorical counterparts, a process which is not complete until late teenage […]. (Wray 1999: 222)

A similar relationship between high type-frequency and a construction’s higher creative productivity has also been part of Barðdal’s (2008: 172) productivity model.
Admittedly, with only one alternation tested, the spectrum of variation analysed in this version of CollJudge is not particularly broad and thus it could well be that even for more restricted collocational combinations, context influences the level of acceptance of other, less associated collocates. Therefore, it would be interesting to test the observations from chapter 7.3 on a wider range of creative alternations of a collocation. It is, furthermore, very likely that different syntactical patterns also contribute to a different contextual setting (Stubbs 2001; Klotz 1997). To further investigate these implications, it would, of course, be convenient if there were one statistical association measure which yielded similar results to the more costly elicitation of judgement data through CollJudge, but unfortunately neither the acceptance score of individual collocations nor the most common association measures within a large corpus like the BNC were fully able to predict these results, especially not the positive effect contextual factors can apparently have on the acceptance of less established combinations. Additional information from the corpus nonetheless shed some more light on the dissociated nature of more readily accepted creative alternations. Thus, as Labov (1972: 118) already observed more than forty years ago, for research on collocations too, “[t]he most effective way in which convergence can be achieved is to approach a single problem with different methods […]”. With the findings on context from this chapter in mind, one might furthermore want to add “and within different settings or contexts”. A further methodological implication which can be drawn from a mere quantitative analysis of the participants is that, even for items which were well accepted among native speakers of a group, comments could be found which suggested a change of phrasing or lexical choice.
This indicates that judgement tasks should aim at a minimum number of five participants; at least in the cases indicated above, the number of comments for well-accepted sentences did not exceed this benchmark. Moreover, it is advisable to elicit quantitative as well as qualitative feedback from participants, since it could possibly prevent a kind of correction-means-rejection fallacy. As for the comparison of native and non-native speakers of English, results from CollJudge suggest that from the point of view of language perception, adult non-native speakers achieve similar results to their L1 peers for half of the test items. Thus, similar to CollMatch (> 6), these results indicate that, to a certain extent, advanced learners of English are able to yield L1-like results. Yet, there are also cases of over- as well as underacceptance, which of course have implications for L2 speakers’ collocational proficiency. At first glance, cases of overacceptance seem to have the more far-reaching consequences, since this means that even advanced learners of English (some of whom are studying English to become English teachers) are not able to detect less acceptable phrases and combinations. Thus, it follows that participants who are enrolled in a teaching programme would not always be able to mark their future students’ language adequately. Furthermore, it could also mean that, if EFL learners are unaware of a certain restriction, they might be more likely to use these combinations productively as well. While this might not be the case for instances of underacceptance, even these items can potentially cause problems, especially for future English teachers. In a way, these might be even more severe, since if a teacher is unable to understand a language’s combinatorial potential, s/he might stick to language which is to some extent restricted.
On the one hand, this would deprive his/her students of the chance to receive native-like input from their teacher, but, especially in more advanced groups, it could also mean that students who are able to use English in a more native-like way - for example because they have spent some time in an English-speaking country - are corrected or even marked down because of their supposedly wrong use of the English language. This process would then actively deteriorate a learner’s command of a language and might even discourage some students from pursuing the subject altogether. Furthermore, L2 speakers of English seem to rely more heavily on contextual factors, which, once again, could imply that they are less familiar with the applications of lexical items per se. As mentioned before, this could be due to a lack of authentic input but also because their exposure to English, as a foreign, formally taught language, is simply not sufficient. In general, a comparison between young learners within English-as-a-subject (LS) and immersion (IM) programmes suggests that students with an IM background are slightly more confident with their comments on the different variants of CollJudge’s items, even if they are, admittedly, not always correct. Overall, this chapter, like any analysis of data, only offered a “post hoc” attempt to “discern” cognitive interpretations against the background of participants’ evaluations (cf. Siepmann 2005: 432). Yet, as Hunston (2007: 258-261) argues, these observations can be used to explain “collocational inference” and more creative instances of language, as well as rather frequently used stylistic devices and also, as the previous pages suggest, different levels of collocational variation and fixedness.
As for the predictive value of this study, it has to be stressed that, despite its best efforts, this analysis was only able to provide a first idea of how the collocational proficiency of L1 and L2 speakers of English might develop. But, true to the principle of complex adaptive systems (> 4.3.3), these observations and explanations can now be taken as a starting point for further research, for example, for a broader longitudinal study. Thus, to conclude this study, chapter 8 provides a summary of further suggestions for future research as well as a general overview of this study’s results, limitations, and implications.

8 Main Results and Implications

However, it is important not to view the regularities as primary and the gradience and variation as secondary; rather the same factors operate to produce both regular patterns and the deviations. (Bybee 2010: 6)

This study set out to investigate collocations as dynamic linguistic phenomena which could be seen as subject to constant change rather than as static combinations with an additional level of syntagmatic and paradigmatic restrictions. Thus, it argues that collocations should not just be regarded as idiosyncratic phraseological items, which, depending on their degree of fixedness and semantic opaqueness, can be classified along a gradient of idiomaticity. There are creative changes and alternations which can also be observed in collocational combinations. To a certain degree, this conception of alternation and change has already formed part of most approaches towards collocations (Sinclair 1991; Halliday 1966; Firth 1951 / 1964), yet the focus of lexical sets, statistical analysis or typical relations has often been on “regular patterns” (Bybee 2010: 6). But, as Bybee emphasises, from a cognitive point of view the same mind produces both “regular patterns and the deviations”. Thus, both should be used to identify underlying cognitive processes and factors.
Therefore, the study at hand has attempted to combine established collocations and creative alternations to test speakers’ receptive collocational proficiency at different points in time. In doing so, it was able to observe tendencies and trends, which on the one hand suggest that, like other linguistic units such as morphemes (Bybee 1995) or vocabulary (Rescorla 1980; Anglin 1977), collocations might to some degree be subject to a process of u-shaped language attainment, but also that contextual factors can play a crucial role in the evaluation and interpretation of collocations. In order to compare results from this study with existing research within the respective fields and discuss its implications for usage-based approaches, chapter 8.1 will first summarise the main findings, before chapter 8.2 discusses this study’s limitations and shortcomings and sketches potential areas for future research. Finally, chapter 8.3 outlines which implications these results might have for cognitive research in general, as well as first and second language attainment in particular.

8.1 Main Results of this Study

As outlined in chapter 1, three research questions serve as the focal points of this study. They sought to add a third developmental dimension to a phenomenon which often has only been viewed from two sides. Thus, these RQs concerned themselves with the potential conception of collocations as dynamic, cognitive entities (RQ 1), collocational proficiency throughout different stages of first and second language attainment (RQ 2a / b), and the possible influence of ‘creativity’ and ‘context’ on the perception of collocational combinations (RQ 3). In answering these questions, the present study seeks to contribute to a more comprehensive understanding of collocational phenomena against the background of cognitive development in general and first and second language attainment in particular.
Here, RQ 1 was predominantly concerned with a more theoretical reasoning about a potential unified model, the DMCDC-model, which is able to incorporate different desiderata and findings from theories on cognitive development as well as phraseological research. The other two RQs called for a more applied approach. RQ 2a / b and 3 were therefore designed as a series of judgement tasks, which, at the same time, also served to put the DMCDC-model to a first test. This study’s answers to the three RQs can be described as follows:

RQ 1 Are collocations a cognitively stored entity and if so, how can this perspective be adequately described in a comprehensive conception of collocational combinations?

Based on theoretical considerations of collocations (> 2) and creativity and change (> 3) as well as language attainment (> 4), Chapter 4.4 presented a dynamic model for the cognitive development of collocations: the DMCDC-model. Its conception draws on insights from constructionist approaches towards language acquisition which suggest that, based on general cognitive processes such as categorisation, association, and analogy as well as social interaction, patterns develop which form the basis for human behaviour and interaction. In the case of human language, this leads to the association of linguistic forms with a certain meaning or function. These constructions can then remain rather fixed, become more flexible in some ways or develop completely new, often more abstract form-meaning pairings. One of the most crucial factors for this process is the type and token frequency of input a system like the human mind receives. If the input does not occur with many variations (tokens), fairly fixed idiomatic constructions are created, but variation around a collocate (types) leads to the development of semi-fixed collocational constructions.
Thus, the DMCDC-model regards collocations as constructions which, depending on their own frequency as well as the quantity and frequency of functionally and semantically potential collocates, operate on different levels of abstraction. But as Gershkoff-Stowe and Thelen point out:

Variability is not merely “noise” in the system. Nor does variability reflect the vagaries of performance factors that interfere with the expression of the global mental structure. Rather these temporary gains, losses, and hesitations that mark the path towards mature growth offer important clues to the study of change, and indeed may be the very source of change. (Gershkoff-Stowe / Thelen 2004: 13)

For collocational combinations, this change becomes most obvious in creative alternations of collocations. Therefore, creative as well as established combinations were later used to test whether, as the DMCDC-model predicts, the interpretation of creative alternations is based on semi-fixed constructions or is instead dependent on contextual factors. Furthermore, similar to Wray and Perkins’ (2000) stage model, the DMCDC-model includes a stage of quite analytic language processing which can be observed in most developmental processes (Gershkoff-Stowe / Thelen 2004). A further prediction made by the DMCDC-model is that the development of collocational proficiency does not follow discrete stages but rather builds and re-builds itself on an individual’s existing competence level. As a result, not all collocations should be expected to be found on the same level of entrenchment and acceptance at one point in time. Based on the principles of Complex Adaptive Systems (Ellis / Larsen-Freeman 2009; Larsen-Freeman / Cameron 2008), the model predicts that the attainment of each collocational combination is subject to an individual attainment process.
Thus, while one collocation might already be a firmly established part of an individual’s collocational repertoire, another might not yet be fully understood or accepted. This led directly to RQ 2a and RQ 2b. Both questions centre on whether the several stages of the dynamic process outlined above can indeed be observed in speakers of English. Furthermore, RQ 2b introduced another factor, the distinction between the cognitive processes of native and non-native speakers of English.

RQ 2a Is there a unified process underlying the attainment of collocational proficiency?

RQ 2b Does the collocational proficiency of native (L1) and non-native (L2) speakers of English develop in the same way?

Based on CollMatch (Gyllstad 2007), a general test of collocational proficiency, this study was able to show that, as suggested by the DMCDC-model, the test’s 70 collocations fall into different patterns of acceptability depending on the age of the test takers. While there are items which were initially accepted by all participants (Steady Acceptance), others painted a more gradual pattern of acceptance (Gradual Acceptance) and some showed, as the analytic stage 1 within the model would predict, a tendency for over-acceptance among teenagers around the age of 14 (Peaked Acceptance). But there are also collocations which are gradually less accepted the older the native-speaking evaluators become (Receding Positive Evaluation). A possible reason for this might be that, despite the fact that the affected items could be regarded as collocations from a statistical point of view, they do not present the most common verbalisation of the concepts they are referring to, which could lead to gradual rejection the older, and therefore more experienced, the participants get.
Since the study at hand was designed as a pseudo-longitudinal study, it is, however, difficult to claim that these four patterns can be regarded as a definite indicator that at least some collocations experience a stage of analytical over-generalisation before they are then re-stored as phraseologically restricted combinations. Yet they do show that at one point in time, several levels of acceptance can be observed, which suggests that different types of cognitive representation exist at different stages. Apart from these four general patterns, there is also another meta-pattern (Academic Acceptance), which shows a jump in acceptance scores between teenage evaluators and academically trained adults. This indicates that collocations are not only acquired individually and can thus be found in different shapes of representation, but also that ‘education’ might play a role when it comes to native speakers’ receptive collocational proficiency (> 6.3.6). Different age groups of non-native speakers of English with German as their L1 also produced these four-plus-one patterns. But, unlike their native-speaking counterparts, at the age of 15 German teenagers seem to feel less confident in their evaluations. They tend to be more reluctant and thus contribute to a pattern of Dented Acceptance for some items. This might indicate that the more these students learn about a language, the more cautious they become about accepting combinations they are not really sure about. But while the patterns are very similar to the native speakers’ evaluations, the items which can be found within the respective patterns are not.
Thus, it seems that, while cognitive processes might yield similar patterns, differences in input and therefore type frequency might be responsible for the shift in item distribution.

1 Since this tendency can also be observed with an equally high percentage in the evaluation patterns of the test’s pseudo-collocations, this could indeed indicate a stage at which speakers tend to focus on analytical aspects of a language, rather than simply a better performance by teenagers from a certain group.

A potential implication of this would be that this discrepancy is partly responsible for the frequently observed failure of non-native speakers to produce idiomatic English (Nesselhauf 2004; de Cock 2004; Granger 1998; Howarth 1996). Furthermore, an additional comparison of the influence of schooling context showed that students who had been part of an immersion programme (IM) for over three years perform statistically better than students who learnt English in the context of a regular language class (LS). However, this is not true for all LS groups. Thus, it seems that under certain circumstances, which might be connected with the level of activation and task-based learning a teacher chooses to use, LS classes can perform as well as early IM classes. To find out whether, as suggested by the DMCDC-model, the acceptance of more creative alternations of collocations coincides with a cognitively stored, semi-fixed collocational construction, RQ 3 was designed to explore how the factors of ‘creativity’ and ‘context’ influence a participant’s perception of collocations.

RQ 3 Which role do the factors ‘creativity’ and ‘context’ play for the acceptability and analysis of collocational phenomena?
The evaluation of data from CollJudge (> 7) suggests that less frequent and thus potentially more creative collocates can affect a native speaker’s acceptance of a sentence in three different ways: they do not produce any effect on acceptance, they trigger a rather clear rejection, or they are accepted, but only if they are supported by contextual factors. For a level of semi-fixed collocations, as suggested by the DMCDC-model, this could indicate that collocational constructions exist to support the interpretation of rare, more creative collocational combinations, at least in those cases where all variants of an item are equally well accepted. Yet, a preference for established variants as well as a certain degree of context dependency indicate that, if collocational constructions exist to support or facilitate the reading of an utterance, other factors, like early acquisition 2 or a discursive salience in the shape of a fitting context also appear to contribute to the interpretation and acceptance of more creative combinations. The relevance of contextual factors in the interpretation of more creative variations of collocations could also be interpreted as a further indication that cognitive processes might at least partly be based on exemplar models rather than more abstract prototypes (> 4.2.1).

2 Most items which generate a pattern of Preference for Established Items were initially accepted by all native speaking participants (> 6.3.3). This suggests that they are part of a native speaker’s collocational inventory from a comparatively early stage. They might therefore be stored as holistic units rather than as more flexible semi-fixed constructions.

In this framework, creative alternations of collocations then function as both symptom and cause at the same time.
Symptom, because they indicate that there is a certain degree of flexibility, a potential constructional slot, within a collocational construction, and cause, because the more often a speaker encounters creative variations of a construction, the less lexically fixed that construction becomes, which then, of course, serves as the foundation of new, creative combinations. Hence, collocations, like possibly any other constructions, need a variable, creative input (type frequency) so they can expand and develop more abstract constructions. Furthermore, statistical analysis (> 7.3) has shown that collocates which are similar in meaning to a more established collocate and rather dissociated as a combination tend to be more readily accepted. Moreover, Δp-values not only supported the observation that a contextually richer setting supports the acceptance of less expected combinations in some cases, but also that even in other patterns, like Preference for Established Variants or Overall Acceptance, contextual factors might additionally influence the judgement of L1 evaluators. L2 participants again produce evaluation patterns which suggest that context as well as collocational restrictions influences the evaluation of established collocations and their more creative variants. But contextual factors seem to guide adult EFL learners’ judgement more than that of their L1 counterparts. As before, the items which generated the respective patterns do not really correspond to the native speakers’ evaluations. Based on these findings the working definition from Chapter 2 can be modified as follows: A collocation is a construction which consists of a lexical form and a functional or semantic meaning. 
Depending on the type frequency of the collocational combination as well as the frequency of semantically similar combinations which are encountered for the respective collocation outside the collocation, collocations can develop into a fixed (and therefore closely related to a compound), semi-fixed or delexicalised construction. Semi-fixed collocational constructions open up one or more slots, which through inheritance relations influence the meaning of any constituent chosen to be used in this particular slot. The constituents of a collocation are interdependent; each can be regarded as a fixed point or an exemplary item within a slot. Depending on the number of constituents of a collocation, they are at least bi-directional. Apart from the implications for language attainment and processing, results from CollJudge might also contribute two caveats to the theory of elicitation in general and judgement tasks in particular. One is that context is, as Searle (1979: 117) already emphasised, a ubiquitous phenomenon which increases the salience of a specific reading. This was also an influencing factor for L1 and L2 speakers in this study. Thus, any task which asks participants to evaluate an item should be aware of the fact that context (as well as its absence) might trigger different results. Furthermore, it is quite likely that participants create their own contextual setting as soon as they perceive an item. To test items in isolation might then run the risk of not being able to control for contextual factors in the first place. Moreover, the qualitative evaluations in chapter 7 have shown that, from time to time, even generally accepted variants received comments from participants, demonstrating a certain level of disagreement among evaluators.
Thus, if a study chooses to ask speakers of a language about their evaluation of an item, it might be good advice to include more than five participants since even established variants like raise objections or lend support obtained two comments from adult L1 evaluators. Raising the number of participants therefore reduces the chance of over-evaluating these comments.

8.2 Limitations and Further Research

Despite this study’s best efforts to produce data which was representative yet adequate for the research questions at hand, it also has its limitations and shortcomings. The number and set-up of participants, for example, could be considered one of the study’s shortcomings, since sampling according to courses and classes might be seen as too restricted. While the number of participants in all groups used for the statistical analysis lies above ten, one might, of course, argue that some subgroups in this sample are not representative enough. Especially among the different teenagers, groups come from one class or year and thus are not randomly sampled or spread evenly across different socio-economic backgrounds. Yet a random and at the same time balanced sample of participants would not have been suitable for making inferences about the potential influence of a classroom setting (> 6.5). A follow-up study with more detailed socio-economic data could help to gain further insight into the patterns identified in chapters 6 and 7 and would yield a more fine-grained analysis and correlations. Complex Adaptive Systems approaches, which partly inspired the DMCDC-model, even suggest abandoning comparisons of groups completely and focusing instead on individual speakers and their development. This would also imply that the data obtained needs to come from an actual longitudinal study, which, as pointed out before, might not only be difficult to administer but could also cause potential effects of familiarisation with the tests.
However, these observations too would, as Siepmann points out, “only be discerned post hoc” (2005; 2005: 432). Yet, if different methods and studies yield similar results, this might be a strong indication in favour of a certain interpretation. Thus, despite its limitations, this study tried to apply different models and measures to provide a first overview of potential processes, patterns, and interdependencies within the development of collocations in language attainment. One of the main issues, however, is the fact that the evaluation patterns discussed in chapter 6 are based on pseudo-longitudinal data. While the use of pseudo-longitudinal data today can be considered a common tool in linguistic research, for example in corpus linguistic studies on learner language (Maden-Weinberger 2015; Hasko 2013) or diachronic language development (Hilpert 2008; Traugott 2008), it can only provide initial trends and potential processes. Therefore, this study might have set the stage for further studies on the interrelation of established patterns and creativity in general or may even inspire potential models of language processing (like the DMCDC-model), but more reliable data still needs to come from true longitudinal studies. Depending on the scope and focus of such studies, they might also decide to include further phraseological phenomena and constructions to find out whether the basic idea of the DMCDC-model could also be applied to other linguistic items. Moreover, it would be interesting to trace potential differences between varieties of English such as British English, American English, Australian English or Indian English. A closer look at potential differences between non-native speakers with different linguistic backgrounds could also be an interesting perspective. In order to minimise the effect of the L1 on learners’ perception, this study chose to focus on EFL learners with a German L1 background.
But a follow-up study could put the pervasiveness of the model in general and in particular the patterns identified in chapters 6 and 7 to the test. Furthermore, this study speculated that the discrepancies between items within the respective patterns in chapters 6 and 7 stem from the differences in input received by L1 as opposed to L2 speakers of English. Some aspects indeed indicate that input might partly be responsible for this observation, like the fact that EFL learners who are part of an immersion programme seem to perform quite well compared to their LS peers, the fact that classroom phrases like pay attention achieve a high score within both groups, and the tendency of EFL learners to be less familiar with more common, every-day combinations like run a bath or throw a party. Here, a study which also measures the frequency and type of input a speaker receives would be able to test the amount of input which might be necessary to achieve more native-speaker-like results. In this context, a closer look at the relationship between input, classroom setting and teaching method would also help discern whether more input alone can contribute to better collocational proficiency among EFL learners.

8.3 Implications for a Usage-based Approach Towards language

As outlined in the previous chapters, this study produced some interesting results which, to a large extent, fit in with current findings in cognitive linguistics but could also be interpreted against the background of neurological as well as psychological studies. It argued that variability and change form an integral part of human cognition and that changes in contextual factors such as age or contextual complexity influence the linguistic perception of native as well as non-native speakers of English. Based on these assumptions, the two-dimensional DMCDC-model was designed.
Chapters 6 and 7 then concerned themselves with these two dimensions of the DMCDC-model: while chapter 6 compared different stages of receptive knowledge of collocations in L1 and L2 attainment, chapter 7 focused on different types of cognitive representation and their interdependence with ‘creativity’ and ‘context’. For the more temporal dimension, chapter 6 demonstrated that, as suggested by the model, not all collocations seem to be accepted equally well by all groups of L1 test takers. This might imply that, while adult native speakers unanimously accept most collocations, younger L1 speakers perceive some combinations as familiar but struggle to identify others. The patterns of acceptance which result from these evaluations not only indicate that native speakers’ collocational proficiency develops in stages but also that this process is still active for older teenagers around the age of 16. Furthermore, there appears to be a stage of rather high acceptance for students in year 9, which might correspond to the more analytical stage suggested by Wray and Perkins (2000) and the DMCDC-model (> 6.3.2). Moreover, as the contrast with adult L1 speakers showed, not all native speakers might achieve the same level of proficiency when it comes to collocations. Apart from the general conception of phase-wise development, the vertical axis of the DMCDC-model suggests that this process results in different cognitive representations: from more restricted, almost holistic association of collocates to rather flexible combinations and context dependency. Chapter 7 attempted to demonstrate that some collocational combinations are indeed rather robust to change and thus might be stored as more or less idiosyncratic units, while other combinations seem to be able to handle creative alternations quite well. The last two sub-chapters contrast these findings with existing research in first (> 8.3.1) and second (> 8.3.2) language attainment.
They furthermore include other aspects like diachronic linguistics or valency theory, which have also found some common ground with usage-based theories in general and the respective types of construction grammar in particular.

8.3.1 First Language Acquisition

When it comes to age-related comparisons, usage-based studies on first language acquisition often report that the acquisitional process as such does not seem to evolve in a linear manner but rather advances with phases of deterioration in between. As mentioned before, this phenomenon is usually referred to as a u-shaped learning curve (Ambridge / Pine / Rowland 2011; Gershkoff-Stowe / Thelen 2004; Brooks / Tomasello / Dodson / Lawrence 1999; Bybee 1995; Plunkett / Marchman 1991; Cazden 1968; Berko 1958). Wray and Perkins’ (2000) model also predicts a decline in holistically stored phraseological chunks in favour of a predominantly analytical phase, which is then followed by a rise in holistically processed items. At first glance, data from this study seems to run in the opposite direction, since there is not one instance of a u-shaped acceptance pattern among the L1 evaluation patterns for CollMatch’s 70 collocations. On the contrary, native speakers around the age of 14 accepted collocations and pseudo-collocations more readily than teenagers in years 5 or 11. Yet if items, especially pseudo-collocations, were rejected by adult native speakers of English, this could also mean that the focus of this particular teenage group does not lie on idiomatic restrictions but rather on the general plausibility of a VP + NP combination. Since the items in CollMatch are structurally all constructed according to syntactic rules within the English language, it could therefore be argued that this peak in acceptance translates into a phase of more analytical language processing and is therefore very much in line with similar findings within the field of morphology or semantics.
Furthermore, the conception of language as an ongoing, usage-based process of individual development and (re-)organisation, as in theories based on the idea of Complex Adaptive Systems (Ellis / Larsen-Freeman 2009; Larsen-Freeman / Cameron 2008), also fits quite well into this study’s results, first and foremost because not all collocations develop simultaneously. There is a total of four patterns (> 6.3), indicating that not all collocations are equally well accepted even within a given age group. Thus, even among native speakers, some items (like pull a face) seem to have been acquired quite early and are therefore initially accepted across all participants’ evaluations, while others (like exercise discretion) might take more time to be stored as a collocational combination or even be encountered as one in the first place. Here, a true longitudinal study might also be able to reveal whether these patterns could not only be item but also person-specific. But development over time can also be observed over a much longer timescale. Interestingly, in diachronic research authors also consider “[…] cognitively-based motivations such as analogical thinking and acquisition […]” (Traugott / Trousdale 2013: 35) as one of the major “mechanisms of change”. Thus, Traugott and Trousdale, for example, argue that constructions not only develop within the span of a lifetime but also across generations and centuries (Traugott / Trousdale 2013: 1-44). However, even if these phenomena can only be observed once a group of speakers agrees on the same constructional changes, this process of Constructionalisation is still based on the same cognitive mechanisms. For the present study, this seems to be the case for items which were accepted in a less established variation, like [pretty + N].
Despite the fact that only about 30 years ago linguists like Palmer (1976: 96) rejected a combination like pretty boys , it appears that the co-occurrence of these two lexemes today is well established among native speakers of English (> 7.1.2). Other items, like [ commit + NP ], still need context to support a more creative alternation. However, the combination commit a mistake is already listed in the Oxford Collocations Dictionary ( OCD ), which shows that this particular construction might be at a stage where it is expanded yet not fully accepted among all native speakers. Diewald too identifies contextual factors as one of the prerequisites for the expansion of a construction. She regards “[…] critical context, in which, because of its multiple structural and semantic ambiguity, the grammaticalization process is triggered” (Diewald 2002: 116) as especially essential for the development of new constructional meaning through grammaticalisation. Since the contextually richer variants from CollJudge also supported a semantically ambiguous reading which coerced the meaning of the lemma into the more established reading of the collocation (> 3), these sentences might also be seen as “critical context”. Furthermore, as already indicated by the example of [ commit + NP ], it seems that there are cases where the evaluations from academically trained adult L1 test takers or lexicographers can differ, at times quite drastically, from the evaluations of less experienced native speakers. As pointed out before, this shows that the acquisition of collocational proficiency is a process which starts and ends for each collocation at a different point in time and seems to take well into the late teens to stabilise, though not necessarily on a predefined level. Nippold and Martin (1989) report similar findings for idioms. This supports Dąbrowska’s (2015, 2012, 2010) claim that not every native speaker reaches what is usually referred to as native speaker proficiency. 
Overall, this study showed that these findings from usage-based studies on age- or time-related aspects also seem to be applicable to phraseological phenomena such as collocations. But the development of different levels of representations (the y-axis of the DMCDC-model) has already been discussed with respect to other constructions such as Argument Structure Constructions (ASC). Herbst (2011: 361-363), for example, summarises that, while ASCs seem to work well for more creative instances of language, more established and often structurally restricted constructions (like give which can be used in a ditransitive construction and explain which cannot) might be more suitably accounted for by quite item-specific valency constructions. Therefore, semi-fixed collocational constructions such as [pretty + NP] could also be seen as a type of valency construction. However, at the level of collocation, it is questionable whether abstract, fully open constructions along the line of argument structure constructions actually exist. At least for the interpretation of creative alternations at this level, semi-fixed constructional representation seems to suffice and Hampe and Schönefeld (2006: 150) similarly point out that for more creative syntax “[…] argument structure constructions are not as central to the process as some lower-level (i. e. partially filled) constructions […]”.
In a similar vein, more statistical approaches like Bod (2009; 1998) argue that “[t]he regularities we observe in language may be viewed as emergent phenomena, but they cannot be summarized into a consistent non-redundant system that unequivocally defines the structures of new utterances.” Therefore, even though they might be convincing from a theory-building perspective, fully abstract constructions do not necessarily need to be part of a speaker’s cognitive inventory, since, at least to a certain extent, semi-fixed constructions and context in the shape of neighbouring constructions seem to be able to account for creative alternations. Hanks (2013: 49-50) also stresses that in order to interpret new words, a speaker quite possibly relies on “contextual anchoring” as well as the item’s internal structure. And the fact that ‘context’ plays a role in the acceptance of some creative alternations of collocations suggests a more exemplar-based approach in L1 language attainment (> 4.2.1). Today, the importance of contextual factors is largely attributed to more pragmatic considerations (van Dijk 2008; Widdowson 2004), but in fact, context has been one of the defining features for collocations from a very early stage. Recall Firth’s (1957 / 1968: 179) famous definition of collocation as “the company” a word keeps, or Sinclair’s (1991: 112) observation that context can, for example, account for a certain semantic prosody or reading of a noun which co-occurs in context with the verb happen. In the latter case, this context could, however, be regarded as fairly entrenched, which ultimately would make it a semi-fixed construction. Thus, even if it was not explicitly stated at the time, even in the early days of modern collocational research the relationship between context and constructions was part of at least some conceptions of collocation. Therefore, the fact that this dimension of the DMCDC-model could be influenced by context as well as constructions, both of which seem able to affect the reading of a collocational combination, simply emphasises the cognitive value of context-oriented approaches towards collocation.

8.3.2 Second Language Acquisition and Learning

As far as the cognitive representation of collocations in second language learning is concerned, it has often been argued that EFL learners produce, and thus quite likely also store, collocations in a different way than native speakers of English. Therefore, Howarth (1996: 60) concludes that “[i]t is reasonable to suggest that learners do not approach the phenomena [collocations in academic writing] from the same direction as native speakers”, while Granger (1998: 158) observes that “[…] learners’ phraseological skills are severely limited […]”. Furthermore, Ellis and colleagues (2008) were able to show that throughout three different tasks, L1 speakers of English process formulaic sequences faster, the higher the respective MI-value is, while L2 speakers seem to be more sensitive to the raw frequency of the sequences themselves. At first glance, the present study could also be regarded as yet another piece of evidence that non-native speakers of English struggle to process collocations. At the very least, their evaluations differ from native speakers’ judgements in the majority of cases. But, while the result appears to be different, the underlying processes look fairly similar. The major patterns of acceptance (> 6) and evaluation (> 7) can be observed in L1 as well as L2 data.
However, EFL learners’ data also yields learner-specific patterns: for CollMatch, the pattern of Dented Acceptance indicates that young non-native speakers tend to be more careful in their evaluation than teenagers from years 5 and 11, while for CollJudge, L2 evaluators, if in doubt, seem to resort to contextual cues. Nonetheless, in general, the data suggests that, in analogy to de Cock (2004: 243), non-native speakers’ receptive collocational knowledge is mainly characterised by over- and under-evaluations. Nevertheless, it could be argued that similar cognitive processes produce these results because of a difference in input. Support for this hypothesis comes from Siyanova and Schmitt (2008: 439-449), who report that native speakers of English seem to intuitively know the respective frequency of collocations, while even advanced EFL learners were not able to judge the plausibility of a collocational combination. From an emergentist perspective, this could be interpreted as a lack of adequate data which leads to a skewed perception of the respective constructions. Barfield (2009: 107), too, observes [3] that “[t]here is evidence, then, that the lexicons of more collocationally proficient learners are distinguishable along two dimensions, those of size and organisation.” Thus, here as well, more input seems to correlate with a more native-like level of proficiency. Furthermore, in this study as well, participants who received more input, for example through an immersion programme, scored higher than their peers (> 6.5). This also corresponds with Reber (2009), who postulates two types of category learning: rule-based learning and subconscious information-integration.
The observation that advanced as well as IM learners obtain native-like results for some collocations while they struggle with others could therefore also indicate that the former were acquired through information-integration, while the latter are instead based on rule-based learning. However, Howarth (1996: 159) also warns that

[…] it would be misleading to suggest that learners should attain or aspire to full L1 competence. Given its great complexity and the fact that native speakers themselves can fail to live up to its demands even under optimal conditions of composition, it would be unrealistic to expect such proficiency in all learners. (Howarth 1996: 159)

But if, as suggested above, EFL learners need more authentic input in order to form associations and develop categorisations which are similar to a native speaker’s cognitive representations, it would not be at all unrealistic to expect at least a level of proficiency which is similar to the collocational proficiency of less proficient native speakers of English (> 6). However, Herbst and Klotz (2003: 288) note that it is very likely that most EFL learners are simply not aware of certain restrictions. In terms of the perception of collocational combinations, the present study can only partly confirm this observation. Among the 15 items in CollJudge, there is only one, drop a hint / clue, which is accepted in every variant by L2 participants, while L1 speakers only accept the collocation’s creative alternation if there is enough context to support it. In this case, it really seems as if EFL learners are oblivious to potential restrictions and therefore run the risk of producing less authentic combinations.
The majority of CollJudge’s items, however, generate a pattern of Contextual Acceptance among adult non-native participants, which might imply that, even if advanced learners of English are able to judge collocations on a level which is similar to the proficiency of L1 teenagers (> 6), they can quite easily be influenced by contextual factors like a longer, more complex sentence structure (> 7.2).

[3] Barfield (2009: 108) also speculates that adj+N collocations might serve as the foundation of EFL learners’ collocational knowledge. Since the four adj+N collocations in CollJudge produce either skewed or context-dependent patterns, the present study cannot, however, claim that these combinations are robust to creativity or change, which would be an indicator that they are learned and stored - more or less as holistic units - from an early stage onwards.

As mentioned before, these findings indicate that, at least for collocations, the process of acquisition in second language attainment seems to be similar to first language acquisition. Of course, as Ellis (2008: 382-396) explains, there are still other factors which might hinder a learner’s native-like proficiency, like the interference of a learner’s L1 or overshadowing and blocking of existing (L1) constructions, as well as the fact that, unlike native speakers of a language, non-native speakers often cannot rely on a long linguistic history of perceptual learning. Thus, authentic, native-like input might only produce the same native-like results to a certain degree. Yet Ellis (2006), too, argues that there is reason to believe that the underlying cognitive processes of L1 and L2 attainment are very similar, if not the same. Gries and Wulff (2005) also found evidence which suggests that EFL learners, like native speakers, base their evaluations on constructional patterns.
Therefore, if the systems which process and produce language seem to work according to the same principles in first and second language attainment, they might as well be provided with the same input. To a certain extent, this emergentist perspective on second language acquisition has already been suggested by Krashen and Terrell (1983). Their natural approach also views communication and interaction as one of the basic factors within L2 processes. Krashen and Terrell (1983: 60) themselves claim that patterns and routines should not be regarded as gradually acquired language but rather as chunks which need to be learnt. But the rather positive results from students who were part of an early partial immersion programme suggest that these classroom settings might indeed be a potential answer to “the great paradox of language teaching”, as Krashen and Terrell (1983: 55) call it, namely that “language is best taught when it is being used to transmit messages, not when it is explicitly taught for conscious learning.” Since this is exactly what immersion programmes do, they might not only provide a good setting for the development of grammar and vocabulary comprehension in the English target language (Steinlen / Piske 2013; Kuska / Zaunbauer / Möller 2010; Zaunbauer / Möller 2007), but also be a suitable method for phraseologically more authentic L2 proficiency. Thus, in order to improve EFL learners’ collocational proficiency, it might help to raise awareness, as Müller (2010) or Lewis (2000) suggest. But training students in the use of dictionaries in class or devising awareness-raising tasks can only be regarded as interim solutions. Learning collocations through a variety of mnemonic strategies, as Jehle (2007: 233-279) proposes, might also contribute to a better understanding of some collocational combinations but would still bear the risk of neglecting others.
However, this should not imply that teaching and classroom activities are not useful tools for helping students to develop a good level of collocational proficiency. Instructions or tasks could, however, presumably profit from a better understanding of the underlying processes of language attainment in general and the development of collocational knowledge in particular, since, as Pienemann (1988), for example, suggests, it seems that classroom activities are most successful if they aim at a level just above a learner’s current developmental stage. This study provided a first model for potential stages and factors within the L2 attainment of collocations, but at the same time it also showed that this might not be a uniform or linear process. As a consequence, authentic input needs to be combined with differentiated instructions and student-centred teaching. Nevertheless, a strong focus on authentic input, as in immersion programmes, might not be possible in every situation. Therefore, similar to the present study, Nesselhauf (2004: 264) points out that learning needs repetition. This input could, for example, also come from corpora, which could be fruitfully applied in the EFL classroom to enable students to explicitly or implicitly formulate their own research questions on the use or distribution of linguistic phenomena and test them against corpus data. Bernardini (2004: 17) is also convinced that “[corpora] can provide enough evidence and stimuli for the learner to arrive at developmental appropriate generalizations […]”. Ultimately, these “generalisations” might also be regarded as constructions which could help the learner to identify or even develop native-like semantic prosodies or semi-fixed constructional patterns through subconscious information-integration (Reber 2009).
Quintessentially, this process of generalisation, which is created through language use in different familiar as well as novel contexts, is what Langacker (1987: 85) describes when he stresses:

Putting together novel expressions is something that speakers do, not grammars. It is a problem-solving activity that requires a constructive effort on the part of a speaker and occurs when he puts linguistic convention to use in specific circumstances. (Langacker 1987: 85)

This quote, which stood at the beginning of this study, also acts as a good summary of what this study has investigated: the effect that novel, creative expressions might have on the perception and ultimately construction of a language. It approached this task with the assumption in mind that language is an “activity”, since, after all, as Langacker points out, language might not be a static, rule-based framework but rather an ever-developing system, which through use in different situations fossilises some phrases and broadens or even reinvents the concept of others. This study sought to show the implications of this perspective for a small selection of collocational combinations, but its findings appear to support Langacker’s view that “novel expressions” - or ‘creativity’ - as well as “specific circumstances” - in other words, ‘context’ - could be two of the driving forces behind this process.
Appendices

Appendix I: Questionnaire

Appendix II: CollMatch (Acceptance Scores)

(pseudo-collocations are marked with an asterisk: *)

| No. | Item | Native: adult | Native: year 7 | Native: year 9 | Native: year 11 | Non-native: FAU | Non-native: year 5 | Non-native: year 9 | Non-native: year 11 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | have a say | 91 | 68 | 79 | 88 | 53 | 10 | 5 | 14 |
| 2 | lose sleep | 98 | 60 | 58 | 63 | 39 | 57 | 55 | 67 |
| 3 | do justice | 67 | 49 | 67 | 63 | 86 | 52 | 60 | 76 |
| 4 | draw a breath | 81 | 43 | 56 | 50 | 28 | 71 | 25 | 48 |
| 5 | *turn a reason | 1 | 2 | 7 | 13 | 13 | 52 | 40 | 52 |
| 6 | say grace | 99 | 51 | 77 | 75 | 45 | 57 | 45 | 14 |
| 7 | *pick a glance | 6 | 17 | 19 | 15 | 37 | 67 | 70 | 67 |
| 8 | break news | 71 | 60 | 58 | 75 | 69 | 48 | 65 | 57 |
| 9 | make a move | 100 | 91 | 95 | 95 | 94 | 86 | 55 | 95 |
| 10 | *claim trade | 22 | 21 | 37 | 0 | 26 | 30 | 35 | 30 |
| 11 | raise objections | 97 | 40 | 77 | 65 | 75 | 24 | 30 | 33 |
| 12 | bear witness | 97 | 38 | 53 | 50 | 84 | 57 | 40 | 48 |
| 13 | *supply one’s assistance | 51 | 36 | 40 | 40 | 59 | 67 | 85 | 81 |
| 14 | give a speech | 100 | 91 | 93 | 90 | 92 | 67 | 80 | 90 |
| 15 | serve a sentence | 84 | 38 | 51 | 48 | 37 | 48 | 25 | 29 |
| 16 | *stretch a regard | 7 | 9 | 28 | 13 | 10 | 48 | 25 | 24 |
| 17 | *restore a favour | 38 | 51 | 70 | 38 | 32 | 43 | 35 | 52 |
| 18 | keep pets | 98 | 83 | 84 | 70 | 71 | 86 | 60 | 62 |
| 19 | catch fire | 99 | 70 | 74 | 78 | 92 | 52 | 95 | 76 |
| 20 | hold meetings | 99 | 72 | 84 | 70 | 82 | 38 | 30 | 67 |
| 21 | pull a face | 99 | 94 | 88 | 93 | 46 | 43 | 20 | 29 |
| 22 | run a bath | 100 | 89 | 95 | 90 | 37 | 43 | 30 | 67 |
| 23 | throw a party | 100 | 94 | 95 | 90 | 83 | 52 | 85 | 71 |
| 24 | *shake a smile | 9 | 19 | 23 | 13 | 5 | 33 | 20 | 14 |
| 25 | set an example | 100 | 96 | 93 | 98 | 82 | 81 | 55 | 70 |
| 26 | *fetch an illness | 2 | 23 | 9 | 8 | 24 | 38 | 30 | 52 |
| 27 | drop hints | 100 | 55 | 77 | 88 | 84 | 57 | 55 | 67 |
| 28 | play a trick | 98 | 91 | 84 | 78 | 85 | 90 | 50 | 81 |
| 29 | pay attention | 100 | 98 | 93 | 93 | 100 | 24 | 95 | 100 |
| 30 | meet a need | 76 | 19 | 63 | 45 | 56 | 14 | 16 | 10 |
| 31 | reach a conclusion | 100 | 66 | 81 | 78 | 67 | 38 | 50 | 81 |
| 32 | *drag a limit | 2 | 11 | 16 | 10 | 25 | 71 | 35 | 62 |
| 33 | *gather a matter | 12 | 23 | 42 | 13 | 18 | 52 | 20 | 10 |
| 34 | assume responsibility | 90 | 60 | 70 | 58 | 44 | 48 | 60 | 67 |
| 35 | suffer damage | 88 | 66 | 51 | 60 | 86 | 43 | 45 | 57 |
| 36 | cut a corner | 100 | 87 | 88 | 80 | 48 | 67 | 25 | 43 |
| 37 | fly a flag | 92 | 53 | 74 | 38 | 13 | 38 | 20 | 10 |
| 38 | realise a potential | 90 | 38 | 58 | 58 | 77 | 29 | 55 | 76 |
| 39 | *sink speed | 2 | 21 | 19 | 3 | 11 | 43 | 35 | 24 |
| 40 | fit the bill | 94 | 36 | 67 | 58 | 44 | 52 | 60 | 57 |
| 41 | push one’s luck | 99 | 74 | 77 | 78 | 83 | 76 | 60 | 71 |
| 42 | gain ground | 74 | 38 | 58 | 53 | 85 | 43 | 55 | 57 |
| 43 | perform a miracle | 95 | 68 | 81 | 65 | 45 | 33 | 65 | 76 |
| 44 | *win one’s memory | 8 | 23 | 30 | 15 | 17 | 62 | 30 | 52 |
| 45 | *impose success | 28 | 38 | 47 | 33 | 26 | 43 | 20 | 29 |
| 46 | adopt an approach | 98 | 26 | 40 | 33 | 77 | 38 | 35 | 47 |
| 47 | clear one’s throat | 99 | 57 | 67 | 68 | 93 | 43 | 55 | 76 |
| 48 | strike a blow | 92 | 38 | 58 | 54 | 67 | 67 | 35 | 43 |
| 49 | beat eggs | 99 | 68 | 86 | 80 | 36 | 57 | 25 | 29 |
| 50 | employ a technique | 91 | 43 | 56 | 48 | 60 | 29 | 60 | 76 |
| 51 | press charges | 100 | 96 | 93 | 88 | 75 | 43 | 50 | 48 |
| 52 | settle a dispute | 99 | 43 | 49 | 60 | 72 | 29 | 25 | 57 |
| 53 | *swing a secret | 6 | 19 | 40 | 15 | 8 | 52 | 25 | 10 |
| 54 | grant permission | 98 | 72 | 79 | 80 | 91 | 29 | 65 | 81 |
| 55 | *express a worry | 93 | 34 | 74 | 53 | 87 | 62 | 80 | 71 |
| 56 | *rule an award | 5 | 34 | 44 | 30 | 16 | 43 | 35 | 38 |
| 57 | commit a sin | 98 | 45 | 72 | 70 | 79 | 33 | 20 | 52 |
| 58 | launch a campaign | 99 | 68 | 74 | 78 | 91 | 57 | 60 | 57 |
| 59 | *stick one’s mood | 5 | 15 | 37 | 10 | 28 | 43 | 55 | 38 |
| 60 | acquire a skill | 99 | 66 | 63 | 69 | 93 | 33 | 50 | 81 |
| 61 | deliver a speech | 99 | 89 | 93 | 83 | 64 | 38 | 50 | 90 |
| 62 | spread one’s wings | 99 | 55 | 60 | 63 | 97 | 29 | 60 | 71 |
| 63 | assess damage | 95 | 47 | 70 | 50 | 34 | 48 | 20 | 29 |
| 64 | afford an opportunity | 41 | 57 | 49 | 35 | 25 | 52 | 55 | 52 |
| 65 | ride a storm | 66 | 26 | 40 | 23 | 34 | 38 | 35 | 38 |
| 66 | jump a queue | 99 | 77 | 72 | 83 | 48 | 48 | 40 | 62 |
| 67 | *score problems | 17 | 34 | 37 | 28 | 26 | 52 | 45 | 38 |
| 68 | *roll a look | 15 | 32 | 47 | 18 | 10 | 48 | 30 | 5 |
| 69 | exercise discretion | 70 | 17 | 30 | 23 | 41 | 38 | 30 | 43 |
| 70 | blow one’s nose | 100 | 53 | 72 | 73 | 69 | 43 | 35 | 38 |
| 71 | *rush rank | 6 | 9 | 21 | 10 | 15 | 38 | 25 | 19 |
| 72 | steal someone’s thunder | 97 | 53 | 63 | 60 | 39 | 33 | 45 | 48 |
| 73 | dress a wound | 99 | 55 | 79 | 63 | 26 | 67 | 40 | 29 |
| 74 | pursue a career | 100 | 51 | 74 | 73 | 75 | 33 | 20 | 67 |
| 75 | challenge a view | 98 | 32 | 72 | 55 | 60 | 62 | 55 | 57 |
| 76 | *knock a concern | 23 | 32 | 28 | 20 | 17 | 43 | 35 | 24 |
| 77 | *lay pressure | 45 | 36 | 58 | 38 | 66 | 38 | 45 | 62 |
| 78 | *pack an affair | 6 | 23 | 35 | 25 | 16 | 57 | 50 | 33 |
| 79 | abandon ship | 98 | 81 | 70 | 75 | 63 | 48 | 45 | 43 |
| 80 | clean windows | 99 | 87 | 81 | 80 | 98 | 76 | 80 | 100 |
| 81 | dismiss an idea | 100 | 64 | 70 | 65 | 80 | 38 | 25 | 38 |
| 82 | shift gear | 94 | 53 | 65 | 55 | 77 | 33 | 35 | 43 |
| 83 | justify one’s existence | 94 | 32 | 47 | 35 | 85 | 19 | 45 | 76 |
| 84 | *bind blood | 10 | 21 | 40 | 18 | 24 | 48 | 30 | 33 |
| 85 | *charge respect | 12 | 34 | 42 | 33 | 38 | 57 | 60 | 43 |
| 86 | cast a vote | 100 | 81 | 84 | 80 | 69 | 38 | 55 | 86 |
| 87 | kick one’s heels | 81 | 47 | 51 | 50 | 69 | 33 | 40 | 38 |
| 88 | bend a rule | 95 | 66 | 74 | 68 | 70 | 48 | 60 | 48 |
| 89 | *fill an aim | 28 | 38 | 49 | 30 | 23 | 38 | 40 | 48 |
| 90 | lend support | 95 | 77 | 79 | 63 | 45 | 43 | 35 | 33 |
| 91 | sustain an injury | 98 | 62 | 65 | 60 | 47 | 33 | 30 | 71 |
| 92 | *hit approval | 15 | 40 | 33 | 28 | 40 | 33 | 30 | 43 |
| 93 | cease fire | 94 | 55 | 67 | 53 | 66 | 33 | 30 | 43 |
| 94 | snap one’s fingers | 99 | 62 | 72 | 65 | 93 | 62 | 75 | 95 |
| 95 | shrug one’s shoulders | 100 | 72 | 74 | 68 | 87 | 38 | 85 | 67 |
| 96 | *stand an occasion | 19 | 23 | 42 | 35 | 33 | 43 | 15 | 38 |
| 97 | grab a hold | 76 | 43 | 58 | 48 | 78 | 57 | 45 | 52 |
| 98 | *sit seed | 0 | 17 | 23 | 13 | 10 | 43 | 10 | 19 |
| 99 | *fall a failure | 1 | 17 | 26 | 18 | 10 | 33 | 25 | 38 |
| 100 | file a report | 100 | 64 | 91 | 85 | 79 | 62 | 65 | 76 |

Appendix III: REC items vs. alternate combinations

| Item | Token frequency (BNC) | Alternate type combinations, syntagmatic (BNC, raw frequency) | Alternate type combinations, paradigmatic (BNC, raw frequency) |
|---|---|---|---|
| play a trick | 3 | play* a trick (16); play* tricks (74) | VP: perform (43), conjure (13); NP: role (3447), part (3141), game (2030) |
| press charges** | 30 | press* the charge (3); press* a charge (0) | VP: bring (230), prefer (18); NP: fine (0) |
| afford an opportunity | 2 | afford* the opportunity (18); afford* [someone] the opportunity (9); afford* opportunities (6) | VP: give (1340), provide (1010), offer (679); NP: chance (14) |
| clean windows** | 4 | clean* the windows (31); cleaning windows (12); clean a window (1) | VP: wash (6); NP: n/a |
| keep pets** | 9 | keep* pets (16); keep* a pet (9) | VP: have (158), own (6); NP: n/a |
| hold meetings** | 25 | hold* a meeting* (87); hold* meetings (72) | VP: have (1763), organise (88); NP: conference (568) |
| assume responsibility | 50 | assume* responsibility* (106); assume* the responsibility* (13) | VP: have (1660), take (1031), claim (155); NP: power (98), duty (21) |
| cut a corner** | 1 | cut the corner (7); cut* corners (54) | VP: turn (292); NP: n/a |
| fly a flag | 1 | fly* the flag (54) | VP: wave (66), hoist (24); NP: banner (13) |
| perform a miracle** | 3 | perform* miracles (24) | VP: work (62); NP: miracle (68), wonder (10) |
| deliver a speech** | 3 | deliver* a speech (21) | VP: make (576), give (160), read (79); NP: message (104), lecture (86) |
| ride a storm | 0 | ride out the storm (12); ride the storm (6) | VP: n/a; NP: wave (56) |
| lend support** | 29 | lend* support (65); lend* [someones] support (42) | VP: give (983), provide (840), offer (380); NP: money (261), hand (110) |
| cease fire | 9 | a ceasefire (757) | VP: stop (24); NP: hostility (22) |
| shrug one’s shoulders** | 1 | shrug* [someone’s] shoulders (221) | VP: n/a; NP: hand (23) |
| suffer damage | 12 | suffer* damage (64) | VP: cause (1153); NP: injury (494), loss (432) |
| abandon ship** | 12 | abandon* ship (15) | VP: leave (78); NP: n/a |

* contains all realisations of this lemma
** scores of all age groups lie above 60 %

Appendix IV: Raw Frequency Rankings and Association Measures for CollJudge

* last item within the threshold value for the respective association measure (> 5.1.1)
** this item shows a negative value within asymptotic tests, which could be interpreted as statistical rejection

Appendix V: CollJudge (z-transformed acceptance scores)

References

Aitken, Adam J., Richard W. Bailey and Neil Hamilton-Smith (1973). The Computer and Literary Studies. Edinburgh: Edinburgh University Press.
Akhtar, Nameera and Michael Tomasello (1996). "Two-year-olds learn words for absent objects and actions". British Journal of Developmental Psychology 14(1), 79-93.
Albert, Ruth and Nicole Marx (2010). Empirisches Arbeiten in Linguistik und Sprachlehrforschung - Anleitung zu quantitativen Studien von der Planungsphase bis zum Forschungsbericht. Tübingen: Narr Verlag.
Allen, Lorraine G. (1980). "A note on measurement of contingency between two binary variables in judgment tasks". Bulletin of Psychonomic Society 15(3), 147-149.
Ambridge, Ben and Adele Goldberg (2008). "The island status of clausal complements: Evidence in favor of an information structure explanation". Cognitive Linguistics 19(3), 349-38.
Ambridge, Ben and Elena Lieven (2011). Child Language Acquisition - Contrasting Theoretical Approaches. Cambridge: Cambridge University Press.
Ambridge, Ben, Julian M. Pine and Caroline F. Rowland (2011). "Children use verb semantics to retreat from overgeneralization errors: A novel verb grammaticality judgment study". Cognitive Linguistics 22(2), 303-323.
Anglin, Jeremy M. (1993). Vocabulary Development - A Morphological Analysis. Online: Monographs of the Society for Research, 58(10), Serial No. 238. Accessed via: http://www.jstor.org/stable/1166112.
Anglin, Jeremy M. (1977). Word, object, and conceptual development. New York: W. W. Norton.
Archer, Dawn, Paul Rayson, Andrew Wilson and Tony McEnery (2003). Proceedings of the Corpus Linguistics 2003 Conference. Lancaster: University Centre for Computer Corpus Research on Language.
Aronoff, Mark (1976). Word formation in generative grammar. Cambridge, Mass.: M. I. T. Press.
Aronoff, Mark and Frank Anshen (1998). "Morphology and the lexicon: Lexicalization and Productivity". In: Andrew Spencer and Arnold M. Zwicky (eds.). The Handbook of Morphology. Oxford/Malden, MA: Blackwell, 237-247.
Aston, Guy and Lou Burnard (1998). The BNC Handbook - Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press.
Atkins, Sue, Jeremy Clear and Nicholas Ostler (1992). "Corpus design criteria". Literary and linguistic computing 7, 1-16.
Autorengruppe Bildungsberichterstattung (2014). Bildung in Deutschland 2014 - Ein indikatorengestützter Bericht mit einer Analyse zur Bildung von Menschen mit Behinderungen. Bielefeld: Bertelsmann Verlag: http://www.bildungsbericht.de/daten2014/bb_2014.pdf.
Baayen, R. Harald (2009). "Corpus linguistics in morphology: Morphological productivity". In: Anke Lüdeling and Merja Kytö (eds.). Corpus linguistics - An international handbook. Volume 2. Berlin/New York: Walter de Gruyter, 899-919.
Baayen, R. Harald (2001). Word frequency distributions. Dordrecht/Boston: Kluwer Academic Publishers.
Baayen, R. Harald (1992). "Quantitative aspects of morphological productivity". Yearbook of Morphology, 109-149.
Baayen, R. Harald and Rochelle Lieber (1991). "Productivity and English derivation: A corpus-based study". Linguistics 29(5), 801-843.
Bahns, Jens (1997). Kollokationen und Wortschatzarbeit im Englischunterricht. Tübingen: Gunter Narr Verlag.
Bahns, Jens (1996). Kollokationen als lexikographisches Problem - Eine Analyse allgemeiner und spezieller Lernerwörterbücher des Englischen. Tübingen: Niemeyer.
Bailey, Charles-James N. and Roger W. Shuy (1973). New ways of Analyzing Variation in English. Washington: Georgetown University Press.
Baker, Mona, Gil Francis and Elena Tognini-Bonelli (1993). Text and Technology - In Honour of John Sinclair. Philadelphia/Amsterdam: John Benjamins Publishing Company.
Bar, Moshe (2007). "The proactive brain: using analogies and associations to generate predictions". TRENDS in Cognitive Sciences 11(7), 280-289.
Bardel, Camilla, Batia Laufer and Christina Lindqvist (2013). L2 vocabulary acquisition, knowledge and use: New perspectives on assessment and corpus analysis. Online: European Second Language Association. Available online via: http://www.eurosla.org/monographs/EM02/TOC.pdf.
Barfield, Andrew (2009). "Exploring Productive L2 Collocation Knowledge". In: Tess Fitzpatrick and Andrew Barfield (eds.). Lexical Processing in Second Language Learners. Bristol/Buffalo/Toronto: Multilingual Matters, 95-110.
Barfield, Andrew and Henrik Gyllstad (2009). Researching collocations in another language - Multiple interpretations. Basingstoke: Palgrave Macmillan.
Bargh, John A., Mark Chen and Lara Burrows (1996). "Automaticity of Social Behavior: Direct Effects of Trait Construct and Stereotype Activation on Action". Journal of Personality and Social Psychology 71(2), 230-244.
Barnbrook, Geoff, Oliver Mason and Ramesh Krishnamurthy (2013). Collocation - Applications and Implications. London: Palgrave Macmillan.
Bartsch, Sabine (2004). Structural and Functional Properties of Collocations in English. Tübingen: Gunter Narr Verlag.
Bauer, Laurie (2001). Morphological productivity. Cambridge/New York: Cambridge University Press.
Bazell, Charles E., John C. Catford, Michael A. K. Halliday and Robert H. Robins (1966). In Memory of J. R. Firth. London: Longman.
Benson, Morton, Evelyn Benson and Robert Ilson (1997). The BBI dictionary of English Word Combinations. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Berez, Andrea L. and Stefan Th. Gries (2009). "In defense of corpus-based methods: A behavioral profile analysis of polysemous get in English". In: Steven Moran, Darren S. Tanner and Michael Scanlon (eds.). Proceedings of the 24th Northwest Linguistics Conference - University of Washington Working Papers in Linguistics Vol. 27. Seattle, WA: Department of Linguistics, 157-166.
Bergen, Benjamin K. and Nancy Chang (2005). "Embodied Construction Grammar in Simulation-Based Language Understanding". In: Jan-Ola Östman and Mirjam Fried (eds.). Construction Grammars - Cognitive Grounding and Theoretical Extensions. Amsterdam: John Benjamins Publishing Company, 147-190.
Bergenholtz, Henning and Joachim Mugdan (1985). Lexikographie und Grammatik - Akten des Essener Kolloquiums zur Grammatik im Wörterbuch, 28.-30.6.1984. Tübingen: Niemeyer.
Bergs, Alexander (2012). "New Perspectives, Theories and Methods: Construction Grammar". In: Alexander Bergs and Laurel J. Brinton (eds.). English Historical Linguistics - An International Handbook. Berlin/Boston: De Gruyter Mouton, 1631-1646.
Bergs, Alexander and Laurel J. Brinton (2012). English Historical Linguistics - An International Handbook. Berlin/Boston: De Gruyter Mouton.
Bergs, Alexander and Gabriele Diewald (2008). Constructions and Language Change. Berlin/New York: Mouton de Gruyter.
Berko, Jean (1958). "The child's acquisition of English morphology". Word Journal of the International Linguistic Association 14, 150-177.
Berlin, Brent and Paul Kay (1969). Basic color terms - Their universality and evolution. Berkeley/Los Angeles: University of California Press.
Bernardini, Silvia (2004). "Corpora in the classroom: An overview and some reflections on future developments". In: John M. Sinclair (ed.). How to use corpora in language teaching. Amsterdam: John Benjamins Publishing Company, 15-36.
Berry-Rogghe, Godelieve L. M. (1973). "The computation of collocations and their relevance to lexical studies". In: Adam J. Aitken, Richard W. Bailey and Neil Hamilton-Smith (eds.). The Computer and Literary Studies. Edinburgh: Edinburgh University Press, 103-112.
Biber, Douglas (1993). "Co-occurrence Patterns among Collocations: A Tool for Corpus-Based Lexical Knowledge Acquisition". Computational Linguistics 19(2), 531-538.
Blue, George M. and Rosamond Mitchell (1995). Language and Education - Papers from the Annual Meeting of the British Association for Applied Linguistics held at the University of Southampton. Clevedon/Philadelphia/Adelaide: British Association for Applied Linguistics in association with Multilingual Matters.
Boas, Hans and Ivan Sag (2012). Sign-Based Construction Grammar. Stanford: CSLI Publications.
Bod, Rens (2009). "Constructions at Work or at Rest?". Cognitive Linguistics 20(1), 129-134.
Bod, Rens (1998). Beyond Grammar - An Experience-Based Theory of Language. Stanford, CA: CSLI Publications.
Bod, Rens (2009a). "From Exemplar to Grammar: A Probabilistic Analogy-Based Model of Language Learning". Cognitive Science 33(5), 752-793.
Boden, Margaret A. (2013). "Creativity as a Neuroscientific Mystery". In: Oshin Vartanian, Adam S. Bristol and James C. Kaufman (eds.). Neuroscience of Creativity. Cambridge, MA: M. I. T. Press, 3-18.
Boden, Margaret A. (2001). "Creativity and Knowledge". In: Anna Craft, Bob Jeffrey and Mike Leibling (eds.). Creativity in education. London/New York: Continuum, 95-102.
Bonk, William J. (2001). "Testing ESL learners' knowledge of collocations". In: Thom Hudson and James Dean Brown (eds.). A focus on language test development - Expanding the language proficiency construct across a variety of tests. Honolulu: University of Hawaii, 113-142.
Booij, Geert (2010). "Construction Morphology". Language and Linguistics Compass 4(7), 543-555.
Bortz, Jürgen and Christof Schuster (2010). Statistik für Human- und Sozialwissenschaftler. 7th edition. Berlin/Heidelberg/New York: Springer Verlag.
Barðdal, Jóhanna (2008). Productivity evidence from case and argument structure in Icelandic. Amsterdam/New York: John Benjamins Publishing Company.
Barðdal, Jóhanna, Elena Smirnova, Lotte Sommerer and Spike Gildea (2015). Diachronic Construction Grammar. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Braine, Martin D. S. (1963). "The ontogeny of English phrase structure". Language 39, 1-14.
Brooks, Patricia J. and Michael Tomasello (1999). "How Children Constrain Their Argument Structure Constructions". Language 75(4), 720-738.
Brooks, Patricia J., Michael Tomasello, Kelly Dodson and Lawrence B. Lewis (1999). "Young Children's Overgeneralizations with Fixed Transitivity Verbs". Child Development 70(6), 1325-1337.
Bublitz, Wolfram (1996). "Semantic prosody and cohesive company: 'somewhat predictable'". Leuvense Bijdragen 85, 1-32.
Bublitz, Wolfram (1995). Semantic Prosody and Cohesive Company. Duisburg: L. A. U. D.
Burgschmidt, Ernst (1973). System, Norm und Produktivität in der Wortbildung - Aufsätze. Erlangen: E. Burgschmidt.
Bybee, Joan (2013). "Usage-based Theory and Exemplar Representations of Constructions". In: Thomas Hoffmann and Graeme Trousdale (eds.). The Oxford Handbook of Construction Grammar. Oxford/New York: Oxford University Press, 49-69.
Bybee, Joan (2010). Language, Usage and Cognition. Cambridge: Cambridge University Press.
Bybee, Joan (2007). Frequency of Use and the Organization of Language. Oxford/New York: Oxford University Press.
Bybee, Joan (1995). "Regular morphology and the lexicon". Language and Cognitive Processes 10(5), 425-455.
Bybee, Joan (1985). Morphology - A study of the relation between meaning and form. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Bybee, Joan and David Eddington (2006). "A usage-based approach to Spanish verbs of 'becoming'". Language 82, 323-355.
Bybee, Joan and Paul Hopper (2001). Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins Publishing Company.
Bybee, Joan and Carol L. Moder (1983). "Morphological Classes as Natural Categories". Language 59(2), 251-270.
Bybee, Joan and William Pagliuca (1987). "The evolution of future meaning". In: Anna Giacalone Ramat, Onofrio Carruba and Giuliano Bernini (eds.). Papers from the 7th International Conference on Historical Linguistics. Amsterdam/Philadelphia: John Benjamins Publishing Company, 108-122.
Bybee, Joan and Dan I. Slobin (2007). "Rules and Schemas in the Development of the English Past Tense". In: Joan Bybee (ed.). Frequency of Use and the Organization of Language. Oxford/New York: Oxford University Press, 101-126.
Cacciari, Cristina and Dedre Gentner (1995). Similarity in language, thought and perception. Turnhout: Brepols.
Cameron-Faulkner, Thea, Elena Lieven and Michael Tomasello (2003). "A construction based analysis of child directed speech". Cognitive Science 27(6), 843-873.
Carroll, Lewis (1871 / 2001). The annotated Alice - The Definitive Edition: Alice's Adventures in Wonderland and Through the Looking-Glass. London/New York/Camberwell/Toronto/amongst others: Penguin.
Carter, Ronald (2004). Language and Creativity - The Art of Common Talk. Abingdon/New York: Routledge.
Cazden, Courtney B. (1968). "The Acquisition of Noun and Verb Inflections". Child Development 39(2), 433-448.
Channell, Joanna (2000). "Corpus-based Analysis of Evaluative Lexis". In: Susan Hunston and Geoff Thompson (eds.). Evaluation in Text - Authorial Stance and the Construction of Discourse. Oxford: Oxford University Press, 38-55.
Chipere, Ngoni (2003). Understanding Complex Sentences - Native Speaker Variation in Syntactic Competence. Basingstoke/New York: Palgrave Macmillan.
Chomsky, Noam (1995). The Minimalist program. Cambridge, MA/London: M. I. T. Press.
Chomsky, Noam (1992). "Some Notes on Economy of Derivation and Representation". ASJU 27, 53-82.
Chomsky, Noam (1972). Language and Mind. New York/Chicago/San Francisco/Atlanta: Harcourt Brace Jovanovich.
Chomsky, Noam (1965). Aspects of the Theory of Syntax. Cambridge, MA: M. I. T. Press.
Chomsky, Noam (1964). Current Issues in Linguistics Theory. The Hague: Mouton.
Chomsky, Noam (1959). "A Review of B. F. Skinner's Verbal Behavior". Language 35(1), 26-58.
Chomsky, Noam (1957). Syntactic Structures. The Hague: Mouton.
Christiansen, Morten H. and Nick Chater (2008). "Language as Shaped by the Brain". Behavioral and Brain Sciences 31, 489-509.
Church, Kenneth and Patrick Hanks (1990). "Word Association Norms, Mutual Information, and Lexicography". Computational Linguistics 16(1), 22-29.
Church, Kenneth, Patrick Hanks, Donald Hindle and William Gale (1991). "Using Statistics in Lexical Analysis". In: Uri Zernik (ed.). Lexical Acquisition - Using On-line Resources to Build a Lexicon. Hillsdale: Erlbaum, 115-164.
Clark, Eve (2009). First Language Acquisition. 2nd edition. Cambridge/New York: Cambridge University Press.
Clark, Eve (1995). "Later lexical development and word formation". In: Paul Fletcher and Brian MacWhinney (eds.). The Handbook of Child Language. Oxford: Blackwell, 393-412.
Clark, Eve (1993). The Lexicon in Acquisition. Cambridge: Cambridge University Press.
Clark, Eve (1988). "On the logic of contrast". Journal of Child Language 15(2), 317-335.
Clark, Eve and Herbert Clark (1979). "When Nouns Surface as Verbs". Language 55(4), 767-811. Clark, Eve and Herbert Clark (1977). Psychology and Language - An Introduction to Psycholinguistics. New York/ Chicago/ San Francisco/ Atlanta: Harcourt Brace Jovanovich. Clark, Herbert (1978). "Inferring What is Meant". In: Willem J. M. Levelt and Giovanni B. Flores D'Arcais (eds.). Studies in the perception of language. London: Wiley, 295-322. Cock, Sylvie de (2004). "Preferred sequences of words in NS and NNS speech". Belgian journal of English language and literature , 225-246. Cock, Sylvie de (2003). Recurrent sequences of words in native speaker and advanced learner spoken and written English: a corpus-driven approach. Université Catholique de Louvain: unpublished. Cock, Sylvie de (1999). “Repetitive phrasal chunkiness and advanced EFL speech and writing”. In: Christian Mair and Marianne Hundt (eds.). Corpus Linguistics and Linguistic Theory - Papers from the Twentieth International Conference on English Language Research on Computerized Corpora ( ICAME 20). Amsterdam/ Atlanta: Rodopi, 51-68. Cohen, Andrew D. and Merrill Swain (1976). "Bilingual Education: The "Immersion" Model in the North American Context". TESOL Quarterly 10(1), 45-53. Cole, Peter and Jerry L. Morgan (1975). Syntax and semantics - Volume 3: Speech Acts. New York/ London: Academic Press. Conklin, Kathy and Norbert Schmitt (2008). "Formulaic sequences: Are they processed more quickly than non-formulaic language by native and nonnative speakers? ". Applied Linguistics 29(1), 72-89. Coseriu, Eugenio (1973). Probleme der strukturellen Semantik. Tübingen: Narr. Coseriu, Eugenio (1967). "Lexikalische Solidaritäten". Poetica 1(3), 293-303. References 289 Cowart, Wayne (1997). Experimental Syntax - Applying Objective Methods to Sentence Judgements. Thousand Oaks/ London/ New Dehli: Sage Publications. Cowie, A. P. and Peter Howarth (1995). “Phraseological Competence and Written Proficiency”. In: George M. 
Blue and Rosamond Mitchell (eds.). Language and Education - Papers from the Annual Meeting of the British Association for Applied Linguistics held at the University of Southhampton. Clevedon/ Philadelphia/ Adelaide: British Association for Applied Linguistics in association with Multilingual Matters, 80-93. Cowie, Anthony Paul (2012). “ IJL : Dictionaries, Language Learning and Phraseology”. International Journal of Lexicography 25(4), 386-392. Cowie, Anthony Paul (1999). English Dictionaries for Foreign Learners - A History. Oxford: Oxford University Press. Cowie, Anthony Paul (1998). Phraseology - Theory, Analysis, and Applications. Oxford: Clarendon Press. Cowie, Anthony Paul (1983). "General Introduction". In: Anthony Paul Cowie, Ronald Mackin and I. R. McCaig (eds.). Oxford dictionary of English Idioms. Oxford: Oxford University Press, x-xvii. Cowie, Anthony Paul, Ronald Mackin and I. R. McCaig (1983). Oxford dictionary of English Idioms. Oxford: Oxford University Press. Craft, Anna (2001). “Little c Creativity". In: Anna Craft, Bob Jeffrey and Mike Leibling (eds.). Creativity in education. London/ New York: Continuum, 45-61. Craft, Anna, Bob Jeffrey and Mike Leibling (2001). Creativity in education. London/ New York: Continuum. Croft, William (1998). "Linguistic evidence and mental representations". Cognitive Linguistics 9, 151-173. Croft, William (1993). “The role of domains in the interpretation of metaphors and metonymies”. Cognitive Linguistics 4, 332-370. Croft, William and Allan D. Cruse (2004). Cognitive linguistics. Cambridge/ New York: Cambridge University Press. Cruse, Allan D. (1986). Lexical semantics. Cambridge/ New York: Cambridge University Press. Crystal, David (1998). Language Play. London/ New York/ Victoria/ Ontario/ Auckland: Penguin. Culpeper, Jonathan (2005). "Impoliteness and Entertainment in the Television Quiz Show: The Weakest Link". Journal of Politeness Research. Language, Behaviour, Culture 1(1), 35-72. Dąbrowska, Ewa (2015). 
"Individual differences in grammatical knowledge". In: Ewa Dąbrowska and Dagmar Divjak (eds.). Handbook of Cognitive Linguistics. Berlin: De Gruyter Mouton, 650-668. Dąbrowska, Ewa (2012). "Different speakers, different grammars". Linguistic Approaches to Bilingualism 2(3), 219-253. Dąbrowska, Ewa (2010). "Native vs. expert intuitions: An empirical study of acceptability judgements". The Linguistic Review 27, 1-23. 290 References Dąbrowska, Ewa (2004). Language, Mind and Brain - Some Psychological and Neurological Constraints on Theories of Grammar. Edinburgh: Edinburgh University Press. Dąbrowska, Ewa (1997). "The LAD goes to school". Linguistics 35, 735-766. Dąbrowska, Ewa and Dagmar Divjak (2015). Handbook of Cognitive Linguistics. Berlin: De Gruyter Mouton: http: / / www.reference-global.com/ doi/ book/ 10.1515/ 9783110292022. Dąbrowska, Ewa and James A. Street (2006). "Individual Differences in language attainment: Comprehension of passive sentences by native and non-native English speakers". Language Science 28, 604-615. Davies, Mark (2008). The Corpus of Contemporary American English: 450 million words, 1990-present. Available online via: http: / / corpus.byu.edu/ coca/ . Dennis, Sally F. (1965). "The construction of a thesaurus automatically from a sample of text". Proceedings of the Symposium on Statistical Association Methods For Mechanized Documentation. Washington, DC , 61-148. DESI -Konsortium (2008). Unterricht und Kompetenzerwerb in Deutsch und Englisch - Ergebnisse der DESI -Studie. Weinheim: Beltz. Diewald, Gabriele (2002). "A model for relevant types of contexts in grammaticalisation". In: Gabriele Diewald and Ilse Wischer (eds.). New Reflections on Grammaticalization. Amsterdam/ Philadelphia: John Benjamins Publishing Company, 103-120. Diewald, Gabriele and Ilse Wischer (2002). New Reflections on Grammaticalization. Amsterdam/ Philadelphia: John Benjamins Publishing Company. 
Doyen, Stéphane, Olivier Klein, Cora-Lise Pichon and Axel Cleeremans (2012). "Behavioral priming: it's all in the mind, but whose mind? ". PL oS ONE 7(1), e29 081. Dunning, Ted (1993). "Accurate methods for the statistics of surprise and coincidence". Computational Linguistics 19(1), 61-74. Ellis, Nick (2008). "Usage-based and form-focused Language Acquisition: The associative learning of constructions, learned attention, and the limited L2 endstate". In: Peter Robinson and Nick Ellis (eds.). Handbook of Cognitive Linguistics and Second Language Acquisition. New York/ London: Routledge, 372-405. Ellis, Nick (2006). “Language Acquisition as Rational Contingency Learning.”. Applied Linguistics 27(1), 1-24. Ellis, Nick (1996). “Sequencing in SLA : phonological memory, chunking and points of order”. Studies in Second Language Acquisition 18, 91-126. Ellis, Nick and Diane Larsen-Freeman (2009). Language as a complex adaptive system. Malden, MA / Oxford/ Chichester: Wiley-Blackwell. Ellis, Nick, Rita Simpson-Vlach and Carson Maynard (2008). “Formulaic Language in Native and Second Language Speakers: Psycholinguistics, Corpus Linguistics and TESOL “. TESOL Quarterly 42(3), 375-396. Elman, Jeff L. (2001). “Connectionism and language acquisition”. In: Michael Tomasello and Elisabeth Bates (eds.). Language Development - The Essential Readings. Oxford: Blackwell, 295-306. Everaert, Martin, Erik-Jan van der Linden, Andr Schenk and Rob Schreuder (1995). Idioms - Structural and psychological perspectives. Hillsdale, NJ : Erlbaum. References 291 Evert, Stefan (2009). “Corpora and Collocations”. In: Anke Lüdeling and Merja Kytö (eds.). Corpus linguistics - An international handbook. Volume 2. Berlin/ New York: Walter de Gruyter, 1212-1248. Evert, Stefan (2005). The Statistics of Word Cooccurrences. Available online via: http: / / elib.uni-stuttgart.de/ opus/ volltexte/ 2005/ 2371/ . Evert, Stefan and Brigitte Krenn (2001). 
"Models for the Qualitative Evaluation of Lexical Association Measures". Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France , 188-195. Eyckmans, June (2009). "Towards an assessment of learners' receptive and productive syntagmatic knowledge". In: Andrew Barfield and Henrik Gyllstad (eds.). Researching collocations in another language - Multiple interpretations. Basingstoke: Palgrave Macmillan, 139-152. FAU Sprachenzentrum (2007). "Eignungsfeststllungsverfahren - Second Part". Available online via: http: / / www.sz.uni-erlangen.de/ abteilungen/ englisch/ download/ efva.pdf. Faulhaber, Susen (2011). Verb valency patterns - A challenge for semantics-based accounts. Berlin/ New York: De Gruyter Mouton. Fillmore, Charles J. (1988). “The mechanisms of ‘Construction Grammar". Berkely Linguistic Society 14, 35-55. Fillmore, Charles J. (1985). “Frames and the semantics of understanding”. Quaderni di Semantica VI , 222-254. Fillmore, Charles J. and Beryl T. Atkins (1992). “Toward a frame-based lexicon: the semantics of RISK and its neighbors”. In: Adrienne Lehrer and Eva Kittay (eds.). Frames, Fields, and Contrasts. Hillsdale, NJ : Lawrence Erlbaum, 75-102. Fillmore, Charles J., Paul Kay and Mary Catherine O’Connor (1988). “Regularity and Idiomaticity in Grammatical Constructions: The Case of Let Alone”. Language 64(3), 501-538. Fillmore, Charles J., Russell R. Lee-Goldman and Russell Rhodes (2012). "The FrameNet Constructicon". In: Hans Boas and Ivan Sag (eds.). Sign-Based Construction Grammar. Stanford: CSLI Publications, 309-372. Firth, John R. (1968). “A Synopsis of Linguistic Theory, 1930-55”. In: Frank R. Palmer (ed.). Selected Papers of J. R. Firth 1952-59. London/ Harlow: Longman, 168-205. Firth, John R. (1951 / 1964). “Modes of Meaning”. In: John R. Firth (ed.). Papers in Linguistics 1934-1951. London/ New York/ Toronto: Oxford University Press, 190-215. Firth, John R. (1951 / 1964). 
Papers in Linguistics 1934-1951. London/ New York/ Toronto: Oxford University Press. Firth, John R. (1951 / 1964). "The Technique of Semantics". In: John R. Firth (ed.). Papers in Linguistics 1934-1951. London/ New York/ Toronto: Oxford University Press, 7-33. Firth, John R. (1948). "Sounds and Prosodies". Philological Society 47(1), 127-152. Fischer, Kerstin and Anatol Stefanowitsch (2006). Konstruktionsgrammatik - Von der Andwendung zur Theorie. Tübingen: Stauffenburg Verlag. 292 References Fischer, Kerstin and Anatol Stefanowitsch (2006). “Konstruktionsgrammatik ein Überblick”. In: Kerstin Fischer and Anatol Stefanowitsch (eds.). Konstruktionsgrammatik - Von der Anwendung zur Theorie. Tübingen: Stauffenburg Verlag, 3-17. Fisher, Ronald A. (1922). "On the Interpretation of X2 from Contingency Tables, and the Calculation of P". Journal of the Royal Statistical Society 85(1), 87-94. Fitzpatrick, Tess and Andrew Barfield (2009). Lexical Processing in Second Language Learners. Bristol/ Buffalo/ Toronto: Multilingual Matters. Fletcher, Paul and Brain MacWhinney (1995). The Handbook of Child Language. Oxford: Blackwell. Fodor, Jerry and Zenon Pylyshyn (1988). “Connectionism and cognitive architecture: a critical analysis“. Cognition 28(1-2), 3-71. Fried, Mirjam and Jan-Ola Östmann (2004). Construction Grammar in a Cross-Language Perspective. Amsterdam/ Philadelphia: John Benjamins Publishing Company. Games, Paul A. and John F. Howell (1976). "Pairwise multiple comparison procedures with unequal N's and / or variances: A Monte Carlo study". Journal of Educational Statistics 1(2), 113-125. Ganger, Jennifer and Michael R. Brent (2004). "Reexamining the vocabulary spurt". Developmental psychology 40(4), 621-632. García, Ofelia (2009). Bilingual education in the 21st century - A global perspective. Oxford: Wiley-Blackwell. Gaviolo, Laura (2005). Exploring Corpora for ESP Learning. Amsterdam: John Benjamins Publishing Company. Geeraerts, Dirk (2009). 
Theories of Lexical Semantics - A Cognitive Perspective. Oxford: Oxford University Press. Geeraerts, Dirk and Hubert Cuyckens (2007). The Oxford Handbook of Cognitive Linguistics. Oxford/ New York: Oxford University Press. Genesee, Fred (1987). Learning through two languages - Studies of immersion and bilingual education. Cambridge, MA : Newbury House. Gentner, Dedre and Arthur B. Markman (1995). "Similarity is like analogy: structural alignment in Comparison". In: Cristina Cacciari and Dedre Gentner (eds.). Similarity in language, thought and perception. Turnhout: Brepols, 111-147. Gershkoff-Stowe, Lisa and Esther Thelen (2004). "U-Shaped Changes in Behavior: A Dynamic Systems Perspective". Journal of Cognition and Development 5(1), 11-36. Giacalone Ramat, Anna, Onofrio Carruba and Giuliano Bernini (1987). Papers from the 7th International Conference on Historical Linguistics. Amsterdam/ Philadelphia: John Benjamins Publishing Company. Glass, Cordula (2010). Relevance and Influence of Phraseological Phenomena in Native and Non-native Text Production. Erlangen: unpublished. Goldberg, Adele (2013). “Constructionist Approaches”. In: Thomas Hoffmann and Graeme Trousdale (eds.). The Oxford Handbook of Construction Grammar. Oxford: Oxford University Press, 15-31. Goldberg, Adele (2006). Constructions at Work - The Nature of Generalization in Language. Oxford/ New York: Oxford University Press. References 293 Goldberg, Adele (1995). Constructions - a construction grammar approach to argument structure. Chicago: University of Chicago Press. Goldberg, Adele (1992). "The inherent semantics of argument structure: The case of the English ditransitive construction". Cognitive Linguistics 3(1), 37-74. Goldfield, Beverly A. and J. Steven Reznick (1990). "Early lexical acquisition: Rate, content, and the vocabulary spurt". Journal of Child Language 17(1), 171-183. Goldfield, Beverly A. and Steven J. Reznick (1996). "Measuring the vocabulary spurt: a reply to Mervis & Bertrand". 
Journal of Child Language 23, 241-246. Gonzalez-Marquez, Monica, Irene Mittelberg, Seana Coulson and Michael J. Spivey (2007). Methods in Cognitive Linguistics. Amsterdam: John Benjamins Publishing Company. Granger, Sylviane (1998). "Prefabricated Patterns in Advanced EFL Writing: Collocations and Formulae". In: Anthony Paul Cowie (ed.). Phraseology - Theory, Analysis, and Applications. Oxford: Clarendon Press, 145-160. Granger, Sylviane and Fanny Meunier (2008). Phraseology - An interdisciplinary perspective. Amsterdam/ Philadelphia: John Benjamins Publishing Company. Granger, Sylviane and Magali Paquot (2008). "Disentangling the phraseological web". In: Sylviane Granger and Fanny Meunier (eds.). Phraseology - An interdisciplinary perspective. Amsterdam/ Philadelphia: John Benjamins Publishing Company, 27-49. Greenland, Colin (1990). Take Back Plenty. New York: Avon Books. Grice, Paul (1975). “Logic and Conversation”. In: Peter Cole and Jerry L. Morgan (eds.). Syntax and semantics - Volume 3: Speech Acts. New York/ London: Academic Press, 41-58. Gries, Stefan (2013). "Basic significance testing". In: Robert J. Podesva and Devyani Sharma (eds.). Research Methods in Linguistics. Cambridge/ New York: Cambridge University Press, 316-336. Gries, Stefan and Anatol Stefanowitsch (2006). Corpora in Cognitive Linguistics - The Syntax-Lexis Interface. Berlin/ New York: Mouton de Gruyter. Gries, Stefan and Anatol Stefanowitsch (2004). "Extending collostructional analysis: A corpus-based perspective on 'alternations'". International Journal of Corpus Linguistics 9(1), 97-129. Gries, Stefan and Stefanie Wulff (2005). “Do foreign language learners also have constructions? Evidence from priming, sorting and corpora”. Annual Review of Cognitive Linguistics 3, 182-200. Gries, Stefan Th. (2010). "Behavioral profiles: A fine-grained and quantitative approach in corpus-based lexical semantics". The Mental Lexicon 5(3), 323-346. Gries, Stefan Th. and Naoki Otani (2010). 
"Behavioral profiles: A corpus-based perspective on synonymy and antonymy". ICAME Journal 34, 121-150. Grondelaers, Stefan, Dirk Speelman and Dirk Geeraerts (2007). "A Case for a Cognitive Corpus Linguistics". In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson and Michael J. Spivey (eds.). Methods in Cognitive Linguistics. Amsterdam: John Benjamins Publishing Company, 149-169. 294 References Gyllstad, Henrik (2013). "Looking at L2 vocabulary knowledge dimensions from an assessment perspective - challenges and potential solutions". In: Camilla Bardel, Batia Laufer and Christina Lindqvist (eds.). L2 vocabulary acquisition, knowledge and use: New perspectives on assessment and corpus analysis. Online: European Second Language Association, 11-28. Gyllstad, Henrik (2007). Testing English Collocations - Developing Receptive Tests for Use with Advanced Swedish Learners. Lund: Språkoch Litteraturcentrum, Lund University: http: / / lup.lub.lu.se/ luur/ download? func=downloadFile&recordO Id =599011&fileOId=2172422. Halliday, Michael A. K. (1966). "Lexis as a Linguistic Level". In: Charles E. Bazell, John C. Catford, Michael A. K. Halliday and Robert H. Robins (eds.). In Memory of J. R. Firth. London: Longman, 148-162. Halliday, Michael A. K. and Ruqaiya Hasan (1976). Cohesion in English. London: Longman. Hampe, Beate and Doris Schönefeld (2006). "Syntactic Leaps or Lexical Variation? More on 'Creative Syntax'". In: Stefan Gries and Anatol Stefanowitsch (eds.). Corpora in Cognitive Linguistics - The Syntax-Lexis Interface. Berlin/ New York: Mouton de Gruyter, 127-157. Hanks, Patrick (2013). Lexical Analysis - Norms and Exploitations. Cambridge, MA / London: M. I. T. Press. Hasko, Victoria (2013). "Capturing the Dynamics of Second Language Development via Learner Corpus Research: A Very Long Engagement". Modern Language Journal 97(1), 1-10. Hausmann, Franz Josef (2007). "Die Kollokationen im Rahmen der Phraseologie: Systematische und historische Darstellung". 
ZAA 55(3), 217-234. Hausmann, Franz Josef (1985). "Kollokationen im deutschen Wörterbuch. Ein Beitrag zur Theorie des lexikographischen Beispiels". In: Henning Bergenholtz and Joachim Mugdan (eds.). Lexikographie und Grammatik - Akten des Essener Kolloquiums zur Grammatik im Wörterbuch, 28.-30. 6. 1984. Tübingen: Niemeyer, 118-129. Hausmann, Franz Josef (1984). "Wortschatzlernen ist Kollokationslernen". Praxis des neusprachlichen Unterrichts 31, 395-406. Hawkins, Jeff and Sandra Blakeslee (2004). On Intelligence. New York: St. Martin’s Griffin. Hebb, Donald O. (1949). The organization of behavior - A neurophysiological theory. New York: Wiley and Sons. Herbst, Thomas (2011). "The Status of Generalizations: Valency and Argument Structure Constructions". In: Thomas Herbst and Anatol Stefanowitsch (eds.). Argument structure - Valency and / or constructions? Würzburg: Königshausen & Neumann, 347-367. Herbst, Thomas (2010). English Linguistics - A Coursebook for Students of English. Berlin/ New York: De Gruyter Mouton. Herbst, Thomas (1996). "What are Collocations: Sandy Beaches or False Teeth? ". English Studies 77(4), 379-393. References 295 Herbst, Thomas, David Heath, Ian F. Roe and Dieter Götz (2004). A valency dictionary of English - A corpus-based analysis of the complementation patterns of English verbs, nouns, and adjectives. Berlin/ New York: Mouton de Gruyter. Herbst, Thomas and Michael Klotz (2003). Lexikografie. Paderborn: Schöningh. Herbst, Thomas and Susen Schüller (2008). Introduction to Syntactic Analysis - A Valency Approach. Tübingen: Narr. Herbst, Thomas and Anatol Stefanowitsch (2011). Argument structure - Valency and / or constructions? Würzburg: Königshausen & Neumann. Hilpert, Martin (2008). Germanic Future Constructions - A usage-based approach to language change. Amsterdam/ Philadelphia: John Benjamins Publishing Company. Hintzman, Douglas L. (1986). "'Schema Abstraction' in a Multiple: Trace Memory Model”. Psychological Review 93(4), 411-428. 
Hoey, Michael (2005). Lexical Priming - A new theory of words and language. London/ New York: Routledge. Hoffmann, Sebastian and Hans-Martin Lehmann (2000). "Collocational Evidence from the British National Corpus". In: John M. Kirk (ed.). Corpora Galore - Analyses and Techniques in Describing English. Amsterdam: Rodopi, 17-32. Hoffmann, Thomas and Graeme Trousdale (2013). The Oxford Handbook of Construction Grammar. Oxford: Oxford University Press. Hollmann, Willem B. (2013). "Constructions in Cognitive Sociolinguistics". In: Thomas Hoffmann and Graeme Trousdale (eds.). The Oxford Handbook of Construction Grammar. Oxford: Oxford University Press, 491-509. Hornby, Albert S. (2005). Oxford Advanced Learner's Dictionary of Current English. Oxford: Oxford University Press. Hornby, Albert S., Edward V. Gatenby and H. Wakefield (1948). The Advanced Learner's Dictionary of Current English. London: Oxford University Press. Hornby, Albert S., Edward V. Gatenby and H. Wakefield (1942). Idiomatic and Syntactic English Dictionary. Tokyo: The Institute for Research in English Teaching. Howarth, Peter Andrew (1996). Phraseology in English Academic Writing - Some implications for language learning and dictionary making. Tübingen: Max Niemeyer Verlag. Hudson, Thom and James Dean Brown (2001). A focus on language test development - Expanding the language proficiency construct across a variety of tests. Honolulu: University of Hawaii. Hunston, Susan (2007). “Semantic prosody revisited”. International Journal of Corpus Linguistics 12(2), 249-268. Hunston, Susan (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press. Hunston, Susan and Gil Francis (2000). Pattern Grammar - A Corpus-driven Approach to the Lexical Grammar of English : John Benjamins Publishing Company. Hunston, Susan and Geoff Thompson (2000). Evaluation in Text - Authorial Stance and the Construction of Discourse. Oxford: Oxford University Press. 296 References ISB (2004). 
Lehrplan für das Gymnasium in Bayern (Pflicht-/ Wahlpflichtfächer). Available online via: http: / / www.isb-gym8-lehrplan.de/ contentserv/ 3.1.neu/ g8.de/ index. php? StoryID=26414. Ishikawa, Shin'ichiro, Toshihiko Uemura, M. Kaneda, Shinichi Shimizu, Naoki Sugimori and Yukio Tono (2003). JACET 8000 - JACET List of 8000 Basic Words. Tokyo: JACET . Jackendoff, Ray (2013). "Constructions in the Parallel Architecture". In: Thomas Hoffmann and Graeme Trousdale (eds.). The Oxford Handbook of Construction Grammar. Oxford: Oxford University Press, 70-92. Jackendoff, Ray (2002). Foundations of Language - Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press. Jackendoff, Ray (1990). Semantic structures. Cambridge, MA : M. I. T. Press. Jaén, Maria M. (2007). "A corpus-driven design of a test for assessing the ESL collocational competence of university students". International Journal of English Studies 7(2), 127-147. Jehle, Günter (2007). The Advanced Foreign Learner's Mental Lexicon - Storage and Retrieval of Verb-Noun Collocations like 'to embezzle money'. Hamburg: Verlag Dr. Kovač. Joanes, D. N. and C. A. Gill (1998). "Comparing measures of sample skewness and kurtosis". Journal of the Royal Statistical Society 47(1), 183-189. Johnson, Samuel (1747 / 1837). "The Plan of a Dictionary of the English Language; Addressed to the Right Honourable Philip Dormer, Earl of Chesterfield; One of His Majesty's Principal Secretaries of State.". In: Arthur Murphy (ed.). The Works of Samuel Johnson, LL . D. - With an Essay on his Life and Genius. Volume II . New York: George Dearborn, 439-445. Jones, Susan and John M. Sinclair (1974). "English Lexical Collocations. A Study in Computational Linguistics". Cahier de Lexicologie 24, 15-61. Katz, Jerrold J. (1971). "Generative Semantics is Interpretive Semantics". Linguistic Inquiry 2(3), 313-331. Katz, Jerrold J. and Jerry A. Fodor (1963). “The structure of a semantic theory”. Language 39(2), 170-210. Katz, Jerrold J. 
and Paul M. Postal (1964). An Integrated Theory of Linguistic Descriptions. Cambridge, MA : M. I. T. Press. Katz, Jerrold J. and Paul M. Postal (1963). "Semantic interpretation of idioms and sentences containing them". Quarterly Progress Report of the Research Laboratory of Electronics, Massachusetts Institute of Technology 70, 275-282. Kaufman, James C. and Robert J. Sternberge (2010). The Cambridge Handbook of Creativity. Cambridge/ New York: Cambridge University Press. Kay, Paul and Charles J. Fillmore (1999). “Grammatical Constructions and Linguistic Generalisations: The ‘What’s X doing Y? ’ Construction”. Language 75(1), 1-33. Kirk, John M. (2000). Corpora Galore - Analyses and Techniques in Describing English. Amsterdam: Rodopi. Kline, Rex B. (2004). Beyond significance testing - Reforming data analysis methods in behavioral research. Washington: American Psychological Association. References 297 Klotz, Michael (1998). Grammatik und Lexik - Studien zur Syntagmatik englischer Verben. Tübingen: Stauffenburg Verlag. Klotz, Michael (1997). "Ein Valenzwörterbuch englischer Verben, Adjektive und Substantive - Vorstellung eines Projektes". Zeitschrift für angewandte Linguistik 27, 93-111. Kotzbelt, Aaron, Ronald A. Beghetto and Mark A. Runco (2010). “Theories of Creativity”. In: James C. Kaufman and Robert J. Sternberg (eds.). The Cambridge Handbook of Creativity. Cambridge/ New York: Cambridge University Press, 20-47. Krashen, Stephen D. and Tracy D. Terrell (1983). The Natural Approach - Language Acquisition in the Classroom. New York/ London/ Toronto/ Sydney/ Tokyo/ Singapore: Phoenix ELT . Kuska, Sandra Kristina, Anna C. M. Zaunbauer and Jens Möller (2010). "Sind Immersionsschüler wirklich leistungsstärker? - Ein Lernexperiment". Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie 42(3), 143-153. Labov, William (1973). “The boundaries of words and their meaning”. In: Charles- James N. Bailey and Roger W. Shuy (eds.). 
New ways of Analyzing Variation in English. Washington: Georgetown University Press, 340-373. Labov, William (1972). "Some Principles of Linguistic Methodology". Language in Society 1(1), 97-120. Lakoff, George (1990). Women, fire, and dangerous things - What categories reveal about the mind. Chicago/ London: University of Chicago Press. Lakoff, George and Mark Johnson (1980). Metaphors we live by. Chicago: University of Chicago Press. Lambert, Wallace E. and G. Richard Tucker (1972). Bilingual education of children - The St. Lambert Experiment. Rowley: Newbury House. Langacker, Ronald W. (2009). Investigations in Cognitive Grammar. Berlin/ New York: Mouton de Gruyter. Langacker, Ronald W. (1987). Foundations of Cognitive Grammar - Theoretical prerequisites. Trier: L. A. U. T. Langacker, Ronald W. (1983). Foundations of Cognitive Grammar I - Orientation. Trier: L. A. U. T. Larsen-Freeman, Diane and Lynne Cameron (2008). Complex Systems and Applied Linguistics. Oxford: Oxford University Press. Larson-Hall, Jenifer (2012). “How to Run Statistical Analyses“. In: Alison Mackey and Susan M. Gass (eds.). Researching Methods in Second Language Acquisition. Chichester: Wiley-Blackwell, 245-274. Leech, Geoffrey N. (1974). Semantics. Harmondsworth/ Baltimore/ Ringwood/ Markham/ Auckland: Penguin. Lehr, Andrea (1996). Kollokationen und maschinenlesbare Korpora - ein operationales Analysemodell zum Aufbau lexikalischer Netze. Tübingen: Niemeyer. Lehrer, Adrienne and Eva Kittay (1992). Frames, Fields, and Contrasts. Hillsdale, NJ : Lawrence Erlbaum. 298 References Leśniewska, Justyna and Ewa Witalisz (2007). "Cross-linguistic influence and acceptability judgments of L2 and L1 collocations: A study of advanced Polish learners of English". EUROSLA Yearbook 7, 27-48. Levelt, Willem J. M. and Giovanni B. Flores D'Arcais (1978). Studies in the perception of language. London: Wiley. Levinson, Stephen C. (1983). Pragmatics. Cambridge: Cambridge University Press. 
Levorato, Maria Chiara and Cristina Cacciari (1992). "Children's comprehension and production of idioms: The role of context and familiarity". Journal of Child Language 19(2), 415-433. Lewis, Michael (2000). "Learning in the lexical approach". In: Michael Lewis (ed.). Teaching Collocation - Further Developments in the Lexical Approach. Hove: Language Teaching Publications, 155-185. Lewis, Michael (2000). Teaching Collocation - Further Developments in the Lexical Approach. Hove: Language Teaching Publications. Locke, John L. (1997). "A Theory of Neurolinguistic Development". Brain and Language 58, 265-326. Louw, Bill (2003). "Dressing up waiver: a stochastic collocational reading of ‘The Truth and Reconciliation Commission ( TRC )'". Available online via: http: / / amsacta.cib.unibo. it/ archive/ 00001142/ 01/ L OUW _paper.pdf. Louw, Bill (1993). “Irony in the Text of Insincerity in the Writer? - The Diagnostic Potential of Semantic Prosodies”. In: Mona Baker, Gil Francis and Elena Tognini-Bonelli (eds.). Text and Technology - In Honour of John Sinclair. Philadelphia/ Amsterdam: John Benjamins Publishing Company, 157-176. Lüdeling, Anke and Peter Bosch (2003). "Identification of Productive Collocations". Actas. Proceedings of the 8th International Symposium on Social Communication, Santiago de Cuba , online. Lüdeling, Anke and Stefan Evert (2003). "Linguistic experience and productivity: corpus evidence for fine grained distinctions". In: Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery (eds.). Proceedings of the Corpus Linguistics 2003 Conference. Lancaster: University Centre for Computer Corpus Research on Language, 475-483. Lüdeling, Anke and Merja Kytö (2009). Corpus linguistics - An international handbook. Volume 2. Berlin/ New York: Walter de Gruyter. Lyons, John (1977). Semantics - II . Cambridge/ London/ New York/ Melbourne: Cambridge University Press. Mackey, Alison and Susan M. Gass (2012). Researching Methods in Second Language Acquisition. 
Chichester: Wiley-Blackwell. Mackin, Ronald (1978). "On collocations: 'words shall be known by the company they keep'". In: Peter Strevens (ed.). In honour of A. S. Hornby. Oxford: Oxford University Press, 149-165. MacWhinney, Brian (2001). “Emergentist approaches to language”. In: Joan Bybee and Paul Hopper (eds.). Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins Publishing Company, 449-470. References 299 MacWhinney, Brian (2000). The CHILDES Project: Tools for analyzing talk. 3rd edition. Mahwah, NJ : Lawrence Erlbaum Associates. Maden-Weinberger, Ursula (2015). "'Hätte, wäre, wenn …': A pseudo-longitudinal study of subjunctives in the Corpus of Learner German ( CLEG )". International Journal of Learner Corpus Research 1(1), 25-57. Mair, Christian and Marianne Hundt (1999). Corpus Linguistics and Linguistic Theory - Papers from the Twentieth International Conference on English Language Research on Computerized Corpora ( ICAME 20). Amsterdam/ Atlanta: Rodopi. Makkai, Adam (1972). Idiom structure in English. The Hague: Mouton. Malinowski, Bronislaw (1923 / 1956). "The Problem of Meaning in Primitive Languages". In: Charles K. Odgen and Ivor A. Richards (eds.). The Meaning of Meaning. London: Routledge, 296-336. Manning, Christopher D. and Hinrich Schütze (1999). Foundations of Statistical Language Processing. Cambridge, MA / London: M. I. T. Press. Martin, Alex, Cheri L. Wiggs, Leslie G. Ungerleider and James V. Haxby (1996). "Neural correlates of category-specific knowledge". Nature 379, 649-652. Matthews, Peter H. (2001). A Short History of Structural Linguistics. Cambridge: Cambridge University Press. Matthews, Peter H. (1974). Morphology - An Introduction to the Theory of World-structure. London: Cambridge University Press. McEnery, Tony, Richard Xiao and Yukio Tono (2006). Corpus-Based Language Studies - An Advanced Resource Book. Oxon/ New York: Routledge. McLaughlin, Barry (1978). Second-Language Acquisition in Childhood. 
Hillsdale, NJ : Lawrence Erlbaum Associates. Mel’čuk, Igor (1998). ”Collocations and Lexical Functions”. In: Anthony Paul Cowie (ed.). Phraseology - Theory, Analysis, and Applications. Oxford: Clarendon Press, 23-53. Mel’čuk, Igor (1995). “Phrasemes in language and phraseology in linguistics”. In: Everaert, Martin, Erik-Jan van der Linden, Andr Schenk and Rob Schreuder (eds.). Idioms - Structural and psychological perspectives. Hillsdale, NJ : Erlbaum, 167-232. Mel’čuk, Igor (1989). “Semantic primitives from the Viewpoint of the Meaning-Text Linguistic Theory”. Quaderni di Semantica 10(1), 65-102. Mervis, Carolyn B. and Jacquelyn Bertrand (1995). "Early lexical acquisition and the vocabulary spurt: a response to Goldfield & Reznick". Journal of Child Language 22(2), 461-468. Miller, Earl K., Andreas Nieder, David J. Freedman and Jonathan D. Wallis (2003). "Neural correlates of categories and concepts". Current Opinion in Neurobiology 13, 198-203. Miller, George A. (1999). "On Knowing a Word". Annual Review of Psychology 50, 1-19. Möller, Jens and Anna C. M. Zaunbauer (2008). " MOBI - Monolinguales und bilinguales Lernen in der Grundschule". Available online via: http: / / www.psychpaed.uni-kiel.de/ freedownloads/ Untersuchungsergeb nisse.pdf. 300 References Moon, Rosamund (2009). Words, Grammar, Text - Revisiting the Work of John Sinclair. Amsterdam/ Philadelphia: John Benjamins Publishing Company. Moore, Timothy E. (1973). Cognitive Development and the Acquisition of Language. New York/ San Francisco/ London: Academic Press. Moran, Steven, Darren S. Tanner and Michael Scanlon (2009). Proceedings of the 24th Northwest Linguistics Conference - University of Washington Working Papers in Linguistics Vol. 27. Seattle, WA : Department of Linguistics. Morgan, Pamela S. (1997). "The inherent semantics of argument structure: The case of the English ditransitive construction". Cognitive Linguistics 8(4), 327-357. Muir, Kenneth (1979). 
Shakespeare survey - An annual survey of Shakespearian study and production. London: Cambridge University Press. Müller, Thomas (2010). Aware of Collocations - Ein Unterrichtskonzept zum Erwerb von Kollokationskompetenz für fortgeschrittene Lerner des Englischen. Frankfurt/Berlin/Bern/Bruxelles/New York/Oxford/Wien: Peter Lang. Murphy, Arthur (1837). The Works of Samuel Johnson, LL.D. - With an Essay on his Life and Genius. Volume II. New York: George Dearborn. Nelson, Katherine (1973). Structure and strategy in learning to talk. Chicago: University of Chicago Press. Nesselhauf, Nadja (2004). Collocations in a Learner Corpus. Amsterdam/Philadelphia: John Benjamins Publishing Company. Neumann, John von (1986). Die Rechenmaschine und das Gehirn. 5th edition. München: Oldenbourg. Nippold, Marilyn A. and Jill K. Duthie (2003). "Mental Imagery and Idiom Comprehension: A Comparison of School-Age Children and Adults". Journal of Speech Language and Hearing Research 46, 788-799. Nippold, Marilyn A. and Stephanie Tarrant Martin (1989). "Idiom Interpretation in Isolation versus Context: A Developmental Study with Adolescents". Journal of Speech Language and Hearing Research 32(1), 59-66. Oakes, Michael P. (1998). Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press. OALD online (2015). Oxford Learner's dictionaries. Available online via: http://www.oxfordlearnersdictionaries.com/. Ogden, Charles K. and Ivor A. Richards (1956). The Meaning of Meaning. London: Routledge. OECD (2014). PISA 2012 Results: What Students Know and Can Do - Student Performance in Mathematics, Reading and Science (Volume I, Revised edition, February 2014). Available online via: http://dx.doi.org/10.1787/9789264201118-en. OECD (2010). PISA 2009 Results: Learning Trends: Changes in Student Performance Since 2000 (Volume V). Available online via: http://dx.doi.org/10.1787/9789264091580-en. Östman, Jan-Ola and Mirjam Fried (2005).
Construction Grammars - Cognitive Grounding and Theoretical Extensions. Amsterdam: John Benjamins Publishing Company. Pakulak, Eric and Helen J. Neville (2010). "Proficiency Differences In Syntactic Processing of Monolingual Native Speakers Indexed by Event-related Potentials". Journal of Cognitive Neuroscience 22(12), 2728-2744. Palmer, Frank R. (1981). Semantics. 2nd edition. Cambridge/New York/Melbourne: Cambridge University Press. Palmer, Frank R. (1976). Semantics - A New Outline. Cambridge: Cambridge University Press. Palmer, Frank R. (1968). Selected Papers of J. R. Firth 1952-59. London/Harlow: Longman. Palmer, Harold (1933). Second Interim Report on English Collocation. Tokyo: The Institute for Research in English Teaching. Palmer, Harold (1930). First Interim Report on Vocabulary Selection. Tokyo: The Institute for Research in English Teaching. Palmer, Harold and Albert S. Hornby (1937). Thousand-Word English - What it is and what can be done with it. London: Harrap. Paltridge, Brian and Aek Phakiti (2010). Continuum companion to research methods in applied linguistics. London/New York: Continuum. Partington, Alan (2004). "Utterly content in each other's company: Semantic prosody and semantic preference". International Journal of Corpus Linguistics 9(1), 131-156. Partington, Alan (1998). Patterns and meanings - Using corpora for English language research and teaching. Amsterdam: John Benjamins Publishing Company. Pawley, Andrew and Frances Hodgetts Syder (1983). “Two puzzles for linguistic theory - nativelike selection and nativelike fluency”. In: Jack C. Richards and Richard W. Schmidt (eds.). Language and communication. London/New York: Longman, 191-226. Pienemann, Manfred (1988). "Determining the influence of instruction on L2 speech processing". AILA Review 5, 40-72. Plato and C. D. C. Reeve (1998). Cratylus - Translated, with Introduction and Notes, by C. D. C. Reeve. Indianapolis, IN/Cambridge: Hackett Publishing Company.
Plötzgen, Sebastian D. (2003). Probleme und Chancen des deutschen Bildungssystems - Eine Bestandsaufnahme aus Schülersicht. Marburg: Tectum-Verlag. Plunkett, Kim and Virginia Marchman (1991). "U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition". Cognition 38, 43-102. Podesva, Robert J. and Devyani Sharma (2013). Research Methods in Linguistics. Cambridge/New York: Cambridge University Press. Prinz, Philip M. (1983). "The development of idiomatic meaning in children". Language and Speech 26(3), 263-272. Pulvermüller, Friedemann (1999). "Words in the brain’s language". Behavioral and Brain Sciences 22, 253-336. Read, John (1993). "The development of a new measure of L2 vocabulary knowledge". Language Testing 10, 355-371. Reber, Paul J. (2013). “The neural basis of implicit learning and memory: A review of neuropsychological and neuroimaging research”. Neuropsychologia 51, 2026-2042. Reber, Paul J. (2009). “Contributions of functional neuroimaging to theories of category learning”. In: Frank Rösler (ed.). Neuroimaging of human memory - Linking cognitive processes to neural systems. Oxford/New York: Oxford University Press, 89-108. Rescorla, Leslie A. (1980). "Overextension in early language development". Journal of Child Language 7(2), 321-335. Richards, Jack C. and Richard W. Schmidt (1983). Language and communication. London/New York: Longman. Robinson, Peter and Nick Ellis (2008). Handbook of Cognitive Linguistics and Second Language Acquisition. New York/London: Routledge. Rögnvaldsson, Eiríkur (1993). "Collocations in the Minimalist Program". Lambda 18, 107-118. Rosch, Eleanor (1975). "Cognitive representations of semantic categories". Journal of Experimental Psychology: General 104, 193-233. Rosch, Eleanor (1973). "On the internal structure of perceptual and semantic categories". In: Timothy E. Moore (ed.). Cognitive Development and the Acquisition of Language.
New York/San Francisco/London: Academic Press, 111-144. Rösler, Frank (2009). Neuroimaging of human memory - Linking cognitive processes to neural systems. Oxford/New York: Oxford University Press. Rumelhart, David E. and James L. McClelland (1986). Parallel distributed processing - Explorations in the microstructure of cognition. Volume 1: Foundations. Cambridge, Mass.: M. I. T. Press. Sandra, Dominiek (1998). "What linguists can and can't tell you about the human mind: A reply to Croft". Cognitive Linguistics 9(4), 361-378. Saussure, Ferdinand de (1916/1967). Grundfragen der Allgemeinen Sprachwissenschaft. 2nd edition. Berlin: Walter de Gruyter. Schacter, Daniel L. (1992). “Priming and multiple memory systems: Perceptual mechanisms of implicit memory”. Journal of Cognitive Neuroscience 4(3), 244-256. Schmidt, Richard W. (1990). "The Role of Consciousness in Second Language Learning". Applied Linguistics 11(2), 129-158. Schmitt, Norbert (2000). Vocabulary in Language Teaching. Cambridge: Cambridge University Press. Schneider, Wolfgang and David F. Bjorklund (2003). "Memory and knowledge development". In: Jaan Valsiner and Kevin J. Connolly (eds.). Handbook of Developmental Psychology. London/Thousand Oaks, CA: Sage Publications, 370-406. Schütze, Carson T. and Jon Sprouse (2013). "Judgement data". In: Robert J. Podesva and Devyani Sharma (eds.). Research Methods in Linguistics. Cambridge/New York: Cambridge University Press, 27-50. Searle, John R. (1979). Expression and Meaning - Studies in the Theory of Speech Acts. Cambridge/New York/Melbourne/Madrid: Cambridge University Press. Siepmann, Dirk (2005). “Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects”. International Journal of Lexicography 18(4), 409-443. Sinclair, John M. (2007). "Preface". International Journal of Corpus Linguistics 12(2), 155-157. Sinclair, John M. (2004). How to use corpora in language teaching.
Amsterdam: John Benjamins Publishing Company. Sinclair, John M. (2004). Trust the Text - Language, Corpus and Discourse. Abingdon/New York: Routledge. Sinclair, John M. (1998). “The lexical item”. In: Edda Weigand (ed.). Contrastive Lexical Semantics. Amsterdam: John Benjamins Publishing Company, 1-24. Sinclair, John M. (1996). "The Search for Units of Meaning". Textus 9(1), 75-106. Sinclair, John M. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press. Sinclair, John M. (1966). “Beginning the Study of Lexis”. In: Charles E. Bazell, John C. Catford, Michael A. K. Halliday and Robert H. Robins (eds.). In Memory of J. R. Firth. London: Longman, 410-430. Sinclair, John M. (1987a). Collins COBUILD English Language Dictionary. London/Glasgow: Collins. Sinclair, John M. (1987b). Looking Up - An Account of the COBUILD Project in Lexical Computing and the Development of the Collins COBUILD English Language Dictionary. London/Glasgow: Collins. Sinclair, John M., Susan Jones and Robert Daley (1970/2005). English collocation studies - The OSTI report. London: Continuum. Siyanova, Anna and Norbert Schmitt (2008). "L2 Learner Production and Processing of Collocation: A Multi-study Perspective". The Canadian Modern Language Review / La Revue Canadienne des langues vivantes 64(3), 429-458. Skinner, Burrhus F. (1957). Verbal behavior. Englewood Cliffs, NJ: Prentice-Hall. Spencer, Andrew and Arnold M. Zwicky (1998). The Handbook of Morphology. Oxford/Malden, MA: Blackwell. Spencer, N. J. (1973). "Differences between linguists and nonlinguists in intuitions of grammaticality-acceptability". Journal of Psycholinguistic Research 2(2), 83-98. Steels, Luc (2011). Design Patterns in Fluid Construction Grammar. Amsterdam: John Benjamins Publishing Company. Steels, Luc (1998). "The Origins of Syntax in Visually Grounded Robotic Agents". Artificial Intelligence 103, 133-156. Stefanowitsch, Anatol and Stefan Th. Gries (2003).
“Collostructions: Investigating the Interaction of Words and Constructions”. International Journal of Corpus Linguistics 8(2), 209-243. Stefanowitsch, Anatol and Stefan Th. Gries (2005). "Covarying collexemes". Corpus Linguistics and Linguistic Theory 1(1), 1-43. Steinbügl, Birgit (2005). Deutsch-englische Kollokationen - Erfassung in zweisprachigen Wörterbüchern und Grenzen der korpusbasierten Analyse. Tübingen: Max Niemeyer Verlag. Steinlen, Anja and Thorsten Piske (2013). "Academic achievement of children with and without migration backgrounds". ZAA 61(3), 213-214. Sternberg, Robert J. (1999). Handbook of creativity. Cambridge/New York: Cambridge University Press. Sternberg, Robert J. and Todd I. Lubart (1999). “The Concept of Creativity: Prospects and Paradigms”. In: Robert J. Sternberg (ed.). Handbook of creativity. Cambridge/New York: Cambridge University Press, 3-15. Stewart, Dominic (2010). Semantic prosody - A critical evaluation. New York: Routledge. Strevens, Peter (1978). In honour of A. S. Hornby. Oxford: Oxford University Press. Stubbs, Michael (2001). Words and Phrases - Corpus Studies of Lexical Semantics. Malden, MA: Blackwell. Stubbs, Michael (1995). "Collocations and semantic profiles: on the cause of the trouble with quantitative studies". Functions of Language 2(1), 23-55. Student [William Sealy Gosset] (1908). “The probable error of a mean”. Biometrika 6(1), 1-25. Swain, Merrill and Sharon Lapkin (1982). Evaluating bilingual education - A Canadian case study. Clevedon: Multilingual Matters. Available online via: http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=18907. Taylor, John R. (2007). "Cognitive Linguistics and Autonomous Linguistics". In: Dirk Geeraerts and Hubert Cuyckens (eds.). The Oxford Handbook of Cognitive Linguistics. Oxford/New York: Oxford University Press, 566-588. Tognini-Bonelli, Elena (2001). Corpus linguistics at work. Amsterdam/New York: John Benjamins Publishing Company.
Tomasello, Michael (2005). Constructing a language - A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press. Tomasello, Michael (1992). First verbs - A case study of early grammatical development. Cambridge/New York: Cambridge University Press. Tomasello, Michael and Nameera Akhtar (1995). "Two-year-olds use pragmatic cues to differentiate reference to objects and actions". Cognitive Development 10(2), 201-224. Tomasello, Michael and Elizabeth Bates (2001). Language Development - The Essential Readings. Oxford: Blackwell. Tomasello, Michael and Daniel Stahl (1999). "Sampling children's spontaneous speech: How much is enough?". Journal of Child Language 31(1), 101-121. Traugott, Elizabeth Closs (2015). “Toward a coherent account of grammatical constructionalization”. In: Jóhanna Barðdal, Elena Smirnova, Lotte Sommerer and Spike Gildea (eds.). Diachronic Construction Grammar. Amsterdam/Philadelphia: John Benjamins Publishing Company, 51-79. Traugott, Elizabeth Closs (2008). "The grammaticalization of NP of NP patterns". In: Alexander Bergs and Gabriele Diewald (eds.). Constructions and Language Change. Berlin/New York: Mouton de Gruyter, 23-45. Traugott, Elizabeth Closs and Graeme Trousdale (2013). Constructionalization and Constructional Changes. Oxford: Oxford University Press. Traugott, Elizabeth Closs and Graeme Trousdale (2010). Gradience, Gradualness and Grammaticalization. Amsterdam/Philadelphia: John Benjamins Publishing Company. Tukey, John W. (1949). "Comparing Individual Means in the Analysis of Variance". Biometrics 5(2), 99-114. Tulving, Endel and Daniel L. Schacter (1990). “Priming and Human Memory Systems”. Science 247, 301-306. Uemura, Toshihiko and Shin'ichiro Ishikawa (2004). "JACET 8000 and Asia TEFL Vocabulary Initiative". The Journal of ASIA TEFL 1(1), 333-347. Valsiner, Jaan and Kevin J. Connolly (2003). Handbook of Developmental Psychology. London/Thousand Oaks, CA: Sage Publications.
van Dijk, Teun A. (2008). Discourse and Context - A Sociocognitive Approach. Cambridge: Cambridge University Press. van Geert, Paul (2003). "Dynamic Systems Approaches and Modeling of Developmental Processes". In: Jaan Valsiner and Kevin J. Connolly (eds.). Handbook of Developmental Psychology. London/Thousand Oaks, CA: Sage Publications, 640-672. Vartanian, Oshin, Adam S. Bristol and James C. Kaufman (2013). Neuroscience of Creativity. Cambridge, MA: M. I. T. Press. Waddington, Conrad H. (1957). Principles of embryology. London: Allen & Unwin. Warren, Roger (1979). "'Smiling at Grief': Some Techniques of Comedy in Twelfth Night and Cosi Fan Tutte". In: Kenneth Muir (ed.). Shakespeare survey - An annual survey of Shakespearian study and production. London: Cambridge University Press, 79-84. Weigand, Edda (1998). Contrastive Lexical Semantics. Amsterdam: John Benjamins Publishing Company. Welch, B. L. (1947). "The generalization of 'Student's' problem when several different population variances are involved". Biometrika 34(1-2), 28-35. West, Michael (1953/1983). A General Service List of English Words with semantic frequencies and a supplementary word-list for the writing of popular science and technology. Harlow: Longman. Whitsitt, Sam (2005). "A critique of the concept of semantic prosody". International Journal of Corpus Linguistics 10(3), 283-305. Widdowson, Henry G. (2004). Text, Context, Pretext - Critical Issues in Discourse Analysis. Malden, MA/Oxford/Victoria: Blackwell. Widdowson, Henry G. (2000). "On the limitations of linguistics applied". Applied Linguistics 21(1), 3-25. Wode, Henning (1995). Lernen in der Fremdsprache - Grundzüge von Immersion und bilingualem Unterricht. Ismaning: Max Hueber Verlag. Wray, Alison (2002). Formulaic Language and the Lexicon. Cambridge: Cambridge University Press. Wray, Alison (1999). "Formulaic language in learners and native speakers". Language Teaching 32(4), 213-231. Wray, Alison and Michael Perkins (2000). "The functions of formulaic language: an integrated model". Language and Communication 20, 1-28. Wulff, Stefanie (2010). Rethinking idiomaticity - A usage-based approach. London/New York: Continuum. Xiao, Richard and Tony McEnery (2006). "Collocation, Semantic Prosody, and Near Synonymy: A Cross-Linguistic Perspective". Applied Linguistics 27(1), 103-129. Yates, Frank (1984). "Test of Significance for 2x2 Contingency Tables". Journal of the Royal Statistical Society 147(3), 426-463. Zabell, Sandy L. (2008). "On Student’s 1908 paper 'The probable error of a mean'". Journal of the American Statistical Association 103(481), 1-7. Zamost, Scott A. and Elizabeth Snead (1987). "New Madonna Tour Sets Racy Tone". Chicago Tribune July 2, online. Zaunbauer, Anna C. M. and Jens Möller (2007). "Schulleistungen monolingual und immersiv unterrichteter Kinder am Ende des ersten Schuljahres". Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie 39(3), 141-153. Zeldes, Amir (2012). Productivity in argument selection - From morphology to syntax. Berlin: De Gruyter Mouton. Zernik, Uri (1991). Lexical Acquisition - Using On-line Resources to Build a Lexicon. Hillsdale: Erlbaum. Ziem, Alexander and Alexander Lasch (2013). Konstruktionsgrammatik - Konzepte und Grundlagen gebrauchsbasierter Ansätze. Berlin/Boston: Walter de Gruyter. Zipf, George Kingsley (1949). Human behavior and the principle of least effort - An introduction to human ecology. Mansfield Center, CT: Martino Publishing.

Dictionaries
A Valency Dictionary of English. By Thomas Herbst, David Heath, Ian Roe and Dieter Götz (2004). Berlin/New York: Mouton de Gruyter. [VDE]
Oxford Advanced Learner's Dictionary of Current English. By Albert S. Hornby, edited by Sally Wehmeier (2005). Oxford: Oxford University Press. 7th edition. [OALD 7]
Oxford Collocations Dictionary for Students of English. Edited by Colin McIntosh (2009). Oxford: Oxford University Press. 2nd edition. [OCD 2]
The BBI dictionary of English Word Combinations.
By Morton Benson, Evelyn Benson and Robert Ilson (1997). Amsterdam/Philadelphia: John Benjamins Publishing Company.

Corpora
BNC: The British National Corpus. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. http://www.natcorp.ox.ac.uk/
COCA: The Corpus of Contemporary American English: 450 million words, 1990-present. By Davies, Mark (2008). http://corpus.byu.edu/coca/
CHILDES: The CHILDES Project: Tools for analyzing talk. By Brian MacWhinney (2000). http://childes.psy.cmu.edu/

ISBN 978-3-8233-8171-6