Posts Tagged ‘level’

I was intrigued to learn earlier this year that Oxford University Press had launched a new online test of English language proficiency, called the Oxford Test of English (OTE). At the conference where I first heard about it, I was struck by the fact that the presentation of the OUP-sponsored plenary speaker was entitled ‘The Power of Assessment’ and dealt with formative assessment / assessment for learning. Oxford clearly want to position themselves as serious competitors to Pearson and Cambridge English in the testing business.

The brochure for the exam kicks off with a gem of a marketing slogan, ‘Smart. Smarter. SmarTest’ (geddit?), and the next few pages give us all the key information.

Faster and more flexible

‘Traditional language proficiency tests’ is presumably intended to refer to the main competition (Pearson and Cambridge English). Cambridge First takes, in total, 3½ hours; the Pearson Test of English Academic takes 3 hours. The OTE takes, in total, 2 hours and 5 minutes. It can be taken, in theory, on any day of the year, although this depends on the individual Approved Test Centres, and, again in theory, it can be booked as little as 14 days in advance. Results should take only two weeks to arrive. Further flexibility is offered in the way that candidates can pick ’n’ choose which of the four skills they want to have tested, just one or all four, although, as an incentive to go the whole hog, they will only get a ‘Certificate of Proficiency’ if they do all four.

A further incentive to do all four skills at the same time can be found in the price structure. One centre in Spain is currently offering the test for one single skill at €41.50, but do the whole lot, and it will only set you back €89. For a high-stakes test, this is cheap. In the UK right now, both Cambridge First and Pearson Academic cost in the region of £150, and IELTS a bit more than that. So, faster, more flexible and cheaper … Oxford means business.

Individual experience

The ‘individual experience’ on the next page of the brochure is pure marketing guff. This is, after all, a high-stakes, standardised test. It may be true that ‘the Speaking and Writing modules provide randomly generated tasks, making the overall test different each time’, but there can only be a certain number of permutations. What’s more, in ‘traditional tests’, like Cambridge First, where there is a live examiner or two, an individualised experience is unavoidable.

More interesting to me is the reference to adaptive technology. According to the brochure, ‘The Listening and Reading modules are adaptive, which means the test difficulty adjusts in response to your answers, quickly finding the right level for each test taker. This means that the questions are at just the right level of challenge, making the test shorter and less stressful than traditional proficiency tests’.

My curiosity piqued, I decided to look more closely at the Reading module. I found one practice test online which is the same as the demo that is available at the OTE website. Unfortunately, this example is not adaptive: it is at B1 level. The actual test records scores between 51 and 140, corresponding to levels A2, B1 and B2.

[Image: Test scores]

The tasks in the Reading module are familiar from coursebooks and other exams: multiple choice, multiple matching and gapped texts.

[Image: Reading tasks]

According to the exam specifications, these tasks are designed to measure the following skills:

  • Reading to identify main message, purpose, detail
  • Expeditious reading to identify specific information, opinion and attitude
  • Reading to identify text structure, organizational features of a text
  • Reading to identify attitude / opinion, purpose, reference, the meanings of words in context, global meaning

The ability to perform these skills depends, ultimately, on the candidate’s knowledge of vocabulary and grammar, as can be seen in the examples below.

[Images: Task 1, Task 2]

How exactly, I wonder, does the test difficulty adjust in response to the candidate’s answers? The algorithm that is used depends on measures of the difficulty of the test items. If these items are to be made harder or easier, the only significant way that I can see of doing this is by making the key vocabulary lower- or higher-frequency. This, in turn, is only possible if vocabulary and grammar have been tagged as being at a particular level. The best-known tools for doing this are Pearson’s GSE Teacher Toolkit and Cambridge English Profile. To the best of my knowledge, Oxford does not yet have a tool of this kind (at least, none that is publicly available). However, the data that OUP will accumulate from OTE scripts and recordings will be invaluable in building a database which their lexicographers can use in developing such a tool.
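OUP has not published how the OTE’s adaptive modules work, and operational adaptive tests generally rely on item response theory, which is far more sophisticated than anything shown here. Purely to illustrate the dependency described above, here is a minimal Python sketch of a one-up/one-down ‘staircase’. It assumes a hypothetical item bank in which every item has already been tagged with a difficulty level on an invented 1 to 10 scale (for example, on the basis of the frequency of its key vocabulary); all the names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    difficulty: int  # hypothetical 1-10 scale, e.g. based on key-vocabulary frequency


def next_target(current: int, was_correct: bool, lo: int = 1, hi: int = 10) -> int:
    """One-up/one-down staircase: aim higher after a correct answer, lower after a wrong one."""
    step = 1 if was_correct else -1
    return max(lo, min(hi, current + step))


def pick_item(bank: list[Item], target: int, used: set[str]) -> Item:
    """Choose the unused item whose tagged difficulty is closest to the target."""
    candidates = [item for item in bank if item.item_id not in used]
    return min(candidates, key=lambda item: abs(item.difficulty - target))


def run_test(bank, answer_fn, n_items: int, start: int = 5):
    """Administer n_items adaptively; return (difficulty, correct) pairs in order."""
    target, used, history = start, set(), []
    for _ in range(n_items):
        item = pick_item(bank, target, used)
        used.add(item.item_id)
        correct = answer_fn(item)
        history.append((item.difficulty, correct))
        target = next_target(target, correct)
    return history


# Usage: a hypothetical bank of 5 items at each of 10 difficulty levels,
# and a simulated candidate who answers correctly up to difficulty 6.
bank = [Item(f"{d}-{k}", d) for d in range(1, 11) for k in range(5)]
history = run_test(bank, lambda item: item.difficulty <= 6, n_items=10)
print([d for d, _ in history])  # [5, 6, 7, 6, 7, 6, 7, 6, 7, 6]
```

The simulated candidate is quickly pinned between difficulty 6 and 7. Without trustworthy difficulty tags on the items, though, the staircase has nothing to work with.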

Even when a data-driven (and numerically precise) tool is available for modifying the difficulty of test items, I still find it hard to understand how the adaptivity will impact on the length or the stress of the reading test. The Reading module is only 35 minutes long and contains only 22 items. Anything that is significantly shorter must surely impact on the reliability of the test.

My conclusion from this is that the adaptive element of the Reading and Listening modules in the OTE is less important to the test itself than it is to building a sophisticated database (not dissimilar to the GSE Teacher Toolkit or Cambridge English Profile). The value of this will be found, in due course, in calibrating all OUP materials. The OTE has already been aligned to the Oxford Online Placement Test (OOPT) and, presumably, coursebooks will soon follow. This, in turn, will facilitate a vertically integrated business model, like Pearson and CUP, where everything from placement test, to coursework, to formative assessment, to final proficiency testing can be on offer.

Having spent a lot of time recently looking at vocabulary apps, I decided to put together a Christmas wish list of the features of my ideal vocabulary app. The list is not exhaustive and I’ve given more attention to some features than others. What (apart from testing) have I missed out?

1. Spaced repetition

Since the point of a vocabulary app is to help learners memorise vocabulary items, it is hard to imagine a decent system that does not incorporate spaced repetition. Spaced repetition algorithms offer one well-researched way of counteracting the brain’s ‘forgetting curve’. These algorithms come in different shapes and sizes, and I am not technically competent to judge which is the most efficient. However, as Peter Ellis Jones, the developer of a flashcard system called CardFlash, points out, efficiency is only one half of the rote memorisation problem. If you are not motivated to learn, the cleverness of the algorithm is moot. Fundamentally, learning software needs to be fun, rewarding, and give a solid sense of progression.
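Commercial apps use a variety of algorithms (SM-2 and its descendants, for example), and nothing here is meant to represent any particular product. As a minimal illustration of the principle only, here is a Python sketch of a Leitner-style scheduler: an item that is remembered moves to a box with a longer review interval, and an item that is forgotten goes back to the start. The interval values are arbitrary assumptions.

```python
from dataclasses import dataclass

# Assumed review intervals, in days, for each Leitner box (box 0 = shortest).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    word: str
    box: int = 0  # index into INTERVALS
    due: int = 0  # day number on which the card is next due for review


def review(card: Card, today: int, remembered: bool) -> Card:
    """Update a card after a review and schedule its next appearance."""
    if remembered:
        card.box = min(card.box + 1, len(INTERVALS) - 1)  # promote, capped at the top box
    else:
        card.box = 0  # forgotten items return to the shortest interval
    card.due = today + INTERVALS[card.box]
    return card


def due_cards(deck: list[Card], today: int) -> list[Card]:
    """Cards whose review date has arrived."""
    return [card for card in deck if card.due <= today]


card = Card("receipt")
review(card, today=0, remembered=True)   # box 1, due on day 2
review(card, today=2, remembered=True)   # box 2, due on day 6
review(card, today=6, remembered=False)  # back to box 0, due on day 7
print(card.box, card.due)  # 0 7
```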

2. Quantity, balance and timing of new and ‘old’ items

A spaced repetition algorithm determines the optimum interval between repetitions, but further algorithms will be needed to determine when and with what frequency new items will be added to the deck. Once a system knows how many items a learner needs to learn and the time in which they have to do it, it is possible to determine the timing and frequency of the presentation of new items. But the system cannot know in advance how well an individual learner will learn the items (for any individual, some items will be more readily learnable than others) nor the extent to which learners will live up to their own positive expectations of time spent on-app. As most users of flashcard systems know, it is easy to fall behind, feel swamped and, ultimately, give up. An intelligent system needs to be able to respond to individual variables in order to ensure that the learning load is realistic.
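One way of making the learning load respond to the individual is sketched here in Python under entirely hypothetical assumptions. The function spreads the remaining new items evenly over the days left, then scales that quota down according to how much of the learner’s daily review capacity is already taken up by items that are due. The capacity of 100 reviews a day is an invented default; a real system would presumably estimate it from each learner’s actual usage.

```python
import math


def new_items_today(items_left: int, days_left: int, reviews_due: int,
                    max_reviews_per_day: int = 100) -> int:
    """Hypothetical rule for how many new items to introduce today.

    Spread the remaining new items evenly over the remaining days, but introduce
    fewer (or none) when the backlog of due reviews is already heavy.
    """
    if items_left <= 0 or days_left <= 0:
        return 0
    planned = math.ceil(items_left / days_left)
    # Fraction of today's review capacity still free once due reviews are counted.
    spare = max(0.0, 1 - reviews_due / max_reviews_per_day)
    return min(items_left, math.floor(planned * spare))


print(new_items_today(300, 30, reviews_due=0))    # 10
print(new_items_today(300, 30, reviews_due=50))   # 5
print(new_items_today(300, 30, reviews_due=120))  # 0
```

A learner who falls behind is thus offered fewer new items until the backlog clears, rather than being swamped.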

3. Task variety

A standard flashcard system which simply asks learners to indicate whether they ‘know’ a target item before they flip over the card rapidly becomes extremely boring. A system which tests this knowledge soon becomes equally dull. There needs to be a variety of ways in which learners interact with an app, both for reasons of motivation and learning efficiency. It may be the case that, for an individual user, certain task types lead to more rapid gains in learning. An intelligent, adaptive system should be able to capture this information and modify the selection of task types.

Most younger learners and some adult learners will respond well to the inclusion of games within the range of task types. Examples of such games include the puzzles developed by Oliver Rose in his Phrase Maze app to accompany Quizlet practice.

4. Generative use

Memory researchers have long known about the ‘Generation Effect’ (see for example this piece of research from the Journal of Verbal Learning and Verbal Behavior, 1978). Items are better learnt when the learner has to generate, in some (even small) way, the target item, rather than simply reading it. In vocabulary learning, this could be, for example, typing in the target word or, more simply, inserting some missing letters. Systems which incorporate task types that require generative use are likely to result in greater learning gains than simple, static flashcards with target items on one side and definitions or translations on the other.
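As a small illustration of the ‘missing letters’ idea, here is a Python sketch that turns a target word into a cued-recall prompt by keeping only some of its letters, and then checks the learner’s typed answer. The rule of keeping the first letter and every second letter after it is an arbitrary assumption; a real app would presumably vary the amount of support according to how well the item is known.

```python
def letter_gap_prompt(word: str, keep_every: int = 2) -> str:
    """Keep the first letter and every `keep_every`-th letter; blank other letters with '_'.

    Non-letters (spaces, hyphens, apostrophes) are always shown.
    """
    return "".join(
        ch if i % keep_every == 0 or not ch.isalpha() else "_"
        for i, ch in enumerate(word)
    )


def check_answer(target: str, attempt: str) -> bool:
    """Case-insensitive comparison of the learner's typed answer with the target."""
    return attempt.strip().lower() == target.lower()


print(letter_gap_prompt("luggage"))          # l_g_a_e
print(check_answer("luggage", " Luggage "))  # True
```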

5. Receptive and productive practice

The most basic digital flashcard systems require learners to understand a target item, or to generate it from a definition or translation prompt. Valuable as this may be, it won’t help learners much to use these items productively, since these systems focus exclusively on meaning. For productive use, information must be provided about collocation, colligation, register, etc., and these aspects of word knowledge will need to be focused on within the range of task types. At the same time, most vocabulary apps that I have seen focus primarily on the written word. Although any good system will offer an audio recording of the target item, and many will offer the learner the option of recording themselves, learners are invariably asked to type in their answers, rather than say them. For the latter, speech recognition technology will be needed. Ideally, too, an intelligent system will compare learner recordings with the audio models and provide feedback in such a way that the learner is guided towards a closer reproduction of the model.

6. Scaffolding and feedback

Most flashcard systems are basically low-stakes, practice self-testing. Research (see, for example, Dunlosky et al.’s meta-study ‘Improving Students’ Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology’) suggests that, as a learning strategy, practice testing has high utility – indeed, of higher utility than other strategies like keyword mnemonics or highlighting. However, an element of tutoring is likely to enhance practice testing, and, for this, scaffolding and feedback will be needed. If, for example, a learner is unable to produce a correct answer, they will probably benefit from being guided towards it through hints, in the same way as a teacher would elicit in a classroom. Likewise, feedback on why an answer is wrong (as opposed to simply being told that you are wrong), followed by encouragement to try again, is likely to enhance learning. Such feedback might, for example, point out that there is perhaps a spelling problem in the learner’s attempted answer, that the attempted answer is in the wrong part of speech, or that it is semantically close to the correct answer but does not collocate with other words in the text. The incorporation of intelligent feedback of this kind will require a number of NLP tools, since it will never be possible for a human item-writer to anticipate all the possible incorrect answers. A current example of intelligent feedback of this kind can be found in the Oxford English Vocabulary Trainer app.
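Two of the kinds of feedback mentioned above (spelling and part of speech) can be sketched, very crudely, in Python. Here a hand-made mini-lexicon stands in for the word-family and part-of-speech information a real system would draw from a dictionary database or NLP tools, and `difflib.SequenceMatcher` from the standard library stands in for a proper spelling-error model. The lexicon, the 0.8 similarity threshold and the messages are all hypothetical, and this is not meant to represent how the Oxford English Vocabulary Trainer does it.

```python
from difflib import SequenceMatcher

# Hypothetical mini-lexicon: word form -> (word family, part of speech).
LEXICON = {
    "decide": ("decide", "verb"),
    "decision": ("decide", "noun"),
    "decisive": ("decide", "adjective"),
}


def feedback(target: str, attempt: str, spelling_threshold: float = 0.8) -> str:
    """Return a feedback message that says *why* an attempt is wrong, where possible."""
    t, a = target.lower(), attempt.strip().lower()
    if a == t:
        return "Correct!"
    # Same word family, different part of speech.
    if a in LEXICON and t in LEXICON and LEXICON[a][0] == LEXICON[t][0]:
        return f"Right word family, but you need the {LEXICON[t][1]}, not the {LEXICON[a][1]}."
    # Very similar string: probably a spelling slip.
    if SequenceMatcher(None, t, a).ratio() >= spelling_threshold:
        return "Nearly! Check your spelling and try again."
    return "Not quite. Try again, or ask for a hint."


print(feedback("decision", "decide"))   # Right word family, but you need the noun, not the verb.
print(feedback("decision", "decison"))  # Nearly! Check your spelling and try again.
```

Feedback on collocation would need a much richer resource (corpus data on which words go together), which is exactly where the cost of such a system starts to climb.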

7. Content

At the very least, a decent vocabulary app will need good definitions and translations (how many different languages?), and these will need to be tagged to the senses of the target items. These will need to be supplemented with all the other information that you find in a good learner’s dictionary: syntactic patterns, collocations, cognates, an indication of frequency, etc. The only way of getting this kind of high-quality content is by paying to license it from a company with expertise in lexicography. It doesn’t come cheap.

There will also need to be example sentences, both to illustrate meaning / use and for deployment in tasks. Dictionary databases can provide some of these, but they cannot be relied on as a source. This is because the example sentences in dictionaries have been selected and edited to accompany the other information provided in the dictionary, and not as items in practice exercises, which have rather different requirements. Once more, the solution doesn’t come cheap: experienced item writers will be needed.

Dictionaries describe and illustrate how words are typically used. But examples of typical usage tend to be as dull as they are forgettable. Learning is likely to be enhanced if examples are cognitively salient: weird examples with odd collocations, for example. Another thing for the item writers to think about.

A further challenge for an app which is not level-specific is that both the definitions and example sentences need to be level-specific. An A1 / A2 learner will need the kind of content that is found in, say, the Oxford Essential dictionary; B2 learners and above will need content from, say, the OALD.

8. Artwork and design

It’s easy enough to find artwork or photos of concrete nouns, but try to find or commission a pair of pictures that differentiate, for example, the adjectives ‘wild’ and ‘dangerous’ … What kind of pictures might illustrate simple verbs like ‘learn’ or ‘remember’? Will such illustrations be clear enough when squeezed into a part of a phone screen? Animations or very short video clips might provide a solution in some cases, but these are more expensive to produce and video files are much heavier.

With a few notable exceptions, such as the British Council’s MyWordBook 2, design in vocabulary apps has been largely forgotten.

9. Importable and personalisable lists

Many learners will want to use a vocabulary app in association with other course material (e.g. coursebooks), so the app will need to be able to import the word lists that accompany these materials. Teachers, however, will inevitably want to edit these lists, deleting some items, adding others. Learners will want to do the same. This is a huge headache for app designers. If new items are going to be added to word lists, how will the definitions, example sentences and illustrations be generated? Will the database contain audio recordings of these words? How will these items be added to the practice tasks (if these include task types that go beyond simple double-sided flashcards)? NLP tools are not yet good enough to trawl a large corpus in order to select (and possibly edit) sentences that illustrate the right meaning and which are appropriate for interactive practice exercises. We can personalise the speed of learning and even the types of learning tasks, so long as the target language is predetermined. But as soon as we allow for personalisation of content, we run into difficulties.

10. Gamification

Maintaining motivation to use a vocabulary app is not easy. Gamification may help. Measuring progress against objectives will be a start. Stars and badges and leaderboards may help some users. Rewards may help others. But gamification features need to be built into the heart of the system, into the design and selection of tasks, rather than simply tacked on as an afterthought. They need to be trialled and tweaked, so analytics will be needed.

11. Teacher support

Although the use of vocabulary flashcards is beginning to catch on with English language teachers, teachers need help with ways to incorporate them in the work they do with their students. What can teachers do in class to encourage use of the app? In what ways does app use require teachers to change their approach to vocabulary work in the classroom? Reporting functions can help teachers know about the progress their students are making and provide very detailed information about words that are causing problems. But, as anyone involved in platform-based course materials knows, teachers need a lot of help.

12. And, of course, …

Apps need to be usable with different operating systems. Ideally, they should be (partially) usable offline. Loading times need to be short. They need to be easy and intuitive to use.

It’s unlikely that I’ll be seeing a vocabulary app with all of these features any time soon. Or, possibly, ever. The cost of developing something that could do all this would be extremely high, and there is no indication that there is a market that would be ready to pay the sort of prices that would be needed to cover the costs of development and turn a profit. We need to bear in mind, too, the fact that vocabulary apps can only ever assist in the initial acquisition of vocabulary: apps alone can’t solve the vocabulary learning problem (despite the silly claims of some app developers). The need for meaningful communicative use, extensive reading and listening, will not go away because a learner has been using an app. So, how far can we go in developing better and better vocabulary apps before users decide that a cheap / free app, with all its shortcomings, is actually good enough?

I posted a follow-up to this post in October 2016.

There are a number of reasons why we sometimes need to describe a person’s language competence using a single number. Most of these are connected to the need for a shorthand to differentiate people, in summative testing or in job selection, for example. Numerical (or grade) allocation of this kind is so common (and especially in times when accountability is greatly valued) that it is easy to believe that this number is an objective description of a concrete entity, rather than a shorthand description of an abstract concept. In the process, the abstract concept (language competence) becomes reified and there is a tendency to stop thinking about what it actually is.

Language is messy. It’s a complex, adaptive system of communication which has a fundamentally social function. As Diane Larsen-Freeman and others have argued, ‘patterns of use strongly affect how language is acquired, is used, and changes. These processes are not independent of one another but are facets of the same complex adaptive system. […] The system consists of multiple agents (the speakers in the speech community) interacting with one another [and] the structures of language emerge from interrelated patterns of experience, social interaction, and cognitive mechanisms’.

As such, competence in language use is difficult to measure. There are ways of capturing some of it (think of the pages and pages of competency statements in the Common European Framework), but there has always been something deeply unsatisfactory about documents of this kind. How, for example, are we supposed to differentiate, exactly and objectively, between, say, ‘can participate fully in an interview’ (C1) and ‘can carry out an effective, fluent interview’ (B2)? The short answer is that we can’t. There are too many of these descriptors anyway and, even if we did attempt to use such a detailed tool to describe language competence, we would still be left with a very incomplete picture. There is at least one whole book devoted to attempts to test the untestable in language education (edited by Amos Paran and Lies Sercu, Multilingual Matters, 2010).

So, here is another reason why we are tempted to use shorthand numerical descriptors (such as A1, A2, B1, etc.) to describe something which is very complex and abstract (‘overall language competence’) and to reify this abstraction in the process. From there, it is a very short step to making things even more numerical, more scientific-sounding. Number-creep in recent years has brought us the Pearson Global Scale of English which can place you at a precise point on a scale from 10 to 90. Not to be outdone, Cambridge English Language Assessment now has a scale that runs from 80 points to 230, although Cambridge does, at least, allocate individual scores for four language skills.

As the title of this post suggests (in its reference to Stephen Jay Gould’s The Mismeasure of Man), I am suggesting that there are parallels between attempts to measure language competence and the sad history of attempts to measure ‘general intelligence’. Both are guilty of the twin fallacies of reification and ranking – the ordering of complex information as a gradual ascending scale. These conceptual fallacies then lead us, through the way that they push us to think about language, into making further conceptual errors about language learning. We start to confuse language testing with the ways that language learning can be structured.

We begin to granularise language. We move inexorably away from difficult-to-measure hazy notions of language skills towards what, on the surface at least, seem more readily measurable entities: words and structures. We allocate to them numerical values on our testing scales, so that an individual word can be deemed to be higher or lower on the scale than another word. And then we have a syllabus, a synthetic syllabus, that lends itself to digital delivery and adaptive manipulation. We find ourselves in a situation where materials writers for Pearson, writing for a particular ‘level’, are only allowed to use vocabulary items and grammatical structures that correspond to that ‘level’. We find ourselves, in short, in a situation where the acquisition of a complex and messy system is described as a linear, additive process. Here’s an example from the Pearson website: ‘If you score 29 on the scale, you should be able to identify and order common food and drink from a menu; at 62, you should be able to write a structured review of a film, book or play. And because the GSE is so granular in nature, you can conquer smaller steps more often; and you are more likely to stay motivated as you work towards your goal.’ It’s a nonsense, a nonsense that is dictated by the needs of testing and adaptive software, but the sciency-sounding numbers help to hide the conceptual fallacies that lie beneath.

Perhaps, though, this doesn’t matter too much for most language learners. In the early stages of language learning (where most language learners are to be found), there are countless millions of people who don’t seem to mind the granularised programmes of Duolingo or Rosetta Stone, or the Grammar McNuggets of coursebooks. In these early stages, anything seems to be better than nothing, and the testing is relatively low-stakes. But as a learner’s interlanguage becomes more complex, and as the language she needs to acquire becomes more complex, attempts to granularise it and to present it in a linearly additive way become more problematic. It is for this reason, I suspect, that the appeal of granularised syllabuses declines so rapidly the more progress a learner makes. It comes as no surprise that, the further up the scale you get, the more that both teachers and learners want to get away from pre-determined syllabuses in coursebooks and software.

Adaptive language learning software is continuing to gain traction in the early stages of learning, in the initial acquisition of basic vocabulary and structures and in coming to grips with a new phonological system. It will almost certainly gain even more. But the challenge for the developers and publishers will be to find ways of making adaptive learning work for more advanced learners. Can it be done? Or will the mismeasure of language make it impossible?

FluentU, busuu, Bliu Bliu … what is it with all the ‘u’s? Hong Kong-based FluentU used to be called FluentFlix, but they changed their name a while back. The service for English learners is relatively new. Before that, they focused on Chinese, where the competition is much less fierce.

At the core of FluentU is a collection of short YouTube videos, which are sorted into 6 levels and grouped into 7 topic categories. The videos are accompanied by transcriptions. As learners watch a video, they can click on any word in the transcript. This will temporarily freeze the video and show a pop-up which offers a definition of the word, information about part of speech, a couple of examples of this word in other sentences, and more example sentences of the word from other videos that are linked on FluentU. These can, in turn, be clicked on to bring up a video collage of these sentences. Learners can click on an ‘Add to Vocab’ button, which will add the word to personalised vocabulary lists. These are later studied through spaced repetition.

FluentU describes its approach in the following terms: ‘FluentU selects the best authentic video content from the web, and provides the scaffolding and support necessary to bring that authentic content within reach for your students.’ It seems appropriate, therefore, to look first at the nature of that content. At the moment, there appear to be just under 1,000 clips, which are allocated to levels as follows:

Newbie 123
Elementary 138
Intermediate 294
Upper Int 274
Advanced 111
Native 40

It has to be assumed that the amount of content will continue to grow, but, for the time being, it’s not unreasonable to say that there isn’t a lot there. I looked at the Upper Intermediate level, where the shortest clip was 32 seconds long and the longest 4 minutes 34 seconds, but most were between 1 and 2 minutes. With 274 clips averaging roughly a minute and a half, that means that there is the equivalent of about 400 minutes (say, 7 hours) of video for this level.

The actual amount that anyone would want to watch / study can be seen to be significantly less when the topics are considered. These break down as follows:

Arts & entertainment 105
Business 34
Culture 29
Everyday life 60
Health & lifestyle 28
Politics & society 6
Science & tech 17

The screenshots below give an idea of the videos on offer:

[Screenshots: video menus]

I may be a little difficult, but there wasn’t much here that appealed. Forget the movie trailers for crap movies, for a start. Forget the low-level business stuff, too. ‘The History of New Year’s Resolutions’ looked promising, but turned out to be a Wikipedia-style piece. FluentU certainly doesn’t have the eye for interesting, original video content of someone like Jamie Keddie or Kieran Donaghy.

But, perhaps, the underwhelming content is of less importance than what you do with it. After all, if you’re really interested in content, you can just go to YouTube and struggle through the transcriptions on your own. The transcripts can be downloaded as PDFs, which, strangely, are marked with a FluentU copyright notice. FluentU doesn’t need to own the copyright of the videos, because they just provide links, but claiming copyright for someone else’s script seemed questionable to me. Anyway, the only real reason to be on this site is to learn some vocabulary. How well does it perform?


Level is self-selected. It wasn’t entirely clear how videos had been allocated to level, but I didn’t find any major discrepancies between FluentU’s allocation and my own, intuitive grading of the content. When I clicked on words in the transcript, the look-up / dictionary function wasn’t too bad, compared to some competing products I have looked at. The system could deal with some chunks and phrases (e.g. at your service, figure out) and the definitions were appropriate to the way these had been used in context. The accuracy was far from consistent, though. Some definitions were harder than the word they were explaining (e.g. telephone = an instrument used to call someone) and some were plain silly (e.g. the definition of I is me).

Some chunks were not recognised, so definitions were amusingly wonky. Come out, get through and have been were all wrong. For the phrase talk her into it, the program didn’t recognise the phrasal verb, and offered me communicate using speech for talk, and to the condition, state or form of for into.

For many words, there are pictures to help you with the meaning, but you wonder about some of them, e.g. the picture of someone clutching a suitcase to illustrate the meaning of of, or a woman holding up a finger and thumb to illustrate the meaning of what (as a pronoun).

The example sentences don’t seem to be graded in any way and are not always useful. The example sentences for of, for example, are The pages of the book are ripped, the lemurs of Madagascar and what time of day are you free. Since the definition is given as belonging to, there seems to be a problem with, at least, the last of these examples!

With the example sentences that link you to other video examples of this word being used, I found that they took a long time to load … and they really weren’t worth waiting for.

After a catalogue of problems like this, you might wonder how I can say that this function wasn’t too bad, but I’ve seen a lot worse. It was, at least, mostly accurate.

Moving away from the ‘Watch’ options, I explored the ‘Learn’ section. Bearing in mind that I had described myself as ‘Upper Intermediate’, I was surprised to be offered the following words for study: Good morning, may, help, think, so. This then took me to the following screen: [Screenshot: ‘great job’ screen]

I was getting increasingly confused. After watching another video, I could practise some of the words I had highlighted, but, again, I wasn’t sure quite what was going on. There was a task that asked me to ‘pick the correct translation’, but this was, in fact, a multiple-choice dictation task. [Screenshot: translation task]

Next, I was asked to study the meaning of the word in, followed by an unhelpful gap-fill task: [Screenshot: gap-fill task]

Confused? I was. I decided to look for something a little more straightforward, and clicked on a menu of vocabulary flash cards that I could import. These included sets based on copyright material from both CUP and OUP, and I wondered what these publishers might think of their property being used in this way. [Screenshot: flashcards]

FluentU claims that it is based on the following principles:

  1. Individualized scaffolding: FluentU makes language learning easy by teaching new words with vocabulary students already know.
  2. Mastery Learning: FluentU sets students up for success by making sure they master the basics before moving on to more advanced topics.
  3. Gamification: FluentU incorporates the latest game design mechanics to make learning fun and engaging.
  4. Personalization: Each student’s FluentU experience is unlike anyone else’s. Video clips, examples, and quizzes are picked to match their vocabulary and interests.

The ‘individualized scaffolding’ is no more than common sense, dressed up in sciency-sounding language. The reference to ‘Mastery Learning’ is opaque, to say the least, with some confusion between language features and topic. The gamification is rudimentary, and the personalization is pretty limited. It doesn’t come cheap, either.

[Image: price table]

In the words of its founder and CEO, self-declared ‘visionary’ Claudio Santori, Bliu Bliu is ‘the only company in the world that teaches languages we don’t even know’. This claim, which was made during a pitch for funding in October 2014, tells us a lot about the Bliu Bliu approach. It assumes that there exists a system by which all languages can be learnt / taught, and the particular features of any given language are not of any great importance. It’s questionable, to say the least, and Santori fails to inspire confidence when he says, in the same pitch, ‘you join Bliu Bliu, you use it, we make something magical, and after a few weeks you can understand the language’.

The basic idea behind Bliu Bliu is that a language is learnt by using it (e.g. by reading or listening to texts), but that the texts need to be selected so that you know the great majority of words within them. The technological challenge, therefore, is to find (online) texts that contain the vocabulary that is appropriate for you. After that, Santori explains, ‘you progress, you input more words and you will get more text that you can understand. Hours and hours of conversations you can fully understand and listen. Not just stupid exercise from stupid grammar book. Real conversation. And in all of them you know 100% of the words. […] So basically you will have the same opportunity that a kid has when learning his native language. Listen hours and hours of native language being naturally spoken at you…at a level he/she can understand plus some challenge, everyday some more challenge, until he can pick up words very very fast’ (sic).
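The selection principle described here rests on lexical coverage: the proportion of running words in a text that the reader already knows. The Python sketch below is an illustration only, assuming a naive word-level tokenizer, a known-word list supplied by the learner, and an arbitrary 98% cut-off; the function names are hypothetical and this is not Bliu Bliu’s code.

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive word-level tokenizer: lower-cases and keeps runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def coverage(text: str, known: set[str]) -> float:
    """Proportion of running words (tokens) in `text` that appear in `known`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in known for t in tokens) / len(tokens)


def select_texts(texts: list[str], known: set[str], threshold: float = 0.98) -> list[str]:
    # Keep only texts whose coverage meets the (arbitrary, illustrative) threshold.
    return [t for t in texts if coverage(t, known) >= threshold]


known_words = {"the", "world", "forgetting", "by", "forgot"}
candidates = ["The world forgetting, by the world forgot", "The world is a stage"]
print(select_texts(candidates, known_words))
```

Note that coverage here counts strings of letters, not lexical items, so a phrasal verb or idiom made of two ‘known’ words counts as fully known.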


On entering the site, you are invited to take a test. In this, you are shown a series of words and asked to say if you find them ‘easy’ or ‘difficult’. There were 12 words in total, and each time I clicked ‘easy’. The system then tells you how many words it thinks you know, and offers you one or more words to click on. Here are the words I was presented with and, to the right, the number of words that Bliu Bliu thinks I know, after clicking ‘easy’ on the preceding word.

hello: 4,145
teenager: 5,960
soap, grape: 7,863
receipt, washing, skateboard: 9,638
motorway, tram, luggage, footballer, weekday: 11,061


Finally, I was asked about my knowledge of other languages. I said that my French was advanced and that my Spanish and German were intermediate. On the basis of this answer, I was now told that Bliu Bliu thinks that I know 11,073 words.

Eight of the words in the test are starred in the Macmillan dictionaries, meaning they are within the most frequent 7,500 words in English. Of the other four, skateboard, footballer and tram are very international words. The last, weekday, is a readily understandable compound made up of two extremely high frequency words. How could Bliu Bliu know, with such uncanny precision, that I know 11,073 words from a test like this? I decided to try the test for French. Again, I clicked ‘easy’ for each of the twelve words that were offered. This time, I was offered a very different set of words, with low frequency items like polynôme, toponymie, diaspora, vectoriel (all of which are cognate with English words), along with the rather surprising vichy (which should have had a capital letter, as it is a proper noun). Despite finding all these words easy, I was mortified to be told that I only knew 6,546 words in French.

I needn’t have bothered with the test, anyway. Irrespective of level, you are offered vocabulary sets of high frequency words. Examples of sets I was offered included [the, be, of, and, to], [way, state, say, world, two], [may, man, hear, said, call] and [life, down, any, show, t]. Bliu Bliu then gives you a series of short texts that include the target words. You can click on any word you don’t know and you are given either a definition or a translation (I opted for French translations). There is no task beyond simply reading these texts. Putting aside for the moment the question of why I was being offered these particular words when my level is advanced, how does the software perform?

The vast majority of the texts are short quotes from brainyquote.com, and here is the first problem. Quotes tend to be pithy and often play with words: their comprehensibility is not always a function of the frequency of the words they contain. For the word ‘say’, for example, the texts included the Shakespearean quote ‘It will have blood, they say; blood will have blood’. For the word ‘world’, I was offered this line from Alexander Pope: ‘The world forgetting, by the world forgot’. Not, perhaps, the best way of learning a couple of very simple, high-frequency words. But this was the least of the problems.

The system operates at word level. It doesn’t recognise phrases or chunks, or even phrasal verbs. So, a word like ‘down’ (in one of the lists above) is presented without consideration of its multiple senses. The first set of sentences I was asked to read for ‘down’ included: ‘I never regretted what I turned down’, ‘You get old, you slow down’, ‘I’m Creole, and I’m down to earth’, ‘I never fall down. I always fight’, ‘I like seeing girls throw down’ and ‘I don’t take criticism lying down’. Not exactly the best way of getting to grips with the word ‘down’ if you don’t know it!
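To make the difference concrete, here is a minimal Python sketch contrasting a word-level check with a phrase-aware one. The short list of multiword items is hypothetical and hand-made (a real lexicon would be much larger and would need lemmatisation, so that ‘turned down’ and ‘turn down’ are recognised as the same item); nothing here reflects Bliu Bliu’s actual implementation.

```python
import re

# Hypothetical, hand-made list of multiword items containing 'down'.
MULTIWORD_ITEMS = ["turned down", "slow down", "down to earth",
                   "fall down", "throw down", "lying down"]


def words(sentence: str) -> list[str]:
    # Naive word-level tokenizer: lower-cases and keeps runs of letters/apostrophes.
    return re.findall(r"[a-z']+", sentence.lower())


def word_level_hits(sentence: str, target: str) -> bool:
    # What a purely word-level system sees: is the string present as a word?
    return target in words(sentence)


def phrase_level_items(sentence: str) -> list[str]:
    # Which multiword items from the hypothetical list occur in the sentence.
    normalised = " ".join(words(sentence))
    return [item for item in MULTIWORD_ITEMS
            if re.search(rf"\b{re.escape(item)}\b", normalised)]


SENTENCES = [
    "I never regretted what I turned down",
    "You get old, you slow down",
    "I'm Creole, and I'm down to earth",
    "I don't take criticism lying down",
]

for s in SENTENCES:
    print(word_level_hits(s, "down"), phrase_level_items(s))
```

Each sentence is a hit for ‘down’ at word level, while the phrase-aware check shows that each one is actually practising a different lexical item.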

You may have noticed the inclusion of the word ‘t’ in one of the lists above. Here are the example sentences for practising this word: (1) Knock the ‘t’ off the ‘can’t’, (2) Sometimes reality T.V. can be stressful, (3) Argentina Debt Swap Won’t Avoid Default, (4) OK, I just don’t understand Nethanyahu, (5) Venezuela: Hell on Earth by Walter T Molano and (6) Work will win when wishy washy wishing won t. I paid €7.99 for one month of this!

The translation function is equally awful. With high-frequency words that have multiple meanings, you get a long list of possible translations, but no indication of which one is appropriate for the context you are looking at. With other words, it is sometimes simply wrong. For example, in the sentence ‘Heaven lent you a soul, Earth will lend a grave’, the translation for ‘grave’ was only for the homonymous adjective. In the sentence ‘There’s a bright spot in every dark cloud’, the translation for ‘spot’ was only for verbs. And the translation for ‘but’ in ‘We love but once, for once only are we perfectly equipped for loving’ was ‘mais’ (not at all what it means here!). The translation tool couldn’t handle the first ‘for’ in this sentence, either.

Bliu Bliu’s claim that ‘Bliu Bliu knows you very well, every single word you know or don’t know’ is manifest nonsense and reveals a serious lack of understanding about what it means to know a word. However, as you spend more time on the system, a picture of your vocabulary knowledge is certainly built up. The texts that are offered begin to move away from the one-liners from brainyquote.com. As reading (or listening to recorded texts) is the only learning task that is offered, the intrinsic interest of the texts is crucial. Here, again, I was disappointed. Texts that I was offered were sourced from IEEE Spectrum (The World’s Largest Professional Association for the Advancement of Technology), infowars.com (the home of the #1 Internet News Show in the World), Latin America News and Analysis, the Google official blog (Meet 15 Finalists and Science in Action Winner for the 2013 Google Science Fair), MLB Trade Rumors (a clearinghouse for relevant, legitimate baseball rumors), and a long text entitled Robert Waldmann: Policy-Relevant Macro Is All in Samuelson and Solow (1960) from a blog called Brad DeLong’s Grasping Reality……with the Neural Network of a Moderately-Intelligent Cephalopod.

There is more curated content (selected from a menu which includes sections entitled ‘18+’ and ‘Controversial Jokes’). In these texts, words that the system thinks you won’t know (most of the proper nouns, for example) are highlighted. And there is a small library of novels where, again, predicted unknown words are highlighted in pink. These include works by Dostoyevsky, Kafka, Oscar Wilde, Gogol, Conan Doyle, Joseph Conrad, H.P. Lovecraft, Joyce and Poe, as well as Goncharov’s Oblomov. You can also upload your own texts if you wish.

But, by this stage, I’d had enough and I clicked on the button to cancel my subscription. I shouldn’t have been surprised when the system crashed and a message popped up saying the system had encountered an error.

Like so many ‘language learning’ start-ups, Bliu Bliu seems to know a little, but not a lot, about language learning. The Bliu Bliu blog has a video of Stephen Krashen talking about comprehensible input (it is misleadingly captioned ‘Stephen Krashen on Bliu Bliu’) in which he says that we all learn languages the same way, and that is when we get comprehensible input in a low anxiety environment. Influential though it has been, Krashen’s hypothesis remains a hypothesis, and it is generally accepted now that comprehensible input may be necessary, but it is not sufficient for language learning to take place.

The hypothesis hinges, anyway, on a definition of what is meant by ‘comprehensible’, and no one has come close to defining precisely what this means. Bliu Bliu has falsely assumed that comprehensibility can be determined by self-reporting of word knowledge, and this assumption is made even more problematic by the confusion of words (as sequences of letters) with lexical items. Bliu Bliu takes no account of lexical grammar or collocation (fundamental to any real word knowledge).

The name ‘Bliu Bliu’ was inspired by an episode from ‘Friends’ where Joey tries and fails to speak French. In the episode, according to the ‘Friends’ wiki, ‘Phoebe helps Joey prepare for an audition by teaching him how to speak French. Joey does not progress well and just speaks gibberish, thinking he’s doing a great job. Phoebe explains to the director in French that Joey is her mentally disabled younger brother so he’ll take pity on Joey.’ Bliu Bliu was an unfortunately apt choice of name.
