“Words co-occur in sentences.” … Google & Och … Melville’s poetry
30th of April, 2006 ANTE·MERIDIEM 11:23
Via Jóska, a paper [PDF] on trying to model words in a language as a network-oriented system (network-oriented systems are his bread and butter). From its first page:
“Words interact in many ways. Some words co-occur with certain words at a higher probability than with others and co-occurrence is not trivial, i.e. it is not a straightforward implication of the known frequency distribution of words. If a text is scrambled, the frequency distribution is maintained but its content will not make sense.”
They then go on to describe various repercussions of modelling it like this, and admit that what they are modelling has realistically very little to do with language as it is used by human beings, because the model is so limited. While the very admission of this is a breath of fresh air if you’re used to everyone’s friend from MIT, it depresses me that people are essentially playing with toys, ignoring the world as it exists, and this is being sold as academic research.
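For what it’s worth, the scrambling point in that quote is easy to see with a toy. Below is a minimal sketch in Python; the three-sentence corpus and the function names (`word_frequencies`, `cooccurrences`, `scramble`) are my own inventions for illustration, not anything taken from the paper, and treating “co-occurrence” as “both words appear in the same sentence” is an assumption on my part.

```python
import random
from collections import Counter
from itertools import combinations


def word_frequencies(sentences):
    """Count how often each word occurs across all sentences."""
    return Counter(word for sentence in sentences for word in sentence)


def cooccurrences(sentences):
    """Count unordered pairs of distinct words that share a sentence."""
    pairs = Counter()
    for sentence in sentences:
        # sorted(set(...)) gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(sentence)), 2):
            pairs[(a, b)] += 1
    return pairs


def scramble(sentences, seed=0):
    """Shuffle all words across the text, keeping sentence lengths fixed."""
    words = [word for sentence in sentences for word in sentence]
    random.Random(seed).shuffle(words)
    scrambled, start = [], 0
    for sentence in sentences:
        scrambled.append(words[start:start + len(sentence)])
        start += len(sentence)
    return scrambled


# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog met".split(),
]
shuffled = scramble(corpus)

# Shuffling never changes the word counts, so this prints True.
print(word_frequencies(corpus) == word_frequencies(shuffled))
# The pair counts, on the other hand, will generally not survive the shuffle.
print(cooccurrences(corpus) == cooccurrences(shuffled))
```

The shuffled text has exactly the same frequency table as the original, but which words keep company with which is (in general) different, which is all the quoted passage is claiming.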
On the bright side, I learn from Language Log that Franz Och and Google Labs have made their statistical machine translation engine available for Arabic to English, and it’s really good. (Machine translation “really good” that is, not hand-translation “really good.” So lots of the translated text is in the form of sentences, but the word choice isn’t great.) I wonder, when the superior alien civilisation invades us, will stylistically awkward English (or German, or whatever the local vernacular is) become a status symbol, because that’s what their translation machines produce?
Funny thing: from t’Wikipedia, Herman Melville published an epic poem in 1876, with a print run of 350 copies, with this result:
‘The critic Lewis Mumford found a copy of the poem in the New York Public Library in 1925 “with its pages uncut.” Essentially, it had sat there unread for 50 years.’
Phrase of the day: дашт кашидан is Tajik for “to stop doing something;” the separable verb „aufhören“ is German for the same, and ‘terminar’ is the Spanish.
Now, Ferrer i Cancho and Solé are not about to do that, I accept—they are network systems people, not neurologists. But maybe if there were good numbers of people taking a real interest in language and how one could relate it to the structure of the brain, there wouldn’t be the space to publish that sort of all-I’ve-got-is-a-hammer conclusion, and the network systems people would find another area to amuse themselves with.
I suspect this is worth making clear here as well as by email. Sheila wrote:
> … The guy you linked to writes like a crank!
I replied: Yes, he does. I do not assert that he is anything other than a crank; being able to say that would involve knowing a lot more about neurology than I do, for example. (Though, of the areas I know enough about to judge in what he writes, what he says is more antagonistically put than is constructive, but not actually wrong, IMO.) But when he’s not manifesting crankdom I like the direction he’s going in more than most directions.
Anyway, final thought (ha, as if), one would also need to check against random, pseudo-random, &c. noise.
And I’m sure these are well-studied problems, so it is presumptuous of me to even try to start a discussion on them.
(I know, for example, from my ex running Monte Carlo simulations of high-energy collisions, that there can be structure in pseudo-random numbers ... from crypto, from an article I once read, &c. la la la &c.)