Thursday, November 15th, 2007 08:41 am
  • Do not try to wget -m the whole of the Electronic Text Corpus of Sumerian Literature. There's rather a lot more there than you'd expect. Why they couldn't just provide a zipfile and/or tarball of the XML they store in the backend database is anyone's guess. I can't be the only geek who's read Snow Crash and wants to contribute.
  • Other than that, I'm really rather impressed with the ETCSL. Check out the mouseover text. Of course, all that stuff needs stripping away for my purposes :-( The background articles on Sumerian language, literature and cuneiform (literally, "wedge-shaped") writing look pretty useful too. Annoyingly, their funding ran out in late 2006, so the site hasn't been updated for a while, and they seem to have made it unnecessarily hard for anyone to take over.
  • The correlation coefficient of a constant signal with anything else, even itself, is strictly undefined (its variance is zero, so you end up dividing by zero), but the usual convention is to treat it as zero. So if I take elvum's suggestion to use autocorrelation, then the "short short short short" problem becomes a non-issue. However, I then end up with another signal, whereas what I really need is a single number, with higher values representing higher levels of poeticity. Possibly I can limit the number of lags I need to check by counting syllables-per-line; or maybe I could just take the maximum value of the autocorrelation? It's been nearly ten years since I did any statistics, so this is all a bit painful. I've tried asking friends in the stats department, and been met with the slightly worried look of an expert challenged on something that's just outside their narrow specialism. I know it well, because it's a look I often use myself.
  • I'm not the first person to apply statistical ideas to analyse the corpus. There's even a book out: Analysing literary Sumerian: corpus-based approaches (or you can buy it from Amazon!) Nothing especially relevant-looking in the chapter headings, but I wonder if I could persuade the library to buy a copy... they don't have it in stock right now, but they do have the intriguing-looking Sumerian or Cryptology? Further investigation reveals that it used to be thought that Sumerian wasn't an actual language, but rather a priestly cryptosystem used for enciphering Semitic texts. More details here.
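For what it's worth, here's a rough sketch of the "take the maximum of the autocorrelation" idea in Python. Everything in it is my own assumption rather than anything established: the function names `autocorrelation` and `poeticity` are made up, syllables are encoded as 1 for long and 0 for short, a constant signal is defined to score zero, and lag 0 is skipped because every non-constant signal correlates perfectly with itself there. Note that dividing by the total variance makes the values shrink at larger lags, which may or may not be what's wanted.

```python
from typing import List, Optional, Sequence


def autocorrelation(signal: Sequence[float], max_lag: int) -> List[float]:
    """Normalised autocorrelation of `signal` at lags 1..max_lag.

    A constant signal has zero variance, so its correlation is undefined;
    by convention (an assumption here) we return zeros in that case.
    """
    n = len(signal)
    if n == 0 or max(signal) == min(signal):
        return [0.0] * max_lag

    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    variance = sum(c * c for c in centred)

    result = []
    for lag in range(1, max_lag + 1):
        # Sum of products of the signal with a copy of itself shifted by `lag`.
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag))
        result.append(cov / variance)
    return result


def poeticity(signal: Sequence[float], max_lag: Optional[int] = None) -> float:
    """Collapse the autocorrelation into one number: its maximum over lags >= 1."""
    n = len(signal)
    if n < 2:
        return 0.0
    if max_lag is None:
        max_lag = n // 2  # hypothetical default; syllables-per-line could bound this
    max_lag = max(1, min(max_lag, n - 1))
    return max(autocorrelation(signal, max_lag))


if __name__ == "__main__":
    print(poeticity([1, 1, 1, 1, 1, 1, 1, 1]))  # "short short short short": constant
    print(poeticity([1, 0, 1, 0, 1, 0, 1, 0]))  # strictly alternating
    print(poeticity([1, 1, 0, 1, 0, 0, 0, 1]))  # no obvious pattern
```

The alternating pattern should come out well ahead of both the constant one and the irregular one, which is at least the right ordering. Whether the maximum is a sensible summary for real lines of verse is still an open question.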
