Quality, not quantity

Are you wise in the ways of word counts? If so, would you happen to know a nifty method of accurately counting the words in LaTeX documents?

I’ve tried converting the .pdf to a PostScript or ASCII file and counting the words in that with a simple wc command in the shell, but that doesn’t work: it just returns 0.

So I just tried a normal wc command on the PDF document

wc myfile.pdf

and a detex command on the .tex file

detex myfile.tex | wc

and I tried copying and pasting from the .pdf into Word and using Word’s word count tool*. All three give me different answers, differing by up to 102 words, which seems just silly.

So I could just pick the word count that is most convenient for my purposes, but that goes against the “quality not quantity” pedantry that grips me in moments of, well, pedantry.

Nevertheless, methinks there must be a more accurate way, even if that way turns out to be writing a counting program myself in Haskell or Java. Is there a non-effort-consuming alternative?
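For the write-it-yourself route, here is a crude sketch in shell rather than Haskell or Java. The function name `texwords` is hypothetical; this is not detex and not any published tool. It assumes the text to count sits between \begin{document} and \end{document}, treats an unescaped % as the start of a comment, drops \begin{...}/\end{...} and control words such as \emph, and counts whatever remains with `wc -w`. Note that plain `wc` prints line, word and byte counts, while `wc -w` prints only the word count.

```shell
# texwords FILE: hypothetical, approximate LaTeX word counter (a sketch only).
# Assumptions: the body lies between \begin{document} and \end{document};
# an unescaped % starts a comment; command arguments count as text.
texwords() {
  # Keep only the document body (inclusive of the begin/end lines).
  sed -n '/\\begin{document}/,/\\end{document}/p' "$1" |
    sed -E \
      -e 's/(^|[^\\])%.*$/\1/' \
      -e 's/\\(begin|end)\{[^}]*\}//g' \
      -e 's/\\[A-Za-z]+\*?//g' \
      -e 's/[{}]/ /g' |
    # Count whitespace-separated tokens; strip the padding BSD wc adds.
    wc -w | tr -d ' '
}
```

Usage: `texwords myfile.tex`. Because arguments of commands like \cite, \ref and \label survive the stripping, and a `\\` immediately followed by a comment is misread, treat the result as an approximation rather than a definitive count.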

–IP

*Relying on Word when one has the option of TeX seems plain embarrassing.

4 Responses to “Quality, not quantity”

  1. Louisa Says:

    hello!

    I hope all is well with you? I have been thinking of you a lot lately and I’ve realised it’s been ages since we’ve met up. I’d really love to have some catch-up time and talk logic and science with you again, I do miss it!

    On a note related to the post: do you know a good free version of LaTeX for the Mac? I’m getting to the point now where I am writing papers (eek) and such, and I really should invest the time to learn TeX.

    x

  2. irrationalpoint Says:

    Hey hon. Yes we should absolutely meet up. Come visit! Or maybe we can arrange something around Christmas?

    Regarding TeX, I use TeXShop. There’s a rather good book called “TeX for the Impatient”, although I can’t now remember whether it’s for LaTeX or plain TeX.

    Take care.
    IP

  3. Einar Says:

    Hi,

    You, and others who come across the same question of counting the words in LaTeX documents, may be interested in trying out the LaTeX word-count script I made.

    http://folk.uio.no/einarro/Comp/texwordcount.html

    You can either download and run the script itself (requires Perl) or use the web-interface (requires only Internet access and a web browser).

    I think it should be quite accurate (unless you write a language with lots of non-latin letters), and it does output the details of the parsing so you can assess the accuracy quite easily.

    Einar

  4. IrrationalPoint Says:

    Thanks, Einar! You saved me the effort. I shall report back the next time I have occasion to write something in LaTeX.

    (unless you write a language with lots of non-latin letters)

    I sometimes use the International Phonetic Alphabet. I guess I’ll see how it handles that.

    –IP
