Computing for Poets
COMP 131


Readings   |   Intro  |   Determining your grade  |   Instructor's Home |   Dept Home

Whom does this stately Navy bring?
O! 'tis Great Britain's Glorious King

Katherine Philips (1631-1664),
Arion on a Dolphin

A little Learning is a dang'rous Thing;
Drink deep, or taste not the Pierian Spring:
There shallow Draughts intoxicate the Brain,
And drinking largely sobers us again.

Alexander Pope, An Essay on Criticism, 1711.

Instructor: Mark LeBlanc
Office: Science Center B103
Office Hours:
    MWF 9:30-10:30
    MW 3:30-4:00
    and by appointment (set a time via email)

Mark's Web Page -- Email
Phone: 286-3970
Class Meeting Times:
    Mon 2:00-3:20pm, A102
    Wed 2:00-3:20pm, csLab

The use of computers to store and retrieve written texts creates new opportunities for scholars of ancient and other written works. Recent advances in computer software, hypertext, and database methods have made it possible to ask novel questions about a poem, a story, a trilogy, or an anthology. This course teaches computer programming as a vehicle for exploring poems and other texts that are now available online. Programming encourages top-down thinking and builds real-world problem-solving skills such as problem decomposition and algorithmic thinking. Programming on texts also introduces students to rich new areas of scholarship, including stylometry and authorship attribution. Prerequisites: a love of the written (and digital) word; no computer programming experience required.

Using computers to analyze poems and stories is an exciting new area of research. In this course, you will learn to write programs in a language called Perl, which is especially well suited to working with strings of characters.

Some of the programs that you will write will analyze texts to:

  1. compute the percentage of vowels used in a text;
  2. find the longest and average sentence length of your favorite poem or book;
  3. search for patterns of letters or words using the powerful pattern matching language of regular expressions;
  4. search papers you have written in the past for questionable writing style;
  5. determine the top-10 most frequently used words by an author;
  6. build a concordance of all the words in a poem or a collection of poems;
  7. learn the beginning steps in authorship attribution as you keep statistics for:
    • the relative frequencies of the most commonly used words in a poem or story
    • the average and standard deviation of the frequencies of the most commonly used words for a particular author across multiple poems or stories
    • z-scores to compare a sample work from one author with those from other authors
    • hapax legomena (words that appear only once)
    • dis legomena (words that appear only twice)
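Several of the statistics in item 7 come down to counting words. The Perl sketch below is one minimal way to do it, not the course's reference solution. It assumes a "word" is a lower-cased run of letters and apostrophes (real attribution studies need a more careful definition), and it uses the opening lines of Donne's "The Bait" as sample text. The subroutine name word_counts is hypothetical.

```perl
use strict;
use warnings;

# Count how often each word occurs. A "word" here is a lower-cased run of
# letters and apostrophes (an assumption, not a standard definition).
sub word_counts {
    my ($text) = @_;
    my %count;
    for my $word (split /[^a-z']+/, lc $text) {
        next if $word eq '';    # skip an empty field from leading punctuation
        $count{$word}++;
    }
    return %count;
}

# Sample text: the opening lines of John Donne's "The Bait".
my $text = "Come live with me, and be my love,\n"
         . "And we will some new pleasures prove\n"
         . "Of golden sands, and crystal brooks,\n"
         . "With silken lines and silver hooks.\n";

my %count = word_counts($text);
my $total = 0;
$total += $_ for values %count;

# Rank words by descending count; break ties alphabetically.
my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

# Top (up to) five words with their relative frequencies (count / total).
my $last = @ranked < 5 ? $#ranked : 4;
for my $word (@ranked[0 .. $last]) {
    printf "%-10s %2d  %.3f\n", $word, $count{$word}, $count{$word} / $total;
}

# Hapax legomena occur exactly once; dis legomena exactly twice.
my @hapax = grep { $count{$_} == 1 } @ranked;
my @dis   = grep { $count{$_} == 2 } @ranked;
printf "%d words, %d hapax legomena, %d dis legomena\n",
    $total, scalar @hapax, scalar @dis;
```

On this sample, "and" tops the list with 4 of the 27 words, and "with" is the only dis legomenon.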
For example, we might attempt to determine whether a 17th-century poem was written by John Donne or by Aemilia Lanyer.

How every thing retaind a sad dismay:
Nay long before, when once an inkeling came,
Me thought each thing did unto sorrow frame:
The trees that were so glorious in our view,
Forsooke both flowres and fruit, when once they knew
Of your depart, their very leaves did wither,
Changing their colours as they grewe together.

Aemilia Lanyer, Salve Deus Rex Judaeorum

Come live with me, and be my love,
And we will some new pleasures prove
Of golden sands, and crystal brooks,
With silken lines and silver hooks.

John Donne, The Bait

We will learn to combine Perl programs, the comma-separated output of those programs, and Excel spreadsheets, much as your instructor does when doing research. One goal is to take the mystery out of the problem solving and the tools one needs to do research in this area.

In addition to programming in Perl and working in Excel, we will study how computers store individual characters, including the traditional (English-only) ASCII character code and the international standard called Unicode. The end of the semester will be devoted to the Text Encoding Initiative (TEI). The TEI is an international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars store all kinds of literary and linguistic texts on computers for online research and teaching. TEI is a "must-know" for all scholars of the 21st century.
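To make the ASCII/Unicode distinction concrete, this Perl sketch prints the numeric code of each character in a word and compares the character count with the number of bytes the word occupies when stored. It assumes the source file is saved as UTF-8 and uses UTF-8 (one common encoding of Unicode) for output; the word "café" is an arbitrary example.

```perl
use strict;
use warnings;
use utf8;                              # this source file is saved as UTF-8
use Encode qw(encode);
binmode STDOUT, ':encoding(UTF-8)';    # print characters, not raw bytes

my $word = "café";

# ord() returns a character's Unicode code point. ASCII covers only
# code points 0-127, so "é" (U+00E9) falls outside it.
for my $char (split //, $word) {
    my $code = ord $char;
    printf "%s  U+%04X  %s\n", $char, $code,
        $code < 128 ? "ASCII" : "not ASCII";
}

# Stored as UTF-8, the accented letter takes two bytes instead of one.
my $bytes = encode('UTF-8', $word);
printf "%d characters, %d bytes in UTF-8\n", length $word, length $bytes;
```

The last line reports 4 characters but 5 bytes: the three ASCII letters take one byte each, while "é" takes two.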

NOTE: This course is only an introduction to using computing to study written texts. Computers allow us to study texts in exciting new ways that we could not otherwise; however, as we'll discuss at length, we are wise to keep in mind what computers cannot do. The following quotes can help us (1) stay humble and (2) stay focused.

"As students of a powerful new form of scholarship, we have much to offer.
We do ourselves no justice when we forget that the quantifiable features we deal
in are but the shadow of a shadow."

John Burrows, Computers and the Humanities, v37, 2003, p30.

"The onus of competency, clarity, and completeness is on the practitioner.
The researcher must document and make clear every step of the way.
No smoke and mirrors, no hocus-pocus, no 'trust me on this.' "

Joseph Rudman, Computers and the Humanities, v31, 1998, p353.

In computer science, if you are almost correct you are a liability.
Fred Kollett (1941-1997), MathCS, Wheaton College, Norton, MA

Hammond, Michael (2003). Programming for Linguists: Perl for Language Researchers. Blackwell Publishing. Required textbook. An introduction to learning to program using the language Perl. Lots of small examples with an emphasis on the essentials of "good programming."

Cook, Gareth (2003). Much ado about data. Boston Globe, Health/Science section, August 5, 2003, pp. D1-D4.

Gould, John (2004). Before the computer bug, there was the type louse. Christian Science Monitor (weekly column), January 9, 2004, p. 23.

Hockey, Susan (2000). Electronic Texts in the Humanities. Oxford University Press, New York, NY. Ch. 7, "Stylometry and Attribution Studies."

Klarreich, Erica (2003). Bookish Math: Statistical tests are unraveling knotty literary mysteries. Science News, v164, no. 25/26, Dec. 2003, pp. 392-394.

Levy, Steven (2003). Welcome to History 2.0. Newsweek, Nov. 10, 2003, p. 58.

Relihan, Joel (2002). Translating Boethius. Wheaton Quarterly, Fall 2002, pp. 21-25.

Rudman, Joseph (1998). The State of Authorship Attribution Studies: Some Problems and Solutions. Computers and the Humanities, v31, pp. 351-365.


Finding books and poems online

Literature Online  (Wheaton students only)
A fully searchable library of more than 350,000 works of English and American poetry, drama and prose, plus biographies, bibliographies and key criticism and reference resources.

This site combines three sites first created in 1996 to provide a starting point for students and enthusiasts of English Literature.

Humanities Text Initiative
The Humanities Text Initiative (HTI) is an umbrella organization for the creation, delivery, and maintenance of electronic texts, as well as a mechanism for furthering the library community's capabilities in the area of online text.

University of Virginia Library
The Center combines an on-line archive of tens of thousands of SGML and XML-encoded electronic texts and images with a library service that offers hardware and software suitable for the creation and analysis of text.

Project Gutenberg
Project Gutenberg is the brainchild of Michael Hart, who in 1971 decided that it would be a really good idea if lots of famous and important texts were freely available to everyone in the world.

Women Writers Project
The Brown University Women Writers Project is a long-term research project devoted to early modern women's writing and electronic text encoding.

Women Writers Online

Renaissance Women Online
When complete, the RWO collection will include 100 Renaissance texts from the main WWP textbase, together with contextual introductions and topical essays on women's life and writing in the Renaissance.

Storing and Encoding Text Online

The TEI is an international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars represent all kinds of literary and linguistic texts for online research and teaching.

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

Lots of good reading about the larger Perl community.

Perl documentation
When you need to find out how to do something in Perl.

The longest palindrome

Your Grade:
Things to do                          Percent       Due / Frequency
Labs                                  10% overall   most Wednesdays
Homeworks                             10% overall   TBA
5 Programs                            50% overall
    p0: My Favorite Poem              4%            Mon., Feb. 9
    p1: Vowel Counter                 8%            Wed., Feb. 18
    p2: How's My Writing?             8%            Wed., Mar. 10
    p3: Author Attribution (Part I)   10%           Wed., Mar. 31
    p4: Author Attribution (Part II)  20%           Wed., Apr. 28
Exams                                 30% overall
    Exam I                            15%           Wed., Mar. 3
    Exam II                           15%           Mon., Apr. 19
    No Final Exam

Late Submissions:
Due is due. Always turn in whatever you have on time. Something turned in on time is much better than work that is not accepted because it is late. Late is not an option. (Good, glad we can all agree on this.)

Honor Code Revisited: It goes without saying that all submitted work will be the student's own, in keeping with the Wheaton Honor Code, unless the assignment specifies group work. For labs, you may get "help" from fellow classmates, but remember that all completed work must be your own. Use discretion; don't ask a colleague for "the" answer or for a piece of software. I do, however, encourage you to discuss the problem in general terms, such as the kinds of statements one might use. For homework, your answers and software must be your own from beginning to end. Here is an analogy: almost no one would ever "use/steal" a line or two from another person's poem. Treat your Perl programs the same way. Don't "borrow/use" lines or sections of Perl from a classmate. Your program is (like) your poem; everyone's program should be unique.

Be wise. If a colleague is asking you for too much help, be honest and remind them your program is just that, your program.

On your own ....
(0) It is expected that you spend at least 4-5 hours on reading, study, and preparation for every 90 minutes of lecture and discussion.
(1) It is expected that you spend at least 6-10 hours per week on your current programming assignment. WARNING: Programmers typically underestimate the time it takes to complete a software project; 6-10 hours per week on your programming assignment may be one of those "underestimations."

(0) The labs are a critical part of the course. In a way, they are your time to "hack," solve unique problems, and show that you can work hard on the problem at hand. Your labs will prepare you to work on your next programming assignment. You must be in lab to get credit for the session. If you happen to miss a lab, you are strongly encouraged to do it on your own time, but please do not ask for credit.
(1) In order to best grasp the material presented in the lab, I strongly suggest that you completely redo any labs that you find difficult. (Read that last sentence again, unless of course you've already reread it once.)

Your two exams will test your comprehension of Perl as well as your ability to write your own Perl. During lecture, I will give "hints" of what a typical test question might be; take good notes! There will be no makeups, nor will the lowest exam be dropped. If you are an athlete and/or you have a conflict with an exam date, please see me within the first week of classes.

A few homeworks are sprinkled throughout the semester. They mainly provide deadlines, practice with time management, an occasion to produce the next draft of your program, and a chance to take a lab one step further.

I have listed my office hours on the syllabus, but I'm always near a keyboard, so we can also schedule another time to meet. Study, study, study, and talk about the material with me and others as often as you can.

Please don't wait too long before you see me;
a quick chat in my office can often clear things up.
I'm here a lot...

Detailed Schedule
Readings and homeworks are assigned in lecture.

1W (1W means "1st week, Wednesday")
WED, Jan 28
  • Introduction, review of syllabus (wave to your instructor :)
  • A glimpse at some of the programs that you will write
  • Your goals and expectations

2M (2M means "2nd week, Monday")
MON, Feb 2
  • Intro to programming and the software development cycle
  • High-level and low-level languages
  • print "hello poets";

WED, Feb 4
  • Welcome to the csLab
  • Moving files around the network: SmartFTP
  • Lab: print "hello poets";
  • Assign Program #0:    p0 - "My Favorite Poem"

MON, Feb 9
  • Perl statements
  • variables, numbers, "strings"
  • built-in functions, e.g., the length() function
  • p0 is due today in class

WED, Feb 11
  • Guest Lecture: Dr. Joel Relihan, Professor of Classics
  • "Turning the computer into a reader of poetry."
  • Finding poems and stories on the web
  • Preparing files to be read by Perl
  • Reading text files in Perl
  • Counting and other math in Perl
  • assign p1 - "Vowel Counter"
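A minimal sketch of reading text line by line and counting, in the spirit of p1 but not its specification. It assumes the vowels are a, e, i, o, u (not y) and uses Pope's couplet from the top of this page as sample text; the subroutine vowel_percent and the file name poem.txt are hypothetical.

```perl
use strict;
use warnings;

# Percentage of letters that are vowels, reading from a filehandle
# one line at a time. "y" is not counted as a vowel (an assumption).
sub vowel_percent {
    my ($fh) = @_;
    my ($letters, $vowels) = (0, 0);
    while (my $line = <$fh>) {
        $letters += ($line =~ tr/a-zA-Z//);        # tr/// returns a count
        $vowels  += ($line =~ tr/aeiouAEIOU//);
    }
    return $letters ? 100 * $vowels / $letters : 0;
}

# To read a real file, open it by name instead, e.g.
#   open(my $fh, '<', 'poem.txt') or die "Cannot open poem.txt: $!";
# ('poem.txt' is a hypothetical file.) Here an in-memory "file" is
# opened so the sketch runs on its own.
my $sample = "A little Learning is a dang'rous Thing;\n"
           . "Drink deep, or taste not the Pierian Spring:\n";
open(my $fh, '<', \$sample) or die "Cannot open sample text: $!";
printf "%.1f%% vowels\n", vowel_percent($fh);
close $fh;
```

On Pope's two lines, 25 of the 66 letters are vowels, so the sketch prints 37.9% vowels.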

MON, Feb 16
  • Intro to that cool operating system, Mac OS X
  • Intro to regex (Regular Expressions)
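A small taste of regular expressions in Perl, using Pope's couplet as sample text (an arbitrary choice): one pattern collects every word ending in "ing", and another counts capitalized words. This is an illustration, not a lab handout.

```perl
use strict;
use warnings;

my $text = "A little Learning is a dang'rous Thing;\n"
         . "Drink deep, or taste not the Pierian Spring:\n";

# \b marks a word boundary and \w* matches a run of word characters.
# In list context, /g returns every captured match; /i ignores case.
my @ing = $text =~ /\b(\w*ing)\b/gi;
print "Words ending in -ing: @ing\n";

# Count capitalized words: the "() =" idiom forces list context,
# and assigning that list to a scalar yields the number of matches.
my $capitals = () = $text =~ /\b[A-Z]\w*/g;
print "Capitalized words: $capitals\n";
```

Here the first pattern finds Learning, Thing, and Spring, and the second counts 6 capitalized words.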

WED, Feb 18
  • Lab: "Travels in DNA Land" (practice with regular expressions)
  • p1 - "Vowel Counter" is due

MON, Feb 23
  • making decisions with conditional control: if-elsif-elsif-else
  • index(), substr()
  • writing loops with repetition control: while
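This sketch puts the day's three topics together: a while loop that uses index() to find every occurrence of a word, substr() to show context around each hit, and an if-elsif-else chain to report the result. The sentence and the flagged word "very" are hypothetical examples of the kind of check a writing-style program might make.

```perl
use strict;
use warnings;

# Hypothetical sentence and word to flag (e.g., an overused intensifier).
my $text   = "This is a very, very good idea, and it is very clear.";
my $target = "very";

# index() returns the position of $target at or after the given offset,
# or -1 when there are no more matches. It matches substrings, so it
# would also find "very" inside "every".
my @positions;
my $pos = index($text, $target);
while ($pos != -1) {
    push @positions, $pos;
    # substr() pulls out up to 25 characters of context around the hit.
    my $start = $pos >= 10 ? $pos - 10 : 0;
    print "...", substr($text, $start, 25), "...\n";
    $pos = index($text, $target, $pos + 1);
}

# Conditional control: choose one of three messages.
my $n = scalar @positions;
if ($n == 0) {
    print "No uses of '$target'.\n";
} elsif ($n == 1) {
    print "One use of '$target'.\n";
} else {
    print "$n uses of '$target'; consider revising.\n";
}
```

Restarting the search at $pos + 1 (rather than at 0) is what moves the loop forward and keeps it from finding the same match forever.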

WED, Feb 25
  • Lab: if-else, index(), word finding
  • assign p2 - "How's My Writing?"

MON, Mar 01
  • Review for Exam I

WED, Mar 03
  • Exam I (in class)

MON, Mar 8
  • more loops in Perl:    foreach
  • introduction to descriptive statistics

WED, Mar 10
  • Lab: introduction to Excel
  • managing experimental data: Perl to Excel
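One simple way to move data from Perl to Excel is to print comma-separated values (CSV), which Excel can open directly. This sketch hard-codes a few word counts from the opening lines of Donne's "The Bait"; counts.pl and counts.csv are hypothetical file names. Values that themselves contain commas or quotes would need quoting, which this sketch does not handle.

```perl
use strict;
use warnings;

# A few word counts from the opening lines of Donne's "The Bait".
my %count = (and => 4, with => 2, love => 1, come => 1);

# Build comma-separated rows: a header, then one row per word, most
# frequent first (ties broken alphabetically).
my @rows = ("word,count");
for my $word (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    push @rows, "$word,$count{$word}";
}

# Print to standard output; redirect into a file to open it in Excel:
#   perl counts.pl > counts.csv
print "$_\n" for @rows;
```

Writing to standard output keeps the program simple and lets the same script feed either the screen or a file.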

MON, Mar 15

WED, Mar 17

MON, Mar 22
  • Guest Lecture: Dr. Kirk Anderson, Professor of French
  • "Words words words: the use and abuse of literary concordances"
  • more of the foreach loop
  • hash tables in Perl
  • building a concordance
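One common way to build a concordance in Perl is a hash whose keys are words and whose values are references to arrays of line numbers. This sketch assumes a word is a lower-cased run of letters and apostrophes and uses the opening lines of Donne's "The Bait" as sample text; it is an illustration, not the lab's required design.

```perl
use strict;
use warnings;

# Sample lines: the opening of John Donne's "The Bait".
my @lines = (
    "Come live with me, and be my love,",
    "And we will some new pleasures prove",
    "Of golden sands, and crystal brooks,",
    "With silken lines and silver hooks.",
);

# %concordance maps each word to a reference to an array of the line
# numbers (starting at 1) on which the word occurs.
my %concordance;
for my $i (0 .. $#lines) {
    for my $word (split /[^a-z']+/, lc $lines[$i]) {
        next if $word eq '';
        push @{ $concordance{$word} }, $i + 1;
    }
}

# Print the concordance in alphabetical order.
for my $word (sort keys %concordance) {
    printf "%-10s %s\n", $word, join(", ", @{ $concordance{$word} });
}
```

A word used twice on the same line is listed twice, once per occurrence; "and", for example, appears on lines 1, 2, 3, and 4.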

WED, Mar 24
  • assign: Read Rudman paper and answer questions for Monday
  • Lab: hash tables and concordances
  • assign: p3 - "Authorship Attribution (Part I)"

MON, Mar 29
  • due: Rudman paper and questions
  • storing characters in the computer: the ASCII (English-only) character code
  • handling international character sets: the Unicode standard

WED, Mar 31
  • Lab: Perl and Unicode
  • due p3 - "Authorship Attribution (Part I)"

MON, Apr 05
  • more statistics on texts - means, medians, standard deviation
  • gathering statistics in Perl

WED, Apr 07
  • more statistics on texts - z-scores
  • Lab: Excel
  • assign homework, due Monday
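A z-score expresses how many standard deviations a value lies from a mean: z = (x - mean) / sd. This Perl sketch uses the sample standard deviation (dividing by n - 1), which is one common choice; the frequencies are invented for illustration only and are not data from any real author. The subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Arithmetic mean of a list of numbers.
sub mean {
    my @x = @_;
    my $sum = 0;
    $sum += $_ for @x;
    return $sum / @x;
}

# Sample standard deviation (divides by n - 1; needs at least 2 values).
sub std_dev {
    my @x = @_;
    my $m  = mean(@x);
    my $ss = 0;
    $ss += ($_ - $m) ** 2 for @x;
    return sqrt($ss / (@x - 1));
}

# z = (x - mean) / sd: distance from an author's average in sd units.
sub z_score {
    my ($x, @sample) = @_;
    return ($x - mean(@sample)) / std_dev(@sample);
}

# Invented frequencies (per 1000 words) of one common word in five
# works by a single author, and in one disputed poem.
my @known    = (52, 48, 55, 50, 45);
my $disputed = 61;
printf "mean %.2f, sd %.2f, z %.2f\n",
    mean(@known), std_dev(@known), z_score($disputed, @known);
```

With these made-up numbers, the mean is 50, the standard deviation is about 3.81, and the disputed poem's z-score is about 2.89, nearly three standard deviations above this author's average.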

MON, Apr 12
  • Designing an Experiment
  • assign p4 - "Authorship Attribution (Part II)"

WED, Apr 14
  • review for Exam II

MON, Apr 19
  • Exam II (in class)

WED, Apr 21
  • Introduction to the Text Encoding Initiative (TEI)

MON, Apr 26
  • learning to mark-up poems in TEI

WED, Apr 28
  • Lab: TEI
  • p4 due

MON, May 03
  • more TEI

WED, May 05
  • Where have we been? What can you do?
  • Course Evaluations
  • No Final Exam for this course.


    Maintained by: Mark LeBlanc
    Dept of Math & Computer Science
    Wheaton College, Norton, Massachusetts