vote. seriously.

If the election turns out in favor of John McCain, you will probably hear sounds like the following coming from 1600 Thames tomorrow night. I mean, seriously. I’ve had moments of respect for McCain but everything good he’s done is overruled by the cynical, pandering lack of judgement he showed in choosing Sarah “racist hate rally” Palin as his running mate. I choose hope, thanks.

Good luck, Mr. Obama.

For those who don’t remember the original Bud commercial, go here.

Posted by beautiful on November 3rd, 2008 .
Filed under: 2008, [2008.11] November | No Comments »

nostalgia

You were there, and so was L&P. Warning: contains the word “bum.” But it’s delivered in the fabulously flat accent of Jemaine Clement (I think).

Terminology notes for Americans: puku=tummy. L&P=fizzy yellow soft drink. The actual name is “Lemon and Paeroa” (Paeroa is a town).

Posted by beautiful on September 6th, 2008 .
Filed under: 2008, [2008.09] September | No Comments »

monday notes

Day one. The RBS turns out to be in the basement of the library, and, having missed the mixer on Sunday, I’m going to register. There’s a little confusion: the library is locked until 8, so there are a lot of vaguely lost-looking people with name tags slouching outside. They are mostly of the geeky librarian personality type (like me): thus, cautious conversation.

Down in the basement, there is milling about. The meeting room is rather cramped for so many people (there are about 35 people here for 4 classes), but most importantly there is coffee. Conversation consists of “which class are you taking?” and “what’s your collection?”. I, being me, stand around uncomfortably and occasionally engage with people who make eye contact. Note to self: must figure out how to mingle successfully in social situations.

The class seems to be split into nerds (3) and collections specialists (7). Three of the attendees are from the Walters Museum in downtown Baltimore, and are working on the Archimedes Palimpsest. Their next project is the Saint John’s Bible, which is an illuminated, handmade bible being made right now (more here).

A brief history of TEI antecedents

1988-98 Standard Generalized Markup Language (SGML).
An early attempt to represent information (not code) in an interchangeable format, international standard. Requires interoperability - basic bedrock is ASCII - very limited character set. (Later, there would be entities (&eacute;) added into the spec.) SGML was not the first markup language, but it was the first attempt at a standardized ASCII cross-platform tool.

HTML is a limited tag set derived from SGML specifically designed for screen display. It was primarily (DS would say exclusively, although this is no longer the case with XHTML) for text formatting, rather than organization. Earlier DTDs for HTML were not very well understood or implemented and the browser became the default validator for most people. But we have to give props to HTML for widening out the use of markup to the point where it was being done by “regular people” - without it, the various SGML sets would probably have languished in CS and (maybe) library departments.

Unlike HTML, every other subset of SGML is a list of elements that DESCRIBES what something IS (this is now coming back into XHTML with logical DIVs). DS says: “XML is SGML with the stupid bits taken out.” Like SGML, it separates appearance from structure. In a surprising move, Microsoft has been a lead driver of XML development. DOCX docs are now XML underneath, so (once unzipped) they’re readable as plain text.

Validation

There are two ways to validate your documents: against a DTD (the SGML way, though XML can use DTDs too), or a schema (the newer XML way). The advantage of the latter is that it’s also written in XML, so if you want to read it you can figure it out without having to learn another language. To validate XML, you’ll need a marked-up text, a parser, and a schema. The tool we’ll be working with, oXygen, uses the built-in parser most of us have on our laptops. (When you’re validating your XHTML docs on the web, the process is the same: the browser is calling the parser on the W3C server, and it’s comparing the page with the DTD and passing on the results to you.)

One thing I think is super-cool is that you can actually infer what the schema says by reading the markup usage. There are even such things as automatic schema generators, which generate a schema by looking at example xml documents. But you’d better make sure your doc includes at least one example of every usage!
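Here’s a toy version of that idea, as far as I understand it. This is just my own Python sketch (the function name infer_children and the sample anthology are made up for illustration); it only records which child elements turn up under which parent, which is a long way short of a real schema generator.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample document, purely for illustration.
SAMPLE = """<anthology>
  <poem>
    <heading>The Sick Rose</heading>
    <stanza><line>O Rose thou art sick.</line><line>The invisible worm,</line></stanza>
  </poem>
  <poem>
    <stanza><line>Tyger Tyger, burning bright,</line></stanza>
  </poem>
</anthology>"""


def infer_children(xml_text):
    """Map each element name to the set of child element names seen under it."""
    seen = defaultdict(set)
    for parent in ET.fromstring(xml_text).iter():
        seen[parent.tag]  # touch the key so leaf elements are recorded too
        for child in parent:
            seen[parent.tag].add(child.tag)
    return dict(seen)


model = infer_children(SAMPLE)
for name in sorted(model):
    print(name, "->", sorted(model[name]))
```

Note that the second poem has no heading, so an inference tool could work out that heading is optional only because both cases appear in the sample. Leave one usage out and the inferred rules will be wrong.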

Unlike the XHTML DTDs (and this is where it gets interesting) you can also make up your own tags and rules, or combine them - remixing an existing one such as TEI with your own custom schema. (But if you do, make sure you rename the schema to something new - since if you then pass the document on to someone else, they might try to validate it against a standard schema such as TEI and have problems.) This is a key part of the flexibility of xml as opposed to XHTML. Instead of having to use a very limited set of tags approved by the W3C, you can make up your own. (For example, I can declare a tag called <rant> with its own rules and styles, and then insert it into my document.) This is why it’s called extensible.

xsl

Let’s get even more cool. We’re used to using CSS to style the look and feel of our XHTML documents. For xml, there’s a similar function, using XSL (extensible stylesheet language). xsl does a lot of the same things, but with two key differences: 1, it’s written in xml (so you don’t have to learn another language); and 2, it doesn’t just style. It can also run operations (called transforms) on a document, which is something CSS can’t do. For example, you could ask an xsl sheet to insert extra tags, or translate an xml document to another type - eg XHTML, pdf or wml. You can also insert pieces of XHTML and CSS for outputting, or suppress certain tags if you don’t want a section to show up.

One thing I’m having trouble grasping is this idea of transforms as a way of including new information. In php/mySQL, you’d use a php script to do those changes, slurping the data from a database, and then style it. But in xsl/xml, there’s no database necessarily; you’re using xsl to “run the query” on the static xml document. Hmm.

What followed was a discussion in which I tried to grasp this concept. One of the things I didn’t quite get was that in php you can get data dynamically through a front end, i.e. use a form, send a query, and that determines what comes back. But with xsl, in order to run the query you need a separate xsl document for each different bit of data - because it’s not a “query,” it’s a transform using a static stylesheet. One thing another person in the class suggested was that you could use a scripting language to auto-generate an xsl on the fly with each query. But we agreed this was a bit much. I still don’t quite get it - I wonder if xsl is flexible enough to run “queries” inside itself (like a programming language). We didn’t explore xml databases this week, and I also wonder if that’s where the missing part of the puzzle comes in.

Overall, though, the model is pretty neat: You have an xml doc. You can use multiple xsl sheets to transform that doc into different formats, including XHTML. So on your server you might have a folder with one xsl for pdf, one for XHTML, one for wml etc. And you can also have an external CSS sheet, of course, for styling the XHTML.
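To make the “one doc, many outputs” model concrete for myself, here’s a rough sketch. Big caveat: this is Python standing in for XSL, not XSL itself (Python’s standard library doesn’t include an XSLT processor), and the two functions to_xhtml and to_plain are names I made up. Each function plays the role of one stylesheet.

```python
import xml.etree.ElementTree as ET

# One source document (made-up sample).
SOURCE = (
    "<poem><heading>The Sick Rose</heading>"
    "<stanza><line>O Rose thou art sick.</line>"
    "<line>The invisible worm,</line></stanza></poem>"
)


def to_xhtml(xml_text):
    """Stand-in for an XHTML 'sheet': rewrite poem markup as XHTML-style markup."""
    poem = ET.fromstring(xml_text)
    div = ET.Element("div", {"class": "poem"})
    heading = poem.find("heading")
    if heading is not None:
        ET.SubElement(div, "h2").text = heading.text
    for stanza in poem.findall("stanza"):
        para = ET.SubElement(div, "p")
        for line in stanza.findall("line"):
            span = ET.SubElement(para, "span", {"class": "line"})
            span.text = line.text
    return ET.tostring(div, encoding="unicode")


def to_plain(xml_text):
    """Stand-in for a plain-text 'sheet': drop the tags, keep heading and lines."""
    poem = ET.fromstring(xml_text)
    return "\n".join(el.text for el in poem.iter() if el.tag in ("heading", "line"))


xhtml = to_xhtml(SOURCE)
plain = to_plain(SOURCE)
print(xhtml)
print(plain)
```

The source never changes; only the thing you run over it does. The class="line" hook is there so an external CSS sheet could style the XHTML version.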

Tomorrow: the TEI schema, for fun and profit (mostly fun).

Notes to self

Can you key & do 2nd-level XML in the same way you’d use a key to a second table in SQL?
Can you do conceptual markup? Is this where Topic Maps might come in? Could you do two (conceptual and structural) and link them?
(Ambitious: if I was working on a Flash piece, like Blind Side of a Secret, is there a way to get xml out of there (I know Flash can do this), and then do something like the following:
1) mark sections
2) mark triggers
3) use these to build a TEI description of the project
4) cross reference with the Blind Side video?)
Further reading to pursue: Cohen/Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web. Jerome McGann, Radiant Textuality.
Check out SMIL: xml markup for streaming media.

Posted by beautiful on June 27th, 2008 .
Filed under: 2008, [2008.06] June | No Comments »

Sunday notes

[I should say before I start: Jette, I promised to blog about this week! And it’s already Wednesday. What a slacker. So I’m transcribing my copious notes and redating the entries to match them with the days.]

I’m at the Rare Book School at UVa this week, attending a week-long class on Electronic Texts and Images, which is being run by the highly accomplished and very kind David Seaman. Mostly I’m here to learn about TEI (the Text Encoding Initiative) and what it can do for me. I’m particularly interested in trying out some kind of TEI implementation for Rhizomes (Hyperrhiz might be more tricky so I’ll stick with Rhiz for the time being).

These notes are from my Sunday reading.

All about XML

xml is descriptive, not procedural
it describes the parts of a document
xsl stylesheets process the document (not just style, but actively process)

doctypes — a formal description of parts & structure
checked by a parser against a specification

platform independence. all xml documents use unicode

textual structures:
use meaningful terms of structure that are useful for processing (or even just finding!). e.g. sentence, para, etc. These are structural — the form.

q: but are there also structures of meaning? e.g. pick out extended metaphor or lineage. The q is how much you’d have to mark up a doc (how fine-grained) to be as flexible as a trained human reader. How are you going to code for intellectual context? (does this amount to annotation)

an early question i had: it seems like you could use xml as a simple textual database. what would be the limitations compared with a relational database like mysql (eg no keying?). Q to follow up: how does an xml database work?

<my:line>blah</my:line>
my=the namespace prefix
line=an element

xml is a reserved namespace prefix

Schemas (schemae?)

A schema is the formal criteria for a valid document - it’s a specific version of a DTD. “Every schema results from an interpretation of a text” - ie we necessarily make presuppositions when defining a schema (just as we do when designing a database).
sample:

poem_p = element poem {heading_p?, stanza_p+}

where poem_p is the element pattern, poem is the name of the element (generic id), and everything inside the curly brackets is the content model (ie a list of what it can contain).

Occurrence indicators: + = one or more
? = one at most but not mandatory (could be zero)
* = zero or more (any number, not mandatory)
Connectors: , |
, patterns must be in sequence, ie heading then stanza
| can be one or other but not both

Nesting:

poem_p = element poem {heading_p?, (stanza_p+ | couplet_p+ | line_p+)}

would amount to “a poem is made of two parts: one (optional) heading and some content which will be one or more stanzas, OR one or more couplets, OR one or more lines.”

More tricky:

poem_p = element poem {heading_p?, (stanza_p | couplet_p | line_p)+ }

“a poem is made of two parts: one (optional) heading, and one or more of a mixture of stanzas, couplets and lines.” Doesn’t make sense unless you think of it as a loop: choose one of the three over and over.
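The occurrence indicators and connectors look a lot like regular expressions, so here’s a sketch that treats them that way. To be clear about what this is: I’m swapping in Python regexes over the list of child element names as a stand-in for a real schema validator, and the names SIMPLE, ONE_KIND, MIXED and fits are my own inventions.

```python
import re
import xml.etree.ElementTree as ET

# Each pattern is a hand translation of a content model from the notes above.
# heading_p?, stanza_p+
SIMPLE = re.compile(r"(heading )?(stanza )+")
# heading_p?, (stanza_p+ | couplet_p+ | line_p+)
ONE_KIND = re.compile(r"(heading )?((stanza )+|(couplet )+|(line )+)")
# heading_p?, (stanza_p | couplet_p | line_p)+
MIXED = re.compile(r"(heading )?(stanza |couplet |line )+")


def children(xml_text):
    """Spell out the child element names in order, e.g. 'heading stanza '."""
    return "".join(child.tag + " " for child in ET.fromstring(xml_text))


def fits(model, xml_text):
    """True if the whole child sequence matches the content model."""
    return model.fullmatch(children(xml_text)) is not None


stanza_poem = "<poem><stanza/><stanza/></poem>"
mixed_poem = "<poem><heading/><stanza/><couplet/><line/></poem>"
heading_only = "<poem><heading/></poem>"

print(fits(SIMPLE, stanza_poem))    # True: heading is optional
print(fits(SIMPLE, heading_only))   # False: needs at least one stanza
print(fits(ONE_KIND, mixed_poem))   # False: only one kind of content allowed
print(fits(MIXED, mixed_poem))      # True: the "loop" picks one of three each time
```

The last two lines show the difference between the nested model and the “more tricky” one: same poem, different verdict.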

XPath

A non-graphical tree descriptor: shows a hierarchy that looks like a pathname, and also an optional count. Example:
/anthology/poem[1]/stanza[2]/line[4] = “line 4 of stanza 2 of poem 1 in the anthology.”
/anthology/poem/stanza/line = “all lines contained by all stanzas contained by all poems in the anthology.”
This is called OHCO: ordered hierarchy of content objects. It’s useful, but too hierarchical - doesn’t allow for overlap.
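A quick way to try these two paths, sketched with Python’s built-in ElementTree. One assumption to flag: ElementTree only supports a subset of XPath and searches relative to an element, so the leading /anthology becomes “.” on the root element. The sample anthology is made up, with each line’s text recording its own position.

```python
import xml.etree.ElementTree as ET

# Made-up anthology: each line's text is "poem.stanza.line".
ANTHOLOGY = """<anthology>
  <poem>
    <stanza><line>1.1.1</line></stanza>
    <stanza><line>1.2.1</line><line>1.2.2</line><line>1.2.3</line><line>1.2.4</line></stanza>
  </poem>
  <poem>
    <stanza><line>2.1.1</line></stanza>
  </poem>
</anthology>"""

root = ET.fromstring(ANTHOLOGY)  # root is the <anthology> element

# /anthology/poem[1]/stanza[2]/line[4]
one_line = root.find("./poem[1]/stanza[2]/line[4]")

# /anthology/poem/stanza/line
all_lines = root.findall("./poem/stanza/line")

print(one_line.text)   # 1.2.4
print(len(all_lines))  # 6
```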

Attributes

You can define multiple attributes, e.g.:

att.status = attribute status {"draft" | "revised" | "published"}
<poem status="draft"> ... </poem>

You have to include attributes in your schema, e.g.

poem_p = element poem {att.status?, heading_p?, stanza_p+}

A particularly useful attribute is id, e.g. <poem xml:id="Rose"> is a unique identifier for the Rose poem so you can refer to it later. To call it, you’d say “Blake’s poem <poemref target="#Rose" />”; it acts like an anchor.

You can use attributes with XPath, e.g.
/anthology/poem[@status='draft']/heading amounts to “the heading of every poem with draft status”.
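Here’s the attribute path tried out with Python’s ElementTree, plus a small helper for following a target="#Rose" style reference back to its xml:id. The helper name resolve and the sample poems are my own for illustration; the long XML_ID string is how ElementTree spells the xml:id attribute internally.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:id under the reserved xml namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Made-up sample anthology.
root = ET.fromstring(
    "<anthology>"
    "<poem xml:id='Rose' status='draft'><heading>The Sick Rose</heading></poem>"
    "<poem xml:id='Tyger' status='published'><heading>The Tyger</heading></poem>"
    "<poem xml:id='Lamb' status='draft'><heading>The Lamb</heading></poem>"
    "</anthology>"
)

# /anthology/poem[@status='draft']/heading, relative to the root element
draft_headings = [h.text for h in root.findall("./poem[@status='draft']/heading")]


def resolve(target):
    """Follow a '#Name' style reference to the element carrying that xml:id."""
    wanted = target.lstrip("#")
    for elem in root.iter():
        if elem.get(XML_ID) == wanted:
            return elem
    return None


rose = resolve("#Rose")
print(draft_headings)             # ['The Sick Rose', 'The Lamb']
print(rose.find("heading").text)  # The Sick Rose
```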

Character references

You have multiple ways of referring to characters, e.g. &eacute; or &#233;. You can use character references to provide translations or substitutions for readers using different encodings. It’s called an entity declaration, e.g. <!ENTITY eacute "&#233;">. You can also use it as find & replace: <!ENTITY HJB "Helen J Burgess">.
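A small sketch of both uses, again with Python’s ElementTree. The entity declarations go in a DOCTYPE at the top of the document (the note element and its text are made up for the example), and the parser expands them as it reads.

```python
import xml.etree.ElementTree as ET

# The two declarations from the notes, placed in an internal DTD subset.
DOC = """<!DOCTYPE note [
  <!ENTITY eacute "&#233;">
  <!ENTITY HJB "Helen J Burgess">
]>
<note>caf&eacute; notes by &HJB;</note>"""

note = ET.fromstring(DOC)

# Both entity references have been replaced by the time we see the text.
print(note.text)
```

By the time your code sees the text, the entities are gone: the character reference has become a real é and the HJB shorthand has been spelled out.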

Misc

Formatting processing instruction <?tex \newpage ?> tells the tex editor to force a new page.
<?xml-stylesheet href="whatever/document.xsl" type="text/xsl"?> will attach an xsl sheet.

Something neat: “A valid XML document necessarily specifies the schema in which its constituent elements are defined” (p.26). — i.e. you can infer the schema from the xml document. There are actually a few tools that do a fairly good job of this.

Namespaces: allow you to take terms that might have different meanings in different contexts. For example, <table> might be a wooden table, or a data table, depending on context. Using a namespace you can define which one you want it to be.
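The two-tables example, sketched in Python. The namespace URIs and the prefixes f and d are hypothetical ones I made up; any unique URI would do the same job.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: one for furniture, one for data.
DOC = """<room xmlns:f="http://example.org/furniture" xmlns:d="http://example.org/data">
  <f:table>oak, four legs</f:table>
  <d:table>rows and columns</d:table>
</room>"""

# Prefix-to-URI map used when searching; the prefixes here are local to this code.
NS = {"f": "http://example.org/furniture", "d": "http://example.org/data"}

root = ET.fromstring(DOC)
wooden = root.find("f:table", NS)
data = root.find("d:table", NS)

print(wooden.text)  # oak, four legs
print(data.text)    # rows and columns
print([child.tag for child in root])
```

The last line shows what the parser actually keeps: each tag name is stored with its full namespace URI in curly brackets, so the two tables never collide.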

Finally: using marked sections turns off markup processing (a bit like <pre>):

<![CDATA[<line> … </line>]]>

will print <line> … </line>
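To check this for myself, a tiny Python comparison (the wrapping doc element is made up): the same content once inside a marked section and once as ordinary markup.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, <line> is just characters, not an element.
literal = ET.fromstring("<doc><![CDATA[<line> ... </line>]]></doc>")

# Without CDATA, the same characters are parsed as a child element.
parsed = ET.fromstring("<doc><line> ... </line></doc>")

print(repr(literal.text), len(literal))  # text with the tags in it, no children
print(repr(parsed[0].text), len(parsed))  # one <line> child element
```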

Posted by beautiful on June 15th, 2008 .
Filed under: 2008, [2008.06] June | 1 Comment »

I like to hear the rain come down

Caution: contains common (but hilariously delivered) expletive at the end.

Posted by beautiful on May 12th, 2008 .
Filed under: 2008, [2008.05] May | No Comments »

leave me alone box

Posted by beautiful on April 25th, 2008 .
Filed under: 2008, [2008.04] April | 1 Comment »

biofutures

Biofutures is now available for pre-order at Amazon. For more details about the project, you can go see our website, biofuturesdvd.com. Whee!

Posted by beautiful on February 29th, 2008 .
Filed under: 2008, [2008.02] February | No Comments »

The Author

Greetings from Helen J Burgess, nerdy English prof. I teach in the communications/technology track in the Department of English at UMBC. I also edit Hyperrhiz: New Media Cultures, a peer-reviewed journal for net art and electronic literature. I'm married to Tim Menzies, amazing rocket scientist. This blog is about nothing in particular.
