(Ab)using RSS for poetic reasons

[01-2019 / English / 847 words]

Computers don’t like to be creative. Forcing a computer to be creative and then looking through its output is like reading tea leaves: it is we who assign meaning to the lumpy shapes left as residue in the cup. Generating meaningful poetry seems impossible. At times it is frustrating, and at times hilarious. However, the word groupings that arise out of computer-mediated randomness still have the potential to intrigue us and provide inspiration.

In this experiment, I use Markov chains to create poetry from RSS feeds. Markov chains are an easy way to generate plausible-sounding text with simple statistics. A model is trained on a large corpus of text, for example your favorite books. For each word in the corpus, the model records every word that follows it and how likely each of those successors is. New text is then generated by starting from a word and repeatedly drawing a random successor according to those likelihoods. I used the module markovify for this process. You can read more about Markov chains in my post on how to build a poetry bot.
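To make the idea concrete, here is a minimal word-level Markov chain in plain Python. It is a sketch for illustration only, not the markovify code used in the project; the function names `train` and `generate` and the toy corpus are made up for this example.

```python
import random
from collections import Counter, defaultdict


def train(text):
    """Record, for each word, how often every other word follows it."""
    model = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def generate(model, start, max_words=12, rng=random):
    """Walk the chain, drawing each next word in proportion to its count."""
    word, output = start, [start]
    for _ in range(max_words - 1):
        successors = model.get(word)
        if not successors:  # dead end: this word never had a successor
            break
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
        output.append(word)
    return " ".join(output)


# Toy corpus (an assumption for the example, not one of the real sources).
corpus = "the cat sat on the mat and the cat ate the fish"
model = train(corpus)
print(generate(model, "the"))
```

A library like markovify adds the practical parts on top of this: sentence splitting, longer states than a single word, and checks that the output does not simply copy the source.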

RSS is a great standard that allows different online sources to provide their information in a structured format. Many online newspapers provide machine-readable information via RSS feeds. This makes them an attractive text source for me, and I ended up incorporating several of them into my program (Guardian, New York Times, BBC, LA Times). By pulling text from newspapers via the module feedparser, and cleaning it up with the module beautifulsoup, I was able to generate fairly interesting poetry. However, the poems still had a dry feeling to them. After all, they were nothing more than scrambled newspaper clippings. To get more artistic control, I blended the occasionally bland newspaper text with different works of fiction. I trained several models on wildly divergent source texts, and ended up with poems that were markedly more interesting to read and that exhibited recognizably different moods.
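The fetch-and-clean step looks roughly like the sketch below. To keep it runnable offline and without extra installs, it swaps in Python's standard library (`xml.etree.ElementTree` and `html.parser`) for feedparser and beautifulsoup, and it parses a made-up sample feed instead of downloading a real one. The names `strip_html`, `feed_to_text`, and `SAMPLE_FEED` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Drop all tags and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def feed_to_text(rss_xml):
    """Return one cleaned 'title. description' string per RSS item."""
    root = ET.fromstring(rss_xml)
    lines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        summary = item.findtext("description", default="")
        lines.append(strip_html(f"{title}. {summary}"))
    return lines


# Made-up feed for illustration; real feeds carry many more fields.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example News</title>
<item><title>Fire downtown</title>
<description>&lt;p&gt;A &lt;b&gt;blaze&lt;/b&gt; destroyed a nightclub.&lt;/p&gt;</description></item>
</channel></rss>"""

print(feed_to_text(SAMPLE_FEED))
```

Real-world feeds are messier than this sample (different RSS and Atom flavors, broken markup), which is a good reason to use dedicated modules for the job.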

The model that I christened “dystopic”, and that I trained on the books “1984”, “Brave New World”, and “Neuromancer”, produces ominous text mentioning burning buildings, devastating blazes, and nightclubs. The “intellectual” model, trained on “Ulysses” and “Naked Lunch”, muses about a majorgeneral Tweedy, candy bars, Dublin, and “endlessnessnessness”. The model that I call “abrahamic”, trained on the Tanakh, the Bible, and the Quran, seems to be utterly obsessed with laws, Genesis, and (no surprise) god. And, finally, the “erotic” model that I trained on “Memoirs of Fanny Hill”, “120 Days of Sodom”, and “Tropic of Cancer”, tends to talk about the “fucking business”, various body parts, and innocence (or the lack thereof).

I struggled a bit with balancing the blend of newspaper text and fiction text to achieve poems that are both relevant in their description of real-world events and that also exhibit the atmosphere of their “parent” works of literature. Since the RSS texts are generally much shorter than the fiction corpora (each fiction model is trained on several books), the news would otherwise be drowned out in the combined model. I therefore created a dynamic weight factor that multiplies the word frequencies from the shorter text, balancing the weight given to each type of text. If you are interested in playing around with different weights yourself, you can do that in this file.
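The weighting idea can be sketched in plain Python as follows. This is an illustration under assumptions, not the formula from the project: `dynamic_weight` simply scales the short text by the ratio of the two corpus sizes, and all function names and sample texts are hypothetical. (markovify itself offers a `combine` function that accepts a list of models and a list of weights.)

```python
from collections import Counter, defaultdict


def train(text):
    """Count, for each word, how often every other word follows it."""
    model = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def combine(models, weights):
    """Merge successor counts, multiplying each model's counts by its weight."""
    combined = defaultdict(Counter)
    for model, weight in zip(models, weights):
        for word, successors in model.items():
            for following, count in successors.items():
                combined[word][following] += count * weight
    return combined


def dynamic_weight(long_text, short_text, boost=1.0):
    """Assumed rule: scale the short corpus up to the size of the long one."""
    return boost * len(long_text.split()) / max(1, len(short_text.split()))


# Toy stand-ins for a fiction corpus and an RSS snippet.
fiction = "the party watches the city and the city burns"
news = "the city votes"

weight = dynamic_weight(fiction, news)
blend = combine([train(fiction), train(news)], [1, weight])
print(weight, dict(blend["city"]))
```

Raising `boost` above 1 pushes the poems toward the news; lowering it lets the fiction dominate.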

Poets will probably want to kill me for saying this: the one thing that separates contemporary “free-form” poetry from regular literary text is the line break. Almost any text, when cut into several lines, sounds somehow like poetry. If you don’t believe me, you should try it yourself with a sentence from your favorite book or from today’s newspaper. Line breaks shouldn’t be completely random, however. I found that a good way to get decent-sounding results is to break lines at commas and periods. This respects the syntactic flow of a sentence and seems logical to a reader. Often, however, even with this simple pre-processing I got very long lines of text that refused to be chopped apart. I solved this with a recursive function that wraps lines if they are longer than 35 characters.
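Here is one way the two steps could look in Python: first split after commas and periods, then recursively wrap whatever is still too long. This is a reconstruction from the description above, not the function from the repository; the names `wrap` and `to_poem` are made up, and breaking at the last space that fits is an assumption.

```python
import re

MAX_WIDTH = 35  # line length limit mentioned in the text


def wrap(line, width=MAX_WIDTH):
    """Recursively split a line at the last space that fits within width."""
    if len(line) <= width:
        return [line]
    cut = line.rfind(" ", 0, width + 1)
    if cut <= 0:
        # No space early enough: keep the overlong word whole and
        # break at the next space instead, if there is one.
        cut = line.find(" ", width)
        if cut == -1:
            return [line]
    return [line[:cut]] + wrap(line[cut + 1:].lstrip(), width)


def to_poem(sentence):
    """Break after commas and periods, then wrap lines that are still too long."""
    chunks = re.split(r"(?<=[,.])\s+", sentence.strip())
    lines = []
    for chunk in chunks:
        if chunk:
            lines.extend(wrap(chunk))
    return lines


sentence = "It was a bright cold day in April, and the clocks were striking thirteen."
print("\n".join(to_poem(sentence)))
```

Keeping the punctuation at the end of the line (rather than dropping it) preserves the rhythm of the original sentence.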

I like the image-processing module pillow because it allows for a large degree of freedom. To prettify the poem output a bit, I started with the idea of a non-uniform background pattern inspired by the physics concept of Brownian motion. Random gray-scale ellipses give a grainy feel without distracting too much from the foreground. Mixed into the blurred background is a dash of color that is the exact opposite of the color used for the text. This ensures maximum contrast and also gives some visual tension to the image. I chose the monospaced font Consolas for the text because its characters all have the same width, and it radiates a comforting command-line esthetic.
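The two ingredients of the background can be sketched without any drawing code. The snippet below interprets the “opposite” color as the RGB inverse, which is an assumption (a hue rotation would be another reading), and generates random ellipse boxes with gray fills. The function names, sizes, and the sample text color are all hypothetical.

```python
import random


def complement(rgb):
    """Invert each channel to get the RGB opposite of a color."""
    return tuple(255 - channel for channel in rgb)


def random_gray_ellipses(width, height, count, max_radius=20, rng=random):
    """Return (bounding_box, gray_fill) pairs for randomly placed ellipses."""
    shapes = []
    for _ in range(count):
        cx, cy = rng.randrange(width), rng.randrange(height)
        rx, ry = rng.randint(1, max_radius), rng.randint(1, max_radius)
        gray = rng.randint(0, 255)
        box = (cx - rx, cy - ry, cx + rx, cy + ry)
        shapes.append((box, (gray, gray, gray)))
    return shapes


text_color = (255, 200, 0)  # example value, not the color used in the project
accent_color = complement(text_color)
grain = random_gray_ellipses(800, 600, count=5)
print(accent_color, grain[0])
```

Each box and fill pair has the shape that Pillow's `ImageDraw.ellipse` expects for its `xy` and `fill` arguments, so the list can be drawn in a simple loop and blurred afterwards.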

Thanks for reading! You can find rss_poetry on my GitHub. If you have questions, comments, or unrelated thoughts, I’m always happy to talk. You can reach me via fd (at) fabiandietrich (dot) com