Baked-in captioning

Why norms must be continually critiqued

A frame from Rick and Morty featuring Jerry driving his car enjoying 'human music.' The simple three-note sequence is represented on the back of Jerry's car seat, which is transformed in my custom captions into a piano keyboard.

When access is baked into a scene, it becomes integral: part of the visual landscape and a contributor to its aesthetic. Baked-in captions don't simply provide linguistic access to meaning; they transform it. They blur boundaries and raise ethical questions, and yet they address hard problems that are difficult to solve with traditional captioning methods. For example, to more fully describe the simple three-note sequence called "human music" in an episode of Rick and Morty, I transformed Jerry's bench seat into a piano keyboard.

Call them what you like: enhanced, extreme, integral, animated, or kinetic captions. These experimental and novel forms of digital accessibility break down barriers in ways that are likely to make some people uncomfortable. When captions are truly baked into a multimodal composition, they may no longer be recognizable as captions. Traditional U.S. captions add a hidden layer of text that can be displayed discreetly at the bottom of the screen. By contrast, fully integrated captions may be indistinguishable from the programs themselves. Because they are baked in, they are also transformative, which means that integral captions can significantly alter the meaning of movies and TV shows, a problem I return to at the end of this webtext. What's more, they can't be turned off (or "closed"), so everyone is forced to view them (a problem that can be solved today, in an era of cheap and abundant data storage, by providing multiple versions of the same movie or program). We tend to use the term "burned in" to refer to hard-coded subtitles or on-screen text that can't be turned off, but even burned-in subtitles remain recognizable as subtitles. What I'm proposing is something quite different and quite disruptive to our commonsense notions of digital accessibility and to the boundaries we have erected between programs and captions, producers and captioners. Baked-in captions embody the unrealized promise of universal design: a multimodal text that is designed from the start to be accessible without requiring additional steps in post-production.

Experiments with music captioning

Captioning music is challenging. Doing it well demands creativity and sensitivity to its functions in specific contexts. The usual approach is to use words to caption one or more of the following: artist's name, song title, music lyrics, short description. But what may be significant about music in a particular scene may have little to do with the specific semantic or linguistic content of its lyrics. It may be more helpful to describe the mood or feeling of a piece of music, particularly in the case of instrumental or ambient music, in addition to or instead of its lyrics. Unfortunately, descriptions can easily devolve into stock phrases that are short on helpful information. Terse descriptions can also strip music of melodic subtleties that are difficult to express in a few words. When a melody matters—when someone is humming a tune, or a joke hinges on having access to a sequence of musical notes—traditional captions offer limited options.

Let me suggest two supplements to linguistic description: the animated musical staff and the color-coded piano keyboard. Each option requires some basic music literacy from readers, which places additional demands on top of those of print literacy. Each option also runs the risk of calling too much attention to itself, seeming garish or even childish. Nonetheless, I suggest that the risks are outweighed by the benefits that come from re-embodying captions and re-imagining boundaries.

First, consider the musical staff as a caption supplement for music. In a pivotal scene from episode three of the first season of The 100 ("Earth Kills," 2014), Clarke (Eliza Taylor) kills Atom (Rhys Ward) in an act of mercy after Atom is mortally injured in a toxic post-apocalyptic fog. As Clarke plunges a knife into Atom's neck, she hums a line from a children's lullaby. In this moment, with rival Bellamy (Bob Morley) looking on, she shows her leadership potential. She assumes the role of nurturing mother figure to the other juvenile delinquents. She is also willing to make the most difficult decisions about life and death on an unforgiving planet. But the original captions obscure the nurturing connotations associated with the lullaby. The original captions reference the music with only a single nonspeech caption: [HUMMING]. They do not identify the song as a lullaby, name the lullaby, or indicate that Clarke hums the same line four times. (Warning: This clip depicts graphic violence.)


Source: The 100, "Earth Kills," 2014. Hulu. Original captions.

The source of the lullaby is "All the Pretty Horses," also called "Hushabye," a traditional American lullaby. The song tells the story of a mother or caretaker comforting a child, akin to what Clarke is doing for Atom. "Hush-a-bye, don't you cry,/ Go to sleepy little baby." According to Lyn Ellen Lacy (1986), "'Hushabye' is known to have a bitter background: it was a southern slave song sung by a black nurse or 'Mammy' to white children while her own children were left alone and unprotected" (pp. 75–76). One verse describes "'bees and butterflies...peckin' out the eyes' of her own 'poor little lambie'" (p. 76). Because Clarke merely hums two lines repeatedly, I don't want to make too much of the lyrics. Captions are not annotations. It's enough simply to point out that caption readers need access to the name of the lullaby in order to draw the clearest analogy: Clarke is to Atom as mother is to child. In my opinion, the "bitter background" of the song offers viewers a bonus: an additional intertextual link connecting the context of slavery to the one hundred juvenile delinquents who were held in captivity on the space station and given little choice about being sent down to a hostile, irradiated, post-apocalyptic planet to determine whether it is safe for the adults. None of these intertextual associations can be made unless the captions go beyond [HUMMING] to name the lullaby.

Clarke's humming extends across the scene as she repeats the same 14-note verse more than three times. Humming is hard to see, even when the camera is focused tightly on Clarke's face. Lips don't have to move when a person is humming. Unlike speech, humming may offer no visual cues to link each caption to its speaker, making humming a compelling case and challenge for caption studies. To embody Clarke's humming with meaning, I experimented with a musical staff across the top of the screen. Each note that Clarke hums is highlighted in red and supplemented with captions at the bottom of the screen that identify the title of the lullaby and indicate when Clarke hums the same verse again.


Source: The 100, "Earth Kills," 2014. Hulu. Animated musical staff was created by the author in Adobe After Effects.

Second, consider the color-coded piano keyboard as a caption supplement for music. In an episode of Rick and Morty ("M. Night Shaym-Aliens!," 2014), Jerry, the bumbling father of young Morty and disapproving son-in-law of Rick, is caught in a rudimentary computer simulation of reality but doesn't know it. He continues to go to work, make a winning sales pitch, and have sex with Beth, without realizing that everything is a pale, robotic reflection. But Jerry was trapped by mistake. Rick is the aliens' true "target." When the aliens discover their error, they severely reduce the processing power allotted to Jerry's simulation. As the boss alien (voiced by David Cross) puts it: "Well, cap [Jerry's] sector at 5% processing, keep his settings on auto, and we'll deal with him later." Even with a barely functioning simulation of his reality, Jerry still doesn't notice that anything is wrong. His car's "earth radio" plays "human music," a simple repetition of two musical notes in a three-note sequence of computer beeps: C-D-C with a pause following each sequence. And Jerry likes it.


Source: Rick and Morty, "M. Night Shaym-Aliens!," 2014. Hulu. Original captions.

The original captions describe the repetitive three-note sequence as [Repetitive rhythmic beeping]. But, I wondered, could this caption be supplemented with a fuller description? A visual interpretation of that simple sequence could reinforce the comedic effect of putting Jerry in the most absurd and robotic version of his reality without him noticing. One way to embody those notes with visual meaning is to recognize that the long bench seat of Jerry's car could serve as a makeshift piano keyboard. I experimented with converting the seat into musical keys, which was simply a matter of adding a few black keys to frame the two piano keys in the sequence. Then I highlighted each note in a different color (red for the first note and blue for the second). It seemed unimportant to specify the notes themselves (C and D), because the point was the simple sequence. So I placed a musical eighth note on each highlighted key as it played on the back of the car seat.


Source: Rick and Morty, "M. Night Shaym-Aliens!," 2014. Hulu. Animated piano keys were created by the author in Adobe After Effects.

In another version of this experiment, I used hand signs from the Kodály or solfège method instead of musical notes. This method was intended to supplement music education for children by providing visual aids for each note in a musical scale. The hand signs were made famous in Close Encounters of the Third Kind (1977) as an embodied representation of the aliens' five-note motif. In the Rick and Morty example, C and D correspond to Do and Re in a C-major scale and have corresponding hand signs: a closed fist for Do (strong tone) and a raised hand, fingers together, for Re (rising/hopeful tone). Granted, it remains to be seen whether the hand signs (or, for that matter, the color-coded piano keys) enhance traditional captions or only increase confusion. At the least, the colored keys, insofar as they intrude on and transform the scene, raise questions about the limits and boundaries of captioning. I welcome these discussions as a way to dislodge and disrupt our assumptions about the nature of access itself.


Source: Rick and Morty, "M. Night Shaym-Aliens!," 2014. Hulu. Animated piano keys were created by the author in Adobe After Effects.

Experiments with custom text animations

When form and content blend seamlessly together, when language and meaning are embodied in visual design, captions become more expressive and efficient at delivering information. In caption studies, we have yet to consider seriously the notion that formal features—the visual elements of caption design—can reinforce meaning, style, aesthetics, and nuance in film, or that other visual notational systems besides language (e.g., icons, graphics, color) can supplement traditional captions. Let me conclude with four examples of enhanced captions that express the meaning and function of speech through form. These examples use custom animations (built on animation presets in After Effects) to reinforce how speech is expressed or experienced by characters.

Animations should be used sparingly, even as they open the domain of captioning to greater creativity and new possibilities. Animated text must never confuse or distract viewers. It must never feel gratuitous, cheesy, or tacky. It must always be highly legible. It must serve well-defined functions and reduce ambiguity. It must be clear to viewers what the animated text is doing, which is why studies of animated emotions in captioning can be problematic. In user studies, viewers haven't always been sure what emotions the bouncy or shaky text was intended to signify (Rashid, Vy, Hunt, & Fels, 2008, p. 516). For these reasons, I have steered clear of animating emotions, even though most studies of kinetic captioning have focused on communicating emotions (e.g., Malik, Aitken, & Waalen, 2009). Instead, I've identified captions that express not human emotion per se but the manner or quality of speaking, such as haunted, drugged, and echoing speech. When the quality of speech is significant or pronounced, animations can be carefully designed to clarify or enhance its identity.

Four experiments follow. First, consider the whispers of ghostly, invisible children who say a prayer in The Haunting (1999), a horror movie starring Lili Taylor. The invisible children are whispering a bedtime prayer in unison ("Now I lay me down to sleep"). Although the original captions identified the [Children Whispering], I wondered whether the haunted whispering itself could be visualized. The scene is supposed to be unsettling, but the captions don't reinforce the mood. In my experiment, the ghostly whispers materialize out of a smoky ether.


Source: The Haunting, 1999. Netflix. Original captions.


Source: The Haunting, 1999. Netflix. Custom captions were created by the author in Adobe After Effects.

Second, consider the otherworldly voices that materialize out of newspapers in Constantine (2005). In one scene, Father Hennessy (Pruitt Taylor Vince), who has an ability to communicate with the dead, moves his hands over newspaper articles announcing recent deaths. The voices of journalists seemingly rise up from the newsprint as he scans the ether for information about a particular death. Because the original captions resemble basic speech captions, they feel detached from the otherworldly power that is producing them. My experimental captions mimic the force of Hennessy's ability to hear with his hands, akin to reading Braille. (Warning: This clip describes graphic violence.)


Source: Constantine, 2005. Netflix. Original captions.


Source: Constantine, 2005. Netflix. Custom captions were created by the author in Adobe After Effects.

Third, consider the altered speech that Ethan (Matt Dillon) hears after being drugged and pursued by Nurse Pam (Melissa Leo) in Wayward Pines (2015). Ethan experiences Pam's words through the echoey, foggy haze of a sedative taking effect. But the original captions don't convey what is happening to Ethan. Like nearly every other speech caption we encounter, the original captions are sober and formal. To support the intensity of the scene, as the seemingly evil Nurse Pam closes in on a wobbly Ethan, I experimented with a number of text effects.


Source: Wayward Pines, "Where Paradise Is Home," 2015. Hulu. Original captions.


Source: Wayward Pines, "Where Paradise Is Home," 2015. Hulu. Custom captions were created by the author in Adobe After Effects.

Finally, consider the repetitive sounds of an echo chamber in Portlandia (2012). In one scene, Fred Armisen plays a guy who has built a music recording studio in his home. The studio includes a separate echo chamber. He demonstrates a number of recording effects as his friend (Bobby Moynihan) watches silently. The echo chamber scene is quirky and weird (like the show itself), but the captions don't quite live up to the intended silliness. "[ECHOES] CANYON! CANYON! CANYON!" provides some distilled information about what is happening but lacks the nuance and annoying intensity of the echoing sounds. I experimented with some radical transformations of the original clip in an attempt to visualize the force of the echo sounds. I created two new videos of approximately one second each, one for each utterance ("Canyon" and [Slurping]). Then I looped each video inside new compositions, timing each loop to the speed of the original echo. I pasted each looping video into the original composition and cropped it tight around the face of each speaker using a mask. For each new echo sound, I dropped a new cropped loop into the original composition, which resulted in numerous looping faces in the frame. Feather effects were used, albeit with limited success, to blend the loops into the video.


Source: Portlandia, "Winter in Portlandia," 2012. Netflix. Original captions.


Source: Portlandia, "Winter in Portlandia," 2012. Netflix. Custom captions were created by the author in Adobe After Effects.


Source: Portlandia, "Winter in Portlandia," 2012. Netflix. Custom captions were created by the author in Adobe After Effects.

This experiment, like some of the others, is disruptive and problematic. It fundamentally alters the scene as it aims to replicate the annoying repetition of the echo chamber. While the original captions did not capture the intensity of the layered sound effect, the experimental captions are intrusive and controlling, perhaps not unlike the sounds bouncing off the walls of the echo chamber. This experiment raises ethical questions about the nature of captioning itself and the captioner's role. When the line between art and captioning blurs, that role may be productively questioned. Such experiments, should they continue, need to be done in the style of Night Watch (2004): in close collaboration with the producers. In composition and technical communication classrooms, where students are both producers and captioners, instructors can present captioning as an integrated, creative process that inevitably raises provocative questions about meaning, authority, access, and the different affordances of sound and writing.
