Your eyes are absorbing this webpage. They're passing over this, this, then this word, right now. That's how reading works, online: you take this for granted. But what if you couldn't?
We grant our gaze to electronic screens for most of the day, and in return, they give us anything we want. We stare; they glow. We rarely speak, and neither do they.
And this makes sense! The internet is a boundless collection of text, images and video, channeled to flat pieces of glass and plastic, beamed through lens, retina, and nerve, all the way into our brains. It can show us anything, and for most web users, that's exactly what it does.
But for millions of others—those who are unable to see—the web is a wildly different place. Characters become sounds. Layouts are meaningless. Images are, at best, words, and at worst, blank spaces. And yet the blind browse the same internet as everyone else, every day. They use the same gadgets the sighted do, and happily. But how?
The Sightless Internet
The most common way for the vision-impaired user to access the internet is with a traditional browser and text-to-speech software. You're probably already vaguely familiar with some of it—Windows users will have come across Microsoft Narrator, and I defy you to find a single Mac OS user who hasn't forced VoiceOver to hurl insults at his friends. It's these tools—or tools like these—that millions of people depend on to access the internet.
But to say that blind users just "hear" the internet is a gross oversimplification. It's not just text and images that blind users miss, it's virtually every part of the fundamental browsing experience.
Here, try this: Stop reading for a moment. Lean back and survey this page. Now think about what you do when you visit this site. Your eyes are probably drawn to the stories listed horizontally across the top of the page. They look important, right? Why else would they be up there? Further down you'll see the site's banner, but you probably don't spend much time looking at that, and your eyes dart to the list of stories in the middle of the page. You scroll down, glancing at pictures then headlines, or perhaps headlines then pictures. The margins of the site are either full of ads or static information, so you probably don't pay them much mind. Now try somewhere else, somewhere more visually complicated. Think about how you're reading it.
Your habits aren't just sight-dependent (obviously), they're pretty weird. Your eyes fly around, sometimes randomly and sometimes in response to cues onscreen. You hunt for links and cherrypick from galleries. The word you're looking for catches your eye, so you click it. Consciously or subconsciously, you usually know where to look.
With a screen reader, there is no "looking." It's a simple parser, and it starts at the top. It combs through a website a lot like a web browser combs through HTML, except instead of rendering an IMG tag as an image, or an EM tag as italicized text, it converts them to sounds: a readout of the image description—the alt text—and a changed audio inflection, respectively.
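To make that concrete, here is a minimal sketch of the idea in Python, using only the standard library's html.parser module. It is a toy, not how VoiceOver or JAWS actually work: the class name, the cue labels, and the "(no description)" placeholder are all made up for illustration. It reads markup from the top, turns an IMG tag into its alt text, and marks text inside an EM tag as emphasized.

```python
from html.parser import HTMLParser


class SpokenLinearizer(HTMLParser):
    """Toy linear reader: walks markup top to bottom and emits speech cues."""

    def __init__(self):
        super().__init__()
        self.cues = []           # ordered (kind, text) pairs, in reading order
        self.emphasis_depth = 0  # greater than zero while inside <em>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            # An image becomes its description, or a placeholder if it has none.
            self.cues.append(("image", alt if alt else "(no description)"))
        elif tag == "em":
            self.emphasis_depth += 1

    def handle_endtag(self, tag):
        if tag == "em" and self.emphasis_depth > 0:
            self.emphasis_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, skip empty runs
        if text:
            kind = "emphasized" if self.emphasis_depth else "plain"
            self.cues.append((kind, text))


def linearize(html):
    """Return the page as one flat sequence of cues (hypothetical helper)."""
    parser = SpokenLinearizer()
    parser.feed(html)
    parser.close()
    return parser.cues


page = '<p>Look at <em>this</em> photo: <img src="cat.jpg" alt="A cat on a sofa"></p>'
for kind, text in linearize(page):
    print(kind, "->", text)
```

Everything comes out as a single ordered list, which is the point: whatever the page looked like, the listener gets it one item at a time.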
Then, of course, there's all that text. On a visually rendered webpage, it lives in blocks and columns. If you're lucky, these blocks and columns will be organized in a logical or familiar way. They'll be laid out, basically. But that's such a visual concept. What happens when a layout becomes words?
"Screen reading software presents the webpage as a set of lines and links, and possibly other things—frames and headers, if the software employs that." That's Paul Schroeder, VP of Programs and Policy for the American Foundation for the Blind. Vision-impaired himself, he uses screen reading software for daily browsing. "When you log onto a website using screen reading software, what you start with is a site that tells you how many lines, and some basic structure—but not very much. When you're experiencing a cluttered site, the information you want may be 300-400 lines in, and if you're going line by line, or section by section, it can take you a very long time to find what you want."
Think about that: The internet is anything but linear—website code is nested and cryptic, and often looks jumbled and out of order. (Right click, view source! Oh, yikes, maybe don't.) Websites often have multiple visual directions, or sometimes none at all. Yet audio screen readers—and Braille modules, which display about one line of text at a time—have to render them in sequence, somehow. And listeners have to make sense of it, to develop some kind of intuition for a site's layout and structure based on very, very small amounts of information, all out of order.
Of course, there are tricks. Screen reading software, like VoiceOver in OS X or JAWS for Windows, is more clever than I've made it sound. It parses websites for headers, and sometimes navigational elements. It can give you a literal description of a page's layout ("three columns, two rows"), and its surprisingly unrobotic voices reflect all kinds of punctuation. It even differentiates between outwardly identical tags. My editor actually just sent us an email to this effect: Stop using <EM> and <I> tags interchangeably. The first is for emphasis, and the second is for italics. It's a difference you can't see, but it's a difference some will hear.
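The header trick is easy to sketch. The Python below is a rough illustration under my own assumptions, not the method any real screen reader uses: it pulls the H1 through H6 tags out of a page and returns them as an outline, the kind of list a listener could skip through instead of going line by line. The names HeadingOutline and heading_outline are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Collects (level, text) for every heading, in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._parts = []    # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._parts).split())
            self.outline.append((self._level, text))
            self._level = None


def heading_outline(html):
    """Return the page's headings as a list of (level, text) pairs."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


page = "<h1>News</h1><p>Intro</p><h2>Top <em>stories</em></h2><p>More</p><h2>Sports</h2>"
for level, text in heading_outline(page):
    print("  " * (level - 1) + text)
```

A page with no headers at all gives this function nothing to return, which is roughly what a listener gets, too.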
These are the small features that make spoken webpages usable, but they can't be taken for granted: People who design websites have to be vigilant about including headers to divide large blocks of text, providing alternative text for images, and using their tags properly. Problem is, a whole ton of sites, ours included, often don't. Ever had, or overheard, a tedious argument about whether or not a site is "standards compliant", as in W3C, HTML compliant? Well, this is like that. Actually, this is that. The W3C defines standards for accessibility just as it defines standards for the rest of the web. But like those other standards, they're often disregarded.
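For the curious, two of those habits can be spot-checked in a few lines. This Python sketch is my own assumption about what a bare-minimum check might look like. It is not a W3C tool and covers nowhere near the full accessibility guidelines. It flags IMG tags that have no alt attribute and reports whether the page has any headers. The function name spot_check and the result keys are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class AccessibilitySpotCheck(HTMLParser):
    """Records images without an alt attribute and counts headings."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = []  # src of each image lacking alt
        self.heading_count = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            # An empty alt="" is allowed here: it marks a decorative image.
            # Only a missing attribute is flagged.
            if "alt" not in attributes:
                self.images_missing_alt.append(attributes.get("src", "(unknown source)"))
        elif tag in HEADING_TAGS:
            self.heading_count += 1


def spot_check(html):
    """Return a small report on missing alt text and headings."""
    parser = AccessibilitySpotCheck()
    parser.feed(html)
    parser.close()
    return {
        "images_missing_alt": parser.images_missing_alt,
        "has_headings": parser.heading_count > 0,
    }


page = '<h2>Gallery</h2><img src="a.jpg" alt="Sunset"><img src="b.jpg">'
print(spot_check(page))
```

Passing a check like this does not make a site accessible; failing it is a decent hint that it isn't.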