Posts about software

Designing a Terminal for an Audio-First Workflow


Most shell environments assume a visual interface. Prompts are colorful, information-dense, and optimized for quick scanning. My workflow is different. I'm blind, and I work with a screen reader, which means my terminal is fundamentally an audio interface. Instead of scanning the screen, I'm listening to the environment as I work. That changes what matters.

Most terminal workflows carry a lot of unnecessary noise: long paths, redundant context, and visual-only cues. Designing for audio forced me to remove that noise entirely.

The result is a shell environment that’s faster to navigate, easier to parse, and often better even if you can see the screen. If you like what I describe here, take a look at my dotfiles repo for your own inspiration. If you maintain shell code and are interested in what path substitution built directly into shell configuration might look like, I'd love to talk.

Where this started

While I was working at Google, I spent a lot of time inside the monorepo. (If you haven't encountered that model before, imagine nearly the entire company's code living inside one enormous repository.) The ideas that eventually led to this system started there, but the implementation in this repository was written later from scratch and is entirely my own code.

The paths were extremely long. In some cases it could take several seconds for my screen reader, even reading at 700 WPM, just to tell me where I was during a large refactor. A typical path might look something like this (simplified):

/mounted/path/to/the/monorepo/perforce_client/root/javascript/namespace_part/namespace_subpart/namespace_subpartsubpart/app_root/subsystem/main/tests/audioInformation_test.ts

For a sighted developer, this is mostly a visual annoyance. You glance at the prompt, pick out the important segment, and move on. For me, the terminal had to speak the entire thing, or I had to interrupt my work and read it piece by piece. Every time the prompt appeared, my screen reader would start reading the path, and after a while I realized I often had no idea where I was unless I waited several seconds for the prompt to finish speaking. That was not going to work.

So I started building tools to compensate. First came "teleport" functions. These were small helpers that jumped directly to important parts of the repository:

function jsdf {
  cd <path_to_google_drive_javascript_code>
}

function jdf {
  cd <path_to_java_drive_code>
}

I can't show the real internal layout because I don't want to reveal how Google's internal monorepo is organized, but jsdf simply meant "JavaScript Drive frontend code." These functions saved a lot of typing: a simple jsdf; cd infra would get me to the directory I needed to work on a specific file. But they were crude, and I was still constantly hearing huge paths in the prompt. Eventually I added a small rewriting pipeline that shortened common segments before displaying the path. The first version was extremely simple:

sed -e 's/path_1/short_segment/' \
    -e 's/path_2/another_short/'

It was ugly, but it proved something important: shortening paths dramatically reduced cognitive load. To make that concrete, here is the kind of transformation I was trying to achieve.

Before:

/mnt/c/users/driem/programs/python/ai_image_describer/main/tests/audioInformation_test.ts

After:

py/ai_image_describer/main/tests/audioInformation_test.ts
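Here is a minimal sketch of the sed approach applied to the example above. It assumes two hypothetical rules: one maps the Python projects directory to py, and the other maps the home directory to driem. Using | as the sed delimiter avoids escaping every slash, and anchoring each rule with ^ keeps it from matching in the middle of a path.

```bash
#!/usr/bin/env bash
# Hypothetical rewrite rules in the spirit of the early sed pipeline.
# Each -e expression shortens one well-known prefix; rules run in order.
shorten_path() {
  printf '%s\n' "$1" | sed \
    -e 's|^/mnt/c/users/driem/programs/python|py|' \
    -e 's|^/mnt/c/users/driem|driem|'
}

shorten_path /mnt/c/users/driem/programs/python/ai_image_describer/main/tests/audioInformation_test.ts
# prints: py/ai_image_describer/main/tests/audioInformation_test.ts
```

Rule order matters in this style. The longer python rule has to run first, because once the driem rule fires, the python prefix no longer exists to be matched.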

Saving keystrokes was important, but not as important as saving attention. Hearing a short, predictable path segment is far easier than listening through a long directory hierarchy every time the prompt appears. At Google, the rewriting cut well over two-thirds of the path length, which left me more mental capacity to think about the problem at hand. That idea eventually evolved into the alias and namespace system I use today.

Making terminal sessions resilient

My workflow makes this problem harder to ignore. At Google, and still today, I am often working from home, a café, my desk, or even while camping in the middle of the desert, so most of my development happens inside a persistent tmux session on a remote machine. I jump between windows, reconnect from different devices, and keep long-running work alive in the background. That means I'm constantly re-orienting myself: switching panes, reattaching sessions, and resuming work. Every time I do, the first thing my terminal speaks is the full working directory. If that path is long, I'm back to waiting several seconds just to answer a simple question: "Where am I?"

Path aliases

The first step was simple: shorten frequently used paths. For example, instead of typing cd /mnt/c/users/driem, I can define an alias:

driem /mnt/c/users/driem

and then navigate with:

p driem

The p command is a thin wrapper around cd, backed by the alias map. I also use pd for pushd when I want a quick stack to bounce between locations. Both commands support tab completion over the alias names, so navigation stays fast even as the alias set grows. This alone reduces both typing and how much the prompt has to speak.
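Here is a minimal sketch of p and pd. It assumes the alias map lives in a Bash associative array named PATH_ALIASES; that name is hypothetical, and the real dotfiles may store the map differently. The complete -F builtin attaches a completion function to both commands, and compgen -W filters the alias names against the word being typed.

```bash
#!/usr/bin/env bash
# Hypothetical alias map: alias name -> directory.
declare -A PATH_ALIASES=(
  [driem]=/mnt/c/users/driem
)

# p: cd to an alias target; non-alias arguments fall through to a plain cd.
p() {
  if (( $# == 0 )); then cd; return; fi
  cd "${PATH_ALIASES[$1]:-$1}"
}

# pd: same lookup, but push the target onto the directory stack.
pd() {
  if (( $# == 0 )); then pushd >/dev/null; return; fi
  pushd "${PATH_ALIASES[$1]:-$1}" >/dev/null
}

# Offer alias names as completions for the word under the cursor.
_path_alias_complete() {
  local cur=${COMP_WORDS[COMP_CWORD]}
  mapfile -t COMPREPLY < <(compgen -W "${!PATH_ALIASES[*]}" -- "$cur")
}
complete -F _path_alias_complete p pd
```

Falling back to the literal argument means p also works as an ordinary cd. This keeps one navigation command in muscle memory.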

Each alias also exports an environment variable (for example, driem exports P_driem), which makes it easy to reuse paths in scripts:

tail -f $P_driem/log.txt
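Here is a sketch of how each alias could export a matching P_ variable. It assumes a hypothetical alias file with one "name target" pair per line, and it handles only literal targets (composed targets are left out for brevity). printf -v assigns to a variable whose name is built at runtime, and export then makes that variable visible to child processes such as scripts.

```bash
#!/usr/bin/env bash
# Read "name target" lines and export P_<name>=<target> for each one.
# Assumes literal (absolute) targets; blank lines and # comments are skipped.
load_path_aliases() {
  local name target
  while read -r name target; do
    [[ -z $name || $name == \#* ]] && continue
    printf -v "P_$name" '%s' "$target"
    export "P_$name"
  done < "$1"
}
```

This only works if alias names are valid shell identifiers. Names like driem and easy_ui qualify, but a name containing a hyphen would not.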

This helped, but it didn't scale well once directory structures became deeper and shortcuts naturally nested. That led to the namespace system.

Namespaced paths

Earlier versions of my environment experimented with something I called namespaced paths. The idea was to treat filesystem paths more like hierarchical identifiers than raw strings. After the sed rewriting pipeline, I briefly switched to a space-separated list of simple substitutions, applied in order. This was much better, but I started wanting something that wasn't just a prefix match and didn't strictly do a longest match either. That led me to a new, more elegant solution.

The new system is fundamentally a key-value store based on aliases. Each alias has two roles: a real filesystem expansion used for navigation, and a display representation used when rendering the prompt. That distinction allows the prompt to either preserve hierarchy for orientation or collapse it to reduce noise, depending on how an alias is defined. The prompt uses a display path rather than a literal filesystem path, optimized for readability rather than exact reproduction. In the rare case I need the real path, I'll just run pwd.

Example aliases:

driem /mnt/c/users/driem
gmscripts [driem]/mydrive/software/gm_scripts
py driem/programs/python
easy_ui py/easy_ui

Literal paths

If an alias target starts with /, it is treated as a normal filesystem path:

driem /mnt/c/users/driem

Visible composition with [alias]

If the target starts with [alias], the referenced alias is expanded for the real path while remaining visible in the prompt:

gmscripts [driem]/mydrive/software/gm_scripts

This keeps the parent namespace visible when it carries useful meaning. When working inside gm_scripts, the prompt can display:

/driem/gm_scripts$

Hidden-prefix composition with alias/...

If the target starts with alias/..., the alias is expanded for the real path but its ancestry can be collapsed in the prompt:

py driem/programs/python
easy_ui py/easy_ui

When working inside easy_ui, the prompt can simply display:

easy_ui$

while the filesystem path remains long.

Mental model of the algorithm

Conceptually the system resolves aliases while tracking two outputs: the expanded filesystem path, and the compressed display path.

resolve(value):
  if value starts with /:
    return real_path=value, display=value

  if value starts with [alias]suffix:
    expand alias for real path
    keep alias in display path

  if value starts with alias/suffix:
    expand alias for real path
    allow display to collapse the prefix

When rendering the prompt, the system finds the best matching expansion and substitutes the corresponding display form.
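Here is a minimal Bash sketch of this model. It rests on several assumptions that the description above does not pin down:

- The alias table is a hypothetical associative array named ALIAS_TARGET.
- A literal alias displays as /name.
- A [parent] alias displays as its parent's display form plus the last component of its target.
- A parent/ alias displays as only that last component.
- "Best matching" is simplified to the longest real-path prefix, which the real matcher deliberately does not strictly do.

Under those assumptions, the sketch reproduces both the /driem/gm_scripts and easy_ui examples.

```bash
#!/usr/bin/env bash
# Hypothetical alias table using the three target forms described above.
declare -A ALIAS_TARGET=(
  [driem]=/mnt/c/users/driem
  [gmscripts]='[driem]/mydrive/software/gm_scripts'
  [py]=driem/programs/python
  [easy_ui]=py/easy_ui
)

# Real filesystem path for an alias, expanding parents recursively.
alias_real() {
  local target=${ALIAS_TARGET[$1]} parent
  case $target in
    /*) printf '%s\n' "$target" ;;                     # literal path
    "["*"]"*)                                          # [parent]suffix
      parent=${target:1}; parent=${parent%%]*}
      printf '%s%s\n' "$(alias_real "$parent")" "${target#*]}" ;;
    */*)                                               # parent/suffix
      parent=${target%%/*}
      printf '%s/%s\n' "$(alias_real "$parent")" "${target#*/}" ;;
    *) return 1 ;;
  esac
}

# Display form for an alias (assumed rules, see the lead-in).
alias_display() {
  local target=${ALIAS_TARGET[$1]} parent
  case $target in
    /*) printf '/%s\n' "$1" ;;                         # literal: show the alias name
    "["*"]"*)                                          # keep the parent visible
      parent=${target:1}; parent=${parent%%]*}
      printf '%s/%s\n' "$(alias_display "$parent")" "${target##*/}" ;;
    */*) printf '%s\n' "${target##*/}" ;;              # collapse the ancestry
    *) return 1 ;;
  esac
}

# Replace the longest matching alias prefix of a directory with its display form.
prompt_path() {
  local dir=${1:-$PWD} best= best_real= name real
  for name in "${!ALIAS_TARGET[@]}"; do
    real=$(alias_real "$name") || continue
    if [[ $dir == "$real" || $dir == "$real"/* ]] && (( ${#real} > ${#best_real} )); then
      best=$name; best_real=$real
    fi
  done
  if [[ -n $best ]]; then
    printf '%s%s\n' "$(alias_display "$best")" "${dir#"$best_real"}"
  else
    printf '%s\n' "$dir"
  fi
}
```

To use it in a prompt, a line like PS1='$(prompt_path)\$ ' works. Bash re-expands command substitutions in PS1 each time the prompt is drawn, because the promptvars option is on by default.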

Concrete example

Real filesystem path:

/mnt/c/users/driem/programs/python/easy_ui

Prompt display:

easy_ui

The filesystem stays long and stable, but the spoken prompt stays short.

Hacks to make reading code faster

Another trick that speeds up spoken output is shortening how punctuation is pronounced. Instead of hearing every punctuation symbol spoken in full, I use shorthand pronunciations:

Symbol   Pronunciation   Shorthand
(        left paren      par
)        right paren     ren
[        left bracket    brà
]        right bracket   ket
{        left brace      curl
}        right brace     lea
:        colon           coal
;        semicolon       dah
...      ...             ...

For example, a typical C-style loop would normally be spoken like this:

for left paren i equals zero semi i less ten semi i plus plus right paren left brace

With shorthand it becomes:

for par i eq zero dah i less ten dah i plus plus ren curl

This dramatically reduces how long it takes to listen to code. The learning curve was a little weird at first, but I adapted to it within a week.

Small shell tweaks that reduce friction

Most of the rest of the repository consists of small adjustments to default shell behavior.

Nice to have: Immediate history syncing

Normally Bash writes command history when a shell exits, which means that if you have multiple terminals open, commands from one session may not appear in another until much later. This configuration appends history immediately after each command so history search remains consistent across terminals, and survives reboots.
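The usual way to get this behavior in Bash is the histappend option plus history -a in PROMPT_COMMAND. What follows is a sketch of that common pattern, not necessarily the exact lines in the repository. history -a writes the session's new lines to the history file after every command.

```bash
# Append to the history file instead of overwriting it when a shell exits.
shopt -s histappend
# After every command, flush this session's new history lines to $HISTFILE,
# keeping any PROMPT_COMMAND that was already set.
PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```

New terminals then see every command immediately. A shell that is already open only picks up other sessions' commands if it also runs history -n, which reads lines not yet read from the file, at the cost of interleaving sessions.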

Per-machine overrides

The repository includes a .bash_local file for machine-specific configuration. Local settings are stored separately so the main configuration remains portable across machines. Even something as small as the hostname adds words to my prompt, so showing the host is one such local knob, and only remote systems turn it on.
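Here is a sketch of both halves of that setup. SHOW_HOST is a hypothetical variable name standing in for whatever the real knob is called. .bashrc sources .bash_local when it exists, and the prompt speaks the short hostname only when that file has turned the knob on.

```bash
# In ~/.bashrc: load machine-specific settings if this machine has any.
if [[ -f ~/.bash_local ]]; then
  source ~/.bash_local
fi

# Print the short hostname (up to the first dot) only when the knob is on.
prompt_host() {
  if [[ ${SHOW_HOST:-0} == 1 ]]; then
    printf '%s ' "${HOSTNAME%%.*}"
  fi
}
PS1='$(prompt_host)\w\$ '
```

On a remote machine, ~/.bash_local would contain just SHOW_HOST=1. Everywhere else, the file can be absent and the prompt stays silent about the host.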

Bootstrapping and configuration drift

Installers sometimes modify shell startup files without asking. To reduce ambiguity, this repository bootstraps itself by symlinking dotfiles into $HOME, so there is always one canonical copy. If .bashrc changes unexpectedly, the difference becomes immediately visible and I know exactly who to blame for messing with my config without asking.
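Here is a minimal sketch of a symlinking bootstrap, assuming a hypothetical file list and repository path. Existing regular files are moved aside to a .bak copy rather than silently overwritten. Once ~/.bashrc points into the repository, an installer that appends to it is really editing the tracked file, so git diff in the repository shows the change. An installer that replaces the file outright breaks the symlink instead, which test -L can detect.

```bash
#!/usr/bin/env bash
# Link each tracked dotfile from the repository into $HOME.
# The file list is illustrative; the real repository may link different files.
link_dotfiles() {
  local repo=$1 f
  for f in .bashrc .bash_profile .vimrc; do
    [[ -e $repo/$f ]] || continue
    # Keep a backup of a real file instead of replacing it.
    if [[ -e $HOME/$f && ! -L $HOME/$f ]]; then
      mv -- "$HOME/$f" "$HOME/$f.bak"
    fi
    ln -sfn -- "$repo/$f" "$HOME/$f"
  done
}

# Example: link_dotfiles "$HOME/dotfiles-public"
```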

WSL as a practical compromise

I split my time between WSL and native Linux. Windows has strong accessibility tooling that gives me a great working environment, while Linux offers the developer tooling I prefer. WSL lets me keep a Linux shell while still using the Windows GUI when necessary; Linux GUI accessibility is... complicated.

A tiny Vim tweak that matters

set noru

This disables Vim's ruler display. With a screen reader, the ruler constantly announces line and column numbers, so turning it off removes unnecessary speech. In general, I only want text on screen to change when that text matters right now.

Designing terminals for listening instead of looking

Most terminal environments assume the user is visually scanning the screen. When the interface is audio, different tradeoffs emerge:

Visual shell design      Audio-first shell design
show lots of context     minimize repeated information
color cues               text cues
long paths are fine      long paths create audio noise
visual scanning          stable spoken landmarks
highlight changes        highlight importance

Individually, these changes are small. Together, they turn the terminal into something I can navigate at the speed I can think. Interestingly, many of these ideas are useful even for sighted users. Shorter prompts, clearer cues, and stable navigation primitives improve terminal workflows regardless of how you interact with them.

The full configuration is available here: https://codeberg.org/derekriemer/dotfiles-public

Disclaimers:

  1. This repo is a cleaned-up, minimal version of my actual setup. I’ve removed machine-specific config (SSH, hostnames, private paths), but the structure and workflows are the same as what I use day to day.
  2. My crude Bash implementations are far from elegant and do not even attempt to maximize performance. If the pipeline ever becomes noticeably slow, I'll likely rewrite the core matcher as a simple Rust program that uses the current directory to walk a tree of substitutions, or something like that. Until then, I'm not going to optimize something that works well enough and has no overhead I can notice as its intended user.

Weather app

This is a web app implemented with Django that aims to present weather data in a simple format. I host it on my website in a Django instance. I license it under the GNU AGPL because I want anyone who wishes to see my code to be able to see it. The core of this app came from a Python-based command-line weather app I wrote to play around with the Dark Sky company's API. See developer.forecast.io for more info on the quite nice weather API they provide.

How it works:

It uses the native JavaScript location API to get the user's current (quite precise) location. It then loads the requested weather data asynchronously through AJAX. On the back end, I use POST requests and a simple API located at /weather/forecast. Logic decides which subpage to load based on the parameters given in the request. I would do this differently if I rewrote it; the way it works is a pile of junk, but it was my first web app, so hey. The front end presents most things textually. I may use the platform I have built to explore audio representations of weather radar. I built a little hacked-together weather chart where I map tones to temperature and volume to chance of precipitation. I might explore using 3D audio and other factors to represent weather phenomena in an audio weather map, for once giving the blind the ability to see oncoming rainstorms or threats from thunder, or just to look at the next hour's radar like a sighted friend might.
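Here is a sketch of the kind of tone mapping described above. Every range in it is an assumption, since the post doesn't give the real ones. Temperature in °F is clamped to 0 to 100 and mapped onto a two-octave range from 220 Hz to 880 Hz, and precipitation chance (0 to 100) becomes a volume from 0 to 1. Interpolating in log-frequency makes equal temperature steps sound like equal pitch steps.

```bash
#!/usr/bin/env bash
# Hypothetical mapping: temperature (deg F) -> frequency (Hz),
# precipitation chance (0-100) -> volume (0-1). Ranges are illustrative.
weather_tone() {
  awk -v t="$1" -v p="$2" 'BEGIN {
    lo_t = 0; hi_t = 100        # temperature range mapped onto pitch
    lo_f = 220; hi_f = 880      # two octaves, A3 to A5
    if (t < lo_t) t = lo_t; if (t > hi_t) t = hi_t
    # Log-frequency interpolation: each degree moves the same musical interval.
    f = lo_f * (hi_f / lo_f) ^ ((t - lo_t) / (hi_t - lo_t))
    printf "%.1f %.2f\n", f, p / 100
  }'
}

weather_tone 50 30   # prints: 440.0 0.30
```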

URL:

https://django.derekriemer.com/weather

Virtual Clock Tower

I am mildly enthusiastic about antique clocks. I don't know much about their mechanical workings, but I have an interest in them. This little program was a prototype I created to turn a desktop computer into a clock tower. It sucks because I can't find decent recordings of the proper bell sounds. Anyhow, it was an experiment I may put a UI on someday, to see if I can turn the computer into a device that engages children with an integral part of our society's history, so they become interested in studying antique clocks, especially the massive towers that house clocks in some of the world's coolest cities.

Source:

https://github.com/derekriemer/VirtualClockTower

Crash Hero!

NVDA developers and bug smashers! ATTENTION! This is an important announcement from the department of release stability management. Recently, a new member joined the NVDA community. She will remain anonymous, but you may refer to her as the crash hero. In fact, she is the first NVDA superhero. She exhibits her superpower in the form of an NVDA add-on that automatically saves all your crash dumps to a folder of your choosing on your computer whenever NVDA restarts after a crash.


nvda-notepadPlusPlus

This is an NVDA AppModule that improves the accessibility of the Notepad++ editor. It adds announcements for many navigation actions, including find next and find previous. It also makes autocomplete usable, with an audible announcement when suggestions appear and reporting of each suggestion in the resulting list. Other notable features include support for the incremental find dialog, so the user can get the same information from it that a sighted person might, and accessibility enhancements to the key mapper dialog. This add-on was written in collaboration with tuukkao.

https://github.com/derekriemer/nvda-notepadPlusPlus

Learn To Type

This app is a prototype of a system that automatically picks a word for the user based on the text they have the most trouble typing. It takes the 3 most frequently mistyped characters collected over the course of a training session and adapts future training by presenting a word that should challenge the user to type those characters. The app doesn't save the training data for future reference, though this could be arranged easily. The goal of writing it was to practice working with priority queues and other data structures. Learn more at https://github.com/derekriemer/learnToType
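Here is a sketch of the selection step, written in Bash rather than the app's own code. It swaps the priority queue for a plain count-and-sort (sort | uniq -c), which yields the same top three for a single session. It then scores each word in a hypothetical word list, one word per line, by how many of those characters the word contains.

```bash
#!/usr/bin/env bash
# Missed characters arrive one per line on stdin; print the three most
# frequent, most-missed first (ties broken alphabetically).
top_missed() {
  sort | uniq -c | sort -k1,1nr -k2 | head -n 3 | awk '{print $2}'
}

# Pick the word from file $1 that contains the most of the remaining arguments.
pick_word() {
  local words=$1; shift
  local best= best_score=-1 word score c
  while read -r word; do
    score=0
    for c in "$@"; do
      [[ $word == *"$c"* ]] && score=$((score + 1))
    done
    if (( score > best_score )); then best=$word; best_score=$score; fi
  done < "$words"
  printf '%s\n' "$best"
}
```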

Indentone, making NVDA read indents as musical spatial tones.

This add-on has been deprecated. See the Indentone deprecation post for details: https://derekriemer.com/blog/indentone-deprecation. I have decided to make an add-on that lets NVDA report indents as tones. For now, this is not an official NVDA add-on that has gone through community review (see Future work for more reasons). Here is how it works. When you are reading code or text with indents, if NVDA sees 4 spaces or 1 tab, it plays a note. For each additional indent level, the add-on plays the next whole tone up. Example: C3 (one octave below middle C) all the way on your left for 0 tabs; D3, slightly farther right, for 1 tab. For each level of indent NVDA sees, it plays a note farther to the right and that many steps up a whole-tone scale. When the indent level decreases again, the note's pitch drops and the tone moves back to the left a bit. Previously, NVDA played no tone for zero indentation (for a technical reason); this was fixed in Indentone 0.3.0.

The readme is pasted below. If you don't care and just want a download link, go to the download heading (level 2).

How to use:

Installation

Install this add-on by pressing Enter on it or double-clicking it in the file manager. Then tell NVDA to install it by following the prompts.

Using

When NVDA would normally speak indents, this add-on should activate. If it doesn't, please contact me. The add-on detects changes in indent level and beeps to tell you that an indent occurred. When the text you are reading is more indented than the last text you were reading, it beeps farther to your right than it did before, and the tone plays one whole tone higher than the previous indent level would. For example, no tabs is all the way to your left at one octave below middle C. The first tab makes NVDA play a D3 (one step up), 2 tabs an E3 (two steps up), 3 tabs an F#3 (three whole steps up), and so on. The 3-tab tone is slightly farther right than the C, and middle C would be much closer to the center of your body than the C below it. When the text is less indented than before (assuming it was already indented), NVDA does the opposite, lowering the tone and moving it to the left. The farthest-right tab level is 3 octaves higher than the no-indent level.
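The pitch mapping above can be written as a formula. This assumes equal temperament with A4 = 440 Hz, which puts C3 at about 130.81 Hz; the add-on's exact tuning isn't stated here. Indent level n plays n whole tones, or 2n semitones, above C3:

```latex
f(n) = f_{\mathrm{C3}} \cdot 2^{2n/12} = 130.81\,\mathrm{Hz} \cdot 2^{n/6}
```

This gives roughly 146.8 Hz (D3) for one tab, 164.8 Hz (E3) for two, and 185.0 Hz (F#3) for three. Three octaves up means a factor of 2^3 = 8, which 2^{n/6} reaches at n = 18, so the three-octave range spans 18 whole-tone steps.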

Future work

I may play around with panning the audio dynamically. This would let me start the beep at your left and sweep it one indent unit over a span of about 200 milliseconds. The advantage is that you could judge the change in indentation that just occurred while hearing the code you are currently editing in parallel, even if you can't easily judge whole-tone steps by ear. I am also probably going to experiment with integrating this into NVDA core (I'm going to open a ticket about this after finals). I spoke about Indentone at NVDACon 2016 in a session about my add-ons, received a lot of great feedback, and am excited to continue working on this.

https://files.derekriemer.com/indentone-0.3.0.nvda-addon

Source Code:

https://github.com/derekriemer/nvda-indentone