Posts about tech

Designing a Terminal for an Audio-First Workflow


Most shell environments assume a visual interface. Prompts are colorful, information-dense, and optimized for quick scanning. My workflow is different. I'm blind, and I work with a screen reader, which means my terminal is fundamentally an audio interface. Instead of scanning the screen, I'm listening to the environment as I work. That changes what matters.

Most terminal workflows carry a lot of unnecessary noise such as long paths, redundant context, and visual-only cues. Designing for audio forced me to remove that noise entirely.

The result is a shell environment that’s faster to navigate, easier to parse, and often better even if you can see the screen. If you like what I describe here, take a look at my dotfiles repo for inspiration. If you maintain shell code and are interested in what path substitution built directly into a shell's configuration might look like, I'd love to talk.

Where this started

While I was working at Google, I spent a lot of time inside the monorepo. (If you haven't encountered that model before, imagine nearly the entire company's code living inside one enormous repository.) The ideas that eventually led to this system started there, but the implementation in this repository was written later from scratch and is entirely my own code.

The paths were extremely long. In some cases it could take several seconds for my screen reader, even reading at 700 WPM, just to tell me where I was during a large refactor. A typical path might look something like this (simplified):

/mounted/path/to/the/monorepo/perforce_client/root/javascript/namespace_part/namespace_subpart/namespace_subpartsubpart/app_root/subsystem/main/tests/audioInformation_test.ts

For a sighted developer, this is mostly a visual annoyance. You glance at the prompt, pick out the important segment, and move on. For me, the terminal had to speak the entire thing, or I had to interrupt my work and read it piece by piece. Every time the prompt appeared, my screen reader would start reading the path, and after a while I realized I often had no idea where I was unless I waited several seconds for the prompt to finish speaking. That was not going to work.

So I started building tools to compensate. First came "teleport" functions. These were small helpers that jumped directly to important parts of the repository:

function jsdf {
  cd <path_to_google_drive_javascript_code>
}

function jdf {
  cd <path_to_java_drive_code>
}

I can't show the real internal layout because I don't want to reveal how Google's internal monorepo is organized, but jsdf simply meant "JavaScript Drive frontend code." This saved a lot of typing: a quick jsdf; cd infra would take me right where I needed to be to work on a specific file. These helpers worked, but they were crude, and I was still constantly hearing huge paths in the prompt. Eventually I added a small rewriting pipeline that shortened common segments before displaying the path. The first version was extremely simple:

sed -e 's/path_1/short_segment/' \
    -e 's/path_2/another_short/'

It was ugly, but it proved something important: shortening paths dramatically reduced cognitive load. To make that concrete, here is the kind of transformation I was trying to achieve.

Before:

/mnt/c/users/driem/programs/python/ai_image_describer/main/tests/audioInformation_test.ts

After:

py/ai_image_describer/main/tests/audioInformation_test.ts
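To make the sed idea concrete, here is a minimal Bash sketch that produces the before/after transformation above. The two rules are hypothetical, written only to match that example, and the PS1 line is one common way to run a filter each time the prompt is drawn, not necessarily how the original pipeline was wired in.

```bash
#!/usr/bin/env bash
# Sketch of the early sed approach. The rules are hypothetical, chosen to
# reproduce the before/after example. sed applies -e expressions in order,
# so the more specific prefix must come first; "|" avoids escaping slashes.
shorten_path() {
  sed -e 's|^/mnt/c/users/driem/programs/python|py|' \
      -e 's|^/mnt/c/users/driem|driem|'
}

# Single quotes: the pipeline runs again every time the prompt is drawn.
PS1='$(pwd | shorten_path)\$ '
```

Anchoring each pattern with ^ keeps a rule from firing on a matching segment in the middle of an unrelated path.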

Saving keystrokes was important, but not as important as saving attention. Hearing a short, predictable path segment is far easier than listening through a long directory hierarchy every time the prompt appears. At Google, the savings were well over two thirds of the path length, which left me more mental capacity to actually think about the problem at hand. That idea eventually evolved into the alias and namespace system I use today.

Making terminal sessions resilient

My workflow makes this problem harder to ignore. At Google, and still today, I am often working from home, a café, my desk, or even while camping in the middle of the desert, so most of my development happens inside a persistent tmux session on a remote machine. I jump between windows, reconnect from different devices, and keep long-running work alive in the background. That means I'm constantly re-orienting myself: switching panes, reattaching sessions, and resuming work. Every time I do, the first thing my terminal speaks is the full working directory. If that path is long, I'm back to waiting several seconds just to answer a simple question: "Where am I?"

Path aliases

The first step was simple: shorten frequently used paths. For example, instead of typing cd /mnt/c/users/driem, I can define an alias:

driem /mnt/c/users/driem

and then navigate with:

p driem

The p command is a thin wrapper around cd, backed by the alias map. I also use pd for pushd when I want a quick stack to bounce between locations. Both commands support tab completion over the alias names, so navigation stays fast even as the alias set grows. This alone reduces both typing and how much the prompt has to speak.
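A minimal sketch of how a p/pd pair and its completion could be built on a Bash associative array. The PATH_ALIASES name and its single entry are hypothetical; the repository's real alias map and loader may look quite different.

```bash
#!/usr/bin/env bash
# Minimal sketch of p/pd over an alias map; not the repository's actual code.
# PATH_ALIASES and the sample entry are hypothetical.
declare -A PATH_ALIASES=(
  [driem]=/mnt/c/users/driem
)

# p: thin wrapper around cd. Looks the argument up in the alias map and
# falls back to treating it as an ordinary path.
p() {
  if [[ -z $1 ]]; then
    echo "usage: p <alias|path>" >&2
    return 1
  fi
  cd "${PATH_ALIASES[$1]:-$1}"
}

# pd: same lookup, but pushes the directory onto the stack (quietly).
pd() {
  if [[ -z $1 ]]; then
    echo "usage: pd <alias|path>" >&2
    return 1
  fi
  pushd "${PATH_ALIASES[$1]:-$1}" >/dev/null
}

# Tab-complete alias names for both commands.
complete -W "${!PATH_ALIASES[*]}" p pd
```

Because complete -W captures the word list when it runs, aliases added later would need the complete line re-run, or a completion function that reads the map at completion time.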

Each alias also exports an environment variable, for example P_driem, which makes it easy to reuse paths in scripts:

tail -f "$P_driem/log.txt"
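One way the P_ variables could be produced, sketched under the assumption that aliases are loaded from lines in the name target form shown earlier. The loader name is hypothetical, and it copies targets verbatim, so it is only right for literal /... aliases; nested aliases would need to be resolved to a real path before exporting.

```bash
#!/usr/bin/env bash
# Sketch: read "name target" lines and export P_<name> for each alias.
# Targets are copied as-is, so only literal (/...) targets come out right.
declare -A PATH_ALIASES=()

load_aliases() {
  local name target
  while read -r name target; do
    # Skip blank lines and comments.
    [[ -z $name || $name == \#* ]] && continue
    PATH_ALIASES[$name]=$target
    export "P_$name=$target"
  done
}

# Hypothetical alias file contents, fed in on stdin.
load_aliases <<'EOF'
# name  target
driem /mnt/c/users/driem
EOF
```

Alias names also have to be valid shell identifiers (letters, digits, underscores) for export to accept P_name.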

This helped, but it didn't scale well once directory structures became deeper and shortcuts naturally nested. That led to the namespace system.

Namespaced paths

Earlier versions of my environment experimented with something I called namespaced paths. The idea was to treat filesystem paths more like hierarchical identifiers than raw strings. After the sed rewriting pipeline, I briefly switched to a space-separated list of simple substitutions, applied in order. This was much better, but I wanted something that wasn't just a prefix match and didn't strictly do a longest match either. That led me to a new, more elegant solution.

The new system is fundamentally a key-value store based on aliases. Each alias has two roles: a real filesystem expansion used for navigation, and a display representation used when rendering the prompt. That distinction allows the prompt to either preserve hierarchy for orientation or collapse it to reduce noise, depending on how an alias is defined. The prompt uses a display path rather than a literal filesystem path, optimized for readability rather than exact reproduction. In the rare case I need the real path, I'll just run pwd.

Example aliases:

driem /mnt/c/users/driem
gmscripts [driem]/mydrive/software/gm_scripts
py driem/programs/python
easy_ui py/easy_ui

Literal paths

If an alias target starts with /, it is treated as a normal filesystem path:

driem /mnt/c/users/driem

Visible composition with [alias]

If the target starts with [alias], the referenced alias is expanded for the real path while remaining visible in the prompt:

gmscripts [driem]/mydrive/software/gm_scripts

This keeps the parent namespace visible when it carries useful meaning. When working inside gm_scripts, the prompt can display:

/driem/gm_scripts$

Hidden-prefix composition with alias/...

If the target starts with alias/..., the alias is expanded for the real path but its ancestry can be collapsed in the prompt:

py driem/programs/python
easy_ui py/easy_ui

When working inside easy_ui, the prompt can simply display:

easy_ui$

while the filesystem path remains long.

Mental model of the algorithm

Conceptually the system resolves aliases while tracking two outputs: the expanded filesystem path, and the compressed display path.

resolve(value):
  if value starts with /:
    return real_path=value, display=value

  if value starts with [alias]suffix:
    expand alias for real path
    keep alias in display path

  if value starts with alias/suffix:
    expand alias for real path
    allow display to collapse the prefix

When rendering the prompt, the system finds the best matching expansion and substitutes the corresponding display form.

Concrete example

Real filesystem path:

/mnt/c/users/driem/programs/python/easy_ui

Prompt display:

easy_ui

The filesystem stays long and stable, but the spoken prompt stays short.
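Here is a minimal Bash sketch of that model, using the example aliases from this post. It is not the repository's implementation. The display rules are assumptions chosen so the sketch reproduces the prompts shown above: a literal alias displays as /name, [parent]suffix keeps the parent's display form and collapses the suffix to its last directory, and parent/suffix collapses to the alias's own name. "Best matching expansion" is interpreted here as the longest real-path prefix, matched on whole path components, and there is no cycle detection.

```bash
#!/usr/bin/env bash
# Sketch of alias resolution with separate real and display paths.
# Display rules and longest-prefix matching are assumptions (see above).
declare -A ALIAS_DEF=(
  [driem]=/mnt/c/users/driem
  [gmscripts]='[driem]/mydrive/software/gm_scripts'
  [py]=driem/programs/python
  [easy_ui]=py/easy_ui
)
declare -A ALIAS_REAL=() ALIAS_SHOWN=()

# resolve NAME: fill ALIAS_REAL[NAME] and ALIAS_SHOWN[NAME], parents first.
resolve() {
  local name=$1 def parent suffix re='^\[([^]]+)](.*)$'
  [[ -n $name && -n ${ALIAS_DEF[$name]+set} ]] || return 1
  [[ -n ${ALIAS_REAL[$name]+set} ]] && return 0   # already resolved
  def=${ALIAS_DEF[$name]}
  if [[ $def == /* ]]; then
    # Literal path: real path as given; displayed as "/name".
    ALIAS_REAL[$name]=$def
    ALIAS_SHOWN[$name]=/$name
  elif [[ $def =~ $re ]]; then
    # [parent]suffix: parent stays visible, suffix collapses to its last dir.
    parent=${BASH_REMATCH[1]} suffix=${BASH_REMATCH[2]}
    resolve "$parent" || return 1
    ALIAS_REAL[$name]=${ALIAS_REAL[$parent]}$suffix
    ALIAS_SHOWN[$name]=${ALIAS_SHOWN[$parent]}/${suffix##*/}
  else
    # parent/suffix: parent expands for the real path; display is the name.
    parent=${def%%/*} suffix=${def#"$parent"}
    resolve "$parent" || return 1
    ALIAS_REAL[$name]=${ALIAS_REAL[$parent]}$suffix
    ALIAS_SHOWN[$name]=$name
  fi
}

# Resolve every alias once, up front.
for alias_name in "${!ALIAS_DEF[@]}"; do
  resolve "$alias_name"
done

# display_path DIR: swap the longest matching real-path prefix for its display form.
display_path() {
  local dir=$1 name real best='' best_len=0
  for name in "${!ALIAS_REAL[@]}"; do
    real=${ALIAS_REAL[$name]}
    # Whole path components only, so /a/bc never matches an alias for /a/b.
    if [[ $dir == "$real" || $dir == "$real"/* ]] && (( ${#real} > best_len )); then
      best=$name
      best_len=${#real}
    fi
  done
  if [[ -n $best ]]; then
    printf '%s\n' "${ALIAS_SHOWN[$best]}${dir:best_len}"
  else
    printf '%s\n' "$dir"
  fi
}

# Prompt hook: single quotes so the substitution runs at every prompt.
PS1='$(display_path "$PWD")\$ '
```

With these definitions, display_path /mnt/c/users/driem/programs/python/easy_ui prints easy_ui, and /mnt/c/users/driem/mydrive/software/gm_scripts prints /driem/gm_scripts. Putting the command substitution inside a single-quoted PS1 also means a directory name containing $ or backticks is printed rather than executed.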

Hacks to make reading code faster

Another trick that speeds up spoken output is shortening how punctuation is pronounced. Instead of hearing every punctuation symbol spoken in full, I use shorthand pronunciations:

symbol | pronunciation | shorthand
( | left paren | par
) | right paren | ren
[ | left bracket | brà
] | right bracket | ket
{ | left brace | curl
} | right brace | lea
: | colon | coal
; | semicolon | dah
... | ... | ...

For example, a typical C-style loop would normally be spoken like this:

for left paren i equals zero semi i less ten semi i plus plus right paren left brace

With shorthand it becomes:

for par i eq zero dah i less ten dah i plus plus ren curl

This dramatically reduces how long it takes to listen to code. The learning curve was a little weird at first, but I adapted to it within a week.

Small shell tweaks that reduce friction

Most of the rest of the repository consists of small adjustments to default shell behavior.

Nice to have: Immediate history syncing

Normally Bash writes command history when a shell exits, which means that if you have multiple terminals open, commands from one session may not appear in another until much later. This configuration appends history immediately after each command, so history search stays consistent across terminals and survives reboots.
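The repository's exact settings aren't reproduced here, but a widely used Bash idiom for this behavior looks like the following. histappend makes a shell append to $HISTFILE on exit instead of overwriting it, and history -a in PROMPT_COMMAND writes each new command to the file right after it runs.

```bash
#!/usr/bin/env bash
# ~/.bashrc fragment: a common idiom for immediate history syncing.
# Append to $HISTFILE on exit instead of overwriting it.
shopt -s histappend
# Before each prompt, append this session's new history lines to $HISTFILE,
# keeping any PROMPT_COMMAND that was already set.
PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```

Other open shells only pick those lines up once they reread the file, for example with history -n.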

Per-machine overrides

The repository includes a .bash_local file for machine-specific configuration. Local settings are stored separately so the main configuration remains portable across machines. Even something as trivial as the hostname adds words to my prompt, so showing the host is one such local knob, and only remote systems have it turned on.
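A sketch of the pattern: .bashrc sources ~/.bash_local only if it exists, and a hypothetical SHOW_HOST variable set there decides whether the prompt includes the short hostname. The variable and function names are illustrative, not taken from the repository.

```bash
#!/usr/bin/env bash
# ~/.bashrc sketch: pull in machine-specific overrides when the file exists.
if [[ -f ~/.bash_local ]]; then
  source ~/.bash_local
fi

# Hypothetical knob: a remote machine's ~/.bash_local might set SHOW_HOST=1.
# prompt_host prints "host:" only when the knob is on, so local prompts stay short.
prompt_host() {
  if [[ ${SHOW_HOST:-0} == 1 ]]; then
    printf '%s:' "${HOSTNAME%%.*}"
  fi
}

PS1='$(prompt_host)\w\$ '
```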

Bootstrapping and configuration drift

Installers sometimes modify shell startup files without asking. To reduce ambiguity, this repository bootstraps itself by symlinking dotfiles into $HOME, so there is always one canonical copy. If .bashrc changes unexpectedly, the edit lands in the repository's tracked copy, so the difference becomes immediately visible and I know exactly who to blame for messing with my config without asking.
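A minimal sketch of a symlinking bootstrap. link_dotfiles and its arguments are hypothetical; it backs up any real file that is in the way, then points the file in the target home directory at the repository copy.

```bash
#!/usr/bin/env bash
# Hypothetical bootstrap: link_dotfiles REPO HOME FILE... symlinks each FILE
# from REPO into HOME. Names and layout are assumptions for illustration.
link_dotfiles() {
  local repo=$1 home=$2 f
  shift 2
  for f in "$@"; do
    # Keep a backup of any real file (not a symlink) that is in the way.
    if [[ -e $home/$f && ! -L $home/$f ]]; then
      mv "$home/$f" "$home/$f.bak"
    fi
    # -s symbolic, -f replace an existing link, -n don't descend into a linked dir.
    ln -sfn "$repo/$f" "$home/$f"
  done
}
```

For example, link_dotfiles ~/dotfiles-public "$HOME" .bashrc .vimrc, assuming the repo is cloned at ~/dotfiles-public. After that, an installer that appends to ~/.bashrc is writing to the tracked file, so git -C ~/dotfiles-public status reports the change.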

WSL as a practical compromise

I split my time between WSL and native Linux. Windows gives me strong accessibility tooling, while Linux offers the developer tooling I prefer. WSL lets me keep a Linux shell while still using the Windows GUI when necessary, because Linux GUI accessibility is ... complicated.

A tiny Vim tweak that matters

set noru

This disables Vim's ruler display. For a screen reader, the ruler constantly announces line and column numbers, and turning it off removes unnecessary speech. In general, the only time I want changing text on screen is if that text matters right now.

Designing terminals for listening instead of looking

Most terminal environments assume the user is visually scanning the screen. When the interface is audio, different tradeoffs emerge:

Visual shell design | Audio-first shell design
show lots of context | minimize repeated information
color cues | text cues
long paths are fine | long paths create audio noise
visual scanning | stable spoken landmarks
highlight changes | highlight importance

Individually, these changes are small. Together, they turn the terminal into something I can navigate at the speed I can think. Interestingly, many of these ideas are useful even for sighted users. Shorter prompts, clearer cues, and stable navigation primitives improve terminal workflows regardless of how you interact with them.

The full configuration is available here: https://codeberg.org/derekriemer/dotfiles-public

Disclaimers:

  1. This repo is a cleaned-up, minimal version of my actual setup. I’ve removed machine-specific config (SSH, hostnames, private paths), but the structure and workflows are the same as what I use day to day.
  2. My crude Bash implementations are far from elegant and do not even attempt to maximize performance. If the pipeline ever becomes noticeably slow, I'll likely rewrite the core matcher as a simple Rust program that uses the current directory to walk a tree of substitutions, or something like that. Until then, I'm not going to optimize something that works well enough, with no noticeable overhead, for me, the intended user.

accessibility in the web ecosystem

Introduction

Websites exist thanks to multiple layers of technologies, where every layer depends on all the layers below it. This stack of technologies makes it possible to send a website from a remote server to a user's computer and display the information in that webpage. Each layer of this stack has a highly specialized role, and misusing a layer will result in a poor user experience, heightened security risks, and poor performance. To show a person with disabilities a webpage and make the experience as useful as possible, it is necessary to use each layer of the technology stack for the purpose it was built for.

running and debugging rust code from the windows operating system

For a hobby project, I will likely need to build a COM interface for a screen reader integration, and doing so in Python would be annoying, to say the least. I therefore decided to take a stab at implementing it in Rust for personal development; Rust also seems more interesting than C++. I recently ran into an annoying issue trying to configure VS Code debugging and running on Windows. Here is how to get up and running and avoid a major pitfall.

Some Thoughts on Smart Canes

Over the last few years, many attempts have been made at creating a smart cane. None of them has led to a market-transforming technology that a substantial number of people actually use.

I saw a recent example of a smart cane getting news coverage, and it deserves attention because of a particularly egregious argument it makes. I've seen this argument, or variations of it, in several posts about smart canes. I will address it below and explain why it does not make a solid case for a smart cane. I will then lay out several design and engineering constraints that must be met before I would ever recommend a smart cane to another blind person.

Changes to Google's YouTube for iOS, and how to use it with VoiceOver.

YouTube was recently updated, and several VoiceOver changes came with it. At first, you may react the way I did: "oh, damn, it, google, stop, breaking, things!!!" It turns out that Google actually fixed a lot of things in this version, making the user experience more streamlined and much more efficient. They did seem to break one thing, though.