Focus Ancestry
Background
Screen readers announce focus changes using a fairly simple algorithm: when focus changes, announce only what is necessary, given the difference between where the user is now and where the user was before the change. If focus moves from one item within a combo box to another item within that combo box, the combo box didn't gain or lose focus, but the option in focus changed. However, if the user moves from one labeled region of the page to another labeled region, announcing the new region is necessary. Further focus movements within that region do not require announcing it again. This article explains how a screen reader like NVDA represents the objects on screen and decides what to announce. It also presents a useful heuristic that developers can use to understand focus changes and to design useful focus interactions.
Object hierarchy
Web development is hierarchical. A tree of objects is created, with the objects generally representing rectangles on screen and/or logical containers that make working with those rectangles easier. However, most of these objects are not interactive, and thus it does not make sense to convey each of them every time focus is within them.
Container nodes
Some objects are called container objects. All focusable containers contain at least one interactive object. These are things like a region of the page, a toolbar, the tree of a tree view, a combo box (<select>), or a list. Containers can contain other containers, and leaf nodes as well. Some containers, like a generic div, are not meaningful to the end user, and thus are simply ignored by the parts of screen readers that present the website to a user. Others are very important, and must be conveyed to the user when they are entered or exited.
Leaf nodes
In web development, I generally use the term leaf node to refer to HTML elements with no children. These are generally text nodes, or other nodes that don't have children. However, in this blog, I am using the term leaf node for a specific subset of the leaf nodes we use in the field of web development: an interactive, focusable element with no interactive child elements. This means that a button that has an icon child and a label child is nevertheless a leaf node for the purposes of focus calculations. Its children cannot gain focus, so for focus announcement calculations it is the leaf-most node possible.
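To make this definition concrete, here is a minimal sketch in TypeScript. The `UiNode` type and both function names are hypothetical, invented for illustration; they are not part of the DOM or of any screen reader API. The sketch also assumes that "interactive" can be reduced to a single `focusable` flag, which is a simplification.

```typescript
// Hypothetical, simplified model of a page element (not a real DOM type).
interface UiNode {
  role: string;
  focusable: boolean;
  children: UiNode[];
}

// True if any descendant of `node` can take focus.
function hasFocusableDescendant(node: UiNode): boolean {
  return node.children.some(
    (child) => child.focusable || hasFocusableDescendant(child)
  );
}

// A "leaf" in this article's sense: focusable, with no focusable descendants.
function isFocusLeaf(node: UiNode): boolean {
  return node.focusable && !hasFocusableDescendant(node);
}

// A button with an icon and a label, sitting inside a toolbar.
const icon: UiNode = { role: "img", focusable: false, children: [] };
const label: UiNode = { role: "text", focusable: false, children: [] };
const button: UiNode = { role: "button", focusable: true, children: [icon, label] };
const toolbar: UiNode = { role: "toolbar", focusable: false, children: [button] };

console.log(isFocusLeaf(button));  // true: it has children, but none can take focus
console.log(isFocusLeaf(toolbar)); // false: it is a container and not focusable itself
console.log(isFocusLeaf(icon));    // false: it has no children, but it cannot take focus
```

The icon is a leaf of the HTML tree, but not a leaf in the focus sense, which is the distinction this section draws.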
Try to avoid focusing most non-clickable containers
Container elements are only supposed to be focusable if they are managing focus within their contained elements. These are specific subsets of containers, like combo boxes, list boxes, or some trees. Regions, logical groupings, etc. shouldn't keep focus. Some developers may choose to focus them and then have the container forward focus to a child, but leaving focus on one of these non-interactive containers generally leads to a poor user experience, and is not necessary for screen readers, as outlined below. At a minimum, a focusable container forces the user to tab through more elements on the page than necessary.
Concepts for algorithmically determining when to speak focus changes
When focus shifts from one item on the page to another, a screen reader can algorithmically determine what to present to the user to express what has changed. In order to explain this algorithm, I need to present a few concepts.
Focus ancestry
Like any node in a tree, the focus, being an element on the page, has a path from itself to the root element of the page. This path is created simply by walking the parents of the focus all the way up to the body (technically, screen readers go to the root of the OS in some cases). In some cases, this ancestry is maintained along with the focus to enhance performance.
Greatest common ancestor
When focus shifts from one element to another, we can get a sense of how much changed by comparing the focus ancestry of the previous focus with that of the current focus. Walking the ancestors of both the current and previous focus upward, we will find a common ancestor: the first node from which the previous focus and the current focus share all parents up to the body. In the image below, I present a toolbar, and a checkbox and button. The button previously had focus, but focus has moved to the checkbox, and I present the ancestry chains with dotted paths.
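The ancestry walk and the common ancestor search can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not how NVDA or any other screen reader implements it: the `TreeNode` type, the function names, and the example tree (which assumes the button and checkbox both sit inside the toolbar) are all hypothetical. In general tree terminology, this node is usually called the lowest common ancestor.

```typescript
// Hypothetical minimal node: a name and a link to its parent.
interface TreeNode {
  name: string;
  parent: TreeNode | null;
}

// Walk parent links from the focused node up to the root.
// The result is ordered root-first, e.g. [body, toolbar, button].
function focusAncestry(focus: TreeNode): TreeNode[] {
  const path: TreeNode[] = [];
  let node: TreeNode | null = focus;
  while (node !== null) {
    path.unshift(node);
    node = node.parent;
  }
  return path;
}

// Walk up from the current focus until we reach a node that is also
// in the previous focus's ancestry. Returns null if nothing is shared.
function greatestCommonAncestor(
  previous: TreeNode,
  current: TreeNode
): TreeNode | null {
  const previousChain = focusAncestry(previous);
  let node: TreeNode | null = current;
  while (node !== null && previousChain.indexOf(node) === -1) {
    node = node.parent;
  }
  return node;
}

// Assumed example tree: body > toolbar > { button, checkbox }
const body: TreeNode = { name: "body", parent: null };
const toolbar: TreeNode = { name: "toolbar", parent: body };
const button: TreeNode = { name: "button", parent: toolbar };
const checkbox: TreeNode = { name: "checkbox", parent: toolbar };

const chain = focusAncestry(button).map((n) => n.name).join(" > ");
console.log(chain); // body > toolbar > button

const common = greatestCommonAncestor(button, checkbox);
console.log(common ? common.name : "none"); // toolbar
```

Because the toolbar is shared by both chains, nothing above or including it changed, and only the checkbox itself is new.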
Here is a more complicated example.
The focus instead moves from the button in the toolbar to the first search result. Since the focus moves out of the toolbar and into a labeled results region, the screen reader has to explain to the user that they are no longer in the toolbar, and that they are now inside the list. This is usually done by saying something like "out of toolbar, results region, list with 2 items, birthday cards link." To speed things up, the fact that focus left an item is only sometimes announced.
Finally, this example highlights a situation where there are several items above the greatest common ancestor in the focus ancestry.
Graphviz dot representation available for screen reader users.
Both the current and previous focus in this case are within the filters region. Thus, as focus moves from one filter to the other, it is not necessary for the screen reader to announce the filters region again. The focus remained within the region, so the region is simply not announced.
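The examples above boil down to a set difference between two ancestry chains. The TypeScript sketch below shows one way to compute which nodes were exited and which were entered. The `FocusNode` type, the function names, and the labels are hypothetical, and the sketch deliberately stops short of deciding what is actually spoken, since that part varies between screen readers.

```typescript
// Hypothetical minimal node: a label and a link to its parent.
interface FocusNode {
  label: string;
  parent: FocusNode | null;
}

// Root-first chain from the page root down to the focused node.
function ancestry(focus: FocusNode): FocusNode[] {
  const path: FocusNode[] = [];
  let node: FocusNode | null = focus;
  while (node !== null) {
    path.unshift(node);
    node = node.parent;
  }
  return path;
}

interface FocusChange {
  exited: FocusNode[];  // only in the previous chain, innermost first
  entered: FocusNode[]; // only in the current chain, outermost first
}

// Compare the two chains; anything shared by both is left out entirely.
function diffFocus(previous: FocusNode, current: FocusNode): FocusChange {
  const before = ancestry(previous);
  const after = ancestry(current);
  return {
    exited: before.filter((n) => after.indexOf(n) === -1).reverse(),
    entered: after.filter((n) => before.indexOf(n) === -1),
  };
}

const labels = (nodes: FocusNode[]): string =>
  nodes.map((n) => n.label).join(", ");

// Assumed page: a toolbar, a results region holding a list, and a filters region.
const body: FocusNode = { label: "body", parent: null };
const toolbar: FocusNode = { label: "toolbar", parent: body };
const button: FocusNode = { label: "button", parent: toolbar };
const results: FocusNode = { label: "results region", parent: body };
const list: FocusNode = { label: "list", parent: results };
const link: FocusNode = { label: "birthday cards link", parent: list };
const filters: FocusNode = { label: "filters region", parent: body };
const filterA: FocusNode = { label: "first filter", parent: filters };
const filterB: FocusNode = { label: "second filter", parent: filters };

// Toolbar button to first search result: containers change on both sides.
const across = diffFocus(button, link);
console.log(labels(across.exited));  // button, toolbar
console.log(labels(across.entered)); // results region, list, birthday cards link

// One filter to the next: the filters region appears in neither list.
const within = diffFocus(filterA, filterB);
console.log(labels(within.exited));  // first filter
console.log(labels(within.entered)); // second filter
```

A screen reader would then apply its own rules to these two lists, for example always announcing an entered region but only sometimes announcing an exited one.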
Recap
When parents gain or lose focus, the screen reader may choose to announce them, if this fact is important. For example, no longer being in a combo box is very important to indicate, whereas leaving a heading that had a link, not so much. The rules that govern this behavior are ridiculously complicated, and you as a developer don't need to be bothered figuring them out. Different screen readers have solved this with different heuristics, as screen readers don't all present the same exact UX, and you should not rely on any given implementation when deciding how to set things up. However, simply understanding that things are generally only announced when they have changed will get you 95% of the way to understanding how focus announcement works. Screen readers can announce focus changes reliably because figuring out which things changed is dead simple to calculate algorithmically.
In short, items are only conveyed by the screen reader when focus moves into them or out of them, which can be determined by comparing the parents of the previous focus with the parents of the current focus. Items that are in the current focus ancestry and not in the previous one are newly focused parents, and items that are in the previous ancestry and not in the current one are no longer focused parents. As containers gain focus, they may get announced, and as containers lose focus, they may have that fact announced. It is not necessary to explicitly focus the containers to make sure the user sees them, because focus announcement heuristics will ensure the user knows of important changes to focused containers. Thus, when it really is necessary, newly focused parents and no longer focused parents are both conveyed automatically by screen readers. Developers simply don't have to do anything special to make this work. It really is that simple. Hopefully this makes it clearer how screen reader users know when they enter some region of the page, or a toolbar, etc.