Named Capture Groups in Python


By Joe Carboni

Named Capture Groups provide a way to tag regex capture groups with, you guessed it, names. Instead of trying to remember which positions your capture groups are in, you can find them by your custom names, which is a more readable and manageable approach.

A String Parsing Task

Dealing with product model nomenclatures

Recently, I got myself into a bit of string parsing – taking a manufacturer’s model numbers and producing rich tables of features and pricing based on the model number itself.

An HVAC indoor coil manufacturer named Advanced Distributor Products (ADP) sells a popular and versatile coil called the Multi-Position – HE Series (https://www.adpnow.com/product/multi-position-he-series/).

A typical model number looks something like this: HG30924D145B1205AP. The nomenclature encodes features that you can glean straight from the model number itself, such as paint color, width, and height.

Image of nomenclature breakdown. Source https://www.adpnow.com/product/multi-position-he-series/

Strings Incoming

Using named capture groups to parse a model number

Perhaps you have a Python script reading file contents, and you would like to capture instances of strings like “HG30924D145B1205AP” and do something with them. You can’t be sure where, or even whether, these model numbers will appear in the file, but either way you’re tasked not only with capturing them, but also with looking up other information in reference tables.

This is where I found named capture groups quite useful. Using our example, “HG30924D145B1205AP”, a regex pattern built from named capture groups will match it, along with any other string that follows the same nomenclature.
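A minimal sketch of such a pattern follows. The group names and segment widths are placeholders chosen only to fit the example string; they are an assumption for illustration, not ADP's published nomenclature. The sketch also shows that a named group can still be read by position, which is exactly the bookkeeping that names let you avoid.

```python
import re

# Placeholder field names and widths: an illustrative split of the example
# model number, NOT ADP's published nomenclature.
MODEL_PATTERN = r"""
    (?P<series>[A-Z])            # H
    (?P<paint>[A-Z])             # G
    (?P<capacity>\d{5})          # 30924
    (?P<cabinet>[A-Z])           # D
    (?P<width>\d{3})             # 145
    (?P<height>[A-Z])            # B
    (?P<options>\d{4}[A-Z]{2})   # 1205AP
"""

# re.VERBOSE lets the pattern span several lines and carry comments.
match = re.fullmatch(MODEL_PATTERN, "HG30924D145B1205AP", flags=re.VERBOSE)

# The same group can be read by position or by name.
by_position = match.group(5)
by_name = match.group("width")
print(by_position, by_name)  # 145 145
```

Reading `match.group("width")` stays correct even if a new group is later added in front of it, whereas `match.group(5)` would silently point at a different segment.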

Breaking down the named capture group

  • (?P<name>…) is the syntax for a named capture group. Replace name with your own name for the group, and the ellipsis with the regex that the group should match.
  • The groups appear in the same sequence as the nomenclature definition. Stacking them vertically, one per line, is purely for readability; the regex engine treats them as a single in-line pattern. Note the use of a raw string inside a triple-quoted block.

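Python’s standard regex library `re` supports named capture groups out of the box. Because the pattern is stacked across several lines, though, you have to make sure to use the verbose flag (`re.VERBOSE`), which tells `re` to ignore the whitespace and line breaks inside the pattern. Compiling the pattern once gives you a reusable object for testing other strings.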
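To see why the flag matters, here is a small sketch that compiles one multi-line pattern with and without `re.VERBOSE`. The pattern is a shortened, hypothetical one (the group names `series`, `paint`, and `remainder` are placeholders), so treat it as an illustration rather than a full parser.

```python
import re

# Hypothetical, shortened pattern: only the first two characters are split out.
PATTERN_TEXT = r"""
    (?P<series>[A-Z])     # placeholder name for the first letter
    (?P<paint>[A-Z])      # placeholder name for the second letter
    (?P<remainder>\w+)    # everything else, left unparsed here
"""

# With re.VERBOSE, whitespace and "#" comments in the pattern are ignored.
verbose_regex = re.compile(PATTERN_TEXT, re.VERBOSE)

# Without the flag, the line breaks, indentation, and comments are treated
# as literal characters that the input would have to contain.
literal_regex = re.compile(PATTERN_TEXT)

model = "HG30924D145B1205AP"
print(verbose_regex.fullmatch(model) is not None)  # True
print(literal_regex.fullmatch(model) is not None)  # False
```

The second result is the failure mode to watch for: the pattern compiles without complaint, it just never matches.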

Extraction

Translation from Match to Dict using groupdict

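You have your regex pattern set up, and now you want to test an incoming string against it. Assuming you have a match with something like our example, “HG30924D145B1205AP”, the magic comes in with the use of `.groupdict()`. The `groupdict` method of the match object takes your named capture groups and outputs them as key-value pairs in a dictionary, with the group names as keys and the matched substrings as values. If there is no match, there is no match object to call it on, so check for `None` first.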

The result
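As a sketch of what this looks like end to end, the snippet below searches a line of text for a model number and returns the `groupdict` output. The group names and segment widths are placeholders fitted to the example string, not ADP's published nomenclature, and `parse_model` is a hypothetical helper name.

```python
import re

# Placeholder field names and widths, fitted to the example string only.
MODEL_REGEX = re.compile(
    r"""
    \b                           # start on a word boundary
    (?P<series>[A-Z])            # H
    (?P<paint>[A-Z])             # G
    (?P<capacity>\d{5})          # 30924
    (?P<cabinet>[A-Z])           # D
    (?P<width>\d{3})             # 145
    (?P<height>[A-Z])            # B
    (?P<options>\d{4}[A-Z]{2})   # 1205AP
    \b                           # end on a word boundary
    """,
    re.VERBOSE,
)


def parse_model(text):
    """Return the named groups of the first model-like string in text, or None."""
    match = MODEL_REGEX.search(text)
    return match.groupdict() if match else None


print(parse_model("Quoted coil: HG30924D145B1205AP, qty 2"))
# {'series': 'H', 'paint': 'G', 'capacity': '30924', 'cabinet': 'D',
#  'width': '145', 'height': 'B', 'options': '1205AP'}

print(parse_model("no model number here"))
# None
```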

Wider Applications

I used named capture groups in this way to parse many other model series from ADP, extracting these nomenclature features and using features in the dictionary to look up other features in reference tables. Seamlessly transitioning from a regex match to a dictionary with named keys for each part is super nice!
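The lookup step can be sketched roughly like this. Everything here is hypothetical: the `enrich` helper, the table layout, and the descriptions are stand-ins, since the real reference tables are not shown in this post.

```python
def enrich(parsed, reference_tables):
    """Add a '<field>_description' entry for every field that has a lookup table."""
    enriched = dict(parsed)
    for field, table in reference_tables.items():
        code = parsed.get(field)
        enriched[f"{field}_description"] = table.get(code, "unknown")
    return enriched


# A dictionary shaped like groupdict() output (placeholder field names).
parsed = {"series": "H", "paint": "G", "width": "145"}

# Hypothetical reference tables keyed by field name, then by code.
tables = {
    "paint": {"G": "example paint description"},
    "width": {"145": "example width description"},
}

print(enrich(parsed, tables))
# {'series': 'H', 'paint': 'G', 'width': '145',
#  'paint_description': 'example paint description',
#  'width_description': 'example width description'}
```

Because the keys of the parsed dictionary are the group names, adding a new lookup is a matter of adding one more table under the matching name.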