I’m a UNIX command line guy, so it shouldn’t come as a surprise that I love regular expressions. This simple language allows one to slice and dice text strings with abandon, greatly simplifying data extraction and replacement tasks. Although it’s been there since 2010, I was thrilled recently to discover that Google Spreadsheets now supports in-cell regular expressions! It’s too bad Microsoft never added this.
What Are Regular Expressions?
Regular expressions (often abbreviated as “regex”) are a way to search for text within a string. Regex implementations are found throughout text-based computing, from common UNIX editors and tools (ed and grep) to Internet-friendly software (Perl, Python, JavaScript, PHP) and even some modern apps (Skype!)
A regular expression is usually implemented inside a simple text command, with the forward slash (“/”) marking the start and end of the expression. Systems then execute a given command wherever the expression is matched.
The most basic expressions are simply some text to match exactly, but things really get powerful when metacharacters and classes are employed. For example, you could search for “abc” by typing “/abc”. But what if you wanted to find any three letters? In Perl, and many similar implementations, /[\w]{3}/ matches any three Unicode letters in a row. Whoa!
Trivia time: Did you know that the popular UNIX tool, grep, gets its name from a regular expression? g/re/p is the command to search by regular expression, “re“, in the editor, ed!
Some Basic Regular Expressions
There are many, many metacharacters, modifiers, and operators in most regular expression implementations. I’ve been using the common Perl set for over 20 years and still don’t have everything memorized! But it’s easy to learn the basics!
- Any plain text string can be used for an exact match. Want to find “Stephen” in a file? Just use “Stephen” as your regular expression!
- You can group characters in square brackets. For example, “[az]” matches “a” or “z”. The system will even assume common sequences, so “[a-z]” matches any letter in the alphabet from “a” through “z”.
- A single period (“.”) matches any character, so “Gr…” matches “Grant”, “Grace”, or “Greta”.
- The caret and dollar-sign are anchors for the beginning and end of line, respectively. So “^Gr[aecnt]{3}$” matches a line containing “Grant”, “Grace”, or “Greta”!
- You can make a match optional with “?” or “*”, which match “none or one” or “any”, respectively. I use “.*” (matching any number of characters) and “.?” (matching none or one character) a lot!
There is much, much more you can do, but this ought to get you started!
Here’s a great Regex reference!
Regex Functions in Google Spreadsheets
With that out of the way, let’s move on to Google Spreadsheets!
As of mid-2010, Google added three Regular Expression functions to Google Docs (now part of Google Drive). These are not compatible with Microsoft Excel (which only supports regex in VBA) or Apple Numbers. But they’re awesome anyway!
- =REGEXEXTRACT(“text”;”regex”) – This is my favorite. It allows you to extract a bit of a string from text using parenthesis. It’s replaced a whole horde of LEFT(), MID() and FIND() in my spreadsheets! =REGEXEXTRACT(A2,” \(K([A-Z]{3})\)$”) will fill a cell with three capital letters after “K” in parenthesis after a space in the last word of a string. Let’s see you do that with MID()! Note: Those forward slashes say “literally a parenthesis character”, as opposed to the parenthesis that marks what I want to extract.
- =REGEXMATCH(“text”;”regex”) – Just like MATCH(), this uses a regular expression to locate the position of text in a cell.
- =REGEXREPLACE(“text”,”regex”,”replacement”) – This is the inverse of REGEXEXTRACT(). It spits out the entire content but with the regular expression matched content replaced. =REGEXREPLACE(A2,”apple”,”orange”) will fill a cell with the same text as A2, but with the apple becoming an orange!
Stephen’s Stance
If you do any sort of spreadsheet work, you’ll find these functions incredibly useful. I know regular expressions can be intimidating, and I hate that they’re not compatible outside Google Spreadsheets, but it’s worth learning these regex functions!