Adventures in Learning Full Stack Web Development

## Refactoring Ruby Regex

2017.11.30

A very long method is not fun to read. Refactoring involves rewriting code to increase efficiency and readability. To demonstrate refactoring, I’ll show you how I started with a long method and refactored it to find a cleaner solution.

Both methods below accomplish the same task: accept an argument of a phone number and return true for valid phone numbers, regardless of format, and return false for invalid phone numbers. Examples of valid numbers include: 1234567890, 123-456-7890, (123)456-7890, 123 456 7890, and (123)4567890.

Code smell:

def valid_phone_number?(phone)
phone.match?(/\d{10}|^\d{3}-\d{3}-\d{4}$|^[(]\d{3}+[)]\d{3}-\d{4}|^\d{3}.\d{3}.\d{4}|^[(]\d{3}[)]\d{7}/) end  Clean code: def valid_phone_number?(phone) phone.match?/$$?\d{3}$$?(\s|-)?\d{3}(\s|-)?\d{4}/ end  ### What was I thinking? I initially approached the problem by creating a separate regular expression pattern for each different version of valid phone number. I enclosed the regex in extraneous parentheses and inserted unnecessary characters, such as “^” and “$”. Each expression was separated by a pipe “|” to represent “or”. Here is the breakdown on the regex for each valid number.

• /\d{10}
• ^\d{3}-\d{3}-\d{4}\$
• ^[(]\d{3}+[)]\d{3}-\d{4}
• ^\d{3}.\d{3}.\d{4}
• ^[(]\d{3}[)]\d{7}

### The refactoring process

To begin refactoring, I found commonalities in the valid phone numbers. All of the numbers include 10 digits. The first 3 digits are sometimes preceded by an open parentheses. The next 3 digits are also sometimes divided by a closed parentheses, a space, or a dash. I created a skeleton number with these separate digit groups. If you test the code below against the valid numbers in Rubular, only 1234567890 will match.

\d{3}\d{4}\d{5}

In some of the valid numbers, the area code might be enclosed in parentheses. Let’s add an expression that allows a character to be present once or not at all “\?”. To be sure we are allowing a parentheses to be present once or not at all, let’s include “$$?” before and “$$?” after the first three digits. If you test the code below against our valid numbers in Rubular, only 1234567890 and (123)4567890 will match.

$$?\d{3}$$?\d{3}\d{4}

What about the numbers with either a space or a dash between groups of digits? In regular expressions, a space can be represented by “\s”. If we want to check for a space or a dash or nothing we can use the following syntax: “(\s|-)?”. Let’s place this syntax between the digit groups. If you test the code below against our valid numbers in Rubular, all valid numbers will match.

$$?\d{3}$$?(\s|-)?\d{3}(\s|-)?\d{4}

Ah, much better.