Refactoring Ruby Regex

Posted by hcarnes on December 1, 2017

A very long method is not fun to read. Refactoring involves rewriting code to increase efficiency and readability. To demonstrate refactoring, I’ll show you how I started with a long method and refactored it to find a cleaner solution.

Both methods below accomplish the same task: accept an argument of a phone number and return true for valid phone numbers, regardless of format, and return false for invalid phone numbers. Examples of valid numbers include: 1234567890, 123-456-7890, (123)456-7890, 123 456 7890, and (123)4567890.

Code smell:

def valid_phone_number?(phone)
  phone.match?(/\d{10}|^\d{3}-\d{3}-\d{4}$|^[(]\d{3}+[)]\d{3}-\d{4}|^\d{3}.\d{3}.\d{4}|^[(]\d{3}[)]\d{7}/)
end

Clean code:

def valid_phone_number?(phone)
  phone.match?/\(?\d{3}\)?(\s|-)?\d{3}(\s|-)?\d{4}/
end

What was I thinking?

I initially approached the problem by creating a separate regular expression pattern for each different version of valid phone number. I enclosed the regex in extraneous parentheses and inserted unnecessary characters, such as “^” and “$”. Each expression was separated by a pipe “|” to represent “or”. Here is the breakdown on the regex for each valid number.

  • /\d{10}
  • ^\d{3}-\d{3}-\d{4}$
  • ^[(]\d{3}+[)]\d{3}-\d{4}
  • ^\d{3}.\d{3}.\d{4}
  • ^[(]\d{3}[)]\d{7}

The refactoring process

To begin refactoring, I found commonalities in the valid phone numbers. All of the numbers include 10 digits. The first 3 digits are sometimes preceded by an open parentheses. The next 3 digits are also sometimes divided by a closed parentheses, a space, or a dash. I created a skeleton number with these separate digit groups. If you test the code below against the valid numbers in Rubular, only 1234567890 will match.

\d{3}\d{4}\d{5}

In some of the valid numbers, the area code might be enclosed in parentheses. Let’s add an expression that allows a character to be present once or not at all “\?”. To be sure we are allowing a parentheses to be present once or not at all, let’s include “\(?” before and “\)?” after the first three digits. If you test the code below against our valid numbers in Rubular, only 1234567890 and (123)4567890 will match.

\(?\d{3}\)?\d{3}\d{4}

What about the numbers with either a space or a dash between groups of digits? In regular expressions, a space can be represented by “\s”. If we want to check for a space or a dash or nothing we can use the following syntax: “(\s|-)?”. Let’s place this syntax between the digit groups. If you test the code below against our valid numbers in Rubular, all valid numbers will match.

\(?\d{3}\)?(\s|-)?\d{3}(\s|-)?\d{4}

Ah, much better.