Adventures in Learning Full Stack Web Development

SQL GROUP BY using JavaScript

2019.06.25

This post explores how PostgreSQL’s GROUP BY clause and JavaScript’s reduce method both have the ability to help you group objects/rows of data based on properties. I’m assuming you are already familiar with both GROUP BY and reduce, and that you are also (mildly) interested in gaining a deeper understanding of each.

PostgreSQL’s GROUP BY clause

According the PostgreSQL documentation, “the GROUP BY clause is used to group together those rows in a table that have the same values in all the columns listed…The effect is to combine each set of rows having common values into one group row that represents all rows in the group.”

Here is a refresher on what that this looks like in PostreSQL (Source: PostgreSQL tutorial):

SELECT column_1, aggregate_function(column_2)
FROM tbl_name
GROUP BY column_1;

Let’s say we have a cities table that includes a list of cities with the properties of name and state, and we want to aggregate those cities by state.

A list of United States cities and their states.
Name State
San Antonio TX
Atlanta GA
Austin TX
New York City NY
Houston TX

The following query will group together those rows in the table that have the same value in the state property.

SELECT state
FROM cities
GROUP BY state;

Below is a visual representation of the result of the query. You can see that GROUP BY in this case acts as SELECT DISTINCT. In other words, GROUP BY removed all the duplicate states in our table.

A list of all unique states in the cities table.
State
GA
NY
TX

Getting insights from the data

Aggregate functions operate on the groups created by GROUP BY to help you answer interesting questions about your data. Below lists some of the aggregate functions: (Source: PostgreSQL Documentation).

  • AVG() – return the average value.
  • COUNT() – return the number of values.
  • MAX() – return the maximum value.
  • MIN() – return the minimum value.
  • SUM() – return the sum of all or distinct values.

Let’s answer some questions about our data using GROUP BY and aggregate functions!

How many cities are in each state?

SELECT state, COUNT(state)
FROM cities
GROUP BY state;
Each state along with the number of cities in that state.
State Count
GA 1
NY 1
TX 3

Which state has the most cities?

SELECT state, COUNT(state)
FROM cities
GROUP BY state
ORDER BY count DESC
LIMIT 1
The state with the most cities.
State Count
TX 3

JavaScript’s reduce method

What if you’re working in JavaScript and have an array of objects you need to group by a particular property? Well, let’s extend the example above by assuming we have an array of staff location objects, and each object has the property of name and city and state. JavaScript’s reduce method is one way to approach the problem.

According to the the MDN documentation, “the reduce() method executes a reducer function (that you provide) on each element of the array, resulting in a single output value.

You can write a function that takes 2 parameters: the array of objects and the properties that you would like to group the object by. The properties will represent the “bucket” that you put your staffLocations in based on their state.

The reduce method below takes the following arguments:

  • accumulator - This stores the return values created each time the callback function is invoked. This is returned when the method is complete (assuming the array passed in is not empty, in which case the initial value is returned).
  • object - This is the current object being manipulated in the array.
  • callback - This is the function you want to execute on each object in the array.
  • initialValue - The first time the reducer function runs, this will be the accumulator value. Below, the initialValue is {}.
const staffLocations = [
  { name: "Hannah", city: 'Houston', state: 'GA' },
  { name: "Ilhan",  city: 'Atlanta', state: 'GA' },
  { name: "Preet",  city: 'Houston', state: 'TX' },
  { name: "Adam",  city: 'Austin', state: 'TX' },
  { name: "Preston", city: 'New York City', state: 'NY' },
  { name: "Anna", city: 'Houston', state: 'TX' },
  { name: "Jakub",  city: 'Atlanta', state: 'GA' },
];

const groupBy = (objectArray, ...properties) => {
  return [...Object.values(objectArray.reduce((accumulator, object) => {
    const key = JSON.stringify(properties.map((x) => object[x] || null));

    if (!accumulator[key]) {
      accumulator[key] = [];
    }
    accumulator[key].push(object);
    return accumulator;
  }, {}))];
}

const groupedStaffLocations = groupBy(staffLocations, 'state');

groupedStaffLocations

groupedStaffLocations looks like:

[
  [
    { name: "Preet", city: "Houston", state: "TX" },
    { name: "Adam", city: "Austin", state: "TX" },
    { name: "Anna", city: "Houston", state: "TX" },
  ],
  [
    { name: "Hannah", city: "Houston", state: "GA" },
    { name: "Ilhan", city: "Atlanta", state: "GA" },
    { name: "Jakub", city: "Atlanta", state: "GA" },
  ],
  [
    { name: "Preston", city: "New York City", state: "NY" },
  ]
]

The callback steps include the following:

  • Read the values of the grouping properties, and store them in key. This symbolizes the group
  • If the accumulator doesn’t have an existing group for the values in key, create a new group
  • Put the object in the group

Source: MDN: Reduce: Grouping objects by property

Getting insights from the data

After you reduce data to buckets of information with key value stores, you can map the same data to answer interesting questions, like the question we answered above: “Which state has the most cities?".

const groupedCities = groupBy(cities, 'state');

// sort by length of array
let sortedArr = groupedCities.sort((a, b) => b.length - a.length);
// get the state of the first array, which would have the greatest length
sortedArr[0][0]['state'];

// returns:
// "TX"

Multiple properties

This function also supports grouping by multiple properties, so it works like GROUP BY in SQL:

const cityGroupedStaffLocations = groupBy(staffLocations, 'state', 'city');

In this case, cityGroupedStaffLocations returns groups representing staff that live in the same city:

[
  [
    { name: 'Hannah', city: 'Houston', state: 'GA' },
  ],
  [
    { name: 'Ilhan', city: 'Atlanta', state: 'GA' },
    { name: 'Jakub', city: 'Atlanta', state: 'GA' },
  ],
  [
    { name: 'Preet', city: 'Houston', state: 'TX' },
    { name: 'Anna', city: 'Houston', state: 'TX' },
  ],
  [
    { name: 'Adam', city: 'Austin', state: 'TX' },
  ],
  [
    { name: 'Preston', city: 'New York City', state: 'NY' },
  ]
]

This can be easily paired up with map to get the number of staff in each city:

cityGroupedStaffLocations.map(cityStaff => ({location: `${cityStaff[0].city}, ${cityStaff[0].state}`, numberOfStaff: cityStaff.length}))

returning:

[
  { location: 'Houston, GA', numberOfStaff: 1 },
  { location: 'Atlanta, GA', numberOfStaff: 2 },
  { location: 'Houston, TX', numberOfStaff: 2 },
  { location: 'Austin, TX', numberOfStaff: 1 },
  { location: 'New York City, NY', numberOfStaff: 1 },
]

JSON.stringify???

const key = JSON.stringify(properties.flatMap((x) => object[x] || null));

When reading through the groupBy method, did you notice that the key was JSON? In order to ensure that multiple grouping properties can be passed into the function (state, name, or city), key had to be an array of the corresponding values. In JavaScript, Objects can only use strings and symbols as keys. Transforming the group (key) to JSON allows us to cheat JavaScript’s lack of deep structural equality by using simple JSON string comparison. When the values in the group convert to the same JSON, they’ll be considered part of the same group. While this probably hurts performance, it’s the most succinct way I’ve found with vanilla JavaScript to use arrays as keys.