Using GROUP_CONCAT in SQL to see multiple field values

I came across this SQL command that came in quite useful. It’s not that advanced of a query and I probably should’ve learned it long ago, but it’s been a game changer in a recent project.

Say you have a data set like this in test.data:

| User_ID | Name         |
|--------:|--------------|
|     001 | Bob S.       |
|     001 | Bobby Smith  |
|     001 | Robert Smith |
|     002 | Joe Adams    |
|     002 | J. Adams     |

You want to know how many records each person has, but their names are not consistent. You don’t want to just get a count of the User_ID, because you won’t know who they are, and you want to capture all the different ways the name is spelled, but you don’t want a different record for each different name.

SELECT
  User_ID,
    GROUP_CONCAT(Name) as Names,
    count(*) as Records
FROM test.data
GROUP BY User_ID

You get a result like this:

| User_ID | Names                           | Records |
|--------:|---------------------------------|---------|
|     001 | Bob S.,Bobby Smith,Robert Smith | 3       |
|     002 | Joe Adams,J. Adams              | 2       |

That was syntax from MySQL using MySQL Workbench. I’ve also used this with Google BigQuery, but I could also set the separator in a nicer way:

GROUP_CONCAT(unique(Name), '; ') as Names

That gave me a nice, semi-colon separated list of Names.

Previous: Csvkit data pipelines with Bash in Jupyter Notebooks

Next: One cool trick to inspect live objects in your Django app with a standard Python function