Wednesday, October 28, 2015

Concatenating in T-SQL without abusing XML

Sometimes when you're writing an aggregate query in T-SQL you want to concatenate strings in a group together, instead of applying an operation like min, max, or count. The common suggestion is to use FOR XML PATH in a magical way that concatenates for a reason that no one understands (unless you actually take the time to learn the T-SQL XML functionality, which I highly recommend).

I'm going to demonstrate an alternate approach that involves the use of a PIVOT query. In order to use this method, you have to define the maximum number of values that will be combined. This may or may not be a drawback. (In my experience, it's actually a useful sanity check.) If you do it this way, you have a lot more flexibility in join delimiters and the syntax is more straightforward.

-- Prepare example tables. @result is an ordinary data table, and @msg is a
-- related table that stores comments for the records in @result.
DECLARE @result TABLE
    (record_id int IDENTITY(1, 1) PRIMARY KEY,
    col1 int, col2 int);
 
DECLARE @msg TABLE
    (msg_id int IDENTITY(1, 1) PRIMARY KEY,
    record_id int NOT NULL,
    msg nvarchar(300) NOT NULL);
 
INSERT INTO @result (col1, col2)
VALUES (1, 2), (2, 4), (3, 6), (4, 8);
 
INSERT INTO @msg (record_id, msg)
VALUES
    (2, N'first message'),
    (2, N'second message'),
    (4, N'another message');
 
-- Use pivoting to concatenate messages for the records together into a
-- single column. A CTE is used to create a temporary result set that joins
-- @result and @msg and includes a column that indicates the message
-- sequence. A pivot query flattens the result set by concatenating the
-- messages together.
WITH q AS
    (SELECT [@result].record_id, [@result].col1, [@result].col2,
        [@msg].msg,
        'Msg' + CONVERT(varchar, ROW_NUMBER() OVER
            (PARTITION BY [@msg].record_id
            ORDER BY [@msg].msg_id)) AS rn
    FROM @result
    LEFT JOIN @msg
    ON [@result].record_id = [@msg].record_id)
SELECT record_id, col1, col2,
    COALESCE(Msg1, '') +
    COALESCE(', ' + Msg2, '') +
    COALESCE(', ' + Msg3, '') AS msgs
FROM q
PIVOT (MAX(msg) FOR rn IN (Msg1, Msg2, Msg3)) p;
 
-- Results:
-- +-----------+------+------+-------------------------------+
-- | record_id | col1 | col2 | msgs                          |
-- +-----------+------+------+-------------------------------+
-- | 1         | 1    | 2    |                               |
-- | 2         | 2    | 4    | first message, second message |
-- | 3         | 3    | 6    |                               |
-- | 4         | 4    | 8    | another message               |
-- +---------------------------------------------------------+

No comments:

Post a Comment