Thursday, May 19, 2016

What's the fastest way to concatenate worksheet cells into a string?

I often store string resources (e.g. SQL queries, XML documents) in Excel worksheets. This makes the strings easy to update during development and avoids hassles associated with hard-coding the strings in the VBA code, or using external text files which must be distributed with the workbook.

In order to use the strings in the code, a function is needed that concatenates worksheet cells and returns a string. To keep things simple, I'm going to assume the worksheet only contains data in the first column. The most naive way to write this function is to concatenate in a loop, like this.

Function GetText(ws As Worksheet) As String
    Dim CellData As Range
    Dim Result As String
    Dim i As Long

    Set CellData = Intersect(ws.[A:A], ws.UsedRange)
    For i = 1 To CellData.Cells.Count
        Result = Result & CellData(i) & vbCrLf
    Next

    GetText = Result
End Function

The performance of this method is unacceptable even for a relatively small number of cells, with one call taking up to several minutes to run. At each iteration of the loop, the previous string is copied into a new allocated string and then left as garbage. As the loop continues, the intermediate string gets longer, more data has to be copied, and more and more garbage is generated. This is a textbook instance of a Schlemiel the painter's algorithm.

A better method is to copy the cells into an array and use the Join function to generate the final result.

Function GetText(ws As Worksheet) As String
    Dim CellData As Range
    Dim Result() As String
    Dim i As Long

    Set CellData = Intersect(ws.[A:A], ws.UsedRange)
    ReDim Result(1 To CellData.Cells.Count)
    For i = 1 To UBound(result)
        Result(i) = CellData(i)
    Next

    GetText = Join(Result, vbCrLf)
End Function

Performance is much better, but you'll notice that even this version takes several seconds to process a few thousand lines of data. It would be better if we could just concatenate the cells directly without having to employ an intermediate array. Luckily, it's possible to do this using the WorksheetFunction.Transpose function, which is smart enough to generate the 1-dimensional array we need for Join when a single column 2-dimensonal array is provided as input.

Function GetText(ws As Worksheet) As String
    Dim CellData As Range

    Set CellData = Intersect(ws.[A:A], ws.UsedRange)
    GetText = Join(WorksheetFunction.Transpose(CellData), vbCrLf)
End Function
This function will run in just a second or two even with 10s of thousands of lines of data. It's always worth it to keep an eye out for a better method even when you think you've already optimized your solution.

No comments:

Post a Comment