Monday, May 25, 2015

Simple profiling in VBA

When you are optimizing for performance, the best approach is concentrate on the code where the most time is being spent. Even if you have a lot of experience, it's usually nearly impossible to guess where these hotspots are, and there's no reason to even attempt that when it's so easy to determine exactly what needs to be optimized by profiling your code.

VBA, of course, doesn't have a built in profiler. But it's really simple to roll your own, since fundamentally the only thing a profiler does is store the elapsed time between events in a data structure.

The simple implementation I made uses a static dictionary variable to store all its data. This is mostly for performance reasons: lookups in a dictionary are very fast. You don't want any part of the profiling code to run slowly, because that would greatly distort your results. You could also do this with a module-level variable, but I felt like having most of the implementation in a single function simplified the API and didn't sacrifice too much readability.

To use this profiler, add a call to Profile paFunctionEnter, "FunctionNameHere" to the start of the functions and subs that you want to measure, and a call to Profile paFunctionExit to all exit points. Since you have to mark the start and end points yourself, you will probably want to avoid adding profiling to trivial functions, which will just clutter up your output. You also might want to add profiling in the middle of a function; I've adopted the convention of prepending an ampersand to internal paFunctionEnters to distinguish them from actual functions in the output.

Enum ProfileAction
End Enum

Enum ProfileFormat
    pfSelfTime = 0
    pfTotalTime = 1
    pfCallCount = 2
End Enum

Public Function Profile(a As ProfileAction, ParamArray args())
    Static sProf As Object
    Dim stack As String
    Dim elapsed As Single
    Dim calls As Variant
    If a <> paStart And sProf Is Nothing Then Exit Function
    If a = paStart Then
        Set sProf = CreateObject("Scripting.Dictionary")
        stack = args(0)
        sProf("_stack") = stack
        sProf("_t") = Timer
        sProf("_calls:" & stack) = 1
        Exit Function
    ElseIf a = paEnd Then
        Set sProf = Nothing
        Exit Function
    End If
    elapsed = Timer - sProf("_t")
    stack = sProf("_stack")
    Select Case a
        Case paFunctionEnter
            sProf(stack) = sProf(stack) + elapsed
            sProf("_stack") = stack & "/" & args(0)
            sProf("_parent:" & sProf("_stack")) = stack
            stack = sProf("_stack")
            calls = sProf("_calls:" & stack)
            If IsNull(calls) Then
                sProf("_calls:" & stack) = 1
                sProf("_calls:" & stack) = calls + 1
            End If
        Case paFunctionExit
            sProf(stack) = sProf(stack) + elapsed
            stack = sProf("_parent:" & stack)
            sProf("_stack") = stack
        Case paShowResults
            ProfileResults sProf, CLng(args(0))
    End Select
    sProf("_t") = Timer
End Function

Private Sub ProfileResults(prof As Object, fmt As ProfileFormat)
    Dim k As Variant
    Dim k2 As Variant
    Dim fmt_label As String
    For Each k In prof.Keys
        If Left(k, 1) <> "_" Then
            prof("_self:" & k) = prof(k)
            prof("_total:" & k) = 0!
            For Each k2 In prof.Keys
                If Left(k2, Len(k)) = k Then
                    prof("_total:" & k) = prof("_total:" & k) + prof(k2)
                End If
        End If
    fmt_label = Array("_self:", "_total:", "_calls:")(fmt)

    For Each k In prof.Keys
        If Left(k, Len(fmt_label)) = fmt_label Then
            Debug.Print Mid(k, Len(fmt_label) + 1) & ": " & prof(k)
        End If
End Sub

Example usage:

' Add calls to Profile paFunctionEnter and Profile paFunctionExit
' to the functions and subs that you are measuring.
Public Sub ProfilingExample()
    Dim x As Variant
    Profile paStart, "Test"
    x = SlowFunction(arg1, arg2, arg3)
    Profile paFunctionExit ' exit top level "Test" pseudofunction
    ' Profile paShowResults, pfCallCount
    ' Profile paShowResults, pfSelfTime
    Profile paShowResults, pfTotalTime
    Profile paEnd
End Sub

Here's what the output looks like. Since I called paShowResults with pfTotalTime, the values are the total seconds spent in each function. You can also get the number of calls and the "self" time, which doesn't include the time spent in sub-calls that are also profiled.

Test: 1.382813
Test/Filtered: 1.382813
Test/Filtered/GetScratchWorksheet: 0
Test/Filtered/CopyValues: 1.179688
Test/Filtered/CopyValues/FirstRow: 0
Test/Filtered/CopyValues/FirstCol: 0
Test/Filtered/CopyValues/SqueezeRange: 1.179688
Test/Filtered/CopyValues/SqueezeRange/FirstRow: 0
Test/Filtered/CopyValues/SqueezeRange/FirstCol: 0
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper: 1.179688
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea: 1.148438
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/LastCol: 0
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/&RowCollision: 0.6015625
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/&RowCollision/LastCol: 0.234375
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/&RowCollision/LastRow: 0.2578125
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/LastRow: 0
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/&ColumnCollision: 0.515625
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/&ColumnCollision/LastRow: 0.125
Test/Filtered/CopyValues/SqueezeRange/SqueezeRangeHelper/&ShiftArea/&ColumnCollision/LastCol: 0.203125
Test/Filtered/CopyValues/LastRow: 0
Test/Filtered/CopyValues/LastCol: 0

Thursday, May 14, 2015

Convert Excel column number to letters and vice versa

Since I haven't written on here for a while, I thought I'd put up a quick post with some code to convert an Excel column letter reference to a number or the reverse: convert column letters to a number. This is a good candidate for your Excel utility module.

This function uses recursion, but there isn't any risk of stack overflow because there will at most log26(n) recursions, where n is the column number. It also uses the relatively unknown VBA \ operator, which does integer division (rounding down). Input validation is left as an exercise. I don't do any because I only ever call this function with constant values, which are pretty hard to get wrong.

' converts column letter to number or vice versa
Public Function Col(c As Variant) As Variant
    Dim x As Variant
    If VarType(c) = vbLong Or VarType(c) = vbInteger Then
        x = Chr(Asc("A") + (c - 1) Mod 26)
        If c > 26 Then
            Col = Col((c - 1) \ 26) & x
            Col = x
        End If
    ElseIf VarType(c) = vbString Then
        x = Asc(Right(c, 1)) - Asc("A") + 1
        If Len(c) > 1 Then
            Col = x + Col(Left(c, Len(c) - 1)) * 26
            Col = x
        End If
        Col = Null
    End If
End Function

Excel versions 2007 and later support 16,384 columns, so a good test is with column 16,384, which you can verify is XFD by opening a new Workbook and pressing Ctrl+Right.

' Executed in the immediate window


Friday, May 8, 2015

Quickly set an Excel range to a literal array of values

You might know already that the best way to set the values of a Range in Excel is use a Variant array containing the desired values, keeping in mind that the most efficient way to write VBA code is to minimize calls to the Excel API, so, of course, setting the cells one at a time is completely out of the question.

' Method one (BAD): set the cells one at a time.
Range("A1") = "a"
Range("B1") = "b"
Range("C1") = "c"
Range("A2") = 1
Range("B2") = 2
Range("C2") = 3

' Method two (BETTER): set the cells using a Variant array
Dim x(1 To 2, 1 To 3)
x(1, 1) = "a"
x(1, 2) = "b"
x(1, 3) = "c"
x(2, 1) = 1
x(2, 2) = 2
x(2, 3) = 3
Range("A1:C2") = x

Since there (seems to be) no way to create a literal 2D array in VBA, you might think that this slightly verbose style is the only option. But there is an alternative to do the same thing, which takes advantage of Excel array constant notation:

[A1:C2] = [{"a", "b", "c"; 1, 2, 3}]

You can use a similar technique to set formulas for several columns in a single API call.

[OFFSET(D2:E2,0,0,COUNTA(A:A)-1)] = Array( _
    "=VLOOKUP($A2,Sheet2:$A:$C,2,FALSE)", _

The offset formula creates a range starting at D2:E2 and extends the row count using the number of non-blank cells in column A. This won't work so well if each of your table columns may have blank cell gaps, so it's a bit more brittle than the previous trick, but is still a useful technique if you have control over the data that you're processing.

Thursday, May 7, 2015

Excel get autofilter row count

After having this question myself and doing a little bit of searching, I've found that there's a lot of misinformation out there, so I'm going to post a reliable way to get the number of visible records in an Excel autofiltered worksheet.

Public Function GetRecordCount(ws As Worksheet) As Long
    GetRecordCount = ws.AutoFilter.Range.Columns(1).SpecialCells( _
        xlCellTypeVisible).Count - 1
End Function

This is the most common incorrect solution to this problem I've come across:

' This code is incorrect.
row_count = filter.Range.SpecialCells(xlCellTypeVisible).Rows.Count

The Rows property does not work correctly on Ranges that have multiple Areas, such as a range returned from the SpecialCells method. The property only returns the rows in the first Area of the Range, so you will only get a row count of the first contiguous range of the filtered data. E.g., if rows 1, 2, 5, 6, 7, and 8 are visible, the result would be 2, since rows 1 and 2 comprise the first contiguous range.

The other approach that I've seen uses a for loop to count rows. As I've discussed before, in VBA you always want make as few calls to the Excel API as possible. You never want to operate on an Excel worksheet one row at a time when the API exposes a method to get the same result using a single call, because operating one row or cell at a time will kill your macro performance when you are processing a lot of data.

Friday, May 1, 2015

Excel changing data validation based on cell value

In Excel, it doesn't seem like you can use a different type of list validation based on changing values in another column. One common workaround is to just define completely different validation rules for different ranges in the same column. This is a mistake, because if you sort or copy the cells, the data validation does not travel with them and you end up with validation rules being applied to the wrong cells.

There is a trick, though. You need to use a formula to define the range that will give you the list values that the validation applies to. For example, if column B defines which validation rule to apply, and you have the list values for each validation rule in the lookup tab, with each column B value in the range A1:D1, and the possible cell values in a list below each B value, you could use a formula like this as the list validation formula, which you are applying to the entire column A.

Experienced Excel users might be questioning my use of the CHOOSE function here, and wondering why I didn't use OFFSET. Usage of OFFSET (and related functions like INDIRECT) should always be avoided if possible. Since Excel's calculation engine cannot determine the cells that an OFFSET formula depends on, every formula that uses OFFSET will always be recalculated when anything in the workbook changes. This can have a very negative impact on spreadsheet performance. However, if you have a large number of different validation rules, the OFFSET version might be preferable, and would look like this.