Tuesday, April 28, 2015

SSH tunneling trick: using localhost subdomains

Here's a little trick I just discovered which is useful if you do a lot of tunneling over ssh.

You can actually bind your local port tunnels to their own local address, instead of just using the default of localhost.

The first step is to define a localhost subdomain in /etc/hosts. (On Windows, this file is located at %windir%\System32\drivers\etc\hosts). Bind it to a local IP address other than 127.0.0.1 (you have free reign over the last three numbers, as long as the IP starts with 127).

127.0.0.100 nytimes.com.localhost

Next, create your ssh tunnel.

ssh -L 127.0.0.100:80:nytimes.com:80 user@example.com 1

Now try opening http://nytimes.com.localhost in a browser on your local machine. The entire page will not load, because nytimes.com loads images and other assets from domains other than nytimes.com, but enough should load to let you know that the tunnel is working.

This technique is useful for remembering your ssh tunnels if you have a lot of them (like me). It's a lot easier to remember http://nytimes.com.localhost than it is to remember http://localhost:6103. It's also useful if you need to tunnel a program that is finicky about which port it connects to.

If you're having trouble getting this to work, make sure you're not running a local web server that has bound itself to port 80 on all of your 127.* addresses. If you are, either disable your web server, or try this experiment using a different service, such as RDP (port 3389) or SQL Server (port 1433).

1I'm actually using PuTTY, but have translated it to the regular SSH command for wider applicability. The same forwarding will work in PuTTY, even though it looks like it wouldn't because of the narrow local port text field in the settings.

Friday, April 24, 2015

AddToArray VBA function

I'm going to discuss a function to add an element to a VBA Variant array: AddToArray.

In VBA, arrays have a fixed size. If you've already sized an array, while it's possible to resize it to add new items, doing so requires a couple lines of code with complicated indexing operations.

' Add another element to x.
ReDim Preserve x(UBound(x) + 1)
x(UBound(x)) = "d"

Most people might not think that this is terrible in terms of complexity, but it's a common enough operation that I think it would be good to add a Sub to our utility module that adds an element to an array in a generic way.

It's worth noting that resizing an array in this manner has very bad performance if you are working with large arrays, especially compared to alternate data structures that are widely available in other languages. ReDim Preserve will allocate an entirely new array with the new size, copy the old array to the new array, and finally discard the old array. Not only does this make resizing an array an O(n) operation, doing it several times generates a lot of garbage. Because of this, if there's a way to pre-identify the necessary size of an array, it's always preferable to size it to the correct length beforehand instead of adding to it one element at a time.

VBA does offer an efficient, growable collection type (aptly named Collection), but this class has several shortcomings. It has a strange API. It's 1-indexed. It's impossible to modify items that have been stored in a collection without removing and re-adding them. It only can store Variants; there's no type safety. And it's non-standard. Although this isn't so much of an issue if you're chiefly an Excel developer, there is no support for Collection objects in VBScript or VB6, and because of this, standard COM and ActiveX objects will only deal with arrays. Even APIs in Excel in general prefer to use arrays: for example, if you want to assign cell values to multiple cells in a range at once, you'll be doing so with an array.

Here is the AddToArray Sub. I'm not going to go into too much discussion of the details here, as I'm still in the middle of my series on ArraySearch.

Public Sub AddToArray(ByRef av As Variant, v As Variant)
    Dim array_size As Long
    Dim err_info As Variant
    
    If IsArray(av) Then
        ' Normally I hate using error handlers for an ordinary flow of
        ' execution, but there's no other option in VBA to test for an
        ' uninitialized, empty Array when it gets passed to a Sub as a
        ' Variant.
        ' If the parameter had been declared with "av() As Variant",
        ' "IsArray(av) And Not av" would detect uninitialized arrays.
        ' but then only variables explicitly declared as arrays would be
        ' passable without causing a compile-time error.
        
        On Error Resume Next
        array_size = UBound(av)
        err_info = Array(Err.Number, Err.Source, Err.Description)
        On Error GoTo 0
        
        If err_info(0) = 9 Then
            ReDim av(0)
            av(0) = v
        ElseIf err_info(0) <> 0 Then
            ' Re-raise other errors
            Err.Raise err_info(0), err_info(1), err_info(2)
        Else
            ReDim Preserve av(LBound(av) To array_size + 1)
            av(UBound(av)) = v
        End If
    ElseIf IsEmpty(av) Then
        av = Array(v)
    Else
        av = Array(av, v)
    End If
End Sub

And here are the tests for AddToArray.

#If False Then
' basTemp @UnitTests[start]
Private Sub TestAddToArray()
    ' add Long to empty non-array Variant
    Dim x As Variant
    AddToArray x, 1&
    AssertType x, "Variant()"
    AssertEqual Join(x, ","), "1"
    AssertType x(0), "Long"
    
    ' add String to singleton array
    AddToArray x, "Hello"
    AssertEqual Join(x, ","), "1,Hello"
    
    ' test adding problematic variant types to arrays
    AddToArray x, CVErr(1)
    AssertEqual x(2), CVErr(1)
    
    AddToArray x, Null
    AssertIsNull x(3)
    
    AddToArray x, Array(3, 6)
    AssertEqual UBound(x(4)), 1&
    AssertEqual x(4)(1), 6
    
    ' Adding to fixed-length arrays is not supported.
    Dim y(2) As String
    AssertErrStart
    AddToArray y, -1
    AssertError 10 ' This array is fixed or temporarily locked

    ' Test that adding to staticly typed arrays retains the type.
    Dim z() As String
    AddToArray z, "Hello"
    AssertType z, "String()"
    AssertType z(0), "String"
    
    ' Adding to a non-array value should convert the value to a
    ' Variant array containing the original value and new value.
    Dim a As Variant
    a = "Non-array"
    AddToArray a, 3
    AssertType a, "Variant()"
    AssertEqual UBound(a), 1&
    AssertEqual a(0), "Non-array"
    AssertEqual a(1), 3
    
    ' Multidimensional arrays
    Dim b() As Variant
    ReDim b(1, 1)
    
    AssertErrStart
    AddToArray b, 3
    AssertError 9 ' Subscript out of range
End Sub
' basTemp @UnitTests[end]
#End If

Thursday, April 23, 2015

In defense of ScreenUpdating = True

Most people who write a lot of Excel VBA code settle into a pattern of always including this line of code at the beginning of every macro of any significance.

Application.ScreenUpdating = False

This will cause Excel to stop painting the screen, which means that the Excel window will not repaint for a few seconds or minutes while the macro runs.

If you were to ask a VBA programmer why she or he does this, the response you would get would likely be along the lines of "it makes the macro run faster", or possibly "the screen jumps around too much if you let it paint". I don't think either of these reasons holds water, and to get to the real reason, you have to think back to what things were like in the early days of VBA.

VBA has been around for a very, very long time. The first version of Excel to have something resembling modern VBA was Excel 5.0, in 1993. 20 years ago, computer hardware was very different than what it is now. Even with the best optimization, most computers could not play sophisticated games or video in a window larger than a postage stamp. Windows was different back then, too: the video code prioritized flexibility and abstraction at the expense of performance. Combine these two factors, and it's easy to see why ScreenUpdating = False came into vogue: in 1993, repainting the screen was indeed an enormous performance hit, even if your macro wasn't making changes that caused big repaints.

Today, on the other hand, is very different. And I think there are good reasons to let the screen repaint: your macro won't look like it has crashed, your users will be able to see the macro doing its work, and, as a programmer, watching progress of your code on the screen can be an invaluable tool for finding out if something is going wrong or if an improvement can be made.

If you actually are hitting a performance bottleneck by leaving screen repaints on, 99% of the time, it's good indication that your macro is doing something inefficiently. Every action you take in Excel should affect as many cells as possible. You should avoid doing little changes "one at a time" and concentrate on making just a few big changes all at once.

For example, instead of setting the fill color of a column one cell at a time in a loop, get a Range object for all the cells that you need to change and set the fill color for the range, just once.

Or, instead of deleting one row at a time when the cell in a certain column matches a certain value, sort the table so that the rows to be deleted are all together, and delete them all at once. That will convert hundreds or even thousands of steps into just two. You will notice a huge speed boost, even if you compare the old macro with screen updating off to the new macro with it on.

In my opinion, the only reason to use ScreenUpdating = False is when you are manipulating a sheet with a lot of embedded shapes and charts and your macro is causing a distracting flicker. In every other situation, if you are running into a performance problem, take a careful look at your code and make sure that you're not doing something in a stupidly inefficient way.

Monday, April 20, 2015

ArraySearch for VBA: Part 2

Last time, we implemented a simple VBA function to search an array for a value, returning a boolean whether it was found or not, and the index where it was found in an optional parameter.
Public Function ArraySearch(a As Variant, x As Variant, _
                            Optional ByRef found_index As Variant) As Boolean
    Dim i As Long
    Dim result As Boolean
 
    result = False
    For i = LBound(a) To UBound(a)
        If a(i) = x Then
            result = True
            found_index = i
            Exit For
        End If
    Next
 
    ArraySearch = result
End Function

Unfortunately, this implementation failed two of our original unit tests.

Private Sub TestArraySearchSpecial()
    Dim avar As Variant
      
    ' ut6 searching non-array
    AssertFalse ArraySearch("two", 2)
     
    ' ut7 searching for array within array
    avar = Array(1, Array(2, 3), 4, 5)
    Assert ArraySearch(avar, Array(2, 3))
End Sub

Ut6 exposes a simple bug. We have to add an early test that the parameter a is really an array before attempting to index it.

' Add this code to the beginning of ArraySearch.
If Not IsArray(a) Then Err.Raise 13, , "Parameter a is not an array."
Ut7 is more complicated. The VBA equality operator = does not support comparing two arrays. In order to compare two arrays for equality, the array elements would need to be compared, one at a time. Such a comparison would also need to be recursive, since there's nothing stopping the arrays from containing sub-array elements themselves.

We have to step back and evaluate whether the benefit outweighs the complexity. The goal is to write a generalized array search algorithm, and if the function is to be as general as possible, supporting searching for a sub-array seems like it's something that we would want to do. On the other hand, in my experience, this is not a scenario that comes up very often. I rarely get to the point where I have a data structure complicated enough to involve arrays within arrays. Before things have gotten to that level of complexity, I either have moved the data into a SQL database, or have developed a properly encapsulated data model using user-defined types and classes. Arrays in VBA are best used to store a homogeneous list of data or (in a pinch) to define a small, limited-use data structure, similar to a tuple in Python. Implementing a complex data model using arrays, sub-arrays, and perhaps a Dictionary or two will quickly lead you to an unmaintainable mess of spaghetti code.

Using that reasoning, I've made the decision that I will not support searching for sub-arrays. We still should check for sub-arrays in the input so that this edge case is properly accounted for. This test can be combined with the earlier check.

' Add this code to the beginning of ArraySearch.
If Not IsArray(a) Then
    Err.Raise 13, , "Parameter a is not an array."
ElseIf IsArray(x) Then
    Err.Raise 13, , "Searching for a sub-array is not supported."
End If

While we're at it, let's update the unit tests to accommodate our revised design.

Private Sub TestArraySearchSpecial()
    Dim avar As Variant
    Dim result As Boolean
      
    ' ut6 searching non-array
    On Error Resume Next
    result = ArraySearch("two", 2)
    AssertEqual Err.Number, 13
    Err.Clear
    On Error GoTo 0
     
    ' ut7 searching for array within array
    avar = Array(1, Array(2, 3), 4, 5)
    On Error Resume Next
    result = ArraySearch(avar, Array(2, 3))
    AssertEqual Err.Number, 13
    Err.Clear
    On Error GoTo 0
End Sub

We're not done yet, though. ArraySearch still has problems. A final form will be presented in tomorrow's post, but before that, here are some more failing unit tests. As an exercise, see if you can guess what causes these tests to fail and how the code will need to be updated to deal with them.

Private Sub TestArraySearchTypes()
    ' ut8 Search for vbObject
    Assert ArraySearch(Array(1, 2, 3, ThisWorkbook), ThisWorkbook)

    ' ut9 Searches involving Empty, Null, and vbObjects
    Assert ArraySearch(Array(1, 2, Empty, Sheets(1), Null, 4), 4)
    AssertFalse ArraySearch(Array(1, 2, Empty, vbNullString), Null)
    
    ' ut10 ParamArray searches
    TestArraySearchParamArray 1, "a", Null, Sheets(1), "arg1"
End Sub

Private Sub TestArraySearchParamArray(ParamArray params() As Variant)
    Assert ArraySearch(params, "arg1")
    AssertFalse ArraySearch(params, vbNullString)
End Sub

Private Sub TestArraySearchCollection()
    Dim c As New Collection

    ' ut11 Support for Collections? Why not?
    c.Add(1)
    c.Add("b")
    c.Add(Null)
    c.Add("cdef")
    Assert ArraySearch(c, "cdef")
    Assert ArraySearch(c, 1)
    AssertFalse ArraySearch(c, Empty)

Friday, April 17, 2015

A robust implementation of ArraySearch for VBA: Part 1

VBA doesn't include a built-in function for searching an Array. Because of this, after doing work in VBA for a while, you'll probably have written several loops like this.

Dim i As Long
Dim found As Boolean

For i = 0 To UBound(a)
    If a(i) = v Then
        found = True
        Exit For
    End If
Next

If found Then
    Debug.Print "Value was found at index " & i
Else
    Debug.Print "Value is missing"
End If

You might think that this is quite a bit of code for what should be a simple operation. A huge element of good programming style is to eliminate repeated logic and abstract it into separate procedures. VBA, with its out of date, weirdly designed standard library, provides us with several opportunities to practice this aspect of software design.

I think that most VBA programmers should put together their own "utility" module that they include in any non-trivial projects to eliminate the standard, boilerplate code that they find themselves writing often. As you might have guessed by now, this post is going to focus on writing a standard ArraySearch function that would fit perfectly in such a utility module.

First attempt at ArraySearch

This is our first attempt. We're going to try just converting the code in the snippet above into a Function.

' Search the array [a] for value [x], returning whether x was found.
' If [found_index] is provided and [x] is found, [found_index] receives the
' index where [x] was found.
Public Function ArraySearch(a As Variant, x As Variant, _
                            Optional ByRef found_index As Variant) As Boolean
    Dim i As Long
    Dim result As Boolean

    result = False
    For i = LBound(a) To UBound(a)
        If a(i) = x Then
            result = True
            found_index = i
            Exit For
        End If
    Next

    ArraySearch = result
End Function

I changed the indexing from the prior version to go from LBound to UBound in case someone passes in an array with a base other than zero, which is possible when declaring an array or by using the Option Base directive. Other than that, it's pretty similar.

As an exercise, try reading this function carefully and make a prediction of what kinds of input it will fail to process.

Let's do some unit testing, with a focus on VBA's many different possible kinds of arrays.

Private Sub TestArraySearchFixed()
    Dim afixed(4) As String
    Dim found As Long
    
    afixed(0) = "alpha"
    afixed(1) = "bravo"
    afixed(2) = "1"
    afixed(3) = "two"
    afixed(4) = "3"
    
    ' ut1 successful/unsuccessful searches on fixed array
    ' unsuccessful search should preserve found_index
    Assert ArraySearch(afixed, "two", found)
    AssertEqual found, 3&
    AssertFalse ArraySearch(afixed, 2, found)
    AssertEqual found, 3&
End Sub

Private Sub TestArraySearchDynamic()
    Dim adynamic() As Long

    ReDim adynamic(3)
    adynamic(0) = 78503
    adynamic(1) = 65075
    adynamic(2) = 21738
    adynamic(3) = 43044
    
    ' ut2 unsuccessful/sucessful search on dynamic array
    AssertFalse ArraySearch(adynamic, 44444)
    Assert ArraySearch(adynamic, 78503)
End Sub

Private Sub TestArraySearchSpecial()
    Dim abase(1 To 2) As Variant
    Dim asingleton(0) As Integer
    Dim aempty As Variant
    Dim avar As Variant

    abase(1) = "abc"
    abase(2) = 123&
    
    asingleton(0) = -1
    
    aempty = Array()
    
    ' ut3 search on a singleton array (found and not found)
    Assert ArraySearch(asingleton, -1)
    AssertFalse ArraySearch(asingleton, -2)
    
    ' ut4 search on empty array
    AssertFalse ArraySearch(aempty, vbNullString)
    
    ' ut5 search on homogeneous variant array (found and not found)
    avar = Array(1, 2, 3, 4)
    Assert ArraySearch2(avar, 3)
    AssertFalse ArraySearch(avar, 0)
    
    ' ut6 searching non-array
    AssertFalse ArraySearch("two", 2)
    
    ' ut7 searching for array within array
    avar = Array(1, Array(2, 3), 4, 5)
    Assert ArraySearch(avar, Array(2, 3))
End Sub

Unit tests 1 through 6 pass okay, but we run into problems on the last two.

Why did the tests ut6 and ut7 fail? How are we going to fix the function so that the tests pass? Could there be other problems with our ArraySearch function that we haven't even tested for yet? Find on Monday, in the next blog post!

Converting Variant types in Excel VBA

As part of an Excel project, I needed to write some code that converted Variants in a reliable way.

The project involved calling a stored procedure that took parameters of many different types which were provided via cells in a spreadsheet. An Excel spreadsheet cell can contain values in a variety of different data types. When you access a cell value with VBA code, it comes to wrapped in a Variant type. Essentially, I needed to be able to convert data from a very forgiving type system (spreadsheet cells) to a very strictly typed system (a stored procedure parameter in a SQL Server database). And since the data was being supplied by non-technical end-users, I had to be sure that unusual Variant types didn't cause issues with data integrity or summon the dreaded VBA error dialog box. For example, the Variant type vbError, which is used by error cell values such as #NA or #NAME, is infamous for causing problems with VBA macros. You'll see why this is in a future post when I discuss Variant comparisons.

To solve the problem, my approach was to write a function that takes a Variant type (that could be any sub-type) and returns a Variant of the desired sub-type, converting it if possible, choosing a logical substitution value if not, and only raising an error when there is no other option.

In preparation, I wrote some code to build a Variant type conversion chart, which shows whether a subtype can be converted to any other subtype, and any errors that may be generated when the conversion is attempted.

Most of the results in this chart are obvious. It's to be expected that just about every type can be converted to a string or that numbers will sometimes overflow when converting to narrower numeric types. Some results are surprising, such as the inability to convert a vbError to a vbDate or a vbDecimal, even though it can be converted to just about every other type.

The conversions for vbEmpty and vbNull are instructive. The Empty value in VBA represents the default value, so it makes sense that it can be converted into anything else. When you convert Empty to a subtype, it turns into the default value for that subtype: for example, an empty string or the integer 0. The Null value, on the other hand, is intended by VB's designers to be an "intentionally missing" value, and so it's a mistake to coerce it into a data containing type. Thankfully, a spreadsheet cell cannot contain a null value. Null values behave similarly as in SQL in comparisons, and Null is considered Falsey in a VBA If statement.

Here is the variant conversion function that I ended up writing. Remember, the intention was to convert a variant value from any subtype to a defined variant subtype, and to be as forgiving as possible in the process.

Const CustomError = 2742

' Convert Variant [x] to subtype [rtype], returning [if_null] if conversion is
' impossible, otherwise returns a converted Variant value.
' Additional parameters are values that should be treated as null.
Public Function VConvert(rtype As VbVarType, _
                         x As Variant, _
                         if_null As Variant, _
                         ParamArray nullable_values() As Variant) As Variant
    Dim result As Variant

    ' Raise an error if provided with an unsupported type.
    Select Case rtype
        Case vbEmpty, vbNull, vbInteger, vbLong, vbSingle, vbDouble, _
            vbCurrency, vbDate, vbString, vbError, vbBoolean, vbVariant, _
            vbDecimal, vbByte
            ' Guard against Case Else if a valid type is provided.

        Case vbObject, vbDataObject, vbLongLong, vbUserDefinedType, vbArray
            Err.Raise CustomError + 1, , "Invalid result type: " & rtype

        Case Else
            Err.Raise CustomError + 1, , "Unknown result type: " & rtype
    End Select
    
    If VarType(x) = vbString Then
        result = Trim(x)
    Else
        result = x
    End If
    
    If ArraySearch(nullable_values, x) Then
        result = if_null
    ElseIf IsEmpty(result) Or IsNull(result) Or result = vbNullString Then
        result = if_null
    Else
        ' Attempt conversion and return if_null if an error is raised.
        On Error GoTo ErrHandler
        Select Case rtype
            Case vbEmpty: result = Empty
            Case vbNull: result = Null
            Case vbInteger: result = CInt(result)
            Case vbLong: result = CLng(result)
            Case vbSingle: result = CSng(result)
            Case vbDouble: result = CDbl(result)
            Case vbCurrency: result = CCur(result)
            
            Case vbDate
                If IsDate(result) Then
                    result = DateValue(CDate(result))
                Else
                    result = if_null
                End If
            
            Case vbString: result = CStr(result)
            Case vbError: result = CVErr(result)
            Case vbBoolean: result = CBool(result)
            Case vbVariant: result = CVar(result)
            Case vbDecimal: result = CDec(result)
            Case vbByte: result = CByte(result)
        End Select
        On Error GoTo 0
    End If
        
    VConvert = result
    Exit Function

ErrHandler:
    VConvert = if_null
End Function

And here are the unit tests for this function.

Private Sub TestVConvert()
    Dim vDate As Variant
    Dim vResult As Variant
    
    vDate = DateSerial(1982, 5, 22) + TimeSerial(5, 30, 0)
    
    ' Test time truncation
    AssertEquals VConvert(vbDate, vDate, Null), #5/22/1982#
    
    ' Test conversion of vbDate to vbString
    AssertEquals VConvert(vbString, vDate, Null), "5/22/1982 5:30:00 AM"
    
    ' Test nullable ParamArray
    AssertIsNull VConvert(vbString, "Hello", Null, "a", "b", "c", "Hello", "d")
    
    ' Test conversion that causes overflow
    AssertEquals VConvert(vbInteger, 999999, -1), -1
    
    ' Test that the overflow Err was cleared
    AssertEquals Err.Number, 0&
    
    ' Test vbLong to vbInteger regular conversion
    AssertEquals VConvert(vbInteger, 100&, -1, 99.9), 100
    
    ' Test string trimming during conversion
    AssertEquals VConvert(vbString, "   abcd  ", Null), "abcd"
    
    ' Test number conversion with untrimmed string
    AssertEquals VConvert(vbDecimal, "   42.25 ", Null), CDec(42.25)
    
    ' Test that conversions retain type of is_null parameter
    AssertType VConvert(vbDate, "Not a date", 42&), "Long"
    
    ' Test that Nulls get converted to is_null
    AssertEquals VConvert(vbInteger, Null, -1), -1
    
    ' Test that Emptys get converted to is_null
    AssertEquals VConvert(vbLong, Empty, 50&), CLng(50)
    
    ' Test that vbNullStrings get converted to is_null
    AssertEquals VConvert(vbString, "", "[empty]"), "[empty]"
    
    ' Test that converting to invalid type raises error
    On Error Resume Next
    vResult = VConvert(vbObject, vStr, Null)
    AssertEquals Err.Number, CustomError + 1
    Err.Clear
    On Error GoTo 0
End Sub

You may have noticed a call to an unimplemented function called "ArraySearch". VBA provides no built-in function for searching an array, so a robust implementation should be a part of any serious VBA programmer's toolbox. Implementing ArraySearch in a way that intelligently handles heterogeneous arrays of multiple subtypes without producing errors or incorrect result is not trivial and will be a topic for a future post.

Thursday, April 16, 2015

New blog

I'm going to be posting here about my programming work, which is mostly in Excel VBA and T-SQL these days. I might occasionally write about Android development, but most likely this will become one of the abandoned blogs that you see that have three posts from 5 years ago and were never touched again.