Wednesday, August 19, 2015

Merging data into an MS Access table

One of my favorite commands in T-SQL is the MERGE statement, which allows you to merge a table expression into a table, INSERTing records if they don't exist, or UPDATEing records when they do in a single command. Unfortunately, it's a little verbose, since you still have to write the full INSERT and UPDATE statements that you would be using instead of the MERGE statement.

If you're using MS Access, you don't have MERGE. If you want to merge data into an MS Access table, you need to write the INSERT and UPDATE statements separately, and you'll have to qualify your INSERT statement in some way to prevent INSERTing keys that already exist. There are two alternatives that have some big drawbacks:

  1. You could run a DELETE beforehand to get rid of the keys that you're inserting; then you would just have to write the INSERT statement. This will cause problems if you have defined relationships on the table you're merging into: the DELETE will either fail or cascade in an inappropriate way.
  2. You could INSERT just the keys that are missing and then run the UPDATE statement to fill in the rest of the columns. This requires you to define all the non-key columns as nullable. Nullable columns cause all kinds of issues, so I prefer to avoid them unless NULLs are truly required.

I don't think there's a pure SQL solution to this problem, but there is a solution if you use ADO to merge the records. ADO Recordsets support a Seek operation, which will efficiently locate a record when you provide a key. Merging can be achieved by using Seek to search for each key you are merging, and if the key is not found, adding a new record.

I've written up an implementation of this technique in Python. Before you can use Seek, you must specify the unique index name you are using in the Index property, and you must use a server-side cursor. By default, MS Access will name a table's primary key "PrimaryKey", so this code assumes that the key you are using has this name.

import logging, os.path
from win32com.client.gencache import EnsureDispatch
from win32com.client import constants as c

log = logging.getLogger(__name__)

class AccessDb(object):
    def __init__(self, dbpath):
        self.dbpath = os.path.abspath(dbpath)

    def __enter__(self):
        self.db = EnsureDispatch('ADODB.Connection')
                     'Data source=%s' % self.dbpath)
        return self

    def __exit__(self, etyp, einst, etb):

    def bulkload(self, table, data, key=None):
        """Bulk merge data into a table.

            table (str): Name of table where data is to be loaded.
            data: Data rows to be merged in (sequence of sequences).
            key (int sequence): List of indexes of key columns. If not
                provided, all rows will be appended.
        rs = EnsureDispatch('ADODB.Recordset')
        if key:
            rs.Index = 'PrimaryKey'
            rs.CursorLocation = c.adUseServer

        rs.Open(table, self.db, c.adOpenKeyset, c.adLockOptimistic,
            for record in data:
                if key:
                    rs.Seek([record[i] for i in key], c.adSeekFirstEQ)
                    if rs.EOF:

                for index, field in enumerate(record):
                    rs.Fields(index).Value = field


            if rs.EditMode == c.adEditAdd:


Example usage:

# TestTable is defined as:
#     (id int PRIMARY KEY,
#      c1 VARCHAR(5),
#      c2 VARCHAR(5));
with AccessDb('test.accdb') as db:
                [(1, 'a', 'b'),
                 (2, 'c', 'd'),
                 (3, 'e', 'f')], key=[0])

No comments:

Post a Comment