One of my favorite commands in T-SQL is the MERGE
statement, which allows you to merge a table expression into a table, INSERT
ing records if they don't exist, or UPDATE
ing records when they do in a single command. Unfortunately, it's a little verbose, since you still have to write the full INSERT
and UPDATE
statements that you would be using instead of the MERGE
statement.
If you're using MS Access, you don't have MERGE
. If you want to merge data into an MS Access table, you need to write the INSERT
and UPDATE
statements separately, and you'll have to qualify your INSERT
statement in some way to prevent INSERT
ing keys that already exist. There are two alternatives that have some big drawbacks:
- You could run a
DELETE
beforehand to get rid of the keys that you're inserting; then you would just have to write the INSERT
statement. This will cause problems if you have defined relationships on the table you're merging into: the DELETE
will either fail or cascade in an inappropriate way.
- You could
INSERT
just the keys that are missing and then run the UPDATE
statement to fill in the rest of the columns. This requires you to define all the non-key columns as nullable. Nullable columns cause all kinds of issues, so I prefer to avoid them unless NULL
s are truly required.
I don't think there's a pure SQL solution to this problem, but there is a solution if you use ADO to merge the records. ADO Recordset
s support a Seek
operation, which will efficiently locate a record when you provide a key. Merging can be achieved by using Seek
to search for each key you are merging, and if the key is not found, adding a new record.
I've written up an implementation of this technique in Python. Before you can use Seek
, you must specify the unique index name you are using in the Index
property, and you must use a server-side cursor. By default, MS Access will name a table's primary key "PrimaryKey", so this code assumes that the key you are using has this name.
import logging, os.path
from win32com.client.gencache import EnsureDispatch
from win32com.client import constants as c
log = logging.getLogger(__name__)
class AccessDb(object):
def __init__(self, dbpath):
self.dbpath = os.path.abspath(dbpath)
def __enter__(self):
self.db = EnsureDispatch('ADODB.Connection')
self.db.Open('Provider=Microsoft.ACE.OLEDB.12.0;'
'Data source=%s' % self.dbpath)
return self
def __exit__(self, etyp, einst, etb):
self.db.Close()
def bulkload(self, table, data, key=None):
"""Bulk merge data into a table.
Args:
table (str): Name of table where data is to be loaded.
data: Data rows to be merged in (sequence of sequences).
key (int sequence): List of indexes of key columns. If not
provided, all rows will be appended.
"""
rs = EnsureDispatch('ADODB.Recordset')
if key:
rs.Index = 'PrimaryKey'
rs.CursorLocation = c.adUseServer
rs.Open(table, self.db, c.adOpenKeyset, c.adLockOptimistic,
c.adCmdTableDirect)
try:
for record in data:
if key:
rs.Seek([record[i] for i in key], c.adSeekFirstEQ)
if rs.EOF:
rs.AddNew()
else:
rs.AddNew()
for index, field in enumerate(record):
rs.Fields(index).Value = field
rs.Update()
finally:
if rs.EditMode == c.adEditAdd:
log.info(record)
rs.CancelUpdate()
rs.Close()
Example usage:
# TestTable is defined as:
# CREATE TABLE TestTable
# (id int PRIMARY KEY,
# c1 VARCHAR(5),
# c2 VARCHAR(5));
with AccessDb('test.accdb') as db:
db.bulkload('TestTable',
[(1, 'a', 'b'),
(2, 'c', 'd'),
(3, 'e', 'f')], key=[0])