While T-SQL admittedly isn't the greatest language there is for text processing, you shouldn't dismiss it out of hand for simple tasks.
To remove non-digits from a column, you can use PATINDEX
and STUFF
in a loop. If you're careful to do this in a set-based way, T-SQL should be efficient enough in most situations.
WHILE 1 = 1 BEGIN WITH q AS (SELECT table_id, PATINDEX('%[^0-9.]%', column) AS n FROM table) UPDATE table SET column = STUFF(column, q.n, 1, '') FROM q WHERE table_id = q.table_id AND q.n != 0; IF @@ROWCOUNT = 0 BREAK; END;
This code should be easy enough to follow. It removes the first nondigit in column in each row of the table. Once it goes through an iteration without finding any nondigits, it exits the loop. This will iterate the maximum number of nondigits in a single field in the table, but it's still inefficient because it runs PATINDEX
against each row in the table at each iteration.
To solve that problem, we can maintain a column that specifies whether the column has been scrubbed yet. In my example, "NULL" represents unknown and should be the default value for this field.
DECLARE @done bit = 0; WHILE @done = 0 BEGIN WITH q AS (SELECT table_id, PATINDEX('%[^0-9.]%', column) AS n FROM table WHERE COALESCE(column_scrubbed, 0) = 0) UPDATE table SET column = STUFF(column, q.n, 1, N''), column_scrubbed = 0 FROM q WHERE table_id = q.table_id AND q.n != 0; IF @@ROWCOUNT = 0 SET @done = 1; -- if column_scrubbed is still NULL, then the PATINDEX -- must have given 0 UPDATE table SET column_scrubbed = CASE WHEN column_scrubbed IS NULL THEN 1 ELSE NULLIF(column_scrubbed, 0) END; END;
It's a sloppy design to add a column to the schema just to make this operation more efficient. The best approach, in my opinion, is to store intermediate results in a table-valued variable and apply it to the actual table at the end.
-- Find nondigit character in a string DECLARE @pattern varchar(30) = '%[^0-9.]%'; -- Maps [table_id]s to scrubbed values DECLARE @t TABLE (table_id int PRIMARY KEY, column nvarchar(200) NOT NULL, scrubbed bit NOT NULL); -- Initialize current value of [column] for each [table_id] INSERT INTO @t (table_id, column, scrubbed) SELECT table_id, column, 0 FROM table; -- Iterate to remove non-digits from @t.[column], ending when [scrubbed] -- is set for all rows. WHILE @@ROWCOUNT > 0 MERGE INTO @t USING (SELECT table_id, PATINDEX(@pattern, column) AS n FROM @t WHERE scrubbed = 0) q ON [@t].table_id = q.table_id WHEN MATCHED AND n > 0 THEN UPDATE SET column = STUFF(column, q.n, 1, N'') WHEN MATCHED AND n = 0 THEN UPDATE SET scrubbed = 1; -- Apply the results to permanent table UPDATE table SET column = (SELECT [@t].column FROM @t WHERE [@t].table_id = table.table_id);
No comments:
Post a Comment