Monday, August 11, 2008

Big Update

Big Update
Filed under: Performance, Tuning — Jonathan Lewis @ 8:34 pm UTC Jul 28,2008

A recent post on the OTN forum asked:

I was wondering is there any fast method for updating 8 million records out of 10 million table? For eg :
I am having a customer table of 10m records and columns are cust_id, cust_num and cust_name.
i need to update 8m records out of 10m customer table as follows.
update customer set cust_id=46 where cust_id=75;
The above statement will update 8m records. And cust_id is indexed.

If you read the replies to this post you will see that there are a number of variations on a theme that might be appropriate. Ultimately your final choice comes down to a trade-off between the size of the maintenance window, your desire to avoid complexity, and where your most serious bottlenecks are.

Option 1: Just update the column - probably very slow with lots of undo and redo, and with a massive impact on other processes because of their need to work hard to achieve read-consistency. But this is possibly the ONLY guaranteed correct update if you aren’t allowed to block access to the table for the duration. Given the particular nature of this update, you might want to rebuild (online) or coalesce the critical index afterwards

Option 2: Drop the index on cust_id, update the column, then rebuild the index. You need to be a little careful that you do rebuild the index; and any other sessions code may run into serious problems while the index is out of place - so this solution (like all those that follow) is really an “offline” solution. This is likely to save you a lot of undo and redo (from updating the index - an index update requires roughly double the resources of a table update) and could save you quite a lot of random reads and writes as the index is updated. This is quite clean and simple, and could be pretty quick - especially if you have enough resources to do things like parallel updates and multiple concurrent index rebuilds.

Option 3: Do a CTAS (create table as select) with a decode to change the value of the cust_id as you make the copy. This could be a messy job if the table is in the middle of a number of referential integrity constraints. You also have the expense of rebuilding all the indexes after you’ve created the table - but you could do everything nologging (so long as you remember to do a backup of the relevant tablespaces afterwards). After the CTAS, you have to run through a series of renames (and / or drops). So this needs some care. Even in the simplest case (i.e. no referential integrity to worry about) you have the added expense of all the tablescans to create new indexes - though the impact of this could, perhaps, be minimised if you can build all the indexes at the same time.

Option 4: Create an empty copy of the table with its indexes in place, and run an insert with the /*+ append */ hint, possibly with the nologging option. The benefit of this is that Oracle optimises the index update when using /*+ append */. Instead of updating the index row by row as the data arrives, it accumulates all the key entries, then when the table is loaded, sorts each set of index entries in turn and does a type of array update into the index. This adds to the demand for memory, but minimises undo, redo, random I/O and index leaf block splits. (Oracle does this without the /*+ append */ hint for bitmap indexes - it is only b-tree indexes that require you to use the hint to enable this optimisation). The hint also eliminates undo for the table; and nologging can be used to eliminate redo for the table - but again, if you load the table using nologging then you need to backup the tablespace afterwards. Finally, of course, you don’t have to scan the table once for each index that you rebuild - all the data for all the indexes is accumulated (in memory) as the table is populated.

So you’ve got lots of choices - and if you’ve got a full size development or test system, you can even test the different strategies to see which one works best for your environment, your data, and your set of indexes.

No comments: