Batch Updating Data in Production Databases


Part of my recent work has revolved around SQL, and I ran into a few gotchas, so I'm leaving myself a note here.

Background

A single complex UPDATE touching hundreds of thousands of rows caused problems in production, so we switched to generating many smaller UPDATE statements, each covering a fixed-size chunk of primary keys.

Script

Approach outline and code:

/**
 * @description
 *
 * 1. Query the target primary keys with SQL and export them as CSV
 * 2. Create a template for a single UPDATE statement
 * 3. Run this JS file to generate the batched SQL
 * 4. Execute the generated SQL on the target machine
 *
 * Note: tweak CHUNK_SIZE to control how many records are updated per statement
 */
const csv = require('csv-parser');
const fs = require('fs');
const ids = [];
const CHUNK_SIZE = 10;
// Input CSV file
const INPUT_FILE = 'quote_ids.csv';
// Output SQL file to generate
const OUTPUT_FILE = 'repair_data.sql';
const createSQLTemplate = (idArr) => `update \`a-prod\`.quote a left join \`b-prod\`.geographic_unit b
    on a.sales_org = b.sales_org and a.sales_office = b.sales_office and a.business_unit = b.business_unit and
       a.country = b.country_iso_code
set a.geographic_unit_id=b.id
where a.id in ('${idArr.join('\',\'')}');\n\n\n`;
fs.createReadStream(INPUT_FILE)
    .pipe(csv())
    .on('data', (row) => {
        // Collect primary keys from the CSV's `id` column
        ids.push(row.id);
    })
    .on('end', () => {
        // Split the collected ids into groups of CHUNK_SIZE
        const chunkIds = ids.reduce((res, item, index) => {
            const group = Math.floor(index / CHUNK_SIZE);
            if (res[group] === undefined) {
                res[group] = [];
            }
            res[group].push(item);
            return res;
        }, []);

        fs.writeFile(OUTPUT_FILE, chunkIds.map(item => createSQLTemplate(item)).join(''), (err) => {
            if (err) throw err;
            console.log('write success');
        });
    });

Generating many small statements this way avoids one gigantic UPDATE that stresses MySQL.
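The chunking step in the script can be pulled out as a pure helper, which makes it easy to sanity-check before wiring it into the CSV pipeline. This is a standalone sketch; `chunk` and the sample values are illustrative names, not part of the original script:

```javascript
// Split an array into consecutive groups of at most `size` elements,
// mirroring the reduce-based chunking used in the script above.
const chunk = (arr, size) =>
    arr.reduce((res, item, index) => {
        const group = Math.floor(index / size);
        if (res[group] === undefined) res[group] = [];
        res[group].push(item);
        return res;
    }, []);

console.log(chunk(['a', 'b', 'c', 'd', 'e'], 2));
// → [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

The last chunk is simply shorter when the ids don't divide evenly, which is fine for an `IN (...)` list.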

Notes

  • Even with only ~200k rows, queries can be slow if they retrieve large TEXT columns; select only the columns you need and avoid SELECT *. Multi‑table joins slow things down further.
  • For UPDATEs, always add a WHERE clause to limit the scope unless you are absolutely certain you want to touch every row.
  • Needing a large-scale data fix in production usually points to a bug or questionable schema/design; fix the root cause as well.
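The second note above can be turned into a cheap pre-flight check on the generated file: refuse to execute any statement that lacks a WHERE clause. A minimal sketch, assuming the statements are already split into an array; `hasWhereClause` and `assertAllScoped` are hypothetical helpers, not part of the original script:

```javascript
// Rough check: does the statement contain a standalone WHERE keyword?
const hasWhereClause = (sql) => /\bwhere\b/i.test(sql);

// Throw before anything reaches the database if any statement is unscoped.
const assertAllScoped = (statements) => {
    const unscoped = statements.filter((s) => !hasWhereClause(s));
    if (unscoped.length > 0) {
        throw new Error(`${unscoped.length} statement(s) missing a WHERE clause`);
    }
    return statements;
};

// Example: the second statement would be rejected.
// assertAllScoped([
//     "update t set x = 1 where id in ('a');",
//     'update t set x = 1;',
// ]);
```

A regex check like this is crude (it won't catch a WHERE clause that matches too many rows), but it costs nothing and catches the worst mistake: an unscoped UPDATE.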