Magento 2 Data Migration
The original video this piece is based on.
This is two videos welded into one. I recorded the first back in 2021, midway through dragging a Magento 1.9 store up to Magento 2.4 and swearing at my screen. The second came a year later and covers the import tool I reach for every time a client hands me a spreadsheet of products. The version numbers have aged. The problems have not. Getting data into and out of Magento cleanly is still most of the actual work, and it is still where projects quietly go wrong.
So here is what I have learned, split the way the job really splits: the big one-off platform migration, and the everyday grind of moving product data around.
The big one: Magento 1 to Magento 2
Magento ships an official data migration tool, and it does the heavy lifting of moving products, customers, orders and settings from a Magento 1 store into a Magento 2 one. It works. But it has opinions, and if you do not learn them up front it will hand you a store that looks fine right up until a client clicks the wrong product.
First rule I settled on: always migrate into a fresh, blank Magento 2 install, and keep a saved copy of that blank database. Migrations crash. When they do, you want to drop the database and start clean rather than try to resume. The tool has a reset flag, and in theory running without it resumes from wherever the last run stopped. In practice I had patchy luck with that, so now I make my fixes up front, restore the blank snapshot, and run it again from scratch. It takes longer, and it is a lot easier to trust.
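The restore-then-rerun loop is easy to script so you are not retyping it after every crash. This is a minimal Python sketch built on a few assumptions. The Magento 2 database is MySQL, the blank install was saved as a plain SQL dump, and the mysql client picks up credentials from its own option file. The migration itself is the Data Migration Tool's `migrate:data` command with its `--reset` option. The function name, database name and paths are hypothetical placeholders. By default it only returns the commands and does not execute them.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# A step is a command plus an optional file to feed it on stdin.
Step = Tuple[List[str], Optional[str]]


def rerun_from_blank(
    magento_root: str,
    db_name: str,
    blank_snapshot: str,
    migration_config: str,
    dry_run: bool = True,
) -> List[Step]:
    """Drop the half-migrated Magento 2 database, restore the saved blank
    install, then run the data migration again from the beginning.

    All names and paths are placeholders; the mysql client is assumed to
    read credentials from its own option file."""
    magento = str(Path(magento_root) / "bin" / "magento")
    steps: List[Step] = [
        # Throw away whatever the crashed run left behind.
        (["mysql", "-e", f"DROP DATABASE IF EXISTS `{db_name}`; CREATE DATABASE `{db_name}`;"], None),
        # Load the blank-install dump back in via stdin.
        (["mysql", db_name], blank_snapshot),
        # --reset starts from scratch instead of resuming a previous run.
        ([magento, "migrate:data", "--reset", migration_config], None),
    ]
    if not dry_run:
        for cmd, stdin_path in steps:
            if stdin_path is None:
                subprocess.run(cmd, check=True)
            else:
                with open(stdin_path, "rb") as stdin_file:
                    subprocess.run(cmd, stdin=stdin_file, check=True)
    return steps
```

Call it with the default `dry_run=True` first and read the commands. Switch to `dry_run=False` only once you are sure it points at the throwaway Magento 2 database and not at anything you care about.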
The colour trap that cost me an afternoon
This is the bit nobody warns you about. Magento 2 comes with a default attribute called color, spelled the American way. On a blank install that attribute exists, but it is disabled and not assigned to the default attribute set. The manufacturer attribute is in the same state.
Here is why that matters. If your Magento 1 store used colour as the attribute that links configurable parents to their child products, the migration brings all the simple products across but quietly fails to rebuild that parent-child relationship. Size came over fine for me. Colour did not. The only difference was that size was enabled and assigned to the default set, and colour was not. The migration handles attributes that are properly part of an attribute set, and skips the configurable link for one that is left unassigned.
The fix is dull and it works: enable colour and assign it to the default attribute set before you run the migration. Then the super-attribute relationships link up the way they should. If your store leans on some custom attribute for those configurable relationships, check the same thing for that attribute too.
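A quick pre-flight check can catch this before a run. This is a minimal Python sketch, assuming you can run read-only SQL against both databases. The first query lists the attributes Magento 1 uses as configurable (super) attributes. The second lists what sits in the Magento 2 "Default" product attribute set. The helper then reports the gap. The table names follow the standard EAV schema as assumed here, so check them against your own install. The helper name is hypothetical. The check only covers set assignment, not whether the attribute is enabled. It also assumes the attribute codes are the same on both sides, so it supports the manual fix rather than replacing it.

```python
from typing import Iterable, List, Set

# Run against the Magento 1 source: attribute codes used as super attributes.
SOURCE_SUPER_ATTRIBUTES_SQL = """
SELECT DISTINCT a.attribute_code
FROM catalog_product_super_attribute sa
JOIN eav_attribute a ON a.attribute_id = sa.attribute_id
"""

# Run against the Magento 2 destination: codes assigned to the Default product set.
DEFAULT_SET_ATTRIBUTES_SQL = """
SELECT a.attribute_code
FROM eav_entity_attribute ea
JOIN eav_attribute a ON a.attribute_id = ea.attribute_id
JOIN eav_attribute_set s ON s.attribute_set_id = ea.attribute_set_id
JOIN eav_entity_type t ON t.entity_type_id = s.entity_type_id
WHERE t.entity_type_code = 'catalog_product'
  AND s.attribute_set_name = 'Default'
"""


def unassigned_super_attributes(
    super_attribute_codes: Iterable[str],
    default_set_codes: Iterable[str],
) -> List[str]:
    """Return the super attributes missing from the destination's Default
    attribute set, sorted so the report is stable between runs."""
    in_set: Set[str] = set(default_set_codes)
    return sorted({code for code in super_attribute_codes if code not in in_set})
```

An empty list means every attribute your configurables rely on is at least assigned. Anything returned needs assigning (and enabling) before the migration runs.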
Settings, deltas, and not exposing your live database
A few more things I wish I had known earlier:
- Dump your settings once. The tool can migrate settings, but I export them into config.php and commit that file, so the store rebuilds its configuration from code. After that I rarely need to pull settings from the old store.
- Skip the junk. My migration kept choking on the reviews table, which was full of spam. The client did not need the old reviews, so I left them behind rather than fight an import that was never going to come clean. Know what you are allowed to drop.
- Understand delta migration. The tool can run in delta mode, which watches the source for changes and keeps the new store in sync. It only works against a live database: it drops markers into the source to track what it has already moved, and the changes it exists to catch only happen on the live store. So my pattern is to run the bulk data migration against a snapshot during the build, then hook delta up to the live store a couple of days before go-live to catch the last orders and customers. I do not want my development process pointed at a live production database for the six to twelve weeks of a build. And you cannot just keep swapping in a fresh copy of the database either, because you lose the markers the original migration set.
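The phases of that pattern can be written down as commands, which makes it harder to point the wrong phase at the wrong database. This is a sketch with several assumptions. It uses two Data Migration Tool config files that differ only in which source database they name. It relies on the tool's `migrate:settings`, `migrate:data` and `migrate:delta` commands, plus Magento's `app:config:dump` for writing settings into config.php. The config file names and the helper name are hypothetical. The point is that delta is the only phase wired to the live source.

```python
from typing import Dict, List


def migration_phase_commands(
    phase: str,
    snapshot_config: str,
    live_config: str,
    magento: str = "bin/magento",
) -> List[List[str]]:
    """Return the commands for one phase of the migration.

    'settings' and 'data' read from the snapshot; 'delta' reads from the
    live source, because it depends on markers left in that database."""
    plans: Dict[str, List[List[str]]] = {
        # Migrate settings once, then dump them so config.php carries them in code.
        "settings": [
            [magento, "migrate:settings", snapshot_config],
            [magento, "app:config:dump"],
        ],
        # Bulk data during the build, always from a clean start.
        "data": [[magento, "migrate:data", "--reset", snapshot_config]],
        # Only in the days before go-live: follow new orders and customers.
        "delta": [[magento, "migrate:delta", live_config]],
    }
    if phase not in plans:
        raise ValueError(f"unknown phase {phase!r}; expected one of {sorted(plans)}")
    return plans[phase]
```

Keeping the live config path out of every phase except delta means a typo in a build script cannot quietly aim the bulk migration at production.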
The everyday one: importing product data
Most data work is not a once-a-project platform migration. It is a client emailing you a spreadsheet of two thousand products and asking you to get them in by Friday. Magento has a native importer and it is fine for the simple cases, but the moment you hit configurable products it turns nasty. The additional-attributes string syntax for configurables is genuinely horrible, and native import stops dead on the first error and sends you digging through log files to find out what broke.
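For reference, this is the shape of those strings in the native product CSV, as assumed here. A child's `additional_attributes` cell holds comma-separated `code=value` pairs. The parent's `configurable_variations` cell lists each child as `sku=...` followed by its super-attribute values, with a pipe between children. Compare the output against an export from your own store before trusting it. The helper names and example SKUs are hypothetical. The sketch refuses any value that contains a separator rather than guessing at escaping.

```python
from typing import Dict, List

# Characters that carry structure in these cells.
_SEPARATORS = (",", "=", "|")


def _pair(code: str, value: str) -> str:
    """Format one code=value pair, rejecting anything that would be misparsed."""
    for token in (code, value):
        if any(sep in token for sep in _SEPARATORS):
            raise ValueError(f"{token!r} contains a separator; fix the data first")
    return f"{code}={value}"


def additional_attributes(attrs: Dict[str, str]) -> str:
    """Child row cell, e.g. 'color=Red,size=M'."""
    return ",".join(_pair(code, value) for code, value in attrs.items())


def configurable_variations(children: List[Dict[str, str]], super_codes: List[str]) -> str:
    """Parent row cell: one 'sku=...,code=value' group per child, joined by '|'."""
    groups = []
    for child in children:
        pairs = [_pair("sku", child["sku"])]
        pairs += [_pair(code, child[code]) for code in super_codes]
        groups.append(",".join(pairs))
    return "|".join(groups)
```

Generating these cells from structured data, rather than typing them into the spreadsheet, is what keeps the syntax from biting.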
For years I have used a paid third-party extension for this instead, Firebear's Improved Import and Export. I am not on commission and it is not free, so judge it on the work. Still, the features it has that the native importer lacks have saved me more hours than I can count:
- Import straight from a Google Sheet URL. No emailing spreadsheets back and forth, no stale copies floating around. The client and I work in one sheet and I run the import against it.
- A real error console. It shows you the failing rows as it runs instead of silently halting. When you are pushing twenty thousand products in, that feedback is the difference between an afternoon and a week.
- Import behaviour you can control. Add-and-update only, which is safe on production because you are never deleting anything. Clear old categories, strip old images, create URL rewrites. Real switches for real jobs.
- Category creation from the spreadsheet. You build category paths with separators: a slash between levels, and a comma to put a product into more than one category. I have rebuilt an entire category tree from a sheet in minutes, the kind of restructure that would otherwise be an afternoon of dragging and dropping in the admin.
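A small helper keeps those category cells consistent when you generate them rather than type them. This sketch assumes the separators work exactly as described above, with a slash between levels and a comma between paths. If an import job is configured with different separators, change them here. The function name and the "Default Category" root in the example are illustrative.

```python
from typing import Sequence


def category_cell(paths: Sequence[Sequence[str]]) -> str:
    """Join category paths for one product: '/' between levels, ',' between paths."""
    cells = []
    for path in paths:
        for name in path:
            # A separator inside a name would silently create the wrong tree.
            if "/" in name or "," in name:
                raise ValueError(f"category name {name!r} contains a separator")
        cells.append("/".join(name.strip() for name in path))
    return ",".join(cells)
```

Rejecting names that contain a separator is deliberate: a stray slash in a category name would otherwise create an extra level without any warning.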
One gotcha bites everyone, native importer or not: set your image columns properly. If you do not specify base, small and thumbnail images, Magento shows its placeholder and you are left wondering whether your index is broken. I once spelled an image column wrong on a big import and lost every category image, then spent ages blaming the indexer. It was the column name.
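A header check before the import would have caught that misspelled column. This is a minimal Python sketch, assuming the native column names `base_image`, `small_image` and `thumbnail_image`. It reports every missing column at once and flags likely misspellings using `difflib` from the standard library. The function name is hypothetical. It checks the header only, so it will not catch rows whose image cells are empty.

```python
import difflib
from typing import List, Sequence

# The three image roles the text says must be set to avoid the placeholder.
REQUIRED_IMAGE_COLUMNS = ("base_image", "small_image", "thumbnail_image")


def check_image_columns(header: Sequence[str]) -> List[str]:
    """Return one message per missing image column, instead of stopping
    at the first problem. Near-miss headers are named as likely typos."""
    present = sorted({h.strip() for h in header})
    problems = []
    for column in REQUIRED_IMAGE_COLUMNS:
        if column in present:
            continue
        close = difflib.get_close_matches(column, present, n=1, cutoff=0.8)
        hint = f" (found {close[0]!r}, possibly a misspelling)" if close else ""
        problems.append(f"missing column {column!r}{hint}")
    return problems
```

Run it on the first row of the sheet before every big import; an empty list is the only result that should let the import go ahead.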
Why this matters more once you go headless
Here is the thread that ties both halves together. The whole reason a migration hurts is that the data was locked into one platform's shape, with its own attribute quirks and its own import format. When I moved my own work to a headless setup, getting data in and out cleanly stopped being an afterthought and became the point. If you can export your catalogue to a plain, structured format and read it back without needing a special extension, you are not held hostage by any one platform's import tool.
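As one illustration of a plain, structured format, here is a JSON Lines round trip in Python, with one product per line so that any tool can read it. JSON Lines was chosen for this sketch and is not a prescription. The field names and helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator


def export_catalogue(products: Iterable[Dict], path: str) -> int:
    """Write one JSON object per line; return how many products were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for product in products:
            out.write(json.dumps(product, ensure_ascii=False, sort_keys=True) + "\n")
            count += 1
    return count


def read_catalogue(path: str) -> Iterator[Dict]:
    """Yield products back, skipping blank lines."""
    with open(path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                yield json.loads(line)
```

Nothing here depends on Magento. That independence is the property that matters when the next platform comes along.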
You will still do migrations. You will still get the Friday spreadsheet. But the lesson under all of it is the same. Own your data in a form you can move, and the next migration is a job rather than a crisis.