The EAV schema is why most Magento migrations fail.
It looks manageable from the outside: products stored across catalog_product_entity, catalog_product_entity_varchar, catalog_product_entity_int, catalog_product_entity_decimal, catalog_product_entity_datetime, and catalog_product_entity_text. Six tables, straightforward ETL job, done in a weekend.
Then you discover that attribute_id = 75 means “product name” in your Magento instance and “color” in your staging instance. Every attribute ID is generated at install time and differs between environments. Any ETL script that hardcodes attribute IDs will produce corrupted data in production.
Answer-first: Migrate EAV by (1) building a magento_id_map table that translates Magento integer IDs to UUIDs, (2) using eav_attribute.attribute_code (not attribute_id) as the stable lookup key, (3) applying a dynamic SQL pivot that reads attribute definitions at runtime rather than hardcoding column mappings, and (4) running the extraction in three phases: initial full load → incremental delta → cutover validation. The SQL queries that make this work are in this article.
1. The EAV Data Model
A Magento product with SKU MSH-BLK-L (Men’s Black Shirt, Large) is stored like this:
catalog_product_entity (the entity table):
entity_id | sku | entity_type_id | attribute_set_id
2841 | MSH-BLK-L | 4 | 9
catalog_product_entity_varchar (string attributes):
entity_id | attribute_id | store_id | value
2841 | 75 | 0 | Men's Black Shirt
2841 | 76 | 0 | MSH-BLK-L
2841 | 97 | 0 | Men's Shirts
catalog_product_entity_int (integer attributes):
entity_id | attribute_id | store_id | value
2841 | 80 | 0 | 8 ← status: 1=Enabled, 2=Disabled
2841 | 81 | 0 | 4 ← visibility
2841 | 134 | 0 | 2 ← tax_class_id
catalog_product_entity_decimal (decimal attributes):
entity_id | attribute_id | store_id | value
2841 | 77 | 0 | 299000.0000 ← price
2841 | 78 | 0 | 0.0000 ← special_price
The problem: attribute_id = 75 means “name” in this database. In your staging database (set up from a fresh Magento install), attribute_id = 75 might mean “color”. Attribute IDs are instance-specific, generated by AUTO_INCREMENT during Magento installation.
2. The Trap: Hardcoded Attribute IDs
The naive migration approach, copied from countless Stack Overflow answers:
-- ❌ WRONG: Hardcoded attribute IDs break in production
SELECT
e.entity_id,
e.sku,
v_name.value AS name,
d_price.value AS price
FROM catalog_product_entity e
LEFT JOIN catalog_product_entity_varchar v_name
ON v_name.entity_id = e.entity_id AND v_name.attribute_id = 75 -- "name" hardcoded!
LEFT JOIN catalog_product_entity_decimal d_price
ON d_price.entity_id = e.entity_id AND d_price.attribute_id = 77 -- "price" hardcoded!
WHERE v_name.store_id = 0
This query will produce correct results in your development environment, possibly incorrect results in staging, and silently wrong results in production if anyone ran a Magento re-index or added attributes in a different order.
3. The Fix: Lookup by Attribute Code
-- ✅ CORRECT: Resolve attribute IDs dynamically from eav_attribute
SELECT attr.attribute_id
FROM eav_attribute attr
JOIN eav_entity_type et ON et.entity_type_id = attr.entity_type_id
WHERE et.entity_type_code = 'catalog_product'
AND attr.attribute_code = 'name'; -- 'name' is stable, 'attribute_id' is not
The attribute_code column is stable across instances. name, price, sku, status, visibility — these codes are defined by Magento core and don’t change between installs. Always look up attribute IDs from eav_attribute at runtime. Never hardcode them.
4. The magento_id_map: Integer → UUID Translation
Every record in Magento uses MySQL AUTO_INCREMENT integer primary keys. Every record in the new microservice platform uses UUIDs. The migration creates a cross-reference table as the first step:
-- Migration step 1: Create the identity map
CREATE TABLE magento_id_map (
id BIGSERIAL PRIMARY KEY,
entity_type VARCHAR(64) NOT NULL, -- 'product', 'customer', 'order'
magento_id BIGINT NOT NULL, -- Original Magento AUTO_INCREMENT ID
platform_uuid UUID NOT NULL DEFAULT gen_random_uuid(),
migrated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(entity_type, magento_id)
);
-- Populate for all products
INSERT INTO magento_id_map (entity_type, magento_id)
SELECT 'product', entity_id
FROM catalog_product_entity
ON CONFLICT (entity_type, magento_id) DO NOTHING;
-- Verify: every product has a UUID
SELECT COUNT(*) AS total_products,
COUNT(platform_uuid) AS mapped_products
FROM magento_id_map
WHERE entity_type = 'product';
-- Expected: total_products = mapped_products
After this step, the Debezium CDC sync service (described in Part 6) uses magento_id_map to translate every entity_id in CDC events to its corresponding UUID before publishing to Dapr PubSub.
5. Dynamic SQL Pivot: Reading at Runtime
With attribute IDs resolved dynamically, the extraction query becomes a pivot that works on any Magento instance:
-- EAV extraction with dynamic attribute ID resolution
WITH attr_ids AS (
-- Resolve all attribute codes to IDs for this Magento instance
SELECT
attribute_id,
attribute_code,
backend_type -- 'varchar', 'int', 'decimal', 'datetime', 'text'
FROM eav_attribute
WHERE entity_type_id = (
SELECT entity_type_id FROM eav_entity_type
WHERE entity_type_code = 'catalog_product'
)
AND attribute_code IN ('name', 'price', 'special_price', 'status',
'visibility', 'description', 'short_description',
'tax_class_id', 'weight', 'manufacturer')
)
SELECT
e.entity_id AS magento_id,
idmap.platform_uuid AS id,
e.sku,
e.attribute_set_id,
MAX(CASE WHEN a.attribute_code = 'name'
THEN v.value END) AS name,
MAX(CASE WHEN a.attribute_code = 'description'
THEN t.value END) AS description,
MAX(CASE WHEN a.attribute_code = 'status'
THEN i.value END) AS status,
MAX(CASE WHEN a.attribute_code = 'price'
THEN d.value END) AS price,
MAX(CASE WHEN a.attribute_code = 'special_price'
THEN d2.value END) AS special_price,
MAX(CASE WHEN a.attribute_code = 'weight'
THEN d3.value END) AS weight,
e.created_at,
e.updated_at
FROM catalog_product_entity e
JOIN magento_id_map idmap
ON idmap.entity_type = 'product' AND idmap.magento_id = e.entity_id
-- Varchar attributes (name, manufacturer, etc.)
LEFT JOIN catalog_product_entity_varchar v
ON v.entity_id = e.entity_id AND v.store_id = 0
AND v.attribute_id IN (SELECT attribute_id FROM attr_ids WHERE backend_type = 'varchar')
LEFT JOIN attr_ids a_v ON a_v.attribute_id = v.attribute_id
-- Text attributes (description, short_description)
LEFT JOIN catalog_product_entity_text t
ON t.entity_id = e.entity_id AND t.store_id = 0
AND t.attribute_id IN (SELECT attribute_id FROM attr_ids WHERE backend_type = 'text')
LEFT JOIN attr_ids a_t ON a_t.attribute_id = t.attribute_id AND a_t = a_v
-- Integer attributes (status, visibility, tax_class_id)
LEFT JOIN catalog_product_entity_int i
ON i.entity_id = e.entity_id AND i.store_id = 0
AND i.attribute_id IN (SELECT attribute_id FROM attr_ids WHERE backend_type = 'int')
-- Decimal attributes (price, special_price, weight)
LEFT JOIN catalog_product_entity_decimal d
ON d.entity_id = e.entity_id AND d.store_id = 0
AND d.attribute_id = (SELECT attribute_id FROM attr_ids WHERE attribute_code = 'price')
LEFT JOIN catalog_product_entity_decimal d2
ON d2.entity_id = e.entity_id AND d2.store_id = 0
AND d2.attribute_id = (SELECT attribute_id FROM attr_ids WHERE attribute_code = 'special_price')
LEFT JOIN catalog_product_entity_decimal d3
ON d3.entity_id = e.entity_id AND d3.store_id = 0
AND d3.attribute_id = (SELECT attribute_id FROM attr_ids WHERE attribute_code = 'weight')
GROUP BY e.entity_id, e.sku, e.attribute_set_id, idmap.platform_uuid, e.created_at, e.updated_at
ORDER BY e.entity_id;
This query is the safe, production-ready version. The WITH attr_ids AS (...) CTE runs once at query start and resolves all attribute IDs from the current database state — no hardcoding, no environment-specific constants.
6. The Go Transformation Layer
After SQL extraction, the data needs transformation before insertion into the Catalog Service’s PostgreSQL:
// migration/transformer/product_transformer.go
type MagentoProduct struct {
MagentoID int64
UUID string
SKU string
Name string
Description string
Status int // 1=Enabled, 2=Disabled
Price float64
SpecialPrice *float64
Weight float64
CreatedAt time.Time
UpdatedAt time.Time
}
func TransformProduct(mp *MagentoProduct) *catalog.Product {
p := &catalog.Product{
ID: mp.UUID,
SKU: mp.SKU,
Name: mp.Name,
Description: mp.Description,
Status: transformStatus(mp.Status),
// Convert float64 price to Money type (no floating point in API layer)
Price: money.FromFloat("VND", mp.Price),
Weight: mp.Weight,
CreatedAt: mp.CreatedAt,
UpdatedAt: mp.UpdatedAt,
}
if mp.SpecialPrice != nil {
p.SpecialPrice = money.FromFloat("VND", *mp.SpecialPrice)
}
return p
}
func transformStatus(magentoStatus int) catalog.ProductStatus {
switch magentoStatus {
case 1: return catalog.ProductStatus_ENABLED
case 2: return catalog.ProductStatus_DISABLED
default: return catalog.ProductStatus_UNSPECIFIED
}
}
7. Customer EAV: The Same Pattern
Customer data in Magento is also EAV (customer_entity + type-specific tables). The extraction follows the same pattern:
-- Customer extraction with dynamic attribute resolution
WITH customer_attrs AS (
SELECT attribute_id, attribute_code, backend_type
FROM eav_attribute
WHERE entity_type_id = (
SELECT entity_type_id FROM eav_entity_type
WHERE entity_type_code = 'customer'
)
AND attribute_code IN ('firstname', 'lastname', 'dob', 'gender', 'taxvat')
)
SELECT
ce.entity_id AS magento_id,
idmap.platform_uuid AS id,
ce.email,
ce.group_id AS customer_group,
ce.is_active,
ce.created_at,
MAX(CASE WHEN ca.attribute_code = 'firstname'
THEN v.value END) AS first_name,
MAX(CASE WHEN ca.attribute_code = 'lastname'
THEN v.value END) AS last_name
FROM customer_entity ce
JOIN magento_id_map idmap
ON idmap.entity_type = 'customer' AND idmap.magento_id = ce.entity_id
LEFT JOIN customer_entity_varchar v
ON v.entity_id = ce.entity_id
AND v.attribute_id IN (SELECT attribute_id FROM customer_attrs WHERE backend_type = 'varchar')
JOIN customer_attrs ca ON ca.attribute_id = v.attribute_id
GROUP BY ce.entity_id, ce.email, ce.group_id, ce.is_active, ce.created_at, idmap.platform_uuid;
8. Incremental Delta Sync
After the initial full load, the CDC pipeline (Debezium, described in Part 6) handles incremental changes. But before Debezium is deployed, a polling-based delta sync covers the gap:
-- Delta sync: products changed since last extraction
SELECT e.entity_id
FROM catalog_product_entity e
WHERE e.updated_at > :last_sync_timestamp
ORDER BY e.updated_at ASC
LIMIT 1000;
This is a fallback only — it misses DELETEs and is vulnerable to clock skew. As soon as Debezium is running (Phase 1 milestone), the polling-based delta sync is disabled.
9. Validation: Before Declaring Extraction Complete
#!/bin/bash
# validate-product-extraction.sh
MAGENTO_COUNT=$(mysql -h $MAGENTO_DB -e "
SELECT COUNT(*) FROM catalog_product_entity
WHERE status = 1 -- Enabled products only
" | tail -1)
PLATFORM_COUNT=$(psql $PLATFORM_DB -t -c "
SELECT COUNT(*) FROM products WHERE status = 'ENABLED'
")
echo "Magento enabled products: $MAGENTO_COUNT"
echo "Platform products: $PLATFORM_COUNT"
# Count must match before proceeding to Phase 1
if [ "$MAGENTO_COUNT" -ne "$PLATFORM_COUNT" ]; then
echo "❌ Count mismatch — extraction incomplete"
exit 1
fi
# Sample validation: spot-check 100 products
psql $PLATFORM_DB -c "
SELECT p.sku, p.name, p.price
FROM products p
ORDER BY RANDOM()
LIMIT 100
" > /tmp/platform_sample.csv
echo "✅ Count matches. Review sample in /tmp/platform_sample.csv"
Key Takeaways
- Never hardcode
attribute_id— always resolve fromeav_attribute.attribute_code - Build
magento_id_mapfirst — every subsequent migration step depends on UUID cross-references - Use dynamic SQL pivot — the CTE approach works on any Magento instance regardless of installation order
- Validate count before Phase 1 — a count mismatch means your extraction is incomplete
- Price is decimal in MySQL, Money in proto — convert explicitly, never pass float through the API layer
The EAV extraction is the most labor-intensive part of the migration. Once magento_id_map is populated and the dynamic pivot query produces the correct count, you’re ready for the actual migration phases. Part 6 shows how Debezium makes the incremental sync continuous and reliable — replacing the polling-based delta with binlog-level CDC.
FAQ
Why can’t I just use Magento’s built-in import/export CSV instead of direct SQL extraction?
Magento’s CSV import/export doesn’t export attribute relationships or the magento_id_map cross-reference you need. CSV export gives you a flat row per product — losing the EAV structure — and has a 5,000-product limit on vanilla Magento without extension. For 10,000+ SKUs, direct SQL extraction against MySQL is the only approach that works at scale without modification.
Do I need to keep magento_id_map permanently?
Yes — until Phase 3 cutover is complete. The CDC sync pipeline (Part 6), the dual-write adapter (Part 7), and the conflict resolver (Part 7) all use magento_id_map to translate Magento integer IDs to UUIDs when processing events. Once Phase 3 is done and Magento is decommissioned, the table can be archived (not deleted — keep it for audit trail).
What if two Magento instances have the same attribute_code but different semantics?
This happens with custom attributes created by third-party extensions that use generic codes like custom_attribute_1. In this case, also check eav_attribute.frontend_label and eav_attribute.source_model to disambiguate. Document any ambiguous attributes in your magento_id_map as a comment field before starting the extraction — resolving this post-extraction is significantly more expensive.