Stop me if you've heard this before.
I have products in my Drupal site, and products in another (maybe not Drupal) site. I want to run a process that syncs the product data. Over time, I've learned that you can't just fire and forget that process. There is a strong need for a series of reports that show which data isn't properly synced, and for thorough reporting on any issues with the sync, so we can fix whatever is causing the data to sync badly. So that's the use case.
When creating data sync reports it's important to answer the following questions:
- Is there any data in datasource A that is not in datasource B?
- Is there any data in datasource B that is not in datasource A?
- When an operation (create/update/disable/delete) goes wrong, what detailed error messages or validation errors (from either side/source) were recorded, so I can figure out how to correct it?
Maybe I could also optimize performance by doing these checks during the process:
- Create/update logic: does my product in datasource A exist in datasource B?
- Skip-update logic: does my product in datasource A have any updates for the product in datasource B?
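To make the questions above concrete, here is a minimal sketch in plain Python (not Drupal code). It assumes each datasource has already been loaded into a dictionary keyed by a shared identifier such as a SKU. The names `diff_report`, `plan_operation`, `sync`, and the `push` callback are all hypothetical, made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: each datasource is a dict of {sku: product fields}.
Product = Dict[str, object]
Source = Dict[str, Product]


def diff_report(a: Source, b: Source) -> Tuple[List[str], List[str]]:
    """Return (keys only in A, keys only in B)."""
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    return only_in_a, only_in_b


def plan_operation(sku: str, a: Source, b: Source) -> str:
    """Decide what to do with one product from A."""
    if sku not in b:
        return "create"
    if a[sku] == b[sku]:
        return "skip"  # nothing changed, so no write is needed
    return "update"


def sync(a: Source, b: Source,
         push: Callable[[str, Product], None]) -> List[Tuple[str, str, str]]:
    """Push A's products to B and collect (sku, operation, error) rows.

    `push` is a hypothetical callback that performs the real write and
    raises an exception when the remote side rejects it.
    """
    errors: List[Tuple[str, str, str]] = []
    for sku in sorted(a):
        op = plan_operation(sku, a, b)
        if op == "skip":
            continue
        try:
            push(op, a[sku])
        except Exception as exc:  # keep the message for the error report
            errors.append((sku, op, str(exc)))
    return errors
```

The two lists from `diff_report` answer the "in A but not in B" and "in B but not in A" questions, and the rows returned by `sync` could feed the error report. Disable/delete handling is left out to keep the sketch short.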
There are many programmatic approaches to answering the above questions. It would be nice to see a solution that used Typed Data. I am imagining a solution that takes data from each datasource and converts it into a common data type so that:
- direct comparisons are easier
- the intermediate state can be much smaller than a fully hydrated node
- there is ultimately less code because of inherited getters/setters
- it may be easier to write code that handles validation of individual properties.
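As a rough illustration of the common-data-type idea, here is a small Python sketch. It is not the Typed Data API, and every field name and both source layouts are assumptions invented for the example. Each source gets its own converter into one lightweight record, which can then be compared and validated property by property.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class ProductRecord:
    """Hypothetical common type holding only the properties that get synced."""
    sku: str
    title: str
    price_cents: int
    active: bool

    def validate(self) -> List[str]:
        """Return one message per invalid property (empty list means valid)."""
        problems = []
        if not self.sku:
            problems.append("sku is empty")
        if not self.title.strip():
            problems.append("title is empty")
        if self.price_cents < 0:
            problems.append("price_cents is negative")
        return problems


def from_source_a(row: Dict[str, object]) -> ProductRecord:
    # Assumed layout for source A: decimal price string, 0/1 status flag.
    return ProductRecord(
        sku=str(row["sku"]),
        title=str(row["title"]),
        price_cents=round(float(row["price"]) * 100),
        active=bool(row["status"]),
    )


def from_source_b(item: Dict[str, object]) -> ProductRecord:
    # Assumed layout for source B: integer cents, "enabled"/"disabled" state.
    return ProductRecord(
        sku=str(item["id"]),
        title=str(item["name"]),
        price_cents=int(item["priceCents"]),
        active=item["state"] == "enabled",
    )


def changed_fields(a: ProductRecord, b: ProductRecord) -> List[str]:
    """Names of the properties that differ between two records."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
```

Once both sides are `ProductRecord` instances, `a == b` is the direct comparison, and `changed_fields(a, b)` shows which properties an update would actually need to touch. In Drupal, I'd expect the record to be expressed as a data definition rather than a dataclass, but I haven't worked out what that looks like yet.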
If you've got something like this already thought through, I'd say run with that. If not, I'm eager to help write some documentation on how to do this...as soon as I figure it out.