04-25-2013 10:50 AM
I have a regular incoming feed of data, and am using the DATA step with the UPDATE function.
It is a feed of vehicles which have returned for routine maintenance. When the car is sold, we get records for every covered service period, with an OPEN flag. The unique variables I am using to update are VIN and Mileage (5k, 10k, etc..). Then as the service is completed, it moves to COMPLETED. My problem is, some records are showing back up as OPEN, which is being overwritten.
VIN #1 is sold, and it creates all the OPEN records (5 lines) for the 5 different mileage intervals.
As VIN #1 returns for service, I get another record with VIN 1, 5k, and status COMPLETED.
2 weeks later, I get bad data saying VIN 1 at 5k is now OPEN.
Data update will overwrite with bad data.
My thought: As it is imported, create a new variable (let's call it "priority") and assign a 0 for OPEN and a 1 for COMPLETED. Can I update Vin 1, 5k, only if the Priority in the new record is greater than the existing record?
Any other thoughts to this issue? Besides going back to the data team and getting laughed at?
Thanks a ton!
04-25-2013 11:25 AM
If you're getting some bad data how do you know the rest of the data is good? I think it's always worth asking...
Workarounds are using a SQL Update instead, using a where to specify which records to selectively update.
04-25-2013 01:11 PM
I would say allowing a dataset to update an older one without some form of checking for validity might not be a best practice. Especially if the data in question is from human entries...
By any chance could these completed come from one maintenance issue being resolved but finding that something else is also going on?
My dealer can't tell be why my car throws one of the codes periodically but recommends doing about $2,000 worth of work that might correct the issue. So repeat visits on a VIN and mileage may be multiple looks at the car.