02-23-2014 07:35 PM
I receive monthly data imports from various sources and I need to create a check program to run every now and then to ensure that there is no data missing. There is a unique ID, plus start and end dates.
I am stuck on how I would code what I'm after and hoping you can help me out. I initially want to check that I have received all the data, then run it on an ad hoc basis to ensure everything is still under control.
As mentioned, I have a unique ID, plus a start and end date. I need to check that all data has been received and there isn't a period of time that has been missed.
An example of my data looks similar to the below :
UID || Start_Date || End_Date
1234 01012013 31012013
1234 01022013 28022013
1234 01042013 30042013
You can see from the example above the data for the month of March is missing. Note, I have provided a really simple and clean example, many of the records don't have start and end dates matching the calendar.
02-23-2014 09:53 PM
Hi Reeza, yes I am looking for a way to identify any gaps or overlaps. The start and end dates will always be correct, albeit different for each record. To give you a bit of background, I am analysing energy data. Each month I receive a file containing data which I import into SAS. One of the checks that I want to run periodically (and right now as I have just imported 2 years worth of backdata) is to ensure that I am receiving data for all records and one site has not been accidently left off the original data file.
02-24-2014 05:03 AM
Assuming start and end dates always are on the first and last of a month, one could build a time series table for all the UID's, then check for matching start and end dates between time series and actual data.
02-24-2014 05:25 AM
The code below assumes end date is greater start date, it will give you indicators for gap or overlap to the previous obs.
02-24-2014 04:43 PM
Thanks Barry, I will give it a try.
Data_null - most of the data I collect doesn't run on calander days. It is electricity usage data so it will all depend on when the meter was read. For example, the readings may be from 15th April - 18th July. There is no set rule.
I have just imported 2 years worth of back data so I want to check that all data has been included, and with over 500k+ lines, not something I want to do manually!