Checking for missing / concurrent dates

Reply
Occasional Contributor
Posts: 15

Checking for missing / concurrent dates

I receive monthly data imports from various sources and I need to create a check program to run every now and then to ensure that there is no data missing. There is a unique ID, plus start and end dates. 

I am stuck on how I would code what I'm after and hoping you can help me out. I initially want to check that I have received all the data, then run it on an ad hoc basis to ensure everything is still under control.

As mentioned, I have a unique ID, plus a start and end date. I need to check that all data has been received and there isn't a period of time that has been missed.

An example of my data looks similar to the below :

UID       ||       Start_Date       ||       End_Date

1234            01012013                31012013

1234            01022013                28022013

1234            01042013                30042013

You can see from the example above the data for the month of March is missing. Note, I have provided a really simple and clean example, many of the records don't have start and end dates matching the calendar.

Any suggestions?

Super User
Posts: 19,822

Re: Checking for missing / concurrent dates

Are you looking for gaps/overlaps for example? Can you assume that the first start date and last end date are correct?

Occasional Contributor
Posts: 15

Re: Checking for missing / concurrent dates

Hi Reeza, yes I am looking for a way to identify any gaps or overlaps. The start and end dates will always be correct, albeit different for each record. To give you a bit of background, I am analysing energy data. Each month I receive a file containing data which I import into SAS. One of the checks that I want to run periodically (and right now as I have just imported 2 years worth of backdata) is to ensure that I am receiving data for all records and one site has not been accidently left off the original data file.

SAS Super FREQ
Posts: 708

Re: Checking for missing / concurrent dates

Assuming start and end dates always are on the first and last of a month, one could build a time series table for all the UID's, then check for matching start and end dates between time series and actual data.

data have;
  infile cards dlm=",";
 
input
    uid :
8.
    start_date :
ddmmyy8.
    end_date :
ddmmyy8.
  ;
  format
    start_date end_date
date9.
  ;
  cards;
1234,01012013,31012013
1234,01022013,28022013
1234,01042013,30042013
5678,01012013,30012013
;

data uid_date_series;
  do uid = 1234, 5678;
   
do year = 2013 to 2013;
     
do month = 1 to 12;
        sDate = mdy(month,
1, year);
        eDate = intnx("month", sDate, 0, "E");
        output;
     
end;
   
end;
 
end;

 
format
    sDate eDate
date9.
  ;
run;

PROC SQL;
 
CREATE TABLE WORK.dateCheck AS
   
SELECT
      t1.uid
      , t1.sDate
      , t1.eDate    
      , t2.start_date
      , t2.end_date
      , (t1.sDate = t2.start_date)
AS startMatch
      , (t1.eDate = t2.end_date)
AS endMatch
   
FROM
      WORK.uid_date_series t1
   
LEFT JOIN
      WORK.HAVE t2
     
ON (t1.uid = t2.uid and t1.sDate = t2.start_date)
   
ORDER BY
      t1.uid,
      t1.sDate
  ;
QUIT;
SAS Super FREQ
Posts: 708

Re: Checking for missing / concurrent dates

The code below assumes end date is greater start date, it will give you indicators for gap or overlap to the previous obs.

data have;
  infile cards dlm=",";
 
input
    uid :
8.
    start_date :
ddmmyy8.
    end_date :
ddmmyy8.
  ;
  format
    start_date end_date
date9.
  ;
  cards;
1234,01012013,31012013
1234,01022013,28022013
1234,01042013,30042013
5678,01012013,30012013
5678,31012013,28022013
5678,28022013,28032013
9012,01012013,31012013
;

proc sort data=have;
  by uid start_date;
run;

data want;
  set have;
  by uid;

  startEndDiff = start_date - lag1(end_date);
  firstEntry = first.uid;

 
if first.uid = 0 then do;
    gap = (startEndDiff >
1);
    overlap = (startEndDiff < 1);
  end;
run;
Respected Advisor
Posts: 3,799

Re: Checking for missing / concurrent dates

What does "many of the records don't have start and end dates matching the calendar." mean and what do you do about it?

Occasional Contributor
Posts: 15

Re: Checking for missing / concurrent dates

Thanks Barry, I will give it a try.

Data_null - most of the data I collect doesn't run on calander days. It is electricity usage data so it will all depend on when the meter was read. For example, the readings may be from 15th April - 18th July. There is no set rule.

I have just imported 2 years worth of back data so I want to check that all data has been included, and with over 500k+ lines, not something I want to do manually!

SAS Super FREQ
Posts: 708

Re: Checking for missing / concurrent dates

hi ddeaves

Have you tried my second example this does not depend what the start and end dates are

Bruno

Occasional Contributor
Posts: 15

Re: Checking for missing / concurrent dates

Hi Bruno,

Thank you very much, your code has worked like a treat!

Cheers,

DD.

Ask a Question
Discussion stats
  • 8 replies
  • 398 views
  • 0 likes
  • 4 in conversation