DATA Step, Macro, Functions and more

identifying data gaps in the data

Reply
Contributor
Posts: 40

identifying data gaps in the data

[ Edited ]

I have a data set below

grp_name is the name of client data_have is like what months we have received the data, start_dt is from when we should we receive the data, end_dt is till when we have to receive the data.

 

grp_name data_have start_dt end_dt

X Jan2016 Jan2016 Dec2016

X Mar2016 Jan2016 Dec2016

X Dec2016 Jan2016 Dec2016

y Jan2017 Jan2017 Apr2017

y Feb2017 Jan2017 Apr2017

y Mar2017 Jan2017 Apr2017

y Apr2017 Jan2017 Apr2017

z Apr2018 Jan2018 Dec2018

;

 

What i need is 

For x group we have for Jan, Mar, Dec 2016 but we should have data from jan to dec 2016 so a flag should be created and should be marked as 1 else 0.

 

Thanks in advance

 

 

 

Super User
Posts: 10,280

Re: Got struck Please help.

Posted in reply to srinath3111

So you need to identify groups that have no complete series of months?

Use the lag() and intck() functions. Like

data flags (keep=grp_name flag);
set have;
by grp_name;
retain flag;
prev_date = lag(data_have);
if first.grp_name then flag = 0;
else do;
  if intck('month',prev_date,data_have) > 1 then flag = 1;
end;
if last.grp_name then output;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 40

Re: Got struck Please help.

Posted in reply to KurtBremser

Actually I wanted to know the data gaps.

 

What i need is for suppose

 

x client has given data for jan, Mar, Dec 2017 but we are suppose to receive the data for jan2017 to dec 2017 our code should identify the data gaps for example FEB, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV is missing data so we should have flag "should reach out"

 

Thanks in advance

Super User
Posts: 23,773

Re: Got struck Please help.

Posted in reply to srinath3111

Please show what exactly what you expect as output for the input you've provided. 

 


@srinath3111 wrote:

Actually I wanted to know the data gaps.

 

What i need is for suppose

 

x client has given data for jan, Mar, Dec 2017 but we are suppose to receive the data for jan2017 to dec 2017 our code should identify the data gaps for example FEB, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV is missing data so we should have flag "should reach out"

 

Thanks in advance


 

 

Esteemed Advisor
Posts: 5,540

Re: Got struck Please help.

Posted in reply to srinath3111

An approach using an array:

 

data have;
input grp_name $ (data_have start_dt end_dt) (:anydtdte.);
format data_have start_dt end_dt mmyyd7.;
datalines;
X Jan2016 Jan2016 Dec2016
X Mar2016 Jan2016 Dec2016
X Dec2016 Jan2016 Dec2016
y Jan2017 Jan2017 Apr2017
y Feb2017 Jan2017 Apr2017
y Mar2017 Jan2017 Apr2017
y Apr2017 Jan2017 Apr2017
z Apr2018 Jan2018 Dec2018
;

proc sort data=have; by grp_name start_dt end_dt data_have; run;

data gaps;
array m{0:99};
do until (last.end_dt);
    set have; by grp_name start_dt end_dt;
    month = intck("month", start_dt, data_have);
    m{month} = 1;
    end;
do month = 0 to intck("month", start_dt, end_dt);
    if not m{month} then do;
        missing_dt = intnx("month", start_dt, month);
        output;
        end;
    end;
format missing_dt mmyyd7.;
keep grp_name missing_dt;
run;

proc print data=gaps noobs; run;
PG
PROC Star
Posts: 1,836

Re: identifying data gaps in the data

Posted in reply to srinath3111
data have;
input grp_name $ (data_have start_dt end_dt) (:anydtdte.);
format data_have start_dt end_dt mmyyd7.;
datalines;
X Jan2016 Jan2016 Dec2016
X Mar2016 Jan2016 Dec2016
X Dec2016 Jan2016 Dec2016
y Jan2017 Jan2017 Apr2017
y Feb2017 Jan2017 Apr2017
y Mar2017 Jan2017 Apr2017
y Apr2017 Jan2017 Apr2017
z Apr2018 Jan2018 Dec2018
;

proc sort data=have; by grp_name start_dt end_dt data_have; run;

data gaps1;
if _n_=1 then do;
   declare hash H (dataset:'have',ordered: "y") ;
   h.definekey  ('grp_name',"data_have") ;
   h.definedone () ;
end;
set have;
by grp_name;
if first.grp_name;
do until(start_dt>end_dt);
if h.check(key:grp_name,key:start_dt) ne 0 then do;
missing_dt=start_dt;
output;
end;
start_dt=intnx('month',start_dt,1,'b');
end;
format missing_dt mmyyd7.;
keep grp_name missing_dt;
run;
Ask a Question
Discussion stats
  • 5 replies
  • 121 views
  • 0 likes
  • 5 in conversation