I have a dataset (created after bunch of data management steps) which is arranged as below. The gap indicator shows the number of gap days ( 1 indicates gap). Now, I would want to allow a gap of 30 days (i.e., I would count them as having no gap if the gap is less than or equal to 30 days). I have tried a few things (used retain, lag statements), but haven't been able to do so.
ID | Gap_indicator | Days |
A | 0 | 90 |
A | 1 | 20 |
A | 0 | 60 |
A | 1 | 10 |
A | 0 | 60 |
A | 1 | 40 |
A | 0 | 90 |
This is how the dataset is laid out. My expected output is (I want to add 30 days of gap to the previous no gap days)
ID | Gap_indicator | Days |
A | 0 | 270 |
A | 1 | 10 |
A | 0 | 90 |
I am not able to figure out how to do this. Your help would be much appreciated. Thank you!
Try this:
data have;
input ID $ Gap Days;
datalines;
A 0 90
A 1 20
A 0 60
A 1 10
A 0 60
A 1 40
A 0 90
;
data want;
gDays = 0;
do until(last.id);
set have; by id;
if gap then do;
if days > 30 then do;
gDays = gDays + 30;
gGap = 0;
output;
gDays = days - 30;
gGap = 1;
output;
gDays = 0;
end;
else gDays = gDays + days;
end;
else gDays = gDays + days;
end;
if gDays > 0 then do;
gGap = 0;
output;
end;
drop gap days;
rename gGap=Gap gDays=Days;
run;
Post what you've tried so far please.
Based on some of the readings, I did the following:
data test;
set problem;
by id;
lg_days=lag(days);
if first.id then lg_days=.;
run;
data test1;
merge test test (firstobs=2 rename=(id =id2 days=new_days gap=gap2));
if id ne id2 then new_days=.;
drop id2 gap2;
run;
I created a dataset by bringing the next observation in the same line. I am, however, struggling to collapse the observations with respect to 30 days allowable gap.
Try this:
data have;
input ID $ Gap Days;
datalines;
A 0 90
A 1 20
A 0 60
A 1 10
A 0 60
A 1 40
A 0 90
;
data want;
gDays = 0;
do until(last.id);
set have; by id;
if gap then do;
if days > 30 then do;
gDays = gDays + 30;
gGap = 0;
output;
gDays = days - 30;
gGap = 1;
output;
gDays = 0;
end;
else gDays = gDays + days;
end;
else gDays = gDays + days;
end;
if gDays > 0 then do;
gGap = 0;
output;
end;
drop gap days;
rename gGap=Gap gDays=Days;
run;
This works! I will try to parse the codes. One thing I haven't understood so far is, what is the difference between initializing a variable before and after the set statement? Thank you once again.
Imported dataset variables (with a set statement) are overwritten when the set statement executes. Other variables (such as gDays above) are reset to missing at the beginning of the implicit dataset loop (unless they are RETAINed).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.