04-10-2018 05:36 PM
I have a data-set with individual id, date variable for each day, and a filled date variable, as shown below. I wanted to know the gaps in filled dates, so I created a count variable which starts as 1 on the first gap for each gap interval and then increments by 1. I would now like to create an id -level data-set (one observation per person) with indicator variables for number of gaps within each gap interval (for instance, Jan 3 to Jan 5, 3 days gap, and Jan 8 to Jan 9, 2 days gap - for id A). How could I do this? Thank you!
|ID||each date||filled date||count|
04-10-2018 06:30 PM
Something like this would work (I want how many days were continuously filled and how many continuous gap intervals). Thank you!
04-10-2018 06:50 PM - edited 04-10-2018 06:54 PM
Here's a start:
where count > 0;
if first.id then group=1;
else if count=1 then group + 1;
That creates an artificial GROUP variable that can identify when groups of observations having count > 0 begin and end.
by id group;
That's closer to what you asked for. It gives you the last observation in each grouping. Take a look at that much, and then reconsider. Trying to change this into one observation per ID is a bad idea. You will end up with different numbers of variables populated for each ID, and the structure of the data will be difficult to program with. But if you still want to pursue that as the final result, it can be done using PROC TRANSPOSE (perhaps more than one PROC TRANSPOSE depending on what you want the final result to look like).
While I was composing this, you must have posted what you want the result to look like. I may be able to get to that later, to include some of the "0" records. Right now, all you get is the non-zero gaps. But that will have to wait (at least for me).
04-10-2018 07:07 PM
data have; input (ID each_date filled_date) ($) count; datalines; A 1-Jan 1-Jan 0 A 2-Jan 2-Jan 0 A 3-Jan . 1 A 4-Jan . 2 A 5-Jan . 3 A 6-Jan 6-Jan 0 A 7-Jan 7-Jan 0 A 8-Jan . 1 A 9-Jan . 2 A 10-Jan 10-Jan 0 ; data temp; set have; by id ; if first.id then k=0; if count=1 or count=0 and lag(count) ne 0 then k+1; run; proc sql; create table want as select id,each_date,max(count) as _count from temp group by id, k having each_date=min(each_date) and count(k)>1; quit;
04-11-2018 09:46 AM
data have; input (ID each_date filled_date) ($) count; datalines; A 1-Jan 1-Jan 0 A 2-Jan 2-Jan 0 A 3-Jan . 1 A 4-Jan . 2 A 5-Jan . 3 A 6-Jan 6-Jan 0 A 7-Jan 7-Jan 0 A 8-Jan . 1 A 9-Jan . 2 A 10-Jan 10-Jan 0 ; data have; set have; flag=missing(filled_date); run; data have; set have; by id flag notsorted; group+first.flag; run; data want; do until(last.group); set have; by group; if first.group then new_date=each_date; end; drop each_date filled_date group flag; run;