10-05-2017 06:35 PM
I'm struggling to create the appropriate groups for my data.
The question is around data submission. A hospital can submit data non-consecutively and I need a way to understand the count of consecutive months per hospital. The last data step in the code below will sum the total number of months but I need to find a way to report on each segment of months reported (e.g., 2017-01 to 2017-05 (one sequence), 2017-07 to 201708 (second sequence), etc). Thank you!
data flatfile2; set flatfile; beg_month=input(begin_date,anydtdte19.); format beg_month yymms10.; run; proc sort data=flatfile2; by hospital beg_month; run; data testing; set flatfile2; groupnum+(First.hospital or (dif(beg_month) GT 1)); run; data flatfile3; set testing; by hospital; begindate=first.hospital; enddate=last.hospital; lagdate=lag(groupnum); if begindate=1 then do; lagdate=groupnum-1; end; run; /*this is the piece that is flagging the consecutive months*/ /*so, a hospital with records where diff= 1 0 0 0 0 0 (on separate rows) would be indicative of 6 months of consecutive data*/ /*I'm stuck here because I need to find a way to set this as one sequence.*/ data flatfile4 (drop=begindate enddate); set flatfile3; diff=groupnum-lagdate; if diff <=1 then diff=0; else if diff>1 then diff=1; if begindate=1 and diff=0 then diff=1; run; proc sort data=flatfile4; by hospital diff; run; data flatfile5; set flatfile4; by hospital diff; if first.hospital and first.diff=1 then totalmo=0; totalmo+1; if last.hospital then output; run; proc print data=flatfile5; run;
10-06-2017 08:36 AM
I think you have a problem about identifying the consecutive months. The variable beg_month is a DATE value, not a MONTH, even though you assign a format to it that only displays the month, not the day. The expression "dif(beg_month) GT 1" will return 1 (true) whenever there is more than one day (not one month) between visits.
I think what you want to do is something like this:
data flatfile2; set flatfile; beg_month=input(begin_date,anydtdte19.); format beg_month yymms10.; run; proc sort data=flatfile2; by hospital beg_month; run; data groups; set flatfile2; by hospital; prev_date=lag(beg_month); if first.hospital then group=1; else if intck('month',prev_date,beg_month)>1 then group+1; run;
You then have data sorted by HOSPITAL and GROUP, where each GROUP is a series of visits in consecutive months.
10-06-2017 02:30 PM
@s_lassen Thanks so much! That worked to create the groups!
Do you know of an easy way to them sum the number of months (rows) for each of those groups?
This is the code that I currently have but it's only giving me the last.hospital.
proc sort data=groups; by hospital group; run; data flatfilefin; set groups; by hospital group; if first.hospital and first.group=1 then totalmo=0; totalmo+1; if last.hospital then output; run; proc summary data=flatfilefin nway; class Hospital Group; var totalmo; output out=test sum=; run; proc print data=test; run;