I have a raw data and not sure how to flag if service_from_date is not a continuous month.
for example, from observation1-3, need flag=1; from observation 4-5, need flag=2; from observation 6-13, need flag=3.
observation 1-5 has a same provider, but there is a gap (not continuous) between observation 3 and 4.
MEMBER_ID | SERVICE_FROM_DATE | SERVICE_THROUGH_DATE | SERVICE_PROVIDER_ID |
ABC | 01/05/2017 | 01/05/2017 | 64058469 |
ABC | 02/24/2017 | 02/24/2017 | 64058469 |
ABC | 03/08/2017 | 03/08/2017 | 64058469 |
ABC | 07/10/2017 | 07/10/2017 | 64058469 |
ABC | 08/21/2017 | 08/21/2017 | 64058469 |
ABC | 03/25/2017 | 03/25/2017 | 64089900 |
ABC | 04/30/2017 | 04/30/2017 | 64089900 |
ABC | 05/28/2017 | 05/28/2017 | 64089900 |
ABC | 06/25/2017 | 06/25/2017 | 64089900 |
ABC | 07/23/2017 | 07/23/2017 | 64089900 |
ABC | 08/20/2017 | 08/20/2017 | 64089900 |
ABC | 09/24/2017 | 09/24/2017 | 64089900 |
ABC | 10/22/2017 | 10/22/2017 | 64089900 |
wanted results:
MEMBER_ID | START_DT | END__DT | SERVICE_PROVIDER_ID |
ABC | 01/05/2017 | 03/08/2017 | 64058469 |
ABC | 07/10/2017 | 08/21/2017 | 64058469 |
ABC | 03/25/2017 | 10/22/2017 | 64089900 |
is your definition of "continuous" that the month of service_from_date is exactly one month after the service_through_date from the previous record when the member_id and service_provider_id are the same?
Since your results do not show any flag values it looks like you want to collapse the from and to period, correct?
data have;
input MEMBER_ID $ SERVICE_FROM_DATE :mmddyy10. SERVICE_THROUGH_DATE :mmddyy10. SERVICE_PROVIDER_ID $;
format SERVICE_FROM_DATE SERVICE_THROUGH_DATE mmddyy10.;
datalines;
ABC 01/05/2017 01/05/2017 64058469
ABC 02/24/2017 02/24/2017 64058469
ABC 03/08/2017 03/08/2017 64058469
ABC 07/10/2017 07/10/2017 64058469
ABC 08/21/2017 08/21/2017 64058469
ABC 03/25/2017 03/25/2017 64089900
ABC 04/30/2017 04/30/2017 64089900
ABC 05/28/2017 05/28/2017 64089900
ABC 06/25/2017 06/25/2017 64089900
ABC 07/23/2017 07/23/2017 64089900
ABC 08/20/2017 08/20/2017 64089900
ABC 09/24/2017 09/24/2017 64089900
ABC 10/22/2017 10/22/2017 64089900
;
data want;
do n=1 by 1 until(last.member_id);
set have;
by MEMBER_ID SERVICE_FROM_DATE notsorted;
if n=1 then _start_DATE=SERVICE_FROM_DATE;
array temp(100);
array _temp2(100);
k=month(SERVICE_FROM_DATE);
retain grp 1;
k1=dif(k);
if k1 not in (1,.) then
do;
grp+1;
temp(n-1)=_start_DATE;
_start_DATE=SERVICE_FROM_DATE;
_temp2(n)=n-1;
end;
*output;
end;
do n1= 1 by 1 until(last.member_id);
set have;
by MEMBER_ID SERVICE_FROM_DATE notsorted;
if n1 in _temp2 then
do;
start_date=temp(n1);
output;
end;
end;
if last.member_id then do;
start_date=_start_DATE;
output;
end;
format start_date _start_date mmddyy10.;
drop n temp: _temp: k: grp n:;
run;
sorry for lazy and unclean code, but you can. i'll see if i can clean after lunch. I am hungry
Better version:
NOTES: The code may require tweaking a bit as the service provider changes for the same member id. I trust if you are capable of maintaining this code, you certainly can tweak on your own. Nevertheless, should you face any discrepancies while testing, let me know. And of course, I am sure some super user will give a far better solution than this
data have;
input MEMBER_ID $ SERVICE_FROM_DATE :mmddyy10. SERVICE_THROUGH_DATE :mmddyy10. SERVICE_PROVIDER_ID $;
format SERVICE_FROM_DATE SERVICE_THROUGH_DATE mmddyy10.;
datalines;
ABC 01/05/2017 01/05/2017 64058469
ABC 02/24/2017 02/24/2017 64058469
ABC 03/08/2017 03/08/2017 64058469
ABC 07/10/2017 07/10/2017 64058469
ABC 08/21/2017 08/21/2017 64058469
ABC 03/25/2017 03/25/2017 64089900
ABC 04/30/2017 04/30/2017 64089900
ABC 05/28/2017 05/28/2017 64089900
ABC 06/25/2017 06/25/2017 64089900
ABC 07/23/2017 07/23/2017 64089900
ABC 08/20/2017 08/20/2017 64089900
ABC 09/24/2017 09/24/2017 64089900
ABC 10/22/2017 10/22/2017 64089900
;
data want(rename=(SERVICE_THROUGH_DATE=end_date));
retain MEMBER_ID start_date;
do _n=1 by 1 until(last.member_id);
set have;
by MEMBER_ID SERVICE_FROM_DATE notsorted;
if _n=1 then _start_DATE=SERVICE_FROM_DATE;
array temp(100) _temporary_;
array _temp2(100) _temporary_;
_k=month(SERVICE_FROM_DATE);
retain _grp 1;
_k1=dif(_k);
if _k1 not in (1,.) then
do;
_grp+1;
temp(_n-1)=_start_DATE;
_start_DATE=SERVICE_FROM_DATE;
_temp2(_n)=_n-1;
end;
end;
do _n_= 1 by 1 until(last.member_id);
set have;
by MEMBER_ID SERVICE_FROM_DATE notsorted;
if _n_ in _temp2 then
do;
start_date=temp(_n_);
output;
end;
if last.member_id then
do;
start_date=_start_DATE;
output;
end;
end;
format start_date mmddyy10.;
drop _: SERVICE_FROM_DATE;
run;
@novinosrin "I am hungry".. gotta love it 😄
@PeterClemmensen Yep indeed loved it . I can't miss the amazing subway's sub of the day oven roasted chicken topping with mayonnaise and hot sauce lol
If you have SAS/ETS do something like this, since the creation of START_DT and END_DT really only depends on looking 1 obs back and 1 ahead 🙂
proc expand data=have out=have_ll method=none;
by SERVICE_PROVIDER_ID;
id SERVICE_FROM_DATE;
convert SERVICE_FROM_DATE=lead_SERVICE_FROM_DATE / transformout=(lead 1);
convert SERVICE_FROM_DATE=lag_SERVICE_FROM_DATE / transformout=(lag 1);
run;
data want(keep=MEMBER_ID START_DT END_DT SERVICE_PROVIDER_ID);
format MEMBER_ID START_DT END_DT SERVICE_PROVIDER_ID;
set have_ll;
by SERVICE_PROVIDER_ID;
monthback=month(SERVICE_FROM_DATE)-month(lag_SERVICE_FROM_DATE);
monthforward=month(lead_SERVICE_FROM_DATE)-month(SERVICE_FROM_DATE);
if first.SERVICE_PROVIDER_ID then START_DT=SERVICE_FROM_DATE;
else if (monthback=1 and monthforward>1) or last.SERVICE_PROVIDER_ID then do;
END_DT=SERVICE_THROUGH_DATE;
output;
START_DT=lead_SERVICE_FROM_DATE;
end;
retain START_DT;format START_DT END_DT mmddyy10.;
run;
I think that the following meets your specifications:
data want (keep=MEMBER_ID START_DT END_DT SERVICE_PROVIDER_ID); set have; format START_DT END_DT mmddyy10.; by SERVICE_PROVIDER_ID; retain START_DT in_process; set have ( firstobs = 2 keep = SERVICE_FROM_DATE rename = (SERVICE_FROM_DATE = Next_Date) ) have ( obs = 1 drop = _all_); if first.SERVICE_PROVIDER_ID or in_process eq 0 then do; in_process=1; START_DT=SERVICE_FROM_DATE; end; if last.SERVICE_PROVIDER_ID or intck('month',SERVICE_FROM_DATE,Next_Date) gt 1 then do; END_DT=SERVICE_THROUGH_DATE; output; in_process=0; end; run;
Art, CEO, AnalystFinder.com
Like this?
data WANT;
retain START_DT;
set HAVE end=LASTOBS;
PREV_ST =lag(SERVICE_FROM_DATE);
PREV_END =lag(SERVICE_THROUGH_DATE);
PREV_PROV =lag(SERVICE_PROVIDER_ID);
if _N_=1 then START_DT=SERVICE_FROM_DATE;
if _N_>1;
if (month(PREV_ST) + 1 ne month(SERVICE_FROM_DATE) )
or ( PREV_PROV ne SERVICE_PROVIDER_ID )
or LASTOBS then do;
END_DT = ifn(LASTOBS, SERVICE_THROUGH_DATE,PREV_END);
PROVIDER_ID= ifn(LASTOBS, SERVICE_PROVIDER_ID ,PREV_PROV);
output;
START_DT = SERVICE_FROM_DATE;
end;
format START_DT END_DT date9.;
keep MEMBER_ID START_DT END_DT PROVIDER_ID;
run;
MEMBER_ID | START_DT | END_DT | PROVIDER_ID |
---|---|---|---|
ABC | 05JAN2017 | 08MAR2017 | 64058469 |
ABC | 10JUL2017 | 21AUG2017 | 64058469 |
ABC | 25MAR2017 | 22OCT2017 | 64089900 |
@ChrisNZ: Using lag or reading the next record (like I did) accomplish the same thing. However, since you didn't use the INTCK function, your code will fail if the visits go across years. e.g.:
data have; input MEMBER_ID $ SERVICE_FROM_DATE :mmddyy10. SERVICE_THROUGH_DATE :mmddyy10. SERVICE_PROVIDER_ID $; format SERVICE_FROM_DATE SERVICE_THROUGH_DATE mmddyy10.; datalines; ABC 01/05/2017 01/05/2017 64058469 ABC 02/24/2017 02/24/2017 64058469 ABC 03/08/2017 03/08/2017 64058469 ABC 07/10/2017 07/10/2017 64058469 ABC 08/21/2017 08/21/2017 64058469 ABC 03/25/2017 03/25/2017 64089900 ABC 04/30/2017 04/30/2017 64089900 ABC 05/28/2017 05/28/2017 64089900 ABC 06/25/2017 06/25/2017 64089900 ABC 07/23/2017 07/23/2017 64089900 ABC 08/20/2017 08/20/2017 64089900 ABC 09/24/2017 09/24/2017 64089900 ABC 10/22/2017 10/22/2017 64089900 ABC 11/22/2017 11/22/2017 64089900 ABC 12/22/2017 12/22/2017 64089900 ABC 01/22/2018 01/22/2018 64089900 ABC 02/22/2018 02/22/2018 64089900 ;
Art, CEO, AnalystFinder.com
True. I just wanted to supply a much shorter data step than the 2 above, which were the only 2 when I started replying..
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.