BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nitink26
Obsidian | Level 7

 

Hi,

Please help -

In test data set I have id , dte_start and dte_end columns.

for ID 1- 

Customer Joined on 01JAN1961  and  left on 01JAN1962 and again joined on 01MAR1962. So here if the difference between dte_end (01JAN1962 )and latest  Dte_start (01MAR1962) is less than 90 days, so we keep Dte_start as initial Dte_start (01JAN1961). Dte_start only change if latest dte_start and previous Dte_end difference is more than 90 days.

DTE_end should be latest dte_end. For ID=1 DTE_end should be 31DEC9999.

OUTPUT FOR ID=1 -

id DTE_START DTE_END

1 01JAN1961 31DEC9999

 

data test;
input id : BEST8. DTE_START : DATE9. DTE_END : DATE9.;
FORMAT DTE_START DATE9. DTE_END DATE9.;
cards;
1 01JAN1961 01JAN1962
1 01MAR1962 01JAN1963
1 01FEB1963 31DEC9999
2 01JAN2001 01FEB2001
2 01MAR2001 31DEC9999
3 01JUL2011 30JUN2015
3 01NOV2015 31DEC9999
4 01MAY2016 01MAY2018
;
run;

 

Output I am looking for -

id DTE_START DTE_END
1 01JAN1961 31DEC9999
2 01JAN2001 31DEC9999
3 01NOV2015 31DEC9999
4 01MAY2016 01MAY2018

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You want 1 record per id, having the last DTE_END, and the latest DTE_START following a gap of over 90 days.

 

data test;
input id : BEST8. DTE_START : DATE9. DTE_END : DATE9.;
FORMAT DTE_START DATE9. DTE_END DATE9.;
cards;
1 01JAN1961 01JAN1962
1 01MAR1962 01JAN1963
1 01FEB1963 31DEC9999
2 01JAN2001 01FEB2001
2 01MAR2001 31DEC9999
3 01JUL2011 30JUN2015
3 01NOV2015 31DEC9999
4 01MAY2016 01MAY2018
;
run;

data want (drop=_:);
  set test;
  by id;
  retain _strt;
  _strt=ifn(first.id=1 or dte_start-90>lag(dte_end),dte_start,_strt);
  if last.id;
  dte_start=_strt;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Ksharp
Super User

Assuming I understood what you mean.

 

 

data test;
input id : BEST8. DTE_START : DATE9. DTE_END : DATE9.;
FORMAT DTE_START DATE9. DTE_END DATE9.;
cards;
1 01JAN1961 01JAN1962
1 01MAR1962 01JAN1963
1 01FEB1963 31DEC9999
2 01JAN2001 01FEB2001
2 01MAR2001 31DEC9999
3 01JUL2011 30JUN2015
3 01NOV2015 31DEC9999
4 01MAY2016 01MAY2018
;
run;

data temp;
 set test;
 by id;
 diff=DTE_START-lag(DTE_END) ;
 if first.id then diff=.;
run;
data temp2;
 set temp;
 by id;
 if first.id or diff>90 then group+1;
run;
proc sql;
create table want as
select id,min(DTE_START) as DTE_START format=date9.,max(DTE_END) as DTE_END format=date9.
 from (select * from temp2 group by id  having group=max(group))
  group by id;
quit;
yabwon
Amethyst | Level 16

According to data you provided I think id=3 should have 2 obs in result

data _null_;
  x = '30JUN2015'd - '01NOV2015'd;
  put x=;
run;

A simple approach would be something like that:

data test;
input id : BEST8. DTE_START : DATE9. DTE_END : DATE9.;
FORMAT DTE_START DATE9. DTE_END DATE9.;
cards;
1 01JAN1961 01JAN1962
1 01MAR1962 01JAN1963
1 01FEB1963 31DEC9999
2 01JAN2001 01FEB2001
2 01MAR2001 31DEC9999
3 01JUL2011 30JUN2015
3 01NOV2015 31DEC9999
4 01MAY2016 01MAY2018
;
run;

data want1;
  set test;
  if abs(DTE_START - lag(DTE_END)) >= 90  then grp + 1;
run;
proc print;
run;

data want2;
  set want1;
  by id grp;

  if first.grp then s = DTE_START;
  if last.grp then output;

  retain s;
  drop DTE_START;
  rename 
    s = DTE_START
  ;
  format s DATE9.;
run;
proc print;
run;

 

Bart

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



nitink26
Obsidian | Level 7

Thanks for the reply and you are right but If the difference is more than 90 days then dte_start I wanna pick should be the latest dte_start as I am looking only one record per ID.

so for ID =3 output will be  -

id DTE_START DTE_END

3 01NOV2015 31DEC9999

If you could let me know how to get that would be highly appreciated. Thanks a ton!

mkeintz
PROC Star

You want 1 record per id, having the last DTE_END, and the latest DTE_START following a gap of over 90 days.

 

data test;
input id : BEST8. DTE_START : DATE9. DTE_END : DATE9.;
FORMAT DTE_START DATE9. DTE_END DATE9.;
cards;
1 01JAN1961 01JAN1962
1 01MAR1962 01JAN1963
1 01FEB1963 31DEC9999
2 01JAN2001 01FEB2001
2 01MAR2001 31DEC9999
3 01JUL2011 30JUN2015
3 01NOV2015 31DEC9999
4 01MAY2016 01MAY2018
;
run;

data want (drop=_:);
  set test;
  by id;
  retain _strt;
  _strt=ifn(first.id=1 or dte_start-90>lag(dte_end),dte_start,_strt);
  if last.id;
  dte_start=_strt;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1345 views
  • 0 likes
  • 4 in conversation