DATA Step, Macro, Functions and more

by group processing and loop

Accepted Solution Solved
Reply
Contributor
Posts: 65
Accepted Solution

by group processing and loop

The following is part of my data: If you can't see this because of format issue, please see the attached data.

FAMILY_IDFROMDATE
00013075801Jan2016
00013075811Feb2016
00013075801Mar2016
00053584721Dec2015
00053584722Dec2015
00180877201May2016
00180877202May2016
00180877203May2016
00908428530Jan2016
00908428501Mar2016
03506442023Aug2015
03506442024Sep2015
03506442023Oct2015
03506442024Nov2015
56783552024Dec2015

keep the fromdate If each distinct family_ID has only one fromdate. If each distinct family_ID has multiple fromdate, then compare the later fromdate and earlier fromdate. If the difference between later fromdate and earlier fromdate is > 30 then keep both date, then keep both fromdate; if the difference between later fromdate and earlier fromdate is <= 30, then keep only the earlier fromdate. For example, for family_ID  000130758, I keep all of three fromdate; for family_ID 000535847, I only keep the earlier fromdate 21Dec2015 and delete the fromdate 22Dec2015.

I appreciate any solutions and/or suggestions!

 

Accepted Solutions
Solution
‎12-21-2016 04:46 PM
Frequent Contributor
Posts: 136

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

data one;

infile datalines;

input FAMILY_ID :9. FROMDATE date9. ;

Format FROMDATE date9. ;

datalines;

000130758 01Jan2016

000130758 11Feb2016

000130758 01Mar2016

000535847 21Dec2015

000535847 22Dec2015

001808772 01May2016

001808772 02May2016

001808772 03May2016

009084285 30Jan2016

009084285 01Mar2016

035064420 23Aug2015

035064420 24Sep2015

035064420 23Oct2015

035064420 24Nov2015

567835520 24Dec2015

;

run;

proc sort data=one;

by family_id fromdate;

run;

data two /*(drop=lag_family_id lag_fromdate lead_fromdate lagdate days)*/;

merge one one(firstobs=2 keep=fromdate rename=(fromdate=lead_fromdate));

lag_family_id=lag(family_id);

lag_fromdate=lag(fromdate);

if family_id=lag_family_id then lagdate=lag_fromdate;

if lagdate=. then lagdate=lead_fromdate;

days=ABS(INTCK('days',lagdate,fromdate));

run;

data three(drop=days);

set two (drop=lag_family_id lag_fromdate lead_fromdate lagdate);

by family_id;

if first.family_id or days>30 then output;

run;

Thanks,
Suryakiran

View solution in original post


All Replies
Trusted Advisor
Posts: 1,553

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

Assumind data is sorted by family ID  fromdate:

 

data want;

 set have;

       by family_id;

           if first.family_id then output;

           else do;

                  lagdate = lag(fromdate);  /* previous observarion date */

                 if fromdate - lagdate > 30 then output;

         and;

run;

 

Frequent Contributor
Posts: 136

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

You said you keep all the three for family_id 130758. Here family_id 130758 has 3 records. Is your criteria when it has more than 2 is to check the date diff with only first and last record.


000130758 01Jan2016
000130758 11Feb2016
000130758 01Mar2016

 

when you compare first two records then date diff is >30 so keep both. when you compare 2 and 3rd record then date diff is <30 then should keep only 2nd record. Finally it should have only two records.

 

Correct me if I'm wrong.

Thanks,
Suryakiran
Contributor
Posts: 65

Re: by group processing and loop

Posted in reply to SuryaKiran
Yes, you are right. It should have only 2 records for family_ID 00130758. I said it wrong. I'm very sorry about this.

Xiaoyan Zhu
Trusted Advisor
Posts: 1,137

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

Please check this output

data want1 ;
record=_n_+1;
set have end=last;
if not last then set have(keep=fromdate rename=(fromdate=fromdate2)) point=record;
else call missing(fromdate2);
run;

data want2 ;
set want1;
by FAMILY_ID;
lagdate=lag(fromdate);
if last.FAMILY_ID then fromdate2=.;
if first.family_id then lagdate=.;
fromdiff=ifn((fromdate2-fromdate)>30,(fromdate2-fromdate),.);
lagdiff=ifn((fromdate-lagdate)>30,(fromdate-lagdate),.);
result=coalesce(lagdiff,fromdiff);
format lagdate date9.;
if result ne .;
drop fromdate2 lagdate fromdiff lagdiff result;
run;
Thanks,
Jag
Trusted Advisor
Posts: 1,018

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

You want to look both forward and backward to find records within 30 days of a predecessor or successor.   This is an appropriate context to use the LAG functions  (to see if current record is near its predecessor) and also a MERGE with FIrSTOBS=2 parameter (to  see if current record is near to successor).

 

If the family_id is a character var, then change "dif(family_id)^=0" to "family_id^=lag(family_id)".  and an analogous change for "dif(family_id)=0".

 

data want (drop=nxtSmiley Happy;

merge have

      have (firstobs=2 keep=family_id fromdate rename=(family_id=nxtfid fromdate=nxtfd));

 

if (dif(family_id)^=0 and nxtfid^=family_id) /* Single record*/

  or

   (nxtfid=family_id and (nxtfd-fromdate)<=30) /* Early member of 30-day pair*/

  or

   (dif(family_id)=0 and dif(fromdate)<=30) /* Late member of 30-day pair*/

  ;

run;

Contributor
Posts: 65

Re: by group processing and loop

Posted in reply to zhuxiaoyan1
You have received a secure message from "Zhu, Xiaoyan" entitled, "Re: by group processing and loop".

You can view the message (before 01/04/2017) at the following web address:
https://mg.usablecs.com/enduser/msg.html?x=d-07507df3c11041317d318b077abb618f9cee1286b2d1eef009b5d59...


-----------------------------------------------------------------------------------------
Delivered with MailGate SC (TM)
http://www.axway.com
MailGate SC is a trademark of Axway.
Frequent Contributor
Posts: 136

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

I guess your condition dosen't have a proper reason, beacuase consider this group

03506442023Aug2015
03506442024Sep2015
03506442023Oct2015
03506442024Nov2015

When you condider row 1 and 2 the difference is >30 so you need both of them. when considering row 2 and 3 difference is <30 then you only need row 2 and not the row 3. Now here comes the problem, when you consider row 3 and row 4 the difference is >30 so youy need both the rows 3 and 4. So according to your condition the row 3 is satsifing and even not satsifing. confusingSmiley Surprised

 

Thanks,
Suryakiran
Contributor
Posts: 65

Re: by group processing and loop

Posted in reply to SuryaKiran

So I keep row1, row2 and row4. It is confusing. but this is what my boss wants.Smiley Frustrated

Frequent Contributor
Posts: 136

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

data one;

infile datalines;

input FAMILY_ID :9. FROMDATE date9. ;

Format FROMDATE date9. ;

datalines;

000130758 01Jan2016

000130758 11Feb2016

000130758 01Mar2016

000535847 21Dec2015

000535847 22Dec2015

001808772 01May2016

001808772 02May2016

001808772 03May2016

009084285 30Jan2016

009084285 01Mar2016

035064420 23Aug2015

035064420 24Sep2015

035064420 23Oct2015

035064420 24Nov2015

567835520 24Dec2015

;

run;

proc sort data=one;

by family_id fromdate;

run;

data two (drop=lag_family_id lag_fromdate lead_fromdate lagdate days);

merge one one(firstobs=2 keep=fromdate rename=(fromdate=lead_fromdate));

lag_family_id=lag(family_id);

lag_fromdate=lag(fromdate);

if family_id=lag_family_id then lagdate=lag_fromdate;

if lagdate=. then lagdate=lead_fromdate;

days=ABS(INTCK('days',lagdate,fromdate));

if days>30 or days=. then output;

run;

Thanks,
Suryakiran
Contributor
Posts: 65

Re: by group processing and loop

Posted in reply to SuryaKiran
I also need these two rowes:
000535847 21Dec2015
001808772 01May2016
Solution
‎12-21-2016 04:46 PM
Frequent Contributor
Posts: 136

Re: by group processing and loop

Posted in reply to zhuxiaoyan1

data one;

infile datalines;

input FAMILY_ID :9. FROMDATE date9. ;

Format FROMDATE date9. ;

datalines;

000130758 01Jan2016

000130758 11Feb2016

000130758 01Mar2016

000535847 21Dec2015

000535847 22Dec2015

001808772 01May2016

001808772 02May2016

001808772 03May2016

009084285 30Jan2016

009084285 01Mar2016

035064420 23Aug2015

035064420 24Sep2015

035064420 23Oct2015

035064420 24Nov2015

567835520 24Dec2015

;

run;

proc sort data=one;

by family_id fromdate;

run;

data two /*(drop=lag_family_id lag_fromdate lead_fromdate lagdate days)*/;

merge one one(firstobs=2 keep=fromdate rename=(fromdate=lead_fromdate));

lag_family_id=lag(family_id);

lag_fromdate=lag(fromdate);

if family_id=lag_family_id then lagdate=lag_fromdate;

if lagdate=. then lagdate=lead_fromdate;

days=ABS(INTCK('days',lagdate,fromdate));

run;

data three(drop=days);

set two (drop=lag_family_id lag_fromdate lead_fromdate lagdate);

by family_id;

if first.family_id or days>30 then output;

run;

Thanks,
Suryakiran
Contributor
Posts: 65

Re: by group processing and loop

Posted in reply to SuryaKiran
Thank you so much!

Xiaoyan
Trusted Advisor
Posts: 1,137

Re: by group processing and loop

[ Edited ]
Posted in reply to zhuxiaoyan1

I updated my previous code as to get your expectated output

 

data want1 ;
record=_n_+1;
set have end=last;
if not last then set have(keep=fromdate rename=(fromdate=fromdate2)) point=record;
else call missing(fromdate2);
run;

data want2 ;
set want1;
by FAMILY_ID;
lagdate=lag(fromdate);
if last.FAMILY_ID then fromdate2=.;
if first.family_id then lagdate=.;
fromdiff=fromdate2-fromdate;
lagdiff=fromdate-lagdate;
result=coalesce(lagdiff,fromdiff);
format lagdate date9.;
if first.family_id or result >30;
drop fromdate2 lagdate fromdiff lagdiff result;
run;

 

Thanks,
Jag
Contributor
Posts: 65

Re: by group processing and loop

Posted in reply to Jagadishkatam
It works perfectly! Thank you very much!

Xiaoyan
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 439 views
  • 0 likes
  • 5 in conversation