Solved: Collapse two observation by ID and condition

phantomofshell · Posted 02-10-2020 10:25 AM

Hello there,

I am try to collapse two observation by group and condition

When the Id is the same, and Type for both observation is repeat ,and the date different is within 1 month .

Then collapse two observation and only keep the early date.

id	amount	date	type
1	5	2017-01-01	new
1	5	2018-01-01	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new

Thanks

novinosrin · Posted 02-11-2020 01:51 PM

Hi @phantomofshell Can you check the results for ID=2 in your latest post and let me know where the discrepancy is. If my understanding is correct, the below should come close. The only concern I have is sets of 2 or more within type(new,repeat)


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 
;


data want ;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("date") ;
  h.definedata ("prev","curr","amount") ;
  h.definedone () ;
 end;
 do _n_=1 by 1 until(last.id);
  set have end=z;
  by id ;
  curr=date;
  _s=lag(amount);
  if type not in ('new','repeat') then prev=.;
  else if prev then do;
   if intck('mon',prev,curr,'c')<1 then do;
   	amount=sum(amount,_s);
	h.add(key:prev,data:prev,data:curr,data:amount);
	h.add(key:curr,data:prev,data:curr,data:amount);
   end;
	prev=date;
  end;
  else prev=date;
 end;
 do _n_=1 to _n_;
  set have;
  if h.find()=0 then do;
   if date=curr then continue;
   output;
  end;
  else output;
 end;
 h.clear();
 drop curr prev _s;
run;

View solution in original post

novinosrin · Posted 02-10-2020 10:28 AM

HI @phantomofshell Please post the sample as plain text to make it convenient for community members to copay/paste and work on the solution. Pics make it impossible.

phantomofshell · Posted 02-10-2020 12:48 PM

id	amount	date	type
1	5	2017-01-01	new
1	5	2018-01-01	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new

new to the community , sorry

novinosrin · Posted 02-10-2020 01:40 PM

Hi @phantomofshell Welcome to SAS communities. To make it easy, I am herewith providing a traditional rather 2 step solution for reading ease and convenience.

Logic:

1. Determine the repeat groups that are consecutive and check for the sets of 2

2. Should the date difference within a set of 2 is less than a month, group that set as 1

3. Iterate the same logic to identify different sets of grps that satisfy condition2 and increment the grp number accordindly

4. The result sets the base for a simple Summary stats to sum the group and take the earliest number within that group. This is what the Proc SQL does



data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1	5	2017-01-01	new
1	5	2018-01-01	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new
;

data temp;
 set have;
 by id type notsorted;
 retain grp prev_date;
 if first.id then grp=0;
 if first.type and type='repeat' then do;
   grp=1;
   prev_date=date;
 end;
 else if type='repeat' and intck('mon',prev_date,date,'c')>1 then do;
  grp+1;
  prev_date=date;
 end;
 else if type ne 'repeat' then grp=0;
 drop prev_date;
run;
proc sql;
create table want(drop=sum) as
select id,sum( case when type='repeat' then amount else . end) as sum,
ifn(type='repeat',calculated sum,amount) as amount,date,type
from temp
group by id,grp
having min(date)=date and grp or not grp
order by  id,date;
quit;

See if the above helps and let us know.

Also, Your sample leads to some assumptions that can be best helped if you can clarify:

1. Within a ID group, can exist a scenario like below?

1 5 2018-01-01 repeat
1 1 2018-01-17 repeat
1 4 2018-01-19 repeat
1 6 2018-01-22 repeat

So if yes, does this make it 2 sets of two obs or all 4 belongs to one set making it a sum(5,1,4,6)=16 with date 2018-01-01 ?

2. Two obs combining to one as stated in the subject line can make the situation highly ambiguous as sometimes what if 2&3 can make an independent set leaving 1 and 4. How would you want to handle this.

I would request you to review the requirement diligently

phantomofshell · Posted 02-10-2020 06:32 PM

1	5	2017-01-01	new
1	5	2017-01-015	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new

Thanks for your help novionosrin, what if I would like to change the condition say :for the same id if previous type within 1 month= new and current type = repeat, then I collapse the two as what we did before

novinosrin · Posted 02-10-2020 09:06 PM

What should the output look like, would the collapsed summed record take thew date of NEW or REPEAT?

Or is the below assumption is what it is:


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1	5	2017-01-01	new
1	5	2017-01-015	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new
;

data temp;
 set have;
 by id ;
 retain grp prev_date;
 if first.id then grp=0;
 if  type='new' then do;
   grp+1;
   prev_date=date;
 end;
  if   lag(type) ne 'new'  and type='repeat' or
 type='repeat' and lag(type)='new' and intck('mon',prev_date,date,'c')
then  grp=0;
 drop prev_date;
run;

proc sql;
create table want(drop=sum) as
select id,sum(amount) as sum,
ifn(type='new',calculated sum,amount) as amount,date,type
from temp
group by id,grp
having min(date)=date and grp or not grp
order by  id,date;
quit;

phantomofshell · Posted 02-11-2020 11:31 AM

This is the Original Set

1 5 2017-01-01 new
1 5 2017-01-07 repeat
1 1 2018-01-17 repeat
2 1 2017-01-01 new

2 1 2017-01-02 other
2 1 2018-01-01 repeat
2 1 2018-01-07 repeat
3 2 2001-01-05 repeat
3 2 2020-01-03 new
3 2 2020-04-01 new

This is the result

1 10 2017-01-01 new
1 1 2018-01-17 repeat
2 1 2017-01-01 new
2 1 2018-01-01 repeat

2 1 2017-01-02 other
2 2 2018-01-01 repeat
3 2 2001-01-05 repeat
3 2 2020-01-03 new
3 2 2020-04-01 new

Sorry about the ambiguous description. Please be aware of there's other Type out there, so I added one other here.

The logic is simple, for one ID , if there are observations for new or repeat ,and these observations have time stamp within one month.

Then I consider those observation as one, so I add the amount and take the date and type of the first observation (timely most oldest)

novinosrin · Posted 02-11-2020 01:51 PM

Hi @phantomofshell Can you check the results for ID=2 in your latest post and let me know where the discrepancy is. If my understanding is correct, the below should come close. The only concern I have is sets of 2 or more within type(new,repeat)


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 
;


data want ;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("date") ;
  h.definedata ("prev","curr","amount") ;
  h.definedone () ;
 end;
 do _n_=1 by 1 until(last.id);
  set have end=z;
  by id ;
  curr=date;
  _s=lag(amount);
  if type not in ('new','repeat') then prev=.;
  else if prev then do;
   if intck('mon',prev,curr,'c')<1 then do;
   	amount=sum(amount,_s);
	h.add(key:prev,data:prev,data:curr,data:amount);
	h.add(key:curr,data:prev,data:curr,data:amount);
   end;
	prev=date;
  end;
  else prev=date;
 end;
 do _n_=1 to _n_;
  set have;
  if h.find()=0 then do;
   if date=curr then continue;
   output;
  end;
  else output;
 end;
 h.clear();
 drop curr prev _s;
run;

phantomofshell · Posted 02-11-2020 04:42 PM

Thanks so much for your help! It works

Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

Re: Collapse two observation by ID and condition

SAS Innovate 2025: Call for Content

Classroom Training Available!