BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
phantomofshell
Calcite | Level 5

Hello there, 

I am try to collapse two observation by group and conditionAnnotation .png

 When the Id is the same, and Type for both observation is repeat ,and the date different is within 1 month .

Then collapse two observation and only keep the early date.

 

id amount  date type 
152017-01-01new
152018-01-01repeat
112018-01-17repeat
212017-01-01new
212018-01-01repeat
212019-01-01repeat
322001-01-05repeat
322020-01-03new
322020-04-01new

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @phantomofshell  Can you check the results for ID=2 in your latest post and let me know where the discrepancy is. If my understanding is correct, the below should come close. The only concern I have is sets of 2 or more within type(new,repeat)

 


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 
;


data want ;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("date") ;
  h.definedata ("prev","curr","amount") ;
  h.definedone () ;
 end;
 do _n_=1 by 1 until(last.id);
  set have end=z;
  by id ;
  curr=date;
  _s=lag(amount);
  if type not in ('new','repeat') then prev=.;
  else if prev then do;
   if intck('mon',prev,curr,'c')<1 then do;
   	amount=sum(amount,_s);
	h.add(key:prev,data:prev,data:curr,data:amount);
	h.add(key:curr,data:prev,data:curr,data:amount);
   end;
	prev=date;
  end;
  else prev=date;
 end;
 do _n_=1 to _n_;
  set have;
  if h.find()=0 then do;
   if date=curr then continue;
   output;
  end;
  else output;
 end;
 h.clear();
 drop curr prev _s;
run;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

HI @phantomofshell  Please post the sample as plain text to make it convenient for community members to copay/paste and work on the solution. Pics make it impossible. 

phantomofshell
Calcite | Level 5
id amount  date type 
152017-01-01new
152018-01-01repeat
112018-01-17repeat
212017-01-01new
212018-01-01repeat
212019-01-01repeat
322001-01-05repeat
322020-01-03new
322020-04-01new

new to the community , sorry

novinosrin
Tourmaline | Level 20

Hi @phantomofshell  Welcome to SAS communities. To make it easy, I am herewith providing a traditional rather 2 step solution for reading ease and convenience.

 

Logic:  

1. Determine the repeat groups that are consecutive and check for the sets of 2

2. Should the date difference within a set of 2 is less than a month, group that set as 1

3. Iterate the same logic to identify different sets of grps that satisfy condition2 and increment the grp number accordindly

4. The result sets the base for a simple Summary stats to sum the group and take the earliest number within that group. This is what the Proc SQL does

 



data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1	5	2017-01-01	new
1	5	2018-01-01	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new
;

data temp;
 set have;
 by id type notsorted;
 retain grp prev_date;
 if first.id then grp=0;
 if first.type and type='repeat' then do;
   grp=1;
   prev_date=date;
 end;
 else if type='repeat' and intck('mon',prev_date,date,'c')>1 then do;
  grp+1;
  prev_date=date;
 end;
 else if type ne 'repeat' then grp=0;
 drop prev_date;
run;
proc sql;
create table want(drop=sum) as
select id,sum( case when type='repeat' then amount else . end) as sum,
ifn(type='repeat',calculated sum,amount) as amount,date,type
from temp
group by id,grp
having min(date)=date and grp or not grp
order by  id,date;
quit;

See if the above helps and let us know.

 

Also, Your sample leads to some assumptions that can be best helped if you can clarify:

 

1. Within a ID group, can exist a scenario like below?

1 5 2018-01-01 repeat
1 1 2018-01-17 repeat
1 4 2018-01-19 repeat
1 6 2018-01-22 repeat

So if yes, does this make it 2 sets of two obs or all 4 belongs to one set making it a sum(5,1,4,6)=16 with date 2018-01-01 ?

 

2. Two obs combining to one as stated in the subject line can make the situation highly ambiguous as sometimes what if 2&3 can make an independent set leaving 1 and 4. How would you want to handle this.

 

I would request you to review the requirement diligently

phantomofshell
Calcite | Level 5
1	5	2017-01-01	new
1	5	2017-01-015	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new

Thanks for your help novionosrin, what if I would like to change the condition say :for the same id  if previous type  within 1 month= new and  current type = repeat, then I collapse the two as what we did before 

novinosrin
Tourmaline | Level 20

What should the output look like, would the collapsed summed record take thew date of NEW or REPEAT?

 

Or is the below assumption is what it is:

 


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1	5	2017-01-01	new
1	5	2017-01-015	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new
;

data temp;
 set have;
 by id ;
 retain grp prev_date;
 if first.id then grp=0;
 if  type='new' then do;
   grp+1;
   prev_date=date;
 end;
  if   lag(type) ne 'new'  and type='repeat' or
 type='repeat' and lag(type)='new' and intck('mon',prev_date,date,'c')
then  grp=0;
 drop prev_date;
run;

proc sql;
create table want(drop=sum) as
select id,sum(amount) as sum,
ifn(type='new',calculated sum,amount) as amount,date,type
from temp
group by id,grp
having min(date)=date and grp or not grp
order by  id,date;
quit;

 

phantomofshell
Calcite | Level 5

This is the Original Set

1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 

2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 

 

This is the result

 

1 10 2017-01-01 new 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2018-01-01 repeat 

2 1 2017-01-02 other
2 2 2018-01-01 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 

 

Sorry about the ambiguous description.  Please be aware of there's other Type out there, so I added one other here.

The logic is simple, for one ID , if there are observations for new or repeat ,and these observations have time stamp within one month.

Then I consider those observation as one, so I add the amount and take the date and type of the first observation (timely most oldest)

novinosrin
Tourmaline | Level 20

Hi @phantomofshell  Can you check the results for ID=2 in your latest post and let me know where the discrepancy is. If my understanding is correct, the below should come close. The only concern I have is sets of 2 or more within type(new,repeat)

 


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 
;


data want ;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("date") ;
  h.definedata ("prev","curr","amount") ;
  h.definedone () ;
 end;
 do _n_=1 by 1 until(last.id);
  set have end=z;
  by id ;
  curr=date;
  _s=lag(amount);
  if type not in ('new','repeat') then prev=.;
  else if prev then do;
   if intck('mon',prev,curr,'c')<1 then do;
   	amount=sum(amount,_s);
	h.add(key:prev,data:prev,data:curr,data:amount);
	h.add(key:curr,data:prev,data:curr,data:amount);
   end;
	prev=date;
  end;
  else prev=date;
 end;
 do _n_=1 to _n_;
  set have;
  if h.find()=0 then do;
   if date=curr then continue;
   output;
  end;
  else output;
 end;
 h.clear();
 drop curr prev _s;
run;
phantomofshell
Calcite | Level 5

Thanks so much for your help! It works

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1526 views
  • 0 likes
  • 2 in conversation