BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
phantomofshell
Calcite | Level 5

Hello there, 

I am try to collapse two observation by group and conditionAnnotation .png

 When the Id is the same, and Type for both observation is repeat ,and the date different is within 1 month .

Then collapse two observation and only keep the early date.

 

id amount  date type 
152017-01-01new
152018-01-01repeat
112018-01-17repeat
212017-01-01new
212018-01-01repeat
212019-01-01repeat
322001-01-05repeat
322020-01-03new
322020-04-01new

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @phantomofshell  Can you check the results for ID=2 in your latest post and let me know where the discrepancy is. If my understanding is correct, the below should come close. The only concern I have is sets of 2 or more within type(new,repeat)

 


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 
;


data want ;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("date") ;
  h.definedata ("prev","curr","amount") ;
  h.definedone () ;
 end;
 do _n_=1 by 1 until(last.id);
  set have end=z;
  by id ;
  curr=date;
  _s=lag(amount);
  if type not in ('new','repeat') then prev=.;
  else if prev then do;
   if intck('mon',prev,curr,'c')<1 then do;
   	amount=sum(amount,_s);
	h.add(key:prev,data:prev,data:curr,data:amount);
	h.add(key:curr,data:prev,data:curr,data:amount);
   end;
	prev=date;
  end;
  else prev=date;
 end;
 do _n_=1 to _n_;
  set have;
  if h.find()=0 then do;
   if date=curr then continue;
   output;
  end;
  else output;
 end;
 h.clear();
 drop curr prev _s;
run;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

HI @phantomofshell  Please post the sample as plain text to make it convenient for community members to copay/paste and work on the solution. Pics make it impossible. 

phantomofshell
Calcite | Level 5
id amount  date type 
152017-01-01new
152018-01-01repeat
112018-01-17repeat
212017-01-01new
212018-01-01repeat
212019-01-01repeat
322001-01-05repeat
322020-01-03new
322020-04-01new

new to the community , sorry

novinosrin
Tourmaline | Level 20

Hi @phantomofshell  Welcome to SAS communities. To make it easy, I am herewith providing a traditional rather 2 step solution for reading ease and convenience.

 

Logic:  

1. Determine the repeat groups that are consecutive and check for the sets of 2

2. Should the date difference within a set of 2 is less than a month, group that set as 1

3. Iterate the same logic to identify different sets of grps that satisfy condition2 and increment the grp number accordindly

4. The result sets the base for a simple Summary stats to sum the group and take the earliest number within that group. This is what the Proc SQL does

 



data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1	5	2017-01-01	new
1	5	2018-01-01	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new
;

data temp;
 set have;
 by id type notsorted;
 retain grp prev_date;
 if first.id then grp=0;
 if first.type and type='repeat' then do;
   grp=1;
   prev_date=date;
 end;
 else if type='repeat' and intck('mon',prev_date,date,'c')>1 then do;
  grp+1;
  prev_date=date;
 end;
 else if type ne 'repeat' then grp=0;
 drop prev_date;
run;
proc sql;
create table want(drop=sum) as
select id,sum( case when type='repeat' then amount else . end) as sum,
ifn(type='repeat',calculated sum,amount) as amount,date,type
from temp
group by id,grp
having min(date)=date and grp or not grp
order by  id,date;
quit;

See if the above helps and let us know.

 

Also, Your sample leads to some assumptions that can be best helped if you can clarify:

 

1. Within a ID group, can exist a scenario like below?

1 5 2018-01-01 repeat
1 1 2018-01-17 repeat
1 4 2018-01-19 repeat
1 6 2018-01-22 repeat

So if yes, does this make it 2 sets of two obs or all 4 belongs to one set making it a sum(5,1,4,6)=16 with date 2018-01-01 ?

 

2. Two obs combining to one as stated in the subject line can make the situation highly ambiguous as sometimes what if 2&3 can make an independent set leaving 1 and 4. How would you want to handle this.

 

I would request you to review the requirement diligently

phantomofshell
Calcite | Level 5
1	5	2017-01-01	new
1	5	2017-01-015	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new

Thanks for your help novionosrin, what if I would like to change the condition say :for the same id  if previous type  within 1 month= new and  current type = repeat, then I collapse the two as what we did before 

novinosrin
Tourmaline | Level 20

What should the output look like, would the collapsed summed record take thew date of NEW or REPEAT?

 

Or is the below assumption is what it is:

 


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1	5	2017-01-01	new
1	5	2017-01-015	repeat
1	1	2018-01-17	repeat
2	1	2017-01-01	new
2	1	2018-01-01	repeat
2	1	2019-01-01	repeat
3	2	2001-01-05	repeat
3	2	2020-01-03	new
3	2	2020-04-01	new
;

data temp;
 set have;
 by id ;
 retain grp prev_date;
 if first.id then grp=0;
 if  type='new' then do;
   grp+1;
   prev_date=date;
 end;
  if   lag(type) ne 'new'  and type='repeat' or
 type='repeat' and lag(type)='new' and intck('mon',prev_date,date,'c')
then  grp=0;
 drop prev_date;
run;

proc sql;
create table want(drop=sum) as
select id,sum(amount) as sum,
ifn(type='new',calculated sum,amount) as amount,date,type
from temp
group by id,grp
having min(date)=date and grp or not grp
order by  id,date;
quit;

 

phantomofshell
Calcite | Level 5

This is the Original Set

1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 

2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 

 

This is the result

 

1 10 2017-01-01 new 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2018-01-01 repeat 

2 1 2017-01-02 other
2 2 2018-01-01 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 

 

Sorry about the ambiguous description.  Please be aware of there's other Type out there, so I added one other here.

The logic is simple, for one ID , if there are observations for new or repeat ,and these observations have time stamp within one month.

Then I consider those observation as one, so I add the amount and take the date and type of the first observation (timely most oldest)

novinosrin
Tourmaline | Level 20

Hi @phantomofshell  Can you check the results for ID=2 in your latest post and let me know where the discrepancy is. If my understanding is correct, the below should come close. The only concern I have is sets of 2 or more within type(new,repeat)

 


data have;
input id 	amount  	date :yymmdd10. 	type $;
format date yymmdd10.;
cards;
1 5 2017-01-01 new 
1 5 2017-01-07 repeat 
1 1 2018-01-17 repeat 
2 1 2017-01-01 new 
2 1 2017-01-02 other
2 1 2018-01-01 repeat 
2 1 2018-01-07 repeat 
3 2 2001-01-05 repeat 
3 2 2020-01-03 new 
3 2 2020-04-01 new 
;


data want ;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("date") ;
  h.definedata ("prev","curr","amount") ;
  h.definedone () ;
 end;
 do _n_=1 by 1 until(last.id);
  set have end=z;
  by id ;
  curr=date;
  _s=lag(amount);
  if type not in ('new','repeat') then prev=.;
  else if prev then do;
   if intck('mon',prev,curr,'c')<1 then do;
   	amount=sum(amount,_s);
	h.add(key:prev,data:prev,data:curr,data:amount);
	h.add(key:curr,data:prev,data:curr,data:amount);
   end;
	prev=date;
  end;
  else prev=date;
 end;
 do _n_=1 to _n_;
  set have;
  if h.find()=0 then do;
   if date=curr then continue;
   output;
  end;
  else output;
 end;
 h.clear();
 drop curr prev _s;
run;
phantomofshell
Calcite | Level 5

Thanks so much for your help! It works

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 821 views
  • 0 likes
  • 2 in conversation