BookmarkSubscribeRSS Feed
SAS_Muggle
Fluorite | Level 6

I have some data that looks like the below:

 

ID  TYPE

1     A

1     A

1     B

1     B

1     A     *

1     A     *

1     A     *

1     C

1     C

2     A

...

 

This data is already sorted by  a 3rd variable TIME, which is not shown here. Basically I was to flag when the count of a particular combination of ID and TYPE is greater than 2 (see starred rows). It is possible for example that 1A is followed by 1B but the count of each is less than or equal to two, I wouldnt flag those. I only want to flag it if they occur more than twice consecutively in the sequence it is already sorted by. 

 

8 REPLIES 8
SASKiwi
PROC Star

There are probably smarter ways to do this but here is one possibility:

data Have;
  input ID $ Type $;
  datalines;
1     A
1     A
1     B
1     B
1     A 
1     A 
1     A  
1     C
1     C
2     A
;
run;

data Have2;
  set Have;
  by ID;
  if first.ID then Type_Count = 1;
  retain Last_Type;
  else do;
    if Type = Last_Type then Type_Count + 1;
	else Type_Count = 1;
    if Type_Count >= 3 then Type_Count_Flag = 'Y';
  end;
  Last_Type = Type;
  if Type_Count = 1 then Group_ID + 1;
  Order + 1;
run;

proc sql;
  create table Want as
  select A.ID
        ,A.Type
        ,B.Type_Count_Flag
  from Have2 as A
  left join Have2 as B
  on A.Group_ID = B.Group_ID
  and B.Type_Count_Flag = 'Y'
  order by A.Order;
quit;
whymath
Lapis Lazuli | Level 10

You can use a lookup array:

data have;
  input ID TYPE$;
  cards;
1 A
1 A
1 B
1 B
1 A
1 A
1 A
1 C
1 C
2 A
1 C
1 C
;
run;

%let nobs=&sysnobs;
data want;
  set have;
  by id type notsorted;

  combine=catx('#',id,type);
  array _par_[&nobs.]$_temporary_;
  retain flag;

  if first.type then do;
    if combine in _par_ then flag='*';
    else do;
      _par_[_n_]=combine;
      flag='';
    end;
  end;
run;

1.png

If ID and TYPE combination have been appeared in the array, flag it, if not, push this combination into array. And be caution to do this at first.TYPE, or it is more difficult to consider the consecutive ID and TYPE combinations.

whymath
Lapis Lazuli | Level 10

Shift! I got it wrong. Here is the right anwser:

data want;
  do until(last.type);
    set have;
    by id type notsorted;
    sum=sum(sum,1);
  end;

  do until(last.type);
    set have;
    by id type notsorted;
    if sum>2 then flag='*';
    output;
  end;
run;
SAS_Muggle
Fluorite | Level 6
hm its not working, its flagging everything for me because its sorting by ID and TYPE regardless of the TIME variable.
whymath
Lapis Lazuli | Level 10
Could you please post an example data with the TIME variable? And write down what is your desired output in this example, too.
Tom
Super User Tom
Super User

@SAS_Muggle wrote:
hm its not working, its flagging everything for me because its sorting by ID and TYPE regardless of the TIME variable.

Why did it SORT by ID and TYPE?

Did you accidentally add an unwanted PROC SORT step?
Is the data coming from some external database that is doing the sorting on the fly?

s_lassen
Meteorite | Level 14

I assume that your data is sorted by ID and TIME, and if you do not sort the data by ID and TYPE before the data step, the solution proposed by @whymath should work (note the NOTSORTED option on the BY statement) - or this one, which makes the count in the loop header:

data want;
  do _N_=1 by 1 until(last.type);
    set have;
    by id type notsorted;
    end;
  if _N_>2 then flag ='*';
  do until(last.type);
    set have;
    by id type notsorted;
    output;
    end;  
run;
novinosrin
Tourmaline | Level 20

Hi @SAS_Muggle , Assuming my understanding of your requirement is correct,  methinks your question presents a neat use case for Hash suminc method. Of course, Hash requires significant RAM, nevertheless should be quick.

 


data Have;
  input ID $ Type $;
  datalines;
1     A
1     A
1     B
1     B
1     A 
1     A 
1     A  
1     C
1     C
2     A
;
run;


data want ;
  retain _iorc_ 1 ;
  if _n_=1 then do;
    dcl hash h (suminc:'_iorc_') ;
    h.definekey('id','type' ) ;
    h.definedone( );
  end ;
  set have ;
  h.ref ( ) ;
  h.sum (sum : _count) ; 
  if _count > 2 then flag = '*' ;
  drop _: ;
run ;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2295 views
  • 5 likes
  • 6 in conversation