I have some data that looks like the below:
1 A *
1 A *
1 A *
This data is already sorted by a 3rd variable TIME, which is not shown here. Basically I was to flag when the count of a particular combination of ID and TYPE is greater than 2 (see starred rows). It is possible for example that 1A is followed by 1B but the count of each is less than or equal to two, I wouldnt flag those. I only want to flag it if they occur more than twice consecutively in the sequence it is already sorted by.
There are probably smarter ways to do this but here is one possibility:
data Have; input ID $ Type $; datalines; 1 A 1 A 1 B 1 B 1 A 1 A 1 A 1 C 1 C 2 A ; run; data Have2; set Have; by ID; if first.ID then Type_Count = 1; retain Last_Type; else do; if Type = Last_Type then Type_Count + 1; else Type_Count = 1; if Type_Count >= 3 then Type_Count_Flag = 'Y'; end; Last_Type = Type; if Type_Count = 1 then Group_ID + 1; Order + 1; run; proc sql; create table Want as select A.ID ,A.Type ,B.Type_Count_Flag from Have2 as A left join Have2 as B on A.Group_ID = B.Group_ID and B.Type_Count_Flag = 'Y' order by A.Order; quit;
You can use a lookup array:
data have; input ID TYPE$; cards; 1 A 1 A 1 B 1 B 1 A 1 A 1 A 1 C 1 C 2 A 1 C 1 C ; run; %let nobs=&sysnobs; data want; set have; by id type notsorted; combine=catx('#',id,type); array _par_[&nobs.]$_temporary_; retain flag; if first.type then do; if combine in _par_ then flag='*'; else do; _par_[_n_]=combine; flag=''; end; end; run;
If ID and TYPE combination have been appeared in the array, flag it, if not, push this combination into array. And be caution to do this at first.TYPE, or it is more difficult to consider the consecutive ID and TYPE combinations.
hm its not working, its flagging everything for me because its sorting by ID and TYPE regardless of the TIME variable.
Why did it SORT by ID and TYPE?
Did you accidentally add an unwanted PROC SORT step?
Is the data coming from some external database that is doing the sorting on the fly?
I assume that your data is sorted by ID and TIME, and if you do not sort the data by ID and TYPE before the data step, the solution proposed by @whymath should work (note the NOTSORTED option on the BY statement) - or this one, which makes the count in the loop header:
data want; do _N_=1 by 1 until(last.type); set have; by id type notsorted; end; if _N_>2 then flag ='*'; do until(last.type); set have; by id type notsorted; output; end; run;
Hi @SAS_Muggle , Assuming my understanding of your requirement is correct, methinks your question presents a neat use case for Hash suminc method. Of course, Hash requires significant RAM, nevertheless should be quick.
data Have; input ID $ Type $; datalines; 1 A 1 A 1 B 1 B 1 A 1 A 1 A 1 C 1 C 2 A ; run; data want ; retain _iorc_ 1 ; if _n_=1 then do; dcl hash h (suminc:'_iorc_') ; h.definekey('id','type' ) ; h.definedone( ); end ; set have ; h.ref ( ) ; h.sum (sum : _count) ; if _count > 2 then flag = '*' ; drop _: ; run ;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.