BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

Hi all,

i have a dataset as below 

  data have;
        input col1 col2;
        datalines;
    1001 1.5
    1001 1
    1001 2.5
    1001 2.5
    1001 1.5
1001 1.5
1001 1.5
1002 1.5 1002 4.25 1002 2.5 1002 2.5 1002 2.5
1002 2.5
1002 1.5
1003 6.5 1003 1003 1.5 1003 1.5 1003 2.5
1003 2.5
1003 2.5
1003 ; run;

from the above I have to filter out the rows that have repeats of values at least 3 times

for the first category(1001), I need to flag the 5th row as that row got repeated with 1.5 with out any blank/any other value even though 1.5 is present on the first row(as it's not repetitive). similarly row no 10 should be flagged for 1002 as repeats are present at least 3 times there.

 

any help will be appriciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

So we need another flag which keeps track of repeats found:

data want;
retain flag2;
count = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  if first.col1 then flag2 = 1;
  count + 1;
end;
flag = (count ge 3) and flag2;
if flag then flag2 = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  output;
  flag = 0;
end;
drop count flag2;
run;

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User
data want;
count = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  count + 1;
end;
flag = (count ge 3);
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  output;
  flag = 0;
end;
drop count;
run;
  

Untested, posted from my tablet.

Astounding
PROC Star
Take a two-step approach:

data want;
set have;
by col1 col2 notsorted;
if first.col2 then count=1;
else count + 1;
if count=3;
found_at = _n_ - 2;
drop count;
run;

There are a variety of ways to combine WANT and HAVE to flag the proper observations (if you still need to do that).
sahoositaram555
Pyrite | Level 9

Thank you both. @Kurt_Bremser it worked, but I realized this code is flagging every 1st occurrence if there are multiple triplicate recs present per category of COL1, 

I'd really appreciate if you can help with an idea- if the triplicate repeats found in col2 first time per category of col1, then not to consider all other triplicate repeats present afterwords(and not to consider missing triplicates if coming on the way). 

Sorry, I should have mentioned about this condition, apologies, should have been clearer earlier. 

Kurt_Bremser
Super User

So we need another flag which keeps track of repeats found:

data want;
retain flag2;
count = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  if first.col1 then flag2 = 1;
  count + 1;
end;
flag = (count ge 3) and flag2;
if flag then flag2 = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  output;
  flag = 0;
end;
drop count flag2;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 407 views
  • 0 likes
  • 3 in conversation