Solved: How to identify/flag the first row of repetition(if repeats at least 3...

sahoositaram555 · Posted 08-15-2023 01:42 AM

Hi all,

i have a dataset as below

  data have;
        input col1 col2;
        datalines;
    1001 1.5
    1001 1
    1001 2.5
    1001 2.5
    1001 1.5
    1001 1.5
    1001 1.5
    1002 1.5
    1002 4.25
    1002 2.5
    1002 2.5
    1002 2.5
    1002 2.5
    1002 1.5
    1003 6.5
    1003 
    1003 1.5
    1003 1.5
    1003 2.5
    1003 2.5
    1003 2.5
    1003
    ;
    run;

from the above I have to filter out the rows that have repeats of values at least 3 times

for the first category(1001), I need to flag the 5th row as that row got repeated with 1.5 with out any blank/any other value even though 1.5 is present on the first row(as it's not repetitive). similarly row no 10 should be flagged for 1002 as repeats are present at least 3 times there.

any help will be appriciated.

Kurt_Bremser · Posted 08-15-2023 03:00 AM

So we need another flag which keeps track of repeats found:

data want;
retain flag2;
count = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  if first.col1 then flag2 = 1;
  count + 1;
end;
flag = (count ge 3) and flag2;
if flag then flag2 = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  output;
  flag = 0;
end;
drop count flag2;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 08-15-2023 02:13 AM

data want;
count = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  count + 1;
end;
flag = (count ge 3);
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  output;
  flag = 0;
end;
drop count;
run;

Untested, posted from my tablet.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 08-15-2023 02:19 AM

Take a two-step approach:

data want;
set have;
by col1 col2 notsorted;
if first.col2 then count=1;
else count + 1;
if count=3;
found_at = _n_ - 2;
drop count;
run;

There are a variety of ways to combine WANT and HAVE to flag the proper observations (if you still need to do that).

sahoositaram555 · Posted 08-15-2023 02:33 AM

Thank you both. @Kurt_Bremser it worked, but I realized this code is flagging every 1st occurrence if there are multiple triplicate recs present per category of COL1,

I'd really appreciate if you can help with an idea- if the triplicate repeats found in col2 first time per category of col1, then not to consider all other triplicate repeats present afterwords(and not to consider missing triplicates if coming on the way).

Sorry, I should have mentioned about this condition, apologies, should have been clearer earlier.

Kurt_Bremser · Posted 08-15-2023 03:00 AM

So we need another flag which keeps track of repeats found:

data want;
retain flag2;
count = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  if first.col1 then flag2 = 1;
  count + 1;
end;
flag = (count ge 3) and flag2;
if flag then flag2 = 0;
do until (last.col2);
  set have;
  by col1 col2 notsorted;
  output;
  flag = 0;
end;
drop count flag2;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Re: How to identify/flag the first row of repetition(if repeats at least 3 times)

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!