Re: how to dynamically remove records until 2 distinct values of a fie...

jhahn3814 · Posted 03-07-2018 03:59 PM

I have a data set layout that looks something like this:

ID Category Score

1 A 100

2 A 95

3 B 80

4 A 75

5 C 70

6 B 65

7 A 60

The data set needs to be sorted by descending score. However, I have a requirement where after the fact the top 4 need to contain at least one "A" and at least one "C" category, so the resulting would end up like so:

ID Category Score

1 A 100

2 A 95

3 B 80

5 C 70

6 B 65

7 A 60

Does anyone have advice on how to tackle this? I'm thinking potentially someething with some counters and a retain statement but I'm not sure how to keep it going until the above stated criteria.

ballardw · Posted 03-07-2018 04:06 PM

What will you do if the actual data has no A category at all? or no C? or neither A or C?

I have to assume that by "top" you mean largest values of score (not explicitly stated and large is not always "top").

Astounding · Posted 03-07-2018 04:12 PM

There must be more rules than that. Why remove ID 4, rather than ID 2 or ID 3?

What if ID 4 was in Category B instead of A? How would you know whether to remove ID 2 or ID 4?

What if ID 4 had received a score of 90 instead of a score of 75? Could the only Category B observation (ID 3) be removed?

jhahn3814 · Posted 03-07-2018 04:19 PM

Assume they’re already sorted by highest score and I want the highest scores possible while still having one A and one C

jhahn3814 · Posted 03-07-2018 04:20 PM

If the criteria is never met then nothing will be output

novinosrin · Posted 03-07-2018 04:35 PM

data have;
input ID       Category   $   Score;
datalines;
1          A                   100
2          A                   95
3          B                   80
4          A                   75
5          C                   70
6          B                   65
7          A                   60
;


proc sort data=have  out=__have nodupkey;
by Category ;
where Category in ('A','C');
run;

proc sql;
   create table want as
   (select * from __have 
         intersect
         select * from have )
   outer union corr
   (select * from have 
         except
         select * from __have);
quit;

novinosrin · Posted 03-07-2018 05:57 PM

data have;
input ID       Category   $   Score;
datalines;
1          A                   100
2          A                   95
3          B                   80
4          A                   75
5          C                   70
6          B                   65
7          A                   60
;
data want(rename=(_id=id _score=score));
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset: 'have(where=(category in ("A","C"))',ordered:'d');
    h.definekey('Category');
   h.definedata('ID','Category', 'score');
   h.definedone();
    declare hiter iter('h');
	rc = iter.first();
do while (rc = 0);
_id=id;
_score=score;
      output;
   rc = iter.next();
end;
   end; 
 set have(rename=(id=_id score=_score)) end=last;
 rc=h.find();
 if rc ne 0 then output;
 else if rc=0 and id ne _id and score ne _score then output;
drop id score rc;
 run;

PGStats · Posted 03-08-2018 12:01 AM

It could be done this way:

data want;
array _c(2) $1 ("A", "C");
retain w 4 found 0;
set have end=done;
if found >= dim(_c) then output;
else do;
    if category in _c then do;
        output;
        found = found + 1;
        _c{whichc(category, of _c{*})} = "-";
        w = w - 1;
        end;
    else if w > dim(_c) - found then do;
        output;
        w = w - 1;
        end;
    end;
if done then 
    if w > 0 then abort;
drop w found _c:;
run;

PG

Ksharp · Posted 03-08-2018 09:34 AM

Very interesting Question.

data have;
input ID       Category   $   Score;
datalines;
1          A                   100
2          A                   95
3          B                   80
4          A                   75
5          C                   70
6          B                   65
7          A                   60
;
data temp;
 set have;
 retain A C;
 if Category='A' then A=1;
  else if Category='C' then C=1;
 output;
 if A and C then do;call symputx('n',_n_);stop;end;
run;

%macro xx;
%if &n <= 4 %then %do;
  data want;
   set have;
  run;
%end;
%else %do;
  data want;
   set have(obs=3) have(firstobs=&n);
  run;
%end;
%mend;

%xx

how to dynamically remove records until 2 distinct values of a field are within the top 4 records

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away