DATA Step, Macro, Functions and more

how to dynamically remove records until 2 distinct values of a field are within the top 4 records

Reply
New Contributor
Posts: 3

how to dynamically remove records until 2 distinct values of a field are within the top 4 records

I have a data set layout that looks something like this:

 

ID       Category      Score

1          A                   100

2          A                   95

3          B                   80

4          A                   75

5          C                   70

6          B                   65

7          A                   60

 

The data set needs to be sorted by descending score. However, I have a requirement where after the fact the top 4 need to contain at least one "A" and at least one "C" category, so the resulting would end up like so:

 

ID       Category      Score

1          A                   100

2          A                   95

3          B                   80

5          C                   70

6          B                   65

7          A                   60

 

Does anyone have advice on how to tackle this? I'm thinking potentially someething with some counters and a retain statement but I'm not sure how to keep it going until the above stated criteria. 

 

Super User
Posts: 13,304

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Posted in reply to jhahn3814

What will you do if the actual data has no A category at all? or no C? or neither A or C?

 

I have to assume that by "top" you mean largest values of score (not explicitly stated and large is not always "top").

Super User
Posts: 6,629

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Posted in reply to jhahn3814

There must be more rules than that.  Why remove ID 4, rather than ID 2 or ID 3?

 

What if ID 4 was in Category B instead of A?  How would you know whether to remove ID 2 or ID 4?

 

What if ID 4 had received a score of 90 instead of a score of 75?  Could the only Category B observation (ID 3) be removed?

 

New Contributor
Posts: 3

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Posted in reply to jhahn3814
Assume they’re already sorted by highest score and I want the highest scores possible while still having one A and one C
New Contributor
Posts: 3

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Posted in reply to jhahn3814
If the criteria is never met then nothing will be output
PROC Star
Posts: 1,570

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Posted in reply to jhahn3814
data have;
input ID       Category   $   Score;
datalines;
1          A                   100
2          A                   95
3          B                   80
4          A                   75
5          C                   70
6          B                   65
7          A                   60
;


proc sort data=have  out=__have nodupkey;
by Category ;
where Category in ('A','C');
run;

proc sql;
   create table want as
   (select * from __have 
         intersect
         select * from have )
   outer union corr
   (select * from have 
         except
         select * from __have);
quit;
PROC Star
Posts: 1,570

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

[ Edited ]
Posted in reply to novinosrin
data have;
input ID       Category   $   Score;
datalines;
1          A                   100
2          A                   95
3          B                   80
4          A                   75
5          C                   70
6          B                   65
7          A                   60
;
data want(rename=(_id=id _score=score));
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset: 'have(where=(category in ("A","C"))',ordered:'d');
    h.definekey('Category');
   h.definedata('ID','Category', 'score');
   h.definedone();
    declare hiter iter('h');
	rc = iter.first();
do while (rc = 0);
_id=id;
_score=score;
      output;
   rc = iter.next();
end;
   end; 
 set have(rename=(id=_id score=_score)) end=last;
 rc=h.find();
 if rc ne 0 then output;
 else if rc=0 and id ne _id and score ne _score then output;
drop id score rc;
 run;
Esteemed Advisor
Posts: 5,479

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Posted in reply to jhahn3814

It could be done this way:

 

data want;
array _c(2) $1 ("A", "C");
retain w 4 found 0;
set have end=done;
if found >= dim(_c) then output;
else do;
    if category in _c then do;
        output;
        found = found + 1;
        _c{whichc(category, of _c{*})} = "-";
        w = w - 1;
        end;
    else if w > dim(_c) - found then do;
        output;
        w = w - 1;
        end;
    end;
if done then 
    if w > 0 then abort;
drop w found _c:;
run;
PG
Super User
Posts: 10,686

Re: how to dynamically remove records until 2 distinct values of a field are within the top 4 record

Posted in reply to jhahn3814

Very interesting Question.

 

data have;
input ID       Category   $   Score;
datalines;
1          A                   100
2          A                   95
3          B                   80
4          A                   75
5          C                   70
6          B                   65
7          A                   60
;
data temp;
 set have;
 retain A C;
 if Category='A' then A=1;
  else if Category='C' then C=1;
 output;
 if A and C then do;call symputx('n',_n_);stop;end;
run;

%macro xx;
%if &n <= 4 %then %do;
  data want;
   set have;
  run;
%end;
%else %do;
  data want;
   set have(obs=3) have(firstobs=&n);
  run;
%end;
%mend;

%xx
Ask a Question
Discussion stats
  • 8 replies
  • 142 views
  • 0 likes
  • 6 in conversation