BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
tonybesas
Obsidian | Level 7

I'm working a large datataset of linguistic data for a research project, and I'd like to select the n highest values for each observation, output those, turn the other values to zeros and save the results to a new dataset.

 

My dataset has over 210,000 observations and 110 variables.

 

For example:

 

 

data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

If I were to select the 5 highest values for each observation in the 'have' dataset, the new dataset would look like:

 

 

1 10 9 8 7 6 0 0 0 0 0 
2 0 0 10 9 8 7 6 0 0 0
3 0 0 0 0 6 5 4 3 0 0

 

But if I were to select the 3 highest values for each observation, the new dataset would look like:

 

1 10 9 8 0 0 0 0 0 0 0 
2 0 0 10 9 8 0 0 0 0 0
3 0 0 0 0 6 5 4 0 0 0

Thank you all ahead for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

%let top_n = 5;

proc transpose data=have out=long;
by id;
var A1--D2;
run;


proc sort data=long;
by id descending col1;
run;

data long_marked;
set long;
by id;
if first.id then counter=1;
else counter+1;
if counter> &top_n then col1=0;
run;

proc sort data=long_marked;
by id _name_;
run;

proc transpose data=long_marked out=want;
by id;
id _name_;
var col1;
run;

In general, storing your data in a long format may make this easier then. 

View solution in original post

7 REPLIES 7
Reeza
Super User
data want;
set have;

array have(*) A1--D2;
array top(5) top1-top5;

do i=1 to dim(top);
top(i) = largest(i, of have(*));
end;

do i=1 to dim(have);
if i<=dim(top) then have(i) = top(i);
else have(i) = 0;
end;

run;
tonybesas
Obsidian | Level 7

Hi Reeza, thank you for your reply!

 

This is giving the following output:

 

Obs	id	A1	A2	A3	B1	B2	B3	C1	C2	D1	D2
1	1	10	9	8	7	6	0	0	0	0	0
2	2	10	9	8	7	6	0	0	0	0	0
3	3	6	5	4	3	0	0	0	0	0	0

Which is different from the desired output:

 

 

1 10  9  8  7  6  0  0  0  0  0 
2  0  0 10  9  8  7  6  0  0  0
3  0  0  0  0  6  5  4  3  0  0

Because the values of vars A1 and A2 in observation 2 should be 0, not 10 and 9, the values of vars A1 and A2 in observation 3 should be both 0, and so on.  The top 5 values should be placed back in the same variables as they were in the 'have' dataset.

Reeza
Super User
data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

%let top_n = 5;

proc transpose data=have out=long;
by id;
var A1--D2;
run;


proc sort data=long;
by id descending col1;
run;

data long_marked;
set long;
by id;
if first.id then counter=1;
else counter+1;
if counter> &top_n then col1=0;
run;

proc sort data=long_marked;
by id _name_;
run;

proc transpose data=long_marked out=want;
by id;
id _name_;
var col1;
run;

In general, storing your data in a long format may make this easier then. 

tonybesas
Obsidian | Level 7

Hi Reeza, this is perfect! I was testing it on my full dataset and it worked really fast and really well!! 👏 Thank you very very much!

PeterClemmensen
Tourmaline | Level 20

Here is a single pass solution

 

data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

data want(drop = rc i val pos);

   if _N_ = 1 then do;
      dcl hash h(ordered : 'D', multidata : 'Y');
      h.definekey('val');
      h.definedata('val', 'pos');
      h.definedone();
      dcl hiter hi('h');
   end;

   set have;

   array a{*} a1 -- d2;

   do i = 1 to dim(a);
      val = a[i];
      pos = i;
      h.add();
      a[i] = 0;
   end;

   do i = 1 by 1 while (hi.next() = 0);
      a[pos] = val;
      if i = 3 then leave;
	end;

   rc = hi.last();
   rc = hi.next();
   rc = h.clear();

run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1666 views
  • 0 likes
  • 3 in conversation