BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
tonybesas
Obsidian | Level 7

I'm working a large datataset of linguistic data for a research project, and I'd like to select the n highest values for each observation, output those, turn the other values to zeros and save the results to a new dataset.

 

My dataset has over 210,000 observations and 110 variables.

 

For example:

 

 

data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

If I were to select the 5 highest values for each observation in the 'have' dataset, the new dataset would look like:

 

 

1 10 9 8 7 6 0 0 0 0 0 
2 0 0 10 9 8 7 6 0 0 0
3 0 0 0 0 6 5 4 3 0 0

 

But if I were to select the 3 highest values for each observation, the new dataset would look like:

 

1 10 9 8 0 0 0 0 0 0 0 
2 0 0 10 9 8 0 0 0 0 0
3 0 0 0 0 6 5 4 0 0 0

Thank you all ahead for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

%let top_n = 5;

proc transpose data=have out=long;
by id;
var A1--D2;
run;


proc sort data=long;
by id descending col1;
run;

data long_marked;
set long;
by id;
if first.id then counter=1;
else counter+1;
if counter> &top_n then col1=0;
run;

proc sort data=long_marked;
by id _name_;
run;

proc transpose data=long_marked out=want;
by id;
id _name_;
var col1;
run;

In general, storing your data in a long format may make this easier then. 

View solution in original post

7 REPLIES 7
Reeza
Super User
data want;
set have;

array have(*) A1--D2;
array top(5) top1-top5;

do i=1 to dim(top);
top(i) = largest(i, of have(*));
end;

do i=1 to dim(have);
if i<=dim(top) then have(i) = top(i);
else have(i) = 0;
end;

run;
tonybesas
Obsidian | Level 7

Hi Reeza, thank you for your reply!

 

This is giving the following output:

 

Obs	id	A1	A2	A3	B1	B2	B3	C1	C2	D1	D2
1	1	10	9	8	7	6	0	0	0	0	0
2	2	10	9	8	7	6	0	0	0	0	0
3	3	6	5	4	3	0	0	0	0	0	0

Which is different from the desired output:

 

 

1 10  9  8  7  6  0  0  0  0  0 
2  0  0 10  9  8  7  6  0  0  0
3  0  0  0  0  6  5  4  3  0  0

Because the values of vars A1 and A2 in observation 2 should be 0, not 10 and 9, the values of vars A1 and A2 in observation 3 should be both 0, and so on.  The top 5 values should be placed back in the same variables as they were in the 'have' dataset.

Reeza
Super User
data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

%let top_n = 5;

proc transpose data=have out=long;
by id;
var A1--D2;
run;


proc sort data=long;
by id descending col1;
run;

data long_marked;
set long;
by id;
if first.id then counter=1;
else counter+1;
if counter> &top_n then col1=0;
run;

proc sort data=long_marked;
by id _name_;
run;

proc transpose data=long_marked out=want;
by id;
id _name_;
var col1;
run;

In general, storing your data in a long format may make this easier then. 

tonybesas
Obsidian | Level 7

Hi Reeza, this is perfect! I was testing it on my full dataset and it worked really fast and really well!! 👏 Thank you very very much!

PeterClemmensen
Tourmaline | Level 20

Here is a single pass solution

 

data have ;
   input id A1 A2 A3 B1 B2 B3 C1 C2 D1 D2 ;
   datalines;
1 10  9  8  7  6  5  4  3  2  1 
2  0  0 10  9  8  7  6  5  4  3
3  0  0  0  0  6  5  4  3  0  0
;
run;

data want(drop = rc i val pos);

   if _N_ = 1 then do;
      dcl hash h(ordered : 'D', multidata : 'Y');
      h.definekey('val');
      h.definedata('val', 'pos');
      h.definedone();
      dcl hiter hi('h');
   end;

   set have;

   array a{*} a1 -- d2;

   do i = 1 to dim(a);
      val = a[i];
      pos = i;
      h.add();
      a[i] = 0;
   end;

   do i = 1 by 1 while (hi.next() = 0);
      a[pos] = val;
      if i = 3 then leave;
	end;

   rc = hi.last();
   rc = hi.next();
   rc = h.clear();

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1361 views
  • 0 likes
  • 3 in conversation