Hi all,
I have the following situation:
Id Variable1 Variable 2 Variable 3
A 33 45 21
B 13 18 25
And I like to know the order of the value of variable 1, for each row, compared with variable 2 and 3. The expected results will be:
Id Variable1 Variable 2 Variable 3 Order1 Order2 Order 3
A 33 45 21 2 1 3
B 13 18 25 1 2 3
Thats means, for Id "A", the variable 1 is the second largest compared with variable 2 and 3, so the new variable Order1 has value 2. The variable 2, has the greatest value, so Order2 is 1, etc.
Any suggestions?
You can also do it using a datastep. e.g.:
data have; input Id$ Variable1-Variable3; datalines; A 33 45 21 B 13 18 25 ; data want (drop=i); set have; array have(3) var:; array sorted(3) _temporary_; array order(3); do i=1 to 3; sorted(i)=have(i); end; call sortn(sorted(3),sorted(2),sorted(1)); do i=1 to 3; order(i)=whichn(sorted(i),of have(*)); end; run;
Art, CEO, AnalystFinder.com
This is very simple if you do a transpose of your original data, and then run PROC RANK to get the orders.
data have;
input Id$ Variable1-Variable3;
datalines;
A 33 45 21
B 13 18 25
;
proc transpose data=have out=have_long(rename=(_NAME_=variable COL1=value));
by ID;
run;
proc rank data=have_long out=want;
by ID;
var value;
ranks order;
run;
Neat suggestion by @PaigeMiller and demo by @PeterClemmensen, However you may need another transpose after proc rank step that is if we(the community) want to meet OP's requirement.
Thanks! That's correct, I'm looking for some solutions that not require applied more than one transpose.
Well @art297 the genius always has it slick before ordinary blokes like me could provide one.Take his and often read his posts I think hall of famers like him should pariticipate a bit slowly. lol hahahaha All the Best!
@MariaD wrote:
Thanks! That's correct, I'm looking for some solutions that not require applied more than one transpose.
It's not really clear why you don't want to transpose your data twice, it's relatively safe and quick (unless you have a HUGE data set); this is also much simpler (in my opinion) that the brilliant solution from @art297, and has the benefit that the algorithms used are thoroughly tested and debugged, and also has the benefit that you can specify how ties are handled. I usually advise people to use built-in solutions (e.g. SAS PROCs), than trying to write your own code and then confirming that you got it right and it handles all situations properly.
First at all, thanks so much all of you for greats ideas.
Yes, @PaigeMiller, I have a huge table (hundred of million of records) so I like to consider all possible options to choose the most optimal for it.
Well, @MariaD, having thought about it some more, I would think that a PROC would still handle HUGE datasets faster than data step code. But I admit I don't know that this is the right answer in your case. I guess the only way to find out is to try it.
@PaigeMiller: I have to disagree! Datastep code, in my experience, almost always works faster than a proc.
Since I'm only using SAS UE these days I can't really see the effects on a "large" dataset, but below is an example on just a 2 million record file (expanded from the original example).
As you can see, the whole process only took .45 seconds using datastep code, while just the initial proc transpose took 1.34 seconds.
Also, for comparison purposes, I included a conversion from wide to long using the %untranspose macro (http://www.sascommunity.org/wiki/An_Easier_and_Faster_Way_to_Untranspose_a_Wide_File .. which uses datastep code) that we'll be presenting at this year's SGF. As you'll see, if you run the test code, it does the untransposition task in just .16 seconds.
Art, CEO, AnalystFinder.com
/*time comparisons on a 2 million record file*/
data have (drop=i); input Variable1-Variable3; if _n_ eq 1 then do; do i=1 to 1000000; id=put(i,8.); output; end; end; else do; do i=1000001 to 2000000; id=put(i,8.); output; end; end; datalines; 33 45 21 13 18 25 ; proc transpose data=have out=need (rename=(Col1=value)); by id; var variable1-variable3; run; /* cpu time: 1.34 seconds */ /* using the %untranspose macro (http://www.sascommunity.org/wiki/An_Easier_and_Faster_Way_to_Untranspose_a_Wide_File )*/ %untranspose(data=have, out=need, var=variable, id=position,by=id) /* cpu time: 0.16 seconds */ data want (drop=i); set have; array have(3) var:; array sorted(3) _temporary_; array order(3); do i=1 to 3; sorted(i)=have(i); end; call sortn(sorted(3),sorted(2),sorted(1)); do i=1 to 3; order(i)=whichn(sorted(i),of have(*)); end; run; /* cpu time: 0.45 seconds */
That's good to know!
You can also do it using a datastep. e.g.:
data have; input Id$ Variable1-Variable3; datalines; A 33 45 21 B 13 18 25 ; data want (drop=i); set have; array have(3) var:; array sorted(3) _temporary_; array order(3); do i=1 to 3; sorted(i)=have(i); end; call sortn(sorted(3),sorted(2),sorted(1)); do i=1 to 3; order(i)=whichn(sorted(i),of have(*)); end; run;
Art, CEO, AnalystFinder.com
Here's a simple approach. It's nowhere near as flexible as @art297's program (since it only handles exactly 3 incoming variables). But it's simple enough that a junior programmer can understand it.
data want;
set have;
order1=2;
order2=2;
order3=2;
if variable1 = max(of variable1-variable3) then order1=1;
else if variable1 = min(of variable1-variable3) then order1=3;
if variable2 = max(of variable1-variable3) then order2=1;
else if variable2 = min(of variable1-variable3) then order2=3;
if variable3 = max(of variable1-variable3) then order3=1;
else if variable3 = min(of variable1-variable3) then order3=3;
run;
It is really easy for IML.
data have;
input Id$ Variable1-Variable3;
datalines;
A 33 45 21
B 13 18 25
;
proc iml;
use have nobs nobs;
read all var _num_ into x;
close;
rank=j(nrow(x),ncol(x));
do i=1 to nobs;
rank[i,]=ranktie(x[i,],'dense');
end;
create temp from rank;
append from rank;
close;
quit;
data want;
merge have temp;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.