I have the follwing table
ID | a1 | s2 | d3 | f4 | g5 |
a 1 | 0 | 2.56217 | 3.36783 | 3.53842 | 3.00214 |
s 2 | 2.56217 | 0 | 2.87413 | 4.1333 | 3.60389 |
d 3 | 3.36783 | 2.87413 | 0 | 2.46764 | 3.39893 |
f 4 | 3.53842 | 4.1333 | 2.46764 | 0 | 2.97952 |
g 5 | 3.00214 | 3.60389 | 3.39893 | 2.97952 | 0 |
I want to sort this table rowwise(or may be columnwise) but instead of the observation value I need the variable name as output, something like below
ID | id1 | id2 | id3 | id4 | id5 |
a 1 | a1 | s2 | g5 | d3 | f4 |
s 2 | s2 | a1 | d3 | g5 | f4 |
d 3 | d3 | f4 | s2 | a1 | g5 |
f 4 | f4 | d3 | g5 | a1 | s2 |
g 5 | g5 | f4 | a1 | d3 | s2 |
Some more details:
Considering only the first row of the table
ID | a1 | s2 | d3 | f4 | g5 |
a 1 | 0 | 2.56217 | 3.36783 | 3.53842 | 3.00214 |
Sorting It alonwith variable names
ID | a1 | s2 | d3 | f4 | g5 |
a 1 | 0 | 2.56217 | 3.36783 | 3.53842 | 3.00214 |
Resulting into
ID | a1 | s2 | g5 | d3 | f4 |
a 1 | 0 | 2.56217 | 3.00214 | 3.36783 | 3.53842 |
Now taking only the variable names
ID | |||||
a 1 | a1 | s2 | g5 | d3 | f4 |
Doing it for all the rows is my desired output
data have;
infile cards truncover expandtabs;
input ID $ a1 s2 d3 f4 g5;
cards;
a1 0 2.56217 3.36783 3.53842 3.00214
s2 2.56217 0 2.87413 4.1333 3.60389
d3 3.36783 2.87413 0 2.46764 3.39893
f4 3.53842 4.1333 2.46764 0 2.97952
g5 3.00214 3.60389 3.39893 2.97952 0
;
run;
proc transpose data=have out=temp;
by id notsorted;
run;
proc sort data=temp;
by id col1;
run;
proc transpose data=temp out=want(drop=_:) prefix=id;
by id;
var _name_;
run;
that's one weird puzzle.
why is d3 sometimes followed by f4 and sometimes by g5?
data have;
infile cards truncover expandtabs;
input ID $ a1 s2 d3 f4 g5;
cards;
a1 0 2.56217 3.36783 3.53842 3.00214
s2 2.56217 0 2.87413 4.1333 3.60389
d3 3.36783 2.87413 0 2.46764 3.39893
f4 3.53842 4.1333 2.46764 0 2.97952
g5 3.00214 3.60389 3.39893 2.97952 0
;
run;
proc transpose data=have out=temp;
by id notsorted;
run;
proc sort data=temp;
by id col1;
run;
proc transpose data=temp out=want(drop=_:) prefix=id;
by id;
var _name_;
run;
Wow ! Its done in just few lines and I was thinking about a loop. Thanks a lot !
Here is IML code , if you like it . It would be faster than PROC TRANSPOSE.
data have;
infile cards truncover expandtabs;
input ID $ a1 s2 d3 f4 g5;
cards;
a1 0 2.56217 3.36783 3.53842 3.00214
s2 2.56217 0 2.87413 4.1333 3.60389
d3 3.36783 2.87413 0 2.46764 3.39893
f4 3.53842 4.1333 2.46764 0 2.97952
g5 3.00214 3.60389 3.39893 2.97952 0
;
run;
proc iml;
use have;
read all var _num_ into x[c=vnames r=id];
close;
want=j(nrow(x),ncol(x),blankstr(nleng(vnames)));
do i=1 to nrow(x);
temp=t(x[i,]);
call sortndx(ndx,temp,1);
want[i,]=t(vnames[ndx]);
end;
names='id1':'id'+char(ncol(x));
create want from want[c=names r=id];
append from want[r=id];
close;
quit;
Actually I have big data of around 200,0000 rows. This step will be in between the whole logic and the imput would be a square matrix of 200,0000 rows & columns. So what do you suggest in this suggestion. Transpose or IML ? Also could you advice some tutorial/ documentation which can help me in solving such things on my own ...?
I would recommend to use IML, if you could make memory bigger by set option -memsize 20G in sasconfv9 file.
If you don't have enough memory , try the following code could give you more fast .
data have;
infile cards truncover expandtabs;
input ID $ a1 s2 d3 f4 g5;
cards;
a1 0 2.56217 3.36783 3.53842 3.00214
s2 2.56217 0 2.87413 4.1333 3.60389
d3 3.36783 2.87413 0 2.46764 3.39893
f4 3.53842 4.1333 2.46764 0 2.97952
g5 3.00214 3.60389 3.39893 2.97952 0
;
run;
data want;
set have;
array temp{5};
array have{5} a1 s2 d3 f4 g5;
array id_{5} $ 32;
do i=1 to dim(have);
temp{i}=have{i};
end;
call sortn(of temp{*});
do i=1 to dim(have);
id_{i}=vname(have{whichn(temp{i},of have{*})});
end;
drop i temp: a1 s2 d3 f4 g5;
run;
Hi, If there are duplicated value in an obs, my data step might not work for you .
Actually IML can handle BIG data .
data have;
infile cards truncover expandtabs;
input ID $ a1 s2 d3 f4 g5;
cards;
a1 0 2.56217 3.36783 3.53842 3.00214
s2 2.56217 0 2.87413 4.1333 3.60389
d3 3.36783 2.87413 0 2.46764 3.39893
f4 3.53842 4.1333 2.46764 0 2.97952
g5 3.00214 3.60389 3.39893 2.97952 0
a11 0 2.56217 3.36783 3.53842 3.00214
s12 2.56217 0 2.87413 4.1333 3.60389
d13 3.36783 2.87413 0 2.46764 3.39893
f14 3.53842 4.1333 2.46764 0 2.97952
g15 3.00214 3.60389 3.39893 2.97952 0
a21 0 2.56217 3.36783 3.53842 3.00214
s22 2.56217 0 2.87413 4.1333 3.60389
d23 3.36783 2.87413 0 2.46764 3.39893
f24 3.53842 4.1333 2.46764 0 2.97952
g25 3.00214 3.60389 3.39893 2.97952 0
;
run;
proc iml;
want=j(1,5,blankstr(32));
id=blankstr(32);
use have;
create want from want[c={ a1 s2 d3 f4 g5} r=id];
setin have;
setout want;
do data;
read next var _num_ into x[c=vnames r=id];
call sortndx(ndx,x`,1);
want=t(vnames[ndx]);
append from want[r=id];
end;
close have want;
quit;
Thankyou Xia Keshan. It was a great help. However, it will take good amount of time in testing these codes on actual data, currently I am just trying on Dummy data. I dont know how feasible it will be on actual data, if required I will make clusters to reduce the data. I will inform you which one finally works. I have used proc transpose but IML is new to me. I tried it once and the output was not coming in the form of SAS Table so I did not try the same. Is it possible to save the output from IML in SAS table ?
Yes. That is what I am trying to do. Use the following code to save output into a SAS Table WANT .
create want from want;
append from want;
close;
HaHa, I found the following code could handle duplicate value as long as there are no missing value .
data have;
infile cards truncover expandtabs;
input ID $ a1 s2 d3 f4 g5;
cards;
a1 0 2.56217 3.36783 3.53842 3.00214
s2 2.56217 0 2.87413 4.1333 3.60389
d3 3.36783 2.87413 0 2.46764 3.39893
f4 3.53842 4.1333 2.46764 0 2.97952
g5 3 3 3.39893 2.97952 0
;
run;
data want;
set have;
array have{5} a1 s2 d3 f4 g5;
array id_{5} $ 32;
do i=1 to dim(have);
min=min(of have{*});
idx=whichn(min,of have{*});
id_{i}=vname(have{idx});
have{idx}=.;
end;
drop idx min i a1 s2 d3 f4 g5;
run;
Make a macro variable to hold these variables name . OR you could try my IML code . data have; infile cards truncover expandtabs; input ID $ a1 s2 d3 f4 g5; cards; a1 0 2.56217 3.36783 3.53842 3.00214 s2 2.56217 0 2.87413 4.1333 3.60389 d3 3.36783 2.87413 0 2.46764 3.39893 f4 3.53842 4.1333 2.46764 0 2.97952 g5 3 3 3.39893 2.97952 0 ; run; proc transpose data=have(obs=0) out=temp; run; proc sql noprint; select count(*) into : n from temp; select _name_ into : list separated by ' ' from temp; quit; data want; set have; array have{&n} &list ; array id_{&n} $ 32; do i=1 to dim(have); min=min(of have{*}); idx=whichn(min,of have{*}); id_{i}=vname(have{idx}); have{idx}=.; end; keep id id_: ; run;
The double-dash syntax allows you to list all the variables in a table in the order they are stored.
array have{*} a1 -- g5;
data t;
a1=2; b2=3;z5=4; g5=1;
run;
data tt;
set t ;
array have{*} a1 -- g5;
put have[3]=;
run;
z5=4
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.