Dear all, I have a dataset which looks like this
CODE | cluster_id | a1 | a2 | a3 | a4 | a5 |
a1 | 1 | 0 | 0.19516 | 0.19242 | 0.24021 | 0.32064 |
a2 | 2 | 0.19516 | 0 | 0.22981 | 0.30144 | 0.31116 |
a3 | 2 | 0.19242 | 0.22981 | 0 | 0.10592 | 0.11456 |
a4 | 3 | 0.24021 | 0.30144 | 0.10592 | 0 | 0.16124 |
a5 | 3 | 0.32064 | 0.31116 | 0.11456 | 0.16124 | 0 |
I would like to transpose this so that my final dataset looks like below
CODE | CODE_n | cluster_id | cluster_idn | dist |
a1 | a1 | 1 | 1 | 0 |
a1 | a2 | 1 | 2 | 0.19516 |
a1 | a3 | 1 | 2 | 0.19242 |
a1 | a4 | 1 | 3 | 0.32064 |
and so on .....
Any help is appreciated many thanks!!
Is there a reason that the code_n value of a4 in your data is associated with the value of a5 in the dist variable? or should that have had a dist value of 0.24021?
And what is the logic for assigning cluster_idn?
At first glance it looks like you want to take the first row of data and create one row for each of a1 through a5 but I'm not sure your output result displayed actually matches what you want. You may want to post a few more rows of output as your other rows may need something else.
I think you wand something like this:
data have;
input CODE $ cluster_id a1 a2 a3 a4 a5;
datalines;
a1 1 0 0.19516 0.19242 0.24021 0.32064
a2 2 0.19516 0 0.22981 0.30144 0.31116
a3 2 0.19242 0.22981 0 0.10592 0.11456
a4 3 0.24021 0.30144 0.10592 0 0.16124
a5 3 0.32064 0.31116 0.11456 0.16124 0
;
proc transpose data=have out=haveLong;
by CODE cluster_id notsorted;
run;
/* Add the cluster_idn matching CODE_n */
proc sql;
create table want as
select
hl.CODE,
_name_ as CODE_n,
hl.cluster_id,
n.cluster_id as cluster_idn,
COL1 as dist
from
haveLong as hl inner join
have as n on hl._name_=n.CODE
order by hl.CODE, CODE_n;
select * from want;
quit;
PG
data have;
input CODE $ cluster_id a1 a2 a3 a4 a5;
datalines;
a1 1 0 0.19516 0.19242 0.24021 0.32064
a2 2 0.19516 0 0.22981 0.30144 0.31116
a3 2 0.19242 0.22981 0 0.10592 0.11456
a4 3 0.24021 0.30144 0.10592 0 0.16124
a5 3 0.32064 0.31116 0.11456 0.16124 0
;
run;
data want;
do i=1 by 1 until (last);
set have end=last;
array temp x1-x5 ;
temp(i)=cluster_id;
end;
do point=1 to nobs;
set have point=point nobs=nobs;
array vars a1-a5;
do i=1 to dim(vars);
retain vars;
retain cluster_id;
code_n=vname(vars(i));
cluster_idn=temp(i);
dist=vars(i);
output;
end;
end;
drop x: i a:;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.