So i have a large dataset with around 2m rows and I want to join 30 of the columns and add a new variable for each result. I'm not great at describing data so here is what I have and what i want
Below is an example dataset, keep in mind the real dataset has results1-30
data have; input subject $ group $ result1 $ result2 $ result3 $; datalines; s1 a bl ye pu s1 b re gr s2 a bl re s3 a ye pu re ; data codes; input code $ color $; datalines; re red bl blue gr green pu purple ye yellow
;
i want the dataset to look like this
subject group result1 result2 result3 color1 color2 color3 s1 a bl ye pu blue yellow purple s1 b re gr red green s2 a bl re blue red s3 a ye pu re yellow purple red ;
Is there a simple way to join multiple columns in one step? I can make 30 proc sqls but i feel like that would be incredibly inefficient. Any help would be appreciated, Thanks!
data have;
input subject $ group $ result1 $ result2 $ result3 $;
datalines;
s1 a bl ye pu
s1 b re gr
s2 a bl re
s3 a ye pu re
;
data codes;
input code $ color $;
datalines;
re red
bl blue
gr green
pu purple
ye yellow
;
data format;
set codes(rename=(code=start color=label));
retain fmtname 'fmt' type 'C';
run;
proc format cntlin=format;run;
data want;
set have;
array x{*} $ 40 result1-result3;
array y{*} $ 40 color1-color3;
do i=1 to dim(x);
y{i}=put(x{i},$fmt.);
end;
drop i;
run;
Does it really have to be adding 30 new columns to a data set? Or can you simply apply a custom format, and that will suffice?
proc format;
value $colorf 're'='red' 'bl'='blue' 'gr'='green'
'pu'='purple' 'ye'='yellow';
run;
proc datasets noprint library=work;
modify have;
format result1-result30 $colorf.;
run;
Why is any of this worth the effort? What can you do with 'red' that you can't do with 're' ??
Depending on what analysis you might do with your data, you can perform the analysis with the formats, and another analysis without the formats. I realize that may not work in all situations. What analysis will you be doing with these codes?
data have;
input subject $ group $ result1 $ result2 $ result3 $;
datalines;
s1 a bl ye pu
s1 b re gr
s2 a bl re
s3 a ye pu re
;
data codes;
input code $ color $;
datalines;
re red
bl blue
gr green
pu purple
ye yellow
;
data format;
set codes(rename=(code=start color=label));
retain fmtname 'fmt' type 'C';
run;
proc format cntlin=format;run;
data want;
set have;
array x{*} $ 40 result1-result3;
array y{*} $ 40 color1-color3;
do i=1 to dim(x);
y{i}=put(x{i},$fmt.);
end;
drop i;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.