I have a data set with baseball players and a lot of info on them:
Name | Team | Var1 | Val1 | Var2 | Val2 | Var3 | Val3 |
Allanson, Andy | Cleveland | At_bat | 293 | n_hits | 66 | n_home | 1 |
Ashby, Alan | Houston | At_bat | 315 | n_hits | 81 | n_home | 7 |
Davis, Alan | Seattle | At_bat | 479 | n_hits | 130 | n_home | 18 |
Dawson, Andre | Montreal | At_bat | 496 | n_hits | 141 | n_home | 20 |
Galarraga, Andres | Montreal | At_bat | 321 | n_hits | 87 | n_home | 10 |
I want to create a program that goes through all of the columns and renames Val_i to the value of Var_i and drops the Var_i columns:
Name | Team | At_bat | n_hits | n_home |
Allanson, Andy | Cleveland | 293 | 66 | 1 |
Ashby, Alan | Houston | 315 | 81 | 7 |
Davis, Alan | Seattle | 479 | 130 | 18 |
Dawson, Andre | Montreal | 496 | 141 | 20 |
Galarraga, Andres | Montreal | 321 | 87 | 10 |
Can anyone help?
data have;
length name $20 Team $20;
input Name $ Team $ Var1 $ Val1 Var2 $ Val2 Var3 $ Val3;
infile datalines dlm='|';
datalines;
Allanson, Andy|Cleveland|At_bat|293|n_hits|66|n_home|1
Ashby, Alan|Houston|At_bat|315|n_hits|81|n_home|7
Davis, Alan|Seattle|At_bat|479|n_hits|130|n_home|18
Dawson, Andre|Montreal|At_bat|496|n_hits|141|n_home|20
Galarraga, Andres|Montreal|At_bat|321|n_hits|87|n_home|10
;
data _null_;
set have(obs=1);
call execute('data have;set have;rename');
array vx(*) var:;
array vy(*) val:;
do i=1 to dim(vx);
call execute (vname(vy(i))||'='||vx(i)||' ');
end;
call execute(';drop var:;run;');
run;
How big is your data set?
If size is not an issue, then something like this will do
data have;
length name $20 Team $20;
input Name $ Team $ Var1 $ Val1 Var2 $ Val2 Var3 $ Val3;
infile datalines dlm='|';
datalines;
Allanson, Andy|Cleveland|At_bat|293|n_hits|66|n_home|1
Ashby, Alan|Houston|At_bat|315|n_hits|81|n_home|7
Davis, Alan|Seattle|At_bat|479|n_hits|130|n_home|18
Dawson, Andre|Montreal|At_bat|496|n_hits|141|n_home|20
Galarraga, Andres|Montreal|At_bat|321|n_hits|87|n_home|10
;
data _null_;
set have;
if _N_=1 then do;
call symputx('rename1', Var1);
call symputx('rename2', Var2);
call symputx('rename3', Var3);
end;
stop;
run;
data want;
set have(drop=Var:);
rename Val1=&rename1. Val2=&rename2. Val3=&rename3.;
run;
Thanks!
This works. It's a pain to put in for each of the 51 variables, but since the names are identical apart from the number, I used excel to write each line.
Is there no way to have a loop that does this? pseudo code: for N = 1 to 51 valN=&renameN
@SusanLash wrote:
I have a data set with baseball players and a lot of info on them:
Name Team Var1 Val1 Var2 Val2 Var3 Val3 Allanson, Andy Cleveland At_bat 293 n_hits 66 n_home 1 Ashby, Alan Houston At_bat 315 n_hits 81 n_home 7 Davis, Alan Seattle At_bat 479 n_hits 130 n_home 18 Dawson, Andre Montreal At_bat 496 n_hits 141 n_home 20 Galarraga, Andres Montreal At_bat 321 n_hits 87 n_home 10
I want to create a program that goes through all of the columns and renames Val_i to the value of Var_i and drops the Var_i columns:
Name Team At_bat n_hits n_home Allanson, Andy Cleveland 293 66 1 Ashby, Alan Houston 315 81 7 Davis, Alan Seattle 479 130 18 Dawson, Andre Montreal 496 141 20 Galarraga, Andres Montreal 321 87 10
Can anyone help?
If this data started as an external file then I would name the variables when reading them ans keep the correct variables.
Assuming the data started as CSV the code might look something like:
data example; infile "sometextfile" dlm=',' dsd; informat name $25. team $15. At_bat n_hits n_home best. dropvar $1.; input name team dropvar at_bats dropvar n_hits dropvar n_home ; drop dropvar; run;
If the data was in fixed columns then just read the columns needed and the dropvar wouldn't be needed.
data have;
length name $20 Team $20;
input Name $ Team $ Var1 $ Val1 Var2 $ Val2 Var3 $ Val3;
infile datalines dlm='|';
datalines;
Allanson, Andy|Cleveland|At_bat|293|n_hits|66|n_home|1
Ashby, Alan|Houston|At_bat|315|n_hits|81|n_home|7
Davis, Alan|Seattle|At_bat|479|n_hits|130|n_home|18
Dawson, Andre|Montreal|At_bat|496|n_hits|141|n_home|20
Galarraga, Andres|Montreal|At_bat|321|n_hits|87|n_home|10
;
data _null_;
set have(obs=1);
call execute('data have;set have;rename');
array vx(*) var:;
array vy(*) val:;
do i=1 to dim(vx);
call execute (vname(vy(i))||'='||vx(i)||' ');
end;
call execute(';drop var:;run;');
run;
Works perfectly and it's easy! Thank you.
Most welcome!
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.