I have a data set with baseball players and a lot of info on them:
Name | Team | Var1 | Val1 | Var2 | Val2 | Var3 | Val3 |
Allanson, Andy | Cleveland | At_bat | 293 | n_hits | 66 | n_home | 1 |
Ashby, Alan | Houston | At_bat | 315 | n_hits | 81 | n_home | 7 |
Davis, Alan | Seattle | At_bat | 479 | n_hits | 130 | n_home | 18 |
Dawson, Andre | Montreal | At_bat | 496 | n_hits | 141 | n_home | 20 |
Galarraga, Andres | Montreal | At_bat | 321 | n_hits | 87 | n_home | 10 |
I want to create a program that goes through all of the columns and renames Val_i to the value of Var_i and drops the Var_i columns:
Name | Team | At_bat | n_hits | n_home |
Allanson, Andy | Cleveland | 293 | 66 | 1 |
Ashby, Alan | Houston | 315 | 81 | 7 |
Davis, Alan | Seattle | 479 | 130 | 18 |
Dawson, Andre | Montreal | 496 | 141 | 20 |
Galarraga, Andres | Montreal | 321 | 87 | 10 |
Can anyone help?
data have;
length name $20 Team $20;
input Name $ Team $ Var1 $ Val1 Var2 $ Val2 Var3 $ Val3;
infile datalines dlm='|';
datalines;
Allanson, Andy|Cleveland|At_bat|293|n_hits|66|n_home|1
Ashby, Alan|Houston|At_bat|315|n_hits|81|n_home|7
Davis, Alan|Seattle|At_bat|479|n_hits|130|n_home|18
Dawson, Andre|Montreal|At_bat|496|n_hits|141|n_home|20
Galarraga, Andres|Montreal|At_bat|321|n_hits|87|n_home|10
;
data _null_;
set have(obs=1);
call execute('data have;set have;rename');
array vx(*) var:;
array vy(*) val:;
do i=1 to dim(vx);
call execute (vname(vy(i))||'='||vx(i)||' ');
end;
call execute(';drop var:;run;');
run;
How big is your data set?
If size is not an issue, then something like this will do
data have;
length name $20 Team $20;
input Name $ Team $ Var1 $ Val1 Var2 $ Val2 Var3 $ Val3;
infile datalines dlm='|';
datalines;
Allanson, Andy|Cleveland|At_bat|293|n_hits|66|n_home|1
Ashby, Alan|Houston|At_bat|315|n_hits|81|n_home|7
Davis, Alan|Seattle|At_bat|479|n_hits|130|n_home|18
Dawson, Andre|Montreal|At_bat|496|n_hits|141|n_home|20
Galarraga, Andres|Montreal|At_bat|321|n_hits|87|n_home|10
;
data _null_;
set have;
if _N_=1 then do;
call symputx('rename1', Var1);
call symputx('rename2', Var2);
call symputx('rename3', Var3);
end;
stop;
run;
data want;
set have(drop=Var:);
rename Val1=&rename1. Val2=&rename2. Val3=&rename3.;
run;
Thanks!
This works. It's a pain to put in for each of the 51 variables, but since the names are identical apart from the number, I used excel to write each line.
Is there no way to have a loop that does this? pseudo code: for N = 1 to 51 valN=&renameN
@SusanLash wrote:
I have a data set with baseball players and a lot of info on them:
Name Team Var1 Val1 Var2 Val2 Var3 Val3 Allanson, Andy Cleveland At_bat 293 n_hits 66 n_home 1 Ashby, Alan Houston At_bat 315 n_hits 81 n_home 7 Davis, Alan Seattle At_bat 479 n_hits 130 n_home 18 Dawson, Andre Montreal At_bat 496 n_hits 141 n_home 20 Galarraga, Andres Montreal At_bat 321 n_hits 87 n_home 10
I want to create a program that goes through all of the columns and renames Val_i to the value of Var_i and drops the Var_i columns:
Name Team At_bat n_hits n_home Allanson, Andy Cleveland 293 66 1 Ashby, Alan Houston 315 81 7 Davis, Alan Seattle 479 130 18 Dawson, Andre Montreal 496 141 20 Galarraga, Andres Montreal 321 87 10
Can anyone help?
If this data started as an external file then I would name the variables when reading them ans keep the correct variables.
Assuming the data started as CSV the code might look something like:
data example; infile "sometextfile" dlm=',' dsd; informat name $25. team $15. At_bat n_hits n_home best. dropvar $1.; input name team dropvar at_bats dropvar n_hits dropvar n_home ; drop dropvar; run;
If the data was in fixed columns then just read the columns needed and the dropvar wouldn't be needed.
data have;
length name $20 Team $20;
input Name $ Team $ Var1 $ Val1 Var2 $ Val2 Var3 $ Val3;
infile datalines dlm='|';
datalines;
Allanson, Andy|Cleveland|At_bat|293|n_hits|66|n_home|1
Ashby, Alan|Houston|At_bat|315|n_hits|81|n_home|7
Davis, Alan|Seattle|At_bat|479|n_hits|130|n_home|18
Dawson, Andre|Montreal|At_bat|496|n_hits|141|n_home|20
Galarraga, Andres|Montreal|At_bat|321|n_hits|87|n_home|10
;
data _null_;
set have(obs=1);
call execute('data have;set have;rename');
array vx(*) var:;
array vy(*) val:;
do i=1 to dim(vx);
call execute (vname(vy(i))||'='||vx(i)||' ');
end;
call execute(';drop var:;run;');
run;
Works perfectly and it's easy! Thank you.
Most welcome!
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.