Hi,
I have created a list of variables ending with months and years in the format costMMYYYY (cost12017 as an example). I need to rename the set of variables to cost_MMYYYY (cost_012017), however, there are a lot of them to do using a simple RENAME (4 years worth). I tried an array and do loop and my code is creating the correct var names, but the values are inaccurate. I was trying to set them equal to the original and was planning to then drop the original:
DATA want
SET have;
array old(12) cost12012 cost22012 cost32012 cost42012 cost52012 cost62012
72012 cost82012 cost92012 cost102012 cost112012 cost122012;
array new (12) cost_12012 cost_22012 cost_32012 cost_42012 cost_52012 cost_62012
cost_72012 cost_82012 cost_92012 cost_102012 cost_112012 cost_122012;
do i=1 to dim(old);
do j=1 to dim(new);
new(j)=old_hci(i);
end;
end;
RUN;
Any suggestions on how to either correct this array/loop OR another way to rename my vars?
Thank you!
You have received a lot of good advice here. However, just in case it is too complex given your current experience with SAS ...
Your program is close to working. You just need only a single DO loop not nested DO loops.
do i=1 to dim(new);
new{i} = old{i};
end;
That still leaves you with an extra set of variables in your data set. You might want to drop the old names, once you are certain that the program is working.
Also, if you insist on using months and years as part of the variable name (and you have been advised against it by now), it would be much easier if you put the year first: cost_201201
That way, the variable names will always be in order even if you have more than one year included in your data.
Let me introduce you to the magic of dictionary tables. They first appeared in SAS 6.07, referenced in my favourite manual: Technical Report P-222.
You can reference them in SQL, or as their SASHELP analogues (dictionary.columns becomes the view sashelp.vcolumn).
Here's how I made your code work - and it doesn't need arrays at all. The first step is just populating the source dataset - you won't need this, of course.
data have;
retain cost12012 cost22012 cost32012 cost42012 cost52012 cost62012
cost72012 cost82012 cost92012 cost102012 cost112012 cost122012;
call missing(of _numeric_);
stop;
run;
proc sql noprint;
select strip(name) || '=' || catx('_', substr(name, 1, 4), substr(name, 5))
into :new_vars separated by ' '
from dictionary.columns
where libname = 'WORK'
and memname = 'HAVE';
quit;
data want;
set have;
rename &new_vars;
run;
The SQL clause is creating a long macro variable (new_vars) with the variable renaming. The dataset does the actual renaming with the - well - rename command.
And if you don't want to replace your data, only rename the variables use PROC DATASETS instead of the final data step.
proc datasets library=LIBRARY nodetails nolist;
modify DATASETNAME;
rename &new_vars;
run;quit;
Any you have encountered one of the many reasons not to incorporate data in the variable name. For most analysis and reporting data works much better in the form of
Date and Value
with one record per date and value.
instead of ValueatDate1 ValueAtDate2 ValueAtDate3.
Your number of variables will continue to grow requiring any program doing analyis to keep adding variables and complexity of code. With the Date and Value approach the code stays the same and selection of time frames is done using the Date not variables.
For Reports then the date, often with appropriate format to group by year, month and year, quarter and year or even week and year, can form appropriate column headings, and without adding any additional variables.
You have received a lot of good advice here. However, just in case it is too complex given your current experience with SAS ...
Your program is close to working. You just need only a single DO loop not nested DO loops.
do i=1 to dim(new);
new{i} = old{i};
end;
That still leaves you with an extra set of variables in your data set. You might want to drop the old names, once you are certain that the program is working.
Also, if you insist on using months and years as part of the variable name (and you have been advised against it by now), it would be much easier if you put the year first: cost_201201
That way, the variable names will always be in order even if you have more than one year included in your data.
Don't use a data step but use proc datasets for this task as proposed by @Reeza
Reasons:
- renaming only requires processing of the descriptor part (the "header") of your SAS file. No need to iterate through all the data.
- renaming instead of replacing preserves all other variable attributes like formats and labels.
data work.have;
retain cost12012 cost22012 cost32012 cost42012 cost52012 cost62012
cost72012 cost82012 cost92012 cost102012 cost112012 cost122012;
call missing(of _numeric_);
stop;
run;
proc sql noprint;
select strip(name) || '=' || catx('_', substr(name, 1, 4), substr(name, 5))
into :new_vars separated by ' '
from dictionary.columns
where libname = 'WORK'
and memname = 'HAVE';
quit;
proc datasets library=work nodetails nolist;
modify have;
rename &new_vars;
run;
quit;
I endorse @Patrick's response. In my solution, I allowed for wanting a separate dataset with different variable names. But if you don't need that, proc datasets is far superior. It results in virtually no I/O: instantaneous, as it only modifies the dataset's metadata.
But reformatting your data to use by groups would be a better idea. It requires a little pre-parsing of the data beforehand, but it will reap efficiency and reliability rewards.
Below some sample code demonstrating how you could transpose your data from a wide to a long structure.
You can then always use Proc Transpose should you ever need a wide structure again.
data work.have;
retain cost12012 cost22012 cost32012 cost42012 cost52012 cost62012
cost72012 cost82012 cost92012 cost102012 cost112012 cost122012;
call missing(of _numeric_);
output;
stop;
run;
data want(drop=_i cost: rename=(_cost=cost));
set have;
attrib
_cost length=8
date length=8 format=mmyyn6.
;
array costs {*} cost:;
do _i=1 to dim(costs);
_cost=costs[_i];
if length(vname(costs[_i]))=9 then
do;
date=mdy(substrn(vname(costs[_i]),5,1),'01',substrn(vname(costs[_i]),6,4));
end;
else
do;
date=mdy(substrn(vname(costs[_i]),5,2),'01',substrn(vname(costs[_i]),7,4));
end;
output;
end;
run;
You could use the features of the RENAME data set option but you will need help from a somewhat obscure SAS system option.
DKRICOND=ERROR Option Definition Information for SAS Option DKRICOND Group= ERRORHANDLING Group Description: Error messages and error conditions settings Group= SASFILES Group Description: Library and member file information Description: Specifies the error level to report when a variable is missing from an input data set during the processing of a DROP=, KEEP=, or RENAME= data set option.
data have;
stop;
retain cost12012 cost22012 cost32012 cost42012 cost52012 cost62012
cost72012 cost82012 cost92012 cost102012 cost112012 cost122012;
call missing(of _numeric_);
run;
proc options option=dkricond define;
run;
options dkricond=nowarning;
data need;
set have(rename=(cost12012-cost122012=cost_12012-cost_122012));
run;
options dkricond=error;
proc contents;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.