Hallo,
I have a data set with observations for several years (2010,2011,2012). So there is a variable year and some variables with premiums and so on. Now I try to process the data set and create particular variables for each year. e.g.
if uwyear = 2012 then do;
premium12 = premium;
......
ELSE DO;
premium12 = 0;
END;
This leads to a lot of coding which I try to avoid. My idea was to process the assignment statements in a loop. But I am stuck on how to create the variable names depending on the index variable.
DO i = 2010 to 2012;
premium&i = premium;
END;
I tried a lot of combinations with rename and call symput.
Is there a way to this? Hope someone can help me.
Christoph
Ok... similar to before, use a PROC SQL to get the distinct years in the data, but instead create a list of variable names...
e.g.
proc sql ;
select distinct(cats('sale_',year(uwyear),'_rn')) into :VAR1 separated by ' ' ;
quit ;
%PUT &VAR1 ;
VAR1 resolves to
sale_10_rn sale_11_rn sale_12_rn
then use that to define the variables for an array, i.e. (assuming &MIN = 10, &MAX = 12)
array sale{&MIN,&MAX} &VAR1 ;
Hope that helps.
One solution is to use an array...
/* Simplest form, hard-coded years */
data new ;
set mydata ;
array py{2010:2012} premium2010-premium2012 ;
yr = year(uwyear) ;
py{yr} = premium ;
run ;
/* Dynamic, based on years within data, and initialize to zero */
proc sql ;
select min(year(uwyear)) into :Y1 from mydata ;
selex max(year(uwyear)) into :Y2 from mydata ;
quit ;
data new ;
set mydata ;
array py{&Y1:&Y2} premium&Y1-premium&Y2 ;
/* Set to zero */
do i = lbound(py) to hbound(py) ;
py{i} = 0 ;
end ;
py{year(uwyear)} = premium ;
run ;
Hi Chris,
Your reply was very helpful. Thanks. To avoid writing several array statements (in my case up to 13) I used a multidimensional array.
%let min = 11;
%let max = 13;
array arr{2,20&min:20&max} net_qu&min-net_qu&max sale_nb&min-sale;
arr{1,uwyear} = net_qu; |
arr{2,uwyear} = sale_nb;
So next year I just have to change my min and max values. 🙂
But there are still some variables that seems not that easy to handle. Here the year is in the middle of the variable. e.g. sale_11_rn . So I can't use a numbered range list. The people worked on this report before loved it to hard code everything. Maybe there is also a nice trick to avoid hard coding all the variables without change the names?
Christoph
Ok... similar to before, use a PROC SQL to get the distinct years in the data, but instead create a list of variable names...
e.g.
proc sql ;
select distinct(cats('sale_',year(uwyear),'_rn')) into :VAR1 separated by ' ' ;
quit ;
%PUT &VAR1 ;
VAR1 resolves to
sale_10_rn sale_11_rn sale_12_rn
then use that to define the variables for an array, i.e. (assuming &MIN = 10, &MAX = 12)
array sale{&MIN,&MAX} &VAR1 ;
Hope that helps.
It may be a pain but it could pay lots of dividends to change naming systems like sale_11_rn sale_12_rn to sale_rn_11 sale_rn_12. Every time I've gotten involved with a middle of variable scheme with chaning numbers it has lead to lots of headaches.
I'm also paranoid having had to deal with Y2K coding and would recommend using 4 digit year naming so there is no confusion in the inevitable stem_11_12 type named variable.
Sorry for my late response.
@chris:
Thanks. The Proc sql solution helped a lot.
@ballardw:
Of Course to change the variable names would be the best solution. But due to a lot of dependencies changing names is not that easy in this case.
Have a good weekend.
Christoph
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.