I am trying to create 5 variables each for a bunch of groups.
I am trying to do so with a do loop as follows:
data new_data; set sasave.scorecard_vars; do i = 1 to 5; age_i_perc = age_i / sum(of age_1-age_5); marks_i_perc = marks_i / sum(of marks_1-marks_5); ............................. and so on end; run;
How can I reference the value of i into the variable name so that the variables will be created as age_1_perc, age_2_perc and so on?
Use arrays, like this:
data new_data;
set sasave.scorecard_vars;
array age age_1-age_5 ;
array marks marks_1-marks_5 ;
array age_perc age_perc_1-age_perc_5;
array marks_perc marks_perc_1-marks_perc_5;
do i = 1 to 5;
age_perc{i} = age{i} / sum(of age{*});
marks_perc{i} = marks{i} / sum(of marks{*});
............................. and so on
end;
run;
Use arrays, like this:
data new_data;
set sasave.scorecard_vars;
array age age_1-age_5 ;
array marks marks_1-marks_5 ;
array age_perc age_perc_1-age_perc_5;
array marks_perc marks_perc_1-marks_perc_5;
do i = 1 to 5;
age_perc{i} = age{i} / sum(of age{*});
marks_perc{i} = marks{i} / sum(of marks{*});
............................. and so on
end;
run;
The columns age_1 to age_5 and marks_1 to marks_5 already exist in the table. Would I still need to make arrays for them?
Restructuring your data and using PROC FREQ is easier, more dynamic and scales better.
Your data needs to be in a long format and proc freq can report many different levels of percentages.
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
Yes, an array simply defines a name equivalence. Referring to age{3} is the same as referring to age_3, for the duration of the data step.
Thanks for the clarification! However, there is one small issue that I noticed when I looked at my data. The column names are actually age_1_sum to age_5_sum and so on. How would I add those into the arrays since the variable names don't end with the numbers?
@aalluru wrote:
Thanks for the clarification! However, there is one small issue that I noticed when I looked at my data. The column names are actually age_1_sum to age_5_sum and so on. How would I add those into the arrays since the variable names don't end with the numbers?
Just list the actual variable names in the ARRAY statement.
But you should consider changing your naming convention to place the numeric suffix at the end of the name instead of inserting it in the middle. Then you can use variable lists to make your code shorter and easier to create.
Yes that worked!! Yeah I thought of that but these names were autogenerated by a macro that the company uses so I couldn't change the variable names. Thank you so much clearing my doubts!
@aalluru wrote:
Yes that worked!! Yeah I thought of that but these names were autogenerated by a macro that the company uses so I couldn't change the variable names. Thank you so much clearing my doubts!
Since Proc Means and/or Summary will do sums and using the /autoname feature for the output statement create names like that I really hope someone didn't actually go and write a macro to do such.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.