Hello,
I am attempting to add a new set of variables using an array.
I currently have a dataset that looks like this:
data have;
infile datalines4 dlm='|' missover dsd;
input id val q1_2018_units q1_2018_stdunits q2_2018_stdunits q3_2018_stdunits q4_2018_stdunits;
datalines;
1|11111|214252|214252|149115|78686543|68777|
2|11000|12|24|0|0|3665664|
3|122211|21252|21252|0|0|3656|
4|11331|214252|6677543|0|0|35535|
5|5555|2252|999999|14115|4234554|44566|
;
run;
I am trying to create a new set of variables that takes the quarter,year, val and stdunits to derive a new set of variables called q1_2018_computed all the way to q4_2018_computed:
q1_2018_computed= q1_2018_stdunits*100/val/365 and so on...
This would essentially add 4 new columns with this computation for each quarter that I can extend to multiple quarter/year combinations
Can I do this with an array and sql?
1. I was thinking to do sql to create a variable list; one for existing and one for new variables
2. Use the 2 arrays and do loop to compute new columns for each based on the formula above?
Would that work? I am unclear about creating and incorporating the new variables I want to add into my array and creating them without replacing the old ones.
Does the below suit your purpose:
data have;
infile datalines4 dlm='|' missover dsd;
input id val q1_2018_units q1_2018_stdunits q2_2018_stdunits q3_2018_stdunits q4_2018_stdunits q1_2019_stdunits q2_2019_stdunits q3_2019_stdunits q4_2019_stdunits;
datalines;
1|11111|214252|214252|149115|78686543|6777|214252|14115|7868543|6877|
2|11000|12|24|0|0|366664|24|0|0|365664|
3|122211|21252|21252|0|0|36786|21252|5436|0|656|
4|11331|214252|6677543|0|0|3535|667543|0|0|3535|
5|5555|2252|999999|14115|4234554|446566|9545999|14115|4534554|4445566|
;
run;
%macro test;
%macro _;
%mend _;
%local year j k;
proc sql noprint;
select min(year) as min_year,max(year) as max_year,count(year) as cnt
into: min_year TRIMMED, :max_year TRIMMED, :cnt TRIMMED
from
(
select distinct input(scan(name,2,"_"),best12.) as year
from dictionary.columns
where libname = 'WORK'
and memname = 'HAVE');
quit;
%put &=min_year &=max_year &=cnt;
data want;
set have;
array q_have{%sysevalf(&cnt.*4)} %do year = &min_year. %to &max_year.;
%do j = 1 %to 4;
q&j._&year._stdunits
%end;
%end;
;
array q_want{%sysevalf(&cnt.*4)} %do year = &min_year. %to &max_year.;
%do k = 1 %to 4;
q&k._&year._computed
%end;
%end;
;
do i = 1 to dim(q_have);
q_want{i} = q_have{i}*100/val/365;
end;
drop i;
run;
%mend;
%test
SQL does not allow arrays or loops. Why SQL?
proc sql to create a variable list first then use the &var_list to use in the array.
@A_Swoosh wrote:
proc sql to create a variable list first then use the &var_list to use in the array.
I don't understand why proc sql is required to create a variable list. What can it do in that regard that a data step can't do. And array are a well-defined and easy-to-use capability of data steps.
Does the below suit your purpose:
data have;
infile datalines4 dlm='|' missover dsd;
input id val q1_2018_units q1_2018_stdunits q2_2018_stdunits q3_2018_stdunits q4_2018_stdunits q1_2019_stdunits q2_2019_stdunits q3_2019_stdunits q4_2019_stdunits;
datalines;
1|11111|214252|214252|149115|78686543|6777|214252|14115|7868543|6877|
2|11000|12|24|0|0|366664|24|0|0|365664|
3|122211|21252|21252|0|0|36786|21252|5436|0|656|
4|11331|214252|6677543|0|0|3535|667543|0|0|3535|
5|5555|2252|999999|14115|4234554|446566|9545999|14115|4534554|4445566|
;
run;
%macro test;
%macro _;
%mend _;
%local year j k;
proc sql noprint;
select min(year) as min_year,max(year) as max_year,count(year) as cnt
into: min_year TRIMMED, :max_year TRIMMED, :cnt TRIMMED
from
(
select distinct input(scan(name,2,"_"),best12.) as year
from dictionary.columns
where libname = 'WORK'
and memname = 'HAVE');
quit;
%put &=min_year &=max_year &=cnt;
data want;
set have;
array q_have{%sysevalf(&cnt.*4)} %do year = &min_year. %to &max_year.;
%do j = 1 %to 4;
q&j._&year._stdunits
%end;
%end;
;
array q_want{%sysevalf(&cnt.*4)} %do year = &min_year. %to &max_year.;
%do k = 1 %to 4;
q&k._&year._computed
%end;
%end;
;
do i = 1 to dim(q_have);
q_want{i} = q_have{i}*100/val/365;
end;
drop i;
run;
%mend;
%test
You have just 4 variables per array. You don't need sql to create the list.
data want;
set have;
arraiy a {4} q1_2018_units q1_2018_stdunits q2_2018_stdunits q3_2018_stdunits q4_2018_stdunits;
array b {4) q1_2018_computed q2_2018_computed q3_2018_computed q4_2018_computed;
do i=1 to 4;
b(i) = a(i) *100/val/365;
end;
drop i;
run;
For this example, I have 4 but for my actual dataset I have multiple years and 4 quarters for each year which is why I wanted to create a list. The q1_2018 and so forth carry on from year to year, and I just want to add a suffix of _computed to each one with a computed value based on the formula.
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.