data mytable;
input name $ 1_1 2_1 3_1 4_1 5_1 6_1 7_1 8_1 9_1 10_1 11_1 12_1 13_1 14_1 15_1 16_1 17_1 18 19 20_1 21_1 22_1 23_1 24_1 25_1 26;
datalines;
John 0 1 3 5 7 2 4 7 6 8 3 4 5 8 9 10 3 4 5 1 3 4 6 32 1
Mary 32 3 5 7 2 4 7 6 8 3 4 39 8 9 10 39 8 9 10 39 8 9 10
Peter 9 10 3 4 5 1 3 4 41
Sarah 28 8 3 4 5 8 0 1 3 5 7 2
Chris 19 9 10 3 4 5 1 0 1 3 5 7 2 3
Jane 36 32 3 5 7 2 4 7 6 8 5 1 3 4 6 32
Bob 47 0 1 4 7 11 20
Alice 22
David 39 8 9 10 3 4 5 4 5 8 0 1 3 8 5 1 3 4 6 32
;
run;
Hi,
I want to create variables such as:
"0_4ans_H"n=sum(of "1_1"n-"4_1"n); "
5_14ans_H"n=sum(of "5_1"n-"14_1"n);
"15_24ans_H"n=sum(of "15_1"n-"24_1"n);
Here is my code:
/*Age classes: Male*/
array var_0_4 {"0_1"n:"4_1"n} _numeric_;
"0_4ans_H"n=sum(of _numeric_var_0_4 [*]);
"5_14ans_H"n=sum(of "5_1"n-"14_1"n);
"15_24ans_H"n=sum(of "15_1"n-"24_1"n);
Note that the table columns are named 1, 2, 3, 4, 1_1, 2_1, 3_1, 4_1, etc.
I just took part of the table in occurrence from 1_.... For part 1, 2, 3, 4, etc. I was able to calculate the range sum by this formula: "0_4ans_F"n=sum(of "0"n-"4"n);
Which works but with 1_1, 2_1, etc. I can't find a solution.
Can anyone help me please. A SAS/SQL code proposal will be welcome.
THANKS
Gick
1) use the Maxim 1 and got to Documentation: Name Range Lists
2) do it like this:
options validvarname=any;
data mytable;
infile cards missover;
input name $ "1_1"n "2_1"n "3_1"n "4_1"n "5_1"n "6_1"n "7_1"n "8_1"n "9_1"n "10_1"n "11_1"n "12_1"n "13_1"n "14_1"n "15_1"n "16_1"n "17_1"n "18"n "19"n "20_1"n "21_1"n "22_1"n "23_1"n "24_1"n "25_1"n "26_1"n
;
datalines;
John 0 1 3 5 7 2 4 7 6 8 3 4 5 8 9 10 3 4 5 1 3 4 6 32 1
Mary 32 3 5 7 2 4 7 6 8 3 4 39 8 9 10 39 8 9 10 39 8 9 10
Peter 9 10 3 4 5 1 3 4 41
Sarah 28 8 3 4 5 8 0 1 3 5 7 2
Chris 19 9 10 3 4 5 1 0 1 3 5 7 2 3
Jane 36 32 3 5 7 2 4 7 6 8 5 1 3 4 6 32
Bob 47 0 1 4 7 11 20
Alice 22
David 39 8 9 10 3 4 5 4 5 8 0 1 3 8 5 1 3 4 6 32
;
run;
proc print;
run;
data want;
set mytable;
"0_4ans_H"n = sum(of "1_1"n -numeric- "4_1"n);
"5_14ans_H"n = sum(of "5_1"n -numeric- "14_1"n);
"15_24ans_H"n = sum(of "15_1"n -numeric- "24_1"n);
run;
proc print;
run;
Bart
SAS variable names cannot start with a number so you will have to add at least one alpha character on the front, like V (for variable) or other character of your choosing.
EDIT: Setting the SAS option VALIDVARNAME to ANY will allow you specify the names you want as long as you specify all such names in quotes with a trailing letter N:
options validvarname = any;
data test;
"1_1"n = 0;
run;
Your naming convention will require a LOT of extra typing however.
I suspect any problem that some attempts to solve with arrays is going to drive them nuts with the amount of code required to accomplish similar actions in SQL. SQL does not allow any list arguments.
Also you have a bit of misunderstanding on how you define an array. The subscript portion, that controls the values of of the index of the array must be literal integer values, not variable names.
So in the following statement the highlighted text is an error.
array var_0_4 {"0_1"n:"4_1"n} _numeric_;
Second when you provide the _numeric_ then you have told SAS to use ALL of the numeric variables in the data set.
The proper array statement fro that attempt would be:
Array var_0_4 (*) "1_1"n "2_1"n "3_1"n "4_1"n ;
The use of variable names with the same suffix (the ending _1) means that you cannot use any of the list building options that would be be possible using a numeric suffix that changed like A1 - A4 where variable lists are allowed.
From the example data shown I see no reason to even use the 1_1 2_1 etc names. It might make sense if you also had 1_2 1_3 but without those why not use just 1 and 2?????
And this makes no sense at all: "
Note that the table columns are named 1, 2, 3, 4, 1_1, 2_1, 3_1, 4_1, etc.
I just took part of the table in occurrence from 1_.... For part 1, 2, 3, 4, etc. I was able to calculate the range sum by this formula: "0_4ans_F"n=sum(of "0"n-"4"n); "
Your example data does not show any "columns name 1 2 3 or 4. So where did they come from.
Your data step has other issues in that you are reading way more values than appear in the datalines which causes multiple errors.
IF you have variables that should be linked conceptually as a group then typically name the "group" with common prefix and then differentiate the members with a suffix: V1 V2 V3, A1 A2 A3, not 1_v 2_v 3_v or 1_a 2_a 3_a.
Then you can use lists like V1-V3 directly without having to create arrays (unless there are other things that need arrays)
data example ; input v1 - v3 a1-a4; v1_v3 = sum(of v1-v3); a1_a4 = sum(of a1-a4); datalines; 1 2 3 44 55 66 77 ;
As a restart I would go back to when you brought this data into SAS. I suspect probable use of proc import with a poor data layout. But if you set:
Options validvarname=V7; before that step at least it won't create all of those terribly verbose name literals. Better would be to write a data step to read the data in sensible form.
If this "
/*Age classes: Male*/
array var_0_4 {"0_1"n:"4_1"n} _numeric_;"
implies that certain columns are supposed to relate only to male subjects, with a different set of variables implied for females then your data is terribly screwy and probably needs to be completely restructured.
1) use the Maxim 1 and got to Documentation: Name Range Lists
2) do it like this:
options validvarname=any;
data mytable;
infile cards missover;
input name $ "1_1"n "2_1"n "3_1"n "4_1"n "5_1"n "6_1"n "7_1"n "8_1"n "9_1"n "10_1"n "11_1"n "12_1"n "13_1"n "14_1"n "15_1"n "16_1"n "17_1"n "18"n "19"n "20_1"n "21_1"n "22_1"n "23_1"n "24_1"n "25_1"n "26_1"n
;
datalines;
John 0 1 3 5 7 2 4 7 6 8 3 4 5 8 9 10 3 4 5 1 3 4 6 32 1
Mary 32 3 5 7 2 4 7 6 8 3 4 39 8 9 10 39 8 9 10 39 8 9 10
Peter 9 10 3 4 5 1 3 4 41
Sarah 28 8 3 4 5 8 0 1 3 5 7 2
Chris 19 9 10 3 4 5 1 0 1 3 5 7 2 3
Jane 36 32 3 5 7 2 4 7 6 8 5 1 3 4 6 32
Bob 47 0 1 4 7 11 20
Alice 22
David 39 8 9 10 3 4 5 4 5 8 0 1 3 8 5 1 3 4 6 32
;
run;
proc print;
run;
data want;
set mytable;
"0_4ans_H"n = sum(of "1_1"n -numeric- "4_1"n);
"5_14ans_H"n = sum(of "5_1"n -numeric- "14_1"n);
"15_24ans_H"n = sum(of "15_1"n -numeric- "24_1"n);
run;
proc print;
run;
Bart
The idea of arranging data in this fashion seems to be a poor idea. Almost all SAS PROCs expect data to be arranged vertically, not horizontally. Yes, you can do additional programming to make it work horizontally, but its much much easier with vertical data sets, which is why I keep stealing Maxim 19
Maxim 19
Long beats wide.
(Don't keep data in structure)
In the world of spreadsheets, people tend to line up data side-by-side, and put data items (dates, categories, …) into column headers. This runs counter to all the methods available in SAS for group processing, and makes programming difficult, as one has variable column names and has to resort to creating dynamic code (with macros and/or call execute) where such is not necessary at all if categories were represented in their own column and data aligned vertically.
There are times where a wide format is needed, eg when preparing data for regression analysis. But for the processing and storing of data, long formats are always to be preferred.Dynamic variable names force unnecessary dynamic code.
data have;
input name $ time value;
datalines;
John 1 0
John 2 1
John 3 3
...
David 16 32
;
From here, the programming is a lot easier than working with variable names like '4_1'n, using PROCs like PROC SUMMARY, and maybe even SQL works a lot easier. And while you have marked an answer as "Correct", I think the whole approach is a poor way to do this.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.