To whom it may concern,
I have the dataset below (initial_data):
Obs CustomerID Var1 Var2 Var3 Var4 Var5
1 424535 5013 6784 5673 . 3462
2 576775 . 4328 3231 4563 2445
3 645678 . . 4356 . 6534
4 768395 6476 8732 6428 3542 2314
I want to increment each of the values in from Var1 through Var5 by 3000. I have the following code that I am hoping will help me achieve my goal, however, it doesn't seem to be working. Can someone please advise on what I still need to do on the code to achieve my objective?
data final_data;
set initial_data;
do i = Var1, Var2, Var3, Var4, Var5;
i = i + 3000;
end;
run;
Doesn't work is awful vague.
Are there errors in the log?: Post the code and log in a code box opened with the "</>" to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.
Since you don't say what "doesn't work" lets look at this code:
data final_data; set initial_data; do i = Var1, Var2, Var3, Var4, Var5; i = i + 3000; end; run;
The I variable is assigned values sequentially set by 5 variables. So far so good. Then you add 3000 to each value of I.
The result is that I has the value of Var5 with 3000 added.
The variable on the LEFT side of an = gets the assigned value. Since nowhere did you have Var1, Var2, ... Var5 on the left of an = sign those variable values do not change.
The ARRAY statement as used in @SASKiwi's suggestion is a way to create a reference to those variables. The index value, the number that appears in the () after the array name is short had for "use the ith variable in the list defined as array vars".
Note that SAS will check to see that the index value is in the defined range for the array. So if your code has 5 variables and you ask for the 0 or 6th or 16th value in the list you will get an error.
I wouldn't bother with loops for only two variables. Try this:
data final_data;
set initial_data;
var1 = var1 + 3000;
var2 = var2 + 3000;
run;
What happens with missing values?
Thanks; but that was meant as illustrative example. I really need to use the do - loop on a variable list.
In that case:
data final_data;
set initial_data;
array vars (*) var1-var2;
do i= 1 to dim(vars);
vars(i) = vars(i) + 3000;
end;
run;
Doesn't work is awful vague.
Are there errors in the log?: Post the code and log in a code box opened with the "</>" to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.
Since you don't say what "doesn't work" lets look at this code:
data final_data; set initial_data; do i = Var1, Var2, Var3, Var4, Var5; i = i + 3000; end; run;
The I variable is assigned values sequentially set by 5 variables. So far so good. Then you add 3000 to each value of I.
The result is that I has the value of Var5 with 3000 added.
The variable on the LEFT side of an = gets the assigned value. Since nowhere did you have Var1, Var2, ... Var5 on the left of an = sign those variable values do not change.
The ARRAY statement as used in @SASKiwi's suggestion is a way to create a reference to those variables. The index value, the number that appears in the () after the array name is short had for "use the ith variable in the list defined as array vars".
Note that SAS will check to see that the index value is in the defined range for the array. So if your code has 5 variables and you ask for the 0 or 6th or 16th value in the list you will get an error.
Thanks all for your very helpful responses; they helped me frame the problem correctly in my mind.
Is it then fair to say that from the variables Var1 - Var5 in my dataset, if I only wanted to make changes to 3 of them, say Var1, Var3, and Var5 there is no way to do so other than putting Var1, Var3, and Var 5 into an array? In other words, is anyone aware of a way to modify my syntax below in order to take my Have dataset and get my Want dataset below? I have references to my syntax below in Base SAS textbooks, however, I have not come across any explicit enough examples to implement those references; everyone seems to default to using an array. As always, all thoughts are welcome.
My syntax:
data want;
set have;
do i = Var1, Var3, Var5;
i = i + 3000;
end;
run;
Have Dataset:
Obs CustomerID Var1 Var2 Var3 Var4 Var5
1 424535 5013 6784 5673 . 3462
2 576775 . 4328 3231 4563 2445
3 645678 . . 4356 . 6534
4 768395 6476 8732 6428 3542 2314
Want Dataset (Var1, Var3, Var5 have all be incremented by 3000):
Obs CustomerID Var1 Var2 Var3 Var4 Var5
1 424535 8013 6784 8673 . 6462
2 576775 3000 4328 6231 4563 5445
3 645678 3000 . 7356 . 9534
4 768395 9476 8732 9428 3542 5314
@maroulator1 wrote:
Is it then fair to say that from the variables Var1 - Var5 in my dataset, if I only wanted to make changes to 3 of them, say Var1, Var3, and Var5 there is no way to do so other than putting Var1, Var3, and Var 5 into an array?
Not at all. What the ARRAY syntax allows is quick flexibility.
If you do something like
Array v (*) var1 var3 var5; do i= 1 to dim(v); v[i]=sum(v[i], 3000); end;
is equivalent to writing
var1= sum(var1,3000); var3= sum(var3,3000); var5= sum(var5,3000);
The flexibility of an array based approach is that consider your next data set you need to do similar has 15 variables. It is much easier to just replace the list in the Array definition with the "new" list of 15 variable names. None of the other code would need to change. However if using the the Var1= statements you would need to rewrite/add 15 statements. And then the time after that with 8 variable for example you would need to remove statements and make the variables match the new list.
You could even use something like
array V(*) var1-var5; do i=1 to dim(v); if i in (1, 3, 5) then v[i] = sum(v[i], 3000); end;
The IN operator returns true when i is any of the values in the list (note: not variables).
This would only do the addition with the index variable is the value of the element in the array that you want to modify. The STYLE choice of which code to use could depend. The last would be a better solution if you know that the variables will stay constant but that the variables that need that adjustment change frequently.
If you search this forum long enough I believe you will find a few questions/solutions involving arrays with 1000 or more variables.
Another use of the Arrays might be that you need to adjust the values differently for different variables. That might lead to a temporary array with the values used for the adjustment and code that looks like:
Array v (*) var1 var3 var5; do i= 1 to dim(v); v[i]=sum(v[i], temp[i]); end;
Where Temp has been properly defined to hold the constant values needed to adjust each variable.
Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
Also, look at SUM function versus statement and how they handle missing and decide which you should be using.
https://communities.sas.com/t5/SAS-Programming/difference-between-sum-and-lt-operator-gt/td-p/392751
data want;
set have;
array _vars(5) var1-var5;
do i=1 to dim(_vars);
_vars(i) = sum(_vars(i), 3000);
end;
run;
2 things you need to consider is:
- the same operation on multiple variables is done in ARRAYS, not in do loops itself
- an operation on missing values must be avoided either by using SAS functions ( eg. SUM), or by conditional processing (eg. if VAR ne missing then ..).
Below is the code how it works;
data have;
input obs cust var1 var2 var3 var4 var5;
datalines;
1 424535 5013 6784 5673 . 3462
2 576775 . 4328 3231 4563 2445
3 645678 . . 4356 . 6534
4 768395 6476 8732 6428 3542 2314
;
run;
data want;
set have;
array x Var1 Var2 Var3 Var4 Var5;
do over x;
if x ne . then x = x + 3000;
end;
run;
As @Reeza suggested above, you might better check referenced documentations for learning more..
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.