data have;
input buildingname & $20. population vaxa pctvax1 vax2 pctvax2 vax3 pctvax3 buildingtype $;
cards;
happy days 100 99 99 96 96 100 100 private
learning tree 20 18 90 18 90 19 95 public
bean stalk 50 50 100 50 100 50 100 public
;
I have the data above.
If any value for pctvax is 100, I need to replace all vax1, vax2, vax3 and pctvax1, pctvax2 and
pctvax3 with dashes such as below:
buildingname population vax1 pctvax1 vax2 pctvax2 vax3 pctvax3 buildingtype
happy days 100 -- -- -- -- -- -- private
learning tree 20 18 90 18 90 19 95 public
bean stalk 50 -- -- -- -- -- -- public
How would I do that? Thanks!
I tried this:
proc format;
value pct
100="--";
value nm
-1="--*";
run;
data want;
set have;
array pcts pctvax1 pctvax2 pctvax3 pct_:;
array nums vax1 vax2 vax3 nm_:;
do i=1 to dim(pcts);
if pcts=100 then nums=-1;
if pcts=100 then pcts=-1;
end;
format pct_: pct.;
format num_: nm.;
run;
I am getting the error array subscript out of range
You are using an older style of the ARRAY statement, which permits you to refer to just the array name to identify one of the elements of the array. When doing that, you can't use i as the index to the array. You have to use _i_ instead to control your DO loop:
do _i_ = 1 to dim(pcts);
Thanks. Now the code ran but it converts the 'vax' to -1 where the associated 'pct' was 100 (ie, if pct_vax1 was 100, it converts both pct_ vax1 to 100 and vax1 to -1) but I need now to convert also pct_vax2, vax2, pct_vax3, vax3 all to -1.
OK, so we cleared up the syntax. Now we need to clear up the logic.
Just as an aside, a couple of items are not clear here. Why you have added pct_: and nm_: to these arrays. At least in the sample data you provided, there are no variable names that match. And why are you using arrays at all instead of hard-coding the entire result like this:
if pctvax1=100 or pctvax2=100 or pctvax3=100 then do;
pctvax1=-1; pctvax2=-1; pctvax3=-1; vax1=-1; vax2=-1; vax3=-1;
end;
But assuming that this is merely representative data, and you have many more variables to process, this would be an approach that uses arrays:
do _i_=1 to dim(pcts);
if pct=100 then reset_values='Y';
end;
if reset_values='Y' then do _i_=1 to dim(pcts);
pcts = -1;
nums = -1;
end;
Thanks! This, for some reason doesn't work. I tried changing reset values to -1 since I want to keep the columns as number variables. But it still didn't work. Any suggestions?
I tried hard coding it as you suggested but it changed all the values in the dataset to -1....
Thanks.
Well, your earlier post suggested that's what you wanted to do. If any pctvax variable is 100, change all 6 variables to -1 (at least for that particular observation). If that's not what you want, you'll have to give a couple of "before and after" examples: what data comes in, and what should come out.
Yes, only for that line. I'm not sure why all the num and pct values for the entire dataset is -1 when I hardcode using 'or'. Thanks.
I can picture only two cases where that might happen.
Unlikely, but it would happen if every observation contains 100 in one of the pctvax variables.
More likely, if you changed the code like this it would also happen:
if pctvax1 or pctvax2 or pctvax3=100 then do;
If it's neither of those, you might have to show more of the program that you used.
SAS is fairly limited in that respect. An alternative would be to create a set of new character variables, holding the characters that you would like to print (either -- or a set of digits). Then print those new variables instead of the original variables.
data have;
input buildingname & $20. population vax1 pctvax1 vax2 pctvax2 vax3 pctvax3 buildingtype $;
cards;
happy days 100 99 99 96 96 100 100 private
learning tree 20 18 90 18 90 19 95 public
bean stalk 50 50 100 50 100 50 100 public
;
proc format;
value grp
.="--";
run;
data want;
set have;
array grp pct: vax: ;
do _n_=1 to dim( grp);
if 100 in grp then call missing(of grp{*});
end;
format pct: vax: grp.;
run;
Regards,
Naveen Srinivasan
I'm unfamiliar with what pct: means. Does that mean I need to write the array out with all the variable names like:
pcts pct_var1 pct_var2 pct_var3?
Thanks!
Pct_: means all variable names that begin with the four characters pct_ (which means it's not clear whether you actually have any such variable names in your data). If you only want three variables in your array, then you have already spelled them out using:
array pcts pctvax1 pctvax2 pctvax3;
Are there other variables that should be added to that list?
Appreciate the information. No variables need to be added. Thanks.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.