I have 291 "dichotomous" variables, Yes/No. Instead of ' ' [missing] all the missing values are recorded as "unknown".
I know I can use if/then sub-setting to set "Unknown" to " ", but I do not want to have to manually enter all 291 variables.
I was thinking something like this:
data test;
set trim;
if _all_ = "Unknown" then put " ";
run;
but I can't quite figure out what the code would be.
Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
You need an array instead.
data test;
set trim;
array _vars2recode(*) <list of variables>;
do i=1 to dim(_vars2recode);
/* character variables take a blank as missing, not a numeric . */
if _vars2recode(i) = 'Unknown' then _vars2recode(i) = ' ';
end;
drop i;
run;
Here is a reference that illustrates how to refer to variables and datasets in a short cut list, so you don't have to list them 1 by 1 for the array list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
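As a rough sketch of how a shortcut list fits into the array statement, assuming the 291 variables share a numbered naming pattern: the names q1-q291 below are hypothetical stand-ins, since the real variable names are not shown in this thread.

```sas
/* Hypothetical names: q1-q291 stand in for the 291 Yes/No variables. */
data test;
  set trim;
  /* Numbered range list expands to q1, q2, ..., q291.            */
  /* Alternatives, depending on how the variables are named:      */
  /*   q:              every variable whose name starts with q    */
  /*   first -- last   every variable positioned between the two  */
  array _vars2recode(*) q1-q291;
  do i = 1 to dim(_vars2recode);
    if _vars2recode(i) = 'Unknown' then _vars2recode(i) = ' ';
  end;
  drop i; /* keep the loop counter out of the output data set */
run;
```

The comparison is case-sensitive, so this only catches values spelled exactly 'Unknown'.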
So you have 291 character variables of length $7 (or longer)?
Why not use a format?
proc format;
value $ynunk 'Yes'='Yes' 'No'='No' 'Unknown','unknown',' '=' ' other='Invalid' ;
run;
Then use that format with your 291 variables.
proc freq data=have;
tables var1-var291;
format var1-var291 $ynunk. ;
run;
Doesn't the format option just change how the data is presented and not how it is stored? Would that mess up any future logistic regression analyses or anything else?
Most PROCS will use the formatted values when creating groups. Try it with your logistic regression.
So you are saying I would be able to use the _ALL_ keyword in an array? I tried it with the subsetting IF but couldn't get it to work.
Arrays cannot mix numeric and character variables. So you can only use the _ALL_ variable list in an array if your data step has defined only one type of variable (all numeric or all character). You could use the _CHARACTER_ variable list if you wanted.
data want;
set have;
array _c _character_;
do over _c;
if upcase(_c)='UNKNOWN' then _c=' ';
end;
run;
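For anyone who wants to try this on a small example first, here is a minimal sketch using an explicit array and an indexed loop in place of DO OVER. The toy data set and the names id, smoker, and diabetic are made up for illustration. Note that _CHARACTER_ picks up every character variable, including id here, so an ID or free-text column that legitimately contains "Unknown" would be blanked too.

```sas
/* Toy data; id, smoker, and diabetic are hypothetical names. */
data have;
  input id $ smoker :$7. diabetic :$7.;
  datalines;
A01 Yes Unknown
A02 unknown No
A03 No Yes
;
run;

data want;
  set have;
  /* Every character variable defined so far, including id */
  array _c(*) _character_;
  do i = 1 to dim(_c);
    /* upcase() makes the match case-insensitive */
    if upcase(_c(i)) = 'UNKNOWN' then _c(i) = ' ';
  end;
  drop i;
run;

proc print data=want;
run;
```

If only some character variables should be recoded, listing them (or using a shortcut list) in the array statement instead of _CHARACTER_ limits the change to those columns.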