Dear all,
I have a, perhaps very basic, question on looping over several variables, but I couldn't find anything online that fits my problem.
In my application, I have a long list of variables which all have the same prefix “var_”, followed by some numbers. However, it is not a simple integer count, but something “non-predictable” like var_1003, var_1250, var_2510, var_2950, …
data test;
input id var_1003 var_1250 var_2510 var_2950;
datalines;
1 50 . 250 40
2 . 40 . .
3 100 200 . 30
;
For each variable, I want to perform the same operation and generate a new variable which includes the name of the old variable. For example, say for each variable "var_..." I want to generate an indicator “missvar_...” =1 if the variable is missing and =0 if it is not missing:
if var_1003=. then missvar_1003=1; else missvar_1003=0;
if var_1250=. then missvar_1250=1; else missvar_1250=0;
…
Is there any way to automate this for all variables without having to specify each variable separately?
Any help is greatly appreciated.
Thank you!
you need to use either macro language or a previous data step to generate the required code.
You can also simplify the assignment of the flag variable:
missvar_1003 = (var_1003 eq .) ;
as the condition returns 1 or 0.
Hi @MrPenguin Here is a boring traditional solution that avoids reading dictionary columns, however very convenient-
data test;
input id var_1003 var_1250 var_2510 var_2950;
datalines;
1 50 . 250 40
2 . 40 . .
3 100 200 . 30
;
proc transpose data=test out=temp;
by id;
var var_:;
run;
data _temp;
length _name_ $32;
set temp;
output;
_name_=catx('_','missvar',scan(_name_,-1,'_'));
col1=nmiss(col1);
output;
run;
proc transpose data=_temp out=want(drop=_name_);
by id;
id _name_ ;
var col1 ;
run;
If I may offer help of the SAS Packages Framework and two packages: macroArray and BasePlus:
data TEST;
input id var_1003 var_1250 var_2510 var_2950;
datalines;
1 50 . 250 40
2 . 40 . .
3 100 200 . 30
;
run;
filename packages "%sysfunc(pathname(work))"; /* setup temporary directory for packages in the WORK */
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */
%installPackage(macroArray basePlus) /* install packages */
%loadPackageS(macroArray,basePlus) /* load packages content into the SAS session */
%array(Vars[*] %getVars(TEST, pattern=^var_\d), macarray=Y, vnames=Y)
%put *%do_over(Vars)*;
data want;
set TEST;
array V %do_over(Vars);
array M %do_over(Vars, phrase=%nrstr(missing_%Vars(&_I_.)));
do over V;
if V = . then M = 0; else M = 1;
end;
run;
Check out the intro presentation: https://github.com/yabwon/SAS_PACKAGES/blob/main/SPF/Documentation/Getting_Started_with_SAS_Packages...
to learn more.
All the best
Bart
Do you the actual list of variables somewhere, in a file or a macro variable? or do you want every variable whose name starts with VAR_ ? Are they all of the same type (numeric or character)?
Let's make some assumptions and show some code.
1) Assume you have the list in a macro variable. Macro variables can hold 64K bytes so you should be able to fit more than 2K names into one macro variable.
2) Assume that none of the existing names are so long that adding four extra characters will make the name longer than 32 characters.
3) Assume that you do not already have any of the target names in the dataset. (That is if VAR_1003 is in your list then MISSVAR_1003 does not already exist, or it if does it is numeric and you don't care about whatever value it used to have.)
4) All of the names follow standard naming conventions (that is you don't need to use name literals to reference them).
5) All of the variables named are of the same type (numeric or character).
So if the list is like this:
%let names=var_1003 var_1250 var_2510 var_2950;
Then your data step is like this:
data want;
set have;
array _in &names ;
array _out miss%sysfunc(tranwrd(&names,%str( ),%str( )miss));
do over _in;
_out = not missing(_in);
end;
run;
Now if you want to create the macro variable with the list of names from the names of the variables in the dataset you could use the INTO clause of PROC SQL. So if your source dataset is named HAVE in the library WORK you might do this to find all of the variables whose names start with VAR_.
proc sql noprint;
%let names=;
select name into :names separated by ' '
from dictionary.columns
where memname="HAVE" and libname="WORK"
and substr(upcase(name),1,4) = 'VAR_'
;
quit;
data have;
input id var_1003 var_1250 var_2510 var_2950;
datalines;
1 50 . 250 40
2 . 40 . .
3 100 200 . 30
;
run;
*generate column list and observation count as macro variable using proc sql "into" and "separated by" clause;
proc sql noprint;
select cats("miss_", name),
count(name) into:varslist separated by " ",
:num
from dictionary.columns where libname="WORK" and memname="HAVE" and name ne "id";
quit;
*insert macro variables into want dataset;
data want;
set have;
array vars[&num] var_:;
array missvars[&num] &varslist;
do i=1 to dim(vars);
if missing(vars[i]) then
missvars[i]=1;
else
missvars[i]=0;
end;
drop i;
run;
Hello,
data _NULL_;
call execute('data want; set test;');
do until(lastrow);
set sashelp.vcolumn(where=(LIBNAME='WORK' and MEMNAME='TEST' and upcase(NAME)=:'VAR_')) end=lastrow;
call execute(cats('miss_',scan(NAME,2,'_'),'=missing(',NAME,');'));
end;
call execute('run;');
stop;
run;
EDIT: added the necessary 'stop' instruction at the end of the data step.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.