BookmarkSubscribeRSS Feed
MrPenguin
Calcite | Level 5

Dear all,

 

I have a, perhaps very basic, question on looping over several variables, but I couldn't find anything online that fits my problem.

 

In my application, I have a long list of variables which all have the same prefix “var_”, followed by some numbers. However, it is not a simple integer count, but something “non-predictable” like var_1003, var_1250,   var_2510, var_2950, …


data test;
input id var_1003 var_1250 var_2510 var_2950;
datalines;
1 50 . 250 40
2 . 40 . .

3 100 200 . 30
;

 

For each variable, I want to perform the same operation and generate a new variable which includes the name of the old variable. For example, say for each variable "var_..." I want to generate an indicator “missvar_...” =1 if the variable is missing and =0 if it is not missing:

if var_1003=. then missvar_1003=1; else missvar_1003=0;

if var_1250=. then missvar_1250=1; else missvar_1250=0;

Is there any way to automate this for all variables without having to specify each variable separately?

 

Any help is greatly appreciated.

 

Thank you!

6 REPLIES 6
mgregory
Calcite | Level 5

you need to use either macro language or a previous data step to generate the required code.

 

You can also simplify the assignment of the flag variable:

missvar_1003 = (var_1003 eq .) ;

as the condition returns 1 or 0.

novinosrin
Tourmaline | Level 20

Hi @MrPenguin  Here is a boring traditional solution that avoids reading dictionary columns, however very convenient-



data test;
input id var_1003 var_1250 var_2510 var_2950;
datalines;
1 50 . 250 40
2 . 40 . .
3 100 200 . 30
;
proc transpose data=test out=temp;
 by id;
 var var_:;
run;
data _temp;
 length _name_ $32;
 set temp;
 output;
 _name_=catx('_','missvar',scan(_name_,-1,'_'));
 col1=nmiss(col1);
 output;
run;
proc transpose data=_temp out=want(drop=_name_);
 by id;
 id _name_ ;
 var col1 ;
run;


yabwon
Onyx | Level 15

If I may offer help of the SAS Packages Framework and two packages: macroArray and BasePlus:

data TEST;
input id var_1003 var_1250 var_2510 var_2950;
datalines;
1 50 . 250 40
2 . 40 . .
3 100 200 . 30
;
run;


filename packages "%sysfunc(pathname(work))"; /* setup temporary directory for packages in the WORK */
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */

%installPackage(macroArray basePlus) /* install packages */
  %loadPackageS(macroArray,basePlus) /* load packages content into the SAS session */

%array(Vars[*] %getVars(TEST, pattern=^var_\d), macarray=Y, vnames=Y)

%put *%do_over(Vars)*;

data want;
  set TEST;
  array V %do_over(Vars);
  array M %do_over(Vars, phrase=%nrstr(missing_%Vars(&_I_.)));
  do over V;
    if V = . then M = 0; else M = 1;
  end;
run;

Check out the intro presentation: https://github.com/yabwon/SAS_PACKAGES/blob/main/SPF/Documentation/Getting_Started_with_SAS_Packages...

to learn more.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

Do you the actual list of variables somewhere, in a file or a macro variable?  or do you want every variable whose name starts with VAR_ ?  Are they all of the same type (numeric or character)?

 

Let's make some assumptions and show some code.

1) Assume you have the list in a macro variable.  Macro variables can hold 64K bytes so you should be able to fit more than 2K names into one macro variable.

2) Assume that none of the existing names are so long that adding four extra characters will make the name longer than 32 characters.

3) Assume that you do not already have any of the target names in the dataset.  (That is if VAR_1003 is in your list then MISSVAR_1003 does not already exist, or it if does it is numeric and you don't care about whatever value it used to have.)

4) All of the names follow standard naming conventions (that is you don't need to use name literals to reference them).

5) All of the variables named are of the same type (numeric or character).

So if the list is like this:

%let names=var_1003 var_1250 var_2510 var_2950;

Then your data step is like this:

data want;
  set have;
  array _in &names ;
  array _out miss%sysfunc(tranwrd(&names,%str( ),%str( )miss));
  do over _in;
    _out = not missing(_in);
  end;
run;

Now if you want to create the macro variable with the list of names from the names of the variables in the dataset you could use the INTO clause of PROC SQL.  So if your source dataset is named HAVE in the library WORK you might do this to find all of the variables whose names start with VAR_.

proc sql noprint;
%let names=;
select name into :names separated by ' '
  from dictionary.columns
  where memname="HAVE" and libname="WORK"
    and substr(upcase(name),1,4) =  'VAR_' 
;
quit;

 

hhinohar
Quartz | Level 8
data have;
	input id var_1003 var_1250 var_2510 var_2950;
	datalines;
1 50 . 250 40
2 . 40 . .
3 100 200 . 30
;
run;

*generate column list and observation count as macro variable using proc sql "into" and "separated by" clause;
proc sql noprint;
	select cats("miss_", name), 
	       count(name) into:varslist separated by " ",
	                       :num 
        		from dictionary.columns where libname="WORK" and memname="HAVE" and name ne "id";
quit;

*insert macro variables into want dataset;
data want;
	set have;
	array vars[&num] var_:;
	array missvars[&num] &varslist;

	do i=1 to dim(vars);

		if missing(vars[i]) then
			missvars[i]=1;
		else
			missvars[i]=0;
	end;
	drop i;
run;
gamotte
Rhodochrosite | Level 12

Hello,

data _NULL_;
    call execute('data want; set test;');
    do until(lastrow);
        set sashelp.vcolumn(where=(LIBNAME='WORK' and MEMNAME='TEST' and upcase(NAME)=:'VAR_')) end=lastrow;
        call execute(cats('miss_',scan(NAME,2,'_'),'=missing(',NAME,');'));
    end;
    call execute('run;');
    stop;
run;

EDIT: added the necessary 'stop' instruction at the end of the data step.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 972 views
  • 2 likes
  • 7 in conversation