BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
UniversitySas
Quartz | Level 8

Suppose I have the following data:

data temp;
infile DATALINES dsd missover;
input ID Miles Windows Occupants;
CARDS;
01, 100, 2, 5
02, 200, 4, 5
03, 300, 2, .
04, ., 6, 2
05, 500, ., 8
06, 600, ., .
07, 700, 4, 5
08, 800, 4, .
02, 4, 1.7, 3
02, 5, 5.1, 4
;
run;

 

 

Is there a way for me to check whether any set of variables, excluding another set of variables is missing, and then set that value =0?

 

For instance, I want to check if Windows and Occupants are missing, but not Miles. I know how to do this individually, but I have around 50 or so variables I want to check, and 5 I want to exclude, so it would be somewhat tedious to write it all out.

 

My output would be:

 

data temp;
infile DATALINES dsd missover;
input ID Miles Windows Occupants;
CARDS;
01, 100, 2, 5
02, 200, 4, 5
03, 300, 2, 0
04, ., 6, 2
05, 500, 0, 8
06, 600, 0, 0
07, 700, 4, 5
08, 800, 4, 0
02, 4, 1.7, 3
02, 5, 5.1, 4
;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @UniversitySas 

 

You can do this by adding a statement in the do loop:

proc sql noprint;
	select name
	into:mylist separated by " "
	from dictionary.columns
	where libname="WORK" and memname="TEMP" and name not in ("ID" "Miles"); /* -> select your columns*/
quit;

data want;	
	set temp;
	array _miss (*) &mylist;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
		_miss(i) = _miss(i) / /*your_variable*/;
	end;
	drop i;
run;

In fact, the array is just a way to repeat the same manipulation on a series of columns.

In practice, the column can be identify by its position in the array:

e.g. _miss(1) corresponds to the first column referenced in the array, so Windows

e.g. _miss(2) corresponds to the second column referenced in the array, so Occupants

...

So instead of writing 

If Windows = . then Windows =0;
Windows = Windows /  x;
If Occupants = . then Occupants =0;
Occupant = Occupants /  x;

You can write:

If _miss(1) = . then _miss(1) =0;
_miss(1) = _miss(1) /  x;
If _miss(2) = . then _miss(2) =0;
_miss(2) = _miss(2) /  x;

... and what is better -> loop through the position 1, 2, ...

 

Best,

 

View solution in original post

12 REPLIES 12
ed_sas_member
Meteorite | Level 14

Hi @UniversitySas 

 

You can use an array to achieve this:

data want;	
	set temp;
	array _miss (*) Windows Occupants /* NB: reference the variables that are concerned or use _numeric_ if you want to refer to all numeric variables*/;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
	end;
	drop i;
run;

Best,

UniversitySas
Quartz | Level 8
Thanks for your response. In your code I can reference the variables I want, or use _Numeric_. Instead, is there a way such that I can reference the variables I do NOT want, and check for all others? This is because I have lots of variables I want, and referencing them separately would be tedious.
ed_sas_member
Meteorite | Level 14

Hi @UniversitySas 

 

You're welcome!

 

I suggest that you use a macrovariable to retrieve the column name that you want to include in the array statement.

You just need to adapt the "where" clause in Proc SQL to specify which variables you don't want.

 

proc sql noprint;
	select name
	into:mylist separated by " "
	from dictionary.columns
	where libname="WORK" and memname="TEMP" and name not in ("ID" "Miles"); /* -> select your columns*/
quit;

data want;	
	set temp;
	array _miss (*) &mylist;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
	end;
	drop i;
run;

Best,

UniversitySas
Quartz | Level 8

I think this is along the lines of what I want. Although, when I run the proc sql with my actual data, I get "No rows were selected". Any ideas why? I tried the test data and it works.

 

Edit: I solved it, the memname has to be in uppercase! Thanks!!

UniversitySas
Quartz | Level 8

Quick question, if I also wanted to divide this list of variables by their corresponding row entry for another variable, e.g., Miles, can I somehow do it in that data-step?

 

Edit: Got it, in the loop I can just put _miss(i)/miles

 

thanks!

PaigeMiller
Diamond | Level 26

@UniversitySas wrote:
Quick question, if I also wanted to divide this list of variables by their corresponding row entry for another variable, e.g., Miles, can I somehow do it in that data-step?

Quick answer ... don't do this. If you really need separate analyses for the different values of another variable ... there are better ways to do this such as using formats or using BY variables.

--
Paige Miller
ed_sas_member
Meteorite | Level 14

Hi @UniversitySas 

 

You can do this by adding a statement in the do loop:

proc sql noprint;
	select name
	into:mylist separated by " "
	from dictionary.columns
	where libname="WORK" and memname="TEMP" and name not in ("ID" "Miles"); /* -> select your columns*/
quit;

data want;	
	set temp;
	array _miss (*) &mylist;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
		_miss(i) = _miss(i) / /*your_variable*/;
	end;
	drop i;
run;

In fact, the array is just a way to repeat the same manipulation on a series of columns.

In practice, the column can be identify by its position in the array:

e.g. _miss(1) corresponds to the first column referenced in the array, so Windows

e.g. _miss(2) corresponds to the second column referenced in the array, so Occupants

...

So instead of writing 

If Windows = . then Windows =0;
Windows = Windows /  x;
If Occupants = . then Occupants =0;
Occupant = Occupants /  x;

You can write:

If _miss(1) = . then _miss(1) =0;
_miss(1) = _miss(1) /  x;
If _miss(2) = . then _miss(2) =0;
_miss(2) = _miss(2) /  x;

... and what is better -> loop through the position 1, 2, ...

 

Best,

 

Jagadishkatam
Amethyst | Level 16

Alternatively, consider the do over

 

data want;
set temp;
array mis Windows Occupants;
do over mis;
if mis=. then mis=0;
end;
run;
Thanks,
Jag
Ksharp
Super User
proc stdize data=temp out=want reponly missing=0;
var Windows Occupants;
run;
PaigeMiller
Diamond | Level 26

Setting values to zero when they are missing seems like a dangerous thing to do, it will change your averages and every other statistic as well. Does it really make any sense to say if Occupants is missing, that's really a zero? Such a modification of the data should only be done if you have a good understanding of each variable and why it might be missing, and without such reasoning, I would not recommend doing this.

--
Paige Miller
UniversitySas
Quartz | Level 8
Hi Paige, I agree. In this case, the nature of the real data allows for this change to make sense, but thank you for your input.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 715 views
  • 1 like
  • 5 in conversation