Solved: Re: Setting value to 0 if missing - a way to check for all variables?

UniversitySas · Posted 03-10-2020 05:10 AM

Suppose I have the following data:

data temp;
infile DATALINES dsd missover;
input ID Miles Windows Occupants;
CARDS;
01, 100, 2, 5
02, 200, 4, 5
03, 300, 2, .
04, ., 6, 2
05, 500, ., 8
06, 600, ., .
07, 700, 4, 5
08, 800, 4, .
02, 4, 1.7, 3
02, 5, 5.1, 4
;
run;

Is there a way for me to check whether any set of variables, excluding another set of variables is missing, and then set that value =0?

For instance, I want to check if Windows and Occupants are missing, but not Miles. I know how to do this individually, but I have around 50 or so variables I want to check, and 5 I want to exclude, so it would be somewhat tedious to write it all out.

My output would be:

data temp;
infile DATALINES dsd missover;
input ID Miles Windows Occupants;
CARDS;
01, 100, 2, 5
02, 200, 4, 5
03, 300, 2, 0
04, ., 6, 2
05, 500, 0, 8
06, 600, 0, 0
07, 700, 4, 5
08, 800, 4, 0
02, 4, 1.7, 3
02, 5, 5.1, 4
;
run;

ed_sas_member · Posted 03-10-2020 07:09 AM

Hi @UniversitySas

You can do this by adding a statement in the do loop:

proc sql noprint;
	select name
	into:mylist separated by " "
	from dictionary.columns
	where libname="WORK" and memname="TEMP" and name not in ("ID" "Miles"); /* -> select your columns*/
quit;

data want;	
	set temp;
	array _miss (*) &mylist;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
		_miss(i) = _miss(i) / /*your_variable*/;
	end;
	drop i;
run;

In fact, the array is just a way to repeat the same manipulation on a series of columns.

In practice, the column can be identify by its position in the array:

e.g. _miss(1) corresponds to the first column referenced in the array, so Windows

e.g. _miss(2) corresponds to the second column referenced in the array, so Occupants

...

So instead of writing

If Windows = . then Windows =0;
Windows = Windows /  x;
If Occupants = . then Occupants =0;
Occupant = Occupants /  x;

You can write:

If _miss(1) = . then _miss(1) =0;
_miss(1) = _miss(1) /  x;
If _miss(2) = . then _miss(2) =0;
_miss(2) = _miss(2) /  x;

... and what is better -> loop through the position 1, 2, ...

Best,

View solution in original post

ed_sas_member · Posted 03-10-2020 05:15 AM

Hi @UniversitySas

You can use an array to achieve this:

data want;	
	set temp;
	array _miss (*) Windows Occupants /* NB: reference the variables that are concerned or use _numeric_ if you want to refer to all numeric variables*/;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
	end;
	drop i;
run;

Best,

UniversitySas · Posted 03-10-2020 06:03 AM

Thanks for your response. In your code I can reference the variables I want, or use _Numeric_. Instead, is there a way such that I can reference the variables I do NOT want, and check for all others? This is because I have lots of variables I want, and referencing them separately would be tedious.

ed_sas_member · Posted 03-10-2020 06:22 AM

Hi @UniversitySas

You're welcome!

I suggest that you use a macrovariable to retrieve the column name that you want to include in the array statement.

You just need to adapt the "where" clause in Proc SQL to specify which variables you don't want.

proc sql noprint;
	select name
	into:mylist separated by " "
	from dictionary.columns
	where libname="WORK" and memname="TEMP" and name not in ("ID" "Miles"); /* -> select your columns*/
quit;

data want;	
	set temp;
	array _miss (*) &mylist;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
	end;
	drop i;
run;

Best,

UniversitySas · Posted 03-10-2020 06:44 AM

I think this is along the lines of what I want. Although, when I run the proc sql with my actual data, I get "No rows were selected". Any ideas why? I tried the test data and it works.

Edit: I solved it, the memname has to be in uppercase! Thanks!!

ed_sas_member · Posted 03-10-2020 06:54 AM

Awesome!

UniversitySas · Posted 03-10-2020 07:03 AM

Quick question, if I also wanted to divide this list of variables by their corresponding row entry for another variable, e.g., Miles, can I somehow do it in that data-step?

Edit: Got it, in the loop I can just put _miss(i)/miles

thanks!

PaigeMiller · Posted 03-10-2020 07:08 AM

@UniversitySas wrote:
Quick question, if I also wanted to divide this list of variables by their corresponding row entry for another variable, e.g., Miles, can I somehow do it in that data-step?

Quick answer ... don't do this. If you really need separate analyses for the different values of another variable ... there are better ways to do this such as using formats or using BY variables.

--
Paige Miller

ed_sas_member · Posted 03-10-2020 07:09 AM

Hi @UniversitySas

You can do this by adding a statement in the do loop:

proc sql noprint;
	select name
	into:mylist separated by " "
	from dictionary.columns
	where libname="WORK" and memname="TEMP" and name not in ("ID" "Miles"); /* -> select your columns*/
quit;

data want;	
	set temp;
	array _miss (*) &mylist;
	do i=1 to dim(_miss);
		if _miss(i) = . then _miss(i) = 0;
		_miss(i) = _miss(i) / /*your_variable*/;
	end;
	drop i;
run;

In fact, the array is just a way to repeat the same manipulation on a series of columns.

In practice, the column can be identify by its position in the array:

e.g. _miss(1) corresponds to the first column referenced in the array, so Windows

e.g. _miss(2) corresponds to the second column referenced in the array, so Occupants

...

So instead of writing

If Windows = . then Windows =0;
Windows = Windows /  x;
If Occupants = . then Occupants =0;
Occupant = Occupants /  x;

You can write:

If _miss(1) = . then _miss(1) =0;
_miss(1) = _miss(1) /  x;
If _miss(2) = . then _miss(2) =0;
_miss(2) = _miss(2) /  x;

... and what is better -> loop through the position 1, 2, ...

Best,

Jagadishkatam · Posted 03-10-2020 06:02 AM

Alternatively, consider the do over

data want;
set temp;
array mis Windows Occupants;
do over mis;
if mis=. then mis=0;
end;
run;

Thanks,
Jag

Ksharp · Posted 03-10-2020 07:32 AM

proc stdize data=temp out=want reponly missing=0;
var Windows Occupants;
run;

PaigeMiller · Posted 03-10-2020 07:35 AM

Setting values to zero when they are missing seems like a dangerous thing to do, it will change your averages and every other statistic as well. Does it really make any sense to say if Occupants is missing, that's really a zero? Such a modification of the data should only be done if you have a good understanding of each variable and why it might be missing, and without such reasoning, I would not recommend doing this.

--
Paige Miller

UniversitySas · Posted 03-10-2020 09:24 AM

Hi Paige, I agree. In this case, the nature of the real data allows for this change to make sense, but thank you for your input.

Registration is open

SAS Training: Just a Click Away