I am working with a dataset that has 110,000 observations and 170 variables. I am trying to delete all the observations that are 0 in all of around 30 variables. They must be 0 in all of the 30 variables. This is what I have done so far:
DATA new_set;
SET old_set (WHERE= (var1<>0 AND var2<>0 AND var3<>0 AND var4<>0 AND....var30<>0));
RUN;
The variables are not in actually numbered 1-30, are not all in a row, and are sparsely scattered about the other 170 in the dataset.
Is there an easier or more efficient way to do this?
I am using SAS 9.4 (Unicode)
Hello @gorba004 and welcome to the SAS Support Communities!
First of all, please note that your suggested WHERE condition is inconsistent with your requirement: You would need to replace "AND" with "OR" to correct this. (The "<>" works like "NE" in a WHERE condition, but I still wouldn't use it for the reason @ballardw has explained.)
As has been suggested, you can use functions with an OF variable-list argument to simplify the condition, where variable-list may be a list of arbitrary variable names (it's not limited to numbered range lists etc.). If you need more than one function, you can use an array to avoid duplicating the long list.
Example:
data new_set;
array x[*] list your variables here;
set old_set;
if not(nmiss(of x[*])=sumabs(of x[*])=0);
run;
data have;
input var1-var3;
cards;
0 0 0
0 1 0
0 1 2
;
data want;
set have;
k=compress(cats(of var1-var3),'0');
if missing(k) then delete;
drop k;
run;
The 30 variables appear to be numeric so this should work:
if sum(of var1-var30) = 0 then delete;
@SASKiwi wrote:
The 30 variables appear to be numeric so this should work:
if sum(of var1-var30) = 0 then delete;
I would hesitate to use that alone. Since the OP is attempting to use <> as not equal there may actually be negative values. So if there is a -1 and a +1 the sum would be 0 . Sum=0 and Range=0 might be sufficient though.
You do not want to use <> if looking to do a "not equal", use one of: NE ^=
Examine this code and the result:
data example; x= 5; y1 = x <> 2; y2 = x <> 7; z = x ne 2; run;
The <> returns the larger of the two values. The x ne 2 returns a 1 for true.
And while SAS will treat any number not 0 or missing as "True" you may get very unexpected results from comparisons if you use <> instead of NE if you intend "not equal" for the comparison.
@ballardw - good point. If data is all greater than or equal to zero it should be OK.
@SASKiwi wrote:
@ballardw - good point. If data is all greater than or equal to zero it should be OK.
Care to guess how much time I spent diagnosing a very similar issue years ago where a very small number of records had that offsetting sum behavior? Though in the case I had it was a combination of 4 variables where one negative value just happened to match the total of 3 positive...
That's exactly what i was wondering even a champion like Sir @SASKiwi who taught me effective boolean expressions can get "jetlagged" on way to dallas? 🙂
@novinosrin - quite likely. I've got a 3 hour followed by a 15.5 hour flight to contend with....
Have a safe trip and great fun sir. Pass on my regards to our coomunity folks in SGF. Cheers!
Hello @gorba004 and welcome to the SAS Support Communities!
First of all, please note that your suggested WHERE condition is inconsistent with your requirement: You would need to replace "AND" with "OR" to correct this. (The "<>" works like "NE" in a WHERE condition, but I still wouldn't use it for the reason @ballardw has explained.)
As has been suggested, you can use functions with an OF variable-list argument to simplify the condition, where variable-list may be a list of arbitrary variable names (it's not limited to numbered range lists etc.). If you need more than one function, you can use an array to avoid duplicating the long list.
Example:
data new_set;
array x[*] list your variables here;
set old_set;
if not(nmiss(of x[*])=sumabs(of x[*])=0);
run;
if not(nmiss(of x[*])=sumabs(of x[*])=0);
Super slick
Thanks so much, this worked like a charm
Using array makes life easier:
data want;
set have;
array varx var1-var30;
do i=1 to dim(varx);
if varx(i) <> 0 then do; output; leave; end;
end;
drop i ;
run;
You want to delete a row of variables each having a value of zero. Don't you want to delete Columns having zeros?
For instance,
data have; input var1-var3; cards; 0 0 0 0 1 0 0 1 2 ; run;
If you delete the first row, you will have the entire column (VAR1) with zeros. Do you want VAR1 to be kept?
data have;
input var1-var3;
cards;
0 0 0
0 1 0
0 1 2
;
data want;
set have;
array x{*} var1-var3;
if min(of x{*})=0 and max(of x{*})=0 then delete;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.