- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am working with a dataset that has 110,000 observations and 170 variables. I am trying to delete all the observations that are 0 in all of around 30 variables. They must be 0 in all of the 30 variables. This is what I have done so far:
DATA new_set;
SET old_set (WHERE= (var1<>0 AND var2<>0 AND var3<>0 AND var4<>0 AND....var30<>0));
RUN;
The variables are not in actually numbered 1-30, are not all in a row, and are sparsely scattered about the other 170 in the dataset.
Is there an easier or more efficient way to do this?
I am using SAS 9.4 (Unicode)
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @gorba004 and welcome to the SAS Support Communities!
First of all, please note that your suggested WHERE condition is inconsistent with your requirement: You would need to replace "AND" with "OR" to correct this. (The "<>" works like "NE" in a WHERE condition, but I still wouldn't use it for the reason @ballardw has explained.)
As has been suggested, you can use functions with an OF variable-list argument to simplify the condition, where variable-list may be a list of arbitrary variable names (it's not limited to numbered range lists etc.). If you need more than one function, you can use an array to avoid duplicating the long list.
Example:
data new_set;
array x[*] list your variables here;
set old_set;
if not(nmiss(of x[*])=sumabs(of x[*])=0);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input var1-var3;
cards;
0 0 0
0 1 0
0 1 2
;
data want;
set have;
k=compress(cats(of var1-var3),'0');
if missing(k) then delete;
drop k;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The 30 variables appear to be numeric so this should work:
if sum(of var1-var30) = 0 then delete;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@SASKiwi wrote:
The 30 variables appear to be numeric so this should work:
if sum(of var1-var30) = 0 then delete;
I would hesitate to use that alone. Since the OP is attempting to use <> as not equal there may actually be negative values. So if there is a -1 and a +1 the sum would be 0 . Sum=0 and Range=0 might be sufficient though.
You do not want to use <> if looking to do a "not equal", use one of: NE ^=
Examine this code and the result:
data example; x= 5; y1 = x <> 2; y2 = x <> 7; z = x ne 2; run;
The <> returns the larger of the two values. The x ne 2 returns a 1 for true.
And while SAS will treat any number not 0 or missing as "True" you may get very unexpected results from comparisons if you use <> instead of NE if you intend "not equal" for the comparison.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@ballardw - good point. If data is all greater than or equal to zero it should be OK.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@SASKiwi wrote:
@ballardw - good point. If data is all greater than or equal to zero it should be OK.
Care to guess how much time I spent diagnosing a very similar issue years ago where a very small number of records had that offsetting sum behavior? Though in the case I had it was a combination of 4 variables where one negative value just happened to match the total of 3 positive...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's exactly what i was wondering even a champion like Sir @SASKiwi who taught me effective boolean expressions can get "jetlagged" on way to dallas? 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@novinosrin - quite likely. I've got a 3 hour followed by a 15.5 hour flight to contend with....
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have a safe trip and great fun sir. Pass on my regards to our coomunity folks in SGF. Cheers!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @gorba004 and welcome to the SAS Support Communities!
First of all, please note that your suggested WHERE condition is inconsistent with your requirement: You would need to replace "AND" with "OR" to correct this. (The "<>" works like "NE" in a WHERE condition, but I still wouldn't use it for the reason @ballardw has explained.)
As has been suggested, you can use functions with an OF variable-list argument to simplify the condition, where variable-list may be a list of arbitrary variable names (it's not limited to numbered range lists etc.). If you need more than one function, you can use an array to avoid duplicating the long list.
Example:
data new_set;
array x[*] list your variables here;
set old_set;
if not(nmiss(of x[*])=sumabs(of x[*])=0);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if not(nmiss(of x[*])=sumabs(of x[*])=0);
Super slick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks so much, this worked like a charm
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Using array makes life easier:
data want;
set have;
array varx var1-var30;
do i=1 to dim(varx);
if varx(i) <> 0 then do; output; leave; end;
end;
drop i ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You want to delete a row of variables each having a value of zero. Don't you want to delete Columns having zeros?
For instance,
data have; input var1-var3; cards; 0 0 0 0 1 0 0 1 2 ; run;
If you delete the first row, you will have the entire column (VAR1) with zeros. Do you want VAR1 to be kept?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input var1-var3;
cards;
0 0 0
0 1 0
0 1 2
;
data want;
set have;
array x{*} var1-var3;
if min(of x{*})=0 and max(of x{*})=0 then delete;
run;