Excluding records from SQL output table

Reply
Occasional Contributor
Posts: 6

Excluding records from SQL output table

I have a table with a number of numeric variables that do not have similar names.  All of the numeric fields contain both negative and positive values.  I want to summarize the data by an ID field and exclude any rows from the output where all the numeric fields sum to zero.  I have the below sample code:

 

proc sql; create table TEST as

    select ID,

              SUM(numeric1) as numeric1,

             ....

             SUM(numericN) as numericN

    from TABLE

    group by ID

    having SUM(ABS(numeric1), ...., ABS(numericN)) > 0;

quit;

 

Is there a way to simplify the "having" line without listing all the numeric fields?

 

Thanks in advance.

 

Super User
Posts: 10,466

Re: Excluding records from SQL output table

I think you should provide some example data and what you expect for output.

 

Your statement "exclude any rows from the output where all the numeric fields" looks more like you want to test values on a row. The code you have posted using SQL SUM means you are looking at the sum across rows.

 

I think that you are looking for something like:

 

data want;

  set have;

  if sum(var1,var2,var3) ne 0;

run;

 

Occasional Contributor
Posts: 6

Re: Excluding records from SQL output table

Suppose the first 3 rows of the output table look like:

 

ID      numeric1     numeric2     numeric3     numeric4     numeric5     numeric6

1        100             -100             0                  0                 0                   0

2        100             5000            0                  0                 0                   0

3        0                 0                  0                  0                 0                   0           

 

 

I want to keep rows 1 and 2 but exclude row 3 but I want to do it in the "having" line of the SQL.  I use absolute value in my example to keep row 1 from being dropped.

Super User
Posts: 10,466

Re: Excluding records from SQL output table

You really can't use SUM (or Mean, Max, Min) that way as those are AGGREGATE functions in SQL meaning that they look across all rows (of a group if defined).

 

What is wrong with using a data step if that is the best tool?

 

and if using a dataste you could use the

  if sum(of _numeric_) ne 0;

 

Your comment about absolute value also means that you are violating your initial requirement: exclude any rows from the output where all the numeric fields sum to zero.

 

It looks like you mean that you want to exclude if all of the values are exactly 0. Which could be done in a datastep with:

 

if max(of _numeric_) =0 and min(of _numeric_) = 0 then delete;

Super User
Posts: 5,071

Re: Excluding records from SQL output table

It's sounding like you have multiple rows (originally) for each ID.  Then you want to exclude IDs where the sum of all numeric fields across all rows is 0.  Does that sound right?

 

Are you planning to type out the complete SELECT clause, but avoid typing out the complete HAVING clause?  It might make more sense to type out the list of field names and have macro language generate both clauses (assuming that you can locate a legitimate, working syntax for macro language to generate).

 

See if you can construct a working program using just three numeric fields, as a starting point.

Contributor
Posts: 22

Re: Excluding records from SQL output table

 

I'm not saying this simplifies things very much, but I think this will work in the special case where you have summed every numeric column and used the same name as the alias for the resulting value. This creates a macro variable of all the numeric columns in TABLE and uses the USS statistic function to test for any nonzero value among this list.

 

proc sql;

select name into :numvarlist separated by ','

from dictionary.columns

where libname = 'WORK' and memname = 'TABLE' and type = 'num';

create table TEST as

    select ID,

              SUM(numeric1) as numeric1,

             ....

             SUM(numericN) as numericN

    from TABLE

    group by ID

    having uss(&numvarlist.);

quit;

Super User
Posts: 9,662

Re: Excluding records from SQL output table

No. It is not good for SQL. But good for Data Step:

proc summary data=have;
by id;
var num1-numN;
output out=test sum=;
run;

data want;
 set test;
array x{*} num1-numN;
sum=0;
 do i=1 to dim(x);
  sum+abs(x{i});
end;
if sum > 0;
run;


CODE NOT TESTED

Ask a Question
Discussion stats
  • 6 replies
  • 378 views
  • 0 likes
  • 5 in conversation