Help using Base SAS procedures

PROC SQL - restric sum function to non-missing values

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

PROC SQL - restric sum function to non-missing values

[ Edited ]

Dear community,

 

the pace and level of detail of the help I received thus far on this forum is really astonishing.

I am ashamed to come up with a next, my now fourth, issue.

 

This time, I want to restric the "sum" statement in a "PROC SQL" step to return values only, if all variables are available.

(would be great if I could choose from which amount of missing values on the sum-function shall return a missing value, but this only as a side note),

 

The overall-goal is to merge return-data of companies to a data set containing fundamental data of said companies.

I order to achieve this, I would like to sum up, in monthly intervals, continuous returns.

 

 

 

       PROC SQL;
          CREATE TABLE data.expansion
          AS SELECT a.*,
	/* creation of monthly increments */
	INTCK('month',datevar,c.date)   AS   dt,
	log(1+c.discrete_return)                 AS   log_return
          FROM       fundamental_data		      AS   a
          LEFT JOIN  data.return_data             AS  c
          ON        a.firmid = c.firmid 
            AND     a.datevar  < c.date <= intnx('month',a.datevar,12,'E')
            AND     not(missing(a.firmid))
            AND     not(missing(a.datevar))
            AND     not(missing(c.date))
          ORDER BY a.firmid_1, a.datevar, dt;
       QUIT;
       PROC SORT data=data.expansion NODUPKEYS; 
          BY firmid_1 datevar dt;
       RUN;

    
       PROC SQL;
           CREATE TABLE temporaer
           AS SELECT a.*,
                     sum(c.log_return)        AS logret_k12
           FROM      fundamental_data         AS a
           LEFT JOIN data.expansion           AS c
             ON      a.datevar  = c.datevar
             AND     a.firmid     = c.firmid
             AND     1 <= c.dt <= 12
           GROUP BY a.datevar, a.firmid
           HAVING c.dt = max(c.dt);
       QUIT;

 

"Sum" would not calculate "1.st log_return + 2.nd log_return... + last log_return" and thus return a missing value, if one log_return were missing.

I have never "worked" with "PROC SQL" thus far.

 

I would be glad, if someone could give me a hint.


Accepted Solutions
Solution
‎08-11-2016 10:56 AM
Super User
Super User
Posts: 7,942

Re: PROC SQL - restric sum function to non-missing values

Posted in reply to Sinistrum

Its not really clear to me, test data in the form of a datastep and required output generally helps.  Conditionals in sql are done by using case statements:

case when xyz=1 then 1 else 99 end as VAR

I.e. if xyz-1 then var=1 else var=99.  Sum functions however are aggregates, they work over a number of rows in the data defined by the group.  You can however restrict what goes into the group by sub-querying the from statement so:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

If from that I wanted to drop any records with missings:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     (select * from SASHELP.CLASS where AGE ne .)
  group by SEX;
quit;

So I filter the data in the from clause before the groups and aggregates are created.  This is a useful technique to learn.  You could also try:

proc sql;
  select   SEX,
           sum(case when AGE=. then 0 else AGE end) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

 

View solution in original post


All Replies
Super User
Posts: 5,424

Re: PROC SQL - restric sum function to non-missing values

Posted in reply to Sinistrum

A bit tricky, especially if you want to solve this within a single query.

But if you don't want to sum certain groups, shall they still be part of the output (as missing)?

If so, you could try to use the nmiss() aggregate function in combination with the sum().

Data never sleeps
Solution
‎08-11-2016 10:56 AM
Super User
Super User
Posts: 7,942

Re: PROC SQL - restric sum function to non-missing values

Posted in reply to Sinistrum

Its not really clear to me, test data in the form of a datastep and required output generally helps.  Conditionals in sql are done by using case statements:

case when xyz=1 then 1 else 99 end as VAR

I.e. if xyz-1 then var=1 else var=99.  Sum functions however are aggregates, they work over a number of rows in the data defined by the group.  You can however restrict what goes into the group by sub-querying the from statement so:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

If from that I wanted to drop any records with missings:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     (select * from SASHELP.CLASS where AGE ne .)
  group by SEX;
quit;

So I filter the data in the from clause before the groups and aggregates are created.  This is a useful technique to learn.  You could also try:

proc sql;
  select   SEX,
           sum(case when AGE=. then 0 else AGE end) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

 

Contributor
Posts: 42

Re: PROC SQL - restric sum function to non-missing values


RW9 wrote:

Its not really clear to me, test data in the form of a datastep and required output generally helps. 

 

I am sorry for this obvious dereliction - I should have prepared better, before positing.
Next time, I should not let panic seize hold of me, if something does not work properly or I cannot solve directly.

 


RW9 wrote:

Conditionals in sql are done by using case statements [...]


So I filter the data in the from clause before the groups and aggregates are created.  This is a useful technique to learn. [...]


 

Nonetheless, that was exactly the kind of support I asked for.

Thank you.

 

 

 

 

 

 

 

 


 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 348 views
  • 2 likes
  • 3 in conversation