BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sinistrum
Quartz | Level 8

Dear community,

 

the pace and level of detail of the help I received thus far on this forum is really astonishing.

I am ashamed to come up with a next, my now fourth, issue.

 

This time, I want to restric the "sum" statement in a "PROC SQL" step to return values only, if all variables are available.

(would be great if I could choose from which amount of missing values on the sum-function shall return a missing value, but this only as a side note),

 

The overall-goal is to merge return-data of companies to a data set containing fundamental data of said companies.

I order to achieve this, I would like to sum up, in monthly intervals, continuous returns.

 

 

 

       PROC SQL;
          CREATE TABLE data.expansion
          AS SELECT a.*,
	/* creation of monthly increments */
	INTCK('month',datevar,c.date)   AS   dt,
	log(1+c.discrete_return)                 AS   log_return
          FROM       fundamental_data		      AS   a
          LEFT JOIN  data.return_data             AS  c
          ON        a.firmid = c.firmid 
            AND     a.datevar  < c.date <= intnx('month',a.datevar,12,'E')
            AND     not(missing(a.firmid))
            AND     not(missing(a.datevar))
            AND     not(missing(c.date))
          ORDER BY a.firmid_1, a.datevar, dt;
       QUIT;
       PROC SORT data=data.expansion NODUPKEYS; 
          BY firmid_1 datevar dt;
       RUN;

    
       PROC SQL;
           CREATE TABLE temporaer
           AS SELECT a.*,
                     sum(c.log_return)        AS logret_k12
           FROM      fundamental_data         AS a
           LEFT JOIN data.expansion           AS c
             ON      a.datevar  = c.datevar
             AND     a.firmid     = c.firmid
             AND     1 <= c.dt <= 12
           GROUP BY a.datevar, a.firmid
           HAVING c.dt = max(c.dt);
       QUIT;

 

"Sum" would not calculate "1.st log_return + 2.nd log_return... + last log_return" and thus return a missing value, if one log_return were missing.

I have never "worked" with "PROC SQL" thus far.

 

I would be glad, if someone could give me a hint.

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its not really clear to me, test data in the form of a datastep and required output generally helps.  Conditionals in sql are done by using case statements:

case when xyz=1 then 1 else 99 end as VAR

I.e. if xyz-1 then var=1 else var=99.  Sum functions however are aggregates, they work over a number of rows in the data defined by the group.  You can however restrict what goes into the group by sub-querying the from statement so:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

If from that I wanted to drop any records with missings:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     (select * from SASHELP.CLASS where AGE ne .)
  group by SEX;
quit;

So I filter the data in the from clause before the groups and aggregates are created.  This is a useful technique to learn.  You could also try:

proc sql;
  select   SEX,
           sum(case when AGE=. then 0 else AGE end) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

 

View solution in original post

3 REPLIES 3
LinusH
Tourmaline | Level 20

A bit tricky, especially if you want to solve this within a single query.

But if you don't want to sum certain groups, shall they still be part of the output (as missing)?

If so, you could try to use the nmiss() aggregate function in combination with the sum().

Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its not really clear to me, test data in the form of a datastep and required output generally helps.  Conditionals in sql are done by using case statements:

case when xyz=1 then 1 else 99 end as VAR

I.e. if xyz-1 then var=1 else var=99.  Sum functions however are aggregates, they work over a number of rows in the data defined by the group.  You can however restrict what goes into the group by sub-querying the from statement so:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

If from that I wanted to drop any records with missings:

proc sql;
  select   SEX,
           sum(AGE) as AGE_SUM
  from     (select * from SASHELP.CLASS where AGE ne .)
  group by SEX;
quit;

So I filter the data in the from clause before the groups and aggregates are created.  This is a useful technique to learn.  You could also try:

proc sql;
  select   SEX,
           sum(case when AGE=. then 0 else AGE end) as AGE_SUM
  from     SASHELP.CLASS
  group by SEX;
quit;

 

Sinistrum
Quartz | Level 8

@RW9 wrote:

Its not really clear to me, test data in the form of a datastep and required output generally helps. 

 

I am sorry for this obvious dereliction - I should have prepared better, before positing.
Next time, I should not let panic seize hold of me, if something does not work properly or I cannot solve directly.

 


@RW9 wrote:

Conditionals in sql are done by using case statements [...]


So I filter the data in the from clause before the groups and aggregates are created.  This is a useful technique to learn. [...]


 

Nonetheless, that was exactly the kind of support I asked for.

Thank you.

 

 

 

 

 

 

 

 


 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 5234 views
  • 2 likes
  • 3 in conversation