Dear community,
the pace and level of detail of the help I received thus far on this forum is really astonishing.
I am ashamed to come up with a next, my now fourth, issue.
This time, I want to restric the "sum" statement in a "PROC SQL" step to return values only, if all variables are available.
(would be great if I could choose from which amount of missing values on the sum-function shall return a missing value, but this only as a side note),
The overall-goal is to merge return-data of companies to a data set containing fundamental data of said companies.
I order to achieve this, I would like to sum up, in monthly intervals, continuous returns.
PROC SQL;
CREATE TABLE data.expansion
AS SELECT a.*,
/* creation of monthly increments */
INTCK('month',datevar,c.date) AS dt,
log(1+c.discrete_return) AS log_return
FROM fundamental_data AS a
LEFT JOIN data.return_data AS c
ON a.firmid = c.firmid
AND a.datevar < c.date <= intnx('month',a.datevar,12,'E')
AND not(missing(a.firmid))
AND not(missing(a.datevar))
AND not(missing(c.date))
ORDER BY a.firmid_1, a.datevar, dt;
QUIT;
PROC SORT data=data.expansion NODUPKEYS;
BY firmid_1 datevar dt;
RUN;
PROC SQL;
CREATE TABLE temporaer
AS SELECT a.*,
sum(c.log_return) AS logret_k12
FROM fundamental_data AS a
LEFT JOIN data.expansion AS c
ON a.datevar = c.datevar
AND a.firmid = c.firmid
AND 1 <= c.dt <= 12
GROUP BY a.datevar, a.firmid
HAVING c.dt = max(c.dt);
QUIT;
"Sum" would not calculate "1.st log_return + 2.nd log_return... + last log_return" and thus return a missing value, if one log_return were missing.
I have never "worked" with "PROC SQL" thus far.
I would be glad, if someone could give me a hint.
Its not really clear to me, test data in the form of a datastep and required output generally helps. Conditionals in sql are done by using case statements:
case when xyz=1 then 1 else 99 end as VAR
I.e. if xyz-1 then var=1 else var=99. Sum functions however are aggregates, they work over a number of rows in the data defined by the group. You can however restrict what goes into the group by sub-querying the from statement so:
proc sql; select SEX, sum(AGE) as AGE_SUM from SASHELP.CLASS group by SEX; quit;
If from that I wanted to drop any records with missings:
proc sql; select SEX, sum(AGE) as AGE_SUM from (select * from SASHELP.CLASS where AGE ne .) group by SEX; quit;
So I filter the data in the from clause before the groups and aggregates are created. This is a useful technique to learn. You could also try:
proc sql; select SEX, sum(case when AGE=. then 0 else AGE end) as AGE_SUM from SASHELP.CLASS group by SEX; quit;
A bit tricky, especially if you want to solve this within a single query.
But if you don't want to sum certain groups, shall they still be part of the output (as missing)?
If so, you could try to use the nmiss() aggregate function in combination with the sum().
Its not really clear to me, test data in the form of a datastep and required output generally helps. Conditionals in sql are done by using case statements:
case when xyz=1 then 1 else 99 end as VAR
I.e. if xyz-1 then var=1 else var=99. Sum functions however are aggregates, they work over a number of rows in the data defined by the group. You can however restrict what goes into the group by sub-querying the from statement so:
proc sql; select SEX, sum(AGE) as AGE_SUM from SASHELP.CLASS group by SEX; quit;
If from that I wanted to drop any records with missings:
proc sql; select SEX, sum(AGE) as AGE_SUM from (select * from SASHELP.CLASS where AGE ne .) group by SEX; quit;
So I filter the data in the from clause before the groups and aggregates are created. This is a useful technique to learn. You could also try:
proc sql; select SEX, sum(case when AGE=. then 0 else AGE end) as AGE_SUM from SASHELP.CLASS group by SEX; quit;
@RW9 wrote:Its not really clear to me, test data in the form of a datastep and required output generally helps.
I am sorry for this obvious dereliction - I should have prepared better, before positing.
Next time, I should not let panic seize hold of me, if something does not work properly or I cannot solve directly.
@RW9 wrote:Conditionals in sql are done by using case statements [...]
So I filter the data in the from clause before the groups and aggregates are created. This is a useful technique to learn. [...]
Nonetheless, that was exactly the kind of support I asked for.
Thank you.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.