BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JPARK
Calcite | Level 5

When I run the proc sql sas code below, my program keep running and it does not stop

In note, it says  the query requires remerging summary statistics back with the original data.

Does anybody know why it keep doing this?

proc sql;

create table comp3 as select a.*, abs(a.roa-b.roa) as diff_roa, (a.r-b.r) as pamjone

from DAPAMJ_1 as a left join DAPAMJ_1 as b

on a.sic2=b.sic2 and a.fyear=b.fyear and a.gvkey^=b.gvkey

group by a.gvkey, a.fyear, a.sic2

having abs(a.roa-b.roa)=min(abs(a.roa-b.roa));

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Because you've selected a.* in your select statement.

In standard SQL you can only have the group by variables plus the aggregated values, but SAS SQL is different and allows you include other variables and remerges the data together. 

View solution in original post

9 REPLIES 9
Reeza
Super User

Because you've selected a.* in your select statement.

In standard SQL you can only have the group by variables plus the aggregated values, but SAS SQL is different and allows you include other variables and remerges the data together. 

JPARK
Calcite | Level 5

It still run forever. I changed my code like below. are there still a problem in my code?

proc sql;

create table comp3 as select
a.gvkey, a.sic2, a.fyear, a.roa, a.r, abs(a.roa-b.roa)
as diff_roa, (a.r-b.r) as pamjone

from DAPAMJ_1 as
a
left join DAPAMJ_1 as
b on a.sic2=b.sic2 and
a.fyear=b.fyear
and a.gvkey^=b.gvkey

group by
a.gvkey, a.fyear, a.sic2

having abs(a.roa-b.roa)=min(abs(a.roa-b.roa));

quit;

Reeza
Super User

How big is your data? 

Try it in steps, first join without having clause and see if it runs.

Haikuo
Onyx | Level 15

Looks like a deja vu to me and I strongly concur with Reeza.  There are two possible places that may cause long time running:

1. Reeze pointed out: the having clause.

2. The unequal look up: a.gvkey^=b.gvkey

Break them into multiple steps to see how it goes. If it is still unbearable slow, then we need to know more specs about your tables, a Hash look up may be available.

Haikuo

PGStats
Opal | Level 21

You could also try the equivalent operation (as far as I can tell) as a datastep using random access to your sorted data :

proc sort data=comp3; by sic2 fyear roa; run;

data comp3;

set DAPAMJ_1; by sic2 fyear;

p = _n_ - 1;

if not first.fyear then

  set DAPAMJ_1(keep=gvkey roa rename=(gvkey=pgvkey roa=proa)) point=p;

else call missing(pgvkey, proa);

n = _n_ + 1;

if not last.fyear then

  set DAPAMJ_1(keep=gvkey roa rename=(gvkey=ngvkey roa=nroa)) point=n;

else call missing(ngvkey, nroa);

if n(ngvkey, pgvkey) = 2 then

    if abs(roa-proa) < abs(roa-nroa) then do;

        _gvkey = pgvkey;

        _roa = proa;

        end;

    else do;

        _gvkey = ngvkey;

        _roa = nroa;

        end;

else do;

    _gvkey = coalesce(ngvkey, pgvkey);

    _roa = coalesce(nroa, proa);

    end;

if not missing(_roa) then diff_roa = abs(roa-_roa);

drop ngvkey pgvkey nroa proa;

run;

PG

PG
Tom
Super User Tom
Super User

You are still including THREE non group by variables.

, a.roa

, a.r

, (a.r-b.r) as pamjone

How many observations in each of your input datasets.

JPARK
Calcite | Level 5

Thank you~

Should I put those variables in group by?

I have 48,229 observations

PGStats
Opal | Level 21

48229 is a moderate size dataset... it shouldn't take forever to process. One problem might come from the combination of the left join and the ABS function. When there is no record on the right side of the join, i.e. when only one gvkey is present for a given sic2 and fyear, b.roa is missing and computing ABS(a.roa-b.roa) generates a note to the LOG. You could try avoiding those with the changes:

proc sql;

create table comp3 as select
a.gvkey, a.sic2, a.fyear, a.roa, a.r,

case b.roa when . then . else abs(a.roa-b.roa) end as diff_roa,

(a.r-b.r) as pamjone

from DAPAMJ_1 as
a
left join DAPAMJ_1 as
b on a.sic2=b.sic2 and
a.fyear=b.fyear
and a.gvkey^=b.gvkey

group by
a.gvkey, a.fyear, a.sic2

having (a.roa-b.roa)**2 = min((a.roa-b.roa)**2);

quit;

PG

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 41692 views
  • 4 likes
  • 5 in conversation