BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

The logic works by

1. steps of remerges to the original

2. inner most inline view or easy to understand as sub query computes the frequency**2 and the next sub qery computes the sum of squared frequency.

3. The computed sum of squares is then merged back with the original 

4. You deterine the above by starting from processing in hierarchical steps fromthe inner most sub query to the outer most for each level of computation. 

5. Best way to understand break each select clause and execute separately and view the results. You'll eventually get the hang of it. 

 

PS

IMHO proc sql can take a bit longer to grasp as it's not intutive as a datastep for the reason datastep's sequential processing gives greater control. But sql is fun nonetheless

fama
Fluorite | Level 6
Hi Novinosrin,
I really apprecite your time..Was very helpful!!
fama
Fluorite | Level 6
Hi Novinosrin,

A quick question. I appreciate your help in advance.

I have played around with the code. However, when we have other variables ( V4, V5 etc), the results change like below:
data have;
input V1 V2 v3 V4 V5 ;
cards;
1 1 1 2 3
1 1 1 3 3
1 1 1 4 5
1 1 2 5 6
1 2 6 4 4
1 2 3 4 7
1 2 3 5 4
1 2 4
1 2 4
1 2 4
;

Instead of having 10 and 14 like before we have 28 and 36 because the sum function sums up all values created in want.
Maybe we need a different grouping statement. But I don't know how?
novinosrin
Tourmaline | Level 20

Hi @fama  The reason that happened because the inclusion of extra vars v4,v5 which is not part of the group by claus will cause the SQL processor of SAS to remerge resulting in duplicates and then the sum of the dups happens to be 3*9 +1=28  unlike other Database sql products like oracel, sql server etc. Therefore the logic will change resulting in change in code.

 

Before I jump in and do the needful, I would like you to get the exact requirements in full and let me know. So that I can attempt to give you a solution in one shot. Take your time, come back to us with full clarity. Thanks!

 

 

fama
Fluorite | Level 6

Hi @novinosrin ,

Thanks for your reply.

My dataset is quiet big and I have more than 30 variables.  To create a want variable I need to use V1, v2 and V3.

 

data have;
input V1 V2 v3 V4 V5……want ;
cards;
1 1 1  2  3 ….10
1 1 1  3  3…..10
1 1 1  4  5……10
1 1 2  5  6……10
1 2 6  4  4…….9
1 2 3  4  7…….9
1 2 6  5  4…….9
1 2 4  5  5…….9

1 2 4  8  0…….9
2 2 4  4   3……8
2 2 4  2  5…….8

2 2 8  4   3……8
2 2 8   2  5…….8

2 5 9   4   3……16
2 5 9   2  5…….16

2 5 9   4   3……16
2 5 9   2  5…….16
;

 

10=( 3**2)+ (1**2) where 3 is the frequency of 1 in V3 and 1 is the frequency of 2 in V3
9= (2**2) +(2**2)+(1**2) where 2 is the frequency of 6 in V3 and 2 (in the second Perrantes) is the frequency of 4 in V3 and 1 is the frequency of 3

8= (2**2) +(2**2) where 2 is the frequency of 4 in V3 and 2 (in the second Perrantes) is the frequency of 8 in V3

8= (2**2) +(2**2) where 2 is the frequency of 4 in V3 and 2 (in the second Perrantes) is the frequency of 8 in V3

16= (4**2) where 4 is the frequency of 9 in V3

 

I need to create the want variable and merge it to the original dataset  like what I showed.

 

Thank you

novinosrin
Tourmaline | Level 20

Ok no biggie, just a very very minor change to not let SQL processor of SAS to remerge and that is

This part

from have a, (select v1,v2, sum(c) as want from (select *, count(v3)**2 as c from have group by v1, v2,v3 )

 becomes

from have a, (select v1,v2, sum(c) as want from (select v1,v2,v3, count(v3)**2 as c from have group by v1, v2,v3 )

So here the test follows with your new sample--

 

data have;
input V1 V2 v3 V4 V5 ;
cards;
1 1 1  2  3 ….10
1 1 1  3  3 …..10
1 1 1  4  5 ……10
1 1 2  5  6 ……10
1 2 6  4  4  …….9
1 2 3  4  7  …….9
1 2 6  5  4  …….9
1 2 4  5  5  …….9
1 2 4  8  0  …….9
2 2 4  4   3 ……8
2 2 4  2  5 …….8
2 2 8  4   3  ……8
2 2 8   2  5  …….8
2 5 9   4   3  ……16
2 5 9   2  5  …….16
2 5 9   4   3  ……16
2 5 9   2  5  …….16
;

proc sql;
create table fama_want as
select a.*,want
from have a, (select v1,v2, sum(c) as want from (select v1,v2,v3, count(v3)**2 as c from have group by v1, v2,v3 )
group by v1,v2) b
where a.v1=b.v1 and a.v2=b.v2;
quit;

Results:

 

V1 V2 v3 V4 V5 want

1 1 1 2 3 10
1 1 1 3 3 10
1 1 1 4 5 10
1 1 2 5 6 10
1 2 6 4 4 9
1 2 3 4 7 9
1 2 6 5 4 9
1 2 4 5 5 9
1 2 4 8 0 9
2 2 4 4 3 8
2 2 4 2 5 8
2 2 8 4 3 8
2 2 8 2 5 8
2 5 9 4 3 16
2 5 9 2 5 16
2 5 9 4 3 16
2 5 9 2 5 16

 

 

 

fama
Fluorite | Level 6
Thanks a lot. I really appreciate your time
ccaulkins9
Pyrite | Level 9
for FYI's sake, this code/view of the data was useful to me as I started using proc sql to get stats similar to what proc datasets or proc contents does.
e-SAS regards,

novinosrin
Tourmaline | Level 20

Hi @ccaulkins9  It's because of you folks, the community has so much energy. Thanks to you as well. We're all a SAS family after all!

 

 

Reeza
Super User

@fama just one clarification no one has mentioned, * is multiplication ** is power. 

3*2=6

3**2=9

fama
Fluorite | Level 6
Thank you
Maybe because I corrected this in one of my posts..
PGStats
Opal | Level 21

I would do:

 

data have;
input V1 V2 v3;
cards;
1 1 1 
1 1 1 
1 1 1 
1 1 2 
1 2 6 
1 2 3 
1 2 3 
1 2 4 
1 2 4 
1 2 4 
;

proc sql;
select a.*, sumSqFreq
from
    have as a inner join
    ( select v1, v2, uss(freq) as sumSqFreq
      from 
        ( select v1, v2, v3, count(*) as freq
          from have
          group by v1, v2, v3 ) 
      group by v1, v2 ) as b 
    on a.v1=b.v1 and a.v2=b.v2;
quit;

Are you sure you need to remerge the statistics with the original data?

PG
fama
Fluorite | Level 6
Thank you PGStats for your reply and help.
Yes, I need the table for the next analysis.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 27 replies
  • 2646 views
  • 1 like
  • 5 in conversation