The logic works by
1. steps of remerges to the original
2. inner most inline view or easy to understand as sub query computes the frequency**2 and the next sub qery computes the sum of squared frequency.
3. The computed sum of squares is then merged back with the original
4. You deterine the above by starting from processing in hierarchical steps fromthe inner most sub query to the outer most for each level of computation.
5. Best way to understand break each select clause and execute separately and view the results. You'll eventually get the hang of it.
PS
IMHO proc sql can take a bit longer to grasp as it's not intutive as a datastep for the reason datastep's sequential processing gives greater control. But sql is fun nonetheless
Hi @fama The reason that happened because the inclusion of extra vars v4,v5 which is not part of the group by claus will cause the SQL processor of SAS to remerge resulting in duplicates and then the sum of the dups happens to be 3*9 +1=28 unlike other Database sql products like oracel, sql server etc. Therefore the logic will change resulting in change in code.
Before I jump in and do the needful, I would like you to get the exact requirements in full and let me know. So that I can attempt to give you a solution in one shot. Take your time, come back to us with full clarity. Thanks!
Hi @novinosrin ,
Thanks for your reply.
My dataset is quiet big and I have more than 30 variables. To create a want variable I need to use V1, v2 and V3.
data have;
input V1 V2 v3 V4 V5……want ;
cards;
1 1 1 2 3 ….10
1 1 1 3 3…..10
1 1 1 4 5……10
1 1 2 5 6……10
1 2 6 4 4…….9
1 2 3 4 7…….9
1 2 6 5 4…….9
1 2 4 5 5…….9
1 2 4 8 0…….9
2 2 4 4 3……8
2 2 4 2 5…….8
2 2 8 4 3……8
2 2 8 2 5…….8
2 5 9 4 3……16
2 5 9 2 5…….16
2 5 9 4 3……16
2 5 9 2 5…….16
;
10=( 3**2)+ (1**2) where 3 is the frequency of 1 in V3 and 1 is the frequency of 2 in V3
9= (2**2) +(2**2)+(1**2) where 2 is the frequency of 6 in V3 and 2 (in the second Perrantes) is the frequency of 4 in V3 and 1 is the frequency of 3
8= (2**2) +(2**2) where 2 is the frequency of 4 in V3 and 2 (in the second Perrantes) is the frequency of 8 in V3
8= (2**2) +(2**2) where 2 is the frequency of 4 in V3 and 2 (in the second Perrantes) is the frequency of 8 in V3
16= (4**2) where 4 is the frequency of 9 in V3
I need to create the want variable and merge it to the original dataset like what I showed.
Thank you
Ok no biggie, just a very very minor change to not let SQL processor of SAS to remerge and that is
This part
from have a, (select v1,v2, sum(c) as want from (select *, count(v3)**2 as c from have group by v1, v2,v3 )
becomes
from have a, (select v1,v2, sum(c) as want from (select v1,v2,v3, count(v3)**2 as c from have group by v1, v2,v3 )
So here the test follows with your new sample--
data have;
input V1 V2 v3 V4 V5 ;
cards;
1 1 1 2 3 ….10
1 1 1 3 3 …..10
1 1 1 4 5 ……10
1 1 2 5 6 ……10
1 2 6 4 4 …….9
1 2 3 4 7 …….9
1 2 6 5 4 …….9
1 2 4 5 5 …….9
1 2 4 8 0 …….9
2 2 4 4 3 ……8
2 2 4 2 5 …….8
2 2 8 4 3 ……8
2 2 8 2 5 …….8
2 5 9 4 3 ……16
2 5 9 2 5 …….16
2 5 9 4 3 ……16
2 5 9 2 5 …….16
;
proc sql;
create table fama_want as
select a.*,want
from have a, (select v1,v2, sum(c) as want from (select v1,v2,v3, count(v3)**2 as c from have group by v1, v2,v3 )
group by v1,v2) b
where a.v1=b.v1 and a.v2=b.v2;
quit;
Results:
V1 V2 v3 V4 V5 want
1 1 1 2 3 10
1 1 1 3 3 10
1 1 1 4 5 10
1 1 2 5 6 10
1 2 6 4 4 9
1 2 3 4 7 9
1 2 6 5 4 9
1 2 4 5 5 9
1 2 4 8 0 9
2 2 4 4 3 8
2 2 4 2 5 8
2 2 8 4 3 8
2 2 8 2 5 8
2 5 9 4 3 16
2 5 9 2 5 16
2 5 9 4 3 16
2 5 9 2 5 16
Hi @ccaulkins9 It's because of you folks, the community has so much energy. Thanks to you as well. We're all a SAS family after all!
I would do:
data have;
input V1 V2 v3;
cards;
1 1 1
1 1 1
1 1 1
1 1 2
1 2 6
1 2 3
1 2 3
1 2 4
1 2 4
1 2 4
;
proc sql;
select a.*, sumSqFreq
from
have as a inner join
( select v1, v2, uss(freq) as sumSqFreq
from
( select v1, v2, v3, count(*) as freq
from have
group by v1, v2, v3 )
group by v1, v2 ) as b
on a.v1=b.v1 and a.v2=b.v2;
quit;
Are you sure you need to remerge the statistics with the original data?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.