Hi,
How can i add count rows by groups through proc sql
Have dataset
x y
1 a
1 b
1 c
2 g
2 p
3 f
wanted output
x y n
1 a 1
1 b 2
1 c 3
2 g 1
2 p 2
3 f 1
Grouping variable is x
i know how to do it with data step
Proc sort data = have ;
by x;
run;
data want;
set have;
by x;
if first.x then N =1;
else N+1;
run;
However i am finding equivalent code in Proc SQL.
"row_number() over partition by " does not work in proc sql
HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.
The idea is to use MONOTONIC() and RANK by GROUP.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
With SQL use a Group By clause. Looking at your sample data/desired result: What's the grouping variable? Why is n=1 for 1a but n=2 for 1b? If the grouping variable would be X then n should be 3 for the first 3 rows, if it's Y then it should be 1 for all rows.
Grouping variable is x
The data step you've posted is not really implementing a count by group but it's just counting up and you reset the count by group. There is not really something equivalent in ANSI SQL (many databases have extensions ANSI SQL - analytic functions - which allow for similar things).
With ANSI SQL you can have a count by group - but that works against sets of rows and not sequentially like with a SAS data step (compare the differences returned by below code).
Once you understand the differences between a SAS data step and SQL you can take full advantage of it and use whatever you need.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc print;
run;
data want;
set have;
by x;
if first.x then
N =1;
else N+1;
run;
proc print;
run;
proc sql;
select x, y, count(*) as n
from have
group by x
;
quit;
For sql part
output with your code is
x y n
1 a 3 1 b 3 1 c 3 2 g 2 2 p 2 3 f 1
what i want isx y n
1 a 1 1 b 2 1 c 3 2 g 1 2 p 2 3 f 1
That's what I've tried to tell you: Use a SAS data step for such a use case, use SAS SQL for a group count. Take advantage of the differences between the SAS data step and SQL.
As you see, it is very simple in a data step. Maxim 14: Use the Right Tool.
Stay with the data step.
> However i am finding equivalent code in Proc SQL.
SQL is not very suited to processing rows in a given order.
A data step is.
> "row_number() over partition by " does not work
This non-ANSI feature is not implemented in SAS. Time to vote.
HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.
The idea is to use MONOTONIC() and RANK by GROUP.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
You'd be surprised-- I came to this thread looking for a solution to this as a SAS user doing data analytics for business! Needed to randomly select X number of observations from each value of the grouped variable, with X being variable depending on the value. Idea being to evenly distribute the control group to with the other group based on this variable. So I randomly ordered my members in the group, and then needed to label them to prepare for selection. So your solution does have real-world value 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.