Hi,
How can i add count rows by groups through proc sql
Have dataset
x y
1 a
1 b
1 c
2 g
2 p
3 f
wanted output
x y n
1 a 1
1 b 2
1 c 3
2 g 1
2 p 2
3 f 1
Grouping variable is x
i know how to do it with data step
Proc sort data = have ;
by x;
run;
data want;
set have;
by x;
if first.x then N =1;
else N+1;
run;
However i am finding equivalent code in Proc SQL.
"row_number() over partition by " does not work in proc sql
HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.
The idea is to use MONOTONIC() and RANK by GROUP.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
With SQL use a Group By clause. Looking at your sample data/desired result: What's the grouping variable? Why is n=1 for 1a but n=2 for 1b? If the grouping variable would be X then n should be 3 for the first 3 rows, if it's Y then it should be 1 for all rows.
Grouping variable is x
The data step you've posted is not really implementing a count by group but it's just counting up and you reset the count by group. There is not really something equivalent in ANSI SQL (many databases have extensions ANSI SQL - analytic functions - which allow for similar things).
With ANSI SQL you can have a count by group - but that works against sets of rows and not sequentially like with a SAS data step (compare the differences returned by below code).
Once you understand the differences between a SAS data step and SQL you can take full advantage of it and use whatever you need.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc print;
run;
data want;
set have;
by x;
if first.x then
N =1;
else N+1;
run;
proc print;
run;
proc sql;
select x, y, count(*) as n
from have
group by x
;
quit;
For sql part
output with your code is
x y n
1 a 3 1 b 3 1 c 3 2 g 2 2 p 2 3 f 1
what i want isx y n
1 a 1 1 b 2 1 c 3 2 g 1 2 p 2 3 f 1
That's what I've tried to tell you: Use a SAS data step for such a use case, use SAS SQL for a group count. Take advantage of the differences between the SAS data step and SQL.
As you see, it is very simple in a data step. Maxim 14: Use the Right Tool.
Stay with the data step.
> However i am finding equivalent code in Proc SQL.
SQL is not very suited to processing rows in a given order.
A data step is.
> "row_number() over partition by " does not work
This non-ANSI feature is not implemented in SAS. Time to vote.
HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.
The idea is to use MONOTONIC() and RANK by GROUP.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
You'd be surprised-- I came to this thread looking for a solution to this as a SAS user doing data analytics for business! Needed to randomly select X number of observations from each value of the grouped variable, with X being variable depending on the value. Idea being to evenly distribute the control group to with the other group based on this variable. So I randomly ordered my members in the group, and then needed to label them to prepare for selection. So your solution does have real-world value 🙂
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.