- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
How can i add count rows by groups through proc sql
Have dataset
x y
1 a
1 b
1 c
2 g
2 p
3 f
wanted output
x y n
1 a 1
1 b 2
1 c 3
2 g 1
2 p 2
3 f 1
Grouping variable is x
i know how to do it with data step
Proc sort data = have ;
by x;
run;
data want;
set have;
by x;
if first.x then N =1;
else N+1;
run;
However i am finding equivalent code in Proc SQL.
"row_number() over partition by " does not work in proc sql
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.
The idea is to use MONOTONIC() and RANK by GROUP.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
With SQL use a Group By clause. Looking at your sample data/desired result: What's the grouping variable? Why is n=1 for 1a but n=2 for 1b? If the grouping variable would be X then n should be 3 for the first 3 rows, if it's Y then it should be 1 for all rows.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Grouping variable is x
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The data step you've posted is not really implementing a count by group but it's just counting up and you reset the count by group. There is not really something equivalent in ANSI SQL (many databases have extensions ANSI SQL - analytic functions - which allow for similar things).
With ANSI SQL you can have a count by group - but that works against sets of rows and not sequentially like with a SAS data step (compare the differences returned by below code).
Once you understand the differences between a SAS data step and SQL you can take full advantage of it and use whatever you need.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc print;
run;
data want;
set have;
by x;
if first.x then
N =1;
else N+1;
run;
proc print;
run;
proc sql;
select x, y, count(*) as n
from have
group by x
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For sql part
output with your code is
x y n1 a 3 1 b 3 1 c 3 2 g 2 2 p 2 3 f 1
what i want isx y n1 a 1 1 b 2 1 c 3 2 g 1 2 p 2 3 f 1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's what I've tried to tell you: Use a SAS data step for such a use case, use SAS SQL for a group count. Take advantage of the differences between the SAS data step and SQL.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As you see, it is very simple in a data step. Maxim 14: Use the Right Tool.
Stay with the data step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
> However i am finding equivalent code in Proc SQL.
SQL is not very suited to processing rows in a given order.
A data step is.
> "row_number() over partition by " does not work
This non-ANSI feature is not implemented in SAS. Time to vote.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.
The idea is to use MONOTONIC() and RANK by GROUP.
data have;
input x y $;
datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You'd be surprised-- I came to this thread looking for a solution to this as a SAS user doing data analytics for business! Needed to randomly select X number of observations from each value of the grouped variable, with X being variable depending on the value. Idea being to evenly distribute the control group to with the other group based on this variable. So I randomly ordered my members in the group, and then needed to label them to prepare for selection. So your solution does have real-world value 🙂