BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
V_R
Fluorite | Level 6 V_R
Fluorite | Level 6

Hi,

How can i add count rows by groups through proc sql

Have dataset

x y

1 a

1 b

1 c

2 g

2 p

3 f

 

wanted output 

x y n

1 a 1

1 b 2

1 c 3

2 g 1

2 p 2

3 f 1

 

Grouping variable is x

 

 

i know  how to do it with data step

 

Proc sort data = have ; 

by x;

run;

 

data want;

set have;

by x;

if first.x then N =1;

else N+1;

run;

 

However i am finding equivalent code in Proc SQL. 

 

"row_number() over partition by " does not work in proc sql

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

HI @V_R   These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.

 

The idea is to use MONOTONIC() and RANK by GROUP. 

 


data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join 
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;

View solution in original post

9 REPLIES 9
Patrick
Opal | Level 21

With SQL use a Group By clause. Looking at your sample data/desired result: What's the grouping variable? Why is n=1 for 1a but n=2 for 1b? If the grouping variable would be X then n should be 3 for the first 3 rows, if it's Y then it should be 1 for all rows.

V_R
Fluorite | Level 6 V_R
Fluorite | Level 6

Grouping variable is x

Patrick
Opal | Level 21

The data step you've posted is not really implementing a count by group but it's just counting up and you reset the count by group. There is not really something equivalent in ANSI SQL (many databases have extensions ANSI SQL - analytic functions - which allow for similar things).

With ANSI SQL you can have a count by group - but that works against sets of rows and not sequentially like with a SAS data step (compare the differences returned by below code).

Once you understand the differences between a SAS data step and SQL you can take full advantage of it and use whatever you need.

data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc print;
run;

data want;
  set have;
  by x;
  if first.x then
    N =1;
  else N+1;
run;
proc print;
run;

proc sql;
  select x, y, count(*) as n
  from have
  group by x
  ;
quit;
V_R
Fluorite | Level 6 V_R
Fluorite | Level 6

For sql part 

output with your code is

 

x y n
1 a 3 1 b 3 1 c 3 2 g 2 2 p 2 3 f 1
 

what i want is
x y n
1 a 1 1 b 2 1 c 3 2 g 1 2 p 2 3 f 1

 

Patrick
Opal | Level 21

That's what I've tried to tell you: Use a SAS data step for such a use case, use SAS SQL for a group count. Take advantage of the differences between the SAS data step and SQL.

ChrisNZ
Tourmaline | Level 20

> However i am finding equivalent code in Proc SQL.

SQL is not very suited to processing rows in a given order.

A data step is.

 

> "row_number() over partition by " does not work

This non-ANSI feature is not implemented in SAS. Time to vote.

novinosrin
Tourmaline | Level 20

HI @V_R   These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.

 

The idea is to use MONOTONIC() and RANK by GROUP. 

 


data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join 
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
jmann132
Calcite | Level 5

You'd be surprised-- I came to this thread looking for a solution to this as a SAS user doing data analytics for business! Needed to randomly select X number of observations from each value of the grouped variable, with X being variable depending on the value. Idea being to evenly distribute the control group to with the other group based on this variable. So I randomly ordered my members in the group, and then needed to label them to prepare for selection. So your solution does have real-world value 🙂 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 18882 views
  • 8 likes
  • 6 in conversation