SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
V_R
Fluorite | Level 6 V_R
Fluorite | Level 6

Hi,

How can i add count rows by groups through proc sql

Have dataset

x y

1 a

1 b

1 c

2 g

2 p

3 f

 

wanted output 

x y n

1 a 1

1 b 2

1 c 3

2 g 1

2 p 2

3 f 1

 

Grouping variable is x

 

 

i know  how to do it with data step

 

Proc sort data = have ; 

by x;

run;

 

data want;

set have;

by x;

if first.x then N =1;

else N+1;

run;

 

However i am finding equivalent code in Proc SQL. 

 

"row_number() over partition by " does not work in proc sql

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

HI @V_R   These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.

 

The idea is to use MONOTONIC() and RANK by GROUP. 

 


data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join 
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;

View solution in original post

9 REPLIES 9
Patrick
Opal | Level 21

With SQL use a Group By clause. Looking at your sample data/desired result: What's the grouping variable? Why is n=1 for 1a but n=2 for 1b? If the grouping variable would be X then n should be 3 for the first 3 rows, if it's Y then it should be 1 for all rows.

V_R
Fluorite | Level 6 V_R
Fluorite | Level 6

Grouping variable is x

Patrick
Opal | Level 21

The data step you've posted is not really implementing a count by group but it's just counting up and you reset the count by group. There is not really something equivalent in ANSI SQL (many databases have extensions ANSI SQL - analytic functions - which allow for similar things).

With ANSI SQL you can have a count by group - but that works against sets of rows and not sequentially like with a SAS data step (compare the differences returned by below code).

Once you understand the differences between a SAS data step and SQL you can take full advantage of it and use whatever you need.

data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc print;
run;

data want;
  set have;
  by x;
  if first.x then
    N =1;
  else N+1;
run;
proc print;
run;

proc sql;
  select x, y, count(*) as n
  from have
  group by x
  ;
quit;
V_R
Fluorite | Level 6 V_R
Fluorite | Level 6

For sql part 

output with your code is

 

x y n1 a 3
1 b 3
1 c 3
2 g 2
2 p 2
3 f 1 

what i want is
x y n1 a 1 1 b 2 1 c 3 2 g 1 2 p 2 3 f 1

 

Patrick
Opal | Level 21

That's what I've tried to tell you: Use a SAS data step for such a use case, use SAS SQL for a group count. Take advantage of the differences between the SAS data step and SQL.

ChrisNZ
Tourmaline | Level 20

> However i am finding equivalent code in Proc SQL.

SQL is not very suited to processing rows in a given order.

A data step is.

 

> "row_number() over partition by " does not work

This non-ANSI feature is not implemented in SAS. Time to vote.

novinosrin
Tourmaline | Level 20

HI @V_R   These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.

 

The idea is to use MONOTONIC() and RANK by GROUP. 

 


data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join 
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;
jmann132
Calcite | Level 5

You'd be surprised-- I came to this thread looking for a solution to this as a SAS user doing data analytics for business! Needed to randomly select X number of observations from each value of the grouped variable, with X being variable depending on the value. Idea being to evenly distribute the control group to with the other group based on this variable. So I randomly ordered my members in the group, and then needed to label them to prepare for selection. So your solution does have real-world value 🙂 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 23050 views
  • 8 likes
  • 6 in conversation