Solved: Re: Counting row by group through proc sql

V_R · Posted 02-16-2020 02:28 AM

Hi,

How can i add count rows by groups through proc sql

Have dataset

x y

1 a

1 b

1 c

2 g

2 p

3 f

wanted output

x y n

1 a 1

1 b 2

1 c 3

2 g 1

2 p 2

3 f 1

Grouping variable is x

i know how to do it with data step

Proc sort data = have ;

by x;

run;

data want;

set have;

by x;

if first.x then N =1;

else N+1;

run;

However i am finding equivalent code in Proc SQL.

"row_number() over partition by " does not work in proc sql

novinosrin · Posted 02-16-2020 09:34 AM

HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.

The idea is to use MONOTONIC() and RANK by GROUP.


data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join 
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;

View solution in original post

Patrick · Posted 02-16-2020 02:44 AM

With SQL use a Group By clause. Looking at your sample data/desired result: What's the grouping variable? Why is n=1 for 1a but n=2 for 1b? If the grouping variable would be X then n should be 3 for the first 3 rows, if it's Y then it should be 1 for all rows.

V_R · Posted 02-16-2020 02:48 AM

Grouping variable is x

Patrick · Posted 02-16-2020 03:10 AM

The data step you've posted is not really implementing a count by group but it's just counting up and you reset the count by group. There is not really something equivalent in ANSI SQL (many databases have extensions ANSI SQL - analytic functions - which allow for similar things).

With ANSI SQL you can have a count by group - but that works against sets of rows and not sequentially like with a SAS data step (compare the differences returned by below code).

Once you understand the differences between a SAS data step and SQL you can take full advantage of it and use whatever you need.

data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc print;
run;

data want;
  set have;
  by x;
  if first.x then
    N =1;
  else N+1;
run;
proc print;
run;

proc sql;
  select x, y, count(*) as n
  from have
  group by x
  ;
quit;

V_R · Posted 02-16-2020 03:18 AM

For sql part

output with your code is

x y n
1 a 3
1 b 3
1 c 3
2 g 2
2 p 2
3 f 1 

what i want is 
x y n
1 a 1
1 b 2
1 c 3
2 g 1
2 p 2
3 f 1

Patrick · Posted 02-16-2020 03:23 AM

That's what I've tried to tell you: Use a SAS data step for such a use case, use SAS SQL for a group count. Take advantage of the differences between the SAS data step and SQL.

Kurt_Bremser · Posted 02-16-2020 03:23 AM

As you see, it is very simple in a data step. Maxim 14: Use the Right Tool.

Stay with the data step.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ChrisNZ · Posted 02-16-2020 04:15 AM

> However i am finding equivalent code in Proc SQL.

SQL is not very suited to processing rows in a given order.

A data step is.

> "row_number() over partition by " does not work

This non-ANSI feature is not implemented in SAS. Time to vote.

High-Performance SAS Coding - Third Edition

novinosrin · Posted 02-16-2020 09:34 AM

HI @V_R These kind of questions seems more of university fun puzzles rather than a practical application in industry. How I wish I can reverse the clock and go back to Uni. I am missing those days at the lab.

The idea is to use MONOTONIC() and RANK by GROUP.


data have;
  input x y $;
  datalines;
1 a
1 b
1 c
2 g
2 p
3 f
;
proc sql;
create table want as
select a.x,a.y,count(b.m) as n
from (select *,monotonic() as m from have) a
left join 
(select *,monotonic() as m from have) b
on a.x=b.x and b.m<=a.m
group by a.x,a.y
order by a.x,n;
quit;

jmann132 · Posted 07-15-2020 03:13 PM

You'd be surprised-- I came to this thread looking for a solution to this as a SAS user doing data analytics for business! Needed to randomly select X number of observations from each value of the grouped variable, with X being variable depending on the value. Idea being to evenly distribute the control group to with the other group based on this variable. So I randomly ordered my members in the group, and then needed to label them to prepare for selection. So your solution does have real-world value 🙂

Classroom Training Available!