BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BrahmanandaRao
Lapis Lazuli | Level 10
data dup;
set sashelp.class;
proc sort data =dup;
by age;
run;

data d;
set dup;
by Age ;
if first.Age then n=0;
n+1;proc print;
run;

Hi Guys

Good Morning 

How to get same output using proc sql

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Just to show how it's done:

proc sql;
select
  *,
  (
    select count(*)
    from sashelp.class t2
    where t2.age = t1.age
    and t2.name le t1.name
  ) as n
from sashelp.class t1
order by age, name
;
quit;

Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.

To illustrate this:

data class;
set sashelp.class;
do i = 1 to 100000;
  output;
end;
run;

proc sql;
create table want as
  select
    *,
    (
      select count(*)
      from class t2
      where t2.age = t1.age
      and t2.name le t1.name
    ) as n
  from class t1
  order by age, name
;
quit;

proc sort
  data=class
  out=dup
;
by age;
run;

data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;

Log:

 73         data class;
 74         set sashelp.class;
 75         do i = 1 to 100000;
 76           output;
 77         end;
 78         run;
 
 NOTE: There were 19 observations read from the data set SASHELP.CLASS.
 NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.10 seconds
       cpu time            0.09 seconds
       
 
 79         
 80         proc sql;
 81         create table want as
 82           select
 83             *,
 84             (
 85               select count(*)
 86               from class t2
 87               where t2.age = t1.age
 88               and t2.name le t1.name
 89             ) as n
 90           from class t1
 91           order by age, name
 92         ;
 NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns.
 
 93         quit;
 NOTE:  Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
       real time           5.82 seconds
       cpu time            5.97 seconds
       
 
 94         
 95         proc sort
 96           data=class
 97           out=dup
 98         ;
 99         by age;
 100        run;
 
 NOTE: There were 1900000 observations read from the data set WORK.CLASS.
 NOTE: The data set WORK.DUP has 1900000 observations and 6 variables.
 NOTE:  Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit):
       real time           0.51 seconds
       cpu time            0.88 seconds
       
 
 101        
 102        data d;
 103        set dup;
 104        by Age;
 105        if first.Age
 106        then n = 1;
 107        else n + 1;
 108        run;
 
 NOTE: There were 1900000 observations read from the data set WORK.DUP.
 NOTE: The data set WORK.D has 1900000 observations and 7 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.31 seconds
       cpu time            0.30 seconds

You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.

Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

You want an accumulating sum in SQL, correct?

Kurt_Bremser
Super User

Very simple.

COUNT(*) GROUP BY AGE in the SQL SELECT

 

Edit:

 

Belay that, you want a running count.

Maxim 14: Use the Right Tool, which is the data step. Don't waste your brain cycles on stupidity when there's a simple solution you already have.

BrahmanandaRao
Lapis Lazuli | Level 10
Hi Krutbremser
i want count as like as datastep output in proc sql
your code give only count age wise but i want

--------------------------------------------------------------------------------
The SAS System


Obs Name Sex Age Height Weight n
1 Joyce F 11 51.3 50.5 1
2 Thomas M 11 57.5 85.0 2
3 James M 12 57.3 83.0 1
4 Jane F 12 59.8 84.5 2
5 John M 12 59.0 99.5 3
6 Louise F 12 56.3 77.0 4
7 Robert M 12 64.8 128.0 5
8 Alice F 13 56.5 84.0 1
9 Barbara F 13 65.3 98.0 2
10 Jeffrey M 13 62.5 84.0 3
11 Alfred M 14 69.0 112.5 1
12 Carol F 14 62.8 102.5 2
13 Henry M 14 63.5 102.5 3
14 Judy F 14 64.3 90.0 4
15 Janet F 15 62.5 112.5 1
16 Mary F 15 66.5 112.0 2
17 Ronald M 15 67.0 133.0 3
18 William M 15 66.5 112.0 4
19 Philip M 16 72.0 150.0 1



Kurt_Bremser
Super User

Just to show how it's done:

proc sql;
select
  *,
  (
    select count(*)
    from sashelp.class t2
    where t2.age = t1.age
    and t2.name le t1.name
  ) as n
from sashelp.class t1
order by age, name
;
quit;

Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.

To illustrate this:

data class;
set sashelp.class;
do i = 1 to 100000;
  output;
end;
run;

proc sql;
create table want as
  select
    *,
    (
      select count(*)
      from class t2
      where t2.age = t1.age
      and t2.name le t1.name
    ) as n
  from class t1
  order by age, name
;
quit;

proc sort
  data=class
  out=dup
;
by age;
run;

data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;

Log:

 73         data class;
 74         set sashelp.class;
 75         do i = 1 to 100000;
 76           output;
 77         end;
 78         run;
 
 NOTE: There were 19 observations read from the data set SASHELP.CLASS.
 NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.10 seconds
       cpu time            0.09 seconds
       
 
 79         
 80         proc sql;
 81         create table want as
 82           select
 83             *,
 84             (
 85               select count(*)
 86               from class t2
 87               where t2.age = t1.age
 88               and t2.name le t1.name
 89             ) as n
 90           from class t1
 91           order by age, name
 92         ;
 NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns.
 
 93         quit;
 NOTE:  Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
       real time           5.82 seconds
       cpu time            5.97 seconds
       
 
 94         
 95         proc sort
 96           data=class
 97           out=dup
 98         ;
 99         by age;
 100        run;
 
 NOTE: There were 1900000 observations read from the data set WORK.CLASS.
 NOTE: The data set WORK.DUP has 1900000 observations and 6 variables.
 NOTE:  Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit):
       real time           0.51 seconds
       cpu time            0.88 seconds
       
 
 101        
 102        data d;
 103        set dup;
 104        by Age;
 105        if first.Age
 106        then n = 1;
 107        else n + 1;
 108        run;
 
 NOTE: There were 1900000 observations read from the data set WORK.DUP.
 NOTE: The data set WORK.D has 1900000 observations and 7 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.31 seconds
       cpu time            0.30 seconds

You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.

Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1597 views
  • 0 likes
  • 3 in conversation