data dup;
set sashelp.class;
proc sort data =dup;
by age;
run;
data d;
set dup;
by Age ;
if first.Age then n=0;
n+1;proc print;
run;Hi Guys
Good Morning
How to get same output using proc sql
Just to show how it's done:
proc sql;
select
*,
(
select count(*)
from sashelp.class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from sashelp.class t1
order by age, name
;
quit;
Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.
To illustrate this:
data class;
set sashelp.class;
do i = 1 to 100000;
output;
end;
run;
proc sql;
create table want as
select
*,
(
select count(*)
from class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from class t1
order by age, name
;
quit;
proc sort
data=class
out=dup
;
by age;
run;
data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;
Log:
73 data class;
74 set sashelp.class;
75 do i = 1 to 100000;
76 output;
77 end;
78 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.10 seconds
cpu time 0.09 seconds
79
80 proc sql;
81 create table want as
82 select
83 *,
84 (
85 select count(*)
86 from class t2
87 where t2.age = t1.age
88 and t2.name le t1.name
89 ) as n
90 from class t1
91 order by age, name
92 ;
NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns.
93 quit;
NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
real time 5.82 seconds
cpu time 5.97 seconds
94
95 proc sort
96 data=class
97 out=dup
98 ;
99 by age;
100 run;
NOTE: There were 1900000 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.DUP has 1900000 observations and 6 variables.
NOTE: Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit):
real time 0.51 seconds
cpu time 0.88 seconds
101
102 data d;
103 set dup;
104 by Age;
105 if first.Age
106 then n = 1;
107 else n + 1;
108 run;
NOTE: There were 1900000 observations read from the data set WORK.DUP.
NOTE: The data set WORK.D has 1900000 observations and 7 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.31 seconds
cpu time 0.30 seconds
You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.
Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.
You want an accumulating sum in SQL, correct?
Very simple.
COUNT(*) GROUP BY AGE in the SQL SELECT
Edit:
Belay that, you want a running count.
Maxim 14: Use the Right Tool, which is the data step. Don't waste your brain cycles on stupidity when there's a simple solution you already have.
Just to show how it's done:
proc sql;
select
*,
(
select count(*)
from sashelp.class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from sashelp.class t1
order by age, name
;
quit;
Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.
To illustrate this:
data class;
set sashelp.class;
do i = 1 to 100000;
output;
end;
run;
proc sql;
create table want as
select
*,
(
select count(*)
from class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from class t1
order by age, name
;
quit;
proc sort
data=class
out=dup
;
by age;
run;
data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;
Log:
73 data class;
74 set sashelp.class;
75 do i = 1 to 100000;
76 output;
77 end;
78 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.10 seconds
cpu time 0.09 seconds
79
80 proc sql;
81 create table want as
82 select
83 *,
84 (
85 select count(*)
86 from class t2
87 where t2.age = t1.age
88 and t2.name le t1.name
89 ) as n
90 from class t1
91 order by age, name
92 ;
NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns.
93 quit;
NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
real time 5.82 seconds
cpu time 5.97 seconds
94
95 proc sort
96 data=class
97 out=dup
98 ;
99 by age;
100 run;
NOTE: There were 1900000 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.DUP has 1900000 observations and 6 variables.
NOTE: Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit):
real time 0.51 seconds
cpu time 0.88 seconds
101
102 data d;
103 set dup;
104 by Age;
105 if first.Age
106 then n = 1;
107 else n + 1;
108 run;
NOTE: There were 1900000 observations read from the data set WORK.DUP.
NOTE: The data set WORK.D has 1900000 observations and 7 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.31 seconds
cpu time 0.30 seconds
You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.
Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.