data dup;
set sashelp.class;
proc sort data =dup;
by age;
run;
data d;
set dup;
by Age ;
if first.Age then n=0;
n+1;proc print;
run;
Hi Guys
Good Morning
How to get same output using proc sql
Just to show how it's done:
proc sql;
select
*,
(
select count(*)
from sashelp.class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from sashelp.class t1
order by age, name
;
quit;
Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.
To illustrate this:
data class;
set sashelp.class;
do i = 1 to 100000;
output;
end;
run;
proc sql;
create table want as
select
*,
(
select count(*)
from class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from class t1
order by age, name
;
quit;
proc sort
data=class
out=dup
;
by age;
run;
data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;
Log:
73 data class; 74 set sashelp.class; 75 do i = 1 to 100000; 76 output; 77 end; 78 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.10 seconds cpu time 0.09 seconds 79 80 proc sql; 81 create table want as 82 select 83 *, 84 ( 85 select count(*) 86 from class t2 87 where t2.age = t1.age 88 and t2.name le t1.name 89 ) as n 90 from class t1 91 order by age, name 92 ; NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns. 93 quit; NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit): real time 5.82 seconds cpu time 5.97 seconds 94 95 proc sort 96 data=class 97 out=dup 98 ; 99 by age; 100 run; NOTE: There were 1900000 observations read from the data set WORK.CLASS. NOTE: The data set WORK.DUP has 1900000 observations and 6 variables. NOTE: Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit): real time 0.51 seconds cpu time 0.88 seconds 101 102 data d; 103 set dup; 104 by Age; 105 if first.Age 106 then n = 1; 107 else n + 1; 108 run; NOTE: There were 1900000 observations read from the data set WORK.DUP. NOTE: The data set WORK.D has 1900000 observations and 7 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.31 seconds cpu time 0.30 seconds
You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.
Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.
You want an accumulating sum in SQL, correct?
Very simple.
COUNT(*) GROUP BY AGE in the SQL SELECT
Edit:
Belay that, you want a running count.
Maxim 14: Use the Right Tool, which is the data step. Don't waste your brain cycles on stupidity when there's a simple solution you already have.
Just to show how it's done:
proc sql;
select
*,
(
select count(*)
from sashelp.class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from sashelp.class t1
order by age, name
;
quit;
Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.
To illustrate this:
data class;
set sashelp.class;
do i = 1 to 100000;
output;
end;
run;
proc sql;
create table want as
select
*,
(
select count(*)
from class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from class t1
order by age, name
;
quit;
proc sort
data=class
out=dup
;
by age;
run;
data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;
Log:
73 data class; 74 set sashelp.class; 75 do i = 1 to 100000; 76 output; 77 end; 78 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.10 seconds cpu time 0.09 seconds 79 80 proc sql; 81 create table want as 82 select 83 *, 84 ( 85 select count(*) 86 from class t2 87 where t2.age = t1.age 88 and t2.name le t1.name 89 ) as n 90 from class t1 91 order by age, name 92 ; NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns. 93 quit; NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit): real time 5.82 seconds cpu time 5.97 seconds 94 95 proc sort 96 data=class 97 out=dup 98 ; 99 by age; 100 run; NOTE: There were 1900000 observations read from the data set WORK.CLASS. NOTE: The data set WORK.DUP has 1900000 observations and 6 variables. NOTE: Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit): real time 0.51 seconds cpu time 0.88 seconds 101 102 data d; 103 set dup; 104 by Age; 105 if first.Age 106 then n = 1; 107 else n + 1; 108 run; NOTE: There were 1900000 observations read from the data set WORK.DUP. NOTE: The data set WORK.D has 1900000 observations and 7 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.31 seconds cpu time 0.30 seconds
You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.
Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.