data dup;
set sashelp.class;
proc sort data =dup;
by age;
run;
data d;
set dup;
by Age ;
if first.Age then n=0;
n+1;proc print;
run;
Hi Guys
Good Morning
How to get same output using proc sql
Just to show how it's done:
proc sql;
select
*,
(
select count(*)
from sashelp.class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from sashelp.class t1
order by age, name
;
quit;
Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.
To illustrate this:
data class;
set sashelp.class;
do i = 1 to 100000;
output;
end;
run;
proc sql;
create table want as
select
*,
(
select count(*)
from class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from class t1
order by age, name
;
quit;
proc sort
data=class
out=dup
;
by age;
run;
data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;
Log:
73 data class; 74 set sashelp.class; 75 do i = 1 to 100000; 76 output; 77 end; 78 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.10 seconds cpu time 0.09 seconds 79 80 proc sql; 81 create table want as 82 select 83 *, 84 ( 85 select count(*) 86 from class t2 87 where t2.age = t1.age 88 and t2.name le t1.name 89 ) as n 90 from class t1 91 order by age, name 92 ; NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns. 93 quit; NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit): real time 5.82 seconds cpu time 5.97 seconds 94 95 proc sort 96 data=class 97 out=dup 98 ; 99 by age; 100 run; NOTE: There were 1900000 observations read from the data set WORK.CLASS. NOTE: The data set WORK.DUP has 1900000 observations and 6 variables. NOTE: Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit): real time 0.51 seconds cpu time 0.88 seconds 101 102 data d; 103 set dup; 104 by Age; 105 if first.Age 106 then n = 1; 107 else n + 1; 108 run; NOTE: There were 1900000 observations read from the data set WORK.DUP. NOTE: The data set WORK.D has 1900000 observations and 7 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.31 seconds cpu time 0.30 seconds
You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.
Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.
You want an accumulating sum in SQL, correct?
Very simple.
COUNT(*) GROUP BY AGE in the SQL SELECT
Edit:
Belay that, you want a running count.
Maxim 14: Use the Right Tool, which is the data step. Don't waste your brain cycles on stupidity when there's a simple solution you already have.
Just to show how it's done:
proc sql;
select
*,
(
select count(*)
from sashelp.class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from sashelp.class t1
order by age, name
;
quit;
Note that sub-selects perform horribly, so on any larger dataset this will cause time and resource issues.
To illustrate this:
data class;
set sashelp.class;
do i = 1 to 100000;
output;
end;
run;
proc sql;
create table want as
select
*,
(
select count(*)
from class t2
where t2.age = t1.age
and t2.name le t1.name
) as n
from class t1
order by age, name
;
quit;
proc sort
data=class
out=dup
;
by age;
run;
data d;
set dup;
by Age;
if first.Age
then n = 1;
else n + 1;
run;
Log:
73 data class; 74 set sashelp.class; 75 do i = 1 to 100000; 76 output; 77 end; 78 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.CLASS has 1900000 observations and 6 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.10 seconds cpu time 0.09 seconds 79 80 proc sql; 81 create table want as 82 select 83 *, 84 ( 85 select count(*) 86 from class t2 87 where t2.age = t1.age 88 and t2.name le t1.name 89 ) as n 90 from class t1 91 order by age, name 92 ; NOTE: Table WORK.WANT created, with 1900000 rows and 7 columns. 93 quit; NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit): real time 5.82 seconds cpu time 5.97 seconds 94 95 proc sort 96 data=class 97 out=dup 98 ; 99 by age; 100 run; NOTE: There were 1900000 observations read from the data set WORK.CLASS. NOTE: The data set WORK.DUP has 1900000 observations and 6 variables. NOTE: Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit): real time 0.51 seconds cpu time 0.88 seconds 101 102 data d; 103 set dup; 104 by Age; 105 if first.Age 106 then n = 1; 107 else n + 1; 108 run; NOTE: There were 1900000 observations read from the data set WORK.DUP. NOTE: The data set WORK.D has 1900000 observations and 7 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.31 seconds cpu time 0.30 seconds
You can see that there's already a factor of 6 between DATA and SQL; this difference grow exponentially with dataset sizes.
Rule of thumb: in 90% of cases, DATA step solutions outperform SQL. On top of being easier to read and maintain.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.