5 examples introducing a number of tips & tricks using PROC SQL to do other stuff than join - and some more...
It's not a discussion about advantages/disadvantages or trying to provide any best practices regarding using Proc SQL versa other proc’s - Just some inspiration 😉
You should be able to copy, paste and run the code.
Link to documentation at the bottom for more information.
* 1) proc freq count *;
* simple Proc freq etc *;
Proc Freq data=sashelp.cars;
Tables model;
Run;
proc sql number /* NOEXEC */ ;
/* validate */
select count(*) as row_count
, count(model) as model_count
, count(distinct model) as model_unique
, nmiss(Cylinders) 'Cylinders_missing_value' /* alternative syntax naming column */
, nmiss(origin) as Origin_missing_value /* nmiss works with both num and char */
from sashelp.cars;
quit;
|
The first example is a simple Data Quality assessment - See if a column is unique / contains duplicates or contains missing values |
* 2) Proc means calculations *;
* same function can sum more columns or more rows *;
* columns / down (aggregate function)*;
proc sql ;
create table means_sql as
select make, mean(MPG_Highway) as mean_MPG_Highway, avg(MPG_Highway) as avg_MPG_Highway
from sashelp.cars
group by make;
quit;
proc means data=sashelp.cars mean min max;
by make;
var MPG_Highway;
output out=cars_proc;
run;
* colums / across (scalar function)*;
proc sql;
select MONOTONIC() AS ROW_NUM, make, model, mean(MPG_Highway,MPG_Highway) as MPG
from sashelp.cars;
quit;
|
Can you spot the undocumented feature? |
* 3) Duplicates ? *;
* proc sort Duplicates - data contains 428 rows*;
proc sql feedback /* expanding select * */;
create table dist_model_sql425 as
select distinct model
from sashelp.cars;
%put SQLOBS=&SQLOBS. SQLRC=&SQLRC. SQLOOPS=&SQLOOPS. SQLEXITCODE=&SQLEXITCODE.;
create table dups_model_sql6 as
select * /* look in the log to see the expanded "select *" */
from sashelp.cars
group by model
having count(model) > 1;
quit;
proc sort data=sashelp.cars
out=dist_model_proc425 NODUPKEY /* delete one of the duplicates */
dupout=dups_model_proc3; * output keys that has duplicates *;
by model;
run;
proc sort data=sashelp.cars(keep=make model drivetrain)
out=dist_model_proc6 NOUNIQUEKEY /* rows with duplicates keys */
uniqueout=dups_model_proc422; * remove rows with duplicate keys (428-6);
by model;
run;
* Find rows in one table not in the other *;
proc sql;
title 'sql6 (drivetrain rear,all)';
select * from dups_model_sql6;
title 'proc3 (drivetrain all)';
select * from dups_model_proc3;
title 'Rows from sql6 not in proc3 - (drivetrain=rear)';
select * from dups_model_sql6
except
select * from dups_model_proc3;
quit;
title;
|
Pay close attention to the number of OBS, and the output of each table |
* 4) simple proc Rank *;
* simple PROC RANK - rank = 1 = worst*;
proc sql;
create table rank_sql as
select make,model, a.MPG_City
, min(MPG_City) as min_MPG_City
, avg(MPG_City) as avg_MPG_City format=3.
, max(MPG_City) as max_MPG_City
, (select count(distinct b.MPG_City)
from sashelp.cars b
where b.MPG_City <= a.MPG_City) as rank
from sashelp.cars a
order by rank desc;
quit;
proc rank data=sashelp.cars
out=rank_proc ties=low /* descending */;
var MPG_City;
ranks rank;
run;
|
So, what if we wanted the best MPG to have number 1 ? |
* 5) simple proc surveyselect *;
proc sql outobs = 8;
create table rand_sql as
select *
from sashelp.cars
order by rand('Normal');
quit;
proc surveyselect data=sashelp.cars
out=rand_proc
method=srs
sampsize=8
noprint;
run;
|
Are you still using ranuni() ? |
Link to Documentation (hover above link to see hint):
- options fullstimer
- Proc Freq
- Proc sql
- NOEXEC
- Validate
- Proc sql number
- count(distinct X)
- Nmiss() – have a look at Cmiss also
- Proc means
- MONOTONIC undocumented SAS function!
- SQLOBS, SQLRC, SQLOOPS, SQLEXITCODE
- feedback – expanded “select *”
- Proc sort NOUNIQUEKEY / UNIQUEOUT=
- Proc sql except
- Proc rank
- Proc surveyselect
- rand() function