Greetings all. I'm trying to turn a simple SQL group by into SAS code, but I'm not sure which proc I should use. The SQL is below.
select
max_n,
cr_scor,
bad_debt_at_conn,
dep_unpaid,
chg_off,
count(*) as count
from npso
where cr_scor is not null
and is_max_n = 1
group by max_n, cr_scor, bad_debt_at_conn, dep_unpaid, chg_off
I know I could just run proc sql, but I was thinking maybe there was a proc to do this. There are around 30 other variables in the table, but these are the only ones I want to keep. Thank you.
Greg
Better alternative is to use PROC SUMMARY as it does NOT calculate statistics at all, unless requested to do so. And of course, omitting calculation saves time (eventually you loose your coffee break )
Proc Summary NoPrint Nway Missing;
By Max_N Cr_Scor Bad_Dept_at_Conn Dep_Unpaid Chg_Off;
Output Out = Whichever_Name_You_Want (Drop = _type_);
Run;
This writes exactly 1 row for each combination of BY-variables.
BTW: You may need a PROC SORT before you call PROC SUMMARY if the data is not in proper order, or you may replace the BY statement with an identical CLASS statement. It depends on your input wether the SORT+BY or the CLASS version is most efficient.
Hi.
Why would you want to change a nicely looking SQL code to several PROCs and Data Steps?
You'd have to use PROC SORT, PROC MEANS/SUMMARY and at least one data step to combine the results from the MEANS/SUMMARY procedure.
Well, I've been using SQL Server for around 10 years now, and I'm pretty proficient with SQL. However, there are some ANSI standards in SQL, like 'WITH' statement, row_number(), among others, that are not included in SAS SQL, so in learning SAS, I would like to avoid using SQL as much as possible. I do use it, but the first couple of programs I wrote are nothing but a series of proc sql statements, and I feel like that is cheating a little bit.
:smileylaugh: fair.
Here is some really poorly written code, but it should give you an idea:
*using sashelp.class as example;
proc sql;
select
age, weight,height,sex, count(*) as count
from sashelp.class
where age is not null
group by sex,age;
quit;
It gives the following output:
Age Weight Height Sex count
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
11 50.5 51.3 F 1
12 84.5 59.8 F 2
12 77 56.3 F 2
13 98 65.3 F 2
13 84 56.5 F 2
14 102.5 62.8 F 2
14 90 64.3 F 2
15 112 66.5 F 2
15 112.5 62.5 F 2
11 85 57.5 M 1
12 83 57.3 M 3
12 99.5 59 M 3
12 128 64.8 M 3
13 84 62.5 M 1
14 112.5 69 M 2
14 102.5 63.5 M 2
15 112 66.5 M 2
15 133 67 M 2
16 150 72 M 1
Now, let's PROC it:
First sort by what you are "group by" in your SQL:
proc sort data = sashelp.class out = have; by sex age;run; *Create a temp file, so we don't sort the original;
Then we use "where" and "keep" statements in the "set" statement:
data want(keep = sex age cnt);
set temp(where = (age ne 0)
keep = age weight sex height);
by sex age;
if first.age then cnt = 1;
else cnt ++ 1;
if last.age; *we want the "total" count;
run;
Finally get it all together;
data really_want;
merge have want;
by sex age;
run;
Hee,hee...
Thank you Anca. That gives me some stuff to think about. I guess I thought there was a magic proc that would do the same as the SQL in one fell swoop.
Greg
Maybe there is a a PROC, I just don't know it...
Thanks, i learned alot about the SQL to SAS conversion from your discussion and posts.
I think that's a single proc means using a class statement, but you'll need to control the output using a ways or type statement. The only issue is since you're counting you need to have a variable to count, so I use max_n...though you can use any variable in your dataset.
proc means data=npso noprint;
class max_n cr_scor bad_debt_at_conn dep_unpaid chg_off;
ways max_n*cr_scor*bad_debt_at_conn*dep_unpaid*chg_off;
var max_n;
output out=want n=count;
run;
I think this will do it, but I'm not quite sure why there would be missing values for my 'N', which is just _freq_ renamed.
proc means data=npso noprint;
by max_n cr_scor bad_debt_at_conn dep_unpaid chg_off;
output out=want (rename=(_freq_=N)
keep=max_n cr_scor bad_debt_at_conn dep_unpaid chg_off n);
run;
Now, it seems my big problem is that I am unable to copy and paste anything in this window. Does anyone else have that happen?
Thank you
Greg
The _freq_ would give you what you want...we both forgot the initial where clause from the SQL query, if you put that in you might get what you wanted without renaming.
If you can't copy/paste that means you're using IE, use chrome or firefox instead.
I have the same copy/paste problem. The link shows you how I deal with the problem:
Well, that (what I posted above) can't be right either, since there are 36,925 obs in the want table, and only 27,519 in the source table. I have no option here to use anything other than explorer, but thank you for the tip Linlin.
The where clause needs to be added in:
proc means data=npso noprint;
where not missing(cr_scor)
and is_max_n = 1;
by max_n cr_scor bad_debt_at_conn dep_unpaid chg_off;
output out=want (rename=(_freq_=N)
keep=max_n cr_scor bad_debt_at_conn dep_unpaid chg_off n);
run;
Ok, I get it now. Since I dropped all but the columns I want, I did not see there are multiple rows for each category, one for each statatistic. It seems the only row I want to keep is where _stat_ = 'N'. The missing values were the STD row, which makes total sense now.
Better alternative is to use PROC SUMMARY as it does NOT calculate statistics at all, unless requested to do so. And of course, omitting calculation saves time (eventually you loose your coffee break )
Proc Summary NoPrint Nway Missing;
By Max_N Cr_Scor Bad_Dept_at_Conn Dep_Unpaid Chg_Off;
Output Out = Whichever_Name_You_Want (Drop = _type_);
Run;
This writes exactly 1 row for each combination of BY-variables.
BTW: You may need a PROC SORT before you call PROC SUMMARY if the data is not in proper order, or you may replace the BY statement with an identical CLASS statement. It depends on your input wether the SORT+BY or the CLASS version is most efficient.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.