Hi SAS gurus,
I am having a problem using proc freq out= to generate a new dataset. I have 5 variables in the original dataset (Dummy) and I am using sas proc freq procedure to generate some frequency counts and generate a new dataset, but also want to have all 5 variables from the original dataset 'Dummy' into this new dataset egenrated by using proc freq out option.
ex. my original dataset has
Name, Gender, County, Race, Address
after using proc freq data=dummy; tables gender/out=test; run;
I will have only Gender (male/Female, Freq, Percent, CumFreq, CumPercent) in 'test' dataset. But I want Gender, Race, Address, County and Name along with Freq and Percent from the test dataset. So the new dataset 'test' should have 5+2 variables in it.
Can anyone help me with this?
Thank you so much,
Malathi
Please show us the proc freq code that you are using. Without seeing it, I suspect you can solve the problem by specifying a five-way TABLES statement.
Just tell PROC FREQ that you want to use all of the variables.
tables Name*Gender*County*Race*Address / out=test;
Are you sure you won't end up with count of 1 for every combination?
The OUT= option on a table statement will only do one table request. If you have multiple variables you would need one table statement per variable.
OR
Use other options that will create data sets that have different structures:
proc freq data=sashelp.class; ods output onewayfreqs= work.freqs; table sex age ; run; proc tabulate data=sashelp.class out=work.tabulate; class sex age; table sex age, n pctn; run;
I just used two variables from a data set you should have so that you could examine the output for the data set structures. In most cases the Proc Tabulate approach may need the option /missing on the class statement as any record with a missing class variable will be excluded. In which case the percentages will be different that Proc Freq will produce for one-way tables.
Looks like you just want to remerge your frequencies back with the original data by gender. There're gobs of ways to do it after proc FREQ is done. Here's one:
data have ;
input gender $ name $ age ;
cards ;
M Alfred 14
F Barbara 13
F Carol 14
F Jane 12
F Janet 15
M Jeffrey 13
M John 12
F Judy 14
F Mary 15
M Robert 12
;
run ;
proc freq data = have noprint ;
tables gender / out = freq (index=(gender)) outcum ;
run ;
data want ;
set have ;
set freq key = gender / unique ;
run ;
Kind regards
Paul D.
SQL?
proc sql;
create table want as
select
a.*,
count(sex) as count_sex,
calculated count_sex / (select count(*) from sashelp.class) as percent_sex format=percent8.2
from sashelp.class a
group by sex
order by name;
quit;
Maybe more performant, but needs an extra step:
proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = 'SASHELP' and memname = 'CLASS';
create table want as
select
a.*,
count(sex) as count_sex,
calculated count_sex / &nobs as percent_sex format=percent8.2
from sashelp.class a
group by sex
order by name;
quit;
Both are fine, and the extra step for extra performance is more than well justified - after all, the performance penalty of getting stuff from the dictionary tables is next to nil.
The only problem is, due to the nature of the SQL's "no concept of row sequence", it can't give you the cumulatives. If I were tasked with doing the whole enchilada in a single step, I'd most likely resort to a hash. For example:
data have ;
input gender $ name $ age ;
cards ;
M Alfred 14
F Barbara 13
F Carol 14
F Jane 12
F Janet 15
M Jeffrey 13
M John 12
F Judy 14
F Mary 15
M Robert 12
;
run ;
data want ;
dcl hash f () ;
f.definekey ("gender") ;
f.definedata ("gender", "count") ;
f.definedone () ;
dcl hiter fi ("f") ;
dcl hash q () ;
q.definekey ("gender") ;
q.definedata ("count", "cum_freq", "cum_pct") ;
q.definedone () ;
do until (z1) ;
set have end = z1 nobs = n ;
if f.find() ne 0 then count = 1 ;
else count + 1 ;
f.replace() ;
end ;
do while (fi.next() = 0) ;
pct = 100 * divide (count, n) ;
cum_freq + count ;
cum_pct + pct ;
q.add() ;
end ;
do until (z2) ;
set have end = z2 ;
q.find() ;
output ;
end ;
run ;
However, I have to admit that though it takes a hash man but a few minutes to code it, in the real world I'd rather stick to FREQ and backward merge, especially since in the case of gender, the FREQ output data set has only 2 records, anyway. Plus, FREQ gives me everything I need basically just by aping rather than programming, so why bother? Not to mention those going on to cannibalize on my code who aren't exactly hash men.
Kind regards
Paul D.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.