Trying to all character fields frequencies details in a SAS dataset.
Below are sample inputs and output details what I am trying to accomplish.
Can someone able to help/suggest on this - how to get this done?.
Input:
Key | Char1 | Char2 | Char3 | Num1 | Char4 | Num2 |
2345 | Yes | MN | expected | 3445.50 |
| 1 |
3455 | No | MN | Out of bucket | 123.5 |
| 0 |
1234 |
| MN | expected | 0 | Done | 1 |
8865 | Yes | PA | expected | 0 | In-progress | 0 |
33345 | Yes | PA | Out of bucket | 808.34 | Validation | 1 |
74521 | Yes | MN | Out of bucket | 0 | In-progress | 1 |
11965 | No | MN | expected | 0 | In-progress | 1 |
Expected Output:
Var | Freq. | Count | Percentage |
Char1 | Yes | 4 | 57.14286 |
Char1 | No | 2 | 28.57143 |
Char1 |
| 1 | 14.28571 |
Char2 | MN | 5 | 71.42857 |
Char2 | PA | 2 | 28.57143 |
Char3 | expected | 4 | 57.14286 |
Char3 | Out of bucket | 3 | 42.85714 |
Char4 | Done | 1 | 14.28571 |
Char4 | In-progress | 3 | 42.85714 |
Char4 | Validation | 1 | 14.28571 |
Char4 |
| 2 | 28.57143 |
Important Note: the original table has more than 150 fields and 2+ billion records. So the code should be more efficiency by performance. Currently I am using PROC content to get char fields name then using PORC FREQ to get frequencies thru do loop.
Not going to meet the efficient requirements but this works quite well.
You can reference all character variables with _character_.
/*This code is an example of how to generate a table with
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required
Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table _character_;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want label;
run;
Certainly PROC FREQ can compute the Freq, Count and Percentage. It shouldn't be hard then to combine all of the Var into a single data set if that's what you want. Since you are already doing that, I don't understand what you are asking.
Not going to meet the efficient requirements but this works quite well.
You can reference all character variables with _character_.
/*This code is an example of how to generate a table with
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required
Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table _character_;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want label;
run;
Thanks Reeza, this is almost match with my expectation. Just it is missing to calculate the null (missing) values. Can you please include that.
For example Total number of record is 7 and Char1 has only 6.
variable | variable_value | Frequency | Percent | CumFrequency | CumPercent |
Char1 | No | 2 | 33.33 | 2 | 33.33 |
Char1 | Yes | 4 | 66.67 | 6 | 100.00 |
Char2 | MN | 5 | 71.43 | 5 | 71.43 |
Char2 | PA | 2 | 28.57 | 7 | 100.00 |
Char3 | Out of bucket | 3 | 42.86 | 3 | 42.86 |
Char3 | expected | 4 | 57.14 | 7 | 100.00 |
Char4 | Done | 1 | 20.00 | 1 | 20.00 |
Char4 | In-progress | 3 | 60.00 | 4 | 80.00 |
Char4 | Validation | 1 | 20.00 | 5 | 100.00 |
@ez123 wrote:
Thanks Reeza, this is almost match with my expectation. Just it is missing to calculate the null (missing) values. Can you please include that.
You should be able to figure that part out. Hint: it's 6 more characters.
Step 1: Become a little familiar with ODS. For example, run a simple proc freq:
ODS Trace On;
proc freq data=have;
tables char1;
run;
This will tell you the names of the output elements that are available. I think the one you want is named OneWayFreqs, but you will need to check that.
Step 2: Run a PROC FREQ on all character variables, capturing the results using ODS. Something along these lines:
proc freq data=have;
tables _character_;
ods output OneWayFreqs = lib.freq_results;
run;
That should give you what you are asking for, with a little bit of reformatting required.
******************EDITED:
Looks like Reeza beat me to the punch, with a little more detail as well.
One more thing ...
You should be able to use a very simple program:
proc tabulate data=have missing;
class _character_;
tables _character_, n*f=comma11. pctn;
run;
data have;
infile datalines expandtabs truncover;
input Key Char1 $ Char2 $ Char3 & $20. Num1 Char4 & $20. Num2;
datalines;
2345 Yes MN expected 3445.5 . 1
3455 No MN Out of bucket 123.5 . 0
1234 . MN expected 0 Done 1
8865 Yes PA expected 0 In-progress 0
33345 Yes PA Out of bucket 808.34 Validation 1
74521 Yes MN Out of bucket 0 In-progress 1
11965 No MN expected 0 In-progress 1
;
data _null_;
if _n_=1 then do;
dcl hash H (ordered: "A",hashexp: 20) ;
h.definekey ("var",'freq') ;
h.definedata ("var","freq", "count",'pct') ;
h.definedone () ;
end;
set have end=last nobs=nobs;
array t(*) _char_;
do _n_=1 to dim(t);
var=vname(t(_n_));
freq=t(_n_);
if h.check() ne 0 then do;
count=1;
pct=count/nobs*100;
h.replace();
end;
else if h.find()=0 then do;
count+1;
pct=count/nobs*100;
h.replace();
end;
end;
if last then h.output(dataset:"want");
run;
data have;
infile datalines expandtabs truncover;
input Key Char1 $ Char2 $ Char3 & $20. Num1 Char4 & $20. Num2;
datalines;
2345 Yes MN expected 3445.5 . 1
3455 No MN Out of bucket 123.5 . 0
1234 . MN expected 0 Done 1
8865 Yes PA expected 0 In-progress 0
33345 Yes PA Out of bucket 808.34 Validation 1
74521 Yes MN Out of bucket 0 In-progress 1
11965 No MN expected 0 In-progress 1
;
proc transpose data=have out=_have ;
by key notsorted;
var _char_;
run;
proc sql;
create table want as
select _name_ as var, col1 as freq,count(*) as count,(select count(*) from have) as cnt,calculated count/calculated cnt as pct
from _have
group by 1,2;
quit;
I would be very tempted to try something along these lines:
proc tabulate data=have; class _character_ /missing; classlev _character_ /style=[just=r]; table _character_, n=count colpctn='%'*f=9.5 ; run;
Though it is going to be a very long table given many variables and levels of variables.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.