BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ez123
Fluorite | Level 6

Trying to all character fields frequencies details in a SAS dataset.

Below are sample inputs and output details what I am trying to accomplish. 

Can someone able to help/suggest on this - how to get this done?. 

Input:

Key

Char1

Char2

Char3

Num1

Char4

Num2

2345

Yes

MN

expected

3445.50

 

1

3455

No

MN

Out of bucket

123.5

 

0

1234

 

MN

expected

0

Done

1

8865

Yes

PA

expected

0

In-progress

0

33345

Yes

PA

Out of bucket

808.34

Validation

1

74521

Yes

MN

Out of bucket

0

In-progress

1

11965

No

MN

expected

0

In-progress

1

 

Expected Output:

Var

Freq.

Count

Percentage

Char1

Yes

4

57.14286

Char1

No

2

28.57143

Char1

 

1

14.28571

Char2

MN

5

71.42857

Char2

PA

2

28.57143

Char3

expected

4

57.14286

Char3

Out of bucket

3

42.85714

Char4

Done

1

14.28571

Char4

In-progress

3

42.85714

Char4

Validation

1

14.28571

Char4

 

2

28.57143

 

 

Important Note: the original table has more than 150 fields and 2+ billion records. So the code should be more efficiency by performance. Currently I am using PROC content to get char fields name then using  PORC FREQ to get frequencies thru do loop.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Not going to meet the efficient requirements but this works quite well.

 

You can reference all character variables with _character_.

 

/*This code is an example of how to generate a table with 
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required

Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/

*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _character_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want label;
run;

 

 

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

Certainly PROC FREQ can compute the Freq, Count and Percentage. It shouldn't be hard then to combine all of the Var into a single data set if that's what you want. Since you are already doing that, I don't understand what you are asking.

--
Paige Miller
Reeza
Super User

Not going to meet the efficient requirements but this works quite well.

 

You can reference all character variables with _character_.

 

/*This code is an example of how to generate a table with 
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required

Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/

*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _character_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want label;
run;

 

 

PaigeMiller
Diamond | Level 26

@Reeza

ah yes, VVALUEX, I should have thought of that, brilliant

 

@ez123

It's hard to image any method being considered "efficient" with 2 BILLION records 

--
Paige Miller
ez123
Fluorite | Level 6

Thanks Reeza, this is almost match with my expectation. Just it is missing to calculate the null (missing) values. Can you please include that.

 

For example Total number of record is 7 and Char1 has only 6.  

 

variable

variable_value

Frequency

Percent

CumFrequency

CumPercent

Char1

No

2

33.33

2

33.33

Char1

Yes

4

66.67

6

100.00

Char2

MN

5

71.43

5

71.43

Char2

PA

2

28.57

7

100.00

Char3

Out of bucket

3

42.86

3

42.86

Char3

expected

4

57.14

7

100.00

Char4

Done

1

20.00

1

20.00

Char4

In-progress

3

60.00

4

80.00

Char4

Validation

1

20.00

5

100.00

 
Reeza
Super User

@ez123 wrote:

Thanks Reeza, this is almost match with my expectation. Just it is missing to calculate the null (missing) values. Can you please include that.

 

 


You should be able to figure that part out. Hint: it's 6 more characters.

Astounding
PROC Star

Step 1:  Become a little familiar with ODS.  For example, run a simple proc freq:

 

ODS Trace On;

proc freq data=have;

tables char1;

run;

 

This will tell you the names of the output elements that are available.  I think the one you want is named OneWayFreqs, but you will need to check that.

 

Step 2:  Run a PROC FREQ on all character variables, capturing the results using ODS.  Something along these lines:

 

proc freq data=have;

tables _character_;

ods output OneWayFreqs = lib.freq_results;

run;

 

That should give you what you are asking for, with a little bit of reformatting required.

 

 

******************EDITED:

 

Looks like Reeza beat me to the punch, with a little more detail as well.

Astounding
PROC Star

One more thing ...

 

  • If you are after a report rather than a data set, and
  • If the report format is mildly flexible

You should be able to use a very simple program:

 

proc tabulate data=have missing;

   class _character_;

   tables _character_, n*f=comma11.  pctn;

run;

novinosrin
Tourmaline | Level 20
data have;
infile datalines expandtabs truncover;
input Key	Char1 $	Char2 $	Char3 & $20.	Num1	Char4 & $20.	Num2;
datalines;
2345	Yes	MN	expected	3445.5	.	1
3455	No	MN	Out of bucket	123.5	.	0
1234	.	MN	expected	0	Done	1
8865	Yes	PA	expected	0	In-progress	 0
33345	Yes	PA	Out of bucket	808.34	Validation	1
74521	Yes	MN	Out of bucket	0	In-progress	 1
11965	No	MN	expected	0	In-progress	 1
;

data _null_;
if _n_=1 then do;
 dcl hash H (ordered: "A",hashexp: 20) ;
   h.definekey  ("var",'freq') ;
   h.definedata ("var","freq", "count",'pct') ;
   h.definedone () ;
end;
set have end=last nobs=nobs;
array t(*) _char_;
do _n_=1 to dim(t);
var=vname(t(_n_));
freq=t(_n_);
if h.check() ne 0 then do;
count=1;
pct=count/nobs*100;
h.replace();
end;
else if h.find()=0 then do;
count+1;
pct=count/nobs*100;
h.replace();
end;
end;
if last then h.output(dataset:"want");
run;
novinosrin
Tourmaline | Level 20
data have;
infile datalines expandtabs truncover;
input Key	Char1 $	Char2 $	Char3 & $20.	Num1	Char4 & $20.	Num2;
datalines;
2345	Yes	MN	expected	3445.5	.	1
3455	No	MN	Out of bucket	123.5	.	0
1234	.	MN	expected	0	Done	1
8865	Yes	PA	expected	0	In-progress	 0
33345	Yes	PA	Out of bucket	808.34	Validation	1
74521	Yes	MN	Out of bucket	0	In-progress	 1
11965	No	MN	expected	0	In-progress	 1
;

proc transpose data=have out=_have ;
by key notsorted;
var _char_;
run;
proc sql;
create table want as
select _name_ as var, col1 as freq,count(*) as count,(select count(*) from have) as cnt,calculated count/calculated cnt as pct
from _have
group by 1,2;
quit;
ballardw
Super User

I would be very tempted to try something along these lines:

proc tabulate data=have;
   class _character_ /missing;
   classlev _character_ /style=[just=r];
   table _character_,
         n=count colpctn='%'*f=9.5
         ;
run;

Though it is going to be a very long table given many variables and levels of variables.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1249 views
  • 9 likes
  • 6 in conversation