Trying to all character fields frequencies details in a SAS dataset.

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Trying to all character fields frequencies details in a SAS dataset.

Trying to all character fields frequencies details in a SAS dataset.

Below are sample inputs and output details what I am trying to accomplish. 

Can someone able to help/suggest on this - how to get this done?. 

Input:

Key

Char1

Char2

Char3

Num1

Char4

Num2

2345

Yes

MN

expected

3445.50

 

1

3455

No

MN

Out of bucket

123.5

 

0

1234

 

MN

expected

0

Done

1

8865

Yes

PA

expected

0

In-progress

0

33345

Yes

PA

Out of bucket

808.34

Validation

1

74521

Yes

MN

Out of bucket

0

In-progress

1

11965

No

MN

expected

0

In-progress

1

 

Expected Output:

Var

Freq.

Count

Percentage

Char1

Yes

4

57.14286

Char1

No

2

28.57143

Char1

 

1

14.28571

Char2

MN

5

71.42857

Char2

PA

2

28.57143

Char3

expected

4

57.14286

Char3

Out of bucket

3

42.85714

Char4

Done

1

14.28571

Char4

In-progress

3

42.85714

Char4

Validation

1

14.28571

Char4

 

2

28.57143

 

 

Important Note: the original table has more than 150 fields and 2+ billion records. So the code should be more efficiency by performance. Currently I am using PROC content to get char fields name then using  PORC FREQ to get frequencies thru do loop.


Accepted Solutions
Solution
‎04-16-2018 01:56 PM
Super User
Posts: 23,663

Re: Trying to all character fields frequencies details in a SAS dataset.

Not going to meet the efficient requirements but this works quite well.

 

You can reference all character variables with _character_.

 

/*This code is an example of how to generate a table with 
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required

Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/

*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _character_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want label;
run;

 

 

View solution in original post


All Replies
Respected Advisor
Posts: 2,981

Re: Trying to all character fields frequencies details in a SAS dataset.

[ Edited ]

Certainly PROC FREQ can compute the Freq, Count and Percentage. It shouldn't be hard then to combine all of the Var into a single data set if that's what you want. Since you are already doing that, I don't understand what you are asking.

--
Paige Miller
Solution
‎04-16-2018 01:56 PM
Super User
Posts: 23,663

Re: Trying to all character fields frequencies details in a SAS dataset.

Not going to meet the efficient requirements but this works quite well.

 

You can reference all character variables with _character_.

 

/*This code is an example of how to generate a table with 
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required

Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/

*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _character_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want label;
run;

 

 

Respected Advisor
Posts: 2,981

Re: Trying to all character fields frequencies details in a SAS dataset.

@Reeza

ah yes, VVALUEX, I should have thought of that, brilliant

 

@ez123

It's hard to image any method being considered "efficient" with 2 BILLION records 

--
Paige Miller
Occasional Contributor
Posts: 7

Re: Trying to all character fields frequencies details in a SAS dataset.

Thanks Reeza, this is almost match with my expectation. Just it is missing to calculate the null (missing) values. Can you please include that.

 

For example Total number of record is 7 and Char1 has only 6.  

 

variable

variable_value

Frequency

Percent

CumFrequency

CumPercent

Char1

No

2

33.33

2

33.33

Char1

Yes

4

66.67

6

100.00

Char2

MN

5

71.43

5

71.43

Char2

PA

2

28.57

7

100.00

Char3

Out of bucket

3

42.86

3

42.86

Char3

expected

4

57.14

7

100.00

Char4

Done

1

20.00

1

20.00

Char4

In-progress

3

60.00

4

80.00

Char4

Validation

1

20.00

5

100.00

 
Super User
Posts: 23,663

Re: Trying to all character fields frequencies details in a SAS dataset.


@ez123 wrote:

Thanks Reeza, this is almost match with my expectation. Just it is missing to calculate the null (missing) values. Can you please include that.

 

 


You should be able to figure that part out. Hint: it's 6 more characters.

Super User
Posts: 6,751

Re: Trying to all character fields frequencies details in a SAS dataset.

[ Edited ]

Step 1:  Become a little familiar with ODS.  For example, run a simple proc freq:

 

ODS Trace On;

proc freq data=have;

tables char1;

run;

 

This will tell you the names of the output elements that are available.  I think the one you want is named OneWayFreqs, but you will need to check that.

 

Step 2:  Run a PROC FREQ on all character variables, capturing the results using ODS.  Something along these lines:

 

proc freq data=have;

tables _character_;

ods output OneWayFreqs = lib.freq_results;

run;

 

That should give you what you are asking for, with a little bit of reformatting required.

 

 

******************EDITED:

 

Looks like Reeza beat me to the punch, with a little more detail as well.

Super User
Posts: 6,751

Re: Trying to all character fields frequencies details in a SAS dataset.

[ Edited ]

One more thing ...

 

  • If you are after a report rather than a data set, and
  • If the report format is mildly flexible

You should be able to use a very simple program:

 

proc tabulate data=have missing;

   class _character_;

   tables _character_, n*f=comma11.  pctn;

run;

PROC Star
Posts: 1,769

Re: Trying to all character fields frequencies details in a SAS dataset.

data have;
infile datalines expandtabs truncover;
input Key	Char1 $	Char2 $	Char3 & $20.	Num1	Char4 & $20.	Num2;
datalines;
2345	Yes	MN	expected	3445.5	.	1
3455	No	MN	Out of bucket	123.5	.	0
1234	.	MN	expected	0	Done	1
8865	Yes	PA	expected	0	In-progress	 0
33345	Yes	PA	Out of bucket	808.34	Validation	1
74521	Yes	MN	Out of bucket	0	In-progress	 1
11965	No	MN	expected	0	In-progress	 1
;

data _null_;
if _n_=1 then do;
 dcl hash H (ordered: "A",hashexp: 20) ;
   h.definekey  ("var",'freq') ;
   h.definedata ("var","freq", "count",'pct') ;
   h.definedone () ;
end;
set have end=last nobs=nobs;
array t(*) _char_;
do _n_=1 to dim(t);
var=vname(t(_n_));
freq=t(_n_);
if h.check() ne 0 then do;
count=1;
pct=count/nobs*100;
h.replace();
end;
else if h.find()=0 then do;
count+1;
pct=count/nobs*100;
h.replace();
end;
end;
if last then h.output(dataset:"want");
run;
PROC Star
Posts: 1,769

Re: Trying to all character fields frequencies details in a SAS dataset.

Posted in reply to novinosrin
data have;
infile datalines expandtabs truncover;
input Key	Char1 $	Char2 $	Char3 & $20.	Num1	Char4 & $20.	Num2;
datalines;
2345	Yes	MN	expected	3445.5	.	1
3455	No	MN	Out of bucket	123.5	.	0
1234	.	MN	expected	0	Done	1
8865	Yes	PA	expected	0	In-progress	 0
33345	Yes	PA	Out of bucket	808.34	Validation	1
74521	Yes	MN	Out of bucket	0	In-progress	 1
11965	No	MN	expected	0	In-progress	 1
;

proc transpose data=have out=_have ;
by key notsorted;
var _char_;
run;
proc sql;
create table want as
select _name_ as var, col1 as freq,count(*) as count,(select count(*) from have) as cnt,calculated count/calculated cnt as pct
from _have
group by 1,2;
quit;
Super User
Posts: 13,498

Re: Trying to all character fields frequencies details in a SAS dataset.

I would be very tempted to try something along these lines:

proc tabulate data=have;
   class _character_ /missing;
   classlev _character_ /style=[just=r];
   table _character_,
         n=count colpctn='%'*f=9.5
         ;
run;

Though it is going to be a very long table given many variables and levels of variables.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 189 views
  • 9 likes
  • 6 in conversation