SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Need Ouput dataset with counts for variable values based on input dataset

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Need Ouput dataset with counts for variable values based on input dataset

I'm stumped by this one, and am unsure how to even get started.

 

I have an extremely large dataset which I need to analyze by counting the values for each variable, and have the results written to an output dataset.  I need a detailed count of the number of times each value occurs for each variable. Starting with a dataset constructed like this:

data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

I need to get to an output that looks like this:

 

attribute    value    count
color         Blue         4
color         Red          3
color         Yellow      3
class         A             3
class         B             4
class         C            3
size           S            3
size           M           2
size           L            3
size           XL          2

 

My input table is 300+ columns by millions of rows (unfortunately I cannot get this changed), and somewhat dynamic.  New variables could appear or existing variable names could change.  Also, the allowed values in any given column could change from time to time.  Thus, I really don't want to hard-code the variable names or allowed values.  I tried using proc transpose to at least get the variable names read into my 'attribute' column, but then was unable to figure out how to read the allowed values for each.  I'm looking for a method to loop through each variable, pick up all the values for each, and output the frequency count.  Sounded simple to me at first, but I can't even figure out how to get started. 

 

I'm on SAS 9.4/EG 7.1.

 

Thanks for any ideas you can provide. 


Accepted Solutions
Solution
a week ago
Super User
Posts: 23,296

Re: Need Ouput dataset with counts for variable values based on input dataset

Posted in reply to GRodiguez
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _all_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want(obs=20) label;
run;

Run this and see if it's what you need. If it is, change sashelp.class to your own data set and see if it works. 

 


@GRodiguez wrote:

I'm stumped by this one, and am unsure how to even get started.

 

I have an extremely large dataset which I need to analyze by counting the values for each variable, and have the results written to an output dataset.  I need a detailed count of the number of times each value occurs for each variable. Starting with a dataset constructed like this:

data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

I need to get to an output that looks like this:

 

attribute    value    count
color         Blue         4
color         Red          3
color         Yellow      3
class         A             3
class         B             4
class         C            3
size           S            3
size           M           2
size           L            3
size           XL          2

 

My input table is 300+ columns by millions of rows (unfortunately I cannot get this changed), and somewhat dynamic.  New variables could appear or existing variable names could change.  Also, the allowed values in any given column could change from time to time.  Thus, I really don't want to hard-code the variable names or allowed values.  I tried using proc transpose to at least get the variable names read into my 'attribute' column, but then was unable to figure out how to read the allowed values for each.  I'm looking for a method to loop through each variable, pick up all the values for each, and output the frequency count.  Sounded simple to me at first, but I can't even figure out how to get started. 

 

I'm on SAS 9.4/EG 7.1.

 

Thanks for any ideas you can provide. 


 

View solution in original post


All Replies
Super User
Posts: 6,632

Re: Need Ouput dataset with counts for variable values based on input dataset

Posted in reply to GRodiguez

Here's an example that does the type of thing you are asking for:

 

http://support.sas.com/kb/37/780.html

 

Solution
a week ago
Super User
Posts: 23,296

Re: Need Ouput dataset with counts for variable values based on input dataset

Posted in reply to GRodiguez
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _all_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want(obs=20) label;
run;

Run this and see if it's what you need. If it is, change sashelp.class to your own data set and see if it works. 

 


@GRodiguez wrote:

I'm stumped by this one, and am unsure how to even get started.

 

I have an extremely large dataset which I need to analyze by counting the values for each variable, and have the results written to an output dataset.  I need a detailed count of the number of times each value occurs for each variable. Starting with a dataset constructed like this:

data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

I need to get to an output that looks like this:

 

attribute    value    count
color         Blue         4
color         Red          3
color         Yellow      3
class         A             3
class         B             4
class         C            3
size           S            3
size           M           2
size           L            3
size           XL          2

 

My input table is 300+ columns by millions of rows (unfortunately I cannot get this changed), and somewhat dynamic.  New variables could appear or existing variable names could change.  Also, the allowed values in any given column could change from time to time.  Thus, I really don't want to hard-code the variable names or allowed values.  I tried using proc transpose to at least get the variable names read into my 'attribute' column, but then was unable to figure out how to read the allowed values for each.  I'm looking for a method to loop through each variable, pick up all the values for each, and output the frequency count.  Sounded simple to me at first, but I can't even figure out how to get started. 

 

I'm on SAS 9.4/EG 7.1.

 

Thanks for any ideas you can provide. 


 

PROC Star
Posts: 1,584

Re: Need Ouput dataset with counts for variable values based on input dataset

Posted in reply to GRodiguez
data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

proc transpose data=have out=temp;
var color class size;
run;

proc transpose data=temp out=temp2;
by _name_ notsorted;
var col:;
run;

proc freq data=temp2 noprint;
by _name_ notsorted;
tables col1/out=want(drop=PERCENT);
run;
PROC Star
Posts: 1,584

Re: Need Ouput dataset with counts for variable values based on input dataset

Posted in reply to novinosrin
data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;


data _null_;
if _n_=1 then do;
 dcl hash H (ordered:'a') ;
   h.definekey  ("attribute",'value') ;
   h.definedata ("attribute","value", "count") ;
   h.definedone () ;
end;
set have end=l;
array t(*) color--size;
do _n_=1 to dim(t);
attribute=vname(t(_n_));
value=t(_n_);
if h.find() ne 0 then do;count=1;h.replace();end;
else do;count=count+1;h.replace();end;
end;
if l then h.output(dataset:'want');
run;
New Contributor
Posts: 2

Re: Need Ouput dataset with counts for variable values based on input dataset

Posted in reply to GRodiguez

Thanks to all for the suggestions!  I tried to give them all a fair shot at my real data.  All the suggestions got me to my desired output, but the solution I marked as accepted ran very quickly.  I liked the double-transpose solution as well, but it took well over an hour to run on my real data (as opposed to 3 minutes for the accepted solution).  

 

Thanks again.   

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 176 views
  • 1 like
  • 4 in conversation