BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GRodiguez
Calcite | Level 5

I'm stumped by this one, and am unsure how to even get started.

 

I have an extremely large dataset which I need to analyze by counting the values for each variable, and have the results written to an output dataset.  I need a detailed count of the number of times each value occurs for each variable. Starting with a dataset constructed like this:

data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

I need to get to an output that looks like this:

 

attribute    value    count
color         Blue         4
color         Red          3
color         Yellow      3
class         A             3
class         B             4
class         C            3
size           S            3
size           M           2
size           L            3
size           XL          2

 

My input table is 300+ columns by millions of rows (unfortunately I cannot get this changed), and somewhat dynamic.  New variables could appear or existing variable names could change.  Also, the allowed values in any given column could change from time to time.  Thus, I really don't want to hard-code the variable names or allowed values.  I tried using proc transpose to at least get the variable names read into my 'attribute' column, but then was unable to figure out how to read the allowed values for each.  I'm looking for a method to loop through each variable, pick up all the values for each, and output the frequency count.  Sounded simple to me at first, but I can't even figure out how to get started. 

 

I'm on SAS 9.4/EG 7.1.

 

Thanks for any ideas you can provide. 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _all_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want(obs=20) label;
run;

Run this and see if it's what you need. If it is, change sashelp.class to your own data set and see if it works. 

 


@GRodiguez wrote:

I'm stumped by this one, and am unsure how to even get started.

 

I have an extremely large dataset which I need to analyze by counting the values for each variable, and have the results written to an output dataset.  I need a detailed count of the number of times each value occurs for each variable. Starting with a dataset constructed like this:

data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

I need to get to an output that looks like this:

 

attribute    value    count
color         Blue         4
color         Red          3
color         Yellow      3
class         A             3
class         B             4
class         C            3
size           S            3
size           M           2
size           L            3
size           XL          2

 

My input table is 300+ columns by millions of rows (unfortunately I cannot get this changed), and somewhat dynamic.  New variables could appear or existing variable names could change.  Also, the allowed values in any given column could change from time to time.  Thus, I really don't want to hard-code the variable names or allowed values.  I tried using proc transpose to at least get the variable names read into my 'attribute' column, but then was unable to figure out how to read the allowed values for each.  I'm looking for a method to loop through each variable, pick up all the values for each, and output the frequency count.  Sounded simple to me at first, but I can't even figure out how to get started. 

 

I'm on SAS 9.4/EG 7.1.

 

Thanks for any ideas you can provide. 


 

View solution in original post

5 REPLIES 5
Astounding
PROC Star

Here's an example that does the type of thing you are asking for:

 

http://support.sas.com/kb/37/780.html

 

Reeza
Super User
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
	table _all_;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
	variable_value='Variable Value';
run;

*Display;
proc print data=want(obs=20) label;
run;

Run this and see if it's what you need. If it is, change sashelp.class to your own data set and see if it works. 

 


@GRodiguez wrote:

I'm stumped by this one, and am unsure how to even get started.

 

I have an extremely large dataset which I need to analyze by counting the values for each variable, and have the results written to an output dataset.  I need a detailed count of the number of times each value occurs for each variable. Starting with a dataset constructed like this:

data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

I need to get to an output that looks like this:

 

attribute    value    count
color         Blue         4
color         Red          3
color         Yellow      3
class         A             3
class         B             4
class         C            3
size           S            3
size           M           2
size           L            3
size           XL          2

 

My input table is 300+ columns by millions of rows (unfortunately I cannot get this changed), and somewhat dynamic.  New variables could appear or existing variable names could change.  Also, the allowed values in any given column could change from time to time.  Thus, I really don't want to hard-code the variable names or allowed values.  I tried using proc transpose to at least get the variable names read into my 'attribute' column, but then was unable to figure out how to read the allowed values for each.  I'm looking for a method to loop through each variable, pick up all the values for each, and output the frequency count.  Sounded simple to me at first, but I can't even figure out how to get started. 

 

I'm on SAS 9.4/EG 7.1.

 

Thanks for any ideas you can provide. 


 

novinosrin
Tourmaline | Level 20
data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;

proc transpose data=have out=temp;
var color class size;
run;

proc transpose data=temp out=temp2;
by _name_ notsorted;
var col:;
run;

proc freq data=temp2 noprint;
by _name_ notsorted;
tables col1/out=want(drop=PERCENT);
run;
novinosrin
Tourmaline | Level 20
data have;
	input id $ color $ class $ size $;
	datalines;
	A1 Blue A XL
	B2 Red B XL
	C3 Blue A L
	D4 Yellow C M
	E5 Red C S
	F6 Blue B L
	G7 Yellow B M
	H8 Blue C S
	I9 Red B S
	J10 Yellow A L
	;
run;


data _null_;
if _n_=1 then do;
 dcl hash H (ordered:'a') ;
   h.definekey  ("attribute",'value') ;
   h.definedata ("attribute","value", "count") ;
   h.definedone () ;
end;
set have end=l;
array t(*) color--size;
do _n_=1 to dim(t);
attribute=vname(t(_n_));
value=t(_n_);
if h.find() ne 0 then do;count=1;h.replace();end;
else do;count=count+1;h.replace();end;
end;
if l then h.output(dataset:'want');
run;
GRodiguez
Calcite | Level 5

Thanks to all for the suggestions!  I tried to give them all a fair shot at my real data.  All the suggestions got me to my desired output, but the solution I marked as accepted ran very quickly.  I liked the double-transpose solution as well, but it took well over an hour to run on my real data (as opposed to 3 minutes for the accepted solution).  

 

Thanks again.   

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 953 views
  • 1 like
  • 4 in conversation