BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mh2t
Obsidian | Level 7

I have a dataset :

 

ID    var1  var2  ...  var1000     Y

1        a       b               f         1111

2        c       m              g        222

3        a       m              g        4444

4        d       m              h        7777

 

Then I want an output like:

 

varName   Levels   Freq   Avg_Y

  var1             a         2          xx

  var1             c         1          yy

  var1             d         1          ww

  var2             m        3          zz

  var2             b         1          qq

 

The dataset is huge with lots of levels, so PROC FREQ fails to run. I'm wondering if PROC SQL can help me run the code.

 

Any help would be greatly appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @mh2t  If you are ever interested in some advanced solutions like using Hash for your stated requirement, I encourage you to read Guru Paul D @hashman  / @DonH   book that has an example of all these kinds.  

Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study 

By Paul Dorfman and Don Henderson

 

Anyways, I wanted to have some fun-

data have;
input id (var1 var2) ($) y;
datalines;
1 a b 1111
2 c m 2222
3 a m 4444
4 d m 7777
;
data _null_ ;
 if _n_=1 then do;
  dcl hash H (ordered: "A") ;
  h.definekey  ("varname","levels") ;
  h.definedata ("varname","levels","freq","sum","avg_y") ;
  h.definedone () ;
 end;
 set have end=z;
 array t var1-var2;
 do over t;
  varname=vname(t);
  levels=t;
  if h.find()=0 then do;
   freq=sum(freq,1);
   sum=sum(y,sum);
   avg_y=sum/freq;
  end;
  else do;
   freq=1;
   sum=y;
   avg_y=sum;
  end;
  h.replace();
 end;
 if z;
 h.output(dataset:'want(drop=sum)');
run;

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

PROC FREQ will do this for the levels. (Although I'm skeptical what value there is in creating this table for 1000 variables...)

 

PROC MEANS/PROC SUMMARY will compute the mean for Y. (Although I'm skeptical what value there is in creating this table for 1000 variables...)

 

You can merge the tables together if you really need the output in the exact format shown.

--
Paige Miller
Kurt_Bremser
Super User

How do you calculate the value for the average of Y?

Anyway, this is the method for calculating the frequencies:

data have;
input id (var1 var2) ($);
datalines;
1        a       b
2        c       m
3        a       m
4        d       m
;

proc transpose data=have out=long name=varname;
by id;
var var:;
run;

proc freq data=long noprint;
tables varname*col1 / nopercent out=want (drop=percent);
run;
mh2t
Obsidian | Level 7

Thank you @Kurt_Bremser .

Avg_Y is calcualted within levels. So for Var1, level "a", the Avg_Y = (1111 + 4444)/2

Kurt_Bremser
Super User

Then this would be my suggestion:

data have;
input id (var1 var2) ($) y;
datalines;
1 a b 1111
2 c m 2222
3 a m 4444
4 d m 7777
;

proc transpose
  data=have
  out=long (rename=(col1=levels))
  name=varname
;
by id;
var var:;
run;

data long2;
merge
  long
  have (keep=id y)
;
by id;
run;

proc summary data=long2 nway;
class varname levels;
output
  out=want (drop=_type_ rename=(_freq_=freq))
  mean(y)=
;
run;
AhmedAl_Attar
Ammonite | Level 13

For Huge data sets, I would avoid Proc Transpose. Using Data Step for fixed data structure can be faster and uses less data passes

/* Transpose the data using Data Step */
data want(keep=varname levels y);
	length varname $7 levels $1;
	array vars (*) $1 var1-var3;
	set have;
	do i=1 to dim(vars);
		varname =vname(vars(i));
		levels =vvalue(vars(i));
		output;
	end;
run;
/* Summarize to produce your desired means */
proc summary data=want nway;
	class varname levels;
	var y;
	output out=want_sums(drop=_type_ Rename=(_freq_=freq)) mean=y;
run;
/* Remove unwanted data */
proc delete data=want; run;

Hope this helps,

Ahmed 

novinosrin
Tourmaline | Level 20

Hi @mh2t  If you are ever interested in some advanced solutions like using Hash for your stated requirement, I encourage you to read Guru Paul D @hashman  / @DonH   book that has an example of all these kinds.  

Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study 

By Paul Dorfman and Don Henderson

 

Anyways, I wanted to have some fun-

data have;
input id (var1 var2) ($) y;
datalines;
1 a b 1111
2 c m 2222
3 a m 4444
4 d m 7777
;
data _null_ ;
 if _n_=1 then do;
  dcl hash H (ordered: "A") ;
  h.definekey  ("varname","levels") ;
  h.definedata ("varname","levels","freq","sum","avg_y") ;
  h.definedone () ;
 end;
 set have end=z;
 array t var1-var2;
 do over t;
  varname=vname(t);
  levels=t;
  if h.find()=0 then do;
   freq=sum(freq,1);
   sum=sum(y,sum);
   avg_y=sum/freq;
  end;
  else do;
   freq=1;
   sum=y;
   avg_y=sum;
  end;
  h.replace();
 end;
 if z;
 h.output(dataset:'want(drop=sum)');
run;
Ksharp
Super User
data have;
input id (var1 var2) ($) y;
datalines;
1 a b 1111
2 c m 2222
3 a m 4444
4 d m 7777
;
proc transpose data=have(obs=0) out=temp;
var _all_;
run; 
proc delete data=want;run;
data _null_;
 set temp(where=(lowcase(_name_) like 'var%'));
call execute(cat('proc sql;create table x as select "',_name_,'" as vname length=40,',
_name_,' as levels,count(',_name_,') as freq,mean(y) as avg_y from have group by ',_name_,
';quit;proc append base=want data=x force;run;'));
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1149 views
  • 3 likes
  • 6 in conversation