topic Re: levels average and frequency for 1000 variables by proc sql in SAS Programming

levels average and frequency for 1000 variables by proc sql

mh2t — Thu, 25 Jun 2020 12:22:09 GMT

I have a dataset :

ID var1 var2 ... var1000 Y

1 a b f 1111

2 c m g 222

3 a m g 4444

4 d m h 7777

Then I want an output like:

varName Levels Freq Avg_Y

var1 a 2 xx

var1 c 1 yy

var1 d 1 ww

var2 m 3 zz

var2 b 1 qq

The dataset is huge with lots of levels, so PROC FREQ fails to run. I'm wondering if PROC SQL can help me run the code.

Any help would be greatly appreciated!

Re: levels average and frequency for 1000 variables by proc sql

PaigeMiller — Thu, 25 Jun 2020 12:28:58 GMT

PROC FREQ will do this for the levels. (Although I'm skeptical what value there is in creating this table for 1000 variables...)

PROC MEANS/PROC SUMMARY will compute the mean for Y. (Although I'm skeptical what value there is in creating this table for 1000 variables...)

You can merge the tables together if you really need the output in the exact format shown.

Re: levels average and frequency for 1000 variables by proc sql

Kurt_Bremser — Thu, 25 Jun 2020 12:55:43 GMT

How do you calculate the value for the average of Y?

Anyway, this is the method for calculating the frequencies:

data have;
input id (var1 var2) ($);
datalines;
1        a       b
2        c       m
3        a       m
4        d       m
;

proc transpose data=have out=long name=varname;
by id;
var var:;
run;

proc freq data=long noprint;
tables varname*col1 / nopercent out=want (drop=percent);
run;

Re: levels average and frequency for 1000 variables by proc sql

mh2t — Thu, 25 Jun 2020 13:02:22 GMT

Thank you @Kurt_Bremser .

Avg_Y is calcualted within levels. So for Var1, level "a", the Avg_Y = (1111 + 4444)/2

Re: levels average and frequency for 1000 variables by proc sql

Kurt_Bremser — Thu, 25 Jun 2020 13:14:39 GMT

Then this would be my suggestion:

data have;
input id (var1 var2) ($) y;
datalines;
1 a b 1111
2 c m 2222
3 a m 4444
4 d m 7777
;

proc transpose
  data=have
  out=long (rename=(col1=levels))
  name=varname
;
by id;
var var:;
run;

data long2;
merge
  long
  have (keep=id y)
;
by id;
run;

proc summary data=long2 nway;
class varname levels;
output
  out=want (drop=_type_ rename=(_freq_=freq))
  mean(y)=
;
run;

Re: levels average and frequency for 1000 variables by proc sql

AhmedAl_Attar — Thu, 25 Jun 2020 13:36:20 GMT

For Huge data sets, I would avoid Proc Transpose. Using Data Step for fixed data structure can be faster and uses less data passes

/* Transpose the data using Data Step */
data want(keep=varname levels y);
	length varname $7 levels $1;
	array vars (*) $1 var1-var3;
	set have;
	do i=1 to dim(vars);
		varname =vname(vars(i));
		levels =vvalue(vars(i));
		output;
	end;
run;
/* Summarize to produce your desired means */
proc summary data=want nway;
	class varname levels;
	var y;
	output out=want_sums(drop=_type_ Rename=(_freq_=freq)) mean=y;
run;
/* Remove unwanted data */
proc delete data=want; run;

Hope this helps,

Ahmed

Re: levels average and frequency for 1000 variables by proc sql

novinosrin — Thu, 25 Jun 2020 13:38:07 GMT

Hi @mh2t If you are ever interested in some advanced solutions like using Hash for your stated requirement, I encourage you to read Guru Paul D @hashman / @DonH book that has an example of all these kinds.

Data Management Solutions Using SAS^® Hash Table Operations: A Business Intelligence Case Study

By Paul Dorfman and Don Henderson

Anyways, I wanted to have some fun-

data have;
input id (var1 var2) ($) y;
datalines;
1 a b 1111
2 c m 2222
3 a m 4444
4 d m 7777
;
data _null_ ;
 if _n_=1 then do;
  dcl hash H (ordered: "A") ;
  h.definekey  ("varname","levels") ;
  h.definedata ("varname","levels","freq","sum","avg_y") ;
  h.definedone () ;
 end;
 set have end=z;
 array t var1-var2;
 do over t;
  varname=vname(t);
  levels=t;
  if h.find()=0 then do;
   freq=sum(freq,1);
   sum=sum(y,sum);
   avg_y=sum/freq;
  end;
  else do;
   freq=1;
   sum=y;
   avg_y=sum;
  end;
  h.replace();
 end;
 if z;
 h.output(dataset:'want(drop=sum)');
run;

Re: levels average and frequency for 1000 variables by proc sql

Ksharp — Fri, 26 Jun 2020 13:52:32 GMT

data have;
input id (var1 var2) ($) y;
datalines;
1 a b 1111
2 c m 2222
3 a m 4444
4 d m 7777
;
proc transpose data=have(obs=0) out=temp;
var _all_;
run; 
proc delete data=want;run;
data _null_;
 set temp(where=(lowcase(_name_) like 'var%'));
call execute(cat('proc sql;create table x as select "',_name_,'" as vname length=40,',
_name_,' as levels,count(',_name_,') as freq,mean(y) as avg_y from have group by ',_name_,
';quit;proc append base=want data=x force;run;'));
run;

topic Re: levels average and frequency for 1000 variables by proc sql in SAS Programming

levels average and frequency for 1000 variables by proc sql

Re: levels average and frequency for 1000 variables by proc sql

Re: levels average and frequency for 1000 variables by proc sql

Re: levels average and frequency for 1000 variables by proc sql

Re: levels average and frequency for 1000 variables by proc sql

Re: levels average and frequency for 1000 variables by proc sql

Re: levels average and frequency for 1000 variables by proc sql

Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study

Re: levels average and frequency for 1000 variables by proc sql

Data Management Solutions Using SAS^® Hash Table Operations: A Business Intelligence Case Study