Solved: Re: summary statistics - Page 2

Ksharp · Posted 12-02-2016 06:15 AM

Opps.

option  bufno=100 bufsize=128k;

chennupriya · Posted 12-02-2016 09:21 PM

Hi Ksharp ,

Sure I will try these options out and see if it increased efficiency

Thanks for all help

Ksharp · Posted 12-03-2016 03:35 AM

OR this maybe give you a little faster.







data have;
infile cards truncover expandtabs;
input customerid	pricing01	pricing02	pricing03;
cards;
12345	200	300	240
12345	200	300	240
123	100	300	626
121	200	300	721
126	150	400	666
;
run;

proc transpose data=have(keep=pricing: obs=0) out=temp;
run;
options bufno=100 bufsize=128k;
data _null_;
 set temp end=last;
 call execute(cat('proc sql; create table x as select "cnt_id" as a length=20,"',
  _name_,'" as b length=20 from have(obs=1) union 
 select put(count(customerid),best. -l),put(',_name_,',best. -l) from have group by ',_name_));

if not last then call execute(cat('union select " "," " from have(obs=1)'));
call execute('order by 1 desc;quit;proc append base=want data=x force;run;');
run;

chennupriya · Posted 12-05-2016 07:08 PM

hi ,

Now the code runs very very fast

Thanks you so much and thanks a lot for all your help and now its running perfect and very fast

Thank you

chennupriya · Posted 11-29-2016 05:59 PM

as we are using group by for a large dataset its taking time can u pls help if there is an efficient way

Thanks for your help

Cynthia_sas · Posted 11-30-2016 09:23 AM

Hi:

It seems to me that PROC TABULATE and transposed data might be better suited and faster. For example, TABULATE can produce this:

no matter how many variables named "pricingxx" you have -- once the data is transposed. If the variables aren't named the same, you might have a little more to declare in the array statement, but as of now, I don't see that a macro is necessary.

Here's the code:

data start;
  infile datalines;
  input customerid 	pricing01 	pricing02 	pricing03;
return;
datalines;
12345 	200 	300 	240
12345 	200 	300 	240
123 	100 	300 	626
121 	200 	300 	721
126 	150 	400 	666
;
run;
 
proc print data=start;
  title 'starting data';
run;
 
data intermediate(keep=customerid pgtype val);
  set start;
  ** this array will be as big as needed for all variables that start with 'pricing';
  ** assumes that the variables are numbered 01, 02, 03, etc in original data;
  array prc (*) pricing:;
  do I = 1 to dim(prc);
        pgtype = catt('Pricing',put(i,z2.));
	val = prc(i);
	output;
  end;
run;
 
proc print data=intermediate;
title 'intermediate to send to tabulate';
run;
 
proc tabulate data=intermediate;
  title 'Count customers for each value';
  class pgtype ;
  class val / order=freq;
  table pgtype=' ',
        val=' ', 
        n='Count' / box=_page_ ;
run;

cynthia

chennupriya · Posted 12-01-2016 02:42 PM

hi Cynthia ,

I have a question what if i have all different varaibles like item ,price ,insurance,name ?

Cynthia_sas · Posted 12-01-2016 06:21 PM

then you would need to list them in the array statement...but item, and name are character aren't they? price and insurance would be numeric? An array can only contain one type of variable (either character or numeric) -- that is different than the original scope of your question.

cynthia

chennupriya · Posted 12-02-2016 09:19 PM

Hi Cynthia,

Yes all my varaibles which i want to have a proc tabulate are all numeric variables and no characters . I will try it out

Thanks for all help

Cynthia_sas · Posted 12-03-2016 08:36 AM

Hi:

If you will need to preserve the name of the original variable, then you will need to use the VNAME function to capture that, as shown below. The PROC TABULATE step would be the same.

cynthia

  
** use fake names for the variables in the dataset;
** instead of numbered variables;   

   
data start;
infile datalines;
input customerid lucy    ricky ethel fred;
return;
datalines;
12345 	200 	300 	240 150
12345 	200 	300 	240 150
123 	100 	300 	626 200
121 	200 	300 	721 200
126 	150 	400 	666 200
;
run;
   
proc print data=start;
  title 'starting data';
run;
       
 ** need to list each variable in the ARRAY statement;              
data intermediate(keep=customerid pgtype val);
  set start;
  ** this array will be as big as needed for all variables that are listed;
  ** array name is just a convenient way to reference a group of variables;
  ** if need to keep the name of the original variable, use the VNAME function;   
  array prc (*) lucy ricky ethel fred;
  do I = 1 to dim(prc);
       pgtype = vname(prc(i));
       val = prc(i);
       output;
  end;
run;
** rest of program the same as previously posted;

Registration is open

SAS Training: Just a Click Away