BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
js5
Pyrite | Level 9 js5
Pyrite | Level 9

Hello,

 

I have the following code:

proc freq data=inputds noprint;
	by notsorted a b c;
	tables x*y*z/out=freq;
	weight count/zeros;
run;

proc means data=inputds completetypes noprint nway;
	by notsorted a b c;
	var count;
	output out=means(drop=_:) sum=count_sum_c;
run;

data final;
	merge freq means;
	by notsorted a b c;
run;

Is it possible to produce the dataset in one single step? having a notsorted means that the merge is not possible, and due to the fact that the input and freq datasets contain over 90 million rows, I would prefer not to have to sort anything unless absolutely unavoidable. Thank you for your tips in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So with example data we can begin to get an idea what you are trying to do.  Looks like you have grouping variables (in this case 3 of them: Make Drivetrain Type) that uniquely identify the groups.   And you have a CLASS variable (in this case: Model) to subdivide the groups.  And an analysis variable (in this case MSRP) that you want to SUM.

 

One obvious savings is to use the output of the first summary (at the BY GROUP* CLASS VARIABLE level) to calculate the BY GROUP sums.

 

In your example that would be changing the step that makes CARS_MEANS to use CARS_FREQ as the input.

proc summary data=cars_freq ;
  by make drivetrain type;
  var count ;
  output out=cars_means(drop=_:) sum=MSRP_sum_carstate;
run;

It just doesn't attach the DOLLAR format to the variable.

 

Here is how you can use one PROC SUMMARY (MEANS and SUMMARY are the same procedure, they just use different defaults for the PRINT/NOPRINT option) and one data step to calculate your COUNT, PERCENT and MSRP_sum_carstate variables.

proc summary data=sashelp.cars chartype ;
  class Make Drivetrain Type Model ;
  types Make*Drivetrain*Type Make*Drivetrain*Type*Model;
  output out=step1 sum(MSRP)=COUNT ;
run;

data want;
  merge step1(where=(not indexc(_type_,'0')))
        step1(where=(indexc(_type_,'0')) drop=Model rename=(COUNT=MSRP_sum_carstate))
  ;
  by Make Drivetrain Type;
  PERCENT=100*(COUNT/MSRP_sum_carstate);
  drop _type_ _freq_ ;
run;

Let's test and see if they make the same values (within 6 decimal places).

proc compare data=want compare=final method=absolute criterion=10E-6;
run;

Results

Variables Summary

Number of Variables in Common: 7.
Number of Variables with Differing Attributes: 2.


Listing of Common Variables with Differing Attributes

Variable  Dataset     Type  Length  Format    Label

COUNT     WORK.WANT   Num        8  DOLLAR8.
          WORK.FINAL  Num        8            Frequency Count
PERCENT   WORK.WANT   Num        8
          WORK.FINAL  Num        8            Percent of Total Frequency


Observation Summary

Observation      Base  Compare

First Obs           1        1
Last  Obs         428      428

Number of Observations in Common: 428.
Total Number of Observations Read from WORK.WANT: 428.
Total Number of Observations Read from WORK.FINAL: 428.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 428.


Values Comparison Summary

Number of Variables Compared with All Observations Equal: 7.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 141.
Maximum Difference: 1.4211E-14.

 

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

Without data or even a description of what you are trying to do it is going to be hard.

 

Why not just let PROC MEANS add up the WEIGHT variable for you (especially since you want to include the zero weight observations)?

proc means data=inputds completetypes noprint nway;
  class a b c;
  output out=means(drop=_:) n=nobs sum(count)=count_sum_c;
run;
js5
Pyrite | Level 9 js5
Pyrite | Level 9

This lacks the frequencies per a*b combination unfortunately

Tom
Super User Tom
Super User

@js5 wrote:

This lacks the frequencies per a*b combination unfortunately


Again, without actual data or an explanation I have no idea what that means.

ballardw
Super User

You might describe what you are wanting as output and do you actually need a data set? If the data set is not used for anything other than reporting it may be that a report procedure is the way to go.

And providing data in the form of a working data step usually helps.

Provide a subset and what you expect as a final result for that subset of data.

 

I am a bit concerned that you may not be getting what you want as output to begin with by using not sorted.

 

IF two data sets have the same number of observations you might try as without a BY then SAS does a line by line merge regardless of values. The output of the two data sets may be in the same order because of the way SAS process notsorted data.

data final;
	merge freq means;
run;

 

 

js5
Pyrite | Level 9 js5
Pyrite | Level 9

I do need a dataset as output. The two datasets do not have the same number of observations unfortunately. Here is a simplified example using sashelp.cars:

proc sort data=sashelp.cars out=cars_sorted;
by Make Drivetrain Type;
run;

proc freq data=cars_sorted noprint;
	by Make Drivetrain Type;
	tables Model/out=cars_freq;
	weight MSRP/zeros;
run;

proc means data=cars_sorted completetypes noprint nway;
	by Make Drivetrain Type;
	var MSRP;
	output out=cars_means(drop=_:) sum=MSRP_sum_carstate;
run;

data final;
	merge cars_freq cars_means;
	by Make Drivetrain Type;
run;

I am in the end interested in frequencies per by group (COUNT) as well as the denominator used to calculate them (MSRP_sum_carstate). Sorting before turned out not to be that resource-intensive after all, but using one step instead of two would still be cleaner.

Tom
Super User Tom
Super User

So with example data we can begin to get an idea what you are trying to do.  Looks like you have grouping variables (in this case 3 of them: Make Drivetrain Type) that uniquely identify the groups.   And you have a CLASS variable (in this case: Model) to subdivide the groups.  And an analysis variable (in this case MSRP) that you want to SUM.

 

One obvious savings is to use the output of the first summary (at the BY GROUP* CLASS VARIABLE level) to calculate the BY GROUP sums.

 

In your example that would be changing the step that makes CARS_MEANS to use CARS_FREQ as the input.

proc summary data=cars_freq ;
  by make drivetrain type;
  var count ;
  output out=cars_means(drop=_:) sum=MSRP_sum_carstate;
run;

It just doesn't attach the DOLLAR format to the variable.

 

Here is how you can use one PROC SUMMARY (MEANS and SUMMARY are the same procedure, they just use different defaults for the PRINT/NOPRINT option) and one data step to calculate your COUNT, PERCENT and MSRP_sum_carstate variables.

proc summary data=sashelp.cars chartype ;
  class Make Drivetrain Type Model ;
  types Make*Drivetrain*Type Make*Drivetrain*Type*Model;
  output out=step1 sum(MSRP)=COUNT ;
run;

data want;
  merge step1(where=(not indexc(_type_,'0')))
        step1(where=(indexc(_type_,'0')) drop=Model rename=(COUNT=MSRP_sum_carstate))
  ;
  by Make Drivetrain Type;
  PERCENT=100*(COUNT/MSRP_sum_carstate);
  drop _type_ _freq_ ;
run;

Let's test and see if they make the same values (within 6 decimal places).

proc compare data=want compare=final method=absolute criterion=10E-6;
run;

Results

Variables Summary

Number of Variables in Common: 7.
Number of Variables with Differing Attributes: 2.


Listing of Common Variables with Differing Attributes

Variable  Dataset     Type  Length  Format    Label

COUNT     WORK.WANT   Num        8  DOLLAR8.
          WORK.FINAL  Num        8            Frequency Count
PERCENT   WORK.WANT   Num        8
          WORK.FINAL  Num        8            Percent of Total Frequency


Observation Summary

Observation      Base  Compare

First Obs           1        1
Last  Obs         428      428

Number of Observations in Common: 428.
Total Number of Observations Read from WORK.WANT: 428.
Total Number of Observations Read from WORK.FINAL: 428.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 428.


Values Comparison Summary

Number of Variables Compared with All Observations Equal: 7.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 141.
Maximum Difference: 1.4211E-14.

 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1665 views
  • 0 likes
  • 3 in conversation