
proc means class

Frequent Contributor
Posts: 77


Hi,

 

I have the following simple proc means with class. 

 

There are a few classes that should produce a variety of results; however, this code takes only the max class and populates it everywhere. What would cause this to happen?

 

proc means data=tabs_dist_stats mean max n;
	class SORT_ORDER;
	var TabsEff_NOTABS TabsEff_TABS;
run;  

An excerpt of the output is attached.

 

 

Respected Advisor
Posts: 2,659

Re: proc means class

Can you show us a representative sample of your data?

 

I can't see your .pdf file in my browser.

--
Paige Miller
Frequent Contributor
Posts: 77

Re: proc means class

Posted in reply to PaigeMiller

The input data is below.

Super User
Posts: 6,542

Re: proc means class

One possibility:  Does the variable SORT_ORDER have a format permanently associated with it?  PROC CONTENTS will reveal that.

 

If that's the case, the format can group actual values into formatted levels. You can temporarily remove the format by adding this statement to the PROC MEANS:

 

format SORT_ORDER;
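
Put together, the check and the temporary fix might look like the minimal sketch below. It assumes the data set and variable names from the original post and changes nothing else; a FORMAT statement that names a variable with no format removes that variable's format for the duration of the step only.

```sas
/* List the variables and any formats permanently attached to them. */
proc contents data=tabs_dist_stats;
run;

/* Same PROC MEANS as in the question, with the format on SORT_ORDER */
/* removed for this step so that the raw values define the classes.  */
proc means data=tabs_dist_stats mean max n;
   class SORT_ORDER;
   var TabsEff_NOTABS TabsEff_TABS;
   format SORT_ORDER;
run;
```

If the output is unchanged with the format removed, formatted grouping is probably not the cause.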

Frequent Contributor
Posts: 77

Re: proc means class

Posted in reply to Astounding
I inserted the line, but got no change in output.
Super User
Posts: 13,066

Re: proc means class


capam wrote:

Hi,

 

I have the following simple proc means with class. 

 

There are a few classes that should produce a variety of results; however, this code takes only the max class and populates it everywhere. What would cause this to happen?

 


Please describe what "takes only the max class and populates it everywhere" means. Your output example has sort_order, your class variable, with multiple values and so "only the max class" doesn't make sense at all to me.

Frequent Contributor
Posts: 77

Re: proc means class

Thanks for the question. In the output, under the SORT_ORDER column, the numbers go from 1113 on up. Under the Mean column there are duplicate values. For example, 1121:1129 all have 506044, which should apply only to 1129. The Mean value for 1120 is the max Mean, which somehow is also placed on the values for 1121:1126. The same phenomenon applies to Maximum. Each SORT_ORDER value should have its own unique values for Mean/Maximum. The same applies to 1113:1118: the respective Mean/Maximum should be unique for each.

I hope this is clearer now.
Frequent Contributor
Posts: 77

Re: proc means class

That should be 'The Mean value for 1121'. Also, when I wrote 'The same applies to 1113:1118', I meant that 417767 and 315591 are repeated in 1113:1118. Those Mean values are also repeated in 1130:1136.
Super User
Posts: 13,066

Re: proc means class

 

If you run this code, I think you will find that you do not have unique values for the variable TabsEff_NOTABS.

 

proc freq data=tabs_dist_stats;
   tables sort_order * tabsEff_NoTabs/ list;
run;

That will produce a table of each value of sort_order paired with each value of TabsEff_NOTABS as they exist in your data, with one row per combination and a count.

 

 

When I see repeated values for the mean of an analysis variable across levels of a class variable, as you show, it usually means that the exact same pattern (number and values) of the analysis variables is repeated for those levels of the class variable. Note that in your result table the N values also repeat for those groups: classes 1113 through 1118 each have 7 non-missing values for TabsEff_NOTABS and 322 for TabsEff_TABS, and those exact same N values show up again for class values 1130 through 1136. So you have lots of duplicated data in your data set.
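
One way to list the class levels that share identical statistics, instead of reading them off the printed table, is sketched below. The output data set name work.class_stats is a made-up name for illustration; the statistic variable names come from the AUTONAME option. Grouping on computed means relies on exact equality, which should hold when the underlying values really are identical copies.

```sas
/* One row per SORT_ORDER with N, mean and max of each analysis variable. */
/* work.class_stats is a hypothetical name chosen for this sketch.        */
proc means data=tabs_dist_stats noprint nway;
   class SORT_ORDER;
   var TabsEff_NOTABS TabsEff_TABS;
   output out=work.class_stats n= mean= max= / autoname;
run;

/* Show every N/mean/max combination shared by more than one class level. */
proc sql;
   select TabsEff_NOTABS_N,
          TabsEff_NOTABS_Mean,
          TabsEff_NOTABS_Max,
          count(*) as n_classes
   from work.class_stats
   group by TabsEff_NOTABS_N, TabsEff_NOTABS_Mean, TabsEff_NOTABS_Max
   having count(*) > 1;
quit;
```

Any row this query returns is a set of statistics that appears under several SORT_ORDER values, which is the pattern described above.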

Frequent Contributor
Posts: 77

Re: proc means class

Yes. I've somehow corrupted the data. Thanks.
Super User
Posts: 13,066

Re: proc means class


capam wrote:
Yes. I've somehow corrupted the data. Thanks.

Without seeing your code I can only guess, but a common cause of a problem like this is a data step that looks like:

data want;
   set want;
   /* <stuff> */
run;

or especially

data want;
   set want       /* or MERGE */
       dataset2
   ;
run;

 

If a step like this adds records (or does a number of other things), it replaces your original data set with one containing more (or fewer!) records. And if you rerun the code while testing, for example to add additional variables, it can do this multiple times.

 

So check whether any step in your earlier code uses the same data set name for both its input and its output.

And be very careful when using that construct.
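
A safer variant, as a rough sketch only: write the result to a new data set so that rerunning the step cannot pile records onto its own input. The names want, dataset2 and want_v2 are placeholders here, not taken from the poster's actual program.

```sas
/* Read WANT and DATASET2 but write to a different name, so the input */
/* is never overwritten and a rerun gives the same result every time. */
data work.want_v2;
   set work.want
       work.dataset2;
run;

/* Compare row counts of input and output to catch unexpected growth. */
proc sql;
   select memname, nobs
   from dictionary.tables
   where libname = 'WORK'
     and memname in ('WANT', 'DATASET2', 'WANT_V2');
quit;
```

With this layout the output row count should equal the sum of the two inputs no matter how many times the step is run.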
