Siroo
Calcite | Level 5

Hi all,

I have a question about the "Challenger" practice in SAS Programming 1, Lesson 5 - Analyzing and Reporting on Data.

The image below shows the solution to the "Challenger" practice of the topic "Creating Summary Reports and Data".

Screenshot 2020-11-29 131656.png

My question:

Why doesn't the PROC MEANS step output the top 3 parks grouped by REGION and YEAR?

The value 188594 is the third-highest number of park visitors in the Alaska region, and it comes from June 2010.

If I wanted to sum the total number of visitors by YEAR and REGION, what would the code look like?

 

Regards,

Siroo

 

Accepted Solution

Tom
Super User

PROC MEANS does not consider the value of MONTH at all, since you never referenced it.

It uses the values of VISITORS and PARKNAME from the three observations with the largest values of VISITORS within each group of observations defined by the combination of REGION and YEAR.

The only reason it looks like it has anything to do with months is that your dataset has a MONTH variable that distinguishes the multiple observations per region and year combination.

If you had daily counts instead of monthly counts (so 365 observations per park per year instead of just 12), then the top 3 would be the top daily counts.
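
As a minimal sketch of this point, the step below runs the same kind of OUTPUT statement on a tiny made-up dataset. The dataset names `demo` and `top3`, the park names, and all visitor counts are hypothetical and not taken from the course data.

```sas
/* Hypothetical data: one region, one year, five observations */
data demo;
  input Region $ Year Month ParkName :$12. Visitors;
  datalines;
Alaska 2010 6 ParkA 188
Alaska 2010 7 ParkA 250
Alaska 2010 8 ParkB 193
Alaska 2010 7 ParkB 120
Alaska 2010 6 ParkC 90
;
run;

/* MONTH is never referenced: groups come from CLASS only,   */
/* and IDGROUP picks rows by the size of VISITORS per group. */
proc means data=demo noprint nway;
  class Region Year;
  var Visitors;
  output out=top3(drop=_type_ _freq_)
         sum=TotalVisitors
         idgroup(max(Visitors) out[3] (Visitors ParkName)=);
run;

proc print data=top3;
run;
```

With these made-up values, the single output row for Alaska in 2010 should carry TotalVisitors=841 and the three largest counts (250, 193, 188) with their park names. Which months those rows came from plays no part in the selection.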

 

If you want to see the month that corresponds to each value of VISITORS that you are outputting, add MONTH to the list of variables to select:

idgroup(max(Visitors) out[3] (Visitors ParkName Month)=)
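
For context, here is a sketch of how that option might sit in the full PROC MEANS step from the question. It assumes the course table pg1.np_multiyr really contains a MONTH variable, as described in this thread.

```sas
/* Same step as in the question, with Month added to the IDGROUP list */
proc means data=pg1.np_multiyr noprint;
    var Visitors;
    class Region Year;
    ways 2;   /* keep only the Region*Year combinations */
    output out=top3parks(drop=_freq_ _type_)
           sum=TotalVisitors
           idgroup(max(Visitors) out[3] (Visitors ParkName Month)=);
run;
```

The output dataset should then also contain Month_1 to Month_3 next to Visitors_1 to Visitors_3 and ParkName_1 to ParkName_3. The sum=TotalVisitors option in the same statement already produces the total number of visitors for each REGION and YEAR combination.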

Tom
Super User

The whole point of IDGROUP is to let you output some of the individual values that go into the aggregate values that the normal options on the OUTPUT statement create. From your description, it sounds like the third-largest number of visitors to the Alaska region during 2010 occurred in June.

Here is an example you can run that does not require the datasets from that course.

The CLASS statement groups the data by SEX, and the two IDGROUP options retrieve some of the detail information for the two tallest and two shortest students in each group.

proc summary data=sashelp.class nway ;
  class sex;
  var height ;
  output out=summary max=max min=min
    idgroup (max(height) out[2] (name height)=tall_name tall_height )
    idgroup (min(height) out[2] (name height)=short_name short_height ) 
  ;
run;
                                tall_   tall_    tall_    tall_  short_ short_  short_   short_
Obs Sex _TYPE_ _FREQ_  max  min name_1 name_2  height_1 height_2 name_1 name_2 height_1 height_2

 1   F     1      9   66.5 51.3 Mary   Barbara   66.5     65.3   Joyce  Louise   51.3     56.3
 2   M     1     10   72.0 57.3 Philip Alfred    72.0     69.0   James  Thomas   57.3     57.5

So you can see that the tallest boy is Philip and the shortest girl is Joyce.  But you can also see that the second shortest boy is Thomas and the second tallest girl is Barbara.
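
If it helps to see what IDGROUP is doing, the "two tallest per group" part can be roughly reproduced by hand with a sort and a DATA step. This is only a sketch for comparison, not how PROC SUMMARY works internally, and the names `sorted`, `tallest2`, and `seq` are made up for illustration.

```sas
/* Order each sex group from tallest to shortest */
proc sort data=sashelp.class out=sorted;
  by sex descending height;
run;

/* Keep the first two observations in each sex group */
data tallest2;
  set sorted;
  by sex;
  if first.sex then seq = 0;  /* restart the counter for each group */
  seq + 1;                    /* sum statement: seq is retained across rows */
  if seq <= 2;                /* subsetting IF keeps the top two rows only */
  keep sex name height seq;
run;

proc print data=tallest2;
run;
```

This gives one row per selected student (Mary and Barbara for F, Philip and Alfred for M) instead of the wide one-row-per-group layout that IDGROUP produces.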

Siroo
Calcite | Level 5

Hi Tom,

Let's assume the dataset has the variables REGION, MONTH, YEAR, PARKNAME and VISITORS.

The code below appears to return the top 3 visitor counts by REGION, YEAR and MONTH.

 

proc means data=pg1.np_multiyr noprint;
    var Visitors;
    class Region Year;
    ways 2;
    output out=top3parks(drop=_freq_ _type_)
           sum=TotalVisitors
           idgroup(max(Visitors) out[3] (Visitors ParkName)=);
run;

 

SAS output from running the code above:

Screenshot 2020-11-30 081033.png

 

According to the raw data below, 193,116 is the visitor count for Alaska in the 8th month of 2010.

Screenshot 2020-11-30 081411.png

 

I am just wondering: since the code only classifies VISITORS by REGION and YEAR, why would MONTH be considered when there is no MONTH variable in the code?

