I am trying to create a table from a VERY large dataset. I have several variables that need to be included (city, lat, long, year, month) and some I want to calculate. I have daily data with an avg, min and max for each day
ie:
Name Date Lat Long Month Year Max Min Average
Shaoguan 01/01/2000 24.667 113.6 Jan 2000 59 21 45
Shaoguan 01/02/2000 24.667 113.6 Jan 2000 63 20 49
I need to make a table that sorts by name (and keeps the corresponding lat & long), then by year, then by month, and that does away with the daily figures and just shows the average of max, min, and average for each month. I get this information via proc means, but the table format is not what I need. I want to add the average of each of the min, max and avg variables as new columns, rather than the way they display as rows in proc means.
I need a final result like this:
Name Lat Long Month Year AvgMax AvgMin AvgAverage
Shaoguan 24.667 113.6 Jan 2000 59 21 45
Shaoguan 24.667 113.6 Feb 2000 54 18 37
@lcsitl wrote:
I am trying to create a table from a VERY large dataset. I have several variables that need to be included (city, lat, long, year, month) and some I want to calculate. I have daily data with an avg, min and max for each day
ie:Name Date Lat Long Month Year Max Min Average
Shaoguan 01/01/2000 24.667 113.6 Jan 2000 59 21 45Shaoguan 01/02/2000 24.667 113.6 Jan 2000 63 20 49
I need to make a table that sorts by name (and keeps the corresponding lat & long), then by year, then by month, and that does away with the daily figures and just shows the average of max, min, and average for each month. I get this information via proc means, but the table format is not what I need. I want to add the average of each of the min, max and avg variables as new columns, rather than the way they display as rows in proc means.
I need a final result like this:
Name Lat Long Month Year AvgMax AvgMin AvgAverage
Shaoguan 24.667 113.6 Jan 2000 59 21 45Shaoguan 24.667 113.6 Feb 2000 54 18 37
Naming your variables max and min makes for ugly variable names in combinations.
If you showed the proc means code you used it likely only needs a small change. I would start here:
proc summary data=have nway; class name lat long month year; var max min average; output out=want mean= / autoname; run;
The output set has two variables _freq_ and _type_ that you might drop. The _freq_ is how many records were used for calculating stats on that record and _type_ indicates combinations of class variables. The NWAY on the Proc statement says to only have the output for combinations containing values for all of the class variables.
My code is basically that:
proc means data=Daily nonobs ;
class Name Latitude Longitude Elevation Year Month;
var TAvg TMax TMin;
output out=Monthly mean(TAvg) = MeanAvg mean(TMax)=MeanMax mean(TMin)=MeanMin;
run;
But it gives me this output ^^, and for what I need moving forward, I need this output to have been 3 rows, not 9, with the added columns of TAvg, TMax, and TMin. I can't figure out how to get there. It seems like it should be easier than it is proving.
@lcsitl wrote:
My code is basically that:
proc means data=Daily nonobs ; class Name Latitude Longitude Elevation Year Month; var TAvg TMax TMin; output out=Monthly mean(TAvg) = MeanAvg mean(TMax)=MeanMax mean(TMin)=MeanMin; run;
But it gives me this output ^^, and for what I need moving forward, I need this output to have been 3 rows, not 9, with the added columns of TAvg, TMax, and TMin. I can't figure out how to get there. It seems like it should be easier than it is proving.
Notice that I used PROC SUMMARY. While the syntax is similar the output dataset structure by default is different. Try it and see.
Oh, jeeze, yes, I see. Thank you!
That is so close to helping, except the table output is 128 rows, only the last 3 are what I want. And that's with the sample I'm running with only 63 rows to start (what I will ultimately be processing is about a million rows.. It's like it's spitting out all possible combinations of the class variables. I have tried moving them around or eliminating them, but I still don't get what I need, without losing things I do.
The 3 rows of data in the red square is all I want.
I have been fooling around with proc tabulate as an alternative, but that gives a way more complicated result than I want.
Details. Please look at the code I showed as an example and see if something is different in yours:
proc summary data=have nway; class name lat long month year; var max min average; output out=want mean= / autoname; run;
NWAY says to only take the records with all of the combinations present.
You have apparently dropped the _TYPE_ variable that would have provided a clue as to which records you wanted for further processing.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.