HELP! Trouble sorting complex data

Reply
Occasional Contributor
Posts: 6

HELP! Trouble sorting complex data

Hello,

 

I am working on my first big SAS project as part of my Capstone and I am completely stuck on how to get this complex data sorted. The data is daily readings of precipitation and temperature (min, mean, max) by day from over 90 weather stations (stationID). I have assigned each date into a CDC week (week 1,2,3, etc), but this data spans 12 years so there are 12 "Week 18's" for example. 

 

I need to know how to generate a new tab for average precipitation, and then tell SAS to give me the average precipitation for each station by week by year (ie, station akr02 for each week in 2004, each week in 2005, etc).

 

I have gone through my whole Little SAS book and numerous online tutorials and cannot figure out how to do this. Any help would be GREATLY appreciated!

 

I added a photo of the current data set I have (include the empty columns where the averages by week will go (I manually filled in the first row of those columns as an example).

 

 

Mary (A very stressed and overwhelmed graduate studentavg week.PNG

Super Contributor
Posts: 299

Re: HELP! Trouble sorting complex data

Posted in reply to MLKohrman0321

It will be good to give data in a format people can experiment to give a suitable solution. Give an example data set in the form of "DATALINES" with year, week, stationID, precipitation and temperature. Also provide the expected output derived from the example data set. 

It looks that your problem can be solved without even sorting the data. For instance, year and week-number is enough to summarize what seems from your description.

Frequent Contributor
Posts: 120

Re: HELP! Trouble sorting complex data

Posted in reply to MLKohrman0321

Start by adding a year column so you can sort appropriately. Either that or append a year to your week number: ex. 201708.

 

You don't have a data step or any other code here so hard to help.

 

Based upon what little info is here, I would suggest doing a sort with the year and week, then do a proc summary with a by statement. Then output the results.

Super User
Posts: 20,203

Re: HELP! Trouble sorting complex data

Note, I've combined your two posts into one as they're identical.
Occasional Contributor
Posts: 6

Re: HELP! Trouble sorting complex data

Do you know the steps to append a year to week number: ex. 201708? I have almost 200,000 rows of data so I can't do it manually. This is one of the main hindrances to me getting my data sorted by one variable.

Thanks!

Super User
Posts: 20,203

Re: HELP! Trouble sorting complex data

Posted in reply to MLKohrman0321

There's nothing shown in your data that I see about dates. 


It sort of depends on you start with that dictates how you do it. 

 

Assuming you have a year and week variable you could just concatenate them using CATX. 

 

date_entry = catx("-", year, week);
Super User
Posts: 20,203

Re: HELP! Trouble sorting complex data

Posted in reply to MLKohrman0321

You don't even have to sort. You can include the variables in a CLASS statement instead of BY statement, make sure to add the NWAY option to the PROC MEANS statement then. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic

 

The documentation has further examples. 

http://documentation.sas.com/?docsetId=proc&docsetTarget=n1qnc9bddfvhzqn105kqitnf29cp.htm&docsetVers...

 

If you're having further issues, post back with the code, errors/log and sample data. 

 

Remember that if you want example code you need to provide example data in a format we can work with. I'm not about to spend time typing out your data to work with AND typing out code to solve your assignment. 

 

If you're willing to learn, the first e-course on Statistics in SAS is offered free online. 

Occasional Contributor
Posts: 6

Complex data, need to sorted by stationID/year/week and get averages

[ Edited ]
Posted in reply to MLKohrman0321

Hello,

 

I am working on my first big SAS project as part of my Capstone and I am completely stuck on how to get this complex data sorted. The data is daily readings of precipitation and temperature (min, mean, max) by day from over 90 weather stations (stationID). I have assigned each date into a CDC week (week 1,2,3, etc), but this data spans 12 years so there are 12 "Week 18's" for example. 

 

I need to know how to generate a new tab for average precipitation, and then tell SAS to give me the average precipitation for each station by week by year (ie, station akr02 for each week in 2004, each week in 2005, etc).

 

I have gone through my whole Little SAS book and numerous online tutorials and cannot figure out how to do this. Any help would be GREATLY appreciated!

 

I added a photo of the current data set I have (include the empty columns where the averages by week will go (I manually filled in the first row of those columns as an example).

 

 

Mary (A very stressed and overwhelmed graduate studentavg week.PNG

Occasional Contributor
Posts: 6

Re: HELP! Trouble sorting complex data

Posted in reply to MLKohrman0321

Thank you! You have all been really helpful. I am reading the links you posted and working on my code now. If I have further questions I'll be sure to post it and include my code and data (sorry for not doing that before, this was my first post so I wasn't sure if posting code was ok).

 

Thanks again!!

Contributor
Posts: 29

Re: HELP! Trouble sorting complex data

Posted in reply to MLKohrman0321

here you go ..

 

 

 

proc sql;

create table yo as

select year(date) as year, month(date) as month, week(date,'w') as week, coalesce(snydjcm,0) as amt

from sashelp.citiday

order by year, month, week;

quit;

 

data yo2;

set yo;

by year month week;

retain total_accrued_annual days_counter_annual total_accrued_monthly days_counter_monthly total_accrued_weekly days_counter_weekly;

if (first.year) then do;

days_counter_annual=1;

total_accrued_annual=amt; end;

else do;

days_counter_annual+1;

total_accrued_annual=total_accrued_annual+amt;

end;

if(first.month) then do;

days_counter_monthly=1;

total_accrued_monthly=amt; end;

else do;

days_counter_monthly+1;

total_accrued_monthly=total_accrued_monthly+amt;

end;

if(first.week) then do;

days_counter_weekly=1;

total_accrued_weekly=amt; end;

else do;

days_counter_weekly+1;

total_accrued_weekly=total_accrued_weekly+amt;

end;

YTD_AVG=total_accrued_annual/days_counter_annual;

MTD_AVG=total_accrued_monthly/days_counter_monthly;

WTD_AVG=total_accrued_weekly/days_counter_weekly;

 

run;

 

Super User
Posts: 11,517

Re: HELP! Trouble sorting complex data

Posted in reply to MLKohrman0321

MLKohrman0321 wrote:

Hello,

 

I am working on my first big SAS project as part of my Capstone and I am completely stuck on how to get this complex data sorted. The data is daily readings of precipitation and temperature (min, mean, max) by day from over 90 weather stations (stationID). I have assigned each date into a CDC week (week 1,2,3, etc), but this data spans 12 years so there are 12 "Week 18's" for example. 

 

I need to know how to generate a new tab for average precipitation, and then tell SAS to give me the average precipitation for each station by week by year (ie, station akr02 for each week in 2004, each week in 2005, etc).

 

 

 

 


A large number of complexities involving date type data is trying to use non-date valued variables. SAS date valued variables can display or group data using formats.

An brief example:

data example;
   do datevalue = '01Jan2017'd to '30Apr2017'd;
      y= 10*rand('uniform');
      output;
   end;
run;
proc format library=work;
picture yearwk  (default=7)
low-high ='%Y-%U'    (datatype=date)
;
run;

proc means data=example min max sum;
   class datevalue;
   format datevalue yearwk.;
   var y ;
run;

The %U directive is one of three ways to define week that involves start day and how to consider which year a boundary value may have. Others are %V and %W (case sensitive). If you want a leading 0 in the week number then use %0U, %0V or %0W.

 

Use of %G instead of %Y also affects week assignment for year boundaries.

 

So having a DATE value often simplifies grouping for summaries and display at the same time.

Ask a Question
Discussion stats
  • 10 replies
  • 111 views
  • 1 like
  • 6 in conversation