turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- HELP! Trouble sorting complex data

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-15-2017 09:17 PM

Hello,

I am working on my first big SAS project as part of my Capstone and I am completely stuck on how to get this complex data sorted. The data is daily readings of precipitation and temperature (min, mean, max) by day from over 90 weather stations (stationID). I have assigned each date into a CDC week (week 1,2,3, etc), but this data spans 12 years so there are 12 "Week 18's" for example.

I need to know how to generate a new tab for average precipitation, and then tell SAS to give me the average precipitation for each station by week by year (ie, station akr02 for each week in 2004, each week in 2005, etc).

I have gone through my whole Little SAS book and numerous online tutorials and cannot figure out how to do this. Any help would be GREATLY appreciated!

I added a photo of the current data set I have (include the empty columns where the averages by week will go (I manually filled in the first row of those columns as an example).

Mary (A very stressed and overwhelmed graduate student

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-15-2017 10:50 PM

It will be good to give data in a format people can experiment to give a suitable solution. Give an example data set in the form of "DATALINES" with year, week, stationID, precipitation and temperature. Also provide the expected output derived from the example data set.

It looks that your problem can be solved without even sorting the data. For instance, year and week-number is enough to summarize what seems from your description.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-15-2017 11:05 PM

Start by adding a year column so you can sort appropriately. Either that or append a year to your week number: ex. 201708.

You don't have a data step or any other code here so hard to help.

Based upon what little info is here, I would suggest doing a sort with the year and week, then do a proc summary with a by statement. Then output the results.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to AlanC

10-15-2017 11:35 PM

Note, I've combined your two posts into one as they're identical.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to AlanC

10-16-2017 12:11 AM

Do you know the steps to append a year to week number: ex. 201708? I have almost 200,000 rows of data so I can't do it manually. This is one of the main hindrances to me getting my data sorted by one variable.

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-16-2017 12:15 AM

There's nothing shown in your data that I see about dates.

It sort of depends on you start with that dictates how you do it.

Assuming you have a year and week variable you could just concatenate them using CATX.

`date_entry = catx("-", year, week);`

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-15-2017 11:34 PM

You don't even have to sort. You can include the variables in a CLASS statement instead of BY statement, make sure to add the NWAY option to the PROC MEANS statement then.

https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic

The documentation has further examples.

If you're having further issues, post back with the code, errors/log and sample data.

Remember that if you want example code you need to provide example data in a format we can work with. I'm not about to spend time typing out your data to work with AND typing out code to solve your assignment.

If you're willing to learn, the first e-course on Statistics in SAS is offered free online.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-15-2017 09:19 PM - edited 10-15-2017 09:20 PM

Hello,

I am working on my first big SAS project as part of my Capstone and I am completely stuck on how to get this complex data sorted. The data is daily readings of precipitation and temperature (min, mean, max) by day from over 90 weather stations (stationID). I have assigned each date into a CDC week (week 1,2,3, etc), but this data spans 12 years so there are 12 "Week 18's" for example.

I need to know how to generate a new tab for average precipitation, and then tell SAS to give me the average precipitation for each station by week by year (ie, station akr02 for each week in 2004, each week in 2005, etc).

I have gone through my whole Little SAS book and numerous online tutorials and cannot figure out how to do this. Any help would be GREATLY appreciated!

I added a photo of the current data set I have (include the empty columns where the averages by week will go (I manually filled in the first row of those columns as an example).

Mary (A very stressed and overwhelmed graduate student

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-15-2017 11:38 PM

Thank you! You have all been really helpful. I am reading the links you posted and working on my code now. If I have further questions I'll be sure to post it and include my code and data (sorry for not doing that before, this was my first post so I wasn't sure if posting code was ok).

Thanks again!!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-16-2017 12:33 AM

here you go ..

**proc** **sql**;

create table yo as

select year(date) as year, month(date) as month, week(date,'w') as week, coalesce(snydjcm,**0**) as amt

from sashelp.citiday

order by year, month, week;

**quit**;

**data** yo2;

set yo;

by year month week;

retain total_accrued_annual days_counter_annual total_accrued_monthly days_counter_monthly total_accrued_weekly days_counter_weekly;

if (first.year) then do;

days_counter_annual=**1**;

total_accrued_annual=amt; end;

else do;

days_counter_annual+**1**;

total_accrued_annual=total_accrued_annual+amt;

end;

if(first.month) then do;

days_counter_monthly=**1**;

total_accrued_monthly=amt; end;

else do;

days_counter_monthly+**1**;

total_accrued_monthly=total_accrued_monthly+amt;

end;

if(first.week) then do;

days_counter_weekly=**1**;

total_accrued_weekly=amt; end;

else do;

days_counter_weekly+**1**;

total_accrued_weekly=total_accrued_weekly+amt;

end;

YTD_AVG=total_accrued_annual/days_counter_annual;

MTD_AVG=total_accrued_monthly/days_counter_monthly;

WTD_AVG=total_accrued_weekly/days_counter_weekly;

**run**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MLKohrman0321

10-16-2017 10:54 AM

MLKohrman0321 wrote:

Hello,

I am working on my first big SAS project as part of my Capstone and I am completely stuck on how to get this complex data sorted. The data is daily readings of precipitation and temperature (min, mean, max) by day from over 90 weather stations (stationID). I have assigned each date into a CDC week (week 1,2,3, etc), but this data spans 12 years so there are 12 "Week 18's" for example.

I need to know how to generate a new tab for average precipitation, and then tell SAS to give me the average precipitation for each station by week by year (ie, station akr02 for each week in 2004, each week in 2005, etc).

A large number of complexities involving date type data is trying to use non-date valued variables. SAS date valued variables can display or group data using formats.

An brief example:

data example; do datevalue = '01Jan2017'd to '30Apr2017'd; y= 10*rand('uniform'); output; end; run; proc format library=work; picture yearwk (default=7) low-high ='%Y-%U' (datatype=date) ; run; proc means data=example min max sum; class datevalue; format datevalue yearwk.; var y ; run;

The %U directive is one of three ways to define week that involves start day and how to consider which year a boundary value may have. Others are %V and %W (case sensitive). If you want a leading 0 in the week number then use %0U, %0V or %0W.

Use of %G instead of %Y also affects week assignment for year boundaries.

So having a DATE value often simplifies grouping for summaries and display at the same time.