BookmarkSubscribeRSS Feed
marinalev
Fluorite | Level 6

Hi everyone! I have quite a bit of data (varying amounts per day) and I only want one data point (highest) per day. Sometimes, there's no data for that date.

 

Have: (note, I'm only showing one participant, UU0500, but there will be many)

UUIDinjury_datedate_collectedheart_rate
UU050010/01/2010/01/20100
UU050010/01/2010/01/20104
UU050010/01/2010/02/2098
UU050010/01/2010/02/2099
UU050010/01/2010/02/20110
UU050010/01/2010/04/2089
UU050010/01/2010/04/20105
UU050010/01/2010/04/20103
UU050010/01/2010/04/20 100

 

Want: I want the highest value per date, for the same UUID. The caveat is, I need all the dates to be a function of "days from injury_date". I don't care about the date itself.

UUIDheart_rate_d0heart_rate_d1heart_rate_d2heart_rate_d3
UU0500104110. 105

 

What I'm thinking is there needs to be a new variable created that counts the number of days between the date of injury and the date of the data collected. Then there needs to be some serious data manipulation... I have no formal training with this and would really appreciate any of your help!

5 REPLIES 5
ballardw
Super User

Are your dates actually date values? Numeric with a format like MMDDYY6. applied?

 

If so the first thing would be in a data step add a variable with the number of days between the injury_date and date_collected.

Then include the number of days variable in by group processing with proc means/summary to get the maximum value of the measurement by UUID days.

 

 

Do you want a Report, that people read, or an actual data set? If you want a data set that structure is likely to be harder to work with.

 

I would also say that you may want to consider having the  injury_date in the output just in case the same UUID has more than one injury history.

marinalev
Fluorite | Level 6

I can apply that format but in my excel, the date format is how I wrote it in the post. This is to create a data set not a report. The participants in the data only have one injury date! We have a lot of data formatted the way want is formatted so that is the ideal format in the end. Thanks for you input!!

Kurt_Bremser
Super User

For a dataset to work with, the wide format is (almost) always BAD. The wide layout is only good for human consumption, or multivariate regressions and the like.

So you should seriously reconsider storing your data like this. We have several questions per day here on the communities where we have to re-transpose data so that we can use SAS procedures and/or uncomplicated data step/SQL code.

marinalev
Fluorite | Level 6

Hi thank you for the resources! I will check them out! We mainly work with multivariate regression analysis.

ballardw
Super User

@marinalev wrote:

I can apply that format but in my excel, the date format is how I wrote it in the post. This is to create a data set not a report. The participants in the data only have one injury date! We have a lot of data formatted the way want is formatted so that is the ideal format in the end. Thanks for you input!!


The question wasn't about the format. The question was about the value in SAS. Do you have a SAS date set?

If the dates are not SAS numeric value, typically with some sort of SAS format applied so people understand them then you need to do extra work to get things like the number of days from injury.

 

So do you even have a SAS data set yet? We really don't care about the Excel much except when it is time to read or write data from/to another file format.

 

And the reason I do not think that data structure is "ideal" because almost every bit of code needs to be reviewed when you have a new largest number of days from injury.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 694 views
  • 0 likes
  • 3 in conversation