Hi everyone! I have quite a bit of data (varying amounts per day) and I only want one data point (highest) per day. Sometimes, there's no data for that date.
Have: (note, I'm only showing one participant, UU0500, but there will be many)
UUID | injury_date | date_collected | heart_rate |
UU0500 | 10/01/20 | 10/01/20 | 100 |
UU0500 | 10/01/20 | 10/01/20 | 104 |
UU0500 | 10/01/20 | 10/02/20 | 98 |
UU0500 | 10/01/20 | 10/02/20 | 99 |
UU0500 | 10/01/20 | 10/02/20 | 110 |
UU0500 | 10/01/20 | 10/04/20 | 89 |
UU0500 | 10/01/20 | 10/04/20 | 105 |
UU0500 | 10/01/20 | 10/04/20 | 103 |
UU0500 | 10/01/20 | 10/04/20 | 100 |
Want: I want the highest value per date, for the same UUID. The caveat is, I need all the dates to be a function of "days from injury_date". I don't care about the date itself.
UUID | heart_rate_d0 | heart_rate_d1 | heart_rate_d2 | heart_rate_d3 |
UU0500 | 104 | 110 | . | 105 |
What I'm thinking is there needs to be a new variable created that counts the number of days between the date of injury and the date of the data collected. Then there needs to be some serious data manipulation... I have no formal training with this and would really appreciate any of your help!
Are your dates actually date values? Numeric with a format like MMDDYY6. applied?
If so the first thing would be in a data step add a variable with the number of days between the injury_date and date_collected.
Then include the number of days variable in by group processing with proc means/summary to get the maximum value of the measurement by UUID days.
Do you want a Report, that people read, or an actual data set? If you want a data set that structure is likely to be harder to work with.
I would also say that you may want to consider having the injury_date in the output just in case the same UUID has more than one injury history.
I can apply that format but in my excel, the date format is how I wrote it in the post. This is to create a data set not a report. The participants in the data only have one injury date! We have a lot of data formatted the way want is formatted so that is the ideal format in the end. Thanks for you input!!
For a dataset to work with, the wide format is (almost) always BAD. The wide layout is only good for human consumption, or multivariate regressions and the like.
So you should seriously reconsider storing your data like this. We have several questions per day here on the communities where we have to re-transpose data so that we can use SAS procedures and/or uncomplicated data step/SQL code.
Hi thank you for the resources! I will check them out! We mainly work with multivariate regression analysis.
@marinalev wrote:
I can apply that format but in my excel, the date format is how I wrote it in the post. This is to create a data set not a report. The participants in the data only have one injury date! We have a lot of data formatted the way want is formatted so that is the ideal format in the end. Thanks for you input!!
The question wasn't about the format. The question was about the value in SAS. Do you have a SAS date set?
If the dates are not SAS numeric value, typically with some sort of SAS format applied so people understand them then you need to do extra work to get things like the number of days from injury.
So do you even have a SAS data set yet? We really don't care about the Excel much except when it is time to read or write data from/to another file format.
And the reason I do not think that data structure is "ideal" because almost every bit of code needs to be reviewed when you have a new largest number of days from injury.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.