02-09-2017 04:20 AM
I have a large dataset that features multiple rows per User. Each row has a date that corresponds to the period when data was collected from the user. If the User stops needing to send the data that ends up in this dataset, the date column records 'END' rather than a date. I don't know why, but that's the way it is.
I want to use a computed column to recode the dates: return the date if there is one, and otherwise return the latest date recorded against that User.
I can do the CASE WHEN bit, but I don't know how to pick out the MAX date for each User? I'm sure it's really easy, but I've not done it before and I'm struggling to know what to google for!
Any help would be appreciated!
02-09-2017 05:12 AM
So the "date" variable is of type character and contains either a valid date or the "END" string.
I'd first convert the date variable into a real SAS date, and for "END" I'd set an artificial high value (9999-12-31).
Then sort by user and date.
Then a data step like this:
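data want;
  set have;
  by user;   /* needed for first.user; have must be sorted by user and date */
  retain keep_date;
  /* replace the artificial high date with the previous date for this user */
  if not first.user and date = '31dec9999'd then date = keep_date;
  keep_date = date;
  drop keep_date;
run;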
It is necessary that at least the first entry for a user contains a valid date.
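The conversion and sort steps described above might look something like the following. This is only a sketch: the input dataset name raw, the character variable date_char and the mm/dd/yyyy layout are assumptions, so adjust the informat to whatever the real data uses.

```sas
/* Assumed: RAW holds USER and a character variable DATE_CHAR       */
/* containing either a date such as 01/31/2016 or the string END.   */
data have;
  set raw;
  if upcase(strip(date_char)) = 'END' then date = '31dec9999'd;  /* artificial high value */
  else date = input(date_char, mmddyy10.);                       /* real SAS date */
  format date date9.;
run;

/* Within each user the END row now sorts after all real dates. */
proc sort data=have;
  by user date;
run;
```

The result is a dataset called have with a numeric date, which is what the data step above expects.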
02-09-2017 07:53 AM
Hello, thanks that looks like it should work. I can't follow the SQL well enough to know that it will definitely work, but it looks like a good starting point.
I've actually solved the problem by creating a second Query Builder that takes only the User ID and the MAX of the associated dates, then joining this table back to the original table. I've then used a CASE WHEN statement to substitute these dates where the date is 'END'.
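For anyone who would rather write this as code than use Query Builder, a rough PROC SQL equivalent is sketched below. The names have, user_id and date_char and the mmddyy10. informat are assumptions, not the real column names. I've used user_id rather than user because, as far as I know, USER is a reserved word in PROC SQL.

```sas
proc sql;
  /* Step 1: latest real date per user, ignoring the END rows */
  create table max_dates as
  select user_id,
         max(input(date_char, mmddyy10.)) as max_date format=date9.
  from have
  where upcase(date_char) ne 'END'
  group by user_id;

  /* Step 2: join back and substitute the max date where the value is END */
  create table want as
  select h.*,
         case
           when upcase(h.date_char) = 'END' then m.max_date
           else input(h.date_char, mmddyy10.)
         end as date_recoded format=date9.
  from have as h
       left join max_dates as m
       on h.user_id = m.user_id;
quit;
```

A user whose only row is 'END' has no entry in max_dates, so the left join leaves date_recoded missing for that user.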
02-09-2017 08:30 AM
It's data that comes from a legacy system. I've no idea how it ends up that way, or why, since the final record would still have had a date, which is then overwritten.
Presumably someone somewhere in the past didn't specify the ability to record that a record would be the last one and someone else decided it was more important to know that than the date of the record. In the context of what the data is used for and the age of the system I can sort of understand that, though it's not ideal!
02-09-2017 11:54 AM
I don't know how you are reading this data into SAS, but if you are using a data step you might consider adjusting the process to use a custom informat to handle this.
proc format library=work;
  invalue stoopiddate (upcase)
    'END' = '21DEC9999'd
    other = [mmddyy10.];
run;

data example;
  informat date stoopiddate.;
  input date;
  format date date9.;
datalines;
01/01/2016
02/02/2016
03/03/2016
end
;
run;
This sets the "END" value to a large date. Alternatively, use a special missing value such as .E together with a format that displays .E as "END". I don't know how you will use the resulting value, so either approach may be useful.
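A minimal sketch of the special-missing alternative is below. The informat name enddate and the format name enddisp are made up for illustration, and I haven't run this against the real data, so treat it as a starting point.

```sas
proc format library=work;
  /* read END (any case) as the special missing value .E */
  invalue enddate (upcase)
    'END' = .E
    other = [mmddyy10.];
  /* display .E as END and everything else as a normal date */
  value enddisp
    .E    = 'END'
    other = [date9.];
run;

data example2;
  informat date enddate.;
  format date enddisp.;
  input date;
datalines;
01/01/2016
02/02/2016
end
;
run;
```

One point in favour of this version is that summary functions such as MAX ignore missing values, so the "END" rows do not get in the way when looking for the latest real date per user.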