paulrockliffe
Obsidian | Level 7

I have a large dataset that features multiple rows per User.  Each row has a date that corresponds to the period in which data was collected from the user.  If the User stops needing to send data to this dataset, the date column records 'END' rather than a date.  I don't know why, but that's the way it is.

 

I want to use a computed column to recode the dates: return the date if there is one, and if there isn't, return the latest date for that User.  

 

I can do the CASE WHEN part, but I don't know how to pick out the MAX date for each User.  I'm sure it's really easy, but I've not done it before and I'm struggling to know what to google for!

 

Any help would be appreciated!

 

Thanks

 

Paul.

Vish33
Lapis Lazuli | Level 10

Can you provide sample input data and the output you require?

Kurt_Bremser
Super User

So the "date" variable is of type character and contains either a valid date or the "END" string.

I'd first convert the date variable into a real SAS date, and for "END" I'd set an artificial high value (9999-12-31).

Then sort by user and date.
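A minimal sketch of the conversion and sort, assuming hypothetical names: an input table `raw` (built here from a few made-up rows) with a `user` column and a character `date_char` column holding either a yyyy-mm-dd date or END. The YYMMDD10. informat is also an assumption; swap in whatever informat matches the real column.

```sas
/* Hypothetical sample input: USER plus a character date column */
data raw;
  length user $8 date_char $10;
  input user $ date_char $;
datalines;
A 2016-01-01
A 2016-02-01
A END
B 2016-03-15
B END
;
run;

/* Convert to a real SAS date; 'END' becomes the sentinel 31DEC9999 */
data have;
  set raw;
  if upcase(strip(date_char)) = 'END' then date = '31dec9999'd;
  else date = input(date_char, yymmdd10.);  /* assumes yyyy-mm-dd strings */
  format date yymmdd10.;
  drop date_char;
run;

/* Sort so each user's 'END' row lands after that user's real dates */
proc sort data=have;
  by user date;
run;
```

Because the sentinel is larger than any real date, the END row for each user always sorts last, directly after that user's latest real date.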

Then a data step like this:

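data want;
set have;
by user;  /* required so that first.user is defined */
retain keep_date;
/* an 'END' row (sentinel date) takes the date of the previous row for the same user */
if not first.user and date = '31dec9999'd then date = keep_date;
keep_date = date;
drop keep_date;
run;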

This requires that at least the first entry for each user contains a valid date.

paulrockliffe
Obsidian | Level 7

Hello, thanks, that looks like it should work.  I can't follow the code well enough to be sure it will work, but it looks like a good starting point.

 

I've actually solved the problem by creating a second Query Builder query that takes only the User ID and the MAX of the associated dates, then joining this table back to the original table.  I then used a CASE WHEN statement to substitute these dates where the date is 'END'.
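A rough PROC SQL sketch of that Query Builder approach (per-user MAX, join back, CASE WHEN). The table `have_char`, its made-up rows, and the columns `user_id` and `date_char` (character, yyyy-mm-dd or END) are hypothetical. The `??` modifier makes INPUT return missing for the END rows without log notes, and MAX ignores missing values, so END never wins the MAX.

```sas
/* Hypothetical sample input: character dates with 'END' markers */
data have_char;
  length user_id $8 date_char $10;
  input user_id $ date_char $;
datalines;
A 2016-01-01
A 2016-02-01
A END
B 2016-03-15
B END
;
run;

proc sql;
  create table want as
  select h.user_id,
         h.date_char,
         /* keep a real date; replace 'END' with the user's latest date */
         case
           when upcase(strip(h.date_char)) = 'END' then m.max_date
           else input(h.date_char, ?? yymmdd10.)
         end as filled_date format=yymmdd10.
  from have_char as h
       left join
       /* one row per user: the latest real date ('END' reads as missing) */
       (select user_id,
               max(input(date_char, ?? yymmdd10.)) as max_date
          from have_char
         group by user_id) as m
    on h.user_id = m.user_id;
quit;
```

Unlike the retain-based data step, this does not depend on sort order, at the cost of a second pass over the table for the summary.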

Vish33
Lapis Lazuli | Level 10

I am a little surprised that this date value is 'END'. Is it a character variable in the dataset?

paulrockliffe
Obsidian | Level 7

It's data that comes from a legacy system. I've no idea how it ends up that way, or why, since the final record would still have had a date, which is then overwritten.  

 

Presumably someone somewhere in the past didn't specify the ability to record that a record would be the last one and someone else decided it was more important to know that than the date of the record.  In the context of what the data is used for and the age of the system I can sort of understand that, though it's not ideal!

ballardw
Super User

I don't know how you are reading this data into SAS, but if you are using a data step you might consider adjusting the process to use a custom informat to handle this.

 

proc format library=work;
/* 'END' (any case) becomes a large sentinel date; anything else is read as mm/dd/yyyy */
invalue stoopiddate (upcase)
'END' = '31DEC9999'd
other = [mmddyy10.];
run;

data example;
   informat date stoopiddate.;
   input date;
   format date date9.;
datalines;
01/01/2016
02/02/2016
03/03/2016
end
;
run;

This sets the "END" value to a large date. Alternatively, use a special missing value such as .E and a format that displays .E as "END". I don't know how you plan to use the resulting value, so either approach may be useful.
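A sketch of the special-missing alternative, reusing the mm/dd/yyyy layout from the example above. The informat name `enddate`, the format name `enddisp`, and the table `example_missing` are illustrative, not from the thread. END is read as .E, which summary functions such as MAX skip, and the format prints .E back as END.

```sas
proc format library=work;
  /* read 'END' (any case) as special missing .E, anything else as mm/dd/yyyy */
  invalue enddate (upcase)
    'END' = .E
    other = [mmddyy10.];
  /* display .E as END and real dates as date9. */
  value enddisp
    .E    = 'END'
    other = [date9.];
run;

data example_missing;
  informat date enddate.;
  input date;
  format date enddisp.;
datalines;
01/01/2016
02/02/2016
end
;
run;
```

One trade-off: missing values, including .E, sort below every real date, so after sorting by user and date the END row comes first rather than last. A previous-row fill would then need to test `date = .E` and look ahead instead of back, whereas a per-user MAX joined back to the table works unchanged.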

 

