Hello,
I am trying to convert a variable that contains multiple format (LASTPAY) to a single/ standardized format (NEWEXPDATE).
Any idea how one would go about that?
Thanks!
CF
Here is one way:
data have; input havedate $10.; cards; 42916 6/23/2017 20170630 2017-06-21 ; data want; set have; format wantdate date9.; wantdate=input(havedate,anydtdte10.); if missing(wantdate) then wantdate=input(havedate,8.)-21916; run;
Art, CEO, AnalystFinder.com
How many variations of the date do you have? I see just two in your example. But you can conditionally convert using the approprirate format based on a pattern match. That is, if it looks like it's already a SAS date value, just input(lastpay, best.). If it's a date format, input(lastpay, mmddyy10.), for example.
Hello!
There is four (the data wasn't assembled by me and it the process of aggregating the dataset, the conversion wasn't done at the source unfortunately):
42916 |
6/23/2017 |
20170630 |
2017-06-21
I've attached a sample.
Let me know!
Thank!
data have;
input lastpay$10.;
newexpdate=input(lastpay,mmddyy10.);
format newexpdate mmddyy10.;
cards;
42917
6/27/2017
;
run;
Are the values you've shown correct, ie the is the correct new variable in that data set?
Can you provide data as text to test?
I've attached a sample, there are at least 4 different formats (that I could find).
Thanks for the help!
Most of your "dates" look like the Excel days from their offset date of 1 Jan 1900 except for a few that were entered as text in a manner that Excel did not convert to its internal date.
One way: Make sure that the entire column in the Excel sheet has the cell format set to a date.
Then export to CSV and import into SAS.
Here is one way:
data have; input havedate $10.; cards; 42916 6/23/2017 20170630 2017-06-21 ; data want; set have; format wantdate date9.; wantdate=input(havedate,anydtdte10.); if missing(wantdate) then wantdate=input(havedate,8.)-21916; run;
Art, CEO, AnalystFinder.com
This code I found much more accurate:
data work.want;
set work.have;
format datenew date9.;
datenew =input(datehave,8.)-21916;
if missing(datenew ) then datenew =input(datehave,anydtdte10.);
run;
There is another complication in your example data:
You are showing a Lastpay value of 42909 that, depending on the row, apparently is supposed to transform to 09/01/2020, 11/22/2019, 05/01/2021. Is that actually the case?
Against my better judgement I opened that Excel file.
I copied the data column to another column to manipulate in Excel. Highlight the new column, go to the data tab, select the Text to columns and convert everything to numeric. Then applied the DATE format to the cells.
This is what i get for the first few rows (tab delimited text with a blank column to help readability)
LASTPAY After Text to column and apply date format. 42887 6/1/2017 6/24/2017 6/24/2017 42887 6/1/2017 42917 7/1/2017 42906 6/20/2017 2017-06-30 6/30/2017 42911 6/25/2017 6/30/2017 6/30/2017 42917 7/1/2017 42916 6/30/2017 42907 6/21/2017 42887 6/1/2017 42916 6/30/2017 42901 6/15/2017 42905 6/19/2017 42908 6/22/2017 42916 6/30/2017 42917 7/1/2017 42922 7/6/2017 42907 6/21/2017 42923 7/7/2017 42915 6/29/2017
I suspect that your process somewhere along the line turned most of your dates into character values after removing the date format for some reason.
@ballardw: You have more courage than I do. I attempted to open the file using Chrome's fairly safe viewer, but it was unable to open the file so I decided not to open it in Excel.
Art, CEO, AnalystFinder.com
Ya I think that's the problem - unfortunately, all I have is a serveral million observations with that glitch in it and I can't really go back to the source and ask them to fix the raw variables (the dataset is an amalgation of multiple sources, all with different formats) - so I was hoping that I could somehow just fix it within SAS.
This question has come up before. Basically you have a column in Excel that has mixed dates and character strings. So when you convert it into a SAS dataset using PROC IMPORT or LIBNAME engine SAS will set the variable type to character. The dates come over as the text version of the integer value that EXCEL uses for dates. You just need to convert them. I would test if the value is an integer and then convert the integer by correcting for the difference in offset dates. And then use the ANYDTDTE informat to convert the other values.
data want ;
set have ;
date = input(chardate,??11.);
if missing(date) then date=input(chardate,anydtdte11.);
else date=date + '01JAN1900'd -2 ;
format date date9.;
run;
The -2 in the conversion formula is because SAS starts counting from zero and Excel starts counting from one and Excel mistakenly thinks that 1900 was a leap year.
Note you could still have trouble if your multiple sources come from locations where people use MDY ordering for dates and also from places where people use DMY ordering for dates. In that case there really is no way to tell which is the right date without some additional knowledge. For example if you know the dates should always be the first of the month then you would know the '1/7/2017' meant 01JUL2017 and not 07JAN2017. Or if you know the source location of the record you could use that. Otherwise there will be no way to convert a value like '10/12/2016' with 100% convidence.
I prefer the order the way I proposed as, otherwise, a date like 20170630 doesn't translate correctly and you end up with all kinds of notes in your log.
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.