Hi Everyone,
Please i have a partial date problem for both starting of and ending of adverse event.
the aestdtc = 18-11-202
aeendtc= U-U-2020
How do i impute this partial dates knowing fully well that other dates in the column have length as
aestdtc = 7-5-2020 where 7 corresponds to Day and 5 corresponds to Month.
Warm regards
@chimukah wrote:
Please i have a partial date problem for both starting of and ending of adverse event.
the aestdtc = 18-11-202
aeendtc= U-U-2020
How do i impute this partial dates knowing fully well that other dates in the column have length as
aestdtc = 7-5-2020 where 7 corresponds to Day and 5 corresponds to Month.
Well, this is a bit difficult. You may not be able to correctly determine some dates given partial dates like these. However, there are some things that you can do:
Parse the data. First, you would want to break the date into its Day, Month, and Year components using the SCAN function with '-' as your delimiter.
Correct the data. Second, you would create correction rules such as:
You could make more sophisticated rules, like setting a range on what years are valid or imputing the start year from the end year if the end year is valid, but the above are some ideas to start with.
Check the data. Third, you would want to identify illogical or invalid conditions that remain after your parsing and correcting. The End should not be before the Start, for example. Perhaps you would want to delete records with illogical conditions from the final data or at least set the illogical values to missing.
Below is a sample program that encodes these steps (parse, correct, check). The program probably won't do everything you need, but you can use it as a starting point.
Jim
DATA Corrected_Dates (KEEP=Final_aestdtc Final_aeendtc);
FORMAT Final_aestdtc DATE9.;
FORMAT Final_aeendtc DATE9.;
INFILE Datalines DSD DLM='09'X MISSOVER;
INPUT
aestdtc $
aeendtc $
;
Work_Date = aestdtc;
Type = 'START';
LINK Parse_Date;
Final_aestdtc = SAS_Date;
Work_Date = aeendtc;
Type = 'END';
LINK Parse_Date;
Final_aeendtc = SAS_Date;
IF MISSING(Final_aestdtc) THEN
DO;
PUTLOG "WARNING: Invalid start date on record " _N_= COMMA17.;
END;
IF MISSING(Final_aeendtc) THEN
DO;
PUTLOG "WARNING: Invalid end date on record " _N_= COMMA17.;
END;
IF Final_aeendtc < Final_aestdtc THEN
DO;
PUTLOG "WARNING: End before start on record " _N_= COMMA17.;
CALL MISSING(Final_aestdtc, Final_aeendtc);
END;
******;
RETURN;
******;
**********;
Parse_Date:
**********;
Day = SCAN(Work_Date, 1, '-');
Month = SCAN(Work_Date, 2, '-');
Year = SCAN(Work_Date, 3, '-');
IF Type = 'START' THEN
DO;
IF NOTDIGIT(STRIP(Day)) THEN
Day = '1';
IF NOTDIGIT(STRIP(Month)) THEN
MONTH = '1';
END;
IF Type = 'END' THEN
DO;
IF NOTDIGIT(STRIP(Day)) THEN
Day = '31';
IF NOTDIGIT(STRIP(Month)) THEN
MONTH = '12';
END;
IF Year = '202' THEN
Year = '2020';
SAS_Date = MDY(INPUT(STRIP(Month), 2.), INPUT(STRIP(Day), 2.), INPUT(STRIP(Year), 4.));
******;
RETURN;
******;
DATALINES;
18-11-202 U-U-2020
;
RUN;
@chimukah wrote:
Please i have a partial date problem for both starting of and ending of adverse event.
the aestdtc = 18-11-202
aeendtc= U-U-2020
How do i impute this partial dates knowing fully well that other dates in the column have length as
aestdtc = 7-5-2020 where 7 corresponds to Day and 5 corresponds to Month.
Well, this is a bit difficult. You may not be able to correctly determine some dates given partial dates like these. However, there are some things that you can do:
Parse the data. First, you would want to break the date into its Day, Month, and Year components using the SCAN function with '-' as your delimiter.
Correct the data. Second, you would create correction rules such as:
You could make more sophisticated rules, like setting a range on what years are valid or imputing the start year from the end year if the end year is valid, but the above are some ideas to start with.
Check the data. Third, you would want to identify illogical or invalid conditions that remain after your parsing and correcting. The End should not be before the Start, for example. Perhaps you would want to delete records with illogical conditions from the final data or at least set the illogical values to missing.
Below is a sample program that encodes these steps (parse, correct, check). The program probably won't do everything you need, but you can use it as a starting point.
Jim
DATA Corrected_Dates (KEEP=Final_aestdtc Final_aeendtc);
FORMAT Final_aestdtc DATE9.;
FORMAT Final_aeendtc DATE9.;
INFILE Datalines DSD DLM='09'X MISSOVER;
INPUT
aestdtc $
aeendtc $
;
Work_Date = aestdtc;
Type = 'START';
LINK Parse_Date;
Final_aestdtc = SAS_Date;
Work_Date = aeendtc;
Type = 'END';
LINK Parse_Date;
Final_aeendtc = SAS_Date;
IF MISSING(Final_aestdtc) THEN
DO;
PUTLOG "WARNING: Invalid start date on record " _N_= COMMA17.;
END;
IF MISSING(Final_aeendtc) THEN
DO;
PUTLOG "WARNING: Invalid end date on record " _N_= COMMA17.;
END;
IF Final_aeendtc < Final_aestdtc THEN
DO;
PUTLOG "WARNING: End before start on record " _N_= COMMA17.;
CALL MISSING(Final_aestdtc, Final_aeendtc);
END;
******;
RETURN;
******;
**********;
Parse_Date:
**********;
Day = SCAN(Work_Date, 1, '-');
Month = SCAN(Work_Date, 2, '-');
Year = SCAN(Work_Date, 3, '-');
IF Type = 'START' THEN
DO;
IF NOTDIGIT(STRIP(Day)) THEN
Day = '1';
IF NOTDIGIT(STRIP(Month)) THEN
MONTH = '1';
END;
IF Type = 'END' THEN
DO;
IF NOTDIGIT(STRIP(Day)) THEN
Day = '31';
IF NOTDIGIT(STRIP(Month)) THEN
MONTH = '12';
END;
IF Year = '202' THEN
Year = '2020';
SAS_Date = MDY(INPUT(STRIP(Month), 2.), INPUT(STRIP(Day), 2.), INPUT(STRIP(Year), 4.));
******;
RETURN;
******;
DATALINES;
18-11-202 U-U-2020
;
RUN;
@chimukah wrote:
since the AEENDTC is U, i guess its unknown and it has to be missing.
Yes, that's probably best. As @Patrick points out, it's typically better to set a value to missing than to give a value that will produce invalid or misleading results. If you're not sure, the safest course is to set to missing.
The little program I showed you can be used as a sort of pattern or template for the parse-clean-check process, but you must determine what cleaning rules are appropriate for your situation and your data. My program is meant to illustrate a technique but by no means should you accept my sample rules unless they really fit your data.
Jim
First of all you need to define the rules how to interpret incomplete dates.
Do you know/can you say what dates your incomplete dates actually stand for (for all cases).
IF you can define exhaustive rules then you can write the code for it. What @jimbarbour posted is code to implement such rules - but are the rules used really sufficient for your actual data?
i.e. should aestdtc = 18-01-202 become 18Nov2020 or 18Nov2021? What other information do you have to define a rule which would provide the correct answer? Can there be a rule at all? Could you always know by looking at the data?
...and of course ideally: Instead of trying to fix bad data go back to the the source/data provider and request data of better quality.
Then you need to list and analyze all the string you can't convert to a SAS date value and define the rules for conversion. I'd consider insufficient rules converting incomplete strings to the wrong SAS date values as even worse than having to deal with missing dates. Creating "inappropriate" dates will mask DQ issues and can lead to wrong conclusions.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.