How to solve the problem in the date variable?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 130
Accepted Solution

How to solve the problem in the date variable?

[ Edited ]

When read the attached data, it shows the message:

ERROR: There was a problem with the format so BEST. was used.

as can see, because of the problem, the value in "day" column calculated by day=day(date) function is not correct.

Any way I can solve the problem? Thanks!

 

 

 aaa.jpg

 

 

Attachment

Accepted Solutions
Solution
‎11-07-2015 01:08 PM
Trusted Advisor
Posts: 1,114

Re: How to solve the problem in the date variable?

The values of variable DATE in your dataset are inconsistent: 1060 observations have valid SAS date values, but the remaining 17788 observations have numeric values such as 19900331, i.e. YYYYMMDD values. You can see this by running a PROC FREQ:

proc freq data=date_sample;
format _all_;
tables date;
run;

The permanent format YYMMDDN8., which has been assigned to variable DATE, makes sense only for the 1060 SAS date values.

 

I would convert all YYYYMMDD date values into SAS date values and then recalculate variables YEAR, QUARTER, MONTH and DAY based on these SAS date values:

data clean;
set date_sample;
if date>10000000 then date=input(put(date, 8.), yymmdd8.);
year=year(date);
quarter=qtr(date);
month=month(date);
day=day(date);
run;

This assumes that YEAR, QUARTER, MONTH and DAY are intended to correspond to the date values. I found 1018 observations in date_sample.sas7bdat where MONTH is not equal to MONTH(DATE). In all of these 1018 cases DATE is a valid SAS date value (and hence MONTH(DATE) could be calculated). A total of 17824 observations have incorrect DAY values: all 17788 obs. with YYYYMMDD dates and 36 obs. with SAS date values. Variables YEAR and QUARTER did not show inconsistencies, but this may not be true in your complete dataset (of which date_sample is just a sample, I guess), hence the recalculation also for these variables.

 

Given the mixed-up date formats in date_sample, you should make sure that there are no other types of date values present in your complete dataset.

View solution in original post


All Replies
Esteemed Advisor
Posts: 5,198

Re: How to solve the problem in the date variable?

Supplying the full log including the program would help.
Data never sleeps
Solution
‎11-07-2015 01:08 PM
Trusted Advisor
Posts: 1,114

Re: How to solve the problem in the date variable?

The values of variable DATE in your dataset are inconsistent: 1060 observations have valid SAS date values, but the remaining 17788 observations have numeric values such as 19900331, i.e. YYYYMMDD values. You can see this by running a PROC FREQ:

proc freq data=date_sample;
format _all_;
tables date;
run;

The permanent format YYMMDDN8., which has been assigned to variable DATE, makes sense only for the 1060 SAS date values.

 

I would convert all YYYYMMDD date values into SAS date values and then recalculate variables YEAR, QUARTER, MONTH and DAY based on these SAS date values:

data clean;
set date_sample;
if date>10000000 then date=input(put(date, 8.), yymmdd8.);
year=year(date);
quarter=qtr(date);
month=month(date);
day=day(date);
run;

This assumes that YEAR, QUARTER, MONTH and DAY are intended to correspond to the date values. I found 1018 observations in date_sample.sas7bdat where MONTH is not equal to MONTH(DATE). In all of these 1018 cases DATE is a valid SAS date value (and hence MONTH(DATE) could be calculated). A total of 17824 observations have incorrect DAY values: all 17788 obs. with YYYYMMDD dates and 36 obs. with SAS date values. Variables YEAR and QUARTER did not show inconsistencies, but this may not be true in your complete dataset (of which date_sample is just a sample, I guess), hence the recalculation also for these variables.

 

Given the mixed-up date formats in date_sample, you should make sure that there are no other types of date values present in your complete dataset.

Frequent Contributor
Posts: 130

Re: How to solve the problem in the date variable?

Really appreciated! very detail and helpful comments. I didn't notice there is problem in month as well, thank you for point that out. 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 291 views
  • 0 likes
  • 3 in conversation