## How to solve the problem in the date variable?

Solved
Frequent Contributor
Posts: 130

# How to solve the problem in the date variable?

[ Edited ]

When read the attached data, it shows the message:

ERROR: There was a problem with the format so BEST. was used.

as can see, because of the problem, the value in "day" column calculated by day=day(date) function is not correct.

Any way I can solve the problem? Thanks!

Accepted Solutions
Solution
‎11-07-2015 01:08 PM
Posts: 1,248

## Re: How to solve the problem in the date variable?

The values of variable DATE in your dataset are inconsistent: 1060 observations have valid SAS date values, but the remaining 17788 observations have numeric values such as 19900331, i.e. YYYYMMDD values. You can see this by running a PROC FREQ:

proc freq data=date_sample;
format _all_;
tables date;
run;

The permanent format YYMMDDN8., which has been assigned to variable DATE, makes sense only for the 1060 SAS date values.

I would convert all YYYYMMDD date values into SAS date values and then recalculate variables YEAR, QUARTER, MONTH and DAY based on these SAS date values:

data clean;
set date_sample;
if date>10000000 then date=input(put(date, 8.), yymmdd8.);
year=year(date);
quarter=qtr(date);
month=month(date);
day=day(date);
run;

This assumes that YEAR, QUARTER, MONTH and DAY are intended to correspond to the date values. I found 1018 observations in date_sample.sas7bdat where MONTH is not equal to MONTH(DATE). In all of these 1018 cases DATE is a valid SAS date value (and hence MONTH(DATE) could be calculated). A total of 17824 observations have incorrect DAY values: all 17788 obs. with YYYYMMDD dates and 36 obs. with SAS date values. Variables YEAR and QUARTER did not show inconsistencies, but this may not be true in your complete dataset (of which date_sample is just a sample, I guess), hence the recalculation also for these variables.

Given the mixed-up date formats in date_sample, you should make sure that there are no other types of date values present in your complete dataset.

All Replies
Super User
Posts: 5,878

## Re: How to solve the problem in the date variable?

Data never sleeps
Solution
‎11-07-2015 01:08 PM
Posts: 1,248

## Re: How to solve the problem in the date variable?

The values of variable DATE in your dataset are inconsistent: 1060 observations have valid SAS date values, but the remaining 17788 observations have numeric values such as 19900331, i.e. YYYYMMDD values. You can see this by running a PROC FREQ:

proc freq data=date_sample;
format _all_;
tables date;
run;

The permanent format YYMMDDN8., which has been assigned to variable DATE, makes sense only for the 1060 SAS date values.

I would convert all YYYYMMDD date values into SAS date values and then recalculate variables YEAR, QUARTER, MONTH and DAY based on these SAS date values:

data clean;
set date_sample;
if date>10000000 then date=input(put(date, 8.), yymmdd8.);
year=year(date);
quarter=qtr(date);
month=month(date);
day=day(date);
run;

This assumes that YEAR, QUARTER, MONTH and DAY are intended to correspond to the date values. I found 1018 observations in date_sample.sas7bdat where MONTH is not equal to MONTH(DATE). In all of these 1018 cases DATE is a valid SAS date value (and hence MONTH(DATE) could be calculated). A total of 17824 observations have incorrect DAY values: all 17788 obs. with YYYYMMDD dates and 36 obs. with SAS date values. Variables YEAR and QUARTER did not show inconsistencies, but this may not be true in your complete dataset (of which date_sample is just a sample, I guess), hence the recalculation also for these variables.

Given the mixed-up date formats in date_sample, you should make sure that there are no other types of date values present in your complete dataset.

Frequent Contributor
Posts: 130