BookmarkSubscribeRSS Feed
anjanamohan1611
Calcite | Level 5

Hello, 

I am a new user of SAS. 

I have a variable named Date of Birth. But in SAS the DOB appears as 23822, 22867, 19530, and so on. It is a character variable with $8. I would like to convert this to mmddyy10.

I have tried 

data new;
set have;
dob1 =input(dob, $8.);

format dob $8.;
run;

 

Thanks for the help

15 REPLIES 15
ballardw
Super User

I strongly wonder where you managed to get those values as CHARACTER. They look somewhat like the numeric value of SAS dates.

Try formatting DOB1 as mmddyy10.  and see if you are happy with the results.

 

Or show us the LOG from running that example code. Copy the data step from the log with all the notes or warnings and paste into a code box opened on the forum with the </> icon to preserve formatting.

Reeza
Super User
You only need to apply a format to the date.

format dob mmddyy10.;

Add that to your data step and see what happens.

proc print data=have (obs=10);
var dob;
format dob mmddyy10.;
run;

anjanamohan1611
Calcite | Level 5

This is how DOB looks like

anjanamohan1611_0-1598292755564.png

 

DATA CANCER.corr_DOS1;
SET CANCER.BC;
dob1= input (dob, yymmdd6.);
format dob1 MMDDYY10.;
RUN;

 

I didn't get any output. It is missing

Reeza
Super User
You do not need the INPUT statement, only the FORMAT statement.
Note that I didn't use INPUT and referred to your original variable in my example. Run my example replacing your data set name and see what happens.

proc print data = cancer.bc (obs=10);
var dob;
format dob mmddyy10.;
run;
anjanamohan1611
Calcite | Level 5

Here is the log

 

proc print data=CANCER.bc(obs=10);
var dob;
format dob mmddyy10.;
ERROR: You are trying to use the numeric format MMDDYY with the character variable DOB in data
set CANCER.BC.
run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

Reeza
Super User
Ok, then you do need to input but use 8. as the input format. It doesn't make sense that the numbers would be a character and SAS date though so something is a bit wonky here. What is the source data? How did you read it?

DATA CANCER.corr_DOS1;
SET CANCER.BC;
dob1= input (dob, 8.);
format dob1 MMDDYY10.;
RUN;
anjanamohan1611
Calcite | Level 5

Hello,

This was in excel format and I have imported it into SAS.

The original DOB is 

anjanamohan1611_0-1598298097768.png

However, after using the code the new DOB1 is 

anjanamohan1611_1-1598298151843.png

 

ballardw
Super User

@anjanamohan1611 wrote:

Hello,

This was in excel format and I have imported it into SAS.

The original DOB is 

anjanamohan1611_0-1598298097768.png

However, after using the code the new DOB1 is 

 

 


 So now you should show the code used to read the Excel file, the actual other code that you ran from the LOG along with notes and warnings, and quite possibly a copy of the Excel file.

 

 

Tom
Super User Tom
Super User

So the issue is that your Excel file has text in at least one of the cells in the column with the date values.  So when SAS tried to figure out how to convert the column in a variable (rather than a loose collection of cells like a spreadsheet) it had to make a character variable instead of numeric variable.  When that happens for any actual date values in the column in Excel you get the raw number of days (as Excel stores them) as a character string of digits.

 

So first convert to a number and then correct for the difference in starting date.

183   data test;
184     excelstr='23822';
185     exceldt=input(excelstr,32.);
186     sasdt=exceldt+'30DEC1899'd ;
187     put excelstr= $quote. /  (2*exceldt 2*sasdt) (=comma. =yymmdd10. / );
188   run;

excelstr="23822"
exceldt=23,822 exceldt=2025-03-22
sasdt=1,906 sasdt=1965-03-21
ballardw
Super User

I suspect it may be time to actually show what your data set looks like.

Please run

 

Proc contents data=CANCER.BC;

run;

and show us the result.

 

If you try to input a value like 23822 with a yymmdd6 informat it would result in missing values because 1) you don't have 6 characters and 2) it would try to treat 82 as the month. I bet you have lots of invalid data messages in your LOG. You do read the log don't you? Something like:

NOTE: Mathematical operations could not be performed at the
      following places. The results of the operations have been
      set to missing values.
      Each place is given by:
      (Number of times) at (Line):(Column).
      1 at 8:11

which means that the value can't be treated as a date with the value shown.

 

 

And if the "year" was accepted as two digits the result would likely be 2023, which I suspect is not valid for your data. Example:

data junk;
   x = '230823';
   date = input(x,yymmdd6.);
   put date= mmddyy10.;
   format date mmddyy10.;
run;

08 above is valid for a month value so this "works" but the year is 2023.

 

If the values of the variable are actually character then I think you need to create a numeric version such as you tried before: Dob1 =input(dob,best.); or similar. and then apply a format to the DOB1 variable. If the value of DOB is actually numeric then try just assigning the format.

anjanamohan1611
Calcite | Level 5

proc import datafile="C:\CANCER\MEANADH\BC.xlsx" out=cancer.BC
DBMS= xlsx REPLACE;
sheet="sheet_1";
getnames=yes;
run;

 

This is how the data looks in excel 

 
 
 

 

DOB
3/21/1965
8/9/1962
6/20/1953
2/4/1962
1/8/1965
 
 

 

 
 

 

Reeza
Super User
Is this a one time project or needs to be automated?
If it's one time, go into Excel, make sure the field has an Excel date format applied to it and you should be good to go.
anjanamohan1611
Calcite | Level 5

Yes, it is a one time project. I will try fixing the excel

anjanamohan1611
Calcite | Level 5

And in SAS the properties of this variable is 

anjanamohan1611_0-1598300776942.png

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 5674 views
  • 3 likes
  • 4 in conversation