BookmarkSubscribeRSS Feed
wklierman
Obsidian | Level 7

I have asked variants of this question before.  It deals with having different formats for date of birth.  DOB is then used to calculate age.  I have received useful information on this subject from Tom and Jim.

 

But this is the current issue.  I have two data sets (containing contact tracing data) DS1 has DOB in the following format (I use Proc import to move these data sets from excel xlsx files) - the values in the SAS dataset are like

 

         03/31/1997

         10/25/1993

          05/16/1998

          07/25/2019

          01/20/1977

              . . .           and so on  (79,000 obs but not all have a populated DOB)

the type is numeric.

 

Then in dataset two, DS2, the DOB (after the Proc import) is in SAS date form like

           

           30427

           30607

           39465

           19375

            42183

              . . .        and so on  (80,200 obs but not all have a populated DOB)

 

One question is what is the correct method to follow to get these fields in the same format?

 

Before I can merge these two data sets I have to consistently define the DOB. I have tried it like this

 

          So I have been trying to change DS1 to DS2 format like

 

     Data Opera.Arias_view_1;

       Set Opera.Arias_view_1(rename = (DOB=DOBNUM));

              DOB = input(DOBNUM, $11.);

              drop DOBNUM;

    run;

 

 

 

Unfortunately,  the resulting DOB field has values like the following

 

              1360

              -677

              2175

              -178

              1950

 

Obviously, incorrect.

 

I have two other datasets (case data from covid testing) that have Age fields with data like

 

          19

           34

           77

           29

           44 

           . . .         and so on

 

I want to clean up the DOB fields in the contact tracing datasets and produce Age fields analogous to what is in the case datasets.  Then the cases and the contacts datasets can be merged on an ID field.

 

Thank you for your help.

 

wlierman

 

 

 

3 REPLIES 3
Reeza
Super User
What does 30427 show as in your Excel file?

SAS (and Excel and most software applications) actually store dates as the number of days from a specific date, in SAS that date is January 1, 1960 but Excel uses a different cutoff. Sometimes when reading a file, it will get brought in correctly and sometimes you have to correct it. The first attempt should be to just apply a date format to the variable and see how it displays. Then you can figure out if you need to actually change it in some manner.

This should print the first 10 dob, formatted as a date. Do they match what you have in your Excel file?

proc print data=opera.arias_view_1 (obs=10);
var dob;
format dob date9.;
run;

ballardw
Super User

If the SAS variable is a date then it is a numeric value. That numeric value is used for any purpose for matching. The Format used to display it in different data sets makes no difference.

 

I highly doubt that the values in your DS2 are dates. Or you need to show Proc Contents for that data set.

When I do this:

data junk;
   input x;
   put x= date9. x= mmddyy10. x=yymmdd10.;
datalines;
30427
30607
39465
19375
42183
;

I get results like

x=22APR2043 x=04/22/2043 x=2043-04-22
x=19OCT2043 x=10/19/2043 x=2043-10-19
x=19JAN2068 x=01/19/2068 x=2068-01-19
x=17JAN2013 x=01/17/2013 x=2013-01-17
x=29JUN2075 x=06/29/2075 x=2075-06-29

Which makes me very suspicious that the values except the 17Jan2013 are actually Excel numeric dates and that you need to check the results of Proc Import closely. Or use a different approach to reading the data.

 

One thing with Date values is to always assign a date format to the values so they make sense to people.

The other is to read date values with a proper Informat. Here is an example of reading two different variables from text in different formats and then matching the values even though the variables have different formats.

data junk1;
  input x :mmddyy10.;
  format x mmddyy10.;
datalines;
01202010
02152013
;

data junk2;
  input y yymmdd10.;
  format y yymmdd10.;
datalines;
20100120
20130215
;

proc sql;
   create table example as
   select a.x, b.y
   from junk1 as a
        left join
        junk2 as b
        on a.x=b.y
   ;
quit;

I think you need to read the log a bit closer after running this:

    Data Opera.Arias_view_1;
       Set Opera.Arias_view_1(rename = (DOB=DOBNUM));
       DOB = input(DOBNUM, $11.);
        drop DOBNUM;
    run;

and you should keep dobnum for comparison.

If the DOB is an actual date numeric value then input(DOBNUM, $11.) will throw some messages because INPUT expects character values and the conversion is likely way wrong.

 

I strongly suspect you need to go back to the import step and read the data properly using date informats with a data step and/or seriously fixing one or more Excel files.

Please run Proc Contents on BOTH of those sets a show the results for all of the variables.

 

wklierman
Obsidian | Level 7
Thank you for the research suggestions. I will follow-up and be back in touch.

wklierman

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 458 views
  • 2 likes
  • 3 in conversation