- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have asked variants of this question before. It deals with having different formats for date of birth. DOB is then used to calculate age. I have received useful information on this subject from Tom and Jim.
But this is the current issue. I have two data sets (containing contact tracing data) DS1 has DOB in the following format (I use Proc import to move these data sets from excel xlsx files) - the values in the SAS dataset are like
03/31/1997
10/25/1993
05/16/1998
07/25/2019
01/20/1977
. . . and so on (79,000 obs but not all have a populated DOB)
the type is numeric.
Then in dataset two, DS2, the DOB (after the Proc import) is in SAS date form like
30427
30607
39465
19375
42183
. . . and so on (80,200 obs but not all have a populated DOB)
One question is what is the correct method to follow to get these fields in the same format?
Before I can merge these two data sets I have to consistently define the DOB. I have tried it like this
So I have been trying to change DS1 to DS2 format like
Data Opera.Arias_view_1;
Set Opera.Arias_view_1(rename = (DOB=DOBNUM));
DOB = input(DOBNUM, $11.);
drop DOBNUM;
run;
Unfortunately, the resulting DOB field has values like the following
1360
-677
2175
-178
1950
Obviously, incorrect.
I have two other datasets (case data from covid testing) that have Age fields with data like
19
34
77
29
44
. . . and so on
I want to clean up the DOB fields in the contact tracing datasets and produce Age fields analogous to what is in the case datasets. Then the cases and the contacts datasets can be merged on an ID field.
Thank you for your help.
wlierman
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
SAS (and Excel and most software applications) actually store dates as the number of days from a specific date, in SAS that date is January 1, 1960 but Excel uses a different cutoff. Sometimes when reading a file, it will get brought in correctly and sometimes you have to correct it. The first attempt should be to just apply a date format to the variable and see how it displays. Then you can figure out if you need to actually change it in some manner.
This should print the first 10 dob, formatted as a date. Do they match what you have in your Excel file?
proc print data=opera.arias_view_1 (obs=10);
var dob;
format dob date9.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If the SAS variable is a date then it is a numeric value. That numeric value is used for any purpose for matching. The Format used to display it in different data sets makes no difference.
I highly doubt that the values in your DS2 are dates. Or you need to show Proc Contents for that data set.
When I do this:
data junk; input x; put x= date9. x= mmddyy10. x=yymmdd10.; datalines; 30427 30607 39465 19375 42183 ;
I get results like
x=22APR2043 x=04/22/2043 x=2043-04-22 x=19OCT2043 x=10/19/2043 x=2043-10-19 x=19JAN2068 x=01/19/2068 x=2068-01-19 x=17JAN2013 x=01/17/2013 x=2013-01-17 x=29JUN2075 x=06/29/2075 x=2075-06-29
Which makes me very suspicious that the values except the 17Jan2013 are actually Excel numeric dates and that you need to check the results of Proc Import closely. Or use a different approach to reading the data.
One thing with Date values is to always assign a date format to the values so they make sense to people.
The other is to read date values with a proper Informat. Here is an example of reading two different variables from text in different formats and then matching the values even though the variables have different formats.
data junk1; input x :mmddyy10.; format x mmddyy10.; datalines; 01202010 02152013 ; data junk2; input y yymmdd10.; format y yymmdd10.; datalines; 20100120 20130215 ; proc sql; create table example as select a.x, b.y from junk1 as a left join junk2 as b on a.x=b.y ; quit;
I think you need to read the log a bit closer after running this:
Data Opera.Arias_view_1; Set Opera.Arias_view_1(rename = (DOB=DOBNUM)); DOB = input(DOBNUM, $11.); drop DOBNUM; run;
and you should keep dobnum for comparison.
If the DOB is an actual date numeric value then input(DOBNUM, $11.) will throw some messages because INPUT expects character values and the conversion is likely way wrong.
I strongly suspect you need to go back to the import step and read the data properly using date informats with a data step and/or seriously fixing one or more Excel files.
Please run Proc Contents on BOTH of those sets a show the results for all of the variables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
wklierman