I have two data sets for pre and post data. I was given the data in SPSS files for two sets of dates. My end goal is to create an age variable:
predateofbirth (character length 10)
precdate (character length 6)
predateofbirth precdate
5/27/1999 220701
10/6/2001 220815
1/1/1953 221031
12/23/1985 230112
5/5/1996 230125
postdateofbirth has a glitch that is putting in a space followed by a zero after any dateofbirth that has a single digit month and single digit day in their birthday (3/6/1989 0)
postdateofbirth (character value length 10)
postcdate length (numeric value length 6)
postdateofbirth postcdate
5/27/1999 220716
10/6/2001 220912
1/1/1953 0 221123
12/23/1985 230202
5/5/1996 0 230214
For the data error that only occurs on the single month and day dates, how would I remove the "space 0" from the postdateofbirth data?
for the predateofbirth to convert to numeric, i did:
numeric_predateofbirth = input(predateofbirth, anydtdte10.);
drop predateofbirth;
rename numeric_predateofbirth = dateofbirth;
for the precdate to convert to numeric, i did:
numeric_precdate = input(precdate, anydtdte10.);
drop precdate;
rename numeric_precdate = cdate;
I am having issues reconciling these variables so that I can create and use an age variable.
After converting the postdate yymmdd to numeric, do I need to add 2,000,000 to it to get a YYYYMMDD variable?
If this wouldn't be the way to go, could anyone explain how to do this like I am a five year old?
So assuming you meant that you had a dataset like this one:
data have;
length predateofbirth $10 precdate $6 ;
input predateofbirth precdate;
cards;
5/27/1999 220701
10/6/2001 220815
1/1/1953 221031
12/23/1985 230112
5/5/1996 230125
;
Then you can covert the first one using MMDDYY informat and the second with the YYMMDD informat. (Note only use the ANYDT... informats as a last resort as they have GUESS what style of date strings you have.)
data want;
set have;
dob = input(predateofbirth,mmddyy10.);
date2 = input(precdate,yymmdd8.);
format dob date2 date9.;
run;
Result:
OBS predateofbirth precdate dob date2
1 5/27/1999 220701 27MAY1999 01JUL2022
2 10/6/2001 220815 06OCT2001 15AUG2022
3 1/1/1953 221031 01JAN1953 31OCT2022
4 12/23/1985 230112 23DEC1985 12JAN2023
5 5/5/1996 230125 05MAY1996 25JAN2023
If your problem with the second file is those extra zeros on the first date field, so perhaps your data looks something like this?
data have;
length postdateofbirth $10 postcdate $6 ;
infile cards dsd truncover;
input postdateofbirth postcdate;
cards;
5/27/1999,220716
10/6/2001,220912
1/1/1953 0,221123
12/23/1985,230202
5/5/1996 0,230214
;
Then you can use the SCAN() function to only take the string up to the first space.
data want;
set have;
dob = input(scan(postdateofbirth,1,' '),mmddyy10.);
date2 = input(postcdate,yymmdd8.);
format dob date2 date9.;
run;
Result
OBS postdateofbirth postcdate dob date2 1 5/27/1999 220716 27MAY1999 16JUL2022 2 10/6/2001 220912 06OCT2001 12SEP2022 3 1/1/1953 0 221123 01JAN1953 23NOV2022 4 12/23/1985 230202 23DEC1985 02FEB2023 5 5/5/1996 0 230214 05MAY1996 14FEB2023
For the post data my postcdate was originally a numeric variable. I don't know if that has anything to do with the issue that I am having, but when I use either date9. or yymmdd8. my output for postcdate is showing:
2564/07/30
2565/02/25
2590/01/26
So, it is somehow reading my variable as a sas date format instead of the format that I am inputting. Do you have any idea why this might be the case or how to correct it? Sorry for all of the trouble.
@bej wrote:
For the post data my postcdate was originally a numeric variable. I don't know if that has anything to do with the issue that I am having, but when I use either date9. or yymmdd8. my output for postcdate is showing:
2564/07/30
2565/02/25
2590/01/26
So, it is somehow reading my variable as a sas date format instead of the format that I am inputting. Do you have any idea why this might be the case or how to correct it? Sorry for all of the trouble.
Please share examples of the observations that cause you trouble. Pay attention to whether the variables are character or numeric. And if they are numeric if they have a FORMAT attached or not.
If the variable is numeric and has integers like 220,716 that you want to interpret as meaning YYMMDD style digits then to convert to a date value you will first need to convert the number to a string so you can use an informat that understands digit strings like that.
date2 = input(put(postcdate,z6.),yymmdd6.);
Notice that 30JUL2564 is the number 220,818 . If you interpret the digits of that integer as representing YYM,MDD values then it could represent 18AUG2022 (or 18AUG1922).
1 data _null_;
2 string='2564/07/30';
3 date=input(string,yymmdd10.);
4 put date= comma12.;
5 run;
date=220,818
SAS stores actual dates as numeric variables with the value as the number of days since 1960. So if you have a variable that is numeric with any of the formats used to display dates you can change how it displayed by just changing the format without making any change to the values. For example if POSTCDATE is numeric and has the YYMMDDN6. format attached to it then it would display 220716 for the date 16JUL2022 (or for the date 16JUL1922).
Hi @bej
You don't specify if your SPSS files are export files from SPSS or SPSS data files. If they are data files, this article gives a lot of helpful information about importing in SAS, both with Base SAS and with SAS Enterprise Guide, which has a point-and click task for SPSS import.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.