Hello,
I am trying to troubleshoot the SAS code that was written (by someone else) to convert d, m, y into DOB. I noticed that August and September births were converted incorrectly and that they were converted to July births. I found a note saying that because partial dates are acceptable, missing day was substituted with 15 and missing month was substituted with 7. Here's the code I inherited.
array MM[2] $ CHILDMTHB MMOBRTH
array DD[2] $ CHILDDAYB MDAYBRTH
array YY[2] $ CHILDYRB MYRBRTH
array dates[2] CHILDOB MDOB
do i=1 to 2;
if compress(YY[i], '89')^='' then do;
if (compress(MM[i], '89')^='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(input(MM[i], 2.), input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(7, input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')^='') and (compress(DD[i], '89')='')
then dates[i]=mdy(input(MM[i], 2.), 15, input(YY[i], 4.));
else dates[i]=mdy(7,15, input(YY[i], 4.));
end;
Can someone tell me how to fix this? Thank you!
@Kiko wrote:
That makes sense. What I don't understand is why the person added this in the first place? Do you have any suggestions as to how to fix it so that d,m,y can be converted to DOB correctly
Thanks so much
Remove them? I suspect that this was added due to some issues in your data. Those issues may no longer exist, or someone didn't test their code thoroughly. BUT you should check your data. For each if condition change it to IF/THEN/DO and assign a new variable so you can see which record are being evaluated where. Then you can check if some records are not being converted or where the problem originates.
Or just rewrite from scratch to ensure that it's correct and put comments as to why you do something to not drive the next programmer insane 🙂
if compress(YY[i], '89')^='' then do;
if (compress(MM[i], '89')^='') and (compress(DD[i], '89')^='')
Do you know the logic behind those two lines? I suspect those are your offending lines, but I don't know what the person was thinking when they wrote those.
I would try removing those conditions and see what you get. You would also get problems for dates that are 8/9th of each month I suspect.
@Kiko wrote:
Hello,
I am trying to troubleshoot the SAS code that was written (by someone else) to convert d, m, y into DOB. I noticed that August and September births were converted incorrectly and that they were converted to July births. I found a note saying that because partial dates are acceptable, missing day was substituted with 15 and missing month was substituted with 7. Here's the code I inherited.
array MM[2] $ CHILDMTHB MMOBRTH
array DD[2] $ CHILDDAYB MDAYBRTH
array YY[2] $ CHILDYRB MYRBRTH
array dates[2] CHILDOB MDOB
do i=1 to 2;
if compress(YY[i], '89')^='' then do;
if (compress(MM[i], '89')^='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(input(MM[i], 2.), input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(7, input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')^='') and (compress(DD[i], '89')='')
then dates[i]=mdy(input(MM[i], 2.), 15, input(YY[i], 4.));
else dates[i]=mdy(7,15, input(YY[i], 4.));
end;
Can someone tell me how to fix this? Thank you!
Thank you for your reply. I would be happy to try what you suggested, but can you also tell me why you think those two lines should be removed? (which explains I do not understand the logic behind them!). I wasn't paying attention to day so I'll look into it as well, but if you have any suggestions re additional changes need to be made to convert those values correctly I would greatly appreciate it. Thank you!
Compress() removes the digits/characters 8 and 9 from the month or days variables and then tests if it's blanks. So for a month that's 8 or 9 it would then turn up as missing and then it executes the ELSE condition which sets the dates to July 15.
@Kiko wrote:
Thank you for your reply. I would be happy to try what you suggested, but can you also tell me why you think those two lines should be removed? (which explains I do not understand the logic behind them!). I wasn't paying attention to day so I'll look into it as well, but if you have any suggestions re additional changes need to be made to convert those values correctly I would greatly appreciate it. Thank you!
That makes sense. What I don't understand is why the person added this in the first place? Do you have any suggestions as to how to fix it so that d,m,y can be converted to DOB correctly
Thanks so much
@Kiko wrote:
That makes sense. What I don't understand is why the person added this in the first place? Do you have any suggestions as to how to fix it so that d,m,y can be converted to DOB correctly
Thanks so much
Remove them? I suspect that this was added due to some issues in your data. Those issues may no longer exist, or someone didn't test their code thoroughly. BUT you should check your data. For each if condition change it to IF/THEN/DO and assign a new variable so you can see which record are being evaluated where. Then you can check if some records are not being converted or where the problem originates.
Or just rewrite from scratch to ensure that it's correct and put comments as to why you do something to not drive the next programmer insane 🙂
@Kiko wrote:
Oh well, you were so right, the 8/9th of the month was converted to 15. So I am guessing it is something to do w/ replacing partial dates w/ 7 (month) and 15 (day)?
If you were to show an example of the values of the variables you currently have and there types, especially in the form of data step code, it is very likely that you will get a better solution.
Don't use COMPRESS(). Try INPUT() and MISSING() instead. You could start with something like this. But you might want to add some more sanity checks, like checking that M is between 1 and 12. Or that you don't have FEB 31st. Or that the mother was born before the child. etc.
array MM $ CHILDMTHB MMOBRTH ;
array DD $ CHILDDAYB MDAYBRTH ;
array YY $ CHILDYRB MYRBRTH ;
array dates CHILDOB MDOB ;
do i=1 to dim(dates);
y=input(yy(i),??32.);
m=input(mm(i),??32.);
d=input(dd(i),??32.);
if missing(y) then dates(i)=.;
else if missing(m) then dates(i)=mdy(7,15,y);
else if missing(d) then dates(i)=mdy(m,15,y);
else dates(i)=mdy(m,d,y);
end;
I will def try this. Thank you!!
Here's a pure guess as to why the program was written this way. (That doesn't make it accurate!)
When the data was created, missing pieces of dates were filled in with "88" or "99". So removing all "8" and "9" is an attempt to see whether the value is unknown.
You can easily verify this by running a PROC FREQ on the pieces of your date variables, in the original data.
A better way to check, other than COMPRESS might be:
in ("88", "99")
Yup, I just ran the PROC FREQ as you suggested and you are right on! missing and n/a were filled in with 88888 or 99999. Thank you for that explanation!
@Kiko wrote:
Yup, I just ran the PROC FREQ as you suggested and you are right on! missing and n/a were filled in with 88888 or 99999. Thank you for that explanation!
So what do you want to do with these values? Set them missing with a resulting missing date or use a default value to "estimate" a date?
@Kiko I think Astounding answer regarding the 88/99 is a better answer than mine.
I agree that it is an excellent suggestion and I actually intend to try every single solution you came up with and see which one would make the most sense for my project. Funny thing is partial dates are better than unknown/missing dates, but substituting them or converting them to a default date can be misleading. I learned so much from your comments/suggestions so thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.