BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kiko
Fluorite | Level 6

Hello, 

 

I am trying to troubleshoot the SAS code that was written (by someone else)  to convert d, m, y into DOB. I noticed that August and September births were converted incorrectly and that they were converted to July births. I found a note saying that because partial dates are acceptable, missing day was substituted with 15 and missing month was substituted with 7. Here's the code I inherited. 

 

array MM[2] $ CHILDMTHB MMOBRTH 
array DD[2] $ CHILDDAYB MDAYBRTH 
array YY[2] $ CHILDYRB MYRBRTH 
array dates[2] CHILDOB MDOB 
do i=1 to 2;
if compress(YY[i], '89')^='' then do;
if (compress(MM[i], '89')^='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(input(MM[i], 2.), input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(7, input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')^='') and (compress(DD[i], '89')='')
then dates[i]=mdy(input(MM[i], 2.), 15, input(YY[i], 4.));
else dates[i]=mdy(7,15, input(YY[i], 4.));

end;

Can someone tell me how to fix this? Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

@Kiko wrote:

That makes sense. What I don't understand is why the person added this in the first place? Do you have any suggestions as to how to fix it so that d,m,y can be converted to DOB correctly 

 

Thanks so much 


Remove them? I suspect that this was added due to some issues in your data. Those issues may no longer exist, or someone didn't test their code thoroughly.   BUT you should check your data. For each if condition change it to IF/THEN/DO and assign a new variable so you can see which record are being evaluated where. Then you can check if some records are not being converted or where the problem originates.

 Or just rewrite from scratch to ensure that it's correct and put comments as to why you do something to not drive the next programmer insane 🙂

 

View solution in original post

14 REPLIES 14
Reeza
Super User

if compress(YY[i], '89')^='' then do;
if (compress(MM[i], '89')^='') and (compress(DD[i], '89')^='')

 

Do you know the logic behind those two lines? I suspect those are your offending lines, but I don't know what the person was thinking when they wrote those. 

 

I would try removing those conditions and see what you get. You would also get problems for dates that are 8/9th of each month I suspect. 

 


@Kiko wrote:

Hello, 

 

I am trying to troubleshoot the SAS code that was written (by someone else)  to convert d, m, y into DOB. I noticed that August and September births were converted incorrectly and that they were converted to July births. I found a note saying that because partial dates are acceptable, missing day was substituted with 15 and missing month was substituted with 7. Here's the code I inherited. 

 

array MM[2] $ CHILDMTHB MMOBRTH 
array DD[2] $ CHILDDAYB MDAYBRTH 
array YY[2] $ CHILDYRB MYRBRTH 
array dates[2] CHILDOB MDOB 
do i=1 to 2;
if compress(YY[i], '89')^='' then do;
if (compress(MM[i], '89')^='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(input(MM[i], 2.), input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')='') and (compress(DD[i], '89')^='')
then dates[i]=mdy(7, input(DD[i], 2.), input(YY[i], 4.));
else if (compress(MM[i], '89')^='') and (compress(DD[i], '89')='')
then dates[i]=mdy(input(MM[i], 2.), 15, input(YY[i], 4.));
else dates[i]=mdy(7,15, input(YY[i], 4.));

end;

Can someone tell me how to fix this? Thank you!

 


 

Kiko
Fluorite | Level 6

Thank you for your reply. I would be happy to try what you suggested, but can you also tell me why you think those two lines should be removed? (which explains I do not understand the logic behind them!). I wasn't paying attention to day so I'll look into it as well, but if you have any suggestions re additional changes need to be made to convert those values correctly I would greatly appreciate it. Thank you!

Reeza
Super User

Compress() removes the digits/characters 8 and 9 from the month or days variables and then tests if it's blanks. So for a month that's 8 or 9 it would then turn up as missing and then it executes the ELSE condition which sets the dates to July 15. 

 


@Kiko wrote:

Thank you for your reply. I would be happy to try what you suggested, but can you also tell me why you think those two lines should be removed? (which explains I do not understand the logic behind them!). I wasn't paying attention to day so I'll look into it as well, but if you have any suggestions re additional changes need to be made to convert those values correctly I would greatly appreciate it. Thank you!


 

Kiko
Fluorite | Level 6

That makes sense. What I don't understand is why the person added this in the first place? Do you have any suggestions as to how to fix it so that d,m,y can be converted to DOB correctly 

 

Thanks so much 

Reeza
Super User

@Kiko wrote:

That makes sense. What I don't understand is why the person added this in the first place? Do you have any suggestions as to how to fix it so that d,m,y can be converted to DOB correctly 

 

Thanks so much 


Remove them? I suspect that this was added due to some issues in your data. Those issues may no longer exist, or someone didn't test their code thoroughly.   BUT you should check your data. For each if condition change it to IF/THEN/DO and assign a new variable so you can see which record are being evaluated where. Then you can check if some records are not being converted or where the problem originates.

 Or just rewrite from scratch to ensure that it's correct and put comments as to why you do something to not drive the next programmer insane 🙂

 

Kiko
Fluorite | Level 6
Oh well, you were so right, the 8/9th of the month was converted to 15. So I am guessing it is something to do w/ replacing partial dates w/ 7 (month) and 15 (day)?
ballardw
Super User

@Kiko wrote:
Oh well, you were so right, the 8/9th of the month was converted to 15. So I am guessing it is something to do w/ replacing partial dates w/ 7 (month) and 15 (day)?

If you were to show an example of the values of the variables you currently have and there types, especially in the form of data step code, it is very likely that you will get a better solution.

Tom
Super User Tom
Super User

Don't use COMPRESS().  Try INPUT() and MISSING() instead.  You could start with something like this.  But you might want to add some more sanity checks, like checking that M is between 1 and 12.  Or that you don't have FEB 31st.  Or that the mother was born before the child. etc.

 

array MM $ CHILDMTHB MMOBRTH ;
array DD $ CHILDDAYB MDAYBRTH ;
array YY $ CHILDYRB MYRBRTH ;
array dates CHILDOB MDOB ;
do i=1 to dim(dates);
  y=input(yy(i),??32.);
  m=input(mm(i),??32.);
  d=input(dd(i),??32.);
  if missing(y) then dates(i)=.;
  else if missing(m) then dates(i)=mdy(7,15,y);
  else if missing(d) then dates(i)=mdy(m,15,y);
  else dates(i)=mdy(m,d,y);
end;
Kiko
Fluorite | Level 6

I will def try this. Thank you!! 

Astounding
PROC Star

Here's a pure guess as to why the program was written this way.  (That doesn't make it accurate!)

 

When the data was created, missing pieces of dates were filled in with "88" or "99".  So removing all "8" and "9" is an attempt to see whether the value is unknown.

 

You can easily verify this by running a PROC FREQ on the pieces of your date variables, in the original data.

 

A better way to check, other than COMPRESS might be:

 

in ("88", "99")

Kiko
Fluorite | Level 6

Yup, I just ran the PROC FREQ as you suggested and you are right on! missing and n/a were filled in with 88888 or 99999. Thank you for that explanation!

ballardw
Super User

@Kiko wrote:

Yup, I just ran the PROC FREQ as you suggested and you are right on! missing and n/a were filled in with 88888 or 99999. Thank you for that explanation!


So what do you want to do with these values? Set them missing with a resulting missing date or use a default value to "estimate" a date?

Reeza
Super User

@Kiko I think Astounding answer regarding the 88/99 is a better answer than mine. 

Kiko
Fluorite | Level 6

I agree that it is an excellent suggestion and I actually intend to try every single solution you came up with and see which one would make the most sense for my project. Funny thing is partial dates are better than unknown/missing dates, but substituting them or converting them to a default date can be misleading. I learned so much from your comments/suggestions so thank you! 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1649 views
  • 2 likes
  • 5 in conversation