missing values were generated on date calculation

Accepted Solution Solved
Reply
Contributor
Posts: 49
Accepted Solution

missing values were generated on date calculation

I have a question regarding to missingness generated by date calculation. My code is as follow:

 

data temp3.test; 
   retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
   set temp3.ipclms11_recode;
   by bene_id;
   format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
   if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
   end_dt_earAdmin=start_dt_earAdmin+364;
   day30_earAdmin=start_dt_earAdmin+29;
   day90_earAdmin=start_dt_earAdmin+89;
   if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin; 
 run;

The note suggests:

 

Missing values were generated as a result of performing an operation on missing values.

 

I checked the missingness of my original data on CLM_ADMSN_DT, clm_from_dt. There are no missing value among these two variables or the newly generated variables: start_dt_earAdmin and end_dt_earAdmin.

 

When I modified my code as below, I don't have the error message any more. But I couldn't figure out why, as the results on start_dt_earAdmin and end_dt_earAdmin look the same in the new dataset prior to and post code modification.

 

data temp3.ipclms11_f; 
   retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
   set temp3.ipclms11_recode;
   by bene_id;
   format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
   if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
   end_dt_earAdmin=sum(start_dt_earAdmin,364);
   day30_earAdmin=sum(start_dt_earAdmin,29);
   day90_earAdmin=sum(start_dt_earAdmin,89);
   if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin; 
run;

Any comment or thoughts are highly valued to me. Thank you!

 


Accepted Solutions
Solution
‎05-18-2016 08:37 AM
Contributor
Posts: 49

Re: missing values were generated on date calculation

Thank you for pointing out a great way of checking, which is quite helpful. I figured out the problem is that I forgot to use 'do' and 'end' after the 'if' statement. The missing values were acutally generated by the missingness of the start date...

 

I correct the if statement as below and it works now. Thank you so much for all your advise! It helps a lot!

 

if first.bene_id then do;
   start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
   end_dt_earAdmin=start_dt_earAdmin + 364;
   day30_earAdmin=start_dt_earAdmin +29 ;
   day90_earAdmin=start_dt_earAdmin +89;
   end;

View solution in original post


All Replies
Super User
Super User
Posts: 6,502

Re: missing values were generated on date calculation

I can't see what you changed.  Can you explain what change you made.

 

Most likely the variable start_dt_earAdmin is already on the input dataset, so the RETAIN does nothing.

Contributor
Posts: 49

Re: missing values were generated on date calculation

Thank you for your question. I should have provided more information on the structure of the data.

1. The original data is patient level data in inpatient setting. Some patients might have multiple records representing multiple inpatient events. So I sort the data by patient id, admission date, claim from date, discharge date and claim through date. Then I take the minimum of (admission date, claim from date) from the earliest record as the index date (start_dt_earAdmin)  for each patient. In the new dataset, I would have the same value of 'start_dt_earAdmin' for the same patient by using 'retain'.

 

2. In the modified version of code, I use sum function instead of

end_dt_earAdmin=start_dt_earAdmin+364;

 

3. The way I checked the missing:

 

proc means data = temp3.ipclms11_f n nmiss;
  var start_dt_earAdmin;
run;
Contributor
Posts: 49

Re: missing values were generated on date calculation

Sorry, I also forgot to mention the end date was set as 30 days, 90 days and a year after the index date.
Super User
Posts: 10,538

Re: missing values were generated on date calculation

I don't know exactly how you checked the missingness of your START variable but apparently you missed some that were.

 

Remove if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin; and then look at the resulting data set and the START variable.

Contributor
Posts: 49

Re: missing values were generated on date calculation

Thank you. That's a good point. However, the resulting data set doesn't change after removing if...
Super User
Posts: 5,093

Re: missing values were generated on date calculation

Crystal,

 

There's a very helpful piece of information that you should post (if you still need help on this).  When the log shows a note about missing values being generated, it also shows a note about which line in the program is responsible.  That's a huge clue.  Perhaps  you could post the log instead of the program to make it easier to track this down.

 

Good luck.

Contributor
Posts: 49

Re: missing values were generated on date calculation

444  data temp3.test;
445     retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
446     set temp3.ipclms11_recode;
447     by bene_id;
448     format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
449     if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
450     end_dt_earAdmin=start_dt_earAdmin+364;
451     day30_earAdmin=start_dt_earAdmin+29;
452     day90_earAdmin=start_dt_earAdmin+89;
453     if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin;
454   run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      35650 at 450:37   35650 at 451:36   35650 at 452:36
NOTE: There were 285911 observations read from the data set TEMP3.IPCLMS11_RECODE.
NOTE: The data set TEMP3.TEST has 250261 observations and 17 variables.
NOTE: DATA statement used (Total process time):
      real time           8.51 seconds
      cpu time            0.42 seconds

Thank you for your suggestion! I'm attaching the log here for your reference.

 

Super User
Posts: 10,538

Re: missing values were generated on date calculation

Not that the column positions 37 and 36 correspond to the locations of the plus signs. This indicates that your start_dt_earAdmin variable is likely missing.

 

Your output did not contain any missing values because

if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin;

would not output anything when end_dt_earAdmin is missing (because the start value was) when either of

CLM_ADMSN_DT or clm_from_dt are not missing.

Since your start_dt_earAdmin is only assigned a value on first.bene_id I suspect that for some id values the first record is missing both CLM_ADMSN_DT and clm_from_dt.

See what this code gives you:

data temp3.missing;
   set temp3.ipclms11_recode;
   by bene_id;
   if first.bene_id and nmiss(CLM_ADMSN_DT, clm_from_dt) = 2;
run;
Contributor
Posts: 49

Re: missing values were generated on date calculation

Thanks a lot for your suggestions and the code. But it seems that I actually don't have any data lost from resulting dataset. Below is the result when i removed the 'if...'

 

498  data temp3.ipclms11_f;
499     retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
500     set temp3.ipclms11_recode;
501     by bene_id;
502     format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
503     if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
504     end_dt_earAdmin=sum(start_dt_earAdmin,364);
505     day30_earAdmin=sum(start_dt_earAdmin,29);
506     day90_earAdmin=sum(start_dt_earAdmin,89);
507  /*   if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin; */
508  run;

NOTE: There were 285911 observations read from the data set TEMP3.IPCLMS11_RECODE.
NOTE: The data set TEMP3.IPCLMS11_F has 285911 observations and 17 variables.
NOTE: DATA statement used (Total process time):
      real time           10.29 seconds
      cpu time            0.43 seconds


508!        **** benes = 250261;

509  %put n=%nobs(temp3.ipclms11_f);
n=285911
510  data temp3.missing;
511     set temp3.ipclms11_recode;
512     by bene_id;
513     if first.bene_id and nmiss(CLM_ADMSN_DT, clm_from_dt) = 2;
514  run;

NOTE: There were 285911 observations read from the data set TEMP3.IPCLMS11_RECODE.
NOTE: The data set TEMP3.MISSING has 0 observations and 14 variables.
NOTE: DATA statement used (Total process time):
      real time           2.35 seconds
      cpu time            0.10 seconds

Super User
Posts: 10,538

Re: missing values were generated on date calculation

Sorry I was unclear. I meant to indicate to run the code with the + instead of sum function without the IF statement.

 

 

Solution
‎05-18-2016 08:37 AM
Contributor
Posts: 49

Re: missing values were generated on date calculation

Thank you for pointing out a great way of checking, which is quite helpful. I figured out the problem is that I forgot to use 'do' and 'end' after the 'if' statement. The missing values were acutally generated by the missingness of the start date...

 

I correct the if statement as below and it works now. Thank you so much for all your advise! It helps a lot!

 

if first.bene_id then do;
   start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
   end_dt_earAdmin=start_dt_earAdmin + 364;
   day30_earAdmin=start_dt_earAdmin +29 ;
   day90_earAdmin=start_dt_earAdmin +89;
   end;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 767 views
  • 4 likes
  • 4 in conversation