turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- missing values were generated on date calculation

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 11:54 AM

I have a question regarding to missingness generated by date calculation. My code is as follow:

```
data temp3.test;
retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
set temp3.ipclms11_recode;
by bene_id;
format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
end_dt_earAdmin=start_dt_earAdmin+364;
day30_earAdmin=start_dt_earAdmin+29;
day90_earAdmin=start_dt_earAdmin+89;
if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin;
run;
```

The note suggests:

Missing values were generated as a result of performing an operation on missing values.

I checked the missingness of my original data on CLM_ADMSN_DT, clm_from_dt. There are no missing value among these two variables or the newly generated variables: start_dt_earAdmin and end_dt_earAdmin.

When I modified my code as below, I don't have the error message any more. But I couldn't figure out why, as the results on start_dt_earAdmin and end_dt_earAdmin look the same in the new dataset prior to and post code modification.

```
data temp3.ipclms11_f;
retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
set temp3.ipclms11_recode;
by bene_id;
format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
end_dt_earAdmin=sum(start_dt_earAdmin,364);
day30_earAdmin=sum(start_dt_earAdmin,29);
day90_earAdmin=sum(start_dt_earAdmin,89);
if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin;
run;
```

Any comment or thoughts are highly valued to me. Thank you!

Accepted Solutions

Solution

05-18-2016
08:37 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-18-2016 08:37 AM

Thank you for pointing out a great way of checking, which is quite helpful. I figured out the problem is that I forgot to use 'do' and 'end' after the 'if' statement. The missing values were acutally generated by the missingness of the start date...

I correct the if statement as below and it works now. Thank you so much for all your advise! It helps a lot!

```
if first.bene_id then do;
start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
end_dt_earAdmin=start_dt_earAdmin + 364;
day30_earAdmin=start_dt_earAdmin +29 ;
day90_earAdmin=start_dt_earAdmin +89;
end;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 12:14 PM

I can't see what you changed. Can you explain what change you made.

Most likely the variable start_dt_earAdmin is already on the input dataset, so the RETAIN does nothing.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 01:43 PM

Thank you for your question. I should have provided more information on the structure of the data.

1. The original data is patient level data in inpatient setting. Some patients might have multiple records representing multiple inpatient events. So I sort the data by patient id, admission date, claim from date, discharge date and claim through date. Then I take the minimum of (admission date, claim from date) from the earliest record as the index date (start_dt_earAdmin) for each patient. In the new dataset, I would have the same value of 'start_dt_earAdmin' for the same patient by using 'retain'.

2. In the modified version of code, I use sum function instead of

end_dt_earAdmin=start_dt_earAdmin+364;

3. The way I checked the missing:

```
proc means data = temp3.ipclms11_f n nmiss;
var start_dt_earAdmin;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 01:47 PM

Sorry, I also forgot to mention the end date was set as 30 days, 90 days and a year after the index date.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 12:19 PM

I don't know exactly how you checked the missingness of your START variable but apparently you missed some that were.

Remove if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin; and then look at the resulting data set and the START variable.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 02:01 PM

Thank you. That's a good point. However, the resulting data set doesn't change after removing if...

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 12:27 PM

Crystal,

There's a very helpful piece of information that you should post (if you still need help on this). When the log shows a note about missing values being generated, it also shows a note about which line in the program is responsible. That's a huge clue. Perhaps you could post the log instead of the program to make it easier to track this down.

Good luck.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 01:28 PM

```
444 data temp3.test;
445 retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
446 set temp3.ipclms11_recode;
447 by bene_id;
448 format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
449 if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
450 end_dt_earAdmin=start_dt_earAdmin+364;
451 day30_earAdmin=start_dt_earAdmin+29;
452 day90_earAdmin=start_dt_earAdmin+89;
453 if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin;
454 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
35650 at 450:37 35650 at 451:36 35650 at 452:36
NOTE: There were 285911 observations read from the data set TEMP3.IPCLMS11_RECODE.
NOTE: The data set TEMP3.TEST has 250261 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 8.51 seconds
cpu time 0.42 seconds
```

Thank you for your suggestion! I'm attaching the log here for your reference.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 01:48 PM

Not that the column positions 37 and 36 correspond to the locations of the plus signs. This indicates that your start_dt_earAdmin variable is likely missing.

Your output did not contain any missing values because

if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin;

would not output anything when end_dt_earAdmin is missing (because the start value was) when either of

CLM_ADMSN_DT or clm_from_dt are not missing.

Since your start_dt_earAdmin is only assigned a value on first.bene_id I suspect that for some id values the first record is missing both CLM_ADMSN_DT and clm_from_dt.

See what this code gives you:

```
data temp3.missing;
set temp3.ipclms11_recode;
by bene_id;
if first.bene_id and nmiss(CLM_ADMSN_DT, clm_from_dt) = 2;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 01:59 PM

Thanks a lot for your suggestions and the code. But it seems that I actually don't have any data lost from resulting dataset. Below is the result when i removed the 'if...'

```
498 data temp3.ipclms11_f;
499 retain start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin;
500 set temp3.ipclms11_recode;
501 by bene_id;
502 format start_dt_earAdmin end_dt_earAdmin day30_earAdmin day90_earAdmin YYMMDDN8.;
503 if first.bene_id then start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
504 end_dt_earAdmin=sum(start_dt_earAdmin,364);
505 day30_earAdmin=sum(start_dt_earAdmin,29);
506 day90_earAdmin=sum(start_dt_earAdmin,89);
507 /* if min(CLM_ADMSN_DT, clm_from_dt) le end_dt_earAdmin; */
508 run;
NOTE: There were 285911 observations read from the data set TEMP3.IPCLMS11_RECODE.
NOTE: The data set TEMP3.IPCLMS11_F has 285911 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 10.29 seconds
cpu time 0.43 seconds
508! **** benes = 250261;
509 %put n=%nobs(temp3.ipclms11_f);
n=285911
510 data temp3.missing;
511 set temp3.ipclms11_recode;
512 by bene_id;
513 if first.bene_id and nmiss(CLM_ADMSN_DT, clm_from_dt) = 2;
514 run;
NOTE: There were 285911 observations read from the data set TEMP3.IPCLMS11_RECODE.
NOTE: The data set TEMP3.MISSING has 0 observations and 14 variables.
NOTE: DATA statement used (Total process time):
real time 2.35 seconds
cpu time 0.10 seconds
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-17-2016 04:10 PM

Sorry I was unclear. I meant to indicate to run the code with the + instead of sum function without the IF statement.

Solution

05-18-2016
08:37 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-18-2016 08:37 AM

Thank you for pointing out a great way of checking, which is quite helpful. I figured out the problem is that I forgot to use 'do' and 'end' after the 'if' statement. The missing values were acutally generated by the missingness of the start date...

I correct the if statement as below and it works now. Thank you so much for all your advise! It helps a lot!

```
if first.bene_id then do;
start_dt_earAdmin=min(CLM_ADMSN_DT, clm_from_dt);
end_dt_earAdmin=start_dt_earAdmin + 364;
day30_earAdmin=start_dt_earAdmin +29 ;
day90_earAdmin=start_dt_earAdmin +89;
end;
```