turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Data cleaning based on rules: time period duration...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

2 weeks ago - last edited 2 weeks ago

I'm working with insurance claim data where patients treatment information can be affected by insurance enrollments. Losing insurance but somehow managing to get it back is common, needless to say. If i use data as is, I will have problems such as: patients that appear no treatment received were in actuality happened due to loss of insurance thus not in my claim data. Many other rules can be thought of but I'd like to stick to below few major rules and clean the data accordingly. Please help write SAS code reflecting these rules to a final clean data where my patients treatment information could be sought with minimal influence of insurance status. Perfect doesn't exist. But would like to weed out those observations looking obviously unreliable. @art297 @ChrisNZ @PGStats @Ksharp

Eligibility rules:

- Patients must have insurance => 4 months on a continuous basis where times in between enrollments are up to 30 days but not greater (John had 4 months of continuous insurance enrollment but lost it at the end of 4th month but got his insurance back after 29 days. And similar scenarios were repeated. I would keep John's observation in my data even if he lost his insurance (continued longer then 4 months many times) but always managed to get insurance back within less than a month every time he lose his insurance).

- Insurance start date must be tied to diagnosis date. Aka, patients must have insurance started within one month since date of diagnosis

- With the exception when:

Above conditions is not met because insurance was ended due to death

Data attached to this post.

where variables are:

'begn:' date for beginning of insurance;

'endn:' date for end of insurance

alive: (1)-alive, (0)-death

date of death and date of diagnosis are self explanatory.

```
proc import datafile="...\insurance.csv" out=insurance
dbms=csv replace;
getnames=yes;
run;
```

My attempt which failed to tie the insurance dates to 'date of diagnosis' and 'date of death' and whether time in between two insurances greater or less than a month.

```
/*whether patients insurance lasted only a month or less*/
data insurance1; set insurance;
if sum(end1-beg1)<31 and sum(end2-beg2)<31 and sum(end3-beg3)<31 and sum(end4-beg4)<31 and sum(end5-beg5)<31
and sum(end6-beg6)<31 and sum(end7-beg7)<31 and sum(end8-beg8)<31 and sum(end9-beg9)<31 and sum(end10-beg10)<31
and sum(end11-beg11)<31 and sum(end12-beg12)<31 and sum(end13-beg13)<31 then remove=1;
if sum(end1-beg1)>=31 and sum(end2-beg2)>=31 and sum(end3-beg3)>=31 and sum(end4-beg4)>=31 and sum(end5-beg5)>=31
and sum(end6-beg6)>=31 and sum(end7-beg7)>=31 and sum(end8-beg8)>=31 and sum(end9-beg9)>=31 and sum(end10-beg10)>=31
and sum(end11-beg11)>=31 and sum(end12-beg12)>=31 and sum(end13-beg13)>=31 then remove=0;
/*whether patients insurance continued => 4 months**/
if sum(end1-beg1)<124 or sum(end2-beg2)<124 or sum(end3-beg3)<124 or sum(end4-beg4)<124 or sum(end5-beg5)<124
or sum(end6-beg6)<124 or sum(end7-beg7)<124 or sum(end8-beg8)<124 or sum(end9-beg9)<124 or sum(end10-beg10)<124
or sum(end11-beg11)<124 or sum(end12-beg12)<124 or sum(end13-beg13)<124 then remove=1;
if sum(end1-beg1)>=124 or sum(end2-beg2)>=124 or sum(end3-beg3)>=124 or sum(end4-beg4)>=124 or sum(end5-beg5)>=124
or sum(end6-beg6)>=124 or sum(end7-beg7)>=124 or sum(end8-beg8)>=124 or sum(end9-beg9)>=124 or sum(end10-beg10)>=124
or sum(end11-beg11)>=124 or sum(end12-beg12)>=124 or sum(end13-beg13)>=124 then remove=0;
if remove=0 then output;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Cruise

2 weeks ago

What do you want to happen if sum(end1-beg1)<31 and sum(end2-beg2)>=31 ?
You don't test for this scenario at the moment.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

2 weeks ago

Good catch indeed. I will keep the case but will check whether time period between insurance is not longer than a month. Can you please help me have my rules reflected in the code? I'll appreciate that so much.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Cruise

2 weeks ago - last edited 2 weeks ago

I think there may be some concern about the entire logic.

For instance here:

```
if sum(end1-beg1)<124 or sum(end2-beg2)<124 or sum(end3-beg3)<124 or sum(end4-beg4)<124 or sum(end5-beg5)<124
or sum(end6-beg6)<124 or sum(end7-beg7)<124 or sum(end8-beg8)<124 or sum(end9-beg9)<124 or sum(end10-beg10)<124
or sum(end11-beg11)<124 or sum(end12-beg12)<124 or sum(end13-beg13)<124 then remove=1;
if sum(end1-beg1)>=124 or sum(end2-beg2)>=124 or sum(end3-beg3)>=124 or sum(end4-beg4)>=124 or sum(end5-beg5)>=124
or sum(end6-beg6)>=124 or sum(end7-beg7)>=124 or sum(end8-beg8)>=124 or sum(end9-beg9)>=124 or sum(end10-beg10)>=124
or sum(end11-beg11)>=124 or sum(end12-beg12)>=124 or sum(end13-beg13)>=124 then remove=0;
if remove=0 then output;
```

If sum(end1-beg1)<124 and sum(end11-beg11)>=124 are both true then what should remove actually be?

And what the heck is SUM doing in there at all? It is only adding characters. If the values are dates then If (end1-beg1)<124 is sufficient.

It is also time to learn to use arrays.

Also you should be aware that every record where sum(end1-beg1)<31 is also less than 124 and every record sum(end1-beg1)>=124 is also >=31. So assignment to a single "remove" variable is very likely not behaving as intended. So as a minimum you may want to consider a 31 day variable and a 124 day variable separately.

A better description of what the inclusion exclusion rule is may help. I'm not going to look a the entire data example to try to figure out rules.