DATA Step, Macro, Functions and more

two diagnosis within one year

Reply
Frequent Contributor
Posts: 110

two diagnosis within one year

data stat480.temp2;
input id date dx1 dx2 dx3 dx4 dx5;
datalines;
1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3 3/4/2007 430 440 450 460 470
3 4/4/2007 430 510 520 530 560
; run;

Hi I have the following dataset. I am trying to find id who have at least 2 dx of 411 or 430 within one year (dx1-dx5) and keep them in the dataset, otherwise drop. For example output in this case would be keeping id 1 and 3

Super User
Posts: 9,662

Re: two diagnosis within one year

data temp2;
input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;
format  date  mmddyy10. ;
datalines;
1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3  3/4/2007 430 440 450 460 470
3  4/4/2007 430 510 520 530 560
;
run;

data have;
 set temp2;
 year=year(date);
run;
proc sql;
create table key as
 select id,year,sum(dx1 in (411,430)) as dx1,
                sum(dx2 in (411,430)) as dx2,
                sum(dx3 in (411,430)) as dx3,
                sum(dx4 in (411,430)) as dx4,
                sum(dx5 in (411,430)) as dx5,
sum(calculated dx1,calculated dx2,calculated dx3,calculated dx4,calculated dx5) as sum
  from have
   group by id,year
    having calculated sum ge 2;  
    
create table want as
 select * from have where id in (select id from key);
quit;
Frequent Contributor
Posts: 110

Re: two diagnosis within one year

Thank you Xia, if I understand the coding correctly, if one dx was in 12/01/2005 and the other was in 01/01/2006, they will not be captured in your code? "even though they are still within one year" 

lilly 

Super User
Posts: 9,662

Re: two diagnosis within one year

Since you are asking for ONE year, so make a group variable as year.

 group by id,year

If you don't want it, remove it 

 group by id
Frequent Contributor
Posts: 110

Re: two diagnosis within one year

1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3  3/4/2007 430 440 450 460 470
3  12/4/2006 430 510 520 530 560;

If we modify the dates as above, your code will not capture id-3. I meant within one year that both codes appeared in less than a year from each other but not necessirly in the same year.  

Super User
Posts: 9,662

Re: two diagnosis within one year

I don't understand what you mean.

If there are many years in the same ID ( like 3), what you want ?

 

What output it should be ( the data you posted)? 

 

1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3  3/4/2007 430 440 450 460 470
3  12/4/2006 430 510 520 530 560;

 

You want take ID and Year as two group variables?

Then change the last SQL as

 

select * from have where catx(id,year) in (select catx(id,year) from key) ;

Frequent Contributor
Posts: 110

Re: two diagnosis within one year

Would still be 1 and 3. Your original code assume that the two dx should happen in the same year. But in reality one could happen in late 2004 and the other in early 2005 and still be one year apart. So I want to caputre these as well. 

Super User
Posts: 9,662

Re: two diagnosis within one year

". But in reality one could happen in late 2004 and the other in early 2005 and still be one year apart."

So how do you define this kind of  one year apart ? How do I know which date belong to which year ?

And Better to post some more data to explain your question.

Contributor
Posts: 39

Re: two diagnosis within one year

I've interpreted the question as wanting to keep all observation for an id if that id meets the condition of 2 or more occurrences of 411 or 430 within variables dx1-dx5 at anytime within any 12-month rolling period.

 

The flagged variable is used so that only one row for each id meeting the conditions output to the dataset 'idstokeep'.

 

I've  learned from this that the way the lag function behaves is affected by conditional expressions, and there has been some trial and error in getting to this.

 

Finally, a caveat. This may not work if a single observation has 2 occurrences of 411 or 430, or a value of 411 and 430. (eg DX1=411 & DX2=411, or DX1=411 & DX2=430).  If that is a real possibility then will need to make some changes.

 

 

data have;

input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;

format  date  mmddyy10. ;

* ids 1, 3, 4 have had repeat readings of 411 or 430 in a 1-year period , so

all observations for these ids will be kept;

datalines;

1  1/1/2005 411 412 413 414 415

1  2/2/2005  411 412 413 414 415

2  11/1/2006  411 412 413 418 419

2  3/3/2007 420 421 422 423 424

3  9/4/2006 430 440 450 460 470

3  6/4/2006 430 510 520 530 560

4  3/4/2006 430 440 450 460 470

4  6/12/2006 432 510 520 530 560

4  9/14/2006 430 510 520 530 560

5  3/4/2006 430 440 450 460 470

5  6/12/2006 432 510 520 530 560

5  4/14/2007 430 510 520 530 560

;

run;

 

proc sort data=have; by id date;

 

proc transpose data=have out=trans(where=(col1 in (411,430)));

by id date;

run;

data idstokeep (keep=id);

set trans;

retain flagged;

by id date;

if first.id then flagged=0 ;

if not first.id and INTCK('YEAR',lag(date),date,'C') =0 and flagged=0 then do;

flagged=1;  output;

end;

run;

data want;

merge idstokeep (in=idwanted) have;

by id;

if idwanted;

run;

Ask a Question
Discussion stats
  • 8 replies
  • 391 views
  • 0 likes
  • 3 in conversation