BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lillymaginta
Lapis Lazuli | Level 10
data stat480.temp2;
input id date dx1 dx2 dx3 dx4 dx5;
datalines;
1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3 3/4/2007 430 440 450 460 470
3 4/4/2007 430 510 520 530 560
; run;

Hi I have the following dataset. I am trying to find id who have at least 2 dx of 411 or 430 within one year (dx1-dx5) and keep them in the dataset, otherwise drop. For example output in this case would be keeping id 1 and 3

1 ACCEPTED SOLUTION

Accepted Solutions
JohnHoughton
Quartz | Level 8

I've interpreted the question as wanting to keep all observation for an id if that id meets the condition of 2 or more occurrences of 411 or 430 within variables dx1-dx5 at anytime within any 12-month rolling period.

 

The flagged variable is used so that only one row for each id meeting the conditions output to the dataset 'idstokeep'.

 

I've  learned from this that the way the lag function behaves is affected by conditional expressions, and there has been some trial and error in getting to this.

 

Finally, a caveat. This may not work if a single observation has 2 occurrences of 411 or 430, or a value of 411 and 430. (eg DX1=411 & DX2=411, or DX1=411 & DX2=430).  If that is a real possibility then will need to make some changes.

 

 

data have;

input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;

format  date  mmddyy10. ;

* ids 1, 3, 4 have had repeat readings of 411 or 430 in a 1-year period , so

all observations for these ids will be kept;

datalines;

1  1/1/2005 411 412 413 414 415

1  2/2/2005  411 412 413 414 415

2  11/1/2006  411 412 413 418 419

2  3/3/2007 420 421 422 423 424

3  9/4/2006 430 440 450 460 470

3  6/4/2006 430 510 520 530 560

4  3/4/2006 430 440 450 460 470

4  6/12/2006 432 510 520 530 560

4  9/14/2006 430 510 520 530 560

5  3/4/2006 430 440 450 460 470

5  6/12/2006 432 510 520 530 560

5  4/14/2007 430 510 520 530 560

;

run;

 

proc sort data=have; by id date;

 

proc transpose data=have out=trans(where=(col1 in (411,430)));

by id date;

run;

data idstokeep (keep=id);

set trans;

retain flagged;

by id date;

if first.id then flagged=0 ;

if not first.id and INTCK('YEAR',lag(date),date,'C') =0 and flagged=0 then do;

flagged=1;  output;

end;

run;

data want;

merge idstokeep (in=idwanted) have;

by id;

if idwanted;

run;

View solution in original post

8 REPLIES 8
Ksharp
Super User
data temp2;
input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;
format  date  mmddyy10. ;
datalines;
1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3  3/4/2007 430 440 450 460 470
3  4/4/2007 430 510 520 530 560
;
run;

data have;
 set temp2;
 year=year(date);
run;
proc sql;
create table key as
 select id,year,sum(dx1 in (411,430)) as dx1,
                sum(dx2 in (411,430)) as dx2,
                sum(dx3 in (411,430)) as dx3,
                sum(dx4 in (411,430)) as dx4,
                sum(dx5 in (411,430)) as dx5,
sum(calculated dx1,calculated dx2,calculated dx3,calculated dx4,calculated dx5) as sum
  from have
   group by id,year
    having calculated sum ge 2;  
    
create table want as
 select * from have where id in (select id from key);
quit;
lillymaginta
Lapis Lazuli | Level 10

Thank you Xia, if I understand the coding correctly, if one dx was in 12/01/2005 and the other was in 01/01/2006, they will not be captured in your code? "even though they are still within one year" 

lilly 

Ksharp
Super User

Since you are asking for ONE year, so make a group variable as year.

 group by id,year

If you don't want it, remove it 

 group by id
lillymaginta
Lapis Lazuli | Level 10
1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3  3/4/2007 430 440 450 460 470
3  12/4/2006 430 510 520 530 560;

If we modify the dates as above, your code will not capture id-3. I meant within one year that both codes appeared in less than a year from each other but not necessirly in the same year.  

Ksharp
Super User

I don't understand what you mean.

If there are many years in the same ID ( like 3), what you want ?

 

What output it should be ( the data you posted)? 

 

1  1/1/2005 411 412 413 414 415
1 2/2/2005  411 412 413 414 415
2 1/1/2006  411 412 413 418 419
2  3/3/2006 420 421 422 423 424
3  3/4/2007 430 440 450 460 470
3  12/4/2006 430 510 520 530 560;

 

You want take ID and Year as two group variables?

Then change the last SQL as

 

select * from have where catx(id,year) in (select catx(id,year) from key) ;

lillymaginta
Lapis Lazuli | Level 10

Would still be 1 and 3. Your original code assume that the two dx should happen in the same year. But in reality one could happen in late 2004 and the other in early 2005 and still be one year apart. So I want to caputre these as well. 

Ksharp
Super User

". But in reality one could happen in late 2004 and the other in early 2005 and still be one year apart."

So how do you define this kind of  one year apart ? How do I know which date belong to which year ?

And Better to post some more data to explain your question.

JohnHoughton
Quartz | Level 8

I've interpreted the question as wanting to keep all observation for an id if that id meets the condition of 2 or more occurrences of 411 or 430 within variables dx1-dx5 at anytime within any 12-month rolling period.

 

The flagged variable is used so that only one row for each id meeting the conditions output to the dataset 'idstokeep'.

 

I've  learned from this that the way the lag function behaves is affected by conditional expressions, and there has been some trial and error in getting to this.

 

Finally, a caveat. This may not work if a single observation has 2 occurrences of 411 or 430, or a value of 411 and 430. (eg DX1=411 & DX2=411, or DX1=411 & DX2=430).  If that is a real possibility then will need to make some changes.

 

 

data have;

input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;

format  date  mmddyy10. ;

* ids 1, 3, 4 have had repeat readings of 411 or 430 in a 1-year period , so

all observations for these ids will be kept;

datalines;

1  1/1/2005 411 412 413 414 415

1  2/2/2005  411 412 413 414 415

2  11/1/2006  411 412 413 418 419

2  3/3/2007 420 421 422 423 424

3  9/4/2006 430 440 450 460 470

3  6/4/2006 430 510 520 530 560

4  3/4/2006 430 440 450 460 470

4  6/12/2006 432 510 520 530 560

4  9/14/2006 430 510 520 530 560

5  3/4/2006 430 440 450 460 470

5  6/12/2006 432 510 520 530 560

5  4/14/2007 430 510 520 530 560

;

run;

 

proc sort data=have; by id date;

 

proc transpose data=have out=trans(where=(col1 in (411,430)));

by id date;

run;

data idstokeep (keep=id);

set trans;

retain flagged;

by id date;

if first.id then flagged=0 ;

if not first.id and INTCK('YEAR',lag(date),date,'C') =0 and flagged=0 then do;

flagged=1;  output;

end;

run;

data want;

merge idstokeep (in=idwanted) have;

by id;

if idwanted;

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2250 views
  • 1 like
  • 3 in conversation