data stat480.temp2;
input id date dx1 dx2 dx3 dx4 dx5;
datalines;
1 1/1/2005 411 412 413 414 415
1 2/2/2005 411 412 413 414 415
2 1/1/2006 411 412 413 418 419
2 3/3/2006 420 421 422 423 424
3 3/4/2007 430 440 450 460 470
3 4/4/2007 430 510 520 530 560
;
run;
Hi I have the following dataset. I am trying to find id who have at least 2 dx of 411 or 430 within one year (dx1-dx5) and keep them in the dataset, otherwise drop. For example output in this case would be keeping id 1 and 3
I've interpreted the question as wanting to keep all observation for an id if that id meets the condition of 2 or more occurrences of 411 or 430 within variables dx1-dx5 at anytime within any 12-month rolling period.
The flagged variable is used so that only one row for each id meeting the conditions output to the dataset 'idstokeep'.
I've learned from this that the way the lag function behaves is affected by conditional expressions, and there has been some trial and error in getting to this.
Finally, a caveat. This may not work if a single observation has 2 occurrences of 411 or 430, or a value of 411 and 430. (eg DX1=411 & DX2=411, or DX1=411 & DX2=430). If that is a real possibility then will need to make some changes.
data have;
input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;
format date mmddyy10. ;
* ids 1, 3, 4 have had repeat readings of 411 or 430 in a 1-year period , so
all observations for these ids will be kept;
datalines;
1 1/1/2005 411 412 413 414 415
1 2/2/2005 411 412 413 414 415
2 11/1/2006 411 412 413 418 419
2 3/3/2007 420 421 422 423 424
3 9/4/2006 430 440 450 460 470
3 6/4/2006 430 510 520 530 560
4 3/4/2006 430 440 450 460 470
4 6/12/2006 432 510 520 530 560
4 9/14/2006 430 510 520 530 560
5 3/4/2006 430 440 450 460 470
5 6/12/2006 432 510 520 530 560
5 4/14/2007 430 510 520 530 560
;
run;
proc sort data=have; by id date;
proc transpose data=have out=trans(where=(col1 in (411,430)));
by id date;
run;
data idstokeep (keep=id);
set trans;
retain flagged;
by id date;
if first.id then flagged=0 ;
if not first.id and INTCK('YEAR',lag(date),date,'C') =0 and flagged=0 then do;
flagged=1; output;
end;
run;
data want;
merge idstokeep (in=idwanted) have;
by id;
if idwanted;
run;
data temp2;
input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;
format date mmddyy10. ;
datalines;
1 1/1/2005 411 412 413 414 415
1 2/2/2005 411 412 413 414 415
2 1/1/2006 411 412 413 418 419
2 3/3/2006 420 421 422 423 424
3 3/4/2007 430 440 450 460 470
3 4/4/2007 430 510 520 530 560
;
run;
data have;
set temp2;
year=year(date);
run;
proc sql;
create table key as
select id,year,sum(dx1 in (411,430)) as dx1,
sum(dx2 in (411,430)) as dx2,
sum(dx3 in (411,430)) as dx3,
sum(dx4 in (411,430)) as dx4,
sum(dx5 in (411,430)) as dx5,
sum(calculated dx1,calculated dx2,calculated dx3,calculated dx4,calculated dx5) as sum
from have
group by id,year
having calculated sum ge 2;
create table want as
select * from have where id in (select id from key);
quit;
Thank you Xia, if I understand the coding correctly, if one dx was in 12/01/2005 and the other was in 01/01/2006, they will not be captured in your code? "even though they are still within one year"
lilly
Since you are asking for ONE year, so make a group variable as year.
group by id,year
If you don't want it, remove it
group by id
1 1/1/2005 411 412 413 414 415
1 2/2/2005 411 412 413 414 415
2 1/1/2006 411 412 413 418 419
2 3/3/2006 420 421 422 423 424
3 3/4/2007 430 440 450 460 470
3 12/4/2006 430 510 520 530 560;
If we modify the dates as above, your code will not capture id-3. I meant within one year that both codes appeared in less than a year from each other but not necessirly in the same year.
I don't understand what you mean.
If there are many years in the same ID ( like 3), what you want ?
What output it should be ( the data you posted)?
1 1/1/2005 411 412 413 414 415
1 2/2/2005 411 412 413 414 415
2 1/1/2006 411 412 413 418 419
2 3/3/2006 420 421 422 423 424
3 3/4/2007 430 440 450 460 470
3 12/4/2006 430 510 520 530 560;
You want take ID and Year as two group variables?
Then change the last SQL as
select * from have where catx(id,year) in (select catx(id,year) from key) ;
Would still be 1 and 3. Your original code assume that the two dx should happen in the same year. But in reality one could happen in late 2004 and the other in early 2005 and still be one year apart. So I want to caputre these as well.
". But in reality one could happen in late 2004 and the other in early 2005 and still be one year apart."
So how do you define this kind of one year apart ? How do I know which date belong to which year ?
And Better to post some more data to explain your question.
I've interpreted the question as wanting to keep all observation for an id if that id meets the condition of 2 or more occurrences of 411 or 430 within variables dx1-dx5 at anytime within any 12-month rolling period.
The flagged variable is used so that only one row for each id meeting the conditions output to the dataset 'idstokeep'.
I've learned from this that the way the lag function behaves is affected by conditional expressions, and there has been some trial and error in getting to this.
Finally, a caveat. This may not work if a single observation has 2 occurrences of 411 or 430, or a value of 411 and 430. (eg DX1=411 & DX2=411, or DX1=411 & DX2=430). If that is a real possibility then will need to make some changes.
data have;
input id date : mmddyy10. dx1 dx2 dx3 dx4 dx5;
format date mmddyy10. ;
* ids 1, 3, 4 have had repeat readings of 411 or 430 in a 1-year period , so
all observations for these ids will be kept;
datalines;
1 1/1/2005 411 412 413 414 415
1 2/2/2005 411 412 413 414 415
2 11/1/2006 411 412 413 418 419
2 3/3/2007 420 421 422 423 424
3 9/4/2006 430 440 450 460 470
3 6/4/2006 430 510 520 530 560
4 3/4/2006 430 440 450 460 470
4 6/12/2006 432 510 520 530 560
4 9/14/2006 430 510 520 530 560
5 3/4/2006 430 440 450 460 470
5 6/12/2006 432 510 520 530 560
5 4/14/2007 430 510 520 530 560
;
run;
proc sort data=have; by id date;
proc transpose data=have out=trans(where=(col1 in (411,430)));
by id date;
run;
data idstokeep (keep=id);
set trans;
retain flagged;
by id date;
if first.id then flagged=0 ;
if not first.id and INTCK('YEAR',lag(date),date,'C') =0 and flagged=0 then do;
flagged=1; output;
end;
run;
data want;
merge idstokeep (in=idwanted) have;
by id;
if idwanted;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.