I have a data set containing ID, Visit Type, and Date.
ID | Visit | Date |
1 | clinic | 20060505 |
1 | hospital | 20060506 |
1 | hospital | 20061217 |
2 | clinic | 20060301 |
2 | clinic | 20060305 |
2 | clinic | 20070503 |
2 | clinic | 20070506 |
2 | clinic | 20100505 |
3 | clinic | 20061112 |
4 | clinic | 20080103 |
4 | clinic | 20081012 |
4 | hospital | 20081227 |
5 | clinic | 20050325 |
5 | hospital | 20050412 |
5 | hospital | 20070510 |
5 | clinic | 20061010 |
5 | clinic | 20061231 |
5 | clinic | 20070125 |
6 | clinic | 20060718 |
6 | clinic | 20060817 |
6 | clinic | 20070918 |
Now based on this, I want to define "Index Date."
For patients who visited both hospital and clinic, I want the index date to be the first visit of hospital.
For patients who only visited clinics, I want the index date to be the first visit of clinic.
So the new table I want would look something like this
ID | Visit | Date | Index Date |
1 | clinic | 20060505 | 20060506 |
1 | hospital | 20060506 | 20060506 |
1 | hospital | 20061217 | 20060506 |
2 | clinic | 20060301 | 20060301 |
2 | clinic | 20060305 | 20060301 |
2 | clinic | 20070503 | 20060301 |
2 | clinic | 20070506 | 20060301 |
2 | clinic | 20100505 | 20060301 |
3 | clinic | 20061112 | 20061112 |
4 | clinic | 20080103 | 20081227 |
4 | clinic | 20081012 | 20081227 |
4 | hospital | 20081227 | 20081227 |
5 | clinic | 20050325 | 20050412 |
5 | hospital | 20050412 | 20050412 |
5 | hospital | 20070510 | 20050412 |
5 | clinic | 20061010 | 20050412 |
5 | clinic | 20061231 | 20050412 |
5 | clinic | 20070125 | 20050412 |
6 | clinic | 20060718 | 20060718 |
6 | clinic | 20060817 | 20060718 |
6 | clinic | 20070918 | 20060718 |
I tried using DO loop but seems like I just can't get it right.
How to solve this?!
data have;
infile cards expandtabs;
input ID Visit $ Date : yymmdd.;
format date yymmddn8.;
cards;
1 clinic 20060505
1 hospital 20060506
1 hospital 20061217
2 clinic 20060301
2 clinic 20060305
2 clinic 20070503
2 clinic 20070506
2 clinic 20100505
3 clinic 20061112
4 clinic 20080103
4 clinic 20081012
4 hospital 20081227
5 clinic 20050325
5 hospital 20050412
5 hospital 20070510
5 clinic 20061010
5 clinic 20061231
5 clinic 20070125
6 clinic 20060718
6 clinic 20060817
6 clinic 20070918
;
proc sort data=have out=temp;
by id descending visit date;
run;
proc sort data=temp out=temp1 nodupkey;
by id ;
run;
data want;
merge have temp1(keep=id date rename=(date=index_date));
by id;
run;
HI @lizwarr
You can try the following code.
However, in your output, I don't understand why index_date for ID 3 should be 2006-03-01 and not 2006-11-12, and why index_date for ID 5 should not be 2005-04-12 for all the records?
Is it a typing mistake?
Best,
proc sql;
create table want as
/*For patients who only visited clinics*/
select *, min(date) as index_date format=YYMMDD10.
from have
group by id
having count(distinct visit) =1 and visit = "clinic"
union corr all
/*For patients who visited both clinics and hospitals*/
select b.*, a.index_date
from (select distinct id, min(date) as index_date format=YYMMDD10.
from (select *
from have
group by id
having count(distinct visit) =2 and visit in("clinic","hospital"))
where visit = "hospital"
group by id) as a
inner join
(select *
from have
group by id
having count(distinct visit) =2 and visit in("clinic","hospital")) as b
on a.id = b.id
order by id, date;
quit;
Hi @lizwarr
Thank you for the clarification.
I thought date was a regular SAS date.
As it is actually character, you can remove 'format=YYMMDD10." as follows.
NB: there is no need to convert it in a valid SAS date as it is in the form YYMMDD, so the alphabetical order is OK to get the minimal value.
Best,
proc sql;
create table want as
/*For patients who only visited clinics*/
select *, min(date) as index_date
from have
group by id
having count(distinct visit) =1 and visit = "clinic"
union corr all
/*For patients who visited both clinics and hospitals*/
select b.*, a.index_date
from (select distinct id, min(date) as index_date
from (select *
from have
group by id
having count(distinct visit) =2 and visit in("clinic","hospital"))
where visit = "hospital"
group by id) as a
inner join
(select *
from have
group by id
having count(distinct visit) =2 and visit in("clinic","hospital")) as b
on a.id = b.id
order by id, date;
quit;
data have;
infile cards expandtabs;
input ID Visit $ Date : yymmdd.;
format date yymmddn8.;
cards;
1 clinic 20060505
1 hospital 20060506
1 hospital 20061217
2 clinic 20060301
2 clinic 20060305
2 clinic 20070503
2 clinic 20070506
2 clinic 20100505
3 clinic 20061112
4 clinic 20080103
4 clinic 20081012
4 hospital 20081227
5 clinic 20050325
5 hospital 20050412
5 hospital 20070510
5 clinic 20061010
5 clinic 20061231
5 clinic 20070125
6 clinic 20060718
6 clinic 20060817
6 clinic 20070918
;
proc sort data=have out=temp;
by id descending visit date;
run;
proc sort data=temp out=temp1 nodupkey;
by id ;
run;
data want;
merge have temp1(keep=id date rename=(date=index_date));
by id;
run;
Hi @lizwarr The loop you mean perhaps could be the following. Also, I request you to kindly please review the contents of your description and make sure the expected output matches the description as mentioned by @ed_sas_member . The below code took the Date as numeric. No big deal, just remove the format statement in the code if your date variable is actually character.
data have;
infile cards expandtabs;
input ID Visit $ Date : yymmdd.;
format date yymmddn8.;
cards;
1 clinic 20060505
1 hospital 20060506
1 hospital 20061217
2 clinic 20060301
2 clinic 20060305
2 clinic 20070503
2 clinic 20070506
2 clinic 20100505
3 clinic 20061112
4 clinic 20080103
4 clinic 20081012
4 hospital 20081227
5 clinic 20050325
5 hospital 20050412
5 hospital 20070510
5 clinic 20061010
5 clinic 20061231
5 clinic 20070125
6 clinic 20060718
6 clinic 20060817
6 clinic 20070918
;
data want;
_iorc_=.;
do _n_=1 by 1 until(last.id);
set have;
by id visit notsorted;
if first.visit then _iorc_+1;
if not _h_date and visit='hospital' then _h_date=date;
if not _c_date and visit='clinic' then _c_date=date;
end;
do _n_=1 to _n_;
set have;
if _iorc_>=2 then index_date=_h_date;
else index_date=_c_date;
output;
end;
format index_date yymmdd10.;
drop _:;
run;
proc print noobs;run;
Generally elders/wise SAS veterans often advice that SAS functionality works best when you have dates as numeric SAS dates. This helps in computing intervals of various kinds etc for many kinds of analysis.
So if numeric-->Proc SQL can be handy
proc sql;
create table want(drop=_n_) as
select *,min(ifn(not _n_,.,_n_)) as index_date format=yymmddn8.
from
(select *, ifn(count(distinct Visit)=2,date*(visit='hospital'),date*(visit='clinic')) as _n_
from have
group by id)
group by id
order by id,date;
quit;
Try next code:
proc sort data=have;
by ID descending visit date;
run;
data want;
set have;
by ID;
retain index_date;
if first.ID then index_date = date;
run;
If need you can sort output to desired order - for example:
proc sort data=want; by ID date; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.