Solved: How to create a variable based on another variables per ID

lizwarr · Posted 05-18-2020 05:02 AM

I have a data set containing ID, Visit Type, and Date.

ID	Visit	Date
1	clinic	20060505
1	hospital	20060506
1	hospital	20061217
2	clinic	20060301
2	clinic	20060305
2	clinic	20070503
2	clinic	20070506
2	clinic	20100505
3	clinic	20061112
4	clinic	20080103
4	clinic	20081012
4	hospital	20081227
5	clinic	20050325
5	hospital	20050412
5	hospital	20070510
5	clinic	20061010
5	clinic	20061231
5	clinic	20070125
6	clinic	20060718
6	clinic	20060817
6	clinic	20070918

Now based on this, I want to define "Index Date."

For patients who visited both hospital and clinic, I want the index date to be the first visit of hospital.

For patients who only visited clinics, I want the index date to be the first visit of clinic.

So the new table I want would look something like this

ID	Visit	Date	Index Date
1	clinic	20060505	20060506
1	hospital	20060506	20060506
1	hospital	20061217	20060506
2	clinic	20060301	20060301
2	clinic	20060305	20060301
2	clinic	20070503	20060301
2	clinic	20070506	20060301
2	clinic	20100505	20060301
3	clinic	20061112	20061112
4	clinic	20080103	20081227
4	clinic	20081012	20081227
4	hospital	20081227	20081227
5	clinic	20050325	20050412
5	hospital	20050412	20050412
5	hospital	20070510	20050412
5	clinic	20061010	20050412
5	clinic	20061231	20050412
5	clinic	20070125	20050412
6	clinic	20060718	20060718
6	clinic	20060817	20060718
6	clinic	20070918	20060718

I tried using DO loop but seems like I just can't get it right.

How to solve this?!

Ksharp · Posted 05-18-2020 07:01 AM

data have;
infile cards expandtabs;
input ID	Visit $	Date : yymmdd.;
format date yymmddn8.;
cards;
1	clinic	20060505
1	hospital	20060506
1	hospital	20061217
2	clinic	20060301
2	clinic	20060305
2	clinic	20070503
2	clinic	20070506
2	clinic	20100505
3	clinic	20061112
4	clinic	20080103
4	clinic	20081012
4	hospital	20081227
5	clinic	20050325
5	hospital	20050412
5	hospital	20070510
5	clinic	20061010
5	clinic	20061231
5	clinic	20070125
6	clinic	20060718
6	clinic	20060817
6	clinic	20070918
;

proc sort data=have out=temp;
by id descending visit date;
run;
proc sort data=temp out=temp1 nodupkey;
by id ;
run;
data want;
 merge have temp1(keep=id date rename=(date=index_date));
 by id;
run;

View solution in original post

ed_sas_member · Posted 05-18-2020 05:54 AM

HI @lizwarr

You can try the following code.

However, in your output, I don't understand why index_date for ID 3 should be 2006-03-01 and not 2006-11-12, and why index_date for ID 5 should not be 2005-04-12 for all the records?

Is it a typing mistake?

Best,

proc sql;

	create table want as
	
	/*For patients who only visited clinics*/
	select *, min(date) as index_date format=YYMMDD10.
	from have
	group by id
	having count(distinct visit) =1 and visit = "clinic"

	union corr all 

	/*For patients who visited both clinics and hospitals*/
	select b.*, a.index_date
	from (select distinct id, min(date) as index_date format=YYMMDD10.
		from (select *
			 from have
			 group by id
			 having count(distinct visit) =2 and visit in("clinic","hospital"))
		where visit = "hospital"
		group by id) as a
		inner join
		(select *
		from have
		group by id
		having count(distinct visit) =2 and visit in("clinic","hospital")) as b
		on a.id = b.id
		
	order by id, date;
quit;

lizwarr · Posted 05-18-2020 06:43 AM

Yes, those are my terrible typos, since I was in a rush to get somewhere.. Sorry for the confusion. Thank you for your quick response! I tried the code you provided, but I keep getting the error code at the end saying "character expression requires a character format" for the line "order by id, date;" My ID is numeric variable, whereas the date is character..

ed_sas_member · Posted 05-18-2020 06:47 AM

Hi @lizwarr

Thank you for the clarification.

I thought date was a regular SAS date.

As it is actually character, you can remove 'format=YYMMDD10." as follows.

NB: there is no need to convert it in a valid SAS date as it is in the form YYMMDD, so the alphabetical order is OK to get the minimal value.

Best,

proc sql;

	create table want as
	
	/*For patients who only visited clinics*/

	select *, min(date) as index_date
	from have
	group by id
	having count(distinct visit) =1 and visit = "clinic"

	union corr all 

	/*For patients who visited both clinics and hospitals*/
	select b.*, a.index_date
	from (select distinct id, min(date) as index_date
		from (select *
			 from have
			 group by id
			 having count(distinct visit) =2 and visit in("clinic","hospital"))
		where visit = "hospital"
		group by id) as a
		inner join
		(select *
		from have
		group by id
		having count(distinct visit) =2 and visit in("clinic","hospital")) as b
		on a.id = b.id
		
	order by id, date;
quit;

Ksharp · Posted 05-18-2020 07:01 AM

data have;
infile cards expandtabs;
input ID	Visit $	Date : yymmdd.;
format date yymmddn8.;
cards;
1	clinic	20060505
1	hospital	20060506
1	hospital	20061217
2	clinic	20060301
2	clinic	20060305
2	clinic	20070503
2	clinic	20070506
2	clinic	20100505
3	clinic	20061112
4	clinic	20080103
4	clinic	20081012
4	hospital	20081227
5	clinic	20050325
5	hospital	20050412
5	hospital	20070510
5	clinic	20061010
5	clinic	20061231
5	clinic	20070125
6	clinic	20060718
6	clinic	20060817
6	clinic	20070918
;

proc sort data=have out=temp;
by id descending visit date;
run;
proc sort data=temp out=temp1 nodupkey;
by id ;
run;
data want;
 merge have temp1(keep=id date rename=(date=index_date));
 by id;
run;

lizwarr · Posted 05-20-2020 02:42 AM

Ah! As simple as that!! Thank you so much!

novinosrin · Posted 05-18-2020 08:04 AM

Hi @lizwarr The loop you mean perhaps could be the following. Also, I request you to kindly please review the contents of your description and make sure the expected output matches the description as mentioned by @ed_sas_member . The below code took the Date as numeric. No big deal, just remove the format statement in the code if your date variable is actually character.



data have;
infile cards expandtabs;
input ID	Visit $	Date : yymmdd.;
format date yymmddn8.;
cards;
1	clinic	20060505
1	hospital	20060506
1	hospital	20061217
2	clinic	20060301
2	clinic	20060305
2	clinic	20070503
2	clinic	20070506
2	clinic	20100505
3	clinic	20061112
4	clinic	20080103
4	clinic	20081012
4	hospital	20081227
5	clinic	20050325
5	hospital	20050412
5	hospital	20070510
5	clinic	20061010
5	clinic	20061231
5	clinic	20070125
6	clinic	20060718
6	clinic	20060817
6	clinic	20070918
;

data want;
 _iorc_=.;
 do _n_=1 by 1 until(last.id);
  set have;
  by id visit notsorted;
  if first.visit then _iorc_+1;
  if not _h_date and visit='hospital' then _h_date=date;
  if not _c_date and visit='clinic' then _c_date=date;
 end;
 do _n_=1 to _n_;
  set have;
  if _iorc_>=2 then index_date=_h_date;
  else index_date=_c_date;
  output;
 end;
 format index_date yymmdd10.;
 drop _:;
run;

proc print noobs;run;

novinosrin · Posted 05-18-2020 08:59 AM

Generally elders/wise SAS veterans often advice that SAS functionality works best when you have dates as numeric SAS dates. This helps in computing intervals of various kinds etc for many kinds of analysis.

So if numeric-->Proc SQL can be handy



proc sql;
create table want(drop=_n_) as
select *,min(ifn(not _n_,.,_n_)) as index_date format=yymmddn8.
from 
(select *, ifn(count(distinct Visit)=2,date*(visit='hospital'),date*(visit='clinic')) as _n_
from have 
group by id)
group by id
order by id,date;
quit;

Shmuel · Posted 05-18-2020 09:03 AM

Try next code:

proc sort data=have;
   by ID descending visit date;
run;
data want;
 set have;
  by ID;
      retain index_date;
      if first.ID then index_date = date;
run;

If need you can sort output to desired order - for example:

proc sort data=want; by ID date; run;

How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Re: How to create a variable based on another variables per ID

Registration is open