Hi all,
I have a data set created with the following code:
data test;
input id $ dxday caetype $;
datalines;
01 1890 NONSP
01 2120 INFL
01 2128 INFL
01 2135 INFL
01 2137 NONSP
02 -775 INFL
02 -569 NONSP
03 3020 ARR
03 3026 NONSP
03 3027 NONSP
03 3175 INFL
;
run;
The 3 variables denote patient ID number, number of days the patient experienced a cardiac adverse event (CAE) relative to a certain vaccination, and type of CAE. I'm trying to calculate differences in the number of days from a baseline value for each patient, where the baseline value is the first occurrence of a CAE with type 'INFL'. So, for ID 01, the first INFL CAE occurs at the 2nd observation (dxday=2120); for ID 02, the first INFL CAE occurs at the 1st observation (dxday=-775); and for ID 03 the first INFL CAE occurs at the last observation (dxday=3175).
What I'd like to do is to create a difference variable (DIFF) that calculates time differences based on the correctly identified baseline time value for each patient. So, what I'd like to end up with is a data set that looks like this:
id | dxday | caetype | DIFF |
---|---|---|---|
01 | 1890 | NONSP | -230 |
01 | 2120 | INFL | 0 |
01 | 2128 | INFL | 8 |
01 | 2135 | INFL | 15 |
01 | 2137 | NONSP | 17 |
02 | -775 | INFL | 0 |
02 | -569 | NONSP | 206 |
03 | 3020 | ARR | -155 |
03 | 3026 | NONSP | -149 |
03 | 3027 | NONSP | -148 |
03 | 3175 | INFL | 0 |
Can anyone provide any insight on how to do this? Thanks so much!
chucsp
Chucs,
SQL is more of my weak spot, so I would go about it like this.
data want;
do until (last.id);
set have;
by id;
if baseline =. and caetype='INFL' then baseline = dxday;
end;
do until (last.id);
set have;
by id;
diff = dxday - baseline;
output;
end;
run;
The logic assumes that your data are already in order BY ID DXDAY. Optionally, you might want to drop BASELINE;
Good luck.
First get a dataset that keeps the first value of caetype per each ID, along with the dxday, then re-merge that dataset onto the original and find the difference. The following code will do the trick rather easily.
Note: You did not specify how to define the "First" value of each INFL per id, so I am going to assume the base order of the data (aka the variable _n_) is the correct variable to find the "first" value.
data test1;
set test;
ordervar=_N_;
if caetype='INFL';
run;
sort data=test1;
by id ordervar;
run;
data joinvalues(drop=ordervar caetype);
set test1;
by id ordervar;
if first.id=1 then output;
run;
proc sql;
create table answer as
select t1.*
,t1.dxday-jv.dxday as diff
from test t1
join joinvalues jv on jv.id=t1.id;
quit;
run;
Hi Chucs,
May be some one come up with better answer. here is the sample code where you can get the desired o/p
proc sql;
create table want as
select a.*,b.baseday,(dxday-baseday) as diff
from have as a left join(select id,dxday as baseday from have(where=(caetype eq 'INFL'))
group by id having dxday=min(dxday)) as b
on a.id=b.id
order by a.id,a.dxday;
quit;
Thanks
Sam
Chucs,
SQL is more of my weak spot, so I would go about it like this.
data want;
do until (last.id);
set have;
by id;
if baseline =. and caetype='INFL' then baseline = dxday;
end;
do until (last.id);
set have;
by id;
diff = dxday - baseline;
output;
end;
run;
The logic assumes that your data are already in order BY ID DXDAY. Optionally, you might want to drop BASELINE;
Good luck.
Thanks to everyone for their response! I'll give them a try.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.