Hi all,
I have a data set created with the following code:
data test;
input id $ dxday caetype $;
datalines;
01 1890 NONSP
01 2120 INFL
01 2128 INFL
01 2135 INFL
01 2137 NONSP
02 -775 INFL
02 -569 NONSP
03 3020 ARR
03 3026 NONSP
03 3027 NONSP
03 3175 INFL
;
run;
The 3 variables denote patient ID number, number of days the patient experienced a cardiac adverse event (CAE) relative to a certain vaccination, and type of CAE. I'm trying to calculate differences in the number of days from a baseline value for each patient, where the baseline value is the first occurrence of a CAE with type 'INFL'. So, for ID 01, the first INFL CAE occurs at the 2nd observation (dxday=2120); for ID 02, the first INFL CAE occurs at the 1st observation (dxday=-775); and for ID 03 the first INFL CAE occurs at the last observation (dxday=3175).
What I'd like to do is to create a difference variable (DIFF) that calculates time differences based on the correctly identified baseline time value for each patient. So, what I'd like to end up with is a data set that looks like this:
id | dxday | caetype | DIFF |
---|---|---|---|
01 | 1890 | NONSP | -230 |
01 | 2120 | INFL | 0 |
01 | 2128 | INFL | 8 |
01 | 2135 | INFL | 15 |
01 | 2137 | NONSP | 17 |
02 | -775 | INFL | 0 |
02 | -569 | NONSP | 206 |
03 | 3020 | ARR | -155 |
03 | 3026 | NONSP | -149 |
03 | 3027 | NONSP | -148 |
03 | 3175 | INFL | 0 |
Can anyone provide any insight on how to do this? Thanks so much!
chucsp
Chucs,
SQL is more of my weak spot, so I would go about it like this.
data want;
do until (last.id);
set have;
by id;
if baseline =. and caetype='INFL' then baseline = dxday;
end;
do until (last.id);
set have;
by id;
diff = dxday - baseline;
output;
end;
run;
The logic assumes that your data are already in order BY ID DXDAY. Optionally, you might want to drop BASELINE;
Good luck.
First get a dataset that keeps the first value of caetype per each ID, along with the dxday, then re-merge that dataset onto the original and find the difference. The following code will do the trick rather easily.
Note: You did not specify how to define the "First" value of each INFL per id, so I am going to assume the base order of the data (aka the variable _n_) is the correct variable to find the "first" value.
data test1;
set test;
ordervar=_N_;
if caetype='INFL';
run;
sort data=test1;
by id ordervar;
run;
data joinvalues(drop=ordervar caetype);
set test1;
by id ordervar;
if first.id=1 then output;
run;
proc sql;
create table answer as
select t1.*
,t1.dxday-jv.dxday as diff
from test t1
join joinvalues jv on jv.id=t1.id;
quit;
run;
Hi Chucs,
May be some one come up with better answer. here is the sample code where you can get the desired o/p
proc sql;
create table want as
select a.*,b.baseday,(dxday-baseday) as diff
from have as a left join(select id,dxday as baseday from have(where=(caetype eq 'INFL'))
group by id having dxday=min(dxday)) as b
on a.id=b.id
order by a.id,a.dxday;
quit;
Thanks
Sam
Chucs,
SQL is more of my weak spot, so I would go about it like this.
data want;
do until (last.id);
set have;
by id;
if baseline =. and caetype='INFL' then baseline = dxday;
end;
do until (last.id);
set have;
by id;
diff = dxday - baseline;
output;
end;
run;
The logic assumes that your data are already in order BY ID DXDAY. Optionally, you might want to drop BASELINE;
Good luck.
Thanks to everyone for their response! I'll give them a try.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: