Hello all,
I am working with clinical patient data for 375 infants. Each infant has multiple rows representing different medical visits in which certain measurements were taken (i.e. height and weight), each person has a different number of rows (total data set rows is about 17,000). Every patient has a unique ID, however these are not consecutive values, they are medical record numbers. I have sorted the data first by this patient id, then by the date of patient visit.
I need to extract the values from measurement fields when a patient's age is equal to 36 weeks, but because of the nature of this data many patients do not have a measurement taken at the precise time point I want, so I then need to extract the measurement from the visit when their age was closest to the desired age.
I am attempting to use a DO Loop to test the age variable for closeness to 36 weeks and then extract the values, but it is not working.
My goal in the below code is to compare age_in_weeks for each patient to 36 weeks, if the age is precisely equal to that value then I want the weight at that visit to be the value for a new variable weight_36_wks. If the age is NOT 36 weeks, I want to compare the age to +/- 1 week. If that age is still not captured I want the age_in_week_counter to accumulate by 1 and then do the comparison again.
When I run the code the accumulation piece does not seem to be working, does it matter that my index variables are not consecutive values? The counter adds one and then never adds again.
data test;
set metrics;
age_in_wk_counter = 1;
DO i = patient_id;
DO j = age_in_wks;
IF j = 36 then weight_36_wks = wt_grams;
ELSE IF j + age_in_wk_counter = 36 then weight_36_wks = wt_grams;
ELSE IF j - age_in_wk_counter = 36 then weight_36_wks = wt_grams;
ELSE weight_36_wks = .
age_in_wk_counter +1;
output;
end;
end;
run;
proc sql;
create table want as
select
patient_id,
weight as wt_grams,
age_in_wks,
abs(36-age_in_wks) as dist
from metrics
group by patient_id
having calculated dist = min(calculated dist)
;
quit;
Untested; for tested code, please supply example data in usable form (data step with datalines).
SQL is very good for your task.
Here is an alternative in DATA Step.
Q: Would it be more physiologically appropriate to compute an interpolated weight from the weights bounding week 36 if week 36 is not present ?
Example:
code - sample data generator
data have; call streaminit(1234); do patid = 1 to 375; length age_wks 8; wt_grams = rand('integer', 2250, 7500); do age_wks = 1 to 50;
* average wt gain from https://www.healthline.com/health/baby/baby-weight-gain;
wt_grams + ifn(age_wks<=16,rand('integer',160,180), ifn(age_wks<=26,rand('integer',113,150), ifn(age_wks<=52,rand('integer',57-113), .))); * skip data 45% of the time; if rand('uniform') < 0.45 then continue; * skip week 36 every 7th patient; if mod(patid,7) = 0 and age_wks = 36 then continue; * skip week 35 every 14th patient; if mod(patid,14) = 0 and age_wks = 35 then continue; if patid=20 and age_wks <= 40 then continue; output; end; end; run;
code - select data from last record approaching week 36, overridden by first record after week 36 if that is closer.
* create view with new categorical variable used in subsequent BY processing; data have_v / view=have_v; set have; length wk_zone $6; wk_zone = ifc(age_wks<=36,'<=36','>36'); run; * process each patients data, retaining the closest to week 36 information; data want_36; set have_v; by patid wk_zone; retain wk36_weight wk36_age; if first.patid then call missing(wk36_weight, wk36_age); if last.wk_zone and wk_zone = '<=36' then do; wk36_weight = wt_grams; wk36_age = age_wks; end; if first.wk_zone and wk_zone = '>36' then do; if missing(wk36_age) or age_wks-36 < 36-wk36_age then do; wk36_weight = wt_grams; wk36_age = age_wks; end; end; if last.patid; keep patid wk36:; run;
Results
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.