Hi all,
I have data that looks like this:
data work.test ;
infile datalines dsd dlm='|' truncover;
input redcap_event_name :$23. BL_wt_clinic_ave BL_wt_home_ave
FU_wt_clinic_ave FU_wt_home_ave wt_change_source ID
;
format redcap_event_name $23. BL_wt_clinic_ave best.
BL_wt_home_ave best. FU_wt_clinic_ave best. FU_wt_home_ave best.
wt_change_source best.
;
informat redcap_event_name $23. ;
label redcap_event_name='redcap_event_name'
BL_wt_clinic_ave='BL_wt_clinic_ave' BL_wt_home_ave='BL_wt_home_ave'
FU_wt_clinic_ave='FU_wt_clinic_ave' FU_wt_home_ave='FU_wt_home_ave'
wt_change_source='wt_change_source'
3 The SAS System 23:38 Thursday, August 27, 2020
;
datalines4;
screener_arm_1|||||1|1
baseline_arm_1|259.3||||1|1
post_appointment_arm_1|||||1|1
12_week_follow_up_arm_1|||251.3||1|1
screener_arm_1|||||1|2
baseline_arm_1|245.7||||1|2
post_appointment_arm_1|||||1|2
12_week_follow_up_arm_1|||242.4||1|2
screener_arm_1|||||1|4
baseline_arm_1|166.7||||1|4
post_appointment_arm_1|||||1|4
12_week_follow_up_arm_1|||166||1|4
screener_arm_1|||||1|6
baseline_arm_1|280.933333333333||||1|6
post_appointment_arm_1|||||1|6
12_week_follow_up_arm_1|||282.8||1|6
screener_arm_1||||||8
baseline_arm_1|434.5|||||8
post_appointment_arm_1||||||8
12_week_follow_up_arm_1||||442.66|8
;;;;
What I need is to be able to find the baseline weight by ID. There are two different variables which could be baseline weight: BL_wt_clinic_ave BL_wt_home_ave
Each ID will only have a value under one of these. I want to create a new variable called BL_wt. I want to do the same thing with follow up weight which only has two variables: FU_wt_clinic_ave FU_wt_home_ave.
I will call this one FU_wt.
Afterward, I am trying to subtract the baseline from followup (FU_Wt - BL_wt) and call it Wt_change.
I am having trouble populating each of the fields for the BL_wt and Fu_wt for all of the observations under each ID. I have tried taking the max for each of these by variable, but then I cant subtract them as they are not on the same observation. This seems quite simple and I might be missing something rudimentary, but any help would be appreciated.
I wasn't able to edit it, but the dataset had a small error toward the bottom. It should look like this:
data work.test ;
infile datalines dsd dlm='|' truncover;
input redcap_event_name :$23. BL_wt_clinic_ave BL_wt_home_ave
FU_wt_clinic_ave FU_wt_home_ave wt_change_source ID
;
format redcap_event_name $23. BL_wt_clinic_ave best.
BL_wt_home_ave best. FU_wt_clinic_ave best. FU_wt_home_ave best.
wt_change_source best.
;
informat redcap_event_name $23. ;
label redcap_event_name='redcap_event_name'
BL_wt_clinic_ave='BL_wt_clinic_ave' BL_wt_home_ave='BL_wt_home_ave'
FU_wt_clinic_ave='FU_wt_clinic_ave' FU_wt_home_ave='FU_wt_home_ave'
wt_change_source='wt_change_source'
3 The SAS System 23:38 Thursday, August 27, 2020
;
datalines4;
screener_arm_1|||||1|1
baseline_arm_1|259.3||||1|1
post_appointment_arm_1|||||1|1
12_week_follow_up_arm_1|||251.3||1|1
screener_arm_1|||||1|2
baseline_arm_1|245.7||||1|2
post_appointment_arm_1|||||1|2
12_week_follow_up_arm_1|||242.4||1|2
screener_arm_1|||||1|4
baseline_arm_1|166.7||||1|4
post_appointment_arm_1|||||1|4
12_week_follow_up_arm_1|||166||1|4
screener_arm_1|||||1|6
baseline_arm_1|280.933333333333||||1|6
post_appointment_arm_1|||||1|6
12_week_follow_up_arm_1|||282.8||1|6
screener_arm_1|||||2|8
baseline_arm_1|434.5||||2|8
post_appointment_arm_1|||||2|8
12_week_follow_up_arm_1|||442.66|2|8
;;;;
I created the source group to differentiate between the different nature of the data (self report or clinic weighed)
There's probably a more elegant way to do this with SQL, but I'm tired, so I just wrote a quick Data step:
DATA WORK.Test_Final;
SET WORK.Test;
BY ID;
DROP Save_: redcap_event_name;
RETAIN Save_BL_wt_clinic_ave Save_BL_wt_home_ave Save_FU_wt_clinic_ave Save_FU_wt_home_ave;
IF NOT MISSING(BL_wt_clinic_ave) THEN
Save_BL_wt_clinic_ave = BL_wt_clinic_ave;
IF NOT MISSING(BL_wt_home_ave) THEN
Save_BL_wt_home_ave = BL_wt_home_ave;
IF NOT MISSING(FU_wt_clinic_ave) THEN
Save_FU_wt_clinic_ave = FU_wt_clinic_ave;
IF NOT MISSING(FU_wt_home_ave) THEN
Save_FU_wt_home_ave = FU_wt_home_ave;
IF LAST.ID THEN
DO;
BL_wt_clinic_ave = Save_BL_wt_clinic_ave;
BL_wt_home_ave = Save_BL_wt_home_ave;
FU_wt_clinic_ave = Save_FU_wt_clinic_ave;
FU_wt_home_ave = Save_FU_wt_home_ave;
IF NOT MISSING(BL_wt_clinic_ave) THEN
BL_Wt = BL_wt_clinic_ave;
ELSE
IF NOT MISSING(BL_wt_home_ave) THEN
BL_Wt = BL_wt_home_ave;
IF NOT MISSING(FU_wt_clinic_ave) THEN
FU_Wt = FU_wt_clinic_ave;
ELSE
IF NOT MISSING(FU_wt_home_ave) THEN
FU_Wt = FU_wt_home_ave;
Wt_change = FU_Wt - BL_wt;
OUTPUT;
CALL MISSING(Save_BL_wt_clinic_ave, Save_BL_wt_home_ave, Save_FU_wt_clinic_ave, Save_FU_wt_home_ave);
END;
RUN;
The above Data step follows immediately after the Data step that you coded and uses the output of your Data step in this Data step's SET statement.
Results:
Is this what you wanted in terms of results? It's basically one line per ID with all of the data consolidated from all of the raw input data lines.
Jim
It is really close! I am trying to do the same thing, but not consolidate the lines. Each of the observations in the IDs is a different time point. I was hoping that I could have the BL_Wt and the Fu_Wt replicate for each of the observations within ID. However, all the rest is spot on!
Oh, OK. That's not too difficult.
Do the following:
PROC SQL;
CREATE TABLE WORK.Test_Final
AS SELECT Detail.*
,Summary.BL_Wt
,Summary.FU_Wt
,Summary.Wt_Change
FROM WORK.Test Detail
INNER JOIN WORK.Test_Summary Summary
ON Detail.ID = Summary.ID
;
QUIT;
You should get the following results:
Jim
In the data you provided, there is always exactly one baseline measuremeant (never 2 or more) and exactly one follow-up measurement (never two or more), so this will work:
proc summary data=test nway;
class id;
var bl_wt_clinic_ave--fu_wt_home_ave;
output out=stats(drop=_:) max=max1-max4;
run;
data want;
merge test stats;
by id;
bl_wt=max(max1,max2);
fu_wt=max(max3,max4);
wt_change=bl_wt-fu_wt;
drop max1-max4;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.