Hi,
I want to subtract two different variables that are populated on different observations within the same ID. A simplified version of my data is presented below:
data work.test ;
infile datalines dsd dlm='|' truncover;
input redcap_event_name :$23. motivate_weightloss
postvisit_motivate_wgtloss ID
;
format redcap_event_name $23. motivate_weightloss best.
postvisit_motivate_wgtloss best.
;
informat redcap_event_name $23. ;
label redcap_event_name='redcap_event_name'
motivate_weightloss='motivate_weightloss'
postvisit_motivate_wgtloss='postvisit_motivate_wgtloss'
;
datalines4;
screener_arm_1|10||1
baseline_arm_1|||1
post_appointment_arm_1||8|1
12_week_follow_up_arm_1|||1
screener_arm_1|8||2
baseline_arm_1|||2
post_appointment_arm_1||7|2
12_week_follow_up_arm_1|||2
screener_arm_1|9||4
baseline_arm_1|||4
post_appointment_arm_1||9|4
12_week_follow_up_arm_1|||4
screener_arm_1|8||6
baseline_arm_1|||6
post_appointment_arm_1||8|6
12_week_follow_up_arm_1|||6
screener_arm_1|9||8
baseline_arm_1|||8
post_appointment_arm_1||7|8
12_week_follow_up_arm_1|||8
;;;;
I want postvisit_motivate_wgtloss - motivate_weightloss
but postvisit_motivate_wgtloss is only populated at the post_appointment_arm and motivate_weightloss is only populated at the screener_arm. This seems like it should be really simple.
I tried the following which didnt quite get me there but I think it is close. I am trying to streamline my code for efficiency because I need to do this several times. I think it is worth mentioning that it is also a big dataset.
DATA WORK.Test_summary;
SET work.test;
BY ID;
DROP S_: redcap_event_name;
RETAIN S_BL_motivate_weightloss S_BL_postvisit_motivate_wgtloss S_FU_motivate_weightloss S_FU_postvisit_motivate_wgtloss;
IF NOT MISSING(BL_motivate_weightloss) THEN
S_BL_motivate_weightloss = BL_motivate_weightloss;
IF NOT MISSING(FU_postvisit_motivate_wgtloss) THEN
S_FU_postvisit_motivate_wgtloss = FU_postvisit_motivate_wgtloss;
IF LAST.ID THEN
DO;
BL_motivate_weightloss = S_BL_motivate_weightloss;
FU_postvisit_motivate_wgtloss = S_FU_postvisit_motivate_wgtloss;
IF NOT MISSING(BL_motivate_weightloss) THEN
BL_Wt = BL_motivate_weightloss;
IF NOT MISSING(FU_postvisit_motivate_wgtloss) THEN
FU_Wt = FU_postvisit_motivate_wgtloss;
Wt_change = FU_Wt - BL_wt;
OUTPUT;
CALL MISSING(S_BL_motivate_weightloss, S_BL_postvisit_motivate_wgtloss, S_FU_motivate_weightloss, S_FU_postvisit_motivate_wgtloss);
END;
RUN;
PROC SQL;
CREATE TABLE WORK.Test_Final
AS SELECT Detail.*
,Summary.BL_Wt
,Summary.FU_Wt
,Summary.Wt_Change
FROM WORK.Test Detail
INNER JOIN WORK.Test_Summary Summary
ON Detail.ID = Summary.ID
;
QUIT;
Any help would be appreciated.
You only need one proc sql step:
proc sql;
/* create table test_final as */
select
a.ID,
a.motivate_weightloss,
b.postvisit_motivate_wgtloss,
b.postvisit_motivate_wgtloss - a.motivate_weightloss as weightloss_change
from
test as a left join
test as b on a.id=b.id
where
a.redcap_event_name = "screener_arm_1" and
b.redcap_event_name = "post_appointment_arm_1";
quit;
You only need one proc sql step:
proc sql;
/* create table test_final as */
select
a.ID,
a.motivate_weightloss,
b.postvisit_motivate_wgtloss,
b.postvisit_motivate_wgtloss - a.motivate_weightloss as weightloss_change
from
test as a left join
test as b on a.id=b.id
where
a.redcap_event_name = "screener_arm_1" and
b.redcap_event_name = "post_appointment_arm_1";
quit;
I think the following code should do what you're asking, yes?
DATA WORK.Consolidated_Test(RENAME=( Sv_Motivate_Weightloss = Motivate_Weightloss
Sv_Postvisit_Motivate_Wgtloss = Postvisit_Motivate_Wgtloss));
DROP Redcap_Event_Name;
FORMAT ID Sv_Motivate_Weightloss Sv_Postvisit_Motivate_Wgtloss Weight_Change;
RETAIN Sv_Motivate_Weightloss Sv_Postvisit_Motivate_Wgtloss;
SET WORK.TEST;
BY ID
NOTSORTED
;
IF NOT MISSING(Motivate_Weightloss) THEN
Sv_Motivate_Weightloss = Motivate_Weightloss;
IF NOT MISSING(Postvisit_Motivate_Wgtloss) THEN
Sv_Postvisit_Motivate_Wgtloss = Postvisit_Motivate_Wgtloss;
IF LAST.ID THEN
DO;
Weight_Change = Sv_Motivate_Weightloss - Sv_Postvisit_Motivate_Wgtloss;
OUTPUT;
END;
ELSE
DO;
DELETE;
END;
RUN;
If you know for a fact that the ID's are going to be in sort order than you can use the NOTSORTED option that I've specified above. Otherwise you would need to add a Proc Sort and removed the NOTSORTED.
Results are one line per ID as shown below.
Jim
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.