Hi,
I want to subtract two different variables that are populated on different observations within the same ID. A simplified version of my data is presented below:
data work.test ;
infile datalines dsd dlm='|' truncover;
input redcap_event_name :$23. motivate_weightloss
postvisit_motivate_wgtloss ID
;
format redcap_event_name $23. motivate_weightloss best.
postvisit_motivate_wgtloss best.
;
informat redcap_event_name $23. ;
label redcap_event_name='redcap_event_name'
motivate_weightloss='motivate_weightloss'
postvisit_motivate_wgtloss='postvisit_motivate_wgtloss'
;
datalines4;
screener_arm_1|10||1
baseline_arm_1|||1
post_appointment_arm_1||8|1
12_week_follow_up_arm_1|||1
screener_arm_1|8||2
baseline_arm_1|||2
post_appointment_arm_1||7|2
12_week_follow_up_arm_1|||2
screener_arm_1|9||4
baseline_arm_1|||4
post_appointment_arm_1||9|4
12_week_follow_up_arm_1|||4
screener_arm_1|8||6
baseline_arm_1|||6
post_appointment_arm_1||8|6
12_week_follow_up_arm_1|||6
screener_arm_1|9||8
baseline_arm_1|||8
post_appointment_arm_1||7|8
12_week_follow_up_arm_1|||8
;;;;
I want postvisit_motivate_wgtloss - motivate_weightloss
but postvisit_motivate_wgtloss is only populated at the post_appointment_arm and motivate_weightloss is only populated at the screener_arm. This seems like it should be really simple.
I tried the following which didnt quite get me there but I think it is close. I am trying to streamline my code for efficiency because I need to do this several times. I think it is worth mentioning that it is also a big dataset.
DATA WORK.Test_summary;
SET work.test;
BY ID;
DROP S_: redcap_event_name;
RETAIN S_BL_motivate_weightloss S_BL_postvisit_motivate_wgtloss S_FU_motivate_weightloss S_FU_postvisit_motivate_wgtloss;
IF NOT MISSING(BL_motivate_weightloss) THEN
S_BL_motivate_weightloss = BL_motivate_weightloss;
IF NOT MISSING(FU_postvisit_motivate_wgtloss) THEN
S_FU_postvisit_motivate_wgtloss = FU_postvisit_motivate_wgtloss;
IF LAST.ID THEN
DO;
BL_motivate_weightloss = S_BL_motivate_weightloss;
FU_postvisit_motivate_wgtloss = S_FU_postvisit_motivate_wgtloss;
IF NOT MISSING(BL_motivate_weightloss) THEN
BL_Wt = BL_motivate_weightloss;
IF NOT MISSING(FU_postvisit_motivate_wgtloss) THEN
FU_Wt = FU_postvisit_motivate_wgtloss;
Wt_change = FU_Wt - BL_wt;
OUTPUT;
CALL MISSING(S_BL_motivate_weightloss, S_BL_postvisit_motivate_wgtloss, S_FU_motivate_weightloss, S_FU_postvisit_motivate_wgtloss);
END;
RUN;
PROC SQL;
CREATE TABLE WORK.Test_Final
AS SELECT Detail.*
,Summary.BL_Wt
,Summary.FU_Wt
,Summary.Wt_Change
FROM WORK.Test Detail
INNER JOIN WORK.Test_Summary Summary
ON Detail.ID = Summary.ID
;
QUIT;
Any help would be appreciated.
You only need one proc sql step:
proc sql;
/* create table test_final as */
select
a.ID,
a.motivate_weightloss,
b.postvisit_motivate_wgtloss,
b.postvisit_motivate_wgtloss - a.motivate_weightloss as weightloss_change
from
test as a left join
test as b on a.id=b.id
where
a.redcap_event_name = "screener_arm_1" and
b.redcap_event_name = "post_appointment_arm_1";
quit;
You only need one proc sql step:
proc sql;
/* create table test_final as */
select
a.ID,
a.motivate_weightloss,
b.postvisit_motivate_wgtloss,
b.postvisit_motivate_wgtloss - a.motivate_weightloss as weightloss_change
from
test as a left join
test as b on a.id=b.id
where
a.redcap_event_name = "screener_arm_1" and
b.redcap_event_name = "post_appointment_arm_1";
quit;
I think the following code should do what you're asking, yes?
DATA WORK.Consolidated_Test(RENAME=( Sv_Motivate_Weightloss = Motivate_Weightloss
Sv_Postvisit_Motivate_Wgtloss = Postvisit_Motivate_Wgtloss));
DROP Redcap_Event_Name;
FORMAT ID Sv_Motivate_Weightloss Sv_Postvisit_Motivate_Wgtloss Weight_Change;
RETAIN Sv_Motivate_Weightloss Sv_Postvisit_Motivate_Wgtloss;
SET WORK.TEST;
BY ID
NOTSORTED
;
IF NOT MISSING(Motivate_Weightloss) THEN
Sv_Motivate_Weightloss = Motivate_Weightloss;
IF NOT MISSING(Postvisit_Motivate_Wgtloss) THEN
Sv_Postvisit_Motivate_Wgtloss = Postvisit_Motivate_Wgtloss;
IF LAST.ID THEN
DO;
Weight_Change = Sv_Motivate_Weightloss - Sv_Postvisit_Motivate_Wgtloss;
OUTPUT;
END;
ELSE
DO;
DELETE;
END;
RUN;
If you know for a fact that the ID's are going to be in sort order than you can use the NOTSORTED option that I've specified above. Otherwise you would need to add a Proc Sort and removed the NOTSORTED.
Results are one line per ID as shown below.
Jim
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.