BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
awardell
Obsidian | Level 7

Hello! 

 

I am working on a project, and to my understanding, I need a way to sum values down a column, by an id group: however my variable needs to sum only values from visits that were achieved from dates above it. Additionally, I will need to account for  another variable and subtract it when applicable. 

 

For example, say I am a doctor, with data on all of the visits from patients. If a patient needs stitches, I total the number of stitches needed for that visit. The total number of stitches should total the number of stitches received in the previous visit. For these data, I will need to account for stitches removed. Thus, if stitches were removed, they should be subtracted from the total number of stitches. From the given variables ID, visit_order, num_stitches, and num_removed, I need to create total_stitches, num_never_removed, and patientlevel_num_never_removed. Where the num_never_removed = total_stitches - num_removed. 

 

ID visit_order num_stitches num_removed total_stitches (to this point) num_never_removed (0s could replace missings here) patientlevel_num_never_removed
X 1 4 . 4 . 2
X 2 . 2 4 2 (= 4-2) 2
X 3 3 . 5 (= 4+3 - 2)) . 2
X 4 . 3 5 2 (= 5-3) 2
X 5 . . 2 (= 5+. -2)  2
A 1 5 . 5 . 0
A 2 . . 5 . 0
A 3 3 . 8 . 0
B 1 2 . 2 . 3
B 2 4 . 6 . 3
B 3 . 3 6 3 (=6-3) 3
B 4 12 . 15 (6+12 - 3) . 3
and so on             
data have; 
input ID $ visit_order num_stitches num_removed; 
datalines; 
X 1 4 .
X 2 . 2
X 3 3 . 
X 4 . 3
X 5 . .
A 1 5 .
A 2 . .
A 3 3 .
B 1 2 .
B 2 4 .
B 3 . 3
B 4 . 12

; 
run; 

I attempted dabbling with the lag function, but I could not get a solution that worked properly for each scenario. Thank you in advance for any help you can provide. I have been working on this problem for hours and have not been able to find a solution. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

To do the "remerge" in one step, use a double DO loop:

data have;
input ID $ visit_order num_stitches num_removed;
datalines;
X 1 4 .
X 2 . 2
X 3 3 . 
X 4 . 3
X 5 . .
A 1 5 .
A 2 . .
A 3 3 .
B 1 2 .
B 2 4 .
B 3 . 3
B 4 12 .
;

data want;
do until (last.id);
  set have;
  by ID notsorted;
  retain total_stitches;
  if first.id then total_stitches = 0;
  total_stitches + num_stitches;
  if num_removed ne . then num_never_removed = total_stitches - num_removed;
  total_stitches + (-num_removed);
end;
patient_level = sum(0,num_never_removed);
do until (last.id);
  set have;
  by ID notsorted;
  retain total_stitches;
  if first.id then total_stitches = 0;
  total_stitches + num_stitches;
  if num_removed ne .
  then num_never_removed = total_stitches - num_removed;
  else num_never_removed = .;
  output;
  total_stitches + (-num_removed);
end;
run;

Note that I corrected the last observation of your example data.

View solution in original post

2 REPLIES 2
ballardw
Super User

Your logic for "total_stitches" makes no sense starting on the second row. Since 2 are removed on the second visit for for ID=X why would the total be 2? What is the logic for subtracting them on the third visit total? Same with visit 5.

 

This implements your (confusing to me) rule for total.

data want;
   set have;
   by notsorted id;
   retain total_stitches;
   lnr=lag(num_removed);
   if first.id then do;
total_stitches=0;
lnr=0;
end; if lnr>0 then total_stitches = sum(total_stitches,num_stitches) - lnr; else total_stitches = sum(total_stitches,num_stitches); if num_removed>0 then num_never_removed=total_stitches - num_removed; drop lnr; run;

Since the total needs to be reset for reach id we use BY group processing since the that will allow the use of the automatic variable First.id to test whether the current record is the first one for the id and do the reset of the total.

The notsorted option is needed because your ID values are not sorted. Probably  better would be to sort by ID and Visit_order in case there are any breaks in visits in the data and not use the Notsorted.

The First (and Last) are numeric values of 1/0 which SAS uses for True/False.

The Retain function creates a variable that will keep values across the data step boundary.

The Lag function pulls the value from the previous record for a variable. The First also resets the lag as you would not want to use the num_removed from the previous Id if there happens to be one.

 

The overall value you show as patientlevel_num_never_removed would have to be determined after the data step and merged back onto this data.

 

You are willing to create and use much longer variables then I would.

Have you encountered the SAS LABEL statement that would allow you to associate longer and nicer to read as column headings such as "Patient Level number of stitches never removed" with a much shorter variable like ns_nr?

Kurt_Bremser
Super User

To do the "remerge" in one step, use a double DO loop:

data have;
input ID $ visit_order num_stitches num_removed;
datalines;
X 1 4 .
X 2 . 2
X 3 3 . 
X 4 . 3
X 5 . .
A 1 5 .
A 2 . .
A 3 3 .
B 1 2 .
B 2 4 .
B 3 . 3
B 4 12 .
;

data want;
do until (last.id);
  set have;
  by ID notsorted;
  retain total_stitches;
  if first.id then total_stitches = 0;
  total_stitches + num_stitches;
  if num_removed ne . then num_never_removed = total_stitches - num_removed;
  total_stitches + (-num_removed);
end;
patient_level = sum(0,num_never_removed);
do until (last.id);
  set have;
  by ID notsorted;
  retain total_stitches;
  if first.id then total_stitches = 0;
  total_stitches + num_stitches;
  if num_removed ne .
  then num_never_removed = total_stitches - num_removed;
  else num_never_removed = .;
  output;
  total_stitches + (-num_removed);
end;
run;

Note that I corrected the last observation of your example data.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1075 views
  • 1 like
  • 3 in conversation