@samp945 wrote:
One question though: I am not familiar with the _ALL_ variable that you used on the by statement. If I understand correctly, using _ALL_ after the first two variables (time stratumnum) has the effect of including all possible combinations of every other variable without actually having to write those all out. Is that correct?
The variable list _ALL_ stands for all variables in dataset KMSurvivalPlotOffType, ordered by variable number (see PROC CONTENTS output). So, listing other variable names before it in the BY statement actually creates a list with duplicate names, but the duplicates (here: time and stratumnum, which are also contained in _ALL_) are ignored, as mentioned in the second note in the log.
Putting time stratumnum first in the BY statement ensures that dataset WANT will be sorted by time and stratumnum, regardless of their position (variable number) in the dataset, which I didn't want to make assumptions about. Sorting by time is crucial for the SERIES statement of the PROC SGPLOT step. The secondary sort key stratumnum determines the order of the treatment groups in the x-axis table and in the legend, as well as the color assignment. At least this is true for the simulated sample data. You could insert the DESCENDING keyword before stratumnum to switch the treatment order.
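For illustration, here is a minimal sketch of what that reversed-order sort could look like. It assumes the dataset names mentioned above (KMSurvivalPlotOffType as input, WANT as output) and the NODUPKEY option; it is a reconstruction from the description, not necessarily the exact step posted earlier, and I have only reasoned about it for the simulated sample data.

```sas
/* Sketch only: sort by time, then by stratumnum in descending order. */
/* _ALL_ appends the remaining variables as further BY variables;     */
/* time and stratumnum occur in _ALL_ again and are ignored there.    */
proc sort data=KMSurvivalPlotOffType out=want nodupkey;
  by time descending stratumnum _all_;
run;
```

DESCENDING applies only to the variable that immediately follows it, so time is still sorted in ascending order, as the SERIES statement requires.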
I think the sort order of the remaining variables in KMSurvivalPlotOffType has no impact on the graph, so covering them by the abbreviation _ALL_ was a convenient way to have NODUPKEY remove duplicate observations without losing any combination of values.
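To see the effect of NODUPKEY combined with _ALL_ in isolation, a small made-up example may help. The dataset HAVE and the variable survival are hypothetical and serve only as illustration; they are not taken from the original data.

```sas
/* Hypothetical toy data: the first two observations are identical */
/* in every variable, the others differ in at least one variable.  */
data have;
  input time stratumnum survival;
  datalines;
0 1 1.0
0 1 1.0
5 1 0.9
5 2 0.9
;
run;

/* With _ALL_ in the BY statement every variable is a BY variable,   */
/* so NODUPKEY deletes only observations that are complete          */
/* duplicates. WANT ends up with 3 of the 4 observations.           */
proc sort data=have out=want nodupkey;
  by time stratumnum _all_;
run;
```

Without _ALL_ (that is, with only time stratumnum in the BY statement), NODUPKEY would keep just one observation per time-stratumnum combination, which could silently drop observations that differ in other variables.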