Conditionally Dropping variables

Occasional Contributor
Posts: 13

Hi All,


I am trying to do a survival analysis where the event is "outcome=1". Ultimately I want to create a time variable defined as the time at which a participant first develops "outcome=1" (I have a separate time variable for each participant follow-up). I have already transposed the data into wide format, and the maximum number of timepoints for any subject is 14, so there are 14 outcome variables (outcome1-outcome14). I dropped participants who had outcome1=1.


The hard part is that "outcome=1" can occur more than once for the same participant: if outcome2=1, outcome3 and outcome4 could also equal 1. I only want the first occurrence and basically want to delete the rest of the data, since it becomes useless. The code I am using to try to accomplish this won't work. It is only the first step, but if I get it to work, the rest should be pretty simple. Any help would be much appreciated.

data ad47;
set ad46;
if outcome2=1 then outcomefinal2=1;
if outcomefinal2=1 then drop outcome3-outcome14;

if outcome3=1 then outcomefinal3=1;
if outcomefinal3=1 then drop outcome4-outcome14;

if outcome4=1 then outcomefinal4=1;
if outcomefinal4=1 then drop outcome5-outcome14;

if outcome5=1 then outcomefinal5=1;
if outcomefinal5=1 then drop outcome6-outcome14;

if outcome6=1 then outcomefinal6=1;
if outcomefinal6=1 then drop outcome7-outcome14;

if outcome7=1 then outcomefinal7=1;
if outcomefinal7=1 then drop outcome8-outcome14;

if outcome8=1 then outcomefinal8=1;
if outcomefinal8=1 then drop outcome9-outcome14;

if outcome9=1 then outcomefinal9=1;
if outcomefinal9=1 then drop outcome10-outcome14;

if outcome10=1 then outcomefinal10=1;
if outcomefinal10=1 then drop outcome11-outcome14;

if outcome11=1 then outcomefinal11=1;
if outcomefinal11=1 then drop outcome12-outcome14;

if outcome12=1 then outcomefinal12=1;
if outcomefinal12=1 then drop outcome13-outcome14;

if outcome13=1 then outcomefinal13=1;
if outcomefinal13=1 then drop outcome14;

if outcome14=1 then outcomefinal14=1;
/*time=max(of MosLater1 - MosLater14);*/


This is an example of the error that I am getting...


1350 if outcome13=1 then outcomefinal13=1;
1351 if outcomefinal13=1 then drop outcome14;
ERROR 180-322: Statement is not valid or it is used out of proper order.





Super User
Posts: 17,784

Re: Conditionally Dropping variables

That doesn't work. A dataset has to have the same variables in every observation, so you can't drop variables based on the values in one row when another row may give contradictory instructions.


Are you trying to set them to missing instead? Or determine the first variable that contains a 1?


If the first, use CALL MISSING() to set the remaining variables to missing.

If the second, use WHICHN to find the first occurrence.
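
A minimal sketch of the WHICHN route, tied back to the survival-analysis goal in the question. It assumes the wide dataset ad46 holds outcome1-outcome14 together with matching follow-up times MosLater1-MosLater14 (the names that appear in the question's code). The variables first_hit, event, and time are hypothetical, and treating participants with no event as censored at their last recorded follow-up is an assumption, not something stated in the thread.

```sas
/* Sketch only: dataset and variable names are taken from the question;
   first_hit, event, and time are hypothetical names. */
data ad47;
  set ad46;
  array outcomes(*) outcome1-outcome14;
  array times(*)    MosLater1-MosLater14;

  /* WHICHN returns the position of the first argument equal to 1,
     or 0 when no argument matches */
  first_hit = whichn(1, of outcomes(*));

  if first_hit > 0 then do;
    event = 1;
    time  = times(first_hit);     /* follow-up time of the first outcome=1 */
  end;
  else do;
    event = 0;
    time  = max(of times(*));     /* assumption: censor at last recorded follow-up */
  end;
run;
```

With this approach the later outcome variables never need to be removed; they are simply ignored once the first event position is known.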


If you need further help consider posting sample data (input & output) that illustrates your issue instead of code. 


Respected Advisor
Posts: 3,887

Re: Conditionally Dropping variables


You've already asked similar questions earlier (though there it ended up being about picking the max time value), so why don't you post representative sample data (a data step creating such "have" data) in your long format, before you even transpose it, and then explain to us what the desired result set should look like? Ideally also post a data step or a table that gives us the desired result for your sample "have" data.


You're telling us this is only part of the problem. If possible, I suggest you explain the full data manipulation logic you want to implement (show us the desired end result and explain the logic for getting there). From what you've posted and asked so far, I'd assume that a lot of what you want to do can be combined into one or two data steps.

Super User
Posts: 6,928

Re: Conditionally Dropping variables

When posting log snippets, use the {i} button. This preserves formatting (the main posting window removes white space and therefore destroys the positioning information of the ERROR message).


And it is impossible to remove variables from certain observations only. The drop statement is a declarative statement that is interpreted once when the data step is compiled. Using it conditionally runs counter to the concept of datasets and the data step.
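
A small sketch of the difference between a declarative and an executable statement. It uses sashelp.class, a sample dataset shipped with SAS; the output name demo is hypothetical.

```sas
/* Sketch only: illustrates compile-time DROP versus run-time CALL MISSING */
data demo;
  set sashelp.class;

  /* Executable statement: evaluated for each observation, so it can be conditional */
  if age > 13 then call missing(height);

  /* Declarative statement: processed once at compile time, so weight is
     excluded from every observation of demo wherever this line appears */
  drop weight;
run;
```

Writing `if age > 13 then drop weight;` instead is rejected at compile time with the same ERROR 180-322 shown in the question.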

Maxims of Maximally Efficient SAS Programmers
Posts: 7,360

Re: Conditionally Dropping variables

Sounds to me like you'd be best off incorporating BOTH of @Reeza's suggestions, i.e., use the WHICHN function to find the first outcome of 1, then use its location to set all remaining outcomes to missing. For example:


data have;
  input id outcome1-outcome14;
  cards;
1 3 2 1 1 1 1 4 5 1 1 1 1 1 1
2 3 2 3 3 3 4 1 5 2 2 2 2 2 2
3 2 2 2 2 2 2 2 2 2 2 2 2 2 1
4 2 2 2 2 2 2 2 2 2 2 2 2 2 2
;

data want (drop=start i);
  set have;
  array outcomes(*) outcome1-outcome14;
  /* position of the first outcome equal to 1 (0 if there is none) */
  start=whichn(1,of outcomes(*));
  /* blank out every outcome after the first 1 */
  if 1 le start lt dim(outcomes) then do i=start+1 to dim(outcomes);
    call missing(outcomes(i));
  end;
run;

Art, CEO,

