chavens
Fluorite | Level 6

Hi All,

 

I am trying to do a survival analysis where the event is "outcome=1". Ultimately, I want to create a time variable defined as the time at which a participant first develops "outcome=1" (I have a separate time variable for each participant follow-up). I already transposed the data into wide format, and the maximum number of timepoints for any subject is 14. Thus, there are 14 outcome variables (outcome1-outcome14). I dropped participants who had outcome1=1.

 

The hard part is that even if outcome2=1, outcome3 could also equal 1, as could outcome4, for the same participant. I only want the first occurrence, and I want to delete the rest of the data because it becomes useless. The code that I am using to try to accomplish this won't work. This code is only the first step, but if I get it to work, the rest should be pretty simple. Any help would be much appreciated.


data ad47;
set ad46;
if outcome2=1 then outcomefinal2=1;
if outcomefinal2=1 then drop outcome3-outcome14;

if outcome3=1 then outcomefinal3=1;
if outcomefinal3=1 then drop outcome4-outcome14;

if outcome4=1 then outcomefinal4=1;
if outcomefinal4=1 then drop outcome5-outcome14;

if outcome5=1 then outcomefinal5=1;
if outcomefinal5=1 then drop outcome6-outcome14;

if outcome6=1 then outcomefinal6=1;
if outcomefinal6=1 then drop outcome7-outcome14;

if outcome7=1 then outcomefinal7=1;
if outcomefinal7=1 then drop outcome8-outcome14;

if outcome8=1 then outcomefinal8=1;
if outcomefinal8=1 then drop outcome9-outcome14;

if outcome9=1 then outcomefinal9=1;
if outcomefinal9=1 then drop outcome10-outcome14;

if outcome10=1 then outcomefinal10=1;
if outcomefinal10=1 then drop outcome11-outcome14;

if outcome11=1 then outcomefinal11=1;
if outcomefinal11=1 then drop outcome12-outcome14;

if outcome12=1 then outcomefinal2=1;
if outcomefinal12=1 then drop outcome13-outcome14;

if outcome13=1 then outcomefinal13=1;
if outcomefinal13=1 then drop outcome14;

if outcome14=1 then outcomefinal14=1;
/*time=max(of MosLater1 - MosLater14);*/
run;

 

This is an example of the error that I am getting...

 

1350 if outcome13=1 then outcomefinal13=1;
1351 if outcomefinal13=1 then drop outcome14;
----
180
ERROR 180-322: Statement is not valid or it is used out of proper order.

 

Thanks,

 

chavens

4 REPLIES
Reeza
Super User

That doesn't work. A dataset has to have the same variables in every row, so you can't drop variables based on the values in one row when another row may provide contradictory instructions.

 

Are you trying to set them to missing instead? Or determine the first variable that contains a 1?

 

If the first, use CALL MISSING() to set the remaining variables to missing.

If the second, use WHICHN to find the first occurrence.

 

If you need further help consider posting sample data (input & output) that illustrates your issue instead of code. 

 

Patrick
Opal | Level 21

@chavens

Given that you've already asked similar questions earlier (though in the end that one was about picking the max time value): why don't you post representative sample data (a data step creating such "have" data) in your long format, before you even transpose it, and then explain to us what the desired result set should look like? Ideally, also post a data step or a table that gives us the desired result for your sample "have" data.

 

You're telling us this is only part of the problem. If possible, I suggest you explain the full data manipulation logic you want to implement (show us the desired end result and explain the logic for getting there). From what you've posted and asked so far, I'd assume that a lot of what you want to do can be combined into one or two data steps.

Kurt_Bremser
Super User

When posting log snippets, use the {i} button. This preserves formatting (the main posting window removes white space and therefore destroys the positioning information of the ERROR message).

 

And it is impossible to remove variables from certain observations only. The drop statement is a declarative statement that is interpreted once when the data step is compiled. Using it conditionally runs counter to the concept of datasets and the data step.
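
To illustrate the difference, here is a minimal sketch (not the original poster's data; the dataset names demo and demo2 and the three-variable layout are made up for the example). CALL MISSING is an executable statement and can be applied row by row, while DROP is resolved at compile time and applies to the whole output dataset regardless of where it appears in the step:

```sas
/* Hypothetical data: two rows, three outcome variables */
data demo;
  input id outcome1-outcome3;
  datalines;
1 0 1 1
2 0 0 1
;

data demo2;
  set demo;
  /* Executable, per-row action: blank outcome3 only where outcome2=1 */
  if outcome2=1 then call missing(outcome3);
  /* Declarative, whole-dataset action: outcome1 is removed from every row */
  drop outcome1;
run;
```

In demo2, outcome3 is missing only for id 1, whereas outcome1 is absent from the dataset altogether.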

art297
Opal | Level 21

Sounds to me like you'd be best off incorporating BOTH of @Reeza's suggestions, i.e., use the whichn function to find the first outcome of 1, then use its location to set all remaining outcomes to missing. For example:

 

data have;
  input id outcome1-outcome14;
  datalines;
1 3 2 1 1 1 1 4 5 1 1 1 1 1 1
2 3 2 3 3 3 4 1 5 2 2 2 2 2 2
3 2 2 2 2 2 2 2 2 2 2 2 2 2 1
4 2 2 2 2 2 2 2 2 2 2 2 2 2 2
;

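data want (drop=start i);
  set have;
  array outcomes(*) outcome1-outcome14;
  /* index of the first outcome equal to 1 (0 if there is none) */
  start=whichn(1,of outcomes(*));
  /* set every outcome after the first 1 to missing */
  if 1 le start lt dim(outcomes) then do i=start+1 to dim(outcomes);
    call missing(outcomes(i));
  end;
run;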
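
The same whichn index could also be used to pick the follow-up time for the survival analysis. The following is only a sketch under assumptions: it supposes that ad46 holds the time variables MosLater1-MosLater14 alongside outcome1-outcome14 (the names come from the commented-out line in the question), and that subjects who never reach outcome=1 should be censored at their last recorded follow-up. The names want_time, event and time are made up for the example.

```sas
/* Assumes ad46 contains outcome1-outcome14 and MosLater1-MosLater14 */
data want_time (drop=start);
  set ad46;
  array outcomes(*) outcome1-outcome14;
  array mos(*) MosLater1-MosLater14;
  /* index of the first outcome equal to 1 (0 if there is none) */
  start=whichn(1,of outcomes(*));
  if start gt 0 then do;
    event=1;
    time=mos(start);          /* follow-up time at the first outcome=1 */
  end;
  else do;
    event=0;
    time=max(of mos(*));      /* assumed censoring time: latest follow-up */
  end;
run;
```

Whether the latest follow-up is the right censoring time depends on the study design, so that part would need checking against the actual data.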

Art, CEO, AnalystFinder.com

 
