Solved: Re: Making dummy values continue into subsequent observations

udden2903 · Posted 04-12-2016 07:50 AM

Hi,

I have a dataset with records of patients and their diagnoses. Here is a simplified version:

data patients ;
infile datalines dsd delimiter=' '; 
input patientID $ year $ diagA $ diagB $ diagC $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;
run;

What code can I use to make sure that once a patient has received a diagnosis, a value of 1 is inserted under that variable for each subsequent observation? To illustrate, I want this:

data patients2 ;
infile datalines dsd delimiter=' '; 
input patientID $ year $ diagA $ diagB $ diagC $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . 1 1
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 1 1 .
;
run;

RW9 · Posted 04-12-2016 08:05 AM

A bit verbose; I'm in a meeting so can't think properly:

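data want (drop=lstdiag:);
 set patients;
 by patientid; /* requires patients to be sorted by patientid */
 retain lstdiaga lstdiagb lstdiagc;
 /* first.patientid resets the carried value so it cannot leak into the next patient */
 if first.patientid or diaga ne "" then lstdiaga=diaga;
 if first.patientid or diagb ne "" then lstdiagb=diagb;
 if first.patientid or diagc ne "" then lstdiagc=diagc;
 diaga=coalescec(diaga,lstdiaga);
 diagb=coalescec(diagb,lstdiagb);
 diagc=coalescec(diagc,lstdiagc);
run;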

udden2903 · Posted 04-12-2016 09:05 AM

Thank you for your help, I can see the logic in your code. The problem is that I have more than 10 diagnosis variables, and I am reluctant to write one line of code for each one of them. Do you know if there's any way of working with arrays to get around this?

Kurt_Bremser · Posted 04-12-2016 09:14 AM

@udden2903 wrote:
Thank you for your help, I can see the logic in your code. The problem is that I have more than 10 diagnosis variables, and I am reluctant to write one line of code for each one of them. Do you know if there's any way of working with arrays to get around this?

@data_null__'s code will take care of that quite nicely, as it lets SAS do it automatically.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 04-12-2016 09:20 AM

Why do you have more than 10 diagnosis variables? If you normalise your data you will find programming is far easier:

PatientId  Diag_No  Diag
1          1        xyz
1          2        zyf

You can do it with arrays of course, but it just makes programming more tricky:

%let elements=3;

data patients ; 
  infile datalines dsd delimiter=' '; 
  input patientID $ year $ diagA $ diagB $ diagC $ ; 
datalines; 
1 2010 . . . 
1 2011 . 1 . 
1 2012 . . 1 
1 2014 . . . 
2 2009 1 . . 
2 2010 1 . . 
2 2013 . 1 . 
2 2015 . . . 
; 
run;

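data want (drop=i ret:);
  set patients;
  by patientid; /* requires patients to be sorted by patientid */
  array ret{&elements.} $3;
  array act{&elements.} diag:;
  retain ret:;
  do i=1 to &elements.;
    /* first.patientid resets the carried value at the start of each patient */
    if first.patientid or act{i} ne "" then ret{i}=act{i};
    act{i}=coalescec(act{i},ret{i});
  end;
run;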

udden2903 · Posted 04-12-2016 09:27 AM

A patient can have many different combinations of diagnoses over time. Concatenating the diagnosis dummy variables into one single variable will not be helpful as I want to know, for each observation, all of the diagnoses the patient has had up until that point. I think I would miss that with your solution...

RW9 · Posted 04-12-2016 09:36 AM

No, any processing you can do with transposed data you can also do with normalised data. The only difference is that you don't need to know how many observations there are up front, because the structure does not change, only the amount of data to process. With transposed data, the structure changes with more or fewer data elements, which makes programming more difficult.
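
To make the normalised idea concrete, here is a minimal sketch of one possible reading of that layout. It is not RW9's own code. The dataset names diag_long, first_diag and had_by_2013, the diag codes A/B/C, and the numeric year are assumptions made for illustration.

```sas
/* Hypothetical long layout: one row per recorded diagnosis. */
data diag_long;
   input patientID $ year diag $;
   datalines;
1 2011 B
1 2012 C
2 2009 A
2 2010 A
2 2013 B
;
run;

/* First year in which each diagnosis was recorded for each patient. */
proc sql;
   create table first_diag as
   select patientID, diag, min(year) as first_year
   from diag_long
   group by patientID, diag;

   /* A patient "has" a diagnosis at a given point if first_year is not later than it. */
   create table had_by_2013 as
   select patientID, diag
   from first_diag
   where first_year <= 2013;
quit;
```

Adding an eleventh or twelfth diagnosis only adds rows to diag_long; neither query needs to change.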

Kurt_Bremser · Posted 04-12-2016 08:42 AM

A slight variation on @RW9's logic:

%macro checkvar(variable);
retain old&variable;
if not first.patientid and old&variable ne '' then &variable = old&variable;
old&variable = &variable;
drop old&variable;
%mend;

data patients2;
set patients;
by patientid;
%checkvar(diaga);
%checkvar(diagb);
%checkvar(diagc);
run;

udden2903 · Posted 04-12-2016 09:18 AM

Is there a way of using the retain statement with an array statement to solve my problem? I have more than 10 diagnoses in my original dataset and I would like to make the code as efficient as possible by invoking something like an array statement.

data_null__ · Posted 04-12-2016 08:44 AM

Your want doesn't look right to me.

data patients; 
   input patientID $ year $ diagA $ diagB $ diagC $; 
   datalines; 
1 2010 . . . 
1 2011 . 1 . 
1 2012 . . 1 
1 2014 . . . 
2 2009 1 . . 
2 2010 1 . . 
2 2013 . 1 . 
2 2015 . . . 
;;;; 
   run;
data patients;
   update patients(obs=0) patients;
   by patientid;
   output;
   run;
proc print;
   run;

Kurt_Bremser · Posted 04-12-2016 08:57 AM

Creative! Like it, @data_null__

data_null__ · Posted 04-12-2016 09:36 AM

This version of the update method ensures that only the variables of interest ("DIAG:") are carried forward.

data patients; 
   input patientID $ year $ diagA $ diagB $ diagC $; 
   datalines; 
1 2010 . . . 
1 2011 . 1 . 
1 2012 . . 1 
1 2014 . . . 
2 2009 1 . . 
2 2010 1 . . 
2 2013 . 1 . 
2 2015 . . . 
;;;; 
   run;
%let locf=patientid diag:;
data patients;
   if 0 then set patients;
   update patients(obs=0 keep=&locf) patients(keep=&locf);
   by patientid;
   set patients(drop=&locf);
   output;
   run;
proc print;
   run;

udden2903 · Posted 04-12-2016 09:53 AM

I can't get this code to work. In my actual dataset, the diagnoses have names like "L40" and "K50", so I would have to change this part of your code:

%let locf=patientid diag:;

I tested it for the L40 diagnosis, using (%let locf=patientid L:; ), but it still does not work.

data_null__ · Posted 04-12-2016 10:21 AM

You need to mention all relevant details when you post. If your names are not DIAG: as you implied, then obviously DIAG: will not work for you.

If you were to define an array of the diagnosis variables then how would you do that?

Would you use a name range list? You can always list the individual names. Or use some combination of "SAS Variable Lists" and names etc.

You need to learn about the "SAS Variable List".
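
As a rough illustration of the variable-list forms mentioned above, the sketch below defines the same array two ways. The dataset icd and the column names K50, K51, L40 and M05 are hypothetical, chosen only to resemble the names described in the thread.

```sas
/* Hypothetical data with diagnosis columns named like ICD codes. */
data icd;
   input patientID $ year K50 $ K51 $ L40 $ M05 $;
   datalines;
1 2010 . 1 . .
1 2011 . . 1 .
;
run;

data _null_;
   set icd;
   /* Name prefix lists: every variable whose name starts with K, L or M. */
   array by_prefix{*} K: L: M:;
   /* Name range list: every variable from K50 to M05 in dataset order. */
   array by_range{*} K50--M05;
   n_prefix = dim(by_prefix);
   n_range  = dim(by_range);
   put n_prefix= n_range=;
run;
```

Note that a prefix list such as L: picks up every variable whose name starts with that letter, so it is worth checking that no unrelated columns match. The same lists can be used in the macro variable, for example %let locf=patientID K: L: M:;.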

udden2903 · Posted 04-14-2016 01:11 AM

I implemented your code, replacing diag: with K: L: M:

You are right, I shouldn't have named my diagnosis variables DiagA, DiagB and DiagC. I was simply trying to make the problem more straightforward to the viewer, but I will avoid making such changes in the future.

Would you mind quickly explaining what your code does? I'm new to macros and I think gaining some understanding of your code would be helpful as I start learning more about them.
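
For what it's worth, here is a commented sketch of how the first UPDATE step appears to work. This is a reading of the code, not data_null__'s own explanation, and the output name patients_locf is made up for illustration. Only the %let line in the later version is macro code: it stores a piece of text (a variable list) that is pasted in wherever &locf appears.

```sas
/* Sample data from the thread; it is already sorted by patientID. */
data patients;
   input patientID $ year $ diagA $ diagB $ diagC $;
   datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;
run;

data patients_locf;   /* hypothetical output name */
   /* Master: patients(obs=0) contributes the variable definitions but no rows.  */
   /* Transaction: patients itself, applied row by row within each BY group.     */
   /* By default UPDATE does not let a missing transaction value overwrite a     */
   /* non-missing value already held, so earlier diagnoses stay in place.        */
   update patients(obs=0) patients;
   by patientID;
   /* UPDATE normally writes one row per BY group, after its last transaction.   */
   /* The explicit OUTPUT writes a row after every transaction instead.          */
   output;
run;

proc print data=patients_locf;
run;
```

Because the master has no rows, each new patientID starts from a fresh observation, which is why values do not carry over from one patient to the next.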
