BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
udden2903
Obsidian | Level 7

Hi,

 

I have a dataset with records of patients and their diagnoses. Here is a simplified version:

 

data patients ;
infile datalines dsd delimiter=' '; 
input patientID $ year $ diagA $ diagB $ diagC $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;
run;

What code can I use to make sure that once a patient has received a diagnosis, a value of 1 is inserted under that variable for each subsequent observation? To illustrate, I want this:

 

 

 

data patients2 ;
infile datalines dsd delimiter=' '; 
input patientID $ year $ diagA $ diagB $ diagC $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . 1 1
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 1 1 .
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Your want doesn't look right to me.

 

data patients; 
   input patientID $ year $ diagA $ diagB $ diagC $; 
   datalines; 
1 2010 . . . 
1 2011 . 1 . 
1 2012 . . 1 
1 2014 . . . 
2 2009 1 . . 
2 2010 1 . . 
2 2013 . 1 . 
2 2015 . . . 
;;;; 
   run;
data patients;
   update patients(obs=0) patients;
   by patientid;
   output;
   run;
proc print;
   run;

Capture.PNG

View solution in original post

19 REPLIES 19
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Bit verbose, am on a meeting so cant think properly:

data want (drop=lstdiag:);
set patients;
retain lstdiaga lstdiagb lstdiagc;
if diaga ne "" then lstdiaga=diaga;
if diagb ne "" then lstdiagb=diagb;
if diagc ne "" then lstdiagc=diagc;
diaga=coalescec(diaga,lstdiaga);
diagb=coalescec(diagb,lstdiagb);
diagc=coalescec(diagc,lstdiagc);
run;
udden2903
Obsidian | Level 7
Thank you for your help, I can see the logic in your code. The problem is that I have more than 10 diagnosis variables, and I am reluctant to write one line of code for each one of them. Do you know if there's any way of working with arrays to get around this?
Kurt_Bremser
Super User

@udden2903 wrote:
Thank you for your help, I can see the logic in your code. The problem is that I have more than 10 diagnosis variables, and I am reluctant to write one line of code for each one of them. Do you know if there's any way of working with arrays to get around this?

@data_null__'s code will take care of that quite nicely, as it lets SAS do it automatically.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why do you have more than 10 diagnosis variables?  If you normalise your data you will find programming is far easier:

PatientId    Diag_No    Diag

1               1               xyz

1               2               zyf

 

You can do it with arrays of course, but it just makes programming more tricky:

%let elements=3;

data patients ; 
  infile datalines dsd delimiter=' '; 
  input patientID $ year $ diagA $ diagB $ diagC $ ; 
datalines; 
1 2010 . . . 
1 2011 . 1 . 
1 2012 . . 1 
1 2014 . . . 
2 2009 1 . . 
2 2010 1 . . 
2 2013 . 1 . 
2 2015 . . . 
; 
run;

data want (drop=i ret:);
  set patients;
  array ret{&elements.} $3;
  array act{&elements.} diag:;
  retain ret:;
  do i=1 to &elements.;
    if act{i} ne "" then ret{i}=act{i};
    act{i}=coalescec(act{i},ret{i});
  end;
run;  

 

udden2903
Obsidian | Level 7
A patient can have many different combinations of diagnoses over time. Concatenating the diagnosis dummy variables into one single variable will not be helpful as I want to know, for each observation, all of the diagnoses the patient has had up until that point. I think I would miss that with your solution...
RW9
Diamond | Level 26 RW9
Diamond | Level 26

No, you can do any processing you can do with tranposed data with normalised data.  The only difference is you don't need to know how many observations up front as the structure does not change, only the amount of data to process.  With transposed data, the strcuture changes with more or fewer data elements, making programming more difficult.

 

Kurt_Bremser
Super User

A slight variation on @RW9's logic:

%macro checkvar(variable);
retain old&variable;
if not first.patientid and old&variable ne '' then &variable = old&variable;
old&variable = &variable;
drop old&variable;
%mend;

data patients2;
set patients;
by patientid;
%checkvar(diaga);
%checkvar(diagb);
%checkvar(diagc);
run;
udden2903
Obsidian | Level 7
Is there a way of using the retain statement with an array statement to solve my problem? I have more than 10 diagnoses in my original dataset and I would like to make the code as efficient as possible by invoking something like an array statement.
data_null__
Jade | Level 19

Your want doesn't look right to me.

 

data patients; 
   input patientID $ year $ diagA $ diagB $ diagC $; 
   datalines; 
1 2010 . . . 
1 2011 . 1 . 
1 2012 . . 1 
1 2014 . . . 
2 2009 1 . . 
2 2010 1 . . 
2 2013 . 1 . 
2 2015 . . . 
;;;; 
   run;
data patients;
   update patients(obs=0) patients;
   by patientid;
   output;
   run;
proc print;
   run;

Capture.PNG

data_null__
Jade | Level 19

This version of the update method unsures that only variables of interest "DIAG:" are carried forward.

 

data patients; 
   input patientID $ year $ diagA $ diagB $ diagC $; 
   datalines; 
1 2010 . . . 
1 2011 . 1 . 
1 2012 . . 1 
1 2014 . . . 
2 2009 1 . . 
2 2010 1 . . 
2 2013 . 1 . 
2 2015 . . . 
;;;; 
   run;
%let locf=patientid diag:;
data patients;
   if 0 then set patients;
   update patients(obs=0 keep=&locf) patients(keep=&locf);
   by patientid;
   set patients(drop=&locf);
   output;
   run;
proc print;
   run;

 

udden2903
Obsidian | Level 7

I can't get this code to work. In my actual dataset, the diagnoses have names like "L40" and "K50", so I would have to change this part of your code:

%let locf=patientid diag:;

I tested it for the L40 diagnosis, using (%let locf=patientid L:; ), but it still does not work.

data_null__
Jade | Level 19

You need to mention all relevant details when your post.  If your names are not DIAG: as you implied then obviously DIAG: will not work for you.

 

If you were to define an array of the diagnosis variables then how would you do that?

Would you use a name range list?  You can always list the individual names.  Or use some combination of "SAS Variable Lists" and names etc.

 

You need to learn about the "SAS Variable List".  

 

 

udden2903
Obsidian | Level 7

I implemented your code, replacing diag: with K: L: M:

 

You are right, I shouldn't have named my diagnosis variables DiagA, DiagB and DiagC. I was simply trying to make the problem more straightforward to the viewer, but I will avoid making such changes in the future.

 

Would you mind quickly explaining what your code does? I'm new to macros and I think gaining some understanding of your code would be helpful as I start learning more about them.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 2724 views
  • 3 likes
  • 5 in conversation