Hi,
I have a dataset with records of patients and their diagnoses. Here is a simplified version:
data patients ;
infile datalines dsd delimiter=' ';
input patientID $ year $ diagA $ diagB $ diagC $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;
run;
What code can I use to make sure that once a patient has received a diagnosis, a value of 1 is inserted under that variable for each subsequent observation? To illustrate, I want this:
data patients2 ;
infile datalines dsd delimiter=' ';
input patientID $ year $ diagA $ diagB $ diagC $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . 1 1
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 1 1 .
;
run;
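Your desired output doesn't look right to me. Patient 2 already has diagA=1 in 2009 and 2010, so by your rule the 2013 row should carry diagA=1 as well.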
data patients;
input patientID $ year $ diagA $ diagB $ diagC $;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;;;;
run;
data patients;
update patients(obs=0) patients;
by patientid;
output;
run;
proc print;
run;
A bit verbose (I'm in a meeting, so I can't think properly):
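data want (drop=lstdiag:);
set patients;
by patientid;
length lstdiaga lstdiagb lstdiagc $8;
retain lstdiaga lstdiagb lstdiagc;
/* reset the carried values at the start of each patient */
if first.patientid then call missing(lstdiaga, lstdiagb, lstdiagc);
if diaga ne "" then lstdiaga=diaga;
if diagb ne "" then lstdiagb=diagb;
if diagc ne "" then lstdiagc=diagc;
diaga=coalescec(diaga,lstdiaga);
diagb=coalescec(diagb,lstdiagb);
diagc=coalescec(diagc,lstdiagc);
run;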
@udden2903 wrote:
Thank you for your help, I can see the logic in your code. The problem is that I have more than 10 diagnosis variables, and I am reluctant to write one line of code for each one of them. Do you know if there's any way of working with arrays to get around this?
@data_null__'s code will take care of that quite nicely, as it lets SAS do it automatically.
Why do you have more than 10 diagnosis variables? If you normalise your data you will find programming is far easier:
PatientId Diag_No Diag
1 1 xyz
1 2 zyf
You can do it with arrays of course, but it just makes programming more tricky:
%let elements=3;
data patients ;
infile datalines dsd delimiter=' ';
input patientID $ year $ diagA $ diagB $ diagC $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;
run;
data want (drop=i ret:);
set patients;
by patientID;
array ret{&elements.} $8; /* same length as the diag variables */
array act{&elements.} diag:;
retain ret:;
do i=1 to &elements.;
/* start each patient with nothing carried over */
if first.patientID then ret{i}="";
if act{i} ne "" then ret{i}=act{i};
act{i}=coalescec(act{i},ret{i});
end;
run;
No, any processing you can do with transposed data you can also do with normalised data. The only difference is that you don't need to know up front how many observations there are, because the structure does not change, only the amount of data to process. With transposed data, the structure changes as data elements are added or removed, which makes programming more difficult.
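To make the normalised suggestion a little more concrete, here is a rough sketch that reshapes the patients data set from the question into one row per patient, year and diagnosis, and then carries the flag forward. The data set names sorted, long and long_filled and the variable names diag, flag and last_flag are my own choices for illustration, not something from the posts above.

```sas
/* PROC TRANSPOSE with a BY statement needs sorted input */
proc sort data=patients out=sorted;
  by patientID year;
run;

/* One row per patient, year and diagnosis variable */
proc transpose data=sorted
               out=long(rename=(_name_=diag col1=flag));
  by patientID year;
  var diagA diagB diagC;
run;

/* Group the rows of each patient/diagnosis pair together, in year order */
proc sort data=long;
  by patientID diag year;
run;

/* Carry the last nonmissing flag forward within each pair */
data long_filled(drop=last_flag);
  set long;
  by patientID diag;
  length last_flag $8;
  retain last_flag;
  if first.diag then last_flag = ' ';
  if not missing(flag) then last_flag = flag;
  else flag = last_flag;
run;
```

The second data step does not change if more diagnosis variables are added; only the VAR statement in PROC TRANSPOSE would need a longer list.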
A slight variation on @RW9's logic:
%macro checkvar(variable);
retain old&variable;
if not first.patientid and old&variable ne '' then &variable = old&variable;
old&variable = &variable;
drop old&variable;
%mend;
data patients2;
set patients;
by patientid;
%checkvar(diaga);
%checkvar(diagb);
%checkvar(diagc);
run;
This version of the update method ensures that only the variables of interest, "DIAG:", are carried forward.
data patients;
input patientID $ year $ diagA $ diagB $ diagC $;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;;;;
run;
%let locf=patientid diag:;
data patients;
if 0 then set patients;
update patients(obs=0 keep=&locf) patients(keep=&locf);
by patientid;
set patients(drop=&locf);
output;
run;
proc print;
run;
I can't get this code to work. In my actual dataset, the diagnoses have names like "L40" and "K50", so I would have to change this part of your code:
%let locf=patientid diag:;
I tested it for the L40 diagnosis, using (%let locf=patientid L:; ), but it still does not work.
You need to mention all relevant details when you post. If your names are not DIAG: as you implied, then obviously DIAG: will not work for you.
If you were to define an array of the diagnosis variables then how would you do that?
Would you use a name range list? You can always list the individual names. Or use some combination of "SAS Variable Lists" and names etc.
You need to learn about the "SAS Variable List".
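As a sketch of what an array built from a variable list might look like, assume a small made-up data set whose diagnosis variables are named K50, L40 and M05. The data set names codes and codes_filled, the helper array prev and its upper bound of 100 are all assumptions for illustration.

```sas
/* Hypothetical data: diagnosis variables named after their codes */
data codes;
  input patientID $ year $ K50 $ L40 $ M05 $;
  datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
2 2009 1 . .
2 2010 . . .
;
run;

data codes_filled(drop=i);
  set codes;
  by patientID;
  /* Prefix lists: every variable whose name starts with K, L or M.
     A name range list such as K50--M05 would be an alternative. */
  array dx{*} K: L: M:;
  /* Temporary arrays keep their values across rows automatically.
     100 is an arbitrary upper bound on the number of diagnosis variables. */
  array prev{100} $8 _temporary_;
  do i = 1 to dim(dx);
    if first.patientID then prev{i} = ' ';   /* nothing carried between patients */
    if not missing(dx{i}) then prev{i} = dx{i};
    else dx{i} = prev{i};
  end;
run;
```

Note that a prefix list picks up every variable that starts with those letters, so it only works cleanly if no other variable in the data set does.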
I implemented your code, replacing diag: with K: L: M:
You are right, I shouldn't have named my diagnosis variables DiagA, DiagB and DiagC. I was simply trying to make the problem more straightforward to the viewer, but I will avoid making such changes in the future.
Would you mind quickly explaining what your code does? I'm new to macros and I think gaining some understanding of your code would be helpful as I start learning more about them.
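For what it's worth, here is the UPDATE approach from earlier in the thread again with comments added, as I understand it. The only macro feature involved is the %let line, which stores a piece of text that SAS substitutes wherever &locf appears. The output name patients_locf is my own choice for illustration; the original code overwrites patients.

```sas
/* Sample data from the original question, already sorted by patientID */
data patients;
  input patientID $ year $ diagA $ diagB $ diagC $;
  datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . . 1
1 2014 . . .
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . . .
;
run;

/* Macro variable: plain text substitution, the BY variable plus the
   variables to carry forward */
%let locf = patientID diag:;

data patients_locf;
  /* Never executes; at compile time it fixes the variable order */
  if 0 then set patients;
  /* Master: an empty copy (obs=0). Transactions: the real rows.
     By default UPDATE does not replace a value with a missing one,
     so within a BY group each variable keeps its last nonmissing value. */
  update patients(obs=0 keep=&locf) patients(keep=&locf);
  by patientID;
  /* Read the remaining variables (here: year) row by row,
     so they are not carried forward */
  set patients(drop=&locf);
  /* Without an explicit OUTPUT, UPDATE writes one row per BY group */
  output;
run;

proc print data=patients_locf;
run;
```

With the sample data, the 2015 row for patient 2 should come out with diagA=1 and diagB=1, and diagC still missing.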