I have a data set which contains patients and the list of their procedures, but some of a patient's first procedure codes are missing. Is there a way to shift the data over?
So the data looks like this:
Patient ID Proc1 Proc2 Proc3 Proc4
12345 42.1 56.2 22.1 33.4
23456 42.1 56.2 22.1 33.4
34567 . . 22.1 33.4
And I want the data for the last patient Id to have their procedure codes shift over so that the first procedure codes are not missing:
Patient ID Proc1 Proc2 Proc3 Proc4
12345 42.1 56.2 22.1 33.4
23456 42.1 56.2 22.1 33.4
34567 22.1 33.4
Does anyone know how to do this?
Thanks!
Is there any additional logic that needs to be considered?
Does it need to be a straight reassignment, ie proc3 to proc1 or can proc4 go to proc1?
Can you have all missing values?
Yes, whatever the first non-missing value is should be reassigned to the first missing value. So yes, if proc1-proc3 are missing, but proc4-proc6 are not then proc4 becomes proc1 and proc5 becomes proc2, etc.
And yes, there are patients with all missing values in which case nothing happens to them.
This won't work if you have skips - ie proc1 has a value, proc2 is missing, proc3 has a value.
Otherwise this is simple brute force method to move the values using an array.
data have;
infile cards truncover;
input Patient_ID $ Proc1 Proc2 Proc3 Proc4;
cards;
12345 42.1 56.2 22.1 33.4
23456 42.1 56.2 22.1 33.4
34567 . . 22.1 33.4
;
run;
data want;
set have;
array proc(4) proc1-proc4;
do i=1 to 4;
if proc(i) ne . then leave;
end;
*move values to missing records;
if i ne 1 then do j=1 to dim(proc)-i+1;
proc(j)=proc(i);
proc(i)=.;
i=i+1;
end;
drop i j;
run;
Ah! This worked! Thanks!
It's good that PROCi are character variables because this avoids numeric representation issues. I assume that you adapted Reeza's code correspondingly (i.e. used ' ' in place of . to denote missing values) so as to avoid type conversion messages in the log and values such as ' .' instead of blank values.
Are your variables character or numeric? (It's not going to be that difficult, but it's easier to do once than twice.)
Variables are character.
data have;
infile cards truncover;
input Patient_ID $ Proc1 Proc2 Proc3 Proc4;
cards;
12345 42.1 56.2 22.1 33.4
23456 42.1 56.2 22.1 33.4
34567 . . 22.1 33.4
;
run;
data want;
set have;
array p{*} proc1-proc4;
array x{*} new1-new4;
n=0;
do i=1 to dim(p);
if not missing(p{i}) then do;
n+1;x{n}=p{i};
end;
end;
do i=1 to dim(p);
p{i}=x{i};
end;
drop i n new1-new4;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.