I have transposed data that I'd like to recode into 5 separate columns. The issue is that the data within the columns are unordered. So I do end up with binary values 0/1, but they correspond to the wrong newly created variable. I would like for the field in the first columns (Col1-Col5) to correctly agree with the newly created variables (DX thru UN).
ID Col1 Col2 Col3 Col4 Col5
1 MH CARE SP CARE PR CARE
2 UN CARE DX CARE PR CARE UN CARE MH CARE
3 MH CARE UN CARE
4 . . . . .
5 UN CARE
want; set have;
array sc{5}$ col1-col5;
array pc{5} DX SP PR MH UN ;
do i = 1 to 5;
if sc(i)='DX CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='SP CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='PR CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='MH CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='UN CARE' then pc(i)=1; else pc(i)=0;
end;
drop i;
run;
My results which aren't correct:
ID DX SP PR MH UN
1 1 1 1 0 0
2 1 1 1 1 1
3 1 1 0 0 0
4 . . . . .
5 1 0 0 0 0
I'd like the output below:
ID DX SP PR MH UN
1 0 1 1 1 0
2 1 1 1 1 1
3 0 0 0 1 1
4 . . . . .
5 0 0 0 0 1
Another approach plus an actual data set to work with:
data have; infile datalines dlm='|' truncover; informat id best5. Col1 Col2 Col3 Col4 Col5 $10.; input ID Col1 Col2 Col3 Col4 Col5; datalines; 1|MH CARE|SP CARE|PR CARE 2|UN CARE|DX CARE|PR CARE|UN CARE|MH CARE 3|MH CARE|UN CARE 4 5|UN CARE ; run; data want; set have; array sc{5}$ col1-col5; array pc{5} DX SP PR MH UN ; array t{5} $10 _temporary_ ('DX CARE','SP CARE','PR CARE','MH CARE','UN CARE'); do i= 1 to dim(pc); pc[i]=0; end; do i= 1 to dim(sc); if not missing(sc[i]) then pc[whichc(sc[i],of t(*))]=1; end; keep id dx--un; run;
Think about yoru ELSE conditions.
If it's not found it's resetting it to 0, even if you've previously set it to 1...
I think this is what you want instead:
if sc(i) in ('DX CARE', 'SP CARE', 'PR CARE', 'MH CARE', 'UN CARE') then pc(i)=1;
else pc(i)=0;
@jsandsk wrote:
I have transposed data that I'd like to recode into 5 separate columns. The issue is that the data within the columns are unordered. So I do end up with binary values 0/1, but they correspond to the wrong newly created variable. I would like for the field in the first columns (Col1-Col5) to correctly agree with the newly created variables (DX thru UN).
ID Col1 Col2 Col3 Col4 Col5
1 MH CARE SP CARE PR CARE
2 UN CARE DX CARE PR CARE UN CARE MH CARE
3 MH CARE UN CARE
4 . . . . .
5 UN CARE
want; set have;
array sc{5}$ col1-col5;
array pc{5} DX SP PR MH UN ;
do i = 1 to 5;
if sc(i)='DX CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='SP CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='PR CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='MH CARE' then pc(i)=1; else pc(i)=0;
if sc(i)='UN CARE' then pc(i)=1; else pc(i)=0;
end;
drop i;
run;
My results which aren't correct:
ID DX SP PR MH UN
1 1 1 1 0 0
2 1 1 1 1 1
3 1 1 0 0 0
4 . . . . .
5 1 0 0 0 0
I'd like the output below:
ID DX SP PR MH UN
1 0 1 1 1 0
2 1 1 1 1 1
3 0 0 0 1 1
4 . . . . .
5 0 0 0 0 1
Post your exact code and log, and if you can provide some sample data as text that would help to test it. See instructions below, only need a few rows.
Unsure what you mean because I did transpose the data. Each record contains a different amount of the values, some don't contain any. So I still need to get the correct corresponding values for the new variables.
Another approach plus an actual data set to work with:
data have; infile datalines dlm='|' truncover; informat id best5. Col1 Col2 Col3 Col4 Col5 $10.; input ID Col1 Col2 Col3 Col4 Col5; datalines; 1|MH CARE|SP CARE|PR CARE 2|UN CARE|DX CARE|PR CARE|UN CARE|MH CARE 3|MH CARE|UN CARE 4 5|UN CARE ; run; data want; set have; array sc{5}$ col1-col5; array pc{5} DX SP PR MH UN ; array t{5} $10 _temporary_ ('DX CARE','SP CARE','PR CARE','MH CARE','UN CARE'); do i= 1 to dim(pc); pc[i]=0; end; do i= 1 to dim(sc); if not missing(sc[i]) then pc[whichc(sc[i],of t(*))]=1; end; keep id dx--un; run;
something like this?
data want;
set have;
array sc{5}$ col1-col5;
array pc{5} DX SP PR MH UN ;
do i = 1 to 5;
if vname(pc(i)) in sc then pc(i)=1; else pc(i)=0;
end;
drop i;
run;
what exactly does 'vname' reference? The name of the new variable?
yes The name of the new variable in your second array
I was feeling lazy to test but here you go. I tested with your sample:
data have;
infile datalines truncover;
input ID Col1 & $10. Col2 & $10. Col3 & $10. Col4 & $10. Col5 & $10. ;
datalines;
1 MH CARE SP CARE PR CARE
2 UN CARE DX CARE PR CARE UN CARE MH CARE
3 MH CARE UN CARE
4 . . . . .
5 UN CARE
;
data want;
set have;
array sc{5}$ col1-col5;
array pc{5} DX SP PR MH UN ;
do i = 1 to 5;
if catx(' ',vname(pc(i)),'CARE') in sc then pc(i)=1; else pc(i)=0;
end;
drop i;
run;
This works, doesn't matter what the values are or how many. It removes duplicates as well.
data have;
infile datalines dlm='|' truncover;
informat id best5. Col1 Col2 Col3 Col4 Col5 $10.;
input ID Col1 Col2 Col3 Col4 Col5;
datalines;
1|MH CARE|SP CARE|PR CARE
2|UN CARE|DX CARE|PR CARE|UN CARE|MH CARE
3|MH CARE|UN CARE
4
5|UN CARE
;
run;
proc transpose data=have out=_temp;
by ID;
var col1-col5;
run;
data _temp2;
set _temp;
Value = 1;
Prefix = scan(col1, 1);
run;
*remove duplicates;
proc sort data=_temp2 nodupkey;
by id prefix;
run;
proc transpose data=_temp2 out=want;
by ID;
ID PREFIX;
Var Value;
run;
Why does record 2, get SP=1 when there's no SP CARE?
Or should Column 4 be SP CARE?
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.