I have a set of indicators (1 or 0) for which i need to first identify all combinations and then create new variables indicating whether that combination is represented by each record (where representation is determined by 1 [versus 0]). Here's a small example using three variables.
data have; input meth_n rx_n nonrx_n; cards; 1 0 0 1 1 0 0 1 0 run;
What i want:
data want; set have; if meth_n = 1 and rx_n = 0 and nonrx_n = 0 then meth = 1; if meth_n = 1 and rx_n = 1 and nonrx_n = 0 then meth_rx = 1; if meth_n = 0 and rx_n = 1 and nonrx_n = 0 then rx = 1; if meth_n = 0 and rx_n = 1 and nonrx_n = 1 then rx_nonrx = 1; run;
Note that i need to represent all combinations with new variables, even if that combination happens to not be present in the data itself. Here is an attempt i made where i can get a single field to represent the variables that have a '1', but i need variables created with those names.
data try; set have; length combo_var $200; array sub(*) meth_n rx_n nonrx_n; do i=1 to dim(sub); if sub(i) = 1 then do; combo_var = catx("_", combo_var, vname(sub(i))); end; end; run;
Thanks for any help!
Are you trying to make flag variables that indicate, for example, that subject has METH and RX and no others?
If so a simplified SAS code that takes advantage of the binary nature of the variables would look like:
meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx and ^nonrx ;
meth_nonrx=meth and ^rx and nonrx ;
rx_nonrx=^meth and rx and nonrx ;
But if the goal is the make a flag that indicates the METH and RX combination, regardless of others then you would want something like this instead.
meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx ;
meth_nonrx=meth and nonrx ;
rx_nonrx=rx and nonrx ;
Are you asking how to do this?
data want;
do meth_n=0,1;
do rx_n=0,1;
do nnonrx_n=0,1;
output;
end;
end;
end;
run;
Hi Tom,
No, i'm saying that if i have this data in hand:
data have; input meth_n rx_n nonrx_n; cards; 1 0 0 1 1 0 0 1 0 run;
I want it to look like 'want' from this:
data want; set have; if meth_n = 1 and rx_n = 0 and nonrx_n = 0 then meth = 1; if meth_n = 1 and rx_n = 1 and nonrx_n = 0 then meth_rx = 1; if meth_n = 0 and rx_n = 1 and nonrx_n = 0 then rx = 1; if meth_n = 0 and rx_n = 1 and nonrx_n = 1 then rx_nonrx = 1; run;
But, this is a smaller example...i have more variables to consider...and it seems like there should be a more automated way to get at this
If you have more variables then you need to spell out the RULE for generating the output you want.
It is not obvious.
If the rule is to combine the variable names to make new variable names you will very soon make a string that is too long to use as a variable name.
If you just want to make a VARIABLE with the concatenated names then that is pretty simple.
You seem to be using only PART of the original variable names.
data have;
input meth_n rx_n nonrx_n;
cards;
1 0 0
1 1 0
0 1 0
;
data want;
set have;
array nvar _numeric_;
length combo $200 ;
do index=1 to dim(nvar);
if nvar[index] then combo=catx('_',combo,scan(vname(nvar[index]),1,'_'));
end;
drop index;
run;
proc print;
run;
OBS meth_n rx_n nonrx_n combo 1 1 0 0 meth 2 1 1 0 meth_rx 3 0 1 0 rx
IF (and it is a big if when you have a lot of variables) the COMBO values are not too large to use as a variable name you can probably just use PROC TRANSPOSE to help you.
data fix_have;
row+1;
set have;
run;
proc transpose data=fix_have out=step1;
by row;
run;
data step2;
set step1;
by row;
where col1=1 ;
length combo $200 ;
retain combo ;
if first.row then combo=' ';
combo=catx('_',combo,scan(_name_,1,'_'));
if last.row;
run;
proc transpose data=step2 out=want(drop=_name_);
by row;
id combo;
var col1;
run;
proc print;
run;
OBS row meth meth_rx rx 1 1 1 . . 2 2 . 1 . 3 3 . . 1
You can use PROC STDIZE to convert the missing values into zeros.
Hi Tom,
The solution you posted using transpose works great for something else i have been doing. So many thanks. Apologies for not making my example more clear from the start. I'll expand:
Each record/row represents a coalition in a community engaged in drug use prevention. The data represent their indication of whether they are currently engaged in efforts around that substance, where 1 = Yes and 0 = No. I have to report out the frequency/prevalence of each combination of substances that coalitions are engaged in.
data have;
input row meth rx nonrx;
cards;
1 1 0 0
2 1 1 0
3 0 1 0
;
run;
proc print; run;
So, the first coalition is working on meth only, the second on meth and prescription drugs, and the third on prescription drugs only. Note that nobody is working on non-rx (non-prescription opioids), among other combinations which i also need to report out (meaning i need to know that no coalitions are working on those combinations). So manually, i was accomplish creating the indicators this way:
data want; set have; meth=0; meth_rx=0; meth_nonrx=0; meth_rx_nonrx=0; rx=0; rx_nonrx=0; nonrx=0; if meth = 1 and rx = 0 and nonrx = 0 then meth = 1; if meth = 1 and rx = 1 and nonrx = 0 then meth_rx = 1; if meth = 1 and rx = 0 and nonrx = 1 then meth_nonrx = 1; if meth = 1 and rx = 1 and nonrx = 1 then meth_rx_nonrx = 1; if meth = 0 and rx = 1 and nonrx = 0 then rx = 1; if meth = 0 and rx = 1 and nonrx = 1 then rx_nonrx = 1; if meth = 0 and rx = 0 and nonrx = 1 then nonrx = 1; run; proc print; run;
meth_ meth_rx_ Obs row meth_n rx_n nonrx_n meth meth_rx nonrx nonrx rx rx_nonrx nonrx 1 1 1 0 0 0 0 0 0 0 0 0 2 2 1 1 0 0 0 0 0 0 0 0 3 3 0 1 0 0 0 0 0 0 0 0
Currently I have two more substances to incorporate and (sadly) we add more substances from time to time. Right now, i can create the variables and remain under 32. It seems like i should be able to use two arrays where the 2nd is one variable ahead...but i just can't wrap my head around the creation of the variable name.
Are you trying to make flag variables that indicate, for example, that subject has METH and RX and no others?
If so a simplified SAS code that takes advantage of the binary nature of the variables would look like:
meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx and ^nonrx ;
meth_nonrx=meth and ^rx and nonrx ;
rx_nonrx=^meth and rx and nonrx ;
But if the goal is the make a flag that indicates the METH and RX combination, regardless of others then you would want something like this instead.
meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx ;
meth_nonrx=meth and nonrx ;
rx_nonrx=rx and nonrx ;
Hi Tom,
OK, this is effectively what i have on my end. I thought maybe there was a more data-driven/automated way to identify the combinations without having to type out the various yes-no setups to get there. I'm going to mark your latest as the accepted solution. Thanks for the guidance!
Jason
So use the ALLCOMB function to get all combinations of 2 or more variables.
%let names='meth' 'rx' 'nonrx';
%let n=%sysfunc(countw(&names,%str( )));
data combo ;
array name[&n] $8 (&names);
n=dim(name);
do k=2 to n;
ncomb=comb(n, k);
do j=1 to ncomb;
rc=allcomb(j, k, of name[*]);
output;
end;
end;
drop rc j;
run;
Result
OBS name1 name2 name3 n k ncomb 1 meth rx nonrx 3 2 3 2 meth nonrx rx 3 2 3 3 rx nonrx meth 3 2 3 4 rx nonrx meth 3 3 1
Then use that to generate your assignment statements.
filename code temp;
data _null_;
file code;
set combo ;
array name name: ;
do i=1 to k ;
put name[i] @;
if i < k then put +(-1) '_' @;
end;
put '= ' @ ;
do i=1 to n ;
if i>k then put '^' @ ;
put name[i] @ ;
if i < n then put 'and ' @;
end;
put ';' ;
run;
Result:
meth_rx = meth and rx and ^nonrx ; meth_nonrx = meth and nonrx and ^rx ; rx_nonrx = rx and nonrx and ^meth ; rx_nonrx_meth = rx and nonrx and meth ;
And you can then copy and paste it or use %INCLUDE to add to a data step.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.