BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jasberger
Calcite | Level 5

I have a set of indicators (1 or 0) for which i need to first identify all combinations and then create new variables indicating whether that combination is represented by each record (where representation is determined by 1 [versus 0]).  Here's a small example using three variables.

data have;
	input meth_n rx_n nonrx_n;
	cards;
	1 0 0
	1 1 0
	0 1 0
	run;

What i want:

data want;
	set have;
	if meth_n = 1 and rx_n = 0 and nonrx_n = 0 then meth = 1;
	if meth_n = 1 and rx_n = 1 and nonrx_n = 0 then meth_rx = 1;
	if meth_n = 0 and rx_n = 1 and nonrx_n = 0 then rx = 1;
	if meth_n = 0 and rx_n = 1 and nonrx_n = 1 then rx_nonrx = 1;
	run;

Note that i need to represent all combinations with new variables, even if that combination happens to not be present in the data itself. Here is an attempt i made where i can get a single field to represent the variables that have a '1', but i need variables created with those names.  

data try; set have; length combo_var $200;
	array sub(*) meth_n rx_n nonrx_n;
	do i=1 to dim(sub);
		if sub(i) = 1 then do;
      		combo_var = catx("_", combo_var, vname(sub(i)));
	end; end;
	run;

Thanks for any help!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Are you trying to make flag variables that indicate, for example, that subject has METH and RX and no others?

If so a simplified SAS code that takes advantage of the binary nature of the variables would look like:

meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx and ^nonrx ;
meth_nonrx=meth and ^rx and nonrx ;
rx_nonrx=^meth and rx and nonrx ;

But if the goal is the make a flag that indicates the METH and RX combination, regardless of others then you would want something like this instead.

meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx ;
meth_nonrx=meth and nonrx ;
rx_nonrx=rx and nonrx ;

View solution in original post

8 REPLIES 8
Tom
Super User Tom
Super User

Are you asking how to do this?

data want;
do meth_n=0,1;
do rx_n=0,1;
do nnonrx_n=0,1;
  output;
end;
end;
end;
run;
jasberger
Calcite | Level 5

Hi Tom,

 

No, i'm saying that if i have this data in hand:

data have;
	input meth_n rx_n nonrx_n;
	cards;
	1 0 0
	1 1 0
	0 1 0
	run;

I want it to look like 'want' from this:

data want;
	set have;
	if meth_n = 1 and rx_n = 0 and nonrx_n = 0 then meth = 1;
	if meth_n = 1 and rx_n = 1 and nonrx_n = 0 then meth_rx = 1;
	if meth_n = 0 and rx_n = 1 and nonrx_n = 0 then rx = 1;
	if meth_n = 0 and rx_n = 1 and nonrx_n = 1 then rx_nonrx = 1;
	run;

But, this is a smaller example...i have more variables to consider...and it seems like there should be a more automated way to get at this

Tom
Super User Tom
Super User

If you have more variables then you need to spell out the RULE for generating the output you want.

It is not obvious.

If the rule is to combine the variable names to make new variable names you will very soon make a string that is too long to use as a variable name.

 

If you just want to make a VARIABLE with the concatenated names then that is pretty simple.

You seem to be using only PART of the original variable names.

data have;
  input meth_n rx_n nonrx_n;
cards;
1 0 0
1 1 0
0 1 0
;

data want;
  set have;
  array nvar _numeric_;
  length combo $200 ;
  do index=1 to dim(nvar);
    if nvar[index] then combo=catx('_',combo,scan(vname(nvar[index]),1,'_'));
  end;
  drop index;
run;

proc print;
run;
OBS    meth_n    rx_n    nonrx_n    combo

 1        1        0        0       meth
 2        1        1        0       meth_rx
 3        0        1        0       rx


 

Tom
Super User Tom
Super User

IF (and it is a big if when you have a lot of variables) the COMBO values are not too large to use as a variable name you can probably just use PROC TRANSPOSE to help you.

data fix_have;
  row+1;
  set have;
run;

proc transpose data=fix_have out=step1;
  by row;
run;

data step2;
  set step1;
  by row;
  where col1=1 ;
  length combo $200 ;
  retain combo ;
  if first.row then combo=' ';
  combo=catx('_',combo,scan(_name_,1,'_'));
  if last.row;
run;

proc transpose data=step2 out=want(drop=_name_);
  by row;
  id combo;
  var col1;
run;

proc print;
run;
OBS    row    meth    meth_rx    rx

 1      1       1        .        .
 2      2       .        1        .
 3      3       .        .        1

You can use PROC STDIZE to convert the missing values into zeros.

jasberger
Calcite | Level 5

Hi Tom,

The solution you posted using transpose works great for something else i have been doing.  So many thanks.  Apologies for not making my example more clear from the start. I'll expand:

Each record/row represents a coalition in a community engaged in drug use prevention. The data represent their indication of whether they are currently engaged in efforts around that substance, where 1 = Yes and 0 = No. I have to report out the frequency/prevalence of each combination of substances that coalitions are engaged in. 

data have;
input row meth rx nonrx;
cards;
1 1 0 0
2 1 1 0
3 0 1 0
;
run;
proc print; run;

So, the first coalition is working on meth only, the second on meth and prescription drugs, and the third on prescription drugs only.  Note that nobody is working on non-rx (non-prescription opioids), among other combinations which i also need to report out (meaning i need to know that no coalitions are working on those combinations).  So manually, i was accomplish creating the indicators this way:

data want;
	set have;
	meth=0; meth_rx=0; meth_nonrx=0; meth_rx_nonrx=0; rx=0; rx_nonrx=0; nonrx=0;
	if meth = 1 and rx = 0 and nonrx = 0 then meth = 1;
	if meth = 1 and rx = 1 and nonrx = 0 then meth_rx = 1;
	if meth = 1 and rx = 0 and nonrx = 1 then meth_nonrx = 1;
	if meth = 1 and rx = 1 and nonrx = 1 then meth_rx_nonrx = 1;
	if meth = 0 and rx = 1 and nonrx = 0 then rx = 1;
	if meth = 0 and rx = 1 and nonrx = 1 then rx_nonrx = 1;
	if meth = 0 and rx = 0 and nonrx = 1 then nonrx = 1;
	run;
proc print; run;
                                                        meth_   meth_rx_
  Obs   row   meth_n   rx_n   nonrx_n   meth   meth_rx   nonrx     nonrx    rx   rx_nonrx   nonrx

   1     1       1       0       0        0       0        0         0       0       0        0
   2     2       1       1       0        0       0        0         0       0       0        0
   3     3       0       1       0        0       0        0         0       0       0        0

Currently I have two more substances to incorporate and (sadly) we add more substances from time to time.  Right now, i can create the variables and remain under 32. It seems like i should be able to use two arrays where the 2nd is one variable ahead...but i just can't wrap my head around the creation of the variable name. 

 

Tom
Super User Tom
Super User

Are you trying to make flag variables that indicate, for example, that subject has METH and RX and no others?

If so a simplified SAS code that takes advantage of the binary nature of the variables would look like:

meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx and ^nonrx ;
meth_nonrx=meth and ^rx and nonrx ;
rx_nonrx=^meth and rx and nonrx ;

But if the goal is the make a flag that indicates the METH and RX combination, regardless of others then you would want something like this instead.

meth_rx_nonrx=meth and rx and nonrx ;
meth_rx=meth and rx ;
meth_nonrx=meth and nonrx ;
rx_nonrx=rx and nonrx ;
jasberger
Calcite | Level 5

Hi Tom,

OK, this is effectively what i have on my end.  I thought maybe there was a more data-driven/automated way to identify the combinations without having to type out the various yes-no setups to get there.  I'm going to mark your latest as the accepted solution. Thanks for the guidance!

 

Jason

Tom
Super User Tom
Super User

So use the ALLCOMB function to get all combinations of 2 or more variables.

%let names='meth' 'rx' 'nonrx';
%let n=%sysfunc(countw(&names,%str( )));

data combo ;
  array name[&n] $8 (&names);
  n=dim(name);
  do k=2 to n;
    ncomb=comb(n, k);
    do j=1 to ncomb;
      rc=allcomb(j, k, of name[*]);
      output;
    end;
  end;
  drop rc j;
run;

Result

OBS    name1    name2    name3    n    k    ncomb

 1     meth     rx       nonrx    3    2      3
 2     meth     nonrx    rx       3    2      3
 3     rx       nonrx    meth     3    2      3
 4     rx       nonrx    meth     3    3      1

Then use that to generate your assignment statements.

filename code temp;
data _null_;
  file code;
  set combo ;
  array name name: ;
  do  i=1 to k ;
    put name[i] @;
    if i < k then put +(-1) '_' @;
  end;
  put '= ' @ ;
  do i=1 to n ;
    if i>k then put '^' @ ;
    put name[i] @ ;
    if i < n then put 'and ' @;
  end;
  put ';' ;
run;

Result:

meth_rx = meth and rx and ^nonrx ;
meth_nonrx = meth and nonrx and ^rx ;
rx_nonrx = rx and nonrx and ^meth ;
rx_nonrx_meth = rx and nonrx and meth ;

And you can then copy and paste it or use %INCLUDE to add to a data step.

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2838 views
  • 1 like
  • 2 in conversation