Hello. I have a tricky programming issue that I need help with.
I have 60 variables. For simplicity, let's say they are called v1, v2, v3, .... v60.
I need to sum all of the possible pairs of these variables. In other words, I need a sum of v1v2, v1v3, v1v4, etc.
Then I will run frequencies to see which combinations are the most common.
I also need to do something similar for combinations of 3 within the variables.
Please help!!
data long_sum;
set have;
array v(60); *list of variables with diseases;
do index1=1 to dim(v)-1;
do index2=index1+1 to dim(v);
if v(index1)=1 and v(index2)=1 then do;
disease1 = vname(v(index1)); *gets disease name;
disease2 = vname(v(index2));
output;
end;
end;
end;
drop v:;
run;
proc freq data=long_sum order=freq;
table disease1*disease2 / list;
run;
Thanks!
What do you mean by
I need to sum all of the possible pairs of these variables. In other words, I need a sum of v1v2, v1v3, v1v4, etc.
How can you sum v1v2 if v1v2 is not a variable?
Or is it a variable that has to be created? If so, how is it created?
What type of values are in V1-V60? Are they all binary for example? Can you explain the problem you're trying to solve here, there may be other ways that are more efficient. You mention 60 variables, what about the rows of each of those variables? You have multiple rows or a single row. It may be best to show a fully worked example for say 5 'variables'.
Modifying the example as you say your variables are likely Bernoulli.
*make fake data to work with;
data have;
call streaminit(30);
array v(60);
do record=1 to 100;
do i=1 to 60;
v(i)=rand('bernoulli', 0.7);
end;
output;
end;
drop i;
run;
*sum and flip long;
data long_sum;
set have;
array v(60);
do index1=1 to dim(v);
do index2=index1+1 to dim(v);
sum=Sum(v(index1), v(index2));
output;
end;
end;
drop v:;
run;
*sort descending to get most frequent at the top;
proc sort data=long_sum;
by descending sum index1 index2;
run;
proc print data=long_sum (obs=10);
run;
@hein68 wrote:
Hello. I have a tricky programming issue that I need help with.
I have 60 variables. For simplicity, let's say they are called v1, v2, v3, .... v60.
I need to sum all of the possible pairs of these variables. In other words, I need a sum of v1v2, v1v3, v1v4, etc.
Then I will run frequencies to see which combinations are the most common.
I also need to do something similar for combinations of 3 within the variables.
Please help!!
data long_sum;
set have;
array v(60); *list of variables with diseases;
do index1=1 to dim(v);
do index2=index1+1 to dim(v);
if v(index1)=1 and v(index2)=1 then do;
disease1 = vname(v(index1)); *gets disease name;
disease2 = vname(v(index2));
output;
end;
end;
end;
drop v:;
run;
proc freq data=long_sum order=freq;
table disease1*disease2 / list;
run;
@hein68 wrote:
The variables are all binary, with values of 0 or 1. Numeric variables.
Each variable is for a different medical diagnosis category. 1=Yes (patient has that disease) 0=No (patient does not have that disease)
I need to find the most common disease pairings that co-occur in patients.
data long_sum;
set have;
array v(60); *list of variables with diseases;
do index1=1 to dim(v)-1;
do index2=index1+1 to dim(v);
if v(index1)=1 and v(index2)=1 then do;
disease1 = vname(v(index1)); *gets disease name;
disease2 = vname(v(index2));
output;
end;
end;
end;
drop v:;
run;
proc freq data=long_sum order=freq;
table disease1*disease2 / list;
run;
Thanks!
This worked for me, thanks!
Thanks!
The variables are all binary, with values of 0 or 1. Numeric variables.
Each variable is for a different medical diagnosis category. 1=Yes (patient has that disease), 0=No (patient does not have that disease)
I need to find the most common disease pairings that co-occur in patients.
Here is code that I wrote for 5 variables (see below), but I need to do this for all possible pairings of 60 variables.
if v1=1 and v2=1 then v1v2=1;
if v1=1 and v3=1 then v1v3=1;
if v1=1 and v4=1 then v1v4=1;
if v1=1 and v5=1 then v1v5=1;
if v2=1 and v3=1 then v1v3=1;
if v2=1 and v4=1 then v1v4=1;
if v2=1 and v5=1 then v1v5=1;
if v3=1 and v4=1 then v1v4=1;
if v3=1 and v5=1 then v1v5=1;
if v4=1 and v5=1 then v1v5=1;
Thanks!
Code isn't helpful, example input data and expected output is more helpful. Regardless, I think the solution I've provided will work for your use case.
Ok, I think that's reasonably clear now. But please answer this ... why are you doing this? What analysis or report will come next. Please don't ignore this question, because I really think you have decided on a solution that is not optimal, when there might be much better solutions out there.
This is always a red flag for me —someone decides they need to do something difficult and unusual, and so that's the discussion; but the discussion really should be about the analysis or report you need to do first, and then provide code that gets you there. This is the XY Problem, textbook example.
/*It is a matrix cross product .*/
data have;
call streaminit(30);
array v(60);
do record=1 to 100;
do i=1 to 60;
v(i)=rand('bernoulli', 0.7);
end;
output;
end;
drop i;
run;
proc corr data=have sscp out=want(drop=intercept where=(_type_='SSCP' and _name_ ne 'Intercept')) noprint;
var v1-v60;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.