BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
hein68
Quartz | Level 8

Hello.  I have a tricky programming issue that I need help with.

 

I have 60 variables.  For simplicity, let's say they are called v1, v2, v3, .... v60.

 

I need to sum all of the possible pairs of these variables.  In other words, I need a sum of v1v2, v1v3, v1v4, etc.

 

Then I will run frequencies to see which combinations are the most common.

 

I also need to do something similar for combinations of 3 within the variables.

 

Please help!!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data long_sum;
	set have;
	array v(60); *list of variables with diseases;

	do index1=1 to dim(v)-1;

		do index2=index1+1 to dim(v);
		    if v(index1)=1 and v(index2)=1 then do;
			disease1 = vname(v(index1)); *gets disease name;
			disease2 = vname(v(index2));
			output;
		end;
	end;
	end;
	drop v:;
run;

proc freq data=long_sum order=freq;
table disease1*disease2 / list;
run;

Thanks!

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

What do you mean by 

 

I need to sum all of the possible pairs of these variables.  In other words, I need a sum of v1v2, v1v3, v1v4, etc.

 

How can you sum v1v2 if v1v2 is not a variable?

 

Or is it a variable that has to be created? If so, how is it created?

--
Paige Miller
hein68
Quartz | Level 8
Something like this would work for me:

V1V2=sum(of v1, v2);

Or something like this:

If v1=1 and v2=1 then v1v2=1;

I need to find all the cases where v1 and v2 both equal 1. That’s easy enough. But I need to do that for all possible combinations of the 60 variables (v1, v2, v3, … v60).

Thanks!
Reeza
Super User

What type of values are in V1-V60? Are they all binary for example? Can you explain the problem you're trying to solve here, there may be other ways that are more efficient. You mention 60 variables, what about the rows of each of those variables? You have multiple rows or a single row. It may be best to show a fully worked example for say 5 'variables'.

 

Modifying the example as you say your variables are likely Bernoulli.

 

*make fake data to work with;
data have;
	call streaminit(30);
	array v(60);

	do record=1 to 100;

		do i=1 to 60;
				v(i)=rand('bernoulli', 0.7);			
		end;
		output;
	end;
	drop i;
run;

*sum and flip long;
data long_sum;
	set have;
	array v(60);

	do index1=1 to dim(v);

		do index2=index1+1 to dim(v);
			sum=Sum(v(index1), v(index2));
			output;
		end;
	end;
	drop v:;
run;

*sort descending to get most frequent at the top;
proc sort data=long_sum;
	by descending sum index1 index2;
run;

proc print data=long_sum (obs=10);
run;

@hein68 wrote:

Hello.  I have a tricky programming issue that I need help with.

 

I have 60 variables.  For simplicity, let's say they are called v1, v2, v3, .... v60.

 

I need to sum all of the possible pairs of these variables.  In other words, I need a sum of v1v2, v1v3, v1v4, etc.

 

Then I will run frequencies to see which combinations are the most common.

 

I also need to do something similar for combinations of 3 within the variables.

 

Please help!!


 

hein68
Quartz | Level 8
The variables are all binary, with values of 0 or 1. Numeric variables.

Each variable is for a different medical diagnosis category. 1=Yes (patient has that disease) 0=No (patient does not have that disease)

I need to find the most common disease pairings that co-occur in patients.
Reeza
Super User

data long_sum;
	set have;
	array v(60); *list of variables with diseases;

	do index1=1 to dim(v);

		do index2=index1+1 to dim(v);
		    if v(index1)=1 and v(index2)=1 then do;
			disease1 = vname(v(index1)); *gets disease name;
			disease2 = vname(v(index2));
			output;
		end;
	end;
	end;
	drop v:;
run;

proc freq data=long_sum order=freq;
table disease1*disease2 / list;
run;

@hein68 wrote:
The variables are all binary, with values of 0 or 1. Numeric variables.

Each variable is for a different medical diagnosis category. 1=Yes (patient has that disease) 0=No (patient does not have that disease)

I need to find the most common disease pairings that co-occur in patients.

 

Astounding
PROC Star
Good approach, but note that the outer loop should stop at dim(v)-1 not dim(v).
Reeza
Super User
data long_sum;
	set have;
	array v(60); *list of variables with diseases;

	do index1=1 to dim(v)-1;

		do index2=index1+1 to dim(v);
		    if v(index1)=1 and v(index2)=1 then do;
			disease1 = vname(v(index1)); *gets disease name;
			disease2 = vname(v(index2));
			output;
		end;
	end;
	end;
	drop v:;
run;

proc freq data=long_sum order=freq;
table disease1*disease2 / list;
run;

Thanks!

hein68
Quartz | Level 8

This worked for me, thanks!

hein68
Quartz | Level 8

The variables are all binary, with values of 0 or 1.  Numeric variables.

 

Each variable is for a different medical diagnosis category.  1=Yes (patient has that disease), 0=No (patient does not have that disease)

 

I need to find the most common disease pairings that co-occur in patients.

 

Here is code that I wrote for 5 variables (see below), but I need to do this for all possible pairings of 60 variables.

 

if v1=1 and v2=1 then v1v2=1;
if v1=1 and v3=1 then v1v3=1;
if v1=1 and v4=1 then v1v4=1;
if v1=1 and v5=1 then v1v5=1;

 

if v2=1 and v3=1 then v1v3=1;
if v2=1 and v4=1 then v1v4=1;
if v2=1 and v5=1 then v1v5=1;

 

if v3=1 and v4=1 then v1v4=1;
if v3=1 and v5=1 then v1v5=1;

 

if v4=1 and v5=1 then v1v5=1;

 

Thanks!

Reeza
Super User

Code isn't helpful, example input data and expected output is more helpful. Regardless, I think the solution I've provided will work for your use case. 

 

 

PaigeMiller
Diamond | Level 26

Ok, I think that's reasonably clear now. But please answer this ... why are you doing this? What analysis or report will come next. Please don't ignore this question, because I really think you have decided on a solution that is not optimal, when there might be much better solutions out there.

 

This is always a red flag for me —someone decides they need to do something difficult and unusual, and so that's the discussion; but the discussion really should be about the analysis or report you need to do first, and then provide code that gets you there. This is the XY Problem, textbook example.

--
Paige Miller
Ksharp
Super User
/*It is a matrix cross product .*/
data have;
 call streaminit(30);
 array v(60);

 do record=1 to 100;

  do i=1 to 60;
    v(i)=rand('bernoulli', 0.7);   
  end;
  output;
 end;
 drop i;
run;

proc corr data=have sscp out=want(drop=intercept where=(_type_='SSCP' and _name_ ne 'Intercept')) noprint;
var v1-v60;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1352 views
  • 8 likes
  • 5 in conversation