🔒 This topic is solved and locked.
Apo
Fluorite | Level 6

Dear SAS community members,

I am working in scientific research. I would like to compute, in a single run, the correlations between one variable (e.g., blood cholesterol) and thousands of others (one variable per gene).

The variable names look like this (the list continues into the thousands):

TC01000005_hg_1, TC01000006_hg_1, TC01000008_hg_1, TC01000009_hg_1, 
TC01000010_hg_1, TC01000011_hg_1, TC01000012_hg_1, TC01000013_hg_1, 
TC01000014_hg_1, TC01000015_hg_1, TC01000016_hg_1, TC01000017_hg_1, 
TC01000018_hg_1, TC01000019_hg_1

My first question: how should I write the statement so that it includes all of these thousands of variables? I have seen syntax like the following:

proc corr data=myData;
  var Var1;
  with var2-var99;
run;

But how should I adapt it to my variable names?

My second question: how should I use the BEST= option so that, after running the thousands of correlations, I get a list of the top 50 results? The list should include the names of the 50 variables with the highest (or lowest) r values and the lowest p-values.

My third question: is it possible to run this type of analysis (one variable against 30,000 variables) in SAS University Edition? I have assigned 7 GB of my 8 GB of RAM to the virtual machine.

Thank you so much in advance!!

Regards

Apo

1 ACCEPTED SOLUTION
ballardw
Super User

Another form of variable list uses the common starting characters (prefix) of a group of similarly named variables, followed by a colon (:). For example,

with TC: ;

would correlate the variable on the VAR statement with all variables whose names start with the characters TC.
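A minimal sketch of the prefix list in a complete step. The DATA step is a hypothetical stand-in for MYDATA (200 simulated rows, five gene columns named like the ones in the question, and Var1); the seed and the output data set name CORRTC are illustrative only. The ODS OUTPUT statement saves the correlation table so the results can be inspected as a data set.

```sas
/* Hypothetical stand-in for MYDATA: 200 rows, five TC gene columns, Var1. */
data myData;
   call streaminit(2017);
   array g{5} TC01000005_hg_1 TC01000006_hg_1 TC01000008_hg_1
              TC01000009_hg_1 TC01000010_hg_1;
   do i = 1 to 200;
      Var1 = rand("Normal");
      do j = 1 to dim(g);
         g{j} = 0.2 * j * Var1 + rand("Normal");   /* simulated gene values */
      end;
      output;
   end;
   drop i j;
run;

/* Save the Pearson correlation table as a data set (name is illustrative). */
ods output PearsonCorr=corrTC;

/* Correlate Var1 with every variable whose name starts with TC. */
proc corr data=myData;
   var Var1;
   with TC: ;
run;
```

The prefix list picks up every variable whose name begins with TC, so a non-gene variable that happens to share the prefix would be included too.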

6 REPLIES

Apo
Fluorite | Level 6
Thank you so much for this reply!
However, when I try to correlate my variable (200 observations) with 10,000 variables (200 observations each), SAS runs out of memory:

ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.NEW4 may be incomplete. When this step was stopped there were 0 observations and 3 variables.
WARNING: Data set WORK.NEW4 was not replaced because this step was stopped.
NOTE: PROCEDURE CORR used (Total process time):
real time 0.47 seconds
cpu time 0.39 seconds

ballardw
Super User

It helps to show the entire code when you get an error. Some things that may help: use the NOPRINT option on the PROC CORR statement to suppress the printed output, which takes memory to format, and direct the statistics you need to data sets instead (for example, with the OUTP= option).

You may have to break the variables into groups. If you use the -- operator (two dashes) in a variable list, the variables are selected by their position in the data set:

with TC01000005_hg_1 -- TC01000019_hg_1;

would select the adjacent columns in the data set, starting with TC01000005_hg_1 as the leftmost column and ending with TC01000019_hg_1 as the rightmost column of that group.

I have to say that when you said you had thousands of variables, I was afraid there might be a memory issue.
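A sketch of splitting the genes into contiguous batches with the -- operator and stacking the results, again on a hypothetical simulated MYDATA. The split point after TC01000008_hg_1 is arbitrary, and the names BATCH1, BATCH2, and ALLCORR are illustrative. It uses ODS EXCLUDE ALL instead of NOPRINT, because NOPRINT also prevents ODS from creating the PearsonCorr table that ODS OUTPUT saves; with NOPRINT you would write the results with OUTP= instead.

```sas
/* Hypothetical stand-in for MYDATA: the gene columns are stored first. */
data myData;
   call streaminit(2017);
   array g{5} TC01000005_hg_1 TC01000006_hg_1 TC01000008_hg_1
              TC01000009_hg_1 TC01000010_hg_1;
   do i = 1 to 200;
      Var1 = rand("Normal");
      do j = 1 to dim(g);
         g{j} = 0.2 * j * Var1 + rand("Normal");
      end;
      output;
   end;
   drop i j;
run;

ods exclude all;                             /* suppress displayed tables */

proc corr data=myData;
   var Var1;
   with TC01000005_hg_1 -- TC01000008_hg_1;  /* first batch, by position */
   ods output PearsonCorr=batch1;
run;

proc corr data=myData;
   var Var1;
   with TC01000009_hg_1 -- TC01000010_hg_1;  /* second batch             */
   ods output PearsonCorr=batch2;
run;

ods exclude none;                            /* restore displayed output */

/* Stack the batches into one table of correlations and p-values. */
data allCorr;
   set batch1 batch2;
run;
```

With thousands of genes the same pattern would repeat once per batch, for example inside a macro loop.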

art297
Opal | Level 21

Answer to first question: run PROC CONTENTS on the data set to identify the first and last of the gene variables. Then you can simply specify them in a list like: TC01000005_hg_1--TC01000019_hg_1
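One way to find those two names, assuming the same hypothetical simulated MYDATA: the OUT= data set of PROC CONTENTS holds each variable's NAME and its position (VARNUM), so sorting by VARNUM lists the TC variables in storage order. VARLIST is an illustrative data set name.

```sas
/* Hypothetical stand-in for MYDATA. */
data myData;
   call streaminit(2017);
   array g{5} TC01000005_hg_1 TC01000006_hg_1 TC01000008_hg_1
              TC01000009_hg_1 TC01000010_hg_1;
   do i = 1 to 200;
      Var1 = rand("Normal");
      do j = 1 to dim(g);
         g{j} = 0.2 * j * Var1 + rand("Normal");
      end;
      output;
   end;
   drop i j;
run;

/* One row per variable: its name and its position in the data set. */
proc contents data=myData out=varList(keep=name varnum) noprint;
run;

proc sort data=varList;
   by varnum;
run;

/* The first and last rows printed here bound the -- list. */
proc print data=varList noobs;
   where upcase(name) like 'TC%';
run;
```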

Second question:

proc corr data=myData best=50;
  var Var1;
  with var2--var99;
run;

Third question: I don't know, but I don't see why it wouldn't.

Art, CEO, AnalystFinder.com
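If you also want the top-50 list saved as a data set, here is a sketch under the same assumptions (hypothetical simulated MYDATA with five genes, so TOP50 ends up with five rows here). It ranks by the absolute value of r. When every correlation uses the same 200 observations, the p-value is a decreasing function of |r|, so the highest-|r| order is also the lowest-p-value order. The assumed PearsonCorr layout (r in a column named after the VAR variable, Var1, and its p-value in PVar1) is worth confirming with PROC CONTENTS on CORRTC.

```sas
/* Hypothetical stand-in for MYDATA. */
data myData;
   call streaminit(2017);
   array g{5} TC01000005_hg_1 TC01000006_hg_1 TC01000008_hg_1
              TC01000009_hg_1 TC01000010_hg_1;
   do i = 1 to 200;
      Var1 = rand("Normal");
      do j = 1 to dim(g);
         g{j} = 0.2 * j * Var1 + rand("Normal");
      end;
      output;
   end;
   drop i j;
run;

ods exclude all;                      /* do not print thousands of rows */
proc corr data=myData;
   var Var1;
   with TC: ;
   ods output PearsonCorr=corrTC;     /* assumed columns: Variable Var1 PVar1 */
run;
ods exclude none;

/* Rank by |r|; with equal N this matches ascending p-value order. */
data corrTC;
   set corrTC;
   absR = abs(Var1);
run;

proc sort data=corrTC;
   by descending absR;
run;

/* Keep at most the 50 strongest correlations. */
data top50;
   set corrTC(obs=50);
   keep Variable Var1 PVar1;
run;
```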

Apo
Fluorite | Level 6

Dear art297

Thanks for your reply!

The code works perfectly! The only problem is that when I start the correlation analysis of my variable with around 10,000 variables, SAS stops and shows the following message:

ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.NEW4 may be incomplete. When this step was stopped there were 0 observations and 3 variables.
WARNING: Data set WORK.NEW4 was not replaced because this step was stopped.
NOTE: PROCEDURE CORR used (Total process time):
real time 0.47 seconds
cpu time 0.39 seconds

My SAS University Edition runs on 2 cores and 7 GB of RAM. Do you think I should use a faster computer or a standard (non-University) SAS edition?

Thank you once again

Ksharp
Super User

SAS/IML is the best way to do that. @Rick_SAS might be interested.

Post your data and the output you want to see here.
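A rough SAS/IML sketch of that idea. It is an illustration under stated assumptions, not code from Ksharp or Rick: it assumes the hypothetical simulated MYDATA below, no missing values, and that every gene column name starts with TC. It centres the columns, computes all Pearson correlations with Var1 at once as r = sum(yc*xc)/sqrt(sum(yc**2)*sum(xc**2)), converts each r to t = r*sqrt((n-2)/(1-r**2)) with n-2 degrees of freedom for a two-sided p-value, and writes the 50 genes with the largest |r| to TOP50 (an illustrative name). For scale, a 200 by 10,000 matrix of doubles takes about 16 MB (200 × 10,000 × 8 bytes).

```sas
/* Hypothetical stand-in for MYDATA. */
data myData;
   call streaminit(2017);
   array g{5} TC01000005_hg_1 TC01000006_hg_1 TC01000008_hg_1
              TC01000009_hg_1 TC01000010_hg_1;
   do i = 1 to 200;
      Var1 = rand("Normal");
      do j = 1 to dim(g);
         g{j} = 0.2 * j * Var1 + rand("Normal");
      end;
      output;
   end;
   drop i j;
run;

proc iml;
   use myData;
   read all var {Var1} into y;                   /* n x 1 response          */
   allNames  = contents("WORK", "MYDATA");        /* all variable names      */
   geneNames = allNames[loc(upcase(substr(allNames, 1, 2)) = "TC")];
   read all var geneNames into X;                 /* n x p gene matrix       */
   close myData;

   n  = nrow(y);
   yc = y - mean(y);                              /* centre the response     */
   Xc = X - mean(X);                              /* centre each gene column */
   r  = ((yc` * Xc) / sqrt(ssq(yc) # (Xc##2)[+, ]))`;   /* p x 1 Pearson r */
   t  = r # sqrt((n - 2) / (1 - r##2));           /* t statistic, n-2 df     */
   p  = 2 * (1 - cdf("T", abs(t), n - 2));        /* two-sided p-value       */

   call sortndx(idx, abs(r), 1, 1);               /* order genes by |r| desc */
   k = min(50, nrow(r));
   top   = idx[1:k];
   Gene  = geneNames[top];
   R_top = r[top];
   P_top = p[top];
   print Gene R_top P_top;

   create top50 var {Gene R_top P_top};           /* save the ranking        */
   append;
   close top50;
quit;
```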

Discussion stats
  • 6 replies
  • 4344 views
  • 4 likes
  • 4 in conversation