Dear SAS-Community!
I'm struggling with counting significant p-values separately for positive and negative correlation coefficients.
After running "proc corr" with "ods output PearsonCorr", a table like the following stylized one is generated:
Variable | x1 | x2 | x3 | p1 | p2 | p3 |
---|---|---|---|---|---|---|
x1 | 1 | 0.8 | -0.65 | - | 0.05 | 0.023 |
x2 | 0.8 | 1 | -0.4 | 0.05 | - | 0.12 |
x3 | -0.65 | -0.4 | 1 | 0.023 | 0.12 | - |
I have an idea for counting all significant p-values by keeping just the variables p1, p2, p3, but I have no clue how to tell whether each one belongs to a positive or a negative correlation coefficient.
I'm thankful for every idea/code.
Best
Holger
Accepted Solutions
Not sure how fancy you want to be, but the simple way would be something like using arrays:
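data x;
  set corout;
  array ax{*} x1-x3;   /* correlation coefficients */
  array px{*} p1-p3;   /* matching p-values */
  row+1;               /* row counter, retained across observations */
  /* upper triangle only: start one column to the right of the diagonal */
  do j=row+1 to dim(px);
    /* a missing value compares as less than 0.05, so exclude it explicitly */
    if not missing(px{j}) and px{j}<0.05 then do;
      if ax{j}<0 then sumneg+1;
      if ax{j}>0 then sumpos+1;
    end;
  end;
run;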
Since I am at home and don't have SAS, this is just a guess at the syntax. Hopefully you know how to use arrays.
What you want to do in the first row is compare x1 to x2 and x1 to x3: count the significant p-values, then check whether the corresponding correlations are negative or positive.
In the second record you start with x2, compare it with x3, then check for significance and for a negative or positive sign.
Hope this helps.
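One detail worth guarding against in a loop like the one above: SAS treats a missing numeric value as smaller than any number, so a missing diagonal p-value would pass a bare `p < 0.05` test, and the diagonal correlation of 1 would then be counted as positive. A minimal sketch of that behavior, using a hypothetical variable `p`:

```sas
data _null_;
  p = .;  /* hypothetical missing p-value, as on the diagonal */
  /* bare comparison: missing is "less than" 0.05, so this branch runs */
  if p < 0.05 then put "Bare comparison: missing p counts as significant";
  /* guarded comparison: the missing value is filtered out first */
  if not missing(p) and p < 0.05 then put "Guarded comparison: not reached";
  else put "Guarded comparison: missing p is skipped";
run;
```

Starting the column loop one position past the diagonal, or adding the `missing()` check, avoids the miscount.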
Hi steve!
Thank you for the reply! I coded it and it works with the stylized table.
I'm going to run lots of simulations, so the number of variables and their names change in every loop.
I think the problem with the code below is that the array px consists of all variables beginning with "_" and "P", but it should include only the "P..." variables.
data B;
set A (keep = _:);
array ax{*} _numeric_;
set A (keep =P:);
array px{*} _numeric_;
row +1;
retain sumneg . sumpos .;
do j=row to dim(px);
if not missing(px{j}) and px{j} <0.05 then do;
if ax{j}<0 then sumneg+1;
if ax{j}>0 then sumpos+1;
end;
end;
run;
You have two set statements and keeps? Is that correct?
No, I have only the output table of proc corr with correlation coefficients and the corresponding p-values. The output looks like the table in my first post. (for three variables).
Steve468 suggested coding:
array ax{*} x1-x3;
array px{*} p1-p3;
This is correct for this little example.
The number of variables in a real data set varies between 1200 and 2000. Simulation example: _1, _2, _12, _13, .... ,_1213, P_1, P_2, P_12, P_13, .... , P_1213
It is important to load the variables beginning with "_" and those beginning with "P" into two different arrays.
I don't understand. You have two "set A" statements in your code above; why are you doing that?
All the data I need is in "set A"; A is the correlation output. I think it would be right to set it only once (like Steve468 did). But I don't know how to assign all the variables beginning with "_" and "P" to the different arrays.
The procedure should work for a varying number of "_" and "P" variables. I tried setting A twice to isolate the variables beginning with "_" and "P", but that doesn't work.
My code is definitely wrong at this point.
Is there a way to select all variables with a common prefix for an array?
Well, I figured it out...
Data B;
set A;
array px{*} P:;
array ax{*} _:;
Thanks everybody!
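For readers who want the whole step in one place, here is a minimal sketch that combines the name prefix lists with the counting loop from the accepted solution. The small data set A is a hypothetical stand-in for the real PROC CORR output, the size check is an added safeguard, and the sketch assumes the coefficient and p-value columns are stored in the same order, since the arrays pair them by position.

```sas
/* Hypothetical stand-in for the PROC CORR output A, using the thread's
   naming pattern: "_" prefix for coefficients, "P" prefix for p-values */
data A;
  input Variable $ _1 _2 _3 P_1 P_2 P_3;
  datalines;
_1 1 0.8 -0.65 . 0.05 0.023
_2 0.8 1 -0.4 0.05 . 0.12
_3 -0.65 -0.4 1 0.023 0.12 .
;

data B;
  set A end=last;
  array ax{*} _: ;   /* all variables whose names begin with "_" */
  array px{*} P: ;   /* all variables whose names begin with "P" */
  /* added safeguard: the two arrays must pair up one-to-one */
  if _n_ = 1 and dim(ax) ne dim(px) then do;
    put "ERROR: coefficient and p-value arrays differ in size.";
    stop;
  end;
  row + 1;
  /* upper triangle only, so each pair is counted once */
  do j = row + 1 to dim(px);
    if not missing(px{j}) and px{j} < 0.05 then do;
      if ax{j} < 0 then sumneg + 1;
      else if ax{j} > 0 then sumpos + 1;
    end;
  end;
  if last then output;   /* keep only the final totals */
  keep sumneg sumpos;
run;

proc print data=B noobs;
run;
```

For this stand-in table the step should give sumneg=1 and sumpos=0, since only the _1/_3 pair has a p-value strictly below 0.05. Any other variable in A whose name starts with "P" or "_" would also land in the arrays, so the prefixes need to be exclusive to these columns.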
Maybe I'm missing something, but I think there is an easier way.
Turn on the FISHER option and use ODS output to save the FisherPearsonCorr table.
The FisherPearsonCorr table is stored in "long" format instead of "wide" format. Thus it is trivial to count the significant p-values and to know which are associated with positive and negative correlations. No arrays required.
proc corr data=sashelp.cars outp=out fisher;
ods output FisherPearsonCorr=fisher;
run;
proc print data=Fisher(obs=5);
var Var WithVar Corr pValue;
run;
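The counting itself could then be sketched with PROC SQL as below. This is an illustration rather than part of the original reply: it assumes each variable pair appears once in the FisherPearsonCorr table, reuses the 0.05 threshold from earlier in the thread, and the names sumpos and sumneg are carried over from the array solution. The p-values in this table are based on Fisher's z transformation, so they may differ slightly from those in the PearsonCorr table.

```sas
/* Recreate the long-format table from the post above */
proc corr data=sashelp.cars fisher;
  ods output FisherPearsonCorr=fisher;
run;

/* Count significant pairs by sign. A comparison in PROC SQL evaluates
   to 1 (true) or 0 (false), so SUM() counts the rows that satisfy it. */
proc sql;
  select sum(Corr > 0) as sumpos,
         sum(Corr < 0) as sumneg
  from fisher
  where not missing(pValue) and pValue < 0.05;
quit;
```

Because the table has one row per pair, no row counter or upper-triangle logic is needed, and the query does not depend on how many variables the simulation produces.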