My partial data looks like this:
Market goes from 1-50,
Brand goes from 1-200,
QNUM= BI1ATT01...BI1ATT357, BI2ATT01...BI2ATT200, BI3ATT01...BI3ATT260
AttValue= 0 or 1
SURVEYID | QNO | CQNO | MARKET | BRAND | LOADMONTH | MONREP | QTRREP | WEIGHT | XMKT_WEIGHT | CREATEDDATE | Qnum | AttValue | TIME_PERIOD | FLAG | BAC11 |
6840348 | 00006840348 | 130100006840348 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 1.1147700000000000 | 12.1398900000000000 | 04Apr2013 19:53:51 | BI1ATT01 | 0 | 1/1/13 | 0 | 6 |
6840699 | 00006840699 | 130100006840699 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 1.1207900000000000 | 12.2054200000000000 | 04Apr2013 19:53:55 | BI1ATT01 | 0 | 1/1/13 | 0 | 10 |
6840763 | 00006840763 | 130100006840763 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 1.1207900000000000 | 12.2054200000000000 | 04Apr2013 19:53:56 | BI1ATT01 | 0 | 1/1/13 | 0 | 10 |
6840677 | 00006840677 | 130100006840677 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 1.1207900000000000 | 12.2054200000000000 | 04Apr2013 19:53:54 | BI1ATT01 | 0 | 1/1/13 | 0 | 9 |
6840684 | 00006840684 | 130100006840684 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 0.9990600000000000 | 10.8797300000000000 | 04Apr2013 19:53:55 | BI1ATT01 | 0 | 1/1/13 | 0 | 8 |
6840565 | 00006840565 | 130100006840565 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 1.1998900000000000 | 13.0667700000000000 | 04Apr2013 19:53:53 | BI1ATT01 | 0 | 1/1/13 | 0 | 9 |
6840801 | 00006840801 | 130100006840801 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 1.1207900000000000 | 12.2054200000000000 | 04Apr2013 19:54:03 | BI1ATT01 | 0 | 1/1/13 | 0 | 10 |
6840650 | 00006840650 | 130100006840650 | 10 | 4 | 01Jan2013 | 1301 | 1301 | 1.1207900000000000 | 12.2054200000000000 | 04Apr2013 19:53:54 | BI1ATT01 | 0 | 1/1/13 | 0 | 10 |
This code gives me correct correlations:
PROC CORR DATA=RESP_ATT_2 OUT=RESP_ATT_2_unwtd NOPRINT ;
BY TIME_PERIOD MARKET QNUM NOTSORTED; | |
WHERE MARKET=10 AND QNUM = 'BI1ATT01'; | |
VAR BAC11; | |
WITH attvalue ;RUN; |
Which is:
TIME_PERIOD | MARKET | Qnum | _TYPE_ | _NAME_ | BAC11 |
1/1/13 | 10 | BI1ATT01 | MEAN | 6.976191972 | |
1/1/13 | 10 | BI1ATT01 | STD | 2.080971423 | |
1/1/13 | 10 | BI1ATT01 | N | 15919 | |
1/1/13 | 10 | BI1ATT01 | CORR | AttValue | 0.219948207 |
But if I add another qnum or do it for all values of qnum, it splits (uncorrectly):
TIME_PERIOD | MARKET | Qnum | _TYPE_ | _NAME_ | BAC11 |
1/1/13 | 10 | BI1ATT01 | MEAN | 7.046176 | |
1/1/13 | 10 | BI1ATT01 | STD | 2.04887 | |
1/1/13 | 10 | BI1ATT01 | N | 4916 | |
1/1/13 | 10 | BI1ATT01 | CORR | AttValue | 0.200917 |
1/1/13 | 10 | BI1ATT02 | MEAN | 7.046176 | |
1/1/13 | 10 | BI1ATT02 | STD | 2.04887 | |
1/1/13 | 10 | BI1ATT02 | N | 4916 | |
1/1/13 | 10 | BI1ATT02 | CORR | AttValue | 0.357214 |
2/1/13 | 10 | BI1ATT01 | MEAN | 6.893767 | |
2/1/13 | 10 | BI1ATT01 | STD | 2.038903 | |
2/1/13 | 10 | BI1ATT01 | N | 5535 | |
2/1/13 | 10 | BI1ATT01 | CORR | AttValue | 0.227032 |
2/1/13 | 10 | BI1ATT02 | MEAN | 6.893767 | |
2/1/13 | 10 | BI1ATT02 | STD | 2.038903 | |
2/1/13 | 10 | BI1ATT02 | N | 5535 | |
2/1/13 | 10 | BI1ATT02 | CORR | AttValue | 0.359607 |
3/1/13 | 10 | BI1ATT01 | MEAN | 6.996708 | |
3/1/13 | 10 | BI1ATT01 | STD | 2.148244 | |
3/1/13 | 10 | BI1ATT01 | N | 5468 | |
3/1/13 | 10 | BI1ATT01 | CORR | AttValue | 0.228997 |
3/1/13 | 10 | BI1ATT02 | MEAN | 6.996708 | |
3/1/13 | 10 | BI1ATT02 | STD | 2.148244 | |
3/1/13 | 10 | BI1ATT02 | N | 5468 | |
3/1/13 | 10 | BI1ATT02 | CORR | AttValue | 0.329982 |
4916 + 5535 + 5468 = 15919
How can I get correlations for all bac11 with attvalue for all qnums in 1 proc corr step?
I have been able to do it when each qnum was a separate column.
Thanks, in advance!.
You are getting a statistic for every value of QNUM as requested. Take QNUM out of your BY statement.
That gives me the following values for Market=10, which is not correct.
2013Q1 10 MEAN 7.0461757526
2013Q1 10 STD 2.0486664614
2013Q1 10 N 201556
2013Q1 10 CORR AttValue 0.2626392784
This is correct, when I specify qnum= xxxxx
1/1/13 | 10 | BI1ATT01 | N | 15919 | |
1/1/13 | 10 | BI1ATT01 | CORR | AttValue | 0.219948207 |
OK, so the problem is that you are getting separate analyses for every time period. You will need to sort your data by market qnum; first and then calculate your correlations by market qnum; (without the time_period)
The corrects correlations are as follows:
AGGREGATE | DATEID | QNUM | CNT_UNWTD | BASE_UNWTD | CORR_UNWTD | CNT_WTD | BASE_WTD | CORR_WTD |
10 | 21301 | BI1ATT01_OPINION_CORR | 2721 | 15919 | 0.219948207 | 2880.323 | 16814.26 | 0.223680444 |
10 | 21301 | BI1ATT02_OPINION_CORR | 4339 | 15919 | 0.347334933 | 4583.797 | 16814.26 | 0.352963311 |
10 | 21301 | BI1ATT04_OPINION_CORR | 4087 | 15919 | 0.353498458 | 4348.75 | 16814.26 | 0.356669133 |
10 | 21301 | BI1ATT06_OPINION_CORR | 4211 | 15919 | 0.290342345 | 4463.583 | 16814.26 | 0.293930886 |
10 | 21301 | BI1ATT07_OPINION_CORR | 3988 | 15919 | 0.237841824 | 4197.28 | 16814.26 | 0.243500947 |
10 | 21301 | BI1ATT08_OPINION_CORR | 2286 | 15919 | 0.084569384 | 2407.529 | 16814.26 | 0.085981741 |
10 | 21301 | BI1ATT09_OPINION_CORR | 3132 | 15919 | 0.299537792 | 3287.753 | 16814.26 | 0.302181745 |
10 | 21301 | BI1ATT10_OPINION_CORR | 2883 | 15919 | 0.257086345 | 3030.902 | 16814.26 | 0.258804155 |
10 | 21301 | BI1ATT13_OPINION_CORR | 3743 | 15919 | 0.296277864 | 3957.778 | 16814.26 | 0.298797278 |
Corr is beween Bac11 and AttValue (shown in the original ques.)
When I use 1 qnum at a time, I get the correct correlations, but I wants to do this for about 900 qnums....when I do it all together, it splits it incorrectly as shown in my original question.
Post the code that doesn't work please.
My original post has many details, but here it is:
PROC CORR DATA=RESP_ATT_2 OUT=RESP_ATT_2_unwtd NOPRINT ;
BY TIME_PERIOD MARKET qnum NOTSORTED;
WHERE MARKET=10 /*AND QNUM IN ('BI1ATT01', 'BI1ATT02')*/;
VAR BAC11;
WITH attvalue ;
RUN;
My guess is your BY variables aren't specified appropriately for what you're looking to do.
Can you try changing the orders around to get what you want.
What's the finest level, eg is it QNUM across all markets or within markets?
Hmm..I changed the order to BY TIME_PERIOD qnum MARKET NOTSORTED; -- but get the same results.
Each market has qnums.
The funny thing is, when I specify only 1 qnum, ex -- and qnum='BI1ATT01' ...then the correlations are correct but I cannot do it separately as I have 50+ markets and 900+ QNUMs for correlations.
Thanks....
It doesn't matter how many markets/Qnums you have.
Things I'd try:
1. Sorting data, perhaps its not sorted the way you think it is?
2. Try using just VAR bac11 attvalue rather than WITH.
3. Verify the levels of qnum/market, perhaps run a proc freq with tables qnum*markets to verify what levels you need and what order.
4. Correlations with a 0/1 variable doesn't really mean anything, usually used for continuous variables, so is this really a good measure of what you're trying to accomplish?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.