Help using Base SAS procedures

proc corr

Reply
Contributor
Posts: 35

proc corr

My partial data looks like this:

Market goes from 1-50,

Brand goes from 1-200,

QNUM= BI1ATT01...BI1ATT357, BI2ATT01...BI2ATT200, BI3ATT01...BI3ATT260

AttValue= 0 or 1

SURVEYIDQNOCQNOMARKETBRANDLOADMONTHMONREPQTRREPWEIGHTXMKT_WEIGHTCREATEDDATEQnumAttValueTIME_PERIODFLAGBAC11
68403480000684034813010000684034810401Jan2013130113011.114770000000000012.139890000000000004Apr2013 19:53:51BI1ATT0101/1/1306
68406990000684069913010000684069910401Jan2013130113011.120790000000000012.205420000000000004Apr2013 19:53:55BI1ATT0101/1/13010
68407630000684076313010000684076310401Jan2013130113011.120790000000000012.205420000000000004Apr2013 19:53:56BI1ATT0101/1/13010
68406770000684067713010000684067710401Jan2013130113011.120790000000000012.205420000000000004Apr2013 19:53:54BI1ATT0101/1/1309
68406840000684068413010000684068410401Jan2013130113010.999060000000000010.879730000000000004Apr2013 19:53:55BI1ATT0101/1/1308
68405650000684056513010000684056510401Jan2013130113011.199890000000000013.066770000000000004Apr2013 19:53:53BI1ATT0101/1/1309
68408010000684080113010000684080110401Jan2013130113011.120790000000000012.205420000000000004Apr2013 19:54:03BI1ATT0101/1/13010
68406500000684065013010000684065010401Jan2013130113011.120790000000000012.205420000000000004Apr2013 19:53:54BI1ATT0101/1/13010

This code gives me correct correlations:

PROC CORR DATA=RESP_ATT_2 OUT=RESP_ATT_2_unwtd NOPRINT ;

BY TIME_PERIOD MARKET QNUM NOTSORTED;
WHERE MARKET=10 AND QNUM = 'BI1ATT01';
VAR BAC11;
WITH attvalue ;RUN;

Which is:

TIME_PERIODMARKETQnum_TYPE__NAME_BAC11
1/1/1310BI1ATT01MEAN6.976191972
1/1/1310BI1ATT01STD2.080971423
1/1/1310BI1ATT01N15919
1/1/1310BI1ATT01CORRAttValue0.219948207

But if I add another qnum or do it for all values of qnum, it splits (uncorrectly):

TIME_PERIODMARKETQnum_TYPE__NAME_BAC11
1/1/1310BI1ATT01MEAN7.046176
1/1/1310BI1ATT01STD2.04887
1/1/1310BI1ATT01N4916
1/1/1310BI1ATT01CORRAttValue0.200917
1/1/1310BI1ATT02MEAN7.046176
1/1/1310BI1ATT02STD2.04887
1/1/1310BI1ATT02N4916
1/1/1310BI1ATT02CORRAttValue0.357214
2/1/1310BI1ATT01MEAN6.893767
2/1/1310BI1ATT01STD2.038903
2/1/1310BI1ATT01N5535
2/1/1310BI1ATT01CORRAttValue0.227032
2/1/1310BI1ATT02MEAN6.893767
2/1/1310BI1ATT02STD2.038903
2/1/1310BI1ATT02N5535
2/1/1310BI1ATT02CORRAttValue0.359607
3/1/1310BI1ATT01MEAN6.996708
3/1/1310BI1ATT01STD2.148244
3/1/1310BI1ATT01N5468
3/1/1310BI1ATT01CORRAttValue0.228997
3/1/1310BI1ATT02MEAN6.996708
3/1/1310BI1ATT02STD2.148244
3/1/1310BI1ATT02N5468
3/1/1310BI1ATT02CORRAttValue0.329982

4916 + 5535 + 5468 = 15919

How can I get correlations for all bac11 with attvalue for all qnums in 1 proc corr step?

I have been able to do it when each qnum was a separate column.

Thanks, in advance!.

Respected Advisor
Posts: 4,651

Re: proc corr

You are getting a statistic for every value of QNUM as requested. Take QNUM out of your BY statement.

PG
Contributor
Posts: 35

Re: proc corr

That gives me the following values for Market=10, which is not correct.

2013Q1 10 MEAN                  7.0461757526

2013Q1 10 STD                      2.0486664614

2013Q1 10 N                           201556

2013Q1 10 CORR AttValue      0.2626392784

This is correct, when I specify qnum= xxxxx

1/1/1310BI1ATT01N15919
1/1/1310BI1ATT01CORRAttValue0.219948207
Respected Advisor
Posts: 4,651

Re: proc corr

OK, so the problem is that you are getting separate analyses for every time period. You will need to sort your data by market qnum; first and then calculate your correlations by market qnum; (without the time_period)

PG
Contributor
Posts: 35

Re: proc corr

The corrects correlations are as follows:

AGGREGATEDATEIDQNUMCNT_UNWTDBASE_UNWTDCORR_UNWTDCNT_WTDBASE_WTDCORR_WTD
1021301BI1ATT01_OPINION_CORR2721159190.2199482072880.32316814.260.223680444
1021301BI1ATT02_OPINION_CORR4339159190.3473349334583.79716814.260.352963311
1021301BI1ATT04_OPINION_CORR4087159190.3534984584348.7516814.260.356669133
1021301BI1ATT06_OPINION_CORR4211159190.2903423454463.58316814.260.293930886
1021301BI1ATT07_OPINION_CORR3988159190.2378418244197.2816814.260.243500947
1021301BI1ATT08_OPINION_CORR2286159190.0845693842407.52916814.260.085981741
1021301BI1ATT09_OPINION_CORR3132159190.2995377923287.75316814.260.302181745
1021301BI1ATT10_OPINION_CORR2883159190.2570863453030.90216814.260.258804155
1021301BI1ATT13_OPINION_CORR3743159190.2962778643957.77816814.260.298797278

Corr is beween Bac11 and AttValue (shown in the original ques.)

When I use 1 qnum at a time, I get the correct correlations, but I wants to do this for about 900 qnums....when I do it all together, it splits it incorrectly as shown in my original question.

Super User
Posts: 17,848

Re: proc corr

Post the code that doesn't work please.

Contributor
Posts: 35

Re: proc corr

My original post has many details, but here it is:

PROC CORR DATA=RESP_ATT_2 OUT=RESP_ATT_2_unwtd NOPRINT ;

  BY TIME_PERIOD MARKET qnum NOTSORTED;

WHERE MARKET=10      /*AND QNUM IN ('BI1ATT01', 'BI1ATT02')*/;

VAR BAC11;

WITH attvalue ;

RUN;

Super User
Posts: 17,848

Re: proc corr

My guess is your BY variables aren't specified appropriately for what you're looking to do.

Can  you try changing the orders around to get what you want.

What's the finest level, eg is it QNUM across all markets or within markets?

Contributor
Posts: 35

Re: proc corr

Hmm..I changed the order to BY TIME_PERIOD qnum MARKET  NOTSORTED; -- but get the same results.

Each market has qnums.

The funny thing is, when I specify only 1 qnum, ex -- and qnum='BI1ATT01'  ...then the correlations are correct but I cannot do it separately as I have 50+ markets and 900+ QNUMs for correlations.

 

 

Thanks....

Super User
Posts: 17,848

Re: proc corr

It doesn't matter how many markets/Qnums you have.

Things I'd try:

1. Sorting data, perhaps its not sorted the way you think it is?

2. Try using just VAR bac11 attvalue rather than WITH.

3. Verify the levels of qnum/market, perhaps run a proc freq with tables qnum*markets to verify what levels you need and what order.

4. Correlations with a 0/1 variable doesn't really mean anything, usually used for continuous variables, so is this really a good measure of what you're trying to accomplish?

Ask a Question
Discussion stats
  • 9 replies
  • 296 views
  • 0 likes
  • 3 in conversation