BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DR500
Calcite | Level 5

Hi All,

 

Need a little guidance here. I have categorical Yes/No data for multiple fields. I'm looking to cluster this using the MCA option in PROC CORRESP. However, when SAS spits out my Burt Table and my graph, it is not identifying which field the Yes or No answer belongs to. For instance, if my Field Name was 'HasBankAccount' with Yes or No as the answer, the Burt table will show 'Yes' and 'No', but does not say what field each 'Yes' and 'No' belongs to. I want it to show 'HasBankAccount_Yes' and 'HasBankAccount_No' or something similar just so I know which fields I'm looking at in my cluster.

 

Anyone have any ideas? I've looked all over the internets and got nothing.

Thanks in advance! - Dave

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Why not change your data?

P.S. Fixed it now, Should test it firstly .

data want;
set have;
array x{*} $ _character_;
do i=1 to dim(x);
x{i}=catx('_',vname(x{i}),x{i});
end;
run;

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

Show us a representative portion of the data. Show us the output.

 
--
Paige Miller
DR500
Calcite | Level 5

Ok, here's a rudimentary mock-up:

Dataset  
CustIDHasAccountHasSavings
1YesNo
2YesYes
3NoNo
Burt Table    
     
 YesNoYesNo
Yes1111
No2222
Yes1111
No2222
Desired Output    
     
 HasAccount_YesHasAccount_NoHasSavings_YesHasSavings_No
HasAccount_Yes1111
HasAccount_No2222
HasSavings_Yes1111
HasSavings_No2222

 

R seems to be able to do this with FactomineR, but I really want to do this in SAS.

 

Does this make more sense? Ignore the actual counts - I just made up the ones and twos.

 

Thanks!

Dave

 

PaigeMiller
Diamond | Level 26

If you look at this example, the levels of each variable are formatted differently, and so the Burt Table is interpretable. They accomplish this (as you should) by assigning formats to the original data, so then you can separate the Yes under "HasAccount" from the Yes under "Has Savings"

--
Paige Miller
DR500
Calcite | Level 5

Thanks. I was hoping there was an easier solution than formatting each field(I actually have a lot of fields that would need formatting). Looks like I'm switching to R. Appreciate the help.

 

- Dave

Ksharp
Super User

Why not change your data?

P.S. Fixed it now, Should test it firstly .

data want;
set have;
array x{*} $ _character_;
do i=1 to dim(x);
x{i}=catx('_',vname(x{i}),x{i});
end;
run;

PaigeMiller
Diamond | Level 26

Great idea, @Ksharp 

 

Hey, @DR500 you should give this a try.

--
Paige Miller
DR500
Calcite | Level 5

Thanks! I tried what you posted and received the following error:

Array subscript out of range at line 75 column 21.


Is this the result of a length issue? If so, is there an easy way to correct it? I'd like to do it with script, but if necessary, I can brute force it by writing formulas in Excel to modify. FYI - extensive SQL background but fairly new to SAS, so please bear with me on this! Thanks! - Dave

PaigeMiller
Diamond | Level 26

Show us the log (not just the error message, but the entire log so we can see the code you used and the NOTEs and WARNINGs and ERRORs, unedited, verbatim)

--
Paige Miller
DR500
Calcite | Level 5
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
70
71 data work.test;
72 set work.custdata_mod;
73 array x{*} $ _character_;
74 do _n_=1 to dim(x);
75 x{i}=catx('_',vname(x{i}),x{i});
76 end;
77 run;
 
NOTE: Variable i is uninitialized.
ERROR: Array subscript out of range at line 75 column 21.
customerID=7590-VHVEG gender=Female Partner=Yes Dependents=No tenure=1 PhoneService=No MultipleLines=No InternetService=DSL
OnlineSecurity=No OnlineBackup=Yes DeviceProtection=No TechSupport=No StreamingTV=No StreamingMovies=No Contract=Month-to-month
PaperlessBilling=Yes PaymentMethod=Electronic check MonthlyCharges=29.85 TotalCharges=29.85 SeniorCitizen=No Churn=0
Tenure_Group=0-12 Months MonthlyCharges_Group=0-35 DollarsPerMonth i=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.CUSTDATA_MOD.
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped there were 0 observations and 24 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 2450.03k
OS Memory 36268.00k
Timestamp 04/26/2020 05:34:03 PM
Step Count 70 Switch Count 2
Page Faults 0
Page Reclaims 433
Page Swaps 0
Voluntary Context Switches 9
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
 
 
 
78
79 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
90
 
PaigeMiller
Diamond | Level 26

Seems like both Ksharp and I missed the obvious mistake.

 

You want to use

 

do i=1 to dim(x);
--
Paige Miller
DR500
Calcite | Level 5

That fixed it! However, it modified all the character fields (which is what the script told it to do) but instead of using _character_, can I list only the fields(variables) that I want changed? Like:

 

array x{*} $ OnlineSecurity OnlineBackup DeviceProtection;

 

Or create a macro and call it for the fields I need like:

 

array x{*} $ &mod_fields;

 

Would either of those work? Thanks for all the help on this!

DR500
Calcite | Level 5

Nevermind - I just played around with it and it ran fine with a macro. Really appreciate all the help!

PaigeMiller
Diamond | Level 26

Sure that works, but let's get the terminology correct.

 

&mod_fields is a macro variable, it is not a macro. Macro variables are different than macros, and hopefully using the correct name can avoid confusion in the future.

--
Paige Miller
DR500
Calcite | Level 5

Got it. Like I said, coming from 15+ years of SQL to this is interesting - all sorts of different terminology. Thanks again!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 2210 views
  • 2 likes
  • 3 in conversation