<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Principal Component Analysis of Mixed Data by PROC PRINQUAL in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504721#M25944</link>
    <description>&lt;P&gt;Would you like to use PROC VARCLUS?&lt;/P&gt;</description>
    <pubDate>Tue, 16 Oct 2018 13:57:18 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2018-10-16T13:57:18Z</dc:date>
    <item>
      <title>Principal Component Analysis of Mixed Data by PROC PRINQUAL</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504695#M25941</link>
      <description>&lt;P&gt;Dear all,&lt;BR /&gt;&lt;BR /&gt;I am dealing with the following problem:&lt;BR /&gt;My data, in counting-process style suitable for survival analysis, is high-dimensional (about 2000 variables). I would like to use principal component analysis to reduce the dimensionality. However, some of the variables are categorical.&lt;BR /&gt;&lt;BR /&gt;My first attempt would be to convert the data into a design matrix (one-hot encoding of the categorical variables) and run PROC PRINCOMP on it.&lt;BR /&gt;I also came across PROC PRINQUAL, whose documentation says it "performs principal component analysis (PCA) of qualitative, quantitative, or mixed data". However, its main statement seems to be TRANSFORM, which can be used to preprocess the data for a PCA in PRINCOMP rather than to perform the PCA directly in PRINQUAL. The choice of transformation seems arbitrary to me. Are there any guidelines available?&lt;BR /&gt;&lt;BR /&gt;Additional information: I want to split my data into training and test samples. The principal components should be extracted from the training data only, so that the test data are not contaminated. I know this can be done either with PROC SCORE or by giving the test observations a FREQ value of 0 in PRINQUAL/PRINCOMP.&lt;/P&gt;&lt;P&gt;Thank you for your thoughts.&lt;/P&gt;</description>
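      <content:encoded>&lt;P&gt;This is a minimal sketch of the design-matrix route described above. The components are extracted from the training rows only and then applied to the test rows through PROC SCORE. The sketch rests on several assumptions: the data set is full_data, there is a hypothetical 0/1 training flag in_train, the macro variables &amp;amp;cat_vars and &amp;amp;num_vars hold the variable lists, and none of those variables has missing values. One-hot coding is done with PROC TRANSREG and its DESIGN option. N=20 is an arbitrary illustrative choice, not a recommendation.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical names: full_data, in_train (1 = training row, 0 = test row),
   cat_vars and num_vars (macro variables with space-separated lists). */

/* 1. One-hot code the categorical variables once, on the full data,
      so that training and test rows share the same dummy columns. */
proc transreg data=full_data design;
   model class(&amp;amp;cat_vars / zero=none) identity(&amp;amp;num_vars);
   id cust_num date status in_train;
   output out=design;
run;

/* TRANSREG stores the names of the independent (coded) variables in _TRGIND. */
%let design_vars = &amp;amp;_trgind;

/* 2. Extract the components from the training rows only. */
proc princomp data=design(where=(in_train=1)) outstat=pc_stat n=20 noprint;
   var &amp;amp;design_vars;
run;

/* 3. Score the test rows with the training means, standard deviations,
      and eigenvectors stored in the OUTSTAT= data set. */
proc score data=design(where=(in_train=0)) score=pc_stat out=pc_test;
   var &amp;amp;design_vars;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Coding the full data before splitting keeps the dummy columns identical in both samples, while only the training rows influence the component weights. With ZERO=NONE every level gets its own dummy. The dummies of one variable therefore sum to 1, and the correlation matrix has some zero eigenvalues. This does not prevent extracting the leading components.&lt;/P&gt;</content:encoded>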
      <pubDate>Tue, 16 Oct 2018 12:55:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504695#M25941</guid>
      <dc:creator>mat_n</dc:creator>
      <dc:date>2018-10-16T12:55:59Z</dc:date>
    </item>
    <item>
      <title>Re: Principal Component Analysis of Mixed Data by PROC PRINQUAL</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504721#M25944</link>
      <description>&lt;P&gt;Would you like to use PROC VARCLUS?&lt;/P&gt;</description>
      <pubDate>Tue, 16 Oct 2018 13:57:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504721#M25944</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-10-16T13:57:18Z</dc:date>
    </item>
    <item>
      <title>Re: Principal Component Analysis of Mixed Data by PROC PRINQUAL</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504761#M25948</link>
      <description>&lt;P&gt;If you just want ordinary principal components, use the IDENTITY transformation.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Oct 2018 15:52:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504761#M25948</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2018-10-16T15:52:12Z</dc:date>
    </item>
    <item>
      <title>Re: Principal Component Analysis of Mixed Data by PROC PRINQUAL</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504983#M25960</link>
      <description>&lt;P&gt;Thank you both!&lt;/P&gt;&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt; I had been thinking about this as well, and actually almost expected your suggestion.&lt;BR /&gt;However, PROC VARCLUS also requires numeric variables, which was the crux in the first place. Any suggestions on how to handle this?&lt;/P&gt;&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt; Yes, I came across this non-transformation transformation. I currently have two main issues:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;While IDENTITY(*) keeps variables exactly as they are, OPSCORE(*), apparently the only transformation available for categorical variables, imputes missing values even when NOMISS is specified. Is there any way to suppress this, or do I have to exclude these observations in advance?&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC PRINQUAL DATA=full_data NOMISS OUT=prinqual_results REPLACE;
   ID cust_num date status;          /* copied to the OUT= data set */
   FREQ freq;                        /* frequency variable */
   TRANSFORM IDENTITY(&amp;amp;num_vars.)  /* numeric variables, left unchanged */
             OPSCORE(&amp;amp;cat_vars.); /* categorical variables, optimal scoring */
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;I experimented with several transformation methods and noticed that the choice fundamentally changes the number of principal components needed. MONOTONE yields a single PC that explains over 90% of the variance, whereas IDENTITY needs more than 20 PCs to explain just 85%. Of course, this is exactly the purpose of PRINQUAL, but I lack a theoretical basis for deciding when each transformation is justified, beyond the very general preconditions in the manual (numeric, continuous, etc.).&lt;/LI&gt;&lt;/UL&gt;</description>
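      <content:encoded>&lt;P&gt;If PRINQUAL offers no switch to turn this imputation off, one workaround is to drop incomplete rows before the procedure runs. This is a minimal sketch. It assumes the macro variables &amp;amp;num_vars and &amp;amp;cat_vars from the post and a hypothetical output data set name, full_data_cc. CMISS counts missing values for both numeric and character variables, so a single test covers both lists.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Keep only complete cases across all analysis variables, so that
   PRINQUAL has no missing values to estimate.
   full_data_cc is a hypothetical data set name. */
data full_data_cc;
   set full_data;
   if cmiss(of &amp;amp;num_vars &amp;amp;cat_vars) = 0;
run;

PROC PRINQUAL DATA=full_data_cc OUT=prinqual_results REPLACE;
   ID cust_num date status;
   FREQ freq;
   TRANSFORM IDENTITY(&amp;amp;num_vars.) OPSCORE(&amp;amp;cat_vars.);
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Filtering this way also removes the incomplete rows from the OUT= data set. Any test rows with missing values would therefore need separate handling.&lt;/P&gt;</content:encoded>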
      <pubDate>Wed, 17 Oct 2018 07:48:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/504983#M25960</guid>
      <dc:creator>mat_n</dc:creator>
      <dc:date>2018-10-17T07:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: Principal Component Analysis of Mixed Data by PROC PRINQUAL</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/505033#M25962</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/227683"&gt;@mat_n&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt; Yes, I came across this non-transformation transformation. I currently have two main issues:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;While IDENTITY(*) keeps variables exactly as they are, OPSCORE(*), apparently the only transformation available for categorical variables, imputes missing values even when NOMISS is specified. Is there any way to suppress this, or do I have to exclude these observations in advance?&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I have not actually used PROC PRINQUAL with categorical variables; however, &lt;A href="https://documentation.sas.com/?cdcId=pgmmvacdc&amp;amp;cdcVersion=9.4&amp;amp;docsetId=statug&amp;amp;docsetTarget=statug_prinqual_syntax05.htm&amp;amp;locale=en#statug.prinqual.prqideopt" target="_self"&gt;the documentation&lt;/A&gt; for the IDENTITY transformation does not state that the variable must be numeric. So, have you actually tried using IDENTITY on categorical variables?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Oct 2018 11:56:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/505033#M25962</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2018-10-17T11:56:02Z</dc:date>
    </item>
    <item>
      <title>Re: Principal Component Analysis of Mixed Data by PROC PRINQUAL</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/505045#M25965</link>
      <description>Once you have the design matrix, feed it into PROC VARCLUS.</description>
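      <content:encoded>&lt;P&gt;This is a minimal sketch of that suggestion. It assumes a training data set named train and the macro variables &amp;amp;cat_vars and &amp;amp;num_vars used earlier in the thread; all of these names are hypothetical here. PROC TRANSREG with the DESIGN option builds the one-hot columns, and PROC VARCLUS then clusters all of the resulting numeric columns. MAXEIGEN=0.7 is only an illustrative threshold.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Build the design matrix: one dummy per level of each categorical
   variable, numeric variables passed through unchanged.
   train, cat_vars and num_vars are hypothetical names. */
proc transreg data=train design;
   model class(&amp;amp;cat_vars / zero=none) identity(&amp;amp;num_vars);
   output out=train_design;
run;

/* Cluster the coded columns; TRANSREG stores their names in _TRGIND. */
proc varclus data=train_design maxeigen=0.7 short;
   var &amp;amp;_trgind;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;VARCLUS treats every column separately, so the dummies of a single categorical variable can end up in different clusters.&lt;/P&gt;</content:encoded>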
      <pubDate>Wed, 17 Oct 2018 12:18:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/505045#M25965</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-10-17T12:18:10Z</dc:date>
    </item>
    <item>
      <title>Re: Principal Component Analysis of Mixed Data by PROC PRINQUAL</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/505046#M25966</link>
      <description>Yes, they must be numeric:&lt;BR /&gt;ERROR: The IDENTITY variable var1 must be numeric.</description>
      <pubDate>Wed, 17 Oct 2018 12:20:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Principal-Component-Analysis-of-Mixed-Data-by-PROC-PRINQUAL/m-p/505046#M25966</guid>
      <dc:creator>mat_n</dc:creator>
      <dc:date>2018-10-17T12:20:28Z</dc:date>
    </item>
  </channel>
</rss>

