PaigeMiller
Diamond | Level 26

There is an alternative method of getting PCA vectors and scores from huge data sets. I tested it a long time ago, and as I remember it was much faster than a conventional PCA when you only need a few vectors and have a large data set.

 

So, assuming my memory is correct, you could try this to get PCA results from PROC PLS. The advantage is that PLS doesn't have to invert matrices and doesn't have to compute eigenvalues/eigenvectors from the entire correlation matrix which is 5359x5359.

 

proc pls data=beta_diversity nfac=2 details;
    ods output xloadings=xloadings
    model _numeric_ = _numeric_;
    output out=_scores_ xscore=prin;
run;

The idea here is that if you fit a PLS model where the x-variables are the same variables as the y-variables, you get PCA!!! (Raise your hand if you knew that). 
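
One way to check this claim on a small data set that ships with SAS is sketched below. It assumes SASHELP.IRIS is available, and the output names (pls_scores, pca_scores, both, and the prefix t) are hypothetical. PROC PLS centers and scales the variables by default, so the comparison is against PROC PRINCOMP's default correlation-matrix analysis. The two sets of scores may differ in sign and scale, which is why the sketch compares correlations rather than raw values.

```sas
/* PLS with identical x- and y-variables; x-scores are saved as t1 and t2 */
proc pls data=sashelp.iris nfac=2;
    model SepalLength SepalWidth PetalLength PetalWidth =
          SepalLength SepalWidth PetalLength PetalWidth;
    output out=pls_scores xscore=t;
run;

/* Conventional PCA on the same variables; scores are saved as Prin1 and Prin2 */
proc princomp data=sashelp.iris n=2 out=pca_scores noprint;
    var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* One-to-one merge; both output data sets keep the input row order */
data both;
    merge pls_scores(keep=t1 t2) pca_scores(keep=Prin1 Prin2);
run;

/* Correlations close to +1 or -1 for t1 with Prin1 and t2 with Prin2
   would support the claim */
proc corr data=both;
    var t1 t2;
    with Prin1 Prin2;
run;
```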

 

 

--
Paige Miller
sbxkoenk
SAS Super FREQ

Hello @PaigeMiller ,

My hand is NOT raised. Thanks for the tip!

 

@kellychan84 :

If PROC PRINCOMP is slow, you can also try :

  • PROC HPPRINCOMP if you are on SAS 9.4 (HP stands for High Performance; it exploits multi-threading)
  • PROC PCA if you are on SAS Viya

Thanks,

Koen

kellychan84
Fluorite | Level 6
Hello Paige, thank you for your suggestions!! But when I run the code, it comes up with an error: ERROR: No MODEL specified.
PaigeMiller
Diamond | Level 26

@kellychan84 when you get an error, you need to SHOW US the log.

 

We need to see the ENTIRE log for this PROC, all of it for this PROC, every single line for this PROC, from the first line where the log shows PROC PLS all the way down to the last NOTE beneath the log.

 

Copy the log as text and then paste it into the window that appears when you click on the </> icon


--
Paige Miller
kellychan84
Fluorite | Level 6
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 NOTE: ODS statements in the SAS Studio environment may disable some output features.
 71         
 72         data beta_diversity;
 73           length treatment $20;
 74           Infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA delete.csv" dlm="," dsd truncover
 74       ! lrecl=1000000 firstobs=2;
 75           input treatment$ ASV1-ASV5359;
 76         run;
 
 NOTE: The infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA delete.csv" is:
       Filename=/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA delete.csv,
       Owner Name=u39233094,Group Name=oda,
       Access Permission=-rw-r--r--,
       Last Modified=10Feb2022:15:51:11,
       File Size (bytes)=411643
 
 NOTE: 34 records were read from the infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA 
       delete.csv".
       The minimum record length was 10740.
       The maximum record length was 11170.
 NOTE: The data set WORK.BETA_DIVERSITY has 34 observations and 5360 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.02 seconds
       user cpu time       0.02 seconds
       system cpu time     0.00 seconds
       memory              7997.96k
       OS Memory           35880.00k
       Timestamp           02/11/2022 09:21:48 PM
       Step Count                        31  Switch Count  2
       Page Faults                       0
       Page Reclaims                     1436
       Page Swaps                        0
       Voluntary Context Switches        19
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           4360
       
 
 77         proc pls data=beta_diversity nfac=2 details;
 78             ods output xloadings=xloadings
 79             model _numeric_ = _numeric_;
 80             output out=_scores_ xscore=prin;
 81         run;
 
 ERROR: No MODEL specified.
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK._SCORES_ may be incomplete.  When this step was stopped there were 0 observations and 0 variables.
 WARNING: Data set WORK._SCORES_ was not replaced because this step was stopped.
 NOTE: PROCEDURE PLS used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              4996.03k
       OS Memory           34372.00k
       Timestamp           02/11/2022 09:21:48 PM
       Step Count                        32  Switch Count  1
       Page Faults                       0
       Page Reclaims                     521
       Page Swaps                        0
       Voluntary Context Switches        6
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 WARNING: Output '_numeric_' was not created.  Make sure that the output object name, label, or path is spelled correctly.  Also, 
          verify that the appropriate procedure options are used to produce the requested output object.  For example, verify that 
          the NOPRINT option is not used.
 WARNING: Output 'model' was not created.  Make sure that the output object name, label, or path is spelled correctly.  Also, verify 
          that the appropriate procedure options are used to produce the requested output object.  For example, verify that the 
          NOPRINT option is not used.
 WARNING: Output 'xloadings' was not created.  Make sure that the output object name, label, or path is spelled correctly.  Also, 
          verify that the appropriate procedure options are used to produce the requested output object.  For example, verify that 
          the NOPRINT option is not used.
 82         
 83         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 94         
 User: u39233094
data beta_diversity;
  length treatment $20;
  Infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA delete.csv" dlm="," dsd truncover lrecl=1000000 firstobs=2;
  input treatment$ ASV1-ASV5359;
run;
proc pls data=beta_diversity nfac=2 details;
    ods output xloadings=xloadings
    model _numeric_ = _numeric_;
    output out=_scores_ xscore=prin;
run;
PaigeMiller
Diamond | Level 26

Looks like I made a typographical error.

 

There should be a semicolon at the end of the line:

 

ods output xloadings=xloadings
--
Paige Miller
kellychan84
Fluorite | Level 6

Hello Paige,

It finally works!! It takes 15 min to finish the program. But there is a warning here. I don't know whether it matters or not. Please see the log attached below. 

 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 NOTE: ODS statements in the SAS Studio environment may disable some output features.
 71         
 72         data beta_diversity;
 73           length treatment $20;
 74           Infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA delete.csv" dlm="," dsd truncover
 74       ! lrecl=1000000 firstobs=2;
 75           input treatment$ ASV1-ASV5359;
 76         run;
 
 NOTE: The infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA delete.csv" is:
       Filename=/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA delete.csv,
       Owner Name=u39233094,Group Name=oda,
       Access Permission=-rw-r--r--,
       Last Modified=10Feb2022:15:51:11,
       File Size (bytes)=411643
 
 NOTE: 34 records were read from the infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA 
       delete.csv".
       The minimum record length was 10740.
       The maximum record length was 11170.
 NOTE: The data set WORK.BETA_DIVERSITY has 34 observations and 5360 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.08 seconds
       user cpu time       0.02 seconds
       system cpu time     0.01 seconds
       memory              8001.78k
       OS Memory           35368.00k
       Timestamp           02/14/2022 02:51:14 PM
       Step Count                        24  Switch Count  2
       Page Faults                       0
       Page Reclaims                     1666
       Page Swaps                        0
       Voluntary Context Switches        24
       Involuntary Context Switches      0
       Block Input Operations            808
       Block Output Operations           4360
       
 
 77         proc pls data=beta_diversity nfac=2 details;
 78             ods output xloadings=xloadings;
 79             model _numeric_ = _numeric_;
 80             output out=_scores_ xscore=prin;
 81         run;
 
 WARNING: Iteration limit reached without convergence.
 NOTE: The data set WORK.XLOADINGS has 2 observations and 5360 variables.
 NOTE: There were 34 observations read from the data set WORK.BETA_DIVERSITY.
 NOTE: The data set WORK._SCORES_ has 34 observations and 5362 variables.
 NOTE: PROCEDURE PLS used (Total process time):
       real time           14:46.99
       user cpu time       14:40.87
       system cpu time     2.20 seconds
       memory              955894.12k
       OS Memory           1091020.00k
       Timestamp           02/14/2022 03:06:01 PM
       Step Count                        25  Switch Count  10
       Page Faults                       0
       Page Reclaims                     708366
       Page Swaps                        0
       Voluntary Context Switches        33275
       Involuntary Context Switches      1277
       Block Input Operations            0
       Block Output Operations           29576
       
 
 82         
 83         
 84         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 95         

One more question: could you please also teach me how to plot my PCA1 and PCA2 after this PROC PLS? Thank you very much!

One more thing: the log still shows only 34 observations. I don't know why.
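
On the plotting question, one possible sketch follows. It assumes the OUTPUT statement used XSCORE=prin as in the step above, so the score columns in _scores_ should be named prin1 and prin2, and that treatment is carried over from the input data set. The axis labels are placeholders.

```sas
/* Scatter plot of the first two x-scores, colored by treatment */
proc sgplot data=_scores_;
    scatter x=prin1 y=prin2 / group=treatment;
    xaxis label="Component 1 (x-score)";
    yaxis label="Component 2 (x-score)";
run;
```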

sbxkoenk
SAS Super FREQ

Take care @kellychan84 : your analysis is done on 34 observations only. That's NOT what you want!

 

Regarding :

WARNING: Iteration limit reached without convergence.

There is a MAXITER= suboption to increase the number of iterations.

(The default algorithm is NIPALS, and 200 is its default maximum number of iterations.)
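
As a sketch of where that suboption goes: MAXITER= is given inside the METHOD=PLS option on the PROC PLS statement, next to the ALGORITHM= suboption. The value 1000 below is an arbitrary illustration, not a recommendation, and a higher limit does not guarantee convergence.

```sas
proc pls data=beta_diversity nfac=2 details
         method=pls(algorithm=nipals maxiter=1000);  /* 1000 is an arbitrary example value */
    ods output xloadings=xloadings;
    model _numeric_ = _numeric_;
    output out=_scores_ xscore=prin;
run;
```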

 

Cheers,

Koen

Tom
Super User

So it seems to be reading the data.  It read 5,360 variables. The longest line was only 11,170 bytes.  Are your variables all only one character long?

 NOTE: 34 records were read from the infile "/home/u39233094/sasuser.v94/Thesis/CSV file/Cecal beta-diversity for SAS PCA 
       delete.csv".
       The minimum record length was 10740.
       The maximum record length was 11170.
 NOTE: The data set WORK.BETA_DIVERSITY has 34 observations and 5360 variables.

But it only found 34 observations, not the 182K you mentioned before.  Did you truncate the text file in some way?  Note you could also just use the OBS= option on the INFILE statement to read only some of the lines. Or use the OBS= dataset option when passing the data to a procedure to have it use only some of the observations.
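
The two OBS= approaches could look roughly like this. The file path and the data set name beta_subset are hypothetical placeholders, and 100 rows is an arbitrary choice.

```sas
/* Option 1: OBS= on INFILE is the number of the last record to read.
   With FIRSTOBS=2 and OBS=101, records 2 through 101 are read,
   that is, the header is skipped and 100 data rows are read. */
data beta_subset;
    length treatment $20;
    infile "/path/to/your-file.csv" dlm="," dsd truncover
           lrecl=1000000 firstobs=2 obs=101;
    input treatment $ ASV1-ASV5359;
run;

/* Option 2: keep the full data set and pass only the first 100 rows
   to the procedure with the OBS= data set option */
proc pls data=beta_diversity(obs=100) nfac=2;
    model _numeric_ = _numeric_;
run;
```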

 

The reason SAS did not see your MODEL statement is because you forgot to end the ODS statement so your MODEL keyword became part of the ODS statement instead of a separate statement.  Make sure to place semi-colons at the end of the statements.  Line breaks and extra white space mean nothing in SAS code. You have to actually tell SAS where the statements end by using semi-colons.

proc pls data=beta_diversity nfac=2 details;
    ods output xloadings=xloadings;
    model _numeric_ = _numeric_;
    output out=_scores_ xscore=prin;
run;
Tom
Super User

If you are really trying to use 34 observations to derive insights into over 5,000 variables, you do not have enough data for your analysis to have any meaning.  Perhaps that is why it doesn't end?

PaigeMiller
Diamond | Level 26

@Tom wrote:

If you are really trying to use 34 observations to derive insights into over 5,000 variables, you do not have enough data for your analysis to have any meaning.  Perhaps that is why it doesn't end?


Great catch. This certainly would be a meaningless analysis. However, the algorithm ought to finish really really quickly with just 34 observations.

--
Paige Miller
kellychan84
Fluorite | Level 6

Hello Tom,

I don't know why it shows only 34 observations. After I added the "dsd truncover lrecl=1000000" options, the number of columns finally increased from 5000 to exactly what I have (5359). Paige's PROC PLS code is finally giving me results!!

PaigeMiller
Diamond | Level 26

@kellychan84 wrote:

Hello Tom,

I don't know why it shows only 34 observations. After I added the "dsd truncover lrecl=1000000" options, the number of columns finally increased from 5000 to exactly what I have (5359). Paige's PROC PLS code is finally giving me results!!


I'm going out to celebrate! 🙂 😀😁


But please clear up this issue: do you still have 34 observations? If so, the results are meaningless.

--
Paige Miller
kellychan84
Fluorite | Level 6

Yeah, I am going to celebrate too. 

But the log shows that it only reads 34 observations. "NOTE: The data set WORK.BETA_DIVERSITY has 34 observations and 5360 variables." Why is that happening? I will try the solutions others are giving.

PaigeMiller
Diamond | Level 26

You have to perform some basic troubleshooting to figure out why reading the .csv file produces a SAS data set named beta_diversity that has only 34 observations. This is where you need to start.
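
A minimal sketch of one such check, assuming a hypothetical file path: count the raw records in the file and print the length of the first few, without parsing any fields. If the total is 35 (one header line plus 34 data lines), the file itself holds only 34 rows and the INPUT statement is not the cause. The variable names reclen and n are arbitrary.

```sas
data _null_;
    infile "/path/to/your-file.csv" lrecl=1000000 length=reclen end=eof;
    input;          /* load the record into _INFILE_ without parsing it */
    n + 1;          /* running count of records read so far */
    if n <= 3 then put "Record " n "has length " reclen;
    if eof then put "Total records in file: " n;
run;
```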

 

And if you do figure out how to get the 180,000 observations, then your PCA/PLS should take days if it takes 15 minutes with 34 observations.

--
Paige Miller
