CEdward
Fluorite | Level 6

Good day,

 

I saw this earlier post and am wondering how to do the opposite: https://communities.sas.com/t5/Statistical-Procedures/Identifying-Unused-Observations-in-PROC-PHREG/...

 

Suppose I have a panel dataset. How can I determine how many people were included in my model?

 

PGStats
Opal | Level 21

Same logic: use the fact that residuals cannot be calculated for unused observations. Try adding the statement

output out=resOut resmart=resmart;

to the PHREG procedure, and then

proc print data=resOut; where resmart is not missing; run;

to print the observations used, or

proc sql;
   select count(resmart) as numberUsedInModel
   from resOut;
quit;

to get the number of observations used. COUNT(resmart) counts only the nonmissing residuals, so the unused observations are left out of the total.

PG
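A minimal end-to-end sketch of this approach, assuming a hypothetical panel data set `have` in counting-process form: one or more rows per person `id`, an interval from `tstart` to `tstop`, an event flag `status`, and one covariate `x1`. None of these names come from the thread. The ID statement is there so that `id` is available in the OUT= data set if you later want to count people rather than rows.

```sas
/* Hypothetical panel data (not from the thread): several rows per  */
/* person in counting-process form. x1 is missing on some rows, so  */
/* PHREG drops those rows; person 2 has no complete row at all.     */
data have;
   input id tstart tstop status x1;
   datalines;
1 0 4 0 1.0
1 4 9 1 2.0
2 0 5 0 .
2 5 8 0 .
3 0 3 0 0.5
3 3 7 1 .
4 0 6 1 0.5
5 0 2 0 1.5
5 2 10 1 3.0
6 0 12 0 1.0
;
run;

/* Fit the model and keep the martingale residuals.                 */
/* The ID statement places id in the OUT= data set.                 */
proc phreg data=have;
   model (tstart, tstop)*status(0) = x1;
   id id;
   output out=resOut resmart=resmart;
run;

/* Rows whose residual could not be computed were not used, and     */
/* COUNT(resmart) skips missing values.                             */
proc sql;
   create table usedCounts as
   select count(resmart) as numberUsedInModel
   from resOut;
quit;

proc print data=usedCounts noobs;
run;
```

Comparing this count with the "Number of Observations Used" that PROC PHREG prints is a quick sanity check that the residual approach behaved as expected for a given model.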
CEdward
Fluorite | Level 6

PGStats, thank you.

I want the number of distinct individuals used rather than the number of observations. I have already used that method for observations.

ballardw
Super User
Accepted Solution

@CEdward wrote:

PGStats

Thank you.

I want to get the number of distinct individuals used rather than observations. I have used that method for observations already.


Does your data set have an individual identifier, or a group of variables that identifies an individual? If so, something like this gets the individuals, and you could use it as the basis for a count in a subquery.

proc sql;
   select distinct <variable(s) to identify an individual>
   from residualdataset
   where not missing(residualvariable);
quit;
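A sketch of using that SELECT DISTINCT as a subquery to get the count. To keep it self-contained, a small hypothetical data set stands in for the PHREG OUT= data set `resOut`, with `id` as the individual identifier and `resmart` as the residual variable; the values are made up for illustration. In practice you would query your own OUT= data set, with the identifier carried into it (for example through an ID statement in PROC PHREG).

```sas
/* Hypothetical stand-in for the PHREG OUT= data set: one row per   */
/* input observation, resmart missing where the row was not used.   */
data resOut;
   input id resmart;
   datalines;
1 -0.20
1 0.35
2 .
2 .
3 0.00
3 .
4 0.60
;
run;

/* Inner query: the distinct individuals with at least one used row. */
/* Outer query: count those individuals.                             */
proc sql;
   create table usedPeople as
   select count(*) as numberOfPeopleUsed
   from (select distinct id
         from resOut
         where not missing(resmart));
quit;

proc print data=usedPeople noobs;
run;
```

With a single identifier variable, `select count(distinct id) from resOut where not missing(resmart)` gives the same result in one step. As far as I know, COUNT(DISTINCT ...) in PROC SQL takes only one argument, so the subquery form is the easier route when an individual is identified by several variables: list them all in the inner SELECT DISTINCT.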

CEdward
Fluorite | Level 6
I suspected that was the solution. Thank you.
As an added safety check: a post in the original thread says that using missing residuals to identify the observations left out of the model is not bulletproof. Can someone confirm that?
ballardw
Super User

In that post, @JacobSimonsen gave an example for one specific regression with one specific data set.

Other regressions might behave similarly, depending on the data, the regression procedure, and the options used.
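Because the residual approach may not be bulletproof, one independent cross-check (not something proposed in this thread) is to count complete cases directly: rows with no missing value in any variable the model uses. This sketch assumes a hypothetical data set `have` with identifier `id` and model variables `time`, `status`, `x1`, and `x2`. It only matches what the procedure actually uses if missing values are the only reason rows are dropped, so a disagreement with the residual-based count is a signal to look more closely at the data and the options.

```sas
/* Hypothetical input data (not from the thread). */
data have;
   input id time status x1 x2;
   datalines;
1 5 1 0.4 1
1 8 0 .   1
2 3 1 1.2 0
3 7 0 0.9 .
3 9 1 1.1 .
;
run;

/* Flag rows with no missing value in any model variable.           */
/* CMISS counts missing values across numeric and character args.   */
data flagged;
   set have;
   complete = (cmiss(time, status, x1, x2) = 0);
run;

/* Complete rows, and distinct people with at least one complete row. */
/* The CASE yields missing for incomplete rows, which COUNT ignores.  */
proc sql;
   create table completeCounts as
   select sum(complete) as nCompleteRows,
          count(distinct case when complete = 1 then id end)
             as nCompletePeople
   from flagged;
quit;

proc print data=completeCounts noobs;
run;
```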
