Hi everyone,
I have a dataset in which every variable has 68811 observations and there are no missing values. However, SAS uses only 68717 observations in regressions. Is there a way to ensure that the number of observations used in the regressions equals the number of observations in the dataset?
Thank you.
Show us the SAS log, which will help us help you.
There are always one or more reasons why observations are excluded: missing values, nonpositive weights, or nonpositive frequencies. Did you look at the number of observations table? Does it agree with your assessment that there are no missing values? You can output it to a data set and print it to see additional information about why observations were excluded.
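data class;
   set sashelp.class;
   /* Randomly set roughly 10% of heights and weights to missing */
   if uniform(7) lt 0.1 then height = .;
   if uniform(7) lt 0.1 then weight = .;
   /* f and w are 0/1 indicators; a 0 is a nonpositive frequency or weight */
   f = uniform(7) > 0.1;
   w = uniform(7) > 0.1;
run;
proc print; run;
proc reg;
   model weight = height;
   freq f;
   weight w;
   /* Save the number of observations table to data set n */
   ods output nobs=n;
quit;
proc print; run;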
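As a cross-check on the same example data, here is a minimal sketch that counts the missing values and the zero frequencies and weights directly, so the totals can be compared with the printed table. It assumes the CLASS data set created above is still in the WORK library.

```sas
/* N = nonmissing count, NMISS = missing count for each analysis variable */
proc means data=class n nmiss;
   var height weight;
run;

/* Rows with f = 0 or w = 0 are the nonpositive frequencies and weights */
proc freq data=class;
   tables f w;
run;
```

Note that one observation can be excluded for more than one reason, so these counts need not add up to the number of excluded observations.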
You need to post more information.
SAS uses all nonmissing observations by default, so either you have missing values, or a WHERE statement, or something else that hasn't been stated. Most likely it's the third option.
Sorry everyone; SAS does actually indicate that there are missing values. The problem is that I can't identify them.
The log is provided below:
proc surveyreg data=theo.final;
cluster gvkey;
class fic fyear sic2;
model nD_assets1= post nsize nlev ner ncash nGrowth nR_E nroa fic fyear sic2/solution ADJRSQ;
run;
NOTE: Writing HTML Body file: sashtml.htm
NOTE: In data set FINAL, total 68811 observations read, 94 observations with missing values are omitted.
NOTE: PROCEDURE SURVEYREG used (Total process time):
      real time 8.16 seconds
      cpu time 2.96 seconds
You have 94 observations with at least one of
nD_assets1 post nsize nlev ner ncash nGrowth nR_E nroa fic fyear sic2
missing.
If they are all numeric variables, you can find those cases with
data suspect;
set have;
where nmiss(nD_assets1, post, nsize, nlev, ner, ncash, nGrowth, nR_E, nroa, fic, fyear, sic2)>0;
run;
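To see which variables account for the missing values, a per-variable tally can help. The following is only a sketch: the format names nmissfmt and $cmissfmt are made up for illustration, theo.final is taken from the log, and gvkey is included on the assumption that a missing CLUSTER value also causes an observation to be dropped.

```sas
/* Hypothetical formats that collapse every value to Missing / Not missing. */
/* Only the ordinary numeric missing value (.) is mapped to Missing here.   */
proc format;
   value  nmissfmt  .  = 'Missing' other = 'Not missing';
   value $cmissfmt ' ' = 'Missing' other = 'Not missing';
run;

/* PROC FREQ groups by formatted value, so each table has at most two rows */
proc freq data=theo.final;
   format _numeric_ nmissfmt. _character_ $cmissfmt.;
   tables nD_assets1 post nsize nlev ner ncash nGrowth nR_E nroa
          fic fyear sic2 gvkey / missing nocum nopercent;
run;
```

Because the formats apply to both numeric and character variables, this works whether or not fic and sic2 are stored as character.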
Put a comma between all of your variables (or use OF, as I did, and then you don't need any commas).
data ...;
set ...;
if nmiss(of ... all analysis variables variable list...);
run;
proc print; run;
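For the variables in the posted log, the OF approach might look like the sketch below. It makes a few assumptions: the input is theo.final as in the log, CMISS is used instead of NMISS because it accepts character as well as numeric arguments (fic or sic2 may be character), and the cluster variable gvkey is included in case a missing cluster value also excludes an observation. A subsetting IF is used because OF lists are not allowed in a WHERE statement.

```sas
data suspect;
   set theo.final;
   /* Keep only rows where at least one analysis variable is missing */
   if cmiss(of nD_assets1 post nsize nlev ner ncash nGrowth nR_E nroa
               fic fyear sic2 gvkey) > 0;
run;

proc print data=suspect; run;
```

If missing values are the only reason for exclusion, the SUSPECT data set should contain the omitted observations reported in the log.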
You asked how to identify the observations with missing values. That is what I showed you. You can't analyze that data set. Analyze the original data set.