Theo_Gh
Obsidian | Level 7

Hi everyone,

I have a dataset with 68811 observations in total, and there are no missing values in any of the variables. However, SAS uses only 68717 observations in regressions. Is there a way to ensure that the number of observations used in the regressions equals the number of observations in the dataset?

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You have 94 observations with at least one of

    nD_assets1 post nsize nlev ner ncash nGrowth nR_E nroa fic fyear sic2

missing.

 

If they are all numeric vars, you can find those cases with:

data suspect;
  set have;
  /* keep only rows where at least one listed variable is missing */
  where nmiss(nD_assets1, post, nsize, nlev, ner, ncash, nGrowth, nR_E, nroa, fic, fyear, sic2)>0;
run;
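
NMISS expects numeric arguments, so it is not a good fit if some of the variables (fic, for example) turn out to be character. A minimal sketch using CMISS, which counts missing values among both numeric and character arguments. It assumes the data set is theo.final, as in the log posted in this thread; n_missing and suspect are just illustrative names. I believe the survey procedures also drop observations with a missing CLUSTER value by default, so gvkey is included here as a precaution.

```sas
data suspect;
  set theo.final;
  /* CMISS accepts a mix of numeric and character variables.        */
  /* gvkey (the CLUSTER variable) is added as a precaution; remove  */
  /* it if you only care about the MODEL and CLASS variables.       */
  n_missing = cmiss(of nD_assets1 post nsize nlev ner ncash nGrowth
                       nR_E nroa fic fyear sic2 gvkey);
  /* subsetting IF is used because the OF list is a DATA step feature */
  if n_missing > 0;
run;

proc print data=suspect;
run;
```

If everything is as the log suggests, suspect should contain the 94 omitted observations.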

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


10 REPLIES
mkeintz
PROC Star

Show us the sas log, which will help us help you.

WarrenKuhfeld
Ammonite | Level 13

There are always one or more reasons why observations are excluded: missing values, nonpositive weights, or nonpositive frequencies.  Did you look at the number of observations table?  Does it agree with your assessment that there are no missing values?  You can output it to a data set and print it to see additional information about why observations were excluded.

 

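data class;
   set sashelp.class;
   /* set roughly 10% of heights and weights to missing at random */
   if uniform(7) lt 0.1 then height = .;
   if uniform(7) lt 0.1 then weight = .;
   /* f and w are 0/1; a 0 is a nonpositive frequency or weight */
   f = uniform(7) > 0.1;
   w = uniform(7) > 0.1;
run;
proc print; run;
proc reg;
   model weight = height;
   freq f;
   weight w;
   ods output nobs=n;   /* save the Number of Observations table */
quit;
proc print; run;   /* prints data set n, the most recently created */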

 

Reeza
Super User

You need to post more information. 

SAS uses all non-missing observations by default, so either you have missing values, or a WHERE statement, or something else that hasn't been stated. Most likely it's the third option.

 

Theo_Gh
Obsidian | Level 7

Sorry everyone; actually SAS does indicate that there are missing values. The problem is that I can't identify them.

 

The log is provided below:

 

proc surveyreg data=theo.final;
cluster gvkey;
class fic fyear sic2;
model nD_assets1= post nsize nlev ner ncash nGrowth nR_E nroa fic fyear sic2/solution ADJRSQ;
run;

NOTE: Writing HTML Body file: sashtml.htm
NOTE: In data set FINAL, total 68811 observations read, 94 observations with missing values are omitted.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 8.16 seconds
cpu time 2.96 seconds
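
The NOTE in the log says that 94 observations have missing values but not which variables are responsible. One way to narrow it down is a per-variable count of missing versus present values. A rough sketch, assuming the data set contains both numeric and character variables; the format names nmissfmt and $cmissfmt are made up for this example:

```sas
proc format;
  /* collapse every value into Missing / Present                    */
  /* note: special missing values such as .A fall under Present     */
  value  nmissfmt  .   = 'Missing'  other = 'Present';
  value $cmissfmt  ' ' = 'Missing'  other = 'Present';
run;

proc freq data=theo.final;
  /* apply the formats to all numeric and all character variables */
  format _numeric_ nmissfmt. _character_ $cmissfmt.;
  /* the MISSING option makes the Missing level show up as a table row */
  tables nD_assets1 post nsize nlev ner ncash nGrowth nR_E nroa
         fic fyear sic2 gvkey / missing nocum;
run;
```

Any variable whose table shows a Missing row is a candidate for the omitted observations.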

 

Theo_Gh
Obsidian | Level 7
data theo.new;
set theo.final;
where nmiss(nD_assets1, post nsize, nlev, ner, ncash, nGrowth, nR_E, nroa)>0;
run;

I used the code above but I'm getting a syntax error.
WarrenKuhfeld
Ammonite | Level 13

Put a comma between all of your variables (or use OF, as I did; then you don't need any commas).

WarrenKuhfeld
Ammonite | Level 13

data ...;

set ...;

if nmiss(of ... all analysis variables variable list...);

run;

proc print; run;
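
For reference, here is how the two forms might look with the variable names posted earlier in this thread. This is only a sketch: it assumes all nine variables are numeric, and theo.new is simply the output name used above. As far as I know, the OF list is accepted in a DATA step IF statement but not in a WHERE expression, which is why the second form uses IF.

```sas
/* Form 1: every argument separated by a comma (fine in WHERE) */
data theo.new;
  set theo.final;
  where nmiss(nD_assets1, post, nsize, nlev, ner, ncash,
              nGrowth, nR_E, nroa) > 0;
run;

/* Form 2: OF variable list, no commas, with a subsetting IF */
data theo.new;
  set theo.final;
  if nmiss(of nD_assets1 post nsize nlev ner ncash
              nGrowth nR_E nroa) > 0;
run;

proc print data=theo.new;
run;
```

Both steps keep only the observations that have at least one missing value among the listed variables.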

Theo_Gh
Obsidian | Level 7
Please bear with me.
The code works now. I get a new dataset, but when I use the new dataset in a regression, I get an error that there are no observations in the dataset, even though there are.
WarrenKuhfeld
Ammonite | Level 13

You asked how to identify the observations with missing values.  That is what I showed you.  You can't analyze that data set, because every observation in it has at least one missing value.  Analyze the original data set.
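
If the goal is a data set whose observation count matches the number of observations the regression actually uses, the condition can be flipped to keep only the complete cases. A sketch under the same assumptions as the rest of this thread; theo.complete is a hypothetical name, and gvkey is included on the assumption that a missing CLUSTER value would also cause an observation to be dropped:

```sas
data theo.complete;
  set theo.final;
  /* keep rows with no missing value in any analysis variable; */
  /* CMISS handles numeric and character variables alike       */
  if cmiss(of nD_assets1 post nsize nlev ner ncash nGrowth
              nR_E nroa fic fyear sic2 gvkey) = 0;
run;

proc surveyreg data=theo.complete;
  cluster gvkey;
  class fic fyear sic2;
  model nD_assets1 = post nsize nlev ner ncash nGrowth nR_E nroa
                     fic fyear sic2 / solution adjrsq;
run;
```

The estimates should be the same as those from the original data set, since the procedure was already omitting those observations; only the observation counts reported in the log change.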
