BookmarkSubscribeRSS Feed
Leon_Seungmin
Obsidian | Level 7

Hello SAS community!

 

I want to do a simple analysis of one dataset. 

Let's say there are 2 variable A and B.

 

I want to know how many observations not missing variable A or B are in my dataset. Also, I want to know the number of observation missing value for A but not for B and vice versa.

 

What would be good a way to see this? Thank you very much 🙂

 

 

 

11 REPLIES 11
Astounding
PROC Star

The coding would be slightly different for numeric vs. character variables.  What are we dealing with here?

Leon_Seungmin
Obsidian | Level 7

Right now, I am dealing with character variables but it would be nice to know how to deal with numaric variables as well. 

 

Thank you for your help. 

ballardw
Super User

If the data is not really complicated or large I tend to start with something like:

 

Proc freq data=have;

   tables a * b /list missing;

run;

 

Note that the result could be sent to a data set using the OUT= option.

Leon_Seungmin
Obsidian | Level 7

Thanks for the advice. However, the results were bit different from what I expected.

 

My dataset has more than 1 million observation and this program lists all list of observation under variable A and B.

 

What I want is something like this.

 

 

Count if A = missing

Count if B = missing

Count if A = missing & B =! missing

Count if B = missing & A =! missing

 

I would like to see one number for each of these 4 conditions something like 3,482 for the number of A having no observation

and 1,923 for observation missing A but having info about B.

 

I would appreciate if you can help me with this!

Reeza
Super User

Did you try the example code.

Once you've assigned the formats and cross any variables you should get only Missing vs Not Missing. 

 

Please post what you've tried.

Leon_Seungmin
Obsidian | Level 7

Thanks Reeza for sharing the file!

 

I read the example 2, but I am not quite familiar with SAS, I was not sure how to apply this to my case.

 

To be bit more specific, two variables of interest are COMPANY_FKEY and BEST_EDGAR_TICKER. In the file you gave me, 

the following example is given. The library I am using is cleaning.cleanrestat.

 

proc format;
value $ missfmt ' '="Missing"
other="Not Missing"
;
value nmissfmt . ="Missing"
other="Not Missing"
;
run;
proc freq data=home.ex2;
tables _numeric_ _character_/missing;
format _numeric_ nmissfmt.
_character_ $missfmt.;
run;

Once again, I am not quite familiar with SAS yet, I was trying to figure out which one is statement and which one is name of the variable.

 

If you can help me with this one, that will be very kind of you. 

 

Another thing I noticed in the example is that the output only shows the number of observation missing and not missing only for each variable. I want to know the number of observation missing A but not B. It would be also nice if you can help me add that condition.

 

Thank you!

 

Reeza
Super User
proc format;
value $ missfmt ' '="Missing"
other="Not Missing"
;
value nmissfmt . ="Missing"
other="Not Missing"
;
run;
proc freq data=home.ex2;
tables VARIABLE1*VARIABLE2/missing;
format _numeric_ nmissfmt.
_character_ $missfmt.;
run;

If you want the A*B cross tab, you can use an asterisk between the variables in the TABLE statement.

 

You can see the example above, where I assume VARIABLE1= <your variable name> and VARIABLE2 = <your variable name>

Astounding
PROC Star

Assuming you create the formats along the lines mentioned in Reeza's reference, the complete program could be:

 

proc format;

value $miss ' '='Missing' other='Present';

run;

 

proc freq data=have;

tables a * b / list missing;

format a b $miss.;

run;

Leon_Seungmin
Obsidian | Level 7

Thanks for the response! I have a few additional questions.

The following is the result I got from running the program.

 

SAS Output

COMPANY_FKEY SIC_CODE_FKEY Frequency Percent Cumulative
Frequency Cumulative
Percent Present Missing Present Present
520.32520.32
1641299.6816464100.00

 

I ran the program on a sample dataset and I wasn't sure how to interpret this. It appears that there are 52 firm-year observations that does not have SIC_CODE_FKEY while they have COMPANY_FKEY. However, even when I changed the order of the variable, it did not show the other case which is missing COMPANY_FKEY and having SIC_CODE_FKEY. 

 

Also, is it possible to know all 4 possible scenarios? 1) having both A & B 2) having only A 3) having only B 4) missing both A & B

and list the observation for each case?

 

Thank you for the help!

Patrick
Opal | Level 21

If it's only about two variables then some cross tabulation as already posted should do. If you want to list the missing/non missing combinations of n variables then you could also create a single new variable which contains a 0 for missing an a 1 for a value per source variable and then create a count of this single variable. See sample code below.

data have;
  set sashelp.cars (keep= Make Model EngineSize Cylinders);
  /* create some missing values */
  if mod(ceil(ranuni(1)*20),5)=0 then call missing(Make);
  if mod(ceil(ranuni(1)*20),5)=0 then call missing(Model);
  if mod(ceil(ranuni(1)*20),5)=0 then call missing(EngineSize);
  if mod(ceil(ranuni(1)*20),5)=0 then call missing(Cylinders);
run;

data prep(keep=Miss0_NoMiss1);
  set have;
  length Miss0_NoMiss1 $ 4;
  Miss0_NoMiss1=cats(not missing(Make),not missing(Model),not missing(EngineSize),not missing(Cylinders));
run;

proc freq data=prep;
  table Miss0_NoMiss1;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3518 views
  • 0 likes
  • 5 in conversation