BookmarkSubscribeRSS Feed
HALOBEAST999
Fluorite | Level 6

This may be a simple question by does anyone know how to:

  1. Count the number of non-missing observations
  2. Count the number of missing observations

for both character and numeric values using proc means? If I use n and nmiss it will only count the numeric values and not the character values. Any help would be great.

15 REPLIES 15
Shmuel
Garnet | Level 18

Please post a sample of data to illustrate your query.

code you have tried and point what issues you had.

HALOBEAST999
Fluorite | Level 6

Data CARS;
infile 'C:\Users\rsbggb\Downloads\cars.txt' firstobs=4 pad;
input Size $ 1-10 Manufact $ 11-21 Model $ 22-37 Mileage 38-49 Reliable 50;
if Size='' then delete;
run;


proc means data=CARS n;
Var Size Manufact Model Mileage Reliable;
run;

 

This is my code so far. The error I am getting is:

 

12 proc means data=CARS n;
13 Var Size Manufact Model Mileage Reliable;
ERROR: Variable Size in list does not match type prescribed for this list.
ERROR: Variable Manufact in list does not match type prescribed for this list.
ERROR: Variable Model in list does not match type prescribed for this list.
14 run;

PaigeMiller
Diamond | Level 26

@HALOBEAST999 wrote:

Data CARS;
infile 'C:\Users\rsbggb\Downloads\cars.txt' firstobs=4 pad;
input Size $ 1-10 Manufact $ 11-21 Model $ 22-37 Mileage 38-49 Reliable 50;
if Size='' then delete;
run;


proc means data=CARS n;
Var Size Manufact Model Mileage Reliable;
run;

 

This is my code so far. The error I am getting is:

 

12 proc means data=CARS n;
13 Var Size Manufact Model Mileage Reliable;
ERROR: Variable Size in list does not match type prescribed for this list.
ERROR: Variable Manufact in list does not match type prescribed for this list.
ERROR: Variable Model in list does not match type prescribed for this list.
14 run;


Size is not a numeric variable. PROC MEANS can't work on it.

Manufact is not a numeric variable. ditto

Model is not a numeric variable. ditto

 

You have read these variables in from the .txt file improperly, and so they are not numeric, they are character. Remove the $ from the INPUT statement, and see if that works.

--
Paige Miller
Reeza
Super User

You're better off using PROC FREQ unless you want other statistics at the same time. 

 

This is a bit complicated because I try and make a pretty table for reports, so I can dump it straight to Word/PDF/HTML for reports.

 

You can run the code and examine the data sets at each point, it should work in any version of SAS.

 

/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

@HALOBEAST999 wrote:

This may be a simple question by does anyone know how to:

  1. Count the number of non-missing observations
  2. Count the number of missing observations

for both character and numeric values using proc means? If I use n and nmiss it will only count the numeric values and not the character values. Any help would be great.



 

 

 

Ksharp
Super User

Reeza,

This could be simple as.

 


data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;
proc transpose data=class(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select catx(' ','nmiss(',_name_,') as ',_name_) 
into : n_miss separated by ','
from temp;

select catx(' ','nmiss(',_name_,')/count(*) as ',_name_) 
into : n_miss_per separated by ','
from temp;

create table temp1 as
select 'n_miss     ' as label,&n_miss from class
union
select 'n_miss_per'  as label,&n_miss_per from class;
quit;
proc transpose data=temp1 out=want;
id label;
run;
Reeza
Super User

@Ksharp yeah, I wrote it in 2011 actually but have just been putting them up these days in a central location so I can load them automatically. If they work, I'm usually too lazy to  make them more efficient these days 😛

FreelanceReinh
Jade | Level 19

Hello @HALOBEAST999,

 

I agree with @Reeza that PROC FREQ is more appropriate for this task. But if you really want to use PROC MEANS, then try this:

data have;
input (cvar1-cvar3) ($) nvar1 nvar2;
cards;
a e x . 6
. f . 1 .
b g . 2 7
. h y 3 .
c . . 4 8
. i z . 9
d j . 5 .
;

proc format;
value $miss
' ' = 'missing'
other = 'nonmissing';
run;

proc means data=have noprint;
format _char_ $miss.;
class _char_ / missing;
ways 0,1;
output out=cnt n= nmiss= / autoname;
run;

proc print data=cnt noobs;
run;

Result:

                                                                                    nvar1_    nvar2_
  cvar1         cvar2         cvar3       _TYPE_    _FREQ_    nvar1_N    nvar2_N     NMiss     NMiss

missing       missing       missing          0         7         5          4          2         3
missing       missing       missing          1         4         4          2          0         2
missing       missing       nonmissing       1         3         1          2          2         1
missing       missing       missing          2         1         1          1          0         0
missing       nonmissing    missing          2         6         4          3          2         3
missing       missing       missing          4         3         2          1          1         2
nonmissing    missing       missing          4         4         3          3          1         1

So, you get the desired numbers for the numeric variables in the first row and for the character variables (in reverse order) in column _FREQ_ in the other rows.

hashman
Ammonite | Level 13

@HALOBEAST999:

NOTE: I've just edited the first (array) solution since @FreelanceReinh has found an actual error in the code - it wasn't populating the array for numeric variables' frequencies NQ and suggested a 2-dimensional array instead to keep a single LINK block.

  

That's right, MEANS doesn't work with character analytical variables. But in 10-20 minutes (depending on your SAS exposure) you can write 17 lines of DATA step code to do what you want for all variables in the input data set regardless of their data type. For example (corrected code thanks to @FreelanceReinh😞 

data have ;                                                                                                                                                                                                                                                     
  input n1 c1:$1. n2 c2:$2. n3 c3:$3. ;                                                                                                                                                                                                                         
  cards ;                                                                                                                                                                                                                                                       
0  .  00  AA  000  .                                                                                                                                                                                                                                            
.  B  11  .   111  BBB                                                                                                                                                                                                                                          
2  C  .   CC  .    CCC                                                                                                                                                                                                                                          
3  .  .   DD  333  DDD                                                                                                                                                                                                                                          
.  E  44  EE  444  .                                                                                                                                                                                                                                            
5  F  .   FF  555  FFF                                                                                                                                                                                                                                          
6  .  66  GG  666  GGG                                                                                                                                                                                                                                          
7  H  .   HH  .    HHH                                                                                                                                                                                                                                          
.  I  .   .   888  .                                                                                                                                                                                                                                            
9  .  99  JJ  999  .                                                                                                                                                                                                                                            
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data freq (keep = vname N_:) ;                                                                                                                                                                                                                                  
  set have nobs = nobs end = eof ;                                                                                                                                                                                                                              
  array nn _numeric_ ;                                                                                                                                                                                                                                          
  array cc _char_ ;                                                                                                                                                                                                                                             
  array q [2, 1000000] _temporary_ ;                                                                                                                                                                                                                            
  do over nn ; q [1,_i_] ++ cmiss (nn) ; end ;                                                                                                                                                                                                                  
  do over cc ; q [2,_i_] ++ cmiss (cc) ; end ;                                                                                                                                                                                                                  
  if eof ;                                                                                                                                                                                                                                                      
  do over nn ; Vname = put (vname (nn), $32.) ; d = 1 ; link compout ; end ;                                                                                                                                                                                    
  do over cc ; Vname = put (vname (cc), $32.) ; d = 2 ; link compout ; end ;                                                                                                                                                                                    
  return ;                                                                                                                                                                                                                                                      
  compout: N_null = q [d,_i_] ;                                                                                                                                                                                                                                 
           N_notnull = nobs - N_null ;                                                                                                                                                                                                                          
           output ;                                                                                                                                                                                                                                             
  return ;                                                                                                                                                                                                                                                      
run ;                                                         

The hard-coded 100000 is supposed to exceed the number of either numeric or character variables in your input file. If you have more than that 😉 , you can bump it up to 1000000 or even 10000000 (at the expense of at most 150 MB of RAM).

 

If you're still averse to setting hard-coded limits of this kind, the SAS hash object is your friend:

data _null_ ;                                               
  if _n_ = 1 then do ;                                      
    dcl hash f (ordered:"D") ;                              
    f.definekey ("vname") ;                                 
    f.definedata ("vname", "n_null", "n_notnull") ;         
    f.definedone () ;                                       
  end ;                                                     
  set have end = eof ;                                      
  array nn _numeric_ ;                                      
  array cc _char_ ;                                         
  do over nn ;                                              
    Vname = put (vname (nn), $32.) ;                        
    if f.find() ne 0 then call missing (N_null, N_notnull) ;
    if cmiss (nn) then n_null ++ 1 ;                        
    else n_notnull ++ 1 ;                                   
    f.replace() ;                                           
  end ;                                                     
  do over cc ;                                              
    Vname = put (vname (cc), $32.) ;                        
    if f.find() ne 0 then call missing (N_null, N_notnull) ;
    if cmiss (cc) then n_null ++ 1 ;                        
    else n_notnull ++ 1 ;                                   
    f.replace() ;                                           
  end ;                                                     
  if eof then f.output (dataset:"freq") ;                   
run ;                                                       

Sometimes it's easier to write a simple DATA step program than to try fitting what you need in the Procrustean bed of an existing proc.

 

Note that the two programs above will work for any input data set. The only caveat is that if it has no numeric or no character variables, you'll get a harmless warning in the log saying that a zero dimension array is being defined. 

 

Also note that for very large data sets, the array approach will work noticeably faster. This is because arrays items can be overwritten directly in place and therefore (unlike the hash object) do not involve moving data to the PDV and back. 

 

Kind regards

Paul D.  

FreelanceReinh
Jade | Level 19

@hashman: Powerful and beautiful code, as always! I would suggest one correction to the first solution: Replace the two arrays nq and cq by a single, two-dimensional array, say, q[2, 100000], and use this in the compout block (q[t,_i_]) with the appropriate value t of the first index, e.g.

  if eof ; t=1;                                                         
  do over nn ; Vname = put (vname (nn), $32.) ; link compout ; end ; t=2;
hashman
Ammonite | Level 13

@FreelanceReinh:

Thanks! Frankly, it did occur to me ... but decided not to for the sake of symmetry (and perhaps better readability).

 

Kind regards

Paul D.  

FreelanceReinh
Jade | Level 19

@hashman wrote:

@FreelanceReinh:

Thanks! Frankly, it did occur to me ... but decided not to for the sake of symmetry (and perhaps better readability).


Sure, but then you would need a separate "compout" block for the numeric variables, or?

hashman
Ammonite | Level 13

@FreelanceReinh:

Hm ... methinks not. Just the first dimension index would have to be assigned before calling the link. 

FreelanceReinh
Jade | Level 19

@FreelanceReinh wrote:


Sure, but then you would need a separate "compout" block for the numeric variables, or?


What I meant was: Currently, variable N_null is populated with values from array cq only (leading to incorrect results for the numeric variables, in general). The potential second "compout" block could do the same for array nq.

hashman
Ammonite | Level 13

@FreelanceReinh:

Aaaaaha! You're absolutely right! Thanks for reading the code so attentively and finding an actual error.

But of course: It's either 2 arrays and 2 link blocks or a single 2-dim array and a single link block.

I'll edit the original post accordingly.

Thanks again.

 

Kind regards

Paul D.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1806 views
  • 3 likes
  • 7 in conversation