SAS Programming

HALOBEAST999 · Posted 09-19-2019 11:31 AM

This may be a simple question by does anyone know how to:

Count the number of non-missing observations
Count the number of missing observations

for both character and numeric values using proc means? If I use n and nmiss it will only count the numeric values and not the character values. Any help would be great.

Shmuel · Posted 09-19-2019 12:08 PM

Please post a sample of data to illustrate your query.

code you have tried and point what issues you had.

HALOBEAST999 · Posted 09-19-2019 02:14 PM

Data CARS;
infile 'C:\Users\rsbggb\Downloads\cars.txt' firstobs=4 pad;
input Size $ 1-10 Manufact $ 11-21 Model $ 22-37 Mileage 38-49 Reliable 50;
if Size='' then delete;
run;

proc means data=CARS n;
Var Size Manufact Model Mileage Reliable;
run;

This is my code so far. The error I am getting is:

12 proc means data=CARS n;
13 Var Size Manufact Model Mileage Reliable;
ERROR: Variable Size in list does not match type prescribed for this list.
ERROR: Variable Manufact in list does not match type prescribed for this list.
ERROR: Variable Model in list does not match type prescribed for this list.
14 run;

PaigeMiller · Posted 09-19-2019 02:48 PM

@HALOBEAST999 wrote:

Data CARS;
infile 'C:\Users\rsbggb\Downloads\cars.txt' firstobs=4 pad;
input Size $ 1-10 Manufact $ 11-21 Model $ 22-37 Mileage 38-49 Reliable 50;
if Size='' then delete;
run;

proc means data=CARS n;
Var Size Manufact Model Mileage Reliable;
run;

This is my code so far. The error I am getting is:

12 proc means data=CARS n;
13 Var Size Manufact Model Mileage Reliable;
ERROR: Variable Size in list does not match type prescribed for this list.
ERROR: Variable Manufact in list does not match type prescribed for this list.
ERROR: Variable Model in list does not match type prescribed for this list.
14 run;

Size is not a numeric variable. PROC MEANS can't work on it.

Manufact is not a numeric variable. ditto

Model is not a numeric variable. ditto

You have read these variables in from the .txt file improperly, and so they are not numeric, they are character. Remove the $ from the INPUT statement, and see if that works.

--
Paige Miller

Reeza · Posted 09-19-2019 12:12 PM

You're better off using PROC FREQ unless you want other statistics at the same time.

This is a bit complicated because I try and make a pretty table for reports, so I can dump it straight to Word/PDF/HTML for reports.

You can run the code and examine the data sets at each point, it should work in any version of SAS.

/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

@HALOBEAST999 wrote:

This may be a simple question by does anyone know how to:

Count the number of non-missing observations

Count the number of missing observations

for both character and numeric values using proc means? If I use n and nmiss it will only count the numeric values and not the character values. Any help would be great.

Ksharp · Posted 09-20-2019 09:24 AM

Reeza,

This could be simple as.


data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;
proc transpose data=class(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select catx(' ','nmiss(',_name_,') as ',_name_) 
into : n_miss separated by ','
from temp;

select catx(' ','nmiss(',_name_,')/count(*) as ',_name_) 
into : n_miss_per separated by ','
from temp;

create table temp1 as
select 'n_miss     ' as label,&n_miss from class
union
select 'n_miss_per'  as label,&n_miss_per from class;
quit;
proc transpose data=temp1 out=want;
id label;
run;

Reeza · Posted 09-20-2019 11:12 AM

@Ksharp yeah, I wrote it in 2011 actually but have just been putting them up these days in a central location so I can load them automatically. If they work, I'm usually too lazy to make them more efficient these days 😛

FreelanceReinh · Posted 09-19-2019 12:39 PM

Hello @HALOBEAST999,

I agree with @Reeza that PROC FREQ is more appropriate for this task. But if you really want to use PROC MEANS, then try this:

data have;
input (cvar1-cvar3) ($) nvar1 nvar2;
cards;
a e x . 6
. f . 1 .
b g . 2 7
. h y 3 .
c . . 4 8
. i z . 9
d j . 5 .
;

proc format;
value $miss
' ' = 'missing'
other = 'nonmissing';
run;

proc means data=have noprint;
format _char_ $miss.;
class _char_ / missing;
ways 0,1;
output out=cnt n= nmiss= / autoname;
run;

proc print data=cnt noobs;
run;

Result:

                                                                                    nvar1_    nvar2_
  cvar1         cvar2         cvar3       _TYPE_    _FREQ_    nvar1_N    nvar2_N     NMiss     NMiss

missing       missing       missing          0         7         5          4          2         3
missing       missing       missing          1         4         4          2          0         2
missing       missing       nonmissing       1         3         1          2          2         1
missing       missing       missing          2         1         1          1          0         0
missing       nonmissing    missing          2         6         4          3          2         3
missing       missing       missing          4         3         2          1          1         2
nonmissing    missing       missing          4         4         3          3          1         1

So, you get the desired numbers for the numeric variables in the first row and for the character variables (in reverse order) in column _FREQ_ in the other rows.

hashman · Posted 09-20-2019 12:48 AM

@HALOBEAST999:

NOTE: I've just edited the first (array) solution since @FreelanceReinh has found an actual error in the code - it wasn't populating the array for numeric variables' frequencies NQ and suggested a 2-dimensional array instead to keep a single LINK block.

That's right, MEANS doesn't work with character analytical variables. But in 10-20 minutes (depending on your SAS exposure) you can write 17 lines of DATA step code to do what you want for all variables in the input data set regardless of their data type. For example (corrected code thanks to @FreelanceReinh😞

data have ;                                                                                                                                                                                                                                                     
  input n1 c1:$1. n2 c2:$2. n3 c3:$3. ;                                                                                                                                                                                                                         
  cards ;                                                                                                                                                                                                                                                       
0  .  00  AA  000  .                                                                                                                                                                                                                                            
.  B  11  .   111  BBB                                                                                                                                                                                                                                          
2  C  .   CC  .    CCC                                                                                                                                                                                                                                          
3  .  .   DD  333  DDD                                                                                                                                                                                                                                          
.  E  44  EE  444  .                                                                                                                                                                                                                                            
5  F  .   FF  555  FFF                                                                                                                                                                                                                                          
6  .  66  GG  666  GGG                                                                                                                                                                                                                                          
7  H  .   HH  .    HHH                                                                                                                                                                                                                                          
.  I  .   .   888  .                                                                                                                                                                                                                                            
9  .  99  JJ  999  .                                                                                                                                                                                                                                            
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data freq (keep = vname N_:) ;                                                                                                                                                                                                                                  
  set have nobs = nobs end = eof ;                                                                                                                                                                                                                              
  array nn _numeric_ ;                                                                                                                                                                                                                                          
  array cc _char_ ;                                                                                                                                                                                                                                             
  array q [2, 1000000] _temporary_ ;                                                                                                                                                                                                                            
  do over nn ; q [1,_i_] ++ cmiss (nn) ; end ;                                                                                                                                                                                                                  
  do over cc ; q [2,_i_] ++ cmiss (cc) ; end ;                                                                                                                                                                                                                  
  if eof ;                                                                                                                                                                                                                                                      
  do over nn ; Vname = put (vname (nn), $32.) ; d = 1 ; link compout ; end ;                                                                                                                                                                                    
  do over cc ; Vname = put (vname (cc), $32.) ; d = 2 ; link compout ; end ;                                                                                                                                                                                    
  return ;                                                                                                                                                                                                                                                      
  compout: N_null = q [d,_i_] ;                                                                                                                                                                                                                                 
           N_notnull = nobs - N_null ;                                                                                                                                                                                                                          
           output ;                                                                                                                                                                                                                                             
  return ;                                                                                                                                                                                                                                                      
run ;

The hard-coded 100000 is supposed to exceed the number of either numeric or character variables in your input file. If you have more than that 😉 , you can bump it up to 1000000 or even 10000000 (at the expense of at most 150 MB of RAM).

If you're still averse to setting hard-coded limits of this kind, the SAS hash object is your friend:

data _null_ ;                                               
  if _n_ = 1 then do ;                                      
    dcl hash f (ordered:"D") ;                              
    f.definekey ("vname") ;                                 
    f.definedata ("vname", "n_null", "n_notnull") ;         
    f.definedone () ;                                       
  end ;                                                     
  set have end = eof ;                                      
  array nn _numeric_ ;                                      
  array cc _char_ ;                                         
  do over nn ;                                              
    Vname = put (vname (nn), $32.) ;                        
    if f.find() ne 0 then call missing (N_null, N_notnull) ;
    if cmiss (nn) then n_null ++ 1 ;                        
    else n_notnull ++ 1 ;                                   
    f.replace() ;                                           
  end ;                                                     
  do over cc ;                                              
    Vname = put (vname (cc), $32.) ;                        
    if f.find() ne 0 then call missing (N_null, N_notnull) ;
    if cmiss (cc) then n_null ++ 1 ;                        
    else n_notnull ++ 1 ;                                   
    f.replace() ;                                           
  end ;                                                     
  if eof then f.output (dataset:"freq") ;                   
run ;

Sometimes it's easier to write a simple DATA step program than to try fitting what you need in the Procrustean bed of an existing proc.

Note that the two programs above will work for any input data set. The only caveat is that if it has no numeric or no character variables, you'll get a harmless warning in the log saying that a zero dimension array is being defined.

Also note that for very large data sets, the array approach will work noticeably faster. This is because arrays items can be overwritten directly in place and therefore (unlike the hash object) do not involve moving data to the PDV and back.

Kind regards

Paul D.

FreelanceReinh · Posted 09-20-2019 04:11 AM

@hashman: Powerful and beautiful code, as always! I would suggest one correction to the first solution: Replace the two arrays nq and cq by a single, two-dimensional array, say, q[2, 100000], and use this in the compout block (q[t,_i_]) with the appropriate value t of the first index, e.g.

  if eof ; t=1;                                                         
  do over nn ; Vname = put (vname (nn), $32.) ; link compout ; end ; t=2;

hashman · Posted 09-20-2019 01:26 PM

@FreelanceReinh:

Thanks! Frankly, it did occur to me ... but decided not to for the sake of symmetry (and perhaps better readability).

Kind regards

Paul D.

FreelanceReinh · Posted 09-20-2019 01:37 PM

@hashman wrote:

@FreelanceReinh:

Thanks! Frankly, it did occur to me ... but decided not to for the sake of symmetry (and perhaps better readability).

Sure, but then you would need a separate "compout" block for the numeric variables, or?

hashman · Posted 09-20-2019 01:42 PM

@FreelanceReinh:

Hm ... methinks not. Just the first dimension index would have to be assigned before calling the link.

FreelanceReinh · Posted 09-20-2019 01:50 PM

@FreelanceReinh wrote:

Sure, but then you would need a separate "compout" block for the numeric variables, or?

What I meant was: Currently, variable N_null is populated with values from array cq only (leading to incorrect results for the numeric variables, in general). The potential second "compout" block could do the same for array nq.

hashman · Posted 09-20-2019 02:02 PM

@FreelanceReinh:

Aaaaaha! You're absolutely right! Thanks for reading the code so attentively and finding an actual error.

But of course: It's either 2 arrays and 2 link blocks or a single 2-dim array and a single link block.

I'll edit the original post accordingly.

Thanks again.

Kind regards

Paul D.

SAS Programming

Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Re: Proc Means

Question About Default Means Separation in PROC GLIMMIX

Mean median

proc means

Predicted Means and Confidence bands from PROC MIXED OUTPREDM

Proc Means

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...