BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
FreelanceReinh
Jade | Level 19

@Steelers_In_DC's solution could be modified as follows to work also for character variables which do not contain "numeric" values:

  1. Insert options missing=' '; before the PROC TRANSPOSE step.
  2. Replace nmiss by cmiss in the last data step.

However, I have never tried to create a dataset with 14 million variables (as TRAN would be).

Steelers_In_DC
Barite | Level 11

The code I have works with numeic and character fields.  In the example I use both.

FreelanceReinh
Jade | Level 19

@Steelers_In_DC: Yes, you have both, but they all contain only "numeric" values, i.e. values which SAS can convert automatically into numeric values when it applies the NMISS function. If you replace one of the values by a text string, such as 'TEST', a note on "Invalid numeric data" will occur in the log and that text string will be incorrectly counted as missing (namely the missing numeric value resulting from the failed attempt to convert 'TEST' into a number).

 

The CMISS function can handle both numeric and character values. But since the PROC TRANSPOSE step creates only character variables COL1, COL2, ..., it would count the periods '.' from the numeric missings as non-missing (character strings). This is avoided by changing the MISSING system option as suggested. Special missing values would be counted as non-missing, though.

BrunoMueller
SAS Super FREQ

The NLEVELS of Proc FREQ does not exactly do what you need. Had another go using the Hash object in the DATA Step.

 

Have a look

 

data have;
  infile cards dlm="," dsd;
  input
    RAWREG 
    PVPRISM $
    OVRIDE3 $
    DAYS 
  ;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
,,3,
;
data _null_;
  set have end=last;
  array _xchar{*} _character_;
  array _xnum{*} _numeric_;

  if _n_ = 1 then do;
    length 
      _colName $ 32
      _nMiss 8
      _n 8
    ;
    declare hash hmiss( ordered: "Y");
    hmiss.defineKey("_colName");
    hmiss.defineData("_colName", "_n", "_nMiss");
    hmiss.defineDone();
  end;

  do i = 1 to dim(_xchar);
    _nMiss=0;
    _n=0;
    _colName = vname( _xchar{i} );
    _rc = hmiss.find();
    _nMiss + (missing(_xchar{i}) = 1);
    _n + (missing(_xchar{i}) = 0);

    if _rc = 0 then do;
      hmiss.replace();
    end;
    else do;
      hmiss.add();
    end;
  end;

  do i = 1 to dim(_xnum);
    _nMiss=0;
    _n=0;
    _colName = vname( _xnum{i} );
    _rc = hmiss.find();
    _nMiss + (missing(_xnum{i}) = 1);
    _n + (missing(_xnum{i}) = 0);

    if _rc = 0 then do;
      hmiss.replace();
    end;
    else do;
      hmiss.add();
    end;

  end;

  if last = 1 then do;
    hmiss.output(dataset: "hashmiss");
  end;
run;

proc print data=hashmiss;
run;

proc print data=have;
run;

Bruno

jenim514
Pyrite | Level 9

Bruno, I have no idea what half your code means, but I'm runing it now.  So far no errors!  I will let you know how it works out.

 

Thanks!!

 

 

Steelers_In_DC
Barite | Level 11

Here's the code I had before with Reinhard's correction, learn something everyday:

 

data have;
infile cards dsd;
input RAWREG$ PVPRISM$ OVRIDE3 DAYS;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
aa,aaaa,,5
.,.,,5
;

proc sql;
create table get_max as
select *,count(*) as max_obs
from have;

options missing=' ';
proc transpose data=get_max out=tran;by max_obs;var _all_;

data want (keep=Variable missing non_missing);
set tran(rename=(_name_ = Variable));
Missing=cmiss(of col: );
Non_Missing=max_obs - missing;
if variable in ('max_obs') then delete;
run;

proc sort data=want;by Variable;

jenim514
Pyrite | Level 9

@BrunoMueller  Thank you!  Your code worked perfectly and exactly what I needed.  I took about 45 min for 858 variables and over 4mill observations.  Not bad!!

BrunoMueller
SAS Super FREQ

You are welcome

 

I am sure the code can be optimized to go faster.

The first optimization would be to let Proc MEANS/SUMMARY do the counting for all the numeric variables and have the DATA Step code only for the character variables, this will for sure speed up things. See the code sample provided by @PaigeMiller

 

Give it a try.

 

Bruno

Astounding
PROC Star

I feel a need to jump back in, to create something simple and practical here.

 

First, note that PROC FREQ with a format would be the easiest program.  It does create a separate table for each variable (which you didn't really want), but the tradeoff is that it is quick and easy.

 

Given that you have a large data set, here is a reasonable approach.

 

First, count how many character variables you have.  (That's the complex part.)

 

data _null_;

set have;

array chars {*} _character_;

call symputx ('n_chars', dim(chars));

stop;

run;

 

Then use that number in the coding.

 

 

data _null_;

set have end=done nobs=_nobs_;

array chars {&n_chars} _character_;

array nonmiss {&n_chars};

do _n_=1 to &n_chars;

   if chars{_n_}>' ' then nonmiss{_n_} + 1;

end;

if done;

file print;

put 'Variable'  @35 'N missing'  @47 'N nonmissing' /  60 * '-';

length varname $32;

do _n_=1 to &n_chars;

   varname = vname(chars{_n_});

   n_nonmissing = max(0, nonmiss{_n_});

   n_missing = _nobs_ - n_nonmissing;

   put varname @35 n_missing comma12.  @47 n_nonmissing comma12.;

end;

run;

 

You process your large data set only once, you don't need to do any expensive transformations with it, and you don't even need to save the results ... the report is generated within the same DATA step that does the counting.

 

The code is untested, so you might want to test it (possibly debug it) with a small number of observations.

 

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 10048 views
  • 9 likes
  • 10 in conversation