BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sajid01
Meteorite | Level 14

Hello
Considering the variable as missing and non missing, this is also an approach.

/* Creating a format */
proc format;
  value $ missfmt ' ' = "Missing"  other = "Not_Missing_char";
  value nmissfmt . ="Missing" other="Not_Missing_num";
run;
/* Creating data */
data class;
Format name sex $missfmt.;
Format Age Height Weight nmissfmt.;
set sashelp.class;
if mod(_n_,3)=0 then name=" ";
if mod(_n_,4)=0 then Age=.;
if mod(_n_,5)=0 then Height=.;
if mod(_n_,5)=0 then Weight=.;
if mod(_n_,6)=0 then Sex=" ";
run;
/* counting missing and non missing values*/
proc freq data=class;
Tables _ALL_/ list nopercent nocum missing;
run;

The output will be like this

Screenshot 2021-05-11 7.39.59 PM.png Screenshot 2021-05-11 7.40.53 PM.png

 

Reeza
Super User
From the OP's post he doesn't want the Missing/Not Missing though - he wants the Yes/No values so that doesn't seem to work. Otherwise my initial post had this as the result if he used the code from the link, I only included a partial set in the post directly.
wlierman
Lapis Lazuli | Level 10

The method is really close to a solution.

 

I re-ran my code to produce the tables.  I noticed though that for the missing numeric variables that the missing and non-missing counts looked reversed that is the number of non-missing were listed as missing in the table and the missing count was shown in the non-missing column.

 

If the var was character the count looked correct.

 

How is the count adjusted when the missing value is for a numeric variable?

 

Thank you.

 

wlierman

Tom
Super User Tom
Super User

Make sure you are defining the formats properly. And if you used my final data step to collapse the ODS output then make sure you are testing for the formatted value that matches what you want.  

  value nmissfmt low-high ="Not_Missing" other="Missing";

Notice this definition is using the OTHER group to mean the MISSING values for the numeric format make sure you are setting the labels properly. 

 

The other version in other posting in this thread.

  value nmissfmt . ="Missing" other="Not_Missing_num";

will classify special missing values, like .A or .Z as NOT missing.

wlierman
Lapis Lazuli | Level 10

I'm not connecting the dots on this I ran the code that was was provided which I am sending from the proc format statement

 

proc format;
  value $ missfmt ' ' = "Missing"  other = "Not_Missing";
  value nmissfmt low-high ="Missing" other="Not_Missing";
run;

* turn off the output and capture the one-way freq table TEMP dataset;
ods select none;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
	format _numeric_  nmissfmt.  _character_  $missfmt.;
run;

** turn outputs back on;

ods select all;

** Collapse to one observation per variable;

Data &OUTPUT_DSN;
  length name $32 missing not_missing total 8;
  Set temp;
  by table notsorted;
  If first.table then call missing(of missing not_missing);
  name = substr(table,7);
  if vvaluex(name)='Missing' then missing=frequency;
  else not_missing=frequency;
  retain missing not_missing;
  if last.table then do;
     missing = sum(0,missing);
	 not_missing=sum(0,not_missing);
	 total=sum(missing,not_missing);
	 percent = divide(missing,total);
	 output;
  end;
  keep name missing not_missing total percent;
run;

The output is not correct. If the numeric format was working properly the age variables should have but a handful of nonmissing

values - like 1200  or 12000 -  and the percent would be .90.  The brief output from the actual output data set shows that the numeric variables are being counted as though they are the character variables.  The only reason that there are so many non-missing character variables is that the value is either No or Yes  - but there are blanks too which seem to be counted okay.

I have attached a one page word doc with a handful of examples.

 

Thanks for helping.

 

wlierman

Sajid01
Meteorite | Level 14

The count for missing values for numeric looks correct in the sample data I have taken.
Verified using proc means as can be seen below.

/* Creating a format */
proc format;
  value $ missfmt ' ' = "Missing"  other = "Not_Missing_char";
  value nmissfmt . ="Missing" other="Not_Missing_num";
run;
/* Creating data */
data class;
Format name sex $missfmt.;
Format Age Height Weight nmissfmt.;
set sashelp.class;
if mod(_n_,3)=0 then name=" ";
if mod(_n_,4)=0 then Age=.;
if mod(_n_,5)=0 then Height=.;
if mod(_n_,5)=0 then Weight=.;
if mod(_n_,6)=0 then Sex=" ";
run;

proc means data=class  Nmiss;
var Age Height weight;
run;

The output is like this
Screenshot 2021-05-12 3.31.19 PM.png

The missing variables are in line with earlier results.
Can validate if I have sample data .

wlierman
Lapis Lazuli | Level 10

Hello,

 

I ran the code but ran into the usual error here

proc format;
335     value $ missfmt ' ' = "Missing"  other = "Not_missing_char";
NOTE: Format $MISSFMT is already on the library WORK.FORMATS.
NOTE: Format $MISSFMT has been output.
336     value nmissfmt . = "Missing"  other = "Not_missing_num";
NOTE: Format NMISSFMT is already on the library WORK.FORMATS.
NOTE: Format NMISSFMT has been output.

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           24.06 seconds
      cpu time            0.28 seconds


337  Data OPERA.AKA_data_test;
338  *Format DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi $missfmt.;
339  Format DEYEage DCOMage DREMage DEARage DDRSage DOUTage DLEAage DLIMage DMHDage DPHYage
339! AgAcq1st nmissfmt.;
340  set OPERA.AKA_data_forum;
ERROR: Variable DEYEage has been defined as both character and numeric.
ERROR: Variable DCOMage has been defined as both character and numeric.
ERROR: Variable DREMage has been defined as both character and numeric.
ERROR: Variable DEARage has been defined as both character and numeric.
ERROR: Variable DDRSage has been defined as both character and numeric.
ERROR: Variable DOUTage has been defined as both character and numeric.
ERROR: Variable DLEAage has been defined as both character and numeric.
ERROR: Variable DLIMage has been defined as both character and numeric.
ERROR: Variable DMHDage has been defined as both character and numeric.
ERROR: Variable DPHYage has been defined as both character and numeric.
ERROR: Variable AgAcq1st has been defined as both character and numeric.
341  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OPERA.AKA_DATA_TEST may be incomplete.  When this step was stopped there
         were 0 observations and 12 variables.
WARNING: Data set OPERA.AKA_DATA_TEST was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds


I think he procedure may work.  What is the best method to correct this.  I use the rename, swap, and drop the new_var at the end of the code.  Is that the best.

 

Thanks.

 

wlierman

Tom
Super User Tom
Super User

Not sure why you want to attach WORD file to show a few lines of text.

The counts you show with 20% missing is in-line with the data generation method you showed in your original posting.  

 

You need to attach numeric formats to numeric variables and character formats to character variables.

 

If you are asking to test how many of you character values can be converted into numbers that is a totally different test and would need totally different kind of code.  Ask a new question about that.

 

Note that the code you posted to generate test data only created CHARACTER variables.  If you want to know how to generate some test data with numeric missing data open a new question.

Sajid01
Meteorite | Level 14

Hello
From your question it appears that you want to get the count of non-missing  and missing values.
Further  from this statement "Here is what the columns look like in the actual data set  (these are or would be numeric values -- age)" the variable in question appears to be numeric.
Under such case you can use proc means. A proof of concept example is given below. N is the total number of variables and NMISS is the number of missing variables. The difference would be non-missing values.. Adapt to your requirement.

/* creating a sample dataset with missing values*/
data class; set sashelp.class; if mod(_n_,3) eq 0 then age =.; run; /* Finding missing values*/ proc means data=class N NMISS; var age; run;

your result would be like this

Screenshot 2021-05-11 4.51.32 PM.png

 

wlierman
Lapis Lazuli | Level 10

Thank you. 

 

I will test out the code.  It looks like it is perfect for numeric vars.

 

I am in the middle of testing some other code - but will be sure to credit your help too.

 

Thanks.

 

wlierman

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 4182 views
  • 4 likes
  • 5 in conversation