Hello
Considering the variable as missing and non missing, this is also an approach.
/* Creating a format */
proc format;
value $ missfmt ' ' = "Missing" other = "Not_Missing_char";
value nmissfmt . ="Missing" other="Not_Missing_num";
run;
/* Creating data */
data class;
Format name sex $missfmt.;
Format Age Height Weight nmissfmt.;
set sashelp.class;
if mod(_n_,3)=0 then name=" ";
if mod(_n_,4)=0 then Age=.;
if mod(_n_,5)=0 then Height=.;
if mod(_n_,5)=0 then Weight=.;
if mod(_n_,6)=0 then Sex=" ";
run;
/* counting missing and non missing values*/
proc freq data=class;
Tables _ALL_/ list nopercent nocum missing;
run;
The output will be like this
The method is really close to a solution.
I re-ran my code to produce the tables. I noticed though that for the missing numeric variables that the missing and non-missing counts looked reversed that is the number of non-missing were listed as missing in the table and the missing count was shown in the non-missing column.
If the var was character the count looked correct.
How is the count adjusted when the missing value is for a numeric variable?
Thank you.
wlierman
Make sure you are defining the formats properly. And if you used my final data step to collapse the ODS output then make sure you are testing for the formatted value that matches what you want.
value nmissfmt low-high ="Not_Missing" other="Missing";
Notice this definition is using the OTHER group to mean the MISSING values for the numeric format make sure you are setting the labels properly.
The other version in other posting in this thread.
value nmissfmt . ="Missing" other="Not_Missing_num";
will classify special missing values, like .A or .Z as NOT missing.
I'm not connecting the dots on this I ran the code that was was provided which I am sending from the proc format statement
proc format;
value $ missfmt ' ' = "Missing" other = "Not_Missing";
value nmissfmt low-high ="Missing" other="Not_Missing";
run;
* turn off the output and capture the one-way freq table TEMP dataset;
ods select none;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
** turn outputs back on;
ods select all;
** Collapse to one observation per variable;
Data &OUTPUT_DSN;
length name $32 missing not_missing total 8;
Set temp;
by table notsorted;
If first.table then call missing(of missing not_missing);
name = substr(table,7);
if vvaluex(name)='Missing' then missing=frequency;
else not_missing=frequency;
retain missing not_missing;
if last.table then do;
missing = sum(0,missing);
not_missing=sum(0,not_missing);
total=sum(missing,not_missing);
percent = divide(missing,total);
output;
end;
keep name missing not_missing total percent;
run;
The output is not correct. If the numeric format was working properly the age variables should have but a handful of nonmissing
values - like 1200 or 12000 - and the percent would be .90. The brief output from the actual output data set shows that the numeric variables are being counted as though they are the character variables. The only reason that there are so many non-missing character variables is that the value is either No or Yes - but there are blanks too which seem to be counted okay.
I have attached a one page word doc with a handful of examples.
Thanks for helping.
wlierman
The count for missing values for numeric looks correct in the sample data I have taken.
Verified using proc means as can be seen below.
/* Creating a format */
proc format;
value $ missfmt ' ' = "Missing" other = "Not_Missing_char";
value nmissfmt . ="Missing" other="Not_Missing_num";
run;
/* Creating data */
data class;
Format name sex $missfmt.;
Format Age Height Weight nmissfmt.;
set sashelp.class;
if mod(_n_,3)=0 then name=" ";
if mod(_n_,4)=0 then Age=.;
if mod(_n_,5)=0 then Height=.;
if mod(_n_,5)=0 then Weight=.;
if mod(_n_,6)=0 then Sex=" ";
run;
proc means data=class Nmiss;
var Age Height weight;
run;
The output is like this
The missing variables are in line with earlier results.
Can validate if I have sample data .
Hello,
I ran the code but ran into the usual error here
proc format;
335 value $ missfmt ' ' = "Missing" other = "Not_missing_char";
NOTE: Format $MISSFMT is already on the library WORK.FORMATS.
NOTE: Format $MISSFMT has been output.
336 value nmissfmt . = "Missing" other = "Not_missing_num";
NOTE: Format NMISSFMT is already on the library WORK.FORMATS.
NOTE: Format NMISSFMT has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
real time 24.06 seconds
cpu time 0.28 seconds
337 Data OPERA.AKA_data_test;
338 *Format DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi $missfmt.;
339 Format DEYEage DCOMage DREMage DEARage DDRSage DOUTage DLEAage DLIMage DMHDage DPHYage
339! AgAcq1st nmissfmt.;
340 set OPERA.AKA_data_forum;
ERROR: Variable DEYEage has been defined as both character and numeric.
ERROR: Variable DCOMage has been defined as both character and numeric.
ERROR: Variable DREMage has been defined as both character and numeric.
ERROR: Variable DEARage has been defined as both character and numeric.
ERROR: Variable DDRSage has been defined as both character and numeric.
ERROR: Variable DOUTage has been defined as both character and numeric.
ERROR: Variable DLEAage has been defined as both character and numeric.
ERROR: Variable DLIMage has been defined as both character and numeric.
ERROR: Variable DMHDage has been defined as both character and numeric.
ERROR: Variable DPHYage has been defined as both character and numeric.
ERROR: Variable AgAcq1st has been defined as both character and numeric.
341 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OPERA.AKA_DATA_TEST may be incomplete. When this step was stopped there
were 0 observations and 12 variables.
WARNING: Data set OPERA.AKA_DATA_TEST was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
I think he procedure may work. What is the best method to correct this. I use the rename, swap, and drop the new_var at the end of the code. Is that the best.
Thanks.
wlierman
Not sure why you want to attach WORD file to show a few lines of text.
The counts you show with 20% missing is in-line with the data generation method you showed in your original posting.
You need to attach numeric formats to numeric variables and character formats to character variables.
If you are asking to test how many of you character values can be converted into numbers that is a totally different test and would need totally different kind of code. Ask a new question about that.
Note that the code you posted to generate test data only created CHARACTER variables. If you want to know how to generate some test data with numeric missing data open a new question.
Hello
From your question it appears that you want to get the count of non-missing and missing values.
Further from this statement "Here is what the columns look like in the actual data set (these are or would be numeric values -- age)" the variable in question appears to be numeric.
Under such case you can use proc means. A proof of concept example is given below. N is the total number of variables and NMISS is the number of missing variables. The difference would be non-missing values.. Adapt to your requirement.
/* creating a sample dataset with missing values*/
data class;
set sashelp.class;
if mod(_n_,3) eq 0 then age =.;
run;
/* Finding missing values*/
proc means data=class N NMISS;
var age;
run;
your result would be like this
Thank you.
I will test out the code. It looks like it is perfect for numeric vars.
I am in the middle of testing some other code - but will be sure to credit your help too.
Thanks.
wlierman
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.