@2222 wrote:
hi Tom, yes, the number of non-missing values for each variable. will you code (and thank you for posting!) handle the character variables and variables that have string entries?
The COUNT() aggregate function of PROC SQL will count the number of OBSERVATIONS that have a non-missing value. You can use use the NMISS() function to count the number of OBSERVATIONS that have a missing value.
But it is still not clear from the words you are using what you actually want. Do you want to count the number of distinct non missing values? Just how many times the value is not missing? You need to be more precise in what you want. Provide detailed examples (does not need to be the real data) would help.
Say you had a variable , let's call it MYVAR, in a dataset named MYDATA which has 7 observations or which 2 are missing. Among the other 5 observations there is a total of 3 distinct values of MYVAR.
So let's setup an example dataset and run some code.
data mydata;
input myvar $ ;
cards;
.
A
B
C
A
A
.
;
proc sql;
select count(*) as n_observations
, count(myvar) as non_missing
, nmiss(myvar) as n_missing
, count(distinct myvar) as n_values
from mydata
;
Results:
n_observations non_missing n_missing n_values ------------------------------------------------ 7 5 2 3
So which number do you want?
Look up PROC COMPARE and run that for your unclean data set against your cleaned data set. It will produced a detailed report of the differences between the two data sets.
@2222 wrote:
Hello community,
Can anyone help me to see if in multiple datasets an array or list of variables exists? Ideally, the code would check the "dirty" datasets and then compare these to the corresponding "cleaned" dataset checking for a set of variables that have (hopefully) been removed from the "dirty" datasets making them the "cleaned" datasets. For example, datasets1-5 have variables that have been removed to create dataset1(-5)_cleaned and id like some code to check to make sure all the "cleaning" has been done by returning a 0 or 1 or exists/does not exist for each variable in each cleaned dataset. (using sas EG)
Hope this makes some sense.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p1k00d45g03uv8n1bfx3d20breg6.htm
@2222 wrote:
thank you so much that was a very detailed report and that will certainly help in the future, for this though its a bit much. i just need to know if there are values in the variable column or not but thank you for replying
If you want the number of unique values you might look at the NLEVELS output from Proc Freq.
Any of the approaches for selecting a set can be used.
Example code would look like:
ods select nlevels; proc freq data=sashelp.class nlevels; ods output nlevels=levelset; run;
This reports on all variables in the data set. If the Nlevels variable is 0 then all observations have missing value for that variable. The Nmisslevels variable with 1 or more indicates at least one missing value. Missing levels would include the special missing of .A-.Z and ._ in addition to the basic . missing value.
Try this instead then.
First create some fake data to test this in a clean NEW SAS session, nothing else should exist in the work library.
*fake data set 1;
data class1;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
*fake data set 2;
data class2;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
Then run this code, exactly as is:
%let lib_compare = WORK;
%macro count_missing(input_dsn=);
*define formats to count missing values;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
INPUT_DATASET = "&input_dsn.";
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep INPUT_DATASET variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by INPUT_DATASET variable;
run;
*transpose only N;
*save to data set with prefix _CT_;
proc transpose data=long out=_ct_%scan(&input_dsn., 2, '.') prefix=N_;
by INPUT_DATASET variable;
id variable_value;
var frequency;
run;
proc datasets lib=&lib_compare. nolist nodetails;
delete temp long;
quit;
%mend;
*run the macro for all data sets in the library specified;
data run_for_all_datasets;
set sashelp.vtable;
where libname = upcase("&lib_compare"); *must be upper case;
str = catt('%count_missing(input_dsn=', libname, '.', memname, ');');
call execute(str);
keep libname memname str;
run;
*Append all results together into one data set;
data summary_all;
length input_dataset $41. variable $32.;
set _ct_:;
run;
Now check the work library for a dataset called SUMMARY_ALL.
It should have a data set that looks like this if you run a PROC PRINT on it:
Obs input_dataset variable _NAME_ N_Missing N_Not Missing
1 WORK.CLASS1 Age Frequency 1 18
2 WORK.CLASS1 Height Frequency 4 15
3 WORK.CLASS1 Name Frequency . 19
4 WORK.CLASS1 Sex Frequency 4 15
5 WORK.CLASS1 Weight Frequency 4 15
6 WORK.CLASS2 Age Frequency 1 18
7 WORK.CLASS2 Height Frequency 4 15
8 WORK.CLASS2 Name Frequency . 19
9 WORK.CLASS2 Sex Frequency 4 15
10 WORK.CLASS2 Weight Frequency 4 15
This tells you the data set, the variable in the data set and how many records are missing/not missing.
Is this what you're looking for? If so, then you can change the library reference in the code above (%let lib_compare = WORK;)
to be the library you want to check and run the code again for it. Delete temporary datasets between runs and tests with the following, changing lib in the code as needed.
*uncomment once you test this is working;
*delete temporary count tables for each dataset;
proc datasets lib=WORK nodetails nolist;
delete _ct_: class: run_: summary:;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.