BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tom
Super User Tom
Super User

@2222 wrote:
hi Tom, yes, the number of non-missing values for each variable. will you code (and thank you for posting!) handle the character variables and variables that have string entries?

The COUNT() aggregate function of PROC SQL will count the number of OBSERVATIONS that have a non-missing value.  You can use use the NMISS() function to count the number of OBSERVATIONS that have a missing value.

 

But it is still not clear from the words you are using what you actually want.  Do you want to count the number of distinct non missing values? Just how many times the value is not missing?  You need to be more precise in what you want.  Provide detailed examples (does not need to be the real data) would help.

 

 Say you had a variable , let's call it MYVAR, in a dataset named MYDATA which has 7 observations or which 2 are missing.  Among the other 5 observations there is a total of 3 distinct values of MYVAR.

 

So let's setup an example dataset and run some code.

data mydata;
  input myvar $ ;
cards;
.
A
B
C
A
A
.
;

proc sql;
select count(*) as n_observations
     , count(myvar) as non_missing
     , nmiss(myvar) as n_missing
     , count(distinct myvar) as n_values
from mydata 
;

Results:

n_observations  non_missing  n_missing  n_values
------------------------------------------------
             7            5          2         3

So which number do you want?

Reeza
Super User

Look up PROC COMPARE and run that for your unclean data set against your cleaned data set. It will produced a detailed report of the differences between the two data sets.

 


@2222 wrote:

Hello community,

Can anyone help me to see if in multiple datasets an array or list of variables exists? Ideally, the code would check the "dirty" datasets and then compare these to the corresponding "cleaned" dataset checking for a set of variables that have (hopefully) been removed from the "dirty" datasets making them the "cleaned" datasets. For example, datasets1-5 have variables that have been removed to create dataset1(-5)_cleaned and id like some code to check to make sure all the "cleaning" has been done by returning a 0 or 1 or exists/does not exist for each variable in each cleaned dataset. (using sas EG)
Hope this makes some sense.


Example 1: Producing a Complete Report of the Differences

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p1k00d45g03uv8n1bfx3d20breg6.htm

2222
Calcite | Level 5
thank you so much that was a very detailed report and that will certainly help in the future, for this though its a bit much. i just need to know if there are values in the variable column or not but thank you for replying
ballardw
Super User

@2222 wrote:
thank you so much that was a very detailed report and that will certainly help in the future, for this though its a bit much. i just need to know if there are values in the variable column or not but thank you for replying

If you want the number of unique values you might look at the NLEVELS output from Proc Freq.

Any of the approaches for selecting a set can be used.

Example code would look like:

ods select nlevels;
proc freq data=sashelp.class nlevels;
ods output nlevels=levelset;
run;

This reports on all variables in the data set. If the Nlevels variable is 0 then all observations have missing value for that variable. The Nmisslevels variable with 1 or more indicates at least one missing value. Missing levels would include the special missing of .A-.Z and ._ in addition to the basic . missing value.

 

Reeza
Super User

Try this instead then. 

 

First create some fake data to test this in a clean NEW SAS session, nothing else should exist in the work library.


*fake data set 1;
data class1;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*fake data set 2;
data class2;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

Then run this code, exactly as is:

 



%let lib_compare = WORK;


%macro count_missing(input_dsn=);
*define formats to count missing values;
proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    INPUT_DATASET = "&input_dsn.";
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep INPUT_DATASET variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by INPUT_DATASET variable;
run;

*transpose only N;
*save to data set with prefix _CT_;
proc transpose data=long out=_ct_%scan(&input_dsn., 2, '.') prefix=N_;
    by INPUT_DATASET variable;
    id variable_value;
    var frequency;
run;

proc datasets lib=&lib_compare. nolist nodetails;
delete temp long;
quit;
%mend;


*run the macro for all data sets in the library specified;
data run_for_all_datasets;
set sashelp.vtable;
where libname = upcase("&lib_compare"); *must be upper case;
str = catt('%count_missing(input_dsn=', libname, '.', memname, ');');
call execute(str);
keep libname memname str;
run;

*Append all results together into one data set;
data summary_all;
length input_dataset $41. variable $32.;
set _ct_:;
run;

Now check the work library for a dataset called SUMMARY_ALL.

It should have a data set that looks like this if you run a PROC PRINT on it:

Obs	input_dataset	variable	_NAME_	N_Missing	N_Not Missing
1	WORK.CLASS1	Age	Frequency	1	18
2	WORK.CLASS1	Height	Frequency	4	15
3	WORK.CLASS1	Name	Frequency	.	19
4	WORK.CLASS1	Sex	Frequency	4	15
5	WORK.CLASS1	Weight	Frequency	4	15
6	WORK.CLASS2	Age	Frequency	1	18
7	WORK.CLASS2	Height	Frequency	4	15
8	WORK.CLASS2	Name	Frequency	.	19
9	WORK.CLASS2	Sex	Frequency	4	15
10	WORK.CLASS2	Weight	Frequency	4	15

This tells you the data set, the variable in the data set and how many records are missing/not missing. 

 

Is this what you're looking for? If so, then you can change the library reference in the code above (%let lib_compare = WORK;) to be the library you want to check and run the code again for it. Delete temporary datasets between runs and tests with the following, changing lib in the code as needed.

*uncomment once you test this is working;
*delete temporary count tables for each dataset;
proc datasets lib=WORK nodetails nolist;
delete _ct_: class: run_: summary:;
quit;
2222
Calcite | Level 5
thank you Reeza, and everyone for the help, I really appreciate it

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 35 replies
  • 3113 views
  • 3 likes
  • 7 in conversation