Example 1: Producing a Complete Report of the Differences

Tom · Posted 01-06-2023 11:47 AM

@2222 wrote:
hi Tom, yes, the number of non-missing values for each variable. will you code (and thank you for posting!) handle the character variables and variables that have string entries?

The COUNT() aggregate function of PROC SQL will count the number of OBSERVATIONS that have a non-missing value. You can use use the NMISS() function to count the number of OBSERVATIONS that have a missing value.

But it is still not clear from the words you are using what you actually want. Do you want to count the number of distinct non missing values? Just how many times the value is not missing? You need to be more precise in what you want. Provide detailed examples (does not need to be the real data) would help.

Say you had a variable , let's call it MYVAR, in a dataset named MYDATA which has 7 observations or which 2 are missing. Among the other 5 observations there is a total of 3 distinct values of MYVAR.

So let's setup an example dataset and run some code.

data mydata;
  input myvar $ ;
cards;
.
A
B
C
A
A
.
;

proc sql;
select count(*) as n_observations
     , count(myvar) as non_missing
     , nmiss(myvar) as n_missing
     , count(distinct myvar) as n_values
from mydata 
;

Results:

n_observations  non_missing  n_missing  n_values
------------------------------------------------
             7            5          2         3

So which number do you want?

Reeza · Posted 01-05-2023 11:24 AM

Look up PROC COMPARE and run that for your unclean data set against your cleaned data set. It will produced a detailed report of the differences between the two data sets.

@2222 wrote:

Hello community,

Can anyone help me to see if in multiple datasets an array or list of variables exists? Ideally, the code would check the "dirty" datasets and then compare these to the corresponding "cleaned" dataset checking for a set of variables that have (hopefully) been removed from the "dirty" datasets making them the "cleaned" datasets. For example, datasets1-5 have variables that have been removed to create dataset1(-5)_cleaned and id like some code to check to make sure all the "cleaning" has been done by returning a 0 or 1 or exists/does not exist for each variable in each cleaned dataset. (using sas EG)
Hope this makes some sense.

Example 1: Producing a Complete Report of the Differences

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p1k00d45g03uv8n1bfx3d20breg6.htm

Report Inappropriate Content · Posted 01-05-2023 10:29 PM

thank you so much that was a very detailed report and that will certainly help in the future, for this though its a bit much. i just need to know if there are values in the variable column or not but thank you for replying

ballardw · Posted 01-06-2023 11:35 AM

@2222 wrote:
thank you so much that was a very detailed report and that will certainly help in the future, for this though its a bit much. i just need to know if there are values in the variable column or not but thank you for replying

If you want the number of unique values you might look at the NLEVELS output from Proc Freq.

Any of the approaches for selecting a set can be used.

Example code would look like:

ods select nlevels;
proc freq data=sashelp.class nlevels;
ods output nlevels=levelset;
run;

This reports on all variables in the data set. If the Nlevels variable is 0 then all observations have missing value for that variable. The Nmisslevels variable with 1 or more indicates at least one missing value. Missing levels would include the special missing of .A-.Z and ._ in addition to the basic . missing value.

Reeza · Posted 01-06-2023 01:48 PM

Try this instead then.

First create some fake data to test this in a clean NEW SAS session, nothing else should exist in the work library.


*fake data set 1;
data class1;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*fake data set 2;
data class2;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

Then run this code, exactly as is:



%let lib_compare = WORK;


%macro count_missing(input_dsn=);
*define formats to count missing values;
proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    INPUT_DATASET = "&input_dsn.";
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep INPUT_DATASET variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by INPUT_DATASET variable;
run;

*transpose only N;
*save to data set with prefix _CT_;
proc transpose data=long out=_ct_%scan(&input_dsn., 2, '.') prefix=N_;
    by INPUT_DATASET variable;
    id variable_value;
    var frequency;
run;

proc datasets lib=&lib_compare. nolist nodetails;
delete temp long;
quit;
%mend;


*run the macro for all data sets in the library specified;
data run_for_all_datasets;
set sashelp.vtable;
where libname = upcase("&lib_compare"); *must be upper case;
str = catt('%count_missing(input_dsn=', libname, '.', memname, ');');
call execute(str);
keep libname memname str;
run;

*Append all results together into one data set;
data summary_all;
length input_dataset $41. variable $32.;
set _ct_:;
run;

Now check the work library for a dataset called SUMMARY_ALL.

It should have a data set that looks like this if you run a PROC PRINT on it:

Obs	input_dataset	variable	_NAME_	N_Missing	N_Not Missing
1	WORK.CLASS1	Age	Frequency	1	18
2	WORK.CLASS1	Height	Frequency	4	15
3	WORK.CLASS1	Name	Frequency	.	19
4	WORK.CLASS1	Sex	Frequency	4	15
5	WORK.CLASS1	Weight	Frequency	4	15
6	WORK.CLASS2	Age	Frequency	1	18
7	WORK.CLASS2	Height	Frequency	4	15
8	WORK.CLASS2	Name	Frequency	.	19
9	WORK.CLASS2	Sex	Frequency	4	15
10	WORK.CLASS2	Weight	Frequency	4	15

This tells you the data set, the variable in the data set and how many records are missing/not missing.

Is this what you're looking for? If so, then you can change the library reference in the code above (%let lib_compare = WORK;) to be the library you want to check and run the code again for it. Delete temporary datasets between runs and tests with the following, changing lib in the code as needed.

*uncomment once you test this is working;
*delete temporary count tables for each dataset;
proc datasets lib=WORK nodetails nolist;
delete _ct_: class: run_: summary:;
quit;

Report Inappropriate Content · Posted 01-11-2023 03:35 AM

thank you Reeza, and everyone for the help, I really appreciate it

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Example 1: Producing a Complete Report of the Differences

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Example 1: Producing a Complete Report of the Differences

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Re: How to check all the variables in multiple datasets (single folder)

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!