@2222 wrote:
Someone else has written code to remove the values (these are old values that are no longer valid), i want to check to make sure that the values have indeed been removed.
What does that mean?
Terminology clarification. SAS datasets consist of OBSERVATIONS (what you might call a row in a spreadsheet or a printed reported) and VARIABLES (what you might call a column in a spreadsheet or a printed report).
Do you mean they changed the values of some the variables to something else? What did they change them to?
Do you mean they deleted the OBSERVATIONS that had an invalid value for at least one of the VARIABLES?
How do you know which values are invalid? What makes them invalid?
You mentioned before about comparing the "original" and the "clean" dataset. If you have a lot of datasets how to know which ones to compare to each other? Is there some pattern in the NAME of the datasets?
Please use correct diction. A variable cannot "have observations".
A dataset consists of observations (records or rows), which contain variables (columns), and the variables contain values.
As long as there are observations in a dataset, all variables defined will have values. These values may be missing or non-missing. You cannot have a "variable with no observations", unless the dataset is empty (no observations at all).
In light of this, restate your issue, using the correct terms.
@2222 wrote:
Someone else has written code to remove the values (these are old values that are no longer valid), i want to check to make sure that the values have indeed been removed.
As far as I understood you the situation is: Somebody has already cleaned your data by deleting the contents of certain columns in a certain subset of rows. And now you would like to check if your tables are really clean.
Maybe you could post that code here to give us an idea of the action previously taken and the resulting table. ... and maybe you add a few lines of code to add an example table to let us reproduce the status of your data.
--fja
Understood ... see my answer above ... maybe you could provide an example.
Is the number, name and type of columns fixed ?
Are 0 or "" valid entries, i.e. would be considered _not_ deleted?
Posting some code and/or sample data for Have and Want really helps to understand what you want.
Using your code as starting point below how you could get the list of variables dynamically.
data have;
infile datalines truncover dsd;
input varA varB $ varC;
datalines;
.,X,.
.,X,2
.,X,.
., ,4
;
%macro no_miss_cnt(ds=);
%let ds=%upcase(&ds);
%local cnt_list;
proc sql noprint;
select cats('count(',name,') as')||' '||name into :cnt_list separated by ','
from dictionary.columns
where
libname="%scan(WORK.&DS,-2,.)" and
memname="%scan(&DS,-1,.)"
;
quit;
proc sql;
select
count(*) as __n_rows label='N Observations',
&cnt_list
from &ds
;
quit;
%mend;
%no_miss_cnt(ds=work.have);
So you want the number of non-missing values for each variable?
The syntax is much easier if you use SAS code instead of SQL code.
For example to count the number of non-missing values for all of the numeric variables in a dataset just use PROC SUMMARY.
proc summary data= mylib.a1_cleaned ;
var _numeric_;
output out=want n= ;
run;
Or you could use code generation. So first get a list of the variables in the dataset. For example you could use PROC TRANSPOSE to do that.
proc transpose data= mylib.a1_cleaned(obs=0) out=names; run;
And once you have the names you could use that to generate the variable list in your SQL select statement.
If the number of variable is small then you could use a single macro variable to store the code:
proc sql;
reset noprint;
select catx(' ','count(',nliteral(_name_),') as',nliteral(_name_))
into :list separated by ','
from names
;
reset print;
select &list from mylib.a1_cleaned;
quit;
If it is larger then use a data step to write the code to a file and then %INCLUDE the file.
filename code temp;
data _null_;
set names end=eof ;
file code;
if _n_=1 then put 'select' / ' ' @;
else put ',' @ ;
length nliteral $60 ;
nliteral=nliteral(_name_);
put 'count(' nliteral ') as ' nliteral ;
if eof then put 'from mylib.a1_cleaned;';
run;
proc sql;
%include code / source2;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.