10-23-2017 10:40 AM
Hi I am trying to figure out why the resulting table (final table) contains zero counts of variables that are defined to be missing (empty). Here below is the example of my code:
data sample; do i = 1 to 1000; output; end; run; data tmp; set sample; a = .; b = ''; c = 'text'; run; proc contents data=tmp Position noprint out=see (keep = LIBNAME MEMNAME NAME TYPE LENGTH ENGINE MEMTYPE NOBS); run; data _null_; set see (keep=name); n = _N_; call symput('nrows',compress(n)); run; proc sql noprint; select distinct name into: var1 - : var&nrows from SEE; quit; data final_table; LENGTH NAME $50.; NAME = ''; NOBS_MISSING = .; run; data final_table; set final_table (obs=0); run; %macro loop_(); %do i = 1 %to &nrows; proc sql; create table &&var&i. as select compress("&&var&i.") as NAME, count(&&var&i.) as NOBS_MISSING from TMP where &&var&i. IS MISSING; quit; data final_table; set final_table &&var&i.; run; %end; %mend; %loop_()
I thought it should return nobs_missing = 1000 for missing variables, i.e. "a" and "b" (no matter if char or int format). Any suggestions? Thanks
10-23-2017 11:56 AM - edited 10-23-2017 05:27 PM
What is the question exactly?
I suspect that you are getting impacted by using the WHERE clause.
create table &&var&i. as select compress("&&var&i.") as NAME , count(&&var&i.) as NOBS_MISSING from TMP where &&var&i. IS MISSING ;
If no data meets the WHERE condition then there is nothing to return so you end up with zero rows instead of one row with a count of zero.
Try using MISSING() function, or a CASE statement that generates 1 or 0 values, so that you can use the SUM() aggregate function instead of COUNT()+WHERE.
create table &&var&i. as select compress("&&var&i.") as NAME , sum(missing(&&var&i.)) as NOBS_MISSING from TMP group by 1 ;
Also COUNT(X) will not count missing values of X. So if you only give it missing values you will always get a count of zero. You should use count(*) or count(1). Run this example.
proc sql ; select count(*) as count_of_records , count(age) as count_of_age , count(' ') as count_of_missing from sashelp.class ; quit;
10-23-2017 12:14 PM
I have run your code and it seems that
where &&var&i. IS MISSING;
does not work properly with sas sql.
I have replaced the sql step with next step and it worked perfectly:
data &&var&i.; set tmp end=eof; name = compress("&&var&i."); where &&var&i. is missing; if eof then do; nobs_missing = _N_; output; end; run;
10-23-2017 05:02 PM
Thank you both for your answers. I used proc freq which provides one with counts of missing/complete observations but wanted to know the reason for that missing function is not working properly with combination with where statement.
10-23-2017 05:10 PM - edited 10-23-2017 05:10 PM
Can you please provide a simple example of just a few data records and queries that demonstrate a case that shows either the MISSING() function or the IS MISSING logical condition not working properly? Eliminate the macro code and macro variables, just show the data step and/or SQL query that is not producing the expected results.
10-23-2017 05:43 PM
It has been written by other user:
where &&var&i. IS MISSING;
Perhaps, &&var&i cannot be read properly and thus one needs to rewrite the form of macro variable.
10-23-2017 08:18 PM
@Tom - next simplified code shows that "where var is missing" doesn't work properly in sas sql
data tmp1; infile datalines dlm=',' dsd truncover; input i char1 $ char2 $ num1 num2; datalines; 1,x,,11,. 2,y,a,12,3 ; run; proc sql; create table tst_num2a as select compress("num2") as NAME, count(num2) as NOBS_MISSING from TMP1 where (num2 IS MISSING); quit; proc sql; create table tst_char2 as select compress("char2") as NAME, count(char2) as NOBS_MISSING from TMP1 where (char2 IS MISSING); quit;