I have a datasets which have combination of numeric as well as character variables. I would like to know if there is any easy way to find out if 'all' the values of that variable are missing.
Thanks in advance!
- Dr. Abhijeet Safai
Many thanks to @ballardw , @Mazi , @data_null__ @yabwon @tom_grant , @PaigeMiller and all others for your response.
I was able to solve the issue but not easy way but by macro way. Somehow finally I found that the approach by macro will be better for me so I went ahead with that and the issue is solved. I would share the steps of what I did below.
And in this way the issue was solved. I would have liked to share the code for this but I will not be able to do it because I will need to type out the code again and that is very time consuming.
Many thanks to all those who responded. It was a fun exercise for me. I thought I will be able to avoid macro but later I found that macro is a better way to solve it. As I found this answer working for me, I will mark this as accepted answer.
Thank you.
- Dr. Abhijeet Safai
https://support.sas.com/resources/papers/proceedings20/4737-2020.pdf
Paper written by SAS communities member @Ksharp
Thanks @PaigeMiller .
I was also looking at this. I found this one simpler. I am just trying to take all the variables out by using out= option in proc freq in the code below. Please look at the out= part which I have bolded.
/* create a format to group missing and nonmissing */ proc format; value $missfmt ' '='Missing' other='Not Missing'; value missfmt . ='Missing' other='Not Missing'; run; proc freq data=one; format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */ tables _CHAR_ / out=test1 missing missprint nocum nopercent; format _NUMERIC_ missfmt.; tables _NUMERIC_ / missing missprint nocum nopercent; run;
Thank you.
- Dr. Abhijeet Safai
Is there something wrong with the part you have in bold? If so, please explain.
Hi @PaigeMiller ,
When I used only out=test1, it is not showing all the variables in the dataset. The variables can be seen in the output window, but I was not able to see them in the dataset where I can further process them. To further process them, I need them in the dataset instead in output window.
To find out which file is getting created at the background to store this information (which is available in output window), I used following code with ods trace on.
ods trace on;
ods output onewayfreqs=xxx;
proc freq data=one; format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */ tables _CHAR_ / out=test1 missing missprint nocum nopercent; run;
ods trace off;
As a result, I can now see the dataset in xxx but it is too complex to comprehend. It can be comprehended but there are other ways like macros to find out. I was thinking if there is any simpler way. I found the above approach simplest but that is not giving me information in dataset but only in output window and when I got the dataset which is creating that output window, it is again too complex.
Thank you.
- Dr. Abhijeet Safai
You can take the ONEWAYFREQS output and make it into a manageable/understandable data set. See https://communities.sas.com/t5/SAS-Programming/proc-freq-one-way-table-for-multiple-vars-export-to-d...
Thanks @PaigeMiller .
I was searching some simple way. Thanks for your response but I was thinking if there is any other simple way.
- Dr. Abhijeet Safai
I'm not sure I understand your definition of "simple". The code has been written by others, you just need to plug it into your program, that seems pretty simple to me.
@DrAbhijeetSafai wrote:
Thanks @PaigeMiller .
I was searching some simple way. Thanks for your response but I was thinking if there is any other simple way.
- Dr. Abhijeet Safai
A simple way is to use the NLEVELS output of PROC FREQ.
Example:
ods select none;
ods output nlevels=nlevels;
proc freq data=sashelp.cars nlevels;
tables _all_ / noprint;
run;
ods select all;
proc print data=nlevels;
run;
Output:
So to find the variables that have only missing values look for those that have NNonMissLevels=0.
proc sql noprint;
select nliteral(tablevar) into :missvar_list separated by ' '
from nlevels
where nnonmisslevels=0
;
quit;
There are a couple of problems/issues with this method:
The biggest one is that for large datasets with variables that have a lot of distinct values PROC FREQ might not be able to complete the analysis due to lack of memory.
The second is just the nature of using ODS OUTPUT datasets. They are built for DISPLAY and not really as DATA. So which variables appear and how they are defined in the output dataset will depend on what is in the dataset being analyzed. You can fix that by adding a second step that defines a consistent data structure. For example by setting the variable name and label fields to the maximum value that SAS supports for those attributes. And you can remove the unneeded formats that it attaches.
data nlevels;
length TableVar $32 NLevels NMissLevels NNonMissLevels 8 TableVarLabel $256 ;
set nlevels;
format _all_;
run;
Using this or similar formats will solve the memory problem. Still kind of slow for really large data sets.
proc format;
value $_miss(default=1) ' '=' ' other='X';
value _miss(default=1) ._-.z=' ' other='X';
quit;
Just for fun:
%let size=100;
data have;
array noMissC[&size.] $1 (&size.*"A");
array noMissN[&size.] (&size.*1) ;
array MissC[&size.] $1 ;
array MissN[&size.] ;
do i = 1 to 5e6;
output;
end;
run;
/*
proc print;
run;
*/
options ls=max;
%macro anyNonMissingData(ds);
proc format;
value missing
._-.z = " "
other = "*"
;
value $missing
" " = " "
other = "*"
;
run;
data _null_;
if 0 then set &ds.;
format _numeric_ missing. _character_ $missing.;
length _NAME_ $ 32 _TYPE_ $ 1;
declare hash _H_();
_H_.defineKey("_NAME_");
_H_.DefineDONE();
do until(_E_);
set &ds. end=_E_;
do until(_NAME_='_NAME_');
if _NAME_ NE '_NAME_' then
do;
call vnext(_NAME_, _TYPE_, _N_);
if NOT cmiss(vvaluex(_NAME_)) then rc=_H_.add();
end;
end;
end;
_H_.output(dataset:"work.onlyMissingData(where=((_NAME_ NE '_NAME_')))");
stop;
run;
%mend anyNonMissingData;
%anyNonMissingData(have)
proc print data = onlyMissingData;
run;
Have is 8.47GB, log says:
NOTE: The data set WORK.ONLYMISSINGDATA has 201 observations and 1 variables.
NOTE: There were 5000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 1.45 seconds
user cpu time 0.28 seconds
system cpu time 1.17 seconds
memory 1861.31k
OS Memory 26652.00k
I'd say, not bad (both time and memory), but still 8GB is not a big data.
But we can't do to much about that "all missing" search, its always O(n*k) time process(n=#of obs, k=#of vars).
Bart
@yabwon this is really nice. Great to see application of CALL VNEXT and VVALUEX
@yabwon wrote:
Just for fun:
%let size=100; data have; array noMissC[&size.] $1 (&size.*"A"); array noMissN[&size.] (&size.*1) ; array MissC[&size.] $1 ; array MissN[&size.] ; do i = 1 to 5e6; output; end; run; /* proc print; run; */ options ls=max; %macro anyNonMissingData(ds); proc format; value missing ._-.z = " " other = "*" ; value $missing " " = " " other = "*" ; run; data _null_; if 0 then set &ds.; format _numeric_ missing. _character_ $missing.; length _NAME_ $ 32 _TYPE_ $ 1; declare hash _H_(); _H_.defineKey("_NAME_"); _H_.DefineDONE(); do until(_E_); set &ds. end=_E_; do until(_NAME_='_NAME_'); if _NAME_ NE '_NAME_' then do; call vnext(_NAME_, _TYPE_, _N_); if NOT cmiss(vvaluex(_NAME_)) then rc=_H_.add(); end; end; end; _H_.output(dataset:"work.onlyMissingData(where=((_NAME_ NE '_NAME_')))"); stop; run; %mend anyNonMissingData; %anyNonMissingData(have) proc print data = onlyMissingData; run;
Have is 8.47GB, log says:
NOTE: The data set WORK.ONLYMISSINGDATA has 201 observations and 1 variables. NOTE: There were 5000000 observations read from the data set WORK.HAVE. NOTE: DATA statement used (Total process time): real time 1.45 seconds user cpu time 0.28 seconds system cpu time 1.17 seconds memory 1861.31k OS Memory 26652.00k
I'd say, not bad (both time and memory), but still 8GB is not a big data.
But we can't do to much about that "all missing" search, its always O(n*k) time process(n=#of obs, k=#of vars).
Bart
@yabwon , Thanks a lot!
Macro and do while loop is the approach I used. Many thanks once again!
- Dr. Abhijeet Safai
You don't need the frequency tables OUT=. All you need is NLEVELS. I believe this is about as simple as it can be.
proc format;
value $_miss(default=1) ' '=' ' other='X';
value _miss(default=1) ._-.z=' ' other='X';
quit;
data class;
set sashelp.class;
name = ' '||name;
allmiss = ' ';
x = .d;
run;
proc freq nlevels;
tables _all_ / noprint;
format _character_ $_miss. _numeric_ _miss.;
ods select nlevels;
ods output nlevels=nlevels;
run;
data nlevels;
length TableVar $32 NLevels NMissLevels NNonMissLevels 8;
set nlevels;
run;
proc print;
run;
proc sql noprint;
select TableVar into :allmissvars separated by ' '
from nlevels where nnonmisslevels eq 0;
quit;
%put NOTE: &=allmissvars;
@DrAbhijeetSafai wrote:
Thanks @PaigeMiller .
I was also looking at this. I found this one simpler. I am just trying to take all the variables out by using out= option in proc freq in the code below. Please look at the out= part which I have bolded.
/* create a format to group missing and nonmissing */ proc format; value $missfmt ' '='Missing' other='Not Missing'; value missfmt . ='Missing' other='Not Missing'; run; proc freq data=one; format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */ tables _CHAR_ / out=test1 missing missprint nocum nopercent; format _NUMERIC_ missfmt.; tables _NUMERIC_ / missing missprint nocum nopercent; run;Thank you.
- Dr. Abhijeet Safai
The OUT= option of the Tables data set only outputs the LAST result.
You would have to make a call for each variable separately, with a separate output data set for each, for this to do what you want. Much better ways are shown.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.