Hello Experts,
I have got 100 SAS tables and they are all stored under same library.
I want to check if some of the columns are populated correctly(do a missing value check,)
there are about 8 columns I want to check however the 8 columns are not all presented in all the table.
,so some tables may have 2 out of the 8 columns,some may have 3 .
all the 8 columns I want to check are in numeric value.
The output table I want to have should have the following columns
The table name that has been checked
the library name the table stored
column name that has at least one missing value
I hope I made my question clear.
THanks
What's are you counting as a missing value, a single value or all values in the column? Are all the columns numeric? Or character? A mix?
Are the datasets consistent in any manner, same number of variables, similar number of rows? Is there a naming convention? Are those datasets the only ones in the library or are there others that don't need to be processed.
The process here is : develop code for one dataset. Take code and make it dynamic using SASHELP.VTABLE or information from PROC CONTENTS/DATASETS.
Once you get that working, use SASHELP.VTABLE to drive the macro to run it for each dataset. Or use DOSUBL. If you're new to macro's/automation I wouldn't recommend DOSUBL personally.
I modified my question hope it is more clear.
i build the code my self but I want to see if experts have better way of handling this problem.
sorry can't share my code as I am typing using my phone.
@gyambqt Please use a more descriptive title in the future. This forum is intended as a resource to both answer your current question and for future readers to find solutions to problems they have that are similar. Additionally, taking the time to both detail your question will save the time it takes me to ask all the questions and then you still have to answer them, rather than just being able to answer your question.
sure, I will notice that in the future.
Quick and dirty example:
data work.test1; a=1; b=4; c="1"; run; data work.test2; a=.; b="xyz"; c="RTY"; run; data work.test3; a=9; b=""; c=""; run; proc sql; create table RESULTS (DSNAME char(200),VAR char(200),TEST char(200)); quit; data _null_; set sashelp.vtable (where=(libname="WORK" and substr(memname,1,4)="TEST")); call execute('ods output onewayfreqs=onewayfreqs_'||strip(memname)||'; proc freq data='||strip(memname)||'; tables _all_; run; data onewayfreqs_'||strip(memname)||' (keep=dsname var result); set onewayfreqs_'||strip(memname)||'; dsname="'||strip(memname)||'"; var=table; result=ifc(percent=100,"Non Missing","Missing"); run; proc sql; insert into RESULTS select * from ONEWAYFREQS_'||strip(memname)||'; quit;'); run;
Hi expert, if any of my table contain a mixture of character and numeric variable and I only want to test if any of the 8 columns has missing value. how do I do that?
for example: I have 100 columns in one of my table A and 3 of the 8 columns are presented there and I want to test if the 3 columns are missing.
Also I found your code will check every single records. but if my data contain 100 millitons records it will perform a proc fre for all 100millions records and return a result which is not that efficent. Is there any way to improve the efficency?
So what you are saying is,
You have a 100 datasets which are 100million records, you don't know exactly what the structure is in each, and you want a general program to work with this? This is not a Q&A then. First you need a system to be able to work with big data, without the resources no code will process 100*100million in any good time. Secondly you need to map out the structure of your data so that you know what is in each, formulate a plan that consistently work across the data, proc freq was given in the example as SAS provided, compiled functions will operate faster than any code you write by hand. But what you can do is, once you have a list of variables in each dataset, filter freq for only those:
data work.test1; a=1; b=4; c="1"; run; data work.test2; a=.; b="xyz"; c="RTY"; run; data work.test3; a=9; b=""; c=""; run; proc sql; create table RESULTS (DSNAME char(200),VAR char(200),TEST char(200)); quit; data cols (keep=memname var_list); set sashelp.vcolumn (where=(libname="WORK" and substr(memname,1,4)="TEST" and NAME in ("a","b"))); by libname memname; length var_list $2000; retain var_list; var_list=ifc(first.memname,name,catx(' ',var_list,name)); if last.memname then output; run; data _null_; merge sashelp.vtable (where=(libname="WORK" and substr(memname,1,4)="TEST") in=a) work.cols (in=b); by memname; if a; call execute('ods output onewayfreqs=onewayfreqs_'||strip(memname)||'; proc freq data='||strip(memname)||' (keep='||strip(var_list)||'); tables _all_; run; data onewayfreqs_'||strip(memname)||' (keep=dsname var result); set onewayfreqs_'||strip(memname)||'; dsname="'||strip(memname)||'"; var=table; result=ifc(percent=100,"Non Missing","Missing"); run; proc sql; insert into RESULTS select * from ONEWAYFREQS_'||strip(memname)||'; quit;'); run;
Note I create a dataset cols which contains only the variables I want within the list of A,B.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.