BookmarkSubscribeRSS Feed
gyambqt
Obsidian | Level 7

Hello Experts,

I have got 100 SAS tables and they are all stored under same library.

I want to check if some of the columns are populated correctly(do a missing value check,)

there are about 8 columns I want to check however the 8 columns are not all presented in all the table.

,so some tables may have 2 out of the 8 columns,some may have 3 .

all the 8 columns I want to check are in numeric value.

 

The output table I want to have should have the following columns

The table name that has been checked

the library name the table stored 

column name that has at least one missing value

 

I hope I made my question clear.

 

 

THanks

12 REPLIES 12
Reeza
Super User

What's are you counting as a missing value, a single value or all values in the column? Are all the columns numeric? Or character? A mix?

 

Are the datasets consistent in any manner, same number of variables, similar number of rows? Is there a naming convention? Are those datasets the only ones in the library or are there others that don't need to be processed. 

 

The process here is : develop code for one dataset. Take code and make it dynamic using SASHELP.VTABLE or information from PROC CONTENTS/DATASETS.

 

Once you get that working, use SASHELP.VTABLE to drive the macro to run it for each dataset. Or use DOSUBL.  If you're new to macro's/automation I wouldn't recommend DOSUBL personally. 

 

 

gyambqt
Obsidian | Level 7
yes, the dataset are consistent and always has the columns. however the column can be either numeric or date or character format.
Kurt_Bremser
Super User

@gyambqt wrote:
yes, the dataset are consistent and always has the columns. however the column can be either numeric or date or character format.

You have yet to answer ~80% of @Reeza's questions.

gyambqt
Obsidian | Level 7

I modified my question hope it is more clear.

i build the code my self but I want to see if experts have better way of handling this problem.

sorry can't share my code as I am typing using my phone.

Reeza
Super User

@gyambqt Please use a more descriptive title in the future. This forum is intended as a resource to both answer your current question and for future readers to find solutions to problems they have that are similar. Additionally, taking the time to both detail your question will save the time it takes me to ask all the questions and then you still have to answer them, rather than just being able to answer your question. 

 

gyambqt
Obsidian | Level 7

sure, I will notice that in the future.

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Quick and dirty example:

data work.test1;
  a=1;
  b=4;
  c="1";
run;

data work.test2;
  a=.;
  b="xyz";
  c="RTY";
run;

data work.test3;
  a=9;
  b="";
  c="";
run;

proc sql;
  create table RESULTS (DSNAME char(200),VAR char(200),TEST char(200));
quit;

data _null_;
  set sashelp.vtable (where=(libname="WORK" and substr(memname,1,4)="TEST"));
  call execute('ods output onewayfreqs=onewayfreqs_'||strip(memname)||';
                proc freq data='||strip(memname)||';
                  tables _all_;
                run;
                data onewayfreqs_'||strip(memname)||' (keep=dsname var result);
                  set onewayfreqs_'||strip(memname)||';
                  dsname="'||strip(memname)||'";
                  var=table;
                  result=ifc(percent=100,"Non Missing","Missing");
                run;
                proc sql; insert into RESULTS select * from ONEWAYFREQS_'||strip(memname)||'; quit;');
run;

gyambqt
Obsidian | Level 7

Hi expert,  if any of my table contain a mixture of character and numeric variable and I only want to test if any of the 8 columns has missing value. how do I do that?

for example: I have 100 columns in one of my table A and 3 of the 8 columns are presented there and I want to test if the 3 columns are missing.

gyambqt
Obsidian | Level 7

Also I found your code will check every single records. but if my data contain 100 millitons records it will perform a proc fre for all 100millions records and return a result which is not that efficent. Is there any way to improve the efficency?

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

So what you are saying is, 

You have a 100 datasets which are 100million records, you don't know exactly what the structure is in each, and you want a general program to work with this?  This is not a Q&A then.  First you need a system to be able to work with big data, without the resources no code will process 100*100million in any good time.  Secondly you need to map out the structure of your data so that you know what is in each, formulate a plan that consistently work across the data, proc freq was given in the example as SAS provided, compiled functions will operate faster than any code you write by hand.  But what you can do is, once you have a list of variables in each dataset, filter freq for only those:

data work.test1;
  a=1;
  b=4;
  c="1";
run;

data work.test2;
  a=.;
  b="xyz";
  c="RTY";
run;

data work.test3;
  a=9;
  b="";
  c="";
run;

proc sql;
  create table RESULTS (DSNAME char(200),VAR char(200),TEST char(200));
quit;

data cols (keep=memname var_list);
  set sashelp.vcolumn (where=(libname="WORK" and substr(memname,1,4)="TEST" and NAME in ("a","b")));
  by libname memname;
  length var_list $2000;
  retain var_list;
  var_list=ifc(first.memname,name,catx(' ',var_list,name));
  if last.memname then output;
run;

data _null_;
  merge sashelp.vtable (where=(libname="WORK" and substr(memname,1,4)="TEST") in=a)
        work.cols (in=b);
  by memname;
  if a;
  call execute('ods output onewayfreqs=onewayfreqs_'||strip(memname)||';
                proc freq data='||strip(memname)||' (keep='||strip(var_list)||');
                  tables _all_;
                run;
                data onewayfreqs_'||strip(memname)||' (keep=dsname var result);
                  set onewayfreqs_'||strip(memname)||';
                  dsname="'||strip(memname)||'";
                  var=table;
                  result=ifc(percent=100,"Non Missing","Missing");
                run;
                proc sql; insert into RESULTS select * from ONEWAYFREQS_'||strip(memname)||'; quit;');
run;

Note I create a dataset cols which contains only the variables I want within the list of A,B.

Peter_C
Rhodochrosite | Level 12
I wrote a routine which searched for empty columns in a data step. (It was enhanced to restrict the search to a subset of columns). Once a column has a non-empty cell, it no longer needs checking. In this way non-empty columns are quickly eliminated from investigation. A table pass stops as soon as all columns show non-missing (need not be all on the same row). Unfortunately, tables with even just one empty column, demand a complete pass.
There seems no way to avoid the I/O problem for large tables.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 3822 views
  • 0 likes
  • 5 in conversation