Hello,
I want to search through all the datasets in a specific library to see if two specific columns have blank records, if they do, write to a new file, the name of the dataset(s) with blank records, today's and the number of blank records found per dataset. So the output dataset will have three columns: Dataset name, today's date and number of blank rows:
I am stuck here so far.................
data blankClinicIdNum blankClinic blankIdNum;
set appl1.s:;
runDate=today();
format runDate date9.;
put dt ;
if clinic =. and idnum =. then output blankClinicIdNum;
if clinic =. then output blankClinic;
if idnum =. then output blankIdNum;
run;
data missingData;
set appl1.s: indsname =source;;
runDate=today();
format runDate date9.;
datasetName = source;
Missing_CLINIC_ID=0; MISSING_CLINIC=0; MISSING_ID=0;
if clinic =. and idnum =. then Missing_Clinic_ID = 1;
if clinic =. then MISSING_CLINIC=1;
if idnum =. then MISSING_ID=1;
run;
proc means data=missingData N SUM MEAN;
class datasetName runDate;
var MISSING_:;
output out summary = want;
run;
proc print data=want;
label N = "# of Observations'
SUM = '# Missing'
MEAN = '% Missing';
format mean percent12.1;
run;
An alternative approach.
@nduksy wrote:
Hello,
I want to search through all the datasets in a specific library to see if two specific columns have blank records, if they do, write to a new file, the name of the dataset(s) with blank records, today's and the number of blank records found per dataset. So the output dataset will have three columns: Dataset name, today's date and number of blank rows:
I am stuck here so far.................
data blankClinicIdNum blankClinic blankIdNum;
set appl1.s:;
runDate=today();
format runDate date9.;
put dt ;
if clinic =. and idnum =. then output blankClinicIdNum;
if clinic =. then output blankClinic;
if idnum =. then output blankIdNum;
run;
data missingData;
set appl1.s: indsname =source;;
runDate=today();
format runDate date9.;
datasetName = source;
Missing_CLINIC_ID=0; MISSING_CLINIC=0; MISSING_ID=0;
if clinic =. and idnum =. then Missing_Clinic_ID = 1;
if clinic =. then MISSING_CLINIC=1;
if idnum =. then MISSING_ID=1;
run;
proc means data=missingData N SUM MEAN;
class datasetName runDate;
var MISSING_:;
output out summary = want;
run;
proc print data=want;
label N = "# of Observations'
SUM = '# Missing'
MEAN = '% Missing';
format mean percent12.1;
run;
An alternative approach.
@nduksy wrote:
Hello,
I want to search through all the datasets in a specific library to see if two specific columns have blank records, if they do, write to a new file, the name of the dataset(s) with blank records, today's and the number of blank records found per dataset. So the output dataset will have three columns: Dataset name, today's date and number of blank rows:
I am stuck here so far.................
data blankClinicIdNum blankClinic blankIdNum;
set appl1.s:;
runDate=today();
format runDate date9.;
put dt ;
if clinic =. and idnum =. then output blankClinicIdNum;
if clinic =. then output blankClinic;
if idnum =. then output blankIdNum;
run;
Hi Sorry to bother you again. I want to be able to run the program repeatedly without updating the "runDate". Let me explain further - it is possible for the program to be ran today and 1 blank row is returned for dataset X and tomorrow when it is run, same dataset X now has two blank rows. I want to be able to report both occurrences as separate rows, but with different run dates. see sample table below:
Dataset Name | Run Date | Count |
s123 | 04OCT2021 | 2 |
s123 | 05OCT2021 | 5 |
Thanks as always!
Assuming all of the datasets have the two variables and start with the same prefix you could use a variation on your attempt.
Let's make some sample data first:
data x1(keep=name sex) x2(keep=name sex age);
set sashelp.class;
if _n_=3 then sex=' ';
if _n_=5 then name=' ';
run;
Now we can make a view that will generate 0/1 flags for missing.
data missing / view=missing;
set x: (keep=name sex) indsname=indsname;
dsname=indsname;
blank_name = missing(name);
blank_sex = missing(sex);
blank_name_sex = missing(name) and missing(sex);
keep dsname blank_: ;
run;
Which we can then use PROC SUMMARY to aggregate.
proc summary data=missing nway;
by dsname;
var blank_:;
output out=summary sum= ;
run;
Results:
blank_ blank_ blank_ Obs dsname _TYPE_ _FREQ_ name sex name_sex 1 WORK.X1 0 19 1 1 0 2 WORK.X2 0 19 1 1 0
With a little finagling you can get the result in one step.
data test ;
set x: (keep=name sex) indsname=indsname end=eof;
if indsname ne lag(indsname) and _n_>1 then do;
output;
call missing(blank_name,blank_sex,blank_name_sex);
end;
dsname=indsname;
retain dsname;
blank_name + missing(name);
blank_sex + missing(sex);
blank_name_sex + missing(name) and missing(sex);
if eof then output;
keep dsname blank_: ;
run;
blank_ blank_ blank_ Obs name sex name_sex dsname 1 1 1 0 WORK.X1 2 1 1 0 WORK.X2
You don't say what you are actually stuck on ...
I think that you want to add the INDSNAME option to capture the data set contributing to the current record.
data blankClinicIdNum blankClinic blankIdNum; set appl1.s: INDSNAME=DSN; Datasetname=DSN; runDate=today(); format runDate date9.; put dt ;/* what is this for???*/ if clinic =. and idnum =. then output blankClinicIdNum; if clinic =. then output blankClinic; if idnum =. then output blankIdNum; run;
The INDSNAME option on the SET statement creates a temporary variable holding the name, library and dataset, of the data set providing the current record. To have the value in the output data set you need to assign it to a permanent variable.
THEN you use something like Proc Freq, SQL , Means or Summary to count the combinations of datasetname and date.
If you want the counts of blankclinic and blankidnum in the same data set then combine (merge or join) the count data sets by datasetname and date (though the dates should be the same at this point) if you ever have these created on different dates..
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.