Hello,

I have 241 data sets.  In each set, there is a variable called "Observed Closings" measured quarterly.  I need to test whether the observations with zero observed closings make up at least 5% of the total number of observations, i.e. (# of zero Observed Closings) / n >= 5.0%.

Any thoughts?

MG

Please post some sample data. That makes the problem easier to understand and makes it easier for someone to help you.

Data Set 1

Date      # of Observed Closings

1                          0

2                          5

3                          9

4                          4

5                          2

I need to test whether the number of observations with zero observed closings, divided by the total number of observations, is greater than or equal to 5%.  In this case, the number of observations with zero observed closings is 1.  There are 5 total observations.  So, 1/5 = 0.20 or 20%.

Thanks,

You can do it like this in a data step:

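```sas
/* Sample data from the question */
data have;
  input Date Observed_Closings;
  datalines;
1  0
2  5
3  9
4  4
5  2
;

data want(keep=ratio);
  set have end=eof;

  /* Sum statement: c starts at 0 and is retained automatically */
  if Observed_Closings = 0 then c + 1;

  /* On the last observation, _N_ equals the number of rows read */
  if eof then do;
    ratio = c / _N_;
    output;
  end;
run;
```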
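The step above only returns the ratio. If you also want the comparison against the 5% cutoff stored in the output, something along these lines might work. It is just a sketch that assumes the `have` dataset from the example; the names `want_flag`, `n_obs`, `n_zero` and `meets_5pct` are made up for illustration.

```sas
data want_flag(keep=n_obs n_zero ratio meets_5pct);
  set have end=eof;

  /* Sum statements: both counters start at 0 and are retained */
  n_obs + 1;
  n_zero + (Observed_Closings = 0);   /* comparison evaluates to 1 or 0 */

  if eof then do;
    ratio = n_zero / n_obs;
    meets_5pct = (ratio >= 0.05);     /* 1 if zero share is at least 5% */
    output;
  end;
run;
```

For the five sample rows this gives n_zero = 1, n_obs = 5, ratio = 0.2 and meets_5pct = 1. A missing Observed_Closings is counted as an observation but not as a zero, so adjust that if your data has missing values.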
Thanks.  Is there a way to have it run the test on multiple data sets with similar naming convention?

Data set names are

sasuser.out07086

sasuser.out19102

.

.

.

sasuser.out96161

Thanks,

MG

Do you want one ratio for each dataset or do you want one ratio total for all the datasets?

One for each dataset.

In that case you can do it like this:

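```sas
/* Three small sample datasets with a common name prefix */
data have1;
  input Date Observed_Closings;
  datalines;
1  0
2  5
3  9
4  4
5  2
;

data have2;
  input Date Observed_Closings;
  datalines;
1  0
2  0
3  9
4  4
5  2
;

data have3;
  input Date Observed_Closings;
  datalines;
1  0
2  0
3  0
4  4
5  2
;

/* have: reads every WORK dataset whose name starts with HAVE.
   That includes HAVE from the earlier example if it still exists.
   indsname= puts the name of the contributing dataset in SOURCE,
   which is temporary, so copy it to a real variable. */
data collect_them;
  set have: indsname=source;
  sourceData = source;
run;

proc sort data=collect_them;
  by sourceData Date;
run;

data want(keep=sourceData ratio);
  set collect_them;
  by sourceData;

  /* Reset the counters at the start of each source dataset */
  if first.sourceData then do;
    c_zero = 0;
    c_total = 0;
  end;

  /* Sum statements retain their values across observations */
  if Observed_Closings = 0 then c_zero + 1;
  c_total + 1;

  /* One output row per source dataset */
  if last.sourceData then do;
    ratio = c_zero / c_total;
    output;
  end;
run;
```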
mgarrison wrote:

One for each dataset.

Combining the datasets with the indsname option as described would let you use SourceData as a class variable in proc means or proc summary, as a by variable in proc freq, or for first. and last. processing in a data step.
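As a rough sketch of the proc means route, assuming the combined dataset `collect_them` and its `sourceData` variable from the earlier post: the mean of a 0/1 indicator is the share of zero observations. The names `flagged`, `is_zero`, `ratios` and `n_obs` are only placeholders.

```sas
/* 0/1 indicator: 1 when the quarter had zero observed closings */
data flagged;
  set collect_them;
  is_zero = (Observed_Closings = 0);
run;

/* Mean of the indicator per source dataset = ratio of zero quarters */
proc means data=flagged noprint nway;
  class sourceData;
  var is_zero;
  output out=ratios(drop=_type_ _freq_) mean=ratio n=n_obs;
run;
```

This avoids the sort and the first./last. logic, since the class statement does not need sorted input.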

@ballardw, never knew about the indsname option. Very cool stuff, thank you

Combine the data as

data want;

length SourceData $ 41;

set sasuser.out: indsname=source;

SourceData=Source;

run;

The colon after Out is a wildcard that picks up all of the datasets in that library whose names start with OUT.

The SourceData variable will hold the library.dataset name that contributed each record, if needed.
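Putting the pieces together for the sasuser.out datasets, a sketch like the following might give one row per dataset with the ratio and a flag for the 5% cutoff. It assumes the variable is actually named `Observed_Closings` in those datasets (the question only gives the label "Observed Closings"); `all_closings`, `zero_share`, `n_obs`, `n_zero` and `meets_5pct` are made-up names.

```sas
/* Stack every SASUSER dataset whose name starts with OUT */
data all_closings;
  length SourceData $ 41;
  set sasuser.out: indsname=source;
  SourceData = source;
run;

/* One row per source dataset; a comparison evaluates to 1 or 0 */
proc sql;
  create table zero_share as
  select SourceData,
         count(*)                    as n_obs,
         sum(Observed_Closings = 0)  as n_zero,
         sum(Observed_Closings = 0) / count(*) as ratio,
         (sum(Observed_Closings = 0) / count(*) >= 0.05) as meets_5pct
  from all_closings
  group by SourceData;
quit;
```

Note that the wildcard will also pull in any other dataset in sasuser that happens to start with OUT, so check the SourceData values in the result.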

