05-25-2017 03:26 PM
Hello,
I have 241 data sets. Each set contains a variable called "Observed Closings", measured quarterly. For each set, I need to test whether the share of observations with zero observed closings is at least 5% of the total number of observations, that is, (# of zero Observed Closings) / n >= 5.0%.
Any thoughts?
MG
05-25-2017 03:34 PM
Please post some sample data so that the problem is easier to understand and someone can help you more easily.
05-25-2017 03:40 PM
Data Set 1
Date # of Observed Closings
1 0
2 5
3 9
4 4
5 2
I need to test whether the number of observations with zero observed closings, divided by the total number of observations, is greater than or equal to 5%. In this case, the number of observations with zero observed closings is 1. There are 5 total observations. So, 1/5 = 0.20 or 20%.
Thanks,
05-25-2017 03:51 PM
You can do it like this in a data step:
Data have;
input Date Observed_Closings;
datalines;
1 0
2 5
3 9
4 4
5 2
;
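data want(keep=ratio);
set have end=eof;
/* The sum statement retains c across rows and starts it at 0 */
if Observed_Closings = 0 then c + 1;
if eof then do;
/* On the last row, _N_ equals the total number of observations */
ratio = c/_N_;
output;
end;
run;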
05-25-2017 04:23 PM
Thanks. Is there a way to have it run the test on multiple data sets with similar naming convention?
Data set names are
sasuser.out07086
sasuser.out19102
.
.
.
sasuser.out96161
Thanks,
MG
05-25-2017 05:11 PM
Do you want one ratio for each dataset or do you want one ratio total for all the datasets?
05-25-2017 06:26 PM
One for each dataset.
05-25-2017 06:48 PM
In that case you can do it like this:
Data have1;
input Date Observed_Closings;
datalines;
1 0
2 5
3 9
4 4
5 2
;
Data have2;
input Date Observed_Closings;
datalines;
1 0
2 0
3 9
4 4
5 2
;
Data have3;
input Date Observed_Closings;
datalines;
1 0
2 0
3 0
4 4
5 2
;
data collect_them;
set have: indsname=source;
sourceData = source;
run;
proc sort data = collect_them;
by sourceData date;
run;
data want(keep=sourcedata ratio);
set collect_them;
by sourcedata;
if first.sourcedata then do;
c_zero=0;c_total=0;
end;
if Observed_Closings = 0 then c_zero + 1;
c_total + 1;
if last.sourcedata then do;
ratio = c_zero/c_total;
output;
end;
retain c_zero c_total;
run;
05-25-2017 06:50 PM
mgarrison wrote:
One for each dataset.
The combined dataset built with the INDSNAME option as described would let you use SourceData as a CLASS variable in PROC MEANS or PROC SUMMARY, as a BY variable in PROC FREQ, or for FIRST. and LAST. processing in a data step.
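As a rough sketch of the PROC MEANS route, assuming the combined dataset is named collect_them and carries the variables sourceData and Observed_Closings (as in the example posted above); the names flagged, is_zero and zero_share are made up for illustration:

```sas
/* Assumption: COLLECT_THEM holds all rows, with sourceData
   identifying the dataset each row came from. */
data flagged;
    set collect_them;
    /* A comparison evaluates to 1 (true) or 0 (false) in SAS */
    is_zero = (Observed_Closings = 0);
run;

proc means data=flagged noprint nway;
    class sourceData;
    var is_zero;
    /* The mean of a 0/1 flag is the share of zero observations */
    output out=zero_share(drop=_type_ _freq_) mean=ratio;
run;
```

This should give one row per source dataset with its ratio, without needing a sort first, since CLASS does not require sorted input.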
05-25-2017 06:52 PM
@ballardw, never knew about the indsname option. Very cool stuff, thank you
05-25-2017 05:34 PM
mgarrison wrote:
Thanks. Is there a way to have it run the test on multiple data sets with similar naming convention?
Data set names are
sasuser.out07086
sasuser.out19102
.
.
.
sasuser.out96161
Thanks,
MG
Combine the data as
data want;
length SourceData $ 41;
set sasuser.out: indsname=source;
SourceData=Source;
run;
The colon after out is a prefix wildcard: it picks up every dataset in that library whose name starts with OUT.
The SourceData variable will hold the library.dataset name that contributed each record, if needed.
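To go from the combined data to the actual 5% test, something along these lines might work. It is only a sketch: it assumes the variable is really named Observed_Closings in every sasuser.out dataset, and the names combined, zero_test, n_obs, n_zero, ratio and meets_5pct are made up for illustration.

```sas
/* Stack every SASUSER dataset whose name starts with OUT and
   record which dataset each row came from. */
data combined;
    length SourceData $ 41;
    set sasuser.out: indsname=source;
    SourceData = source;
run;

proc sql;
    create table zero_test as
    select SourceData,
           count(*)                              as n_obs,
           /* a comparison is 1 when true, 0 when false */
           sum(Observed_Closings = 0)            as n_zero,
           mean(Observed_Closings = 0)           as ratio,
           /* 1 if zero closings make up at least 5% of rows */
           (mean(Observed_Closings = 0) >= 0.05) as meets_5pct
    from combined
    group by SourceData;
quit;
```

The result should have one row per source dataset. If the test is meant to be "less than 5%" instead, flip the comparison in the last column.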