Perform Test on Particular Variable across Multiple Data Sets

mgarrison · Posted 05-25-2017 03:26 PM

Hello,

I have 241 data sets. In each set, there is a variable called "Observed Closings" measured quarterly. I need to test whether the observations with zero observed closings make up at least 5% of the total number of observations, i.e. whether (# of zero Observed Closings) / n >= 5.0%.

Any thoughts?

MG

kiranv_ · Posted 05-25-2017 03:34 PM

Please post some sample data so that the problem is easier to understand and someone can help you more easily.

mgarrison · Posted 05-25-2017 03:40 PM

Data Set 1

Date   # of Observed Closings
1      0
2      5
3      9
4      4
5      2

I need to test whether the number of observations with zero observed closings divided by the total number of observations is greater than or equal to 5%. In this case, the number of observations with zero observed closings is 1. There are 5 total observations. So, 1/5 = 0.20 or 20%, which is above the 5% threshold.

Thanks,

PeterClemmensen · Posted 05-25-2017 03:51 PM

You can do it like this in a data step:

Data have;
input Date Observed_Closings;
datalines;
1  0
2  5
3  9
4  4 
5  2
;

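data want(keep=ratio);
   set have end=eof;

   /* Sum statement: c starts at 0 and is retained automatically */
   if Observed_Closings = 0 then c + 1;

   /* On the last record, _N_ equals the number of observations read */
   if eof then do;
      ratio = c/_N_;
      output;
   end;
run;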

The DATA to DATA Step Macro
Blog: SASnrd

mgarrison · Posted 05-25-2017 04:23 PM

Thanks. Is there a way to have it run the test on multiple data sets with similar naming convention?

Data set names are

sasuser.out07086

sasuser.out19102

.

sasuser.out96161

Thanks,

MG

PeterClemmensen · Posted 05-25-2017 05:11 PM

Do you want one ratio for each dataset or do you want one ratio total for all the datasets?


mgarrison · Posted 05-25-2017 06:26 PM

One for each dataset.

PeterClemmensen · Posted 05-25-2017 06:48 PM

In that case you can do it like this:

Data have1;
input Date Observed_Closings;
datalines;
1  0
2  5
3  9
4  4 
5  2
;

Data have2;
input Date Observed_Closings;
datalines;
1  0
2  0
3  9
4  4 
5  2
;

Data have3;
input Date Observed_Closings;
datalines;
1  0
2  0
3  0
4  4 
5  2
;

data collect_them;
   set have: indsname=source;
   sourceData = source;
run;

proc sort data = collect_them;
   by sourceData date;
run;

data want(keep=sourcedata ratio);
   set collect_them;
   by sourcedata;

   if first.sourcedata then do;
      c_zero=0;c_total=0;
   end;

   if Observed_Closings = 0 then c_zero + 1;
   c_total + 1;

   if last.sourcedata then do;
      ratio = c_zero/c_total;
      output;
   end;

   retain c_zero c_total;
run;
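
The steps above stop at the ratio itself. A minimal sketch of the final 5% check, assuming the want data set produced above (one row per source data set, with variables sourcedata and ratio); the names want_tested and meets_threshold are illustrative only and do not come from the original posts:

```sas
/* Sketch only: flags each source data set whose share of zero    */
/* observed closings is at least 5%.                              */
/* want_tested and meets_threshold are hypothetical names.        */
data want_tested;
   set want;
   /* The comparison evaluates to 1 (true) or 0 (false) */
   meets_threshold = (ratio >= 0.05);
run;

proc print data=want_tested noobs;
   var sourcedata ratio meets_threshold;
run;
```

With the three sample data sets above, the ratios are 0.2, 0.4 and 0.6, so all three would be flagged.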


ballardw · Posted 05-25-2017 06:50 PM

@mgarrison wrote:

One for each dataset.

The dataset built with the INDSNAME option as described would let you use SourceData as a CLASS variable in PROC MEANS or PROC SUMMARY, or as a BY variable for PROC FREQ or for DATA step FIRST. and LAST. processing.
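
A rough sketch of the PROC MEANS route mentioned here, assuming the combined data set collect_them (with variables sourceData and Observed_Closings) from the earlier post. The names flagged, zero_flag, ratios and at_least_5pct are made up for illustration. The idea is that the mean of a 0/1 indicator equals the proportion of zeros:

```sas
/* Sketch only: flagged, zero_flag, ratios and at_least_5pct are   */
/* hypothetical names introduced for this example.                 */
data flagged;
   set collect_them;
   /* 1 when the observation has zero closings, otherwise 0 */
   zero_flag = (Observed_Closings = 0);
run;

/* The mean of the 0/1 flag within each source data set is the ratio. */
/* CLASS does not require the input to be sorted.                     */
proc means data=flagged noprint nway;
   class sourceData;
   var zero_flag;
   output out=ratios(drop=_type_ _freq_) mean=ratio;
run;

/* Apply the 5% threshold to each per-data-set ratio */
data ratios;
   set ratios;
   at_least_5pct = (ratio >= 0.05);
run;
```

Because CLASS handles the grouping, no PROC SORT step is needed in this variant. Note that an observation with a missing Observed_Closings value gets zero_flag = 0 and still counts in the denominator, which may or may not be what you want.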

PeterClemmensen · Posted 05-25-2017 06:52 PM

@ballardw, never knew about the indsname option. Very cool stuff, thank you 🙂


ballardw · Posted 05-25-2017 05:34 PM

@mgarrison wrote:

Thanks. Is there a way to have it run the test on multiple data sets with similar naming convention?

Data set names are

        sasuser.out07086

        sasuser.out19102

        .

        .

        .

        sasuser.out96161

Thanks,

MG

Combine the data as

data want;
   length SourceData $ 41;
   set sasuser.out: indsname=source;
   SourceData = source;
run;

The colon after Out is a wildcard to get all of the datasets that start with OUT from that library.

The SourceData variable will hold the library.dataset name of the data set contributing each record, if needed.
