Perform Test on Particular Variable across Multiple Data Sets

Occasional Contributor
Posts: 9

Perform Test on Particular Variable across Multiple Data Sets

Hello,

 

I have 241 data sets.  In each set, there is a variable called "Observed Closings" measured quarterly.  I need to test whether the share of observations with zero observed closings is at least 5% of the total number of observations, i.e. (# of observations with zero Observed Closings) / n >= 5.0%.

 

Any thoughts?

 

MG

PROC Star
Posts: 253

Re: Perform Test on Particular Variable across Multiple Data Sets

Please post some sample data so that the problem is easier to understand and someone can help you more easily.

Occasional Contributor
Posts: 9

Re: Perform Test on Particular Variable across Multiple Data Sets

Data Set 1

Date      # of Observed Closings

1                          0

2                          5

3                          9

4                          4 

5                          2

 

I need to test whether the number of observations with zero observed closings, divided by the total number of observations, is greater than or equal to 5%.  In this case, the number of observations with zero observed closings is 1.  There are 5 total observations.  So, 1/5 = 0.20 or 20%, which is above the 5% threshold.

 

Thanks,

 

PROC Star
Posts: 552

Re: Perform Test on Particular Variable across Multiple Data Sets

You can do it like this in a data step:

 

Data have;
input Date Observed_Closings;
datalines;
1  0
2  5
3  9
4  4 
5  2
;

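data want(keep=ratio);
   set have end=eof;

   /* Sum statement: c starts at 0 and is retained automatically */
   if Observed_Closings = 0 then c + 1;

   /* On the last observation _N_ equals the number of rows read */
   if eof then do;
      ratio = c/_N_;
      output;
   end;
run;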
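
The step above returns the ratio but does not apply the 5% test itself. A minimal sketch of one way to add it, assuming the same have data set and the variable name Observed_Closings; the names want_flag, n_zero, n_obs and at_least_5pct are made up for illustration:

```sas
/* Sketch: zero-closing share for one data set plus a 0/1 flag for the 5% test */
data want_flag(keep=n_zero n_obs ratio at_least_5pct);
   set have end=eof;

   /* Sum statements start at 0 and are retained across rows */
   n_obs + 1;
   if Observed_Closings = 0 then n_zero + 1;

   if eof then do;
      ratio = n_zero / n_obs;
      /* A comparison evaluates to 1 (true) or 0 (false) */
      at_least_5pct = (ratio >= 0.05);
      output;
   end;
run;
```

With the five sample rows this should give n_zero = 1, n_obs = 5, ratio = 0.2 and at_least_5pct = 1.
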
Occasional Contributor
Posts: 9

Re: Perform Test on Particular Variable across Multiple Data Sets

Thanks.  Is there a way to have it run the test on multiple data sets with similar naming convention?

 

Data set names are

        sasuser.out07086

        sasuser.out19102

        .

        .

        .

        sasuser.out96161

 

Thanks,

MG

PROC Star
Posts: 552

Re: Perform Test on Particular Variable across Multiple Data Sets

Do you want one ratio for each dataset or do you want one ratio total for all the datasets?

Occasional Contributor
Posts: 9

Re: Perform Test on Particular Variable across Multiple Data Sets

One for each dataset.

PROC Star
Posts: 552

Re: Perform Test on Particular Variable across Multiple Data Sets

In that case you can do it like this:

 

Data have1;
input Date Observed_Closings;
datalines;
1  0
2  5
3  9
4  4 
5  2
;

Data have2;
input Date Observed_Closings;
datalines;
1  0
2  0
3  9
4  4 
5  2
;

Data have3;
input Date Observed_Closings;
datalines;
1  0
2  0
3  0
4  4 
5  2
;

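/* Stack every WORK data set whose name starts with "have";
   indsname= stores the name of the contributing data set */
data collect_them;
   set have: indsname=source;
   sourceData = source;
run;

proc sort data = collect_them;
   by sourceData date;
run;

data want(keep=sourcedata ratio);
   set collect_them;
   by sourcedata;

   /* Reset both counters at the start of each source data set */
   if first.sourcedata then do;
      c_zero = 0;
      c_total = 0;
   end;

   /* Sum statements retain c_zero and c_total automatically */
   if Observed_Closings = 0 then c_zero + 1;
   c_total + 1;

   if last.sourcedata then do;
      ratio = c_zero/c_total;
      output;
   end;
run;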
Super User
Posts: 10,516

Re: Perform Test on Particular Variable across Multiple Data Sets


mgarrison wrote:

One for each dataset.


The combined data set built with the indsname option would let you use SourceData as a class variable in proc means or proc summary, as a by variable in proc freq, or for data step first. and last. processing.
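
As a rough sketch of the proc means route, assuming the collect_them data set (with sourceData) from the earlier post already exists; the names flagged, is_zero, want_means and n_obs are made up for illustration:

```sas
/* Assumes collect_them was built with indsname= as in the earlier post */
data flagged;
   set collect_them;
   /* 1 when the quarter had zero closings, 0 otherwise (missing counts as 0) */
   is_zero = (Observed_Closings = 0);
run;

proc means data=flagged noprint nway;
   class sourceData;
   var is_zero;
   /* The mean of a 0/1 indicator is the share of zero-closing rows */
   output out=want_means(drop=_type_ _freq_) n=n_obs mean=ratio;
run;
```

This avoids the sort and the first./last. bookkeeping. If only have1, have2 and have3 are in WORK, the ratios should come out as 0.2, 0.4 and 0.6.
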

PROC Star
Posts: 552

Re: Perform Test on Particular Variable across Multiple Data Sets

@ballardw, never knew about the indsname option. Very cool stuff, thank you!

Super User
Posts: 10,516

Re: Perform Test on Particular Variable across Multiple Data Sets


mgarrison wrote:

Thanks.  Is there a way to have it run the test on multiple data sets with similar naming convention?

 

Data set names are

        sasuser.out07086

        sasuser.out19102

        .

        .

        .

        sasuser.out96161

 

Thanks,

MG


Combine the data like this:

data want;
    length SourceData $ 41;
    set sasuser.out: indsname=source;
    SourceData=Source;
run;

 

The colon after Out is a name-prefix wildcard: it picks up all of the datasets in that library whose names start with OUT.

The SourceData variable will hold the library.dataset name of the dataset that contributed each record, if you need it.
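
Putting the pieces together, here is a sketch of one way to get a ratio and a 5% flag per dataset in a single pass. It assumes the variable is actually named Observed_Closings in every sasuser.out dataset (the real name was not given); all_out, zero_share, n_obs, n_zero and at_least_5pct are made-up names:

```sas
/* Stack all SASUSER data sets whose names start with OUT and tag each row */
data all_out;
   length SourceData $ 41;
   set sasuser.out: indsname=source;
   SourceData = source;
run;

/* One row per source data set: counts, zero share, and the 5% test */
proc sql;
   create table zero_share as
   select SourceData,
          count(*)                              as n_obs,
          sum(Observed_Closings = 0)            as n_zero,
          mean(Observed_Closings = 0)           as ratio,
          (mean(Observed_Closings = 0) >= 0.05) as at_least_5pct
   from all_out
   group by SourceData;
quit;
```

In proc sql a comparison evaluates to 1 or 0, so summing it counts the zero rows and averaging it gives the share. Keep in mind the prefix also picks up any other dataset in sasuser whose name starts with OUT.
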
