mgarrison
Fluorite | Level 6

Hello,

 

I have 241 data sets.  In each set, there is a variable called "Observed Closings" measured quarterly.  I need to test whether the number of observations with zero observed closings is at least 5% of the total number of observations, i.e. (# of observations with zero Observed Closings) / n >= 5.0%.

 

Any thoughts?

 

MG

10 REPLIES
kiranv_
Rhodochrosite | Level 12

Please give something in form of data, so that it is easy to understand and then someone can help you more easily.

mgarrison
Fluorite | Level 6

Data Set 1

Date      # of Observed Closings
1                          0
2                          5
3                          9
4                          4
5                          2

 

I need to test whether the number of observations with zero observed closings divided by total observations is greater than or equal to 5%.  In this case, the number of observations with zero observed closings is 1.  There are 5 total observations.  So, 1/5 = 0.20 or 20%.

 

Thanks,

 

PeterClemmensen
Tourmaline | Level 20

You can do it like this in a data step

 

Data have;
input Date Observed_Closings;
datalines;
1  0
2  5
3  9
4  4 
5  2
;

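data want(keep=ratio);
   set have end=eof;
   /* Sum statement: c starts at 0 and is retained automatically */
   if Observed_Closings = 0 then c + 1;

   /* On the last observation, _N_ equals the number of rows read */
   if eof then do;
      ratio = c/_N_;
      output;
   end;
run;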
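
The step above returns the ratio but does not apply the 5% cutoff itself. A minimal sketch of one way to add it, assuming the test is ratio >= 0.05 as written in the question; the names want_flag, n_zero, n_obs and at_least_5pct are made up for illustration:

```sas
/* Sketch only: same logic as above, plus a 0/1 flag for the 5% test. */
/* Assumes the data set HAVE and the variable Observed_Closings.      */
data want_flag(keep=n_zero n_obs ratio at_least_5pct);
   set have end=eof;

   /* The comparison is 1 when true and 0 when false, so this counts zeros */
   n_zero + (Observed_Closings = 0);

   if eof then do;
      n_obs         = _N_;              /* rows read so far = total rows    */
      ratio         = n_zero / n_obs;
      at_least_5pct = (ratio >= 0.05);  /* 1 if the share of zeros is >= 5% */
      output;
   end;
run;
```

With the five sample rows, ratio is 1/5 = 0.20, so at_least_5pct is 1.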
mgarrison
Fluorite | Level 6

Thanks.  Is there a way to have it run the test on multiple data sets with similar naming convention?

 

Data set names are

        sasuser.out07086

        sasuser.out19102

        .

        .

        .

        sasuser.out96161

 

Thanks,

MG

PeterClemmensen
Tourmaline | Level 20

Do you want one ratio for each dataset or do you want one ratio total for all the datasets?

PeterClemmensen
Tourmaline | Level 20

In that case you can do it like this:

 

Data have1;
input Date Observed_Closings;
datalines;
1  0
2  5
3  9
4  4 
5  2
;

Data have2;
input Date Observed_Closings;
datalines;
1  0
2  0
3  9
4  4 
5  2
;

Data have3;
input Date Observed_Closings;
datalines;
1  0
2  0
3  0
4  4 
5  2
;

data collect_them;
   set have: indsname=source;
   sourceData = source;
run;

proc sort data = collect_them;
   by sourceData date;
run;

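data want(keep=sourcedata ratio);
   set collect_them;
   by sourcedata;

   /* Reset both counters at the start of each source data set */
   if first.sourcedata then do;
      c_zero  = 0;
      c_total = 0;
   end;

   /* Sum statements retain their values automatically */
   if Observed_Closings = 0 then c_zero + 1;
   c_total + 1;

   /* Write one row per source data set */
   if last.sourcedata then do;
      ratio = c_zero/c_total;
      output;
   end;
run;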
ballardw
Super User

@mgarrison wrote:

One for each dataset.


The dataset built with the indsname option described would let you use SourceData as a class variable in proc means or summary, as a by variable in proc freq, or for data step first. and last. processing.
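
As a rough illustration of the class-variable route, here is a sketch with proc means. It assumes the collect_them data set from the earlier reply (variables sourceData and Observed_Closings) and a test of ratio >= 0.05; the names flagged, ratios, zero, n_obs and at_least_5pct are made up for the example:

```sas
/* Sketch only: the mean of a 0/1 indicator is the share of zeros. */
data flagged;
   set collect_them;
   zero = (Observed_Closings = 0);   /* 1 if zero closings, else 0 */
run;

proc means data=flagged noprint nway;
   class sourceData;                 /* one output row per source data set */
   var zero;
   output out=ratios(drop=_type_ _freq_) mean=ratio n=n_obs;
run;

data ratios;
   set ratios;
   at_least_5pct = (ratio >= 0.05);  /* 1 if the share of zeros is >= 5% */
run;
```

Note that a missing Observed_Closings value counts as non-zero here, because the comparison with 0 is false.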

PeterClemmensen
Tourmaline | Level 20

@ballardw, never knew about the indsname option. Very cool stuff, thank you 🙂

ballardw
Super User

@mgarrison wrote:

Thanks.  Is there a way to have it run the test on multiple data sets with similar naming convention?

 

Data set names are

        sasuser.out07086

        sasuser.out19102

        .

        .

        .

        sasuser.out96161

 

Thanks,

MG


Combine the data as

data want;
    length SourceData $ 41;
    set sasuser.out: indsname=source;
    SourceData=Source;
run;

 

The colon after Out is a wildcard to get all of the datasets that start with OUT from that library.

The SourceData variable will hold the library.dataset name of the data set that contributed each record, if needed.
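
Once the data are combined, one ratio per source data set can be computed in a single query. A sketch with proc sql, assuming the combined data set is the want built above, that the variable is actually named Observed_Closings in your data, and that the test is ratio >= 0.05; the names ratios, n_obs, n_zero and at_least_5pct are made up for the example:

```sas
/* Sketch only: one row per source data set with the share of zeros. */
proc sql;
   create table ratios as
   select SourceData,
          count(*)                             as n_obs,
          sum(Observed_Closings = 0)           as n_zero,
          mean(Observed_Closings = 0)          as ratio,
          (mean(Observed_Closings = 0) >= 0.05) as at_least_5pct
   from want
   group by SourceData;
quit;
```

The comparison Observed_Closings = 0 evaluates to 1 or 0, so its sum is the count of zeros and its mean is the ratio. Filtering ratios on at_least_5pct = 1 lists the data sets that meet the cutoff.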
