Solved: How to find compare group for treatment group with SAS

owen_black · Posted 06-03-2016 01:07 PM

Hi, I have a treatment group of about 2000 firms and their financial data, and comparable data for about a million other firms. Anyone could tell me how could I find firms to build the compare group? Could you kindly tell me where I could find examples of sas code to do the job? Thanks a lot! Cheer, Owen

ballardw · Posted 06-03-2016 06:55 PM

Without knowing details of what you might consider a "match", here is a very generic way.

The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.

If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).

If you multiple values to match on describing the space could be a simple as adding AND for the different variables.

Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.

I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.


data treat;
   input name $ sales;
datalines;
abc   10000
cdf   20000
ged   30000
;
run;

data compare;
   length compname $ 8;
   do a='S','T','U','V','X','Y','Z';
      do b='S','T','U','V','X','Y','Z';
         do c='S','T','U','V','X','Y','Z';
            do d='S','T','U','V','X','Y','Z';
           
               compname=catt(a,b,c,d);
               compsales = round(100000*rand('uniform'),100);
               output;
            end;
         end;
      end;
   end;
   drop a b c;
run;


proc sql;
   create table compgroup as
   select a.*,b.*
   from treat as a, compare as b
   where abs(a.sales-b.compsales) < 300;
quit;

A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.

Sort on the comparison values.

Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).

View solution in original post

PGStats · Posted 06-03-2016 02:34 PM

How do you want to compare them? From your question only, it could be on the first letter of their name .

PG

ballardw · Posted 06-03-2016 06:55 PM

Without knowing details of what you might consider a "match", here is a very generic way.

The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.

If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).

If you multiple values to match on describing the space could be a simple as adding AND for the different variables.

Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.

I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.


data treat;
   input name $ sales;
datalines;
abc   10000
cdf   20000
ged   30000
;
run;

data compare;
   length compname $ 8;
   do a='S','T','U','V','X','Y','Z';
      do b='S','T','U','V','X','Y','Z';
         do c='S','T','U','V','X','Y','Z';
            do d='S','T','U','V','X','Y','Z';
           
               compname=catt(a,b,c,d);
               compsales = round(100000*rand('uniform'),100);
               output;
            end;
         end;
      end;
   end;
   drop a b c;
run;


proc sql;
   create table compgroup as
   select a.*,b.*
   from treat as a, compare as b
   where abs(a.sales-b.compsales) < 300;
quit;

A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.

Sort on the comparison values.

Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).

owen_black · Posted 06-04-2016 01:05 PM

Thank you so much ballardw, I got your idea, that's very similar to my need. Cheers, Owen

How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

Re: How to find compare group for treatment group with SAS

Registration is open