SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to find compare group for treatment group with SAS

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

How to find compare group for treatment group with SAS

Hi, I have a treatment group of about 2000 firms and their financial data, and comparable data for about a million other firms. Anyone could tell me how could I find firms to build the compare group? Could you kindly tell me where I could find examples of sas code to do the job? Thanks a lot! Cheer, Owen

Accepted Solutions
Solution
‎06-04-2016 01:00 PM
Super User
Posts: 11,343

Re: How to find compare group for treatment group with SAS

Posted in reply to owen_black

Without knowing details of what you might consider a "match", here is a very generic way.

The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.

If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).

 

If you multiple values to match on describing the space could be a simple as adding AND for the different variables.

Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.

 

I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.

 


data treat;
   input name $ sales;
datalines;
abc   10000
cdf   20000
ged   30000
;
run;

data compare;
   length compname $ 8;
   do a='S','T','U','V','X','Y','Z';
      do b='S','T','U','V','X','Y','Z';
         do c='S','T','U','V','X','Y','Z';
            do d='S','T','U','V','X','Y','Z';
           
               compname=catt(a,b,c,d);
               compsales = round(100000*rand('uniform'),100);
               output;
            end;
         end;
      end;
   end;
   drop a b c;
run;


proc sql;
   create table compgroup as
   select a.*,b.*
   from treat as a, compare as b
   where abs(a.sales-b.compsales) < 300;
quit;

A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.

 

Sort on the comparison values.

Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).

View solution in original post


All Replies
Respected Advisor
Posts: 4,920

Re: How to find compare group for treatment group with SAS

Posted in reply to owen_black

How do you want to compare them? From your question only, it could be on the first letter of their name Smiley Happy

PG
Solution
‎06-04-2016 01:00 PM
Super User
Posts: 11,343

Re: How to find compare group for treatment group with SAS

Posted in reply to owen_black

Without knowing details of what you might consider a "match", here is a very generic way.

The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.

If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).

 

If you multiple values to match on describing the space could be a simple as adding AND for the different variables.

Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.

 

I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.

 


data treat;
   input name $ sales;
datalines;
abc   10000
cdf   20000
ged   30000
;
run;

data compare;
   length compname $ 8;
   do a='S','T','U','V','X','Y','Z';
      do b='S','T','U','V','X','Y','Z';
         do c='S','T','U','V','X','Y','Z';
            do d='S','T','U','V','X','Y','Z';
           
               compname=catt(a,b,c,d);
               compsales = round(100000*rand('uniform'),100);
               output;
            end;
         end;
      end;
   end;
   drop a b c;
run;


proc sql;
   create table compgroup as
   select a.*,b.*
   from treat as a, compare as b
   where abs(a.sales-b.compsales) < 300;
quit;

A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.

 

Sort on the comparison values.

Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).

New Contributor
Posts: 2

Re: How to find compare group for treatment group with SAS

Thank you so much ballardw, I got your idea, that's very similar to my need. Cheers, Owen
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 233 views
  • 0 likes
  • 3 in conversation