BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
owen_black
Calcite | Level 5
Hi, I have a treatment group of about 2000 firms and their financial data, and comparable data for about a million other firms. Anyone could tell me how could I find firms to build the compare group? Could you kindly tell me where I could find examples of sas code to do the job? Thanks a lot! Cheer, Owen
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Without knowing details of what you might consider a "match", here is a very generic way.

The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.

If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).

 

If you multiple values to match on describing the space could be a simple as adding AND for the different variables.

Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.

 

I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.

 


data treat;
   input name $ sales;
datalines;
abc   10000
cdf   20000
ged   30000
;
run;

data compare;
   length compname $ 8;
   do a='S','T','U','V','X','Y','Z';
      do b='S','T','U','V','X','Y','Z';
         do c='S','T','U','V','X','Y','Z';
            do d='S','T','U','V','X','Y','Z';
           
               compname=catt(a,b,c,d);
               compsales = round(100000*rand('uniform'),100);
               output;
            end;
         end;
      end;
   end;
   drop a b c;
run;


proc sql;
   create table compgroup as
   select a.*,b.*
   from treat as a, compare as b
   where abs(a.sales-b.compsales) < 300;
quit;

A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.

 

Sort on the comparison values.

Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).

View solution in original post

3 REPLIES 3
PGStats
Opal | Level 21

How do you want to compare them? From your question only, it could be on the first letter of their name Smiley Happy

PG
ballardw
Super User

Without knowing details of what you might consider a "match", here is a very generic way.

The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.

If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).

 

If you multiple values to match on describing the space could be a simple as adding AND for the different variables.

Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.

 

I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.

 


data treat;
   input name $ sales;
datalines;
abc   10000
cdf   20000
ged   30000
;
run;

data compare;
   length compname $ 8;
   do a='S','T','U','V','X','Y','Z';
      do b='S','T','U','V','X','Y','Z';
         do c='S','T','U','V','X','Y','Z';
            do d='S','T','U','V','X','Y','Z';
           
               compname=catt(a,b,c,d);
               compsales = round(100000*rand('uniform'),100);
               output;
            end;
         end;
      end;
   end;
   drop a b c;
run;


proc sql;
   create table compgroup as
   select a.*,b.*
   from treat as a, compare as b
   where abs(a.sales-b.compsales) < 300;
quit;

A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.

 

Sort on the comparison values.

Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).

owen_black
Calcite | Level 5
Thank you so much ballardw, I got your idea, that's very similar to my need. Cheers, Owen

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 884 views
  • 0 likes
  • 3 in conversation