BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Hello,
I hope you can help me:
I have one company A with a certain sales figure X in dataset 1.
I have n potential peer companies in dataset 2.
I have to match my company to a peer company out of dataset 2 based on sales number X.
Thereby I have to take the following steps:
1) Sort dataset 2 by Sales number
=> This is straight forward, no help needed for that step
2) Split dataset 2 into three equal portfolios based on Sales (this means that I have n observations, so each portfolio will contain n/3 observations which are sorted by Sales. So the first portfolio will contain the companies with the lowest sales numbers, etc.)
=> I tried to use a macro I found online with a "split function", however, it doesn't seem to work. Do you have something better?
3) Cross-check if company A from dataset 1 could fit into one of the three portfolios created. If the Sales figure of company A from dataset 1 is larger or smaller than the sales numbers of all potential peers in portfolio n, then delete portfolio n. => I should be left with only one portfolio
4) Out of this one portfolio that is left, find the peer company that has a sales figure that is closest to the sales figure for company A
=> both step 3 + 4 I don't know how to program, so some help would be great!
[I know that it looks like step 2 and 3 are redundant. However, in reality I have to match the company based on several figures, not just sales. That's why I need to create these three portfolios. It's just for simplicity that I leave this out for now].

Thanks you!
1 ACCEPTED SOLUTION

Accepted Solutions
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10

At the simplest level, suggest you create a Portfolio variable for identification and divide your "dataset 2" observations equally based on the value of Sales. A DATA step or PROC SQL can perform this task - a macro approach would only benefit if you had a large number of Portfolio splits to generate - so start simple and use the technique involving: IF THEN ; SAS coding.

Then create a SAS file with a subset of key variables needed to compare "dataset 1" to "dataset 2" -- one additional variable is OBSNUM = _N_; to track the source observation.

Then perform your cross-check between "dataset 1" and "dataset 2", using the key-variables to determine where you a relationship in your data.

Lastly, using the results of your cross-check, retrieve the observations from your "dataset 1" and identify your closest_peer within "dataset 2".

Suggest you start programming, one step at a time -- then come back to the forum for suggestions and guidance with any particular problem you might have. You may find PROC SQL to be more comfortable for coding, but do consider that you have more control when using individual DATA step and PROC (SORT or otherwise) invocations, in many cases.

Make use of the SAS support website http://support.sas.com/ for helpful SAS-hosted DOC and also supplemental technical /

conference subject-matter reference materials.

 

Editor's note: in the time since this original post, @Reeza has contributed a useful article with code you can use.


Scott Barry
SBBWorks, Inc.

View solution in original post

5 REPLIES 5
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10

At the simplest level, suggest you create a Portfolio variable for identification and divide your "dataset 2" observations equally based on the value of Sales. A DATA step or PROC SQL can perform this task - a macro approach would only benefit if you had a large number of Portfolio splits to generate - so start simple and use the technique involving: IF THEN ; SAS coding.

Then create a SAS file with a subset of key variables needed to compare "dataset 1" to "dataset 2" -- one additional variable is OBSNUM = _N_; to track the source observation.

Then perform your cross-check between "dataset 1" and "dataset 2", using the key-variables to determine where you a relationship in your data.

Lastly, using the results of your cross-check, retrieve the observations from your "dataset 1" and identify your closest_peer within "dataset 2".

Suggest you start programming, one step at a time -- then come back to the forum for suggestions and guidance with any particular problem you might have. You may find PROC SQL to be more comfortable for coding, but do consider that you have more control when using individual DATA step and PROC (SORT or otherwise) invocations, in many cases.

Make use of the SAS support website http://support.sas.com/ for helpful SAS-hosted DOC and also supplemental technical /

conference subject-matter reference materials.

 

Editor's note: in the time since this original post, @Reeza has contributed a useful article with code you can use.


Scott Barry
SBBWorks, Inc.

deleted_user
Not applicable
Hello Scott,
thanks for your help. Unfortunately, I am not much further. I'm not very clear about how I can use the If...Then... statement to obtain three equally large portfolios.
One thought is that after sorting by Sales, I allocate numbers to my observations:
ex:
Obs Sales
1 100k
2 30k
3 80k
... ....
Then I specify that IF Obs =< n/3 THEN portfolio 1, IF Obs > n/3 and Obs =< 2n/3 THEN portolio 2, else portfolio 3. Is this what you had in mind? But how do I account for rounding issues then?
Also, can you specify in more detail how the cross-check between dataset 1 and dataset 2 is supposed to work? The SAS support website is helpful, but I need to know what function I am looking for.
Thanks,
Tina
deleted_user
Not applicable
Hello Scott,
Thanks for your help. Unfortunately, I am not much further. I'm not very clear about how I can use the If...Then... statement to obtain three equally large portfolios. One thought is that after sorting by Sales, I allocate numbers to my observations:
ex: Obs Sales
1 100k
2 30k
... ....
Then I specify that if Obs smaller/equal n/3 then portfolio 1, if Obs larger n/3 and Obs smaller/equal 2n/3 then portfolio 2, else portfolio 3. Is this what you had in mind? But how do I account for rounding issues then?
Also, can you specify in more detail how the cross-check between dataset 1 and dataset 2 is supposed to work? The SAS support website is helpful, but I need to know what function I am looking for.
Thanks,
Tina
deleted_user
Not applicable
Here is an example of spliting a dataset into 3 different datasets.
There might be a more elegant solution, but I put this together in 5 minutes as an example.

data sampledata;
format sales dollar10.2;
do sales=100 to 154;
output;
end;
run;
data sampledata;
set sampledata;
if first.sales then n=1;
else n+1;
run;
proc sql;
create table obs as
select floor(max(n)/3) as Obs
from sampledata;
quit;
data _null_;
set obs;
call symput('obs',obs);
run;
data sampledata;
set sampledata;
Group = 2;
if n le &obs. then Group = 1;
if n gt (&obs.*2) then Group = 3;
run;
data Group1;
set sampledata (where=(Group=1));
run;
data Group2;
set sampledata (where=(Group=2));
run;
data Group3;
set sampledata (where=(Group=3));
run;
ballardw
Super User
If the dataset is relatively large and multiple passes might take too long the last 4 datasteps in dblan's example can be reduced to:

data group1 (where=(Group=1))
Group2 (where=(Group=2))
Group3 (where=(Group=3))
;
set sampledata;
Group = 2;
if n le &obs. then Group = 1;
if n gt (&obs.*2) then Group = 3;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1897 views
  • 0 likes
  • 3 in conversation