Dataset A:
data ATISSUE; infile datalines dsd truncover; input Fund_name:$14. Year:BEST. Class:$2. Sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.; datalines4; Hamilton,2005,CW,1,1,CW,1 Hamilton,2006,CW,5,0,NK,0 Hamilton,2007,CW,4,0,NK,0 Hamilton,2008,NK,8,1,NK,1 Hamilton,2009,NK,2,0,GG,1 Hamilton,2010,NK,9,0,GG,0 Hairspray,2005,GG,3,1,HC,0 Hairspray,2006,GG,7,1,HC,0 Hairspray,2007,HC,4,0,NK,0 Hairspray,2008,HC,5,0,NK,0 Hairspray,2009,NK,2,0,HC,0 Hairspray,2010,NK,1,0,HC,1 In the Heights,2005,HC,8,0,GG,1 In the Heights,2006,HC,9,0,NK,1 In the Heights,2007,GG,4,1,CW,0 In the Heights,2008,NK,1,1,NK,0 In the Heights,2009,CW,5,1,NK,1 In the Heights,2010,NK,3,0,NK,0 Les Miserables,2005,NK,1,0,NK,1 Les Miserables,2006,NK,4,1,CW,0 Les Miserables,2007,NK,5,0,CW,1 Les Miserables,2008,CW,6,1,GG,1 Les Miserables,2009,CW,1,0,CW,0 Les Miserables,2010,GG,3,1,CW,0 ;;;;
Dataset B:
data POTENTIALPEERS; infile datalines dsd truncover; input Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.; datalines4; Rubella,2005,GG,3,1,CW,1 Rubella,2006,NK,7,0,NK,0 Rubella,2007,NK,9,0,NK,0 Rubella,2008,CW,2,1,NK,1 Rubella,2009,CW,5,1,GG,1 Rubella,2010,NK,9,0,GG,0 Smallpox,2005,GG,6,0,HC,1 Smallpox,2006,NK,3,0,HC,0 Smallpox,2007,NK,4,0,NK,0 Smallpox,2008,NK,2,0,NK,0 Smallpox,2009,NK,5,0,HC,0 Smallpox,2010,NK,5,1,HC,1 Mumps,2005,NK,7,1,GG,1 Mumps,2006,CW,2,1,NK,1 Mumps,2007,CW,6,0,CW,0 Mumps,2008,NK,9,0,NK,0 Mumps,2009,GG,7,1,NK,1 Mumps,2010,NK,5,0,NK,0 Measles,2005,NK,3,1,NK,1 Measles,2006,HC,2,0,CW,0 Measles,2007,HC,5,1,CW,0 Measles,2008,HC,6,1,GG,1 Measles,2009,HC,2,0,CW,0 Measles,2010,GG,3,1,CW,0 ;;;;
I want to output every row from Dataset B, for a given Fund_name in Dataset A that
1) has a "class" value that appears for any row in a fund_name AND
2) has the exact year - index_flag - load - FoF flag
The following code lets me do this by each Fund_name:
%MACRO findpeers(fundname); proc sql; create table atissue.&fundname._peers as select a.fund_name, b.* from atissue.atissue a inner join atissue.potentialpeers b on a.year = b.year AND a.index_flag = b.index_flag AND a.load = b.load AND a.fof_flag = b.fof_flag where b.class in (select distinct class from atissue.atissue where a.fund_name = "&fundname"); quit; %MEND findpeers; %findpeers (Hamilton); %findpeers (Hairspray); %findpeers (In the Heights); %findpeers (Les Miserables);
Questions:
1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that?
2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?
3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically.
Thank you!
@christinakwang Not sure if you still need a solution, but I think this works:
proc sql;
create table want as
select a.fund_name, b.*
from atissue a
inner join potentialpeers b
on a.year = b.year AND
a.index_flag = b.index_flag AND
a.load = b.load
inner join (select distinct z.fund_name, z.class from atissue z) z_x on
z_x.fund_name=a.fund_name and
z_x.class=b.class;
quit;
@christinakwang wrote:
Questions:
1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that?
Yes the spaces are an issue because of how you create the table name: &fund_name._peers. Now replace &fund_name with your parameter -> In the heights_peers -> this is not a valid data set name. You need to either remove spaces or come up with a different naming convention.
2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?
Do a join based on the value and don't use a macro. There is no need for a macro here.
3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically.
See #2 and then you won't have this issue in the first place.
If you post data that we can code someone can help provide working code. We can't work off pictures and I'm not typing out your data :).
Thanks! I've added the data to this post! Digesting the rest of your reply...
Can we add more than one file? 😮
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
You can make data step code from an existing SAS data set using the instructions found here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Subset your existing data to enough observations to exercise your requirement and only include the variables you need. Use the instructions and either paste the code into a code box in the forum opened with the {i} icon or attach as a text file.
Thanks! I added what I think is the correctly-formatted data in the original post :D. Going to try to update my code... but not sure how to do this without a macro. ...
Still not sure how I can use a sql join?
proc sql;
create table atissue.peers as
select a.fund_name, b.*
from atissue.atissue a
inner join atissue.potentialpeers b
on a.year = b.year AND
a.index_flag = b.index_flag AND
a.load = b.load AND
a.fof_flag = b.fof_flag
where b.class in (select distinct class from atissue.atissue);
*Used to have (select distinct class from at.issue.atissue where fund_name = "XXXX"
Not sure how we can make the code go through all XXX *separately* i.e. assemble all the matches
for Hamilton only. and then Hairspray. And then In the Heights. and then Les Miserables, etc.;
quit;
For example, the above code is wrong because I end up getting this:
data ATISSUE.PEERS;
infile datalines dsd truncover;
input Fund_name:$14. Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Hamilton,Rubella,2005,GG,3,1,CW,1
Hamilton,Rubella,2006,NK,7,0,NK,0
Hamilton,Rubella,2007,NK,9,0,NK,0
Hamilton,Smallpox,2007,NK,4,0,NK,0
Hamilton,Rubella,2008,CW,2,1,NK,1
Hamilton,Rubella,2010,NK,9,0,GG,0
Hairspray,Rubella,2007,NK,9,0,NK,0
Hairspray,Smallpox,2007,NK,4,0,NK,0
Hairspray,Smallpox,2008,NK,2,0,NK,0
Hairspray,Mumps,2008,NK,9,0,NK,0
Hairspray,Smallpox,2009,NK,5,0,HC,0
In the Heights,Measles,2007,HC,5,1,CW,0
In the Heights,Mumps,2009,GG,7,1,NK,1
In the Heights,Mumps,2010,NK,5,0,NK,0
Les Miserables,Measles,2008,HC,6,1,GG,1
Les Miserables,Measles,2009,HC,2,0,CW,0
Les Miserables,Measles,2010,GG,3,1,CW,0
;;;;
But the first line (Hamilton, Rubella, 2005, GG, 3, 1, CW, 1), should not have been outputted because "GG" class is not in any of the Hamilton years.
Still not sure if I'm getting a solution... 😞
@christinakwang Not sure if you still need a solution, but I think this works:
proc sql;
create table want as
select a.fund_name, b.*
from atissue a
inner join potentialpeers b
on a.year = b.year AND
a.index_flag = b.index_flag AND
a.load = b.load
inner join (select distinct z.fund_name, z.class from atissue z) z_x on
z_x.fund_name=a.fund_name and
z_x.class=b.class;
quit;
Thanks Reeza!! This works!!!!!!!!!! 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.