BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
christinakwang
Obsidian | Level 7

Dataset A:

 

data ATISSUE;
  infile datalines dsd truncover;
  input Fund_name:$14. Year:BEST. Class:$2. Sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Hamilton,2005,CW,1,1,CW,1
Hamilton,2006,CW,5,0,NK,0
Hamilton,2007,CW,4,0,NK,0
Hamilton,2008,NK,8,1,NK,1
Hamilton,2009,NK,2,0,GG,1
Hamilton,2010,NK,9,0,GG,0
Hairspray,2005,GG,3,1,HC,0
Hairspray,2006,GG,7,1,HC,0
Hairspray,2007,HC,4,0,NK,0
Hairspray,2008,HC,5,0,NK,0
Hairspray,2009,NK,2,0,HC,0
Hairspray,2010,NK,1,0,HC,1
In the Heights,2005,HC,8,0,GG,1
In the Heights,2006,HC,9,0,NK,1
In the Heights,2007,GG,4,1,CW,0
In the Heights,2008,NK,1,1,NK,0
In the Heights,2009,CW,5,1,NK,1
In the Heights,2010,NK,3,0,NK,0
Les Miserables,2005,NK,1,0,NK,1
Les Miserables,2006,NK,4,1,CW,0
Les Miserables,2007,NK,5,0,CW,1
Les Miserables,2008,CW,6,1,GG,1
Les Miserables,2009,CW,1,0,CW,0
Les Miserables,2010,GG,3,1,CW,0
;;;;

 

Dataset B: 

 

data POTENTIALPEERS;
  infile datalines dsd truncover;
  input Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Rubella,2005,GG,3,1,CW,1
Rubella,2006,NK,7,0,NK,0
Rubella,2007,NK,9,0,NK,0
Rubella,2008,CW,2,1,NK,1
Rubella,2009,CW,5,1,GG,1
Rubella,2010,NK,9,0,GG,0
Smallpox,2005,GG,6,0,HC,1
Smallpox,2006,NK,3,0,HC,0
Smallpox,2007,NK,4,0,NK,0
Smallpox,2008,NK,2,0,NK,0
Smallpox,2009,NK,5,0,HC,0
Smallpox,2010,NK,5,1,HC,1
Mumps,2005,NK,7,1,GG,1
Mumps,2006,CW,2,1,NK,1
Mumps,2007,CW,6,0,CW,0
Mumps,2008,NK,9,0,NK,0
Mumps,2009,GG,7,1,NK,1
Mumps,2010,NK,5,0,NK,0
Measles,2005,NK,3,1,NK,1
Measles,2006,HC,2,0,CW,0
Measles,2007,HC,5,1,CW,0
Measles,2008,HC,6,1,GG,1
Measles,2009,HC,2,0,CW,0
Measles,2010,GG,3,1,CW,0
;;;;

 

 

 

I want to output every row from Dataset B, for a given Fund_name in Dataset A that 

 

1) has a "class" value that appears for any row in a fund_name AND 

2) has the exact year - index_flag - load - FoF flag

 

The following code lets me do this by each Fund_name: 

 

%MACRO findpeers(fundname);

proc sql; 
	create table atissue.&fundname._peers as 
		select a.fund_name, b.*
		from atissue.atissue a
		inner join atissue.potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load AND
			a.fof_flag = b.fof_flag
		where b.class in (select distinct class from atissue.atissue where a.fund_name = "&fundname");
quit;

%MEND findpeers;

%findpeers (Hamilton);
%findpeers (Hairspray);
%findpeers (In the Heights);
%findpeers (Les Miserables);

 

Questions: 

 

1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that? 

 

2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?

 

3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically. 

 

Thank you! 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

@christinakwang Not sure if you still need a solution, but I think this works:

 

proc sql; 
	create table want as 
		select a.fund_name, b.*
		from atissue a
		inner join potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load 
		inner join (select distinct z.fund_name, z.class from atissue z) z_x on
		z_x.fund_name=a.fund_name and
		z_x.class=b.class;
quit;

View solution in original post

8 REPLIES 8
Reeza
Super User

@christinakwang wrote:


Questions: 

 

1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that? 

 

 

Yes the spaces are an issue because of how you create the table name: &fund_name._peers.  Now replace &fund_name with your parameter -> In the heights_peers -> this is not a valid data set name. You need to either remove spaces or come up with a different naming convention.

2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?

Do a join based on the value and don't use a macro. There is no need for a macro here.

 

3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically. 

See #2 and then you won't have this issue in the first place.

 

If you post data that we can code someone can help provide working code. We can't work off pictures and I'm not typing out your data :).

 

 

christinakwang
Obsidian | Level 7

Thanks! I've added the data to this post! Digesting the rest of your reply... 

 

Can we add more than one file? 😮 

ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

You can make data step code from an existing SAS data set using the instructions found here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Subset your existing data to enough observations to exercise your requirement and only include the variables you need. Use the instructions and either paste the code into a code box in the forum opened with the {i} icon or attach as a text file.

christinakwang
Obsidian | Level 7

Thanks! I added what I think is the correctly-formatted data in the original post :D. Going to try to update my code... but not sure how to do this without a macro. ...

christinakwang
Obsidian | Level 7

Still not sure how I can use a sql join? 

 

proc sql; 
	create table atissue.peers as 
		select a.fund_name, b.*
		from atissue.atissue a
		inner join atissue.potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load AND
			a.fof_flag = b.fof_flag
		where b.class in (select distinct class from atissue.atissue);
		*Used to have (select distinct class from at.issue.atissue where fund_name = "XXXX"
		Not sure how we can make the code go through all XXX *separately* i.e. assemble all the matches
		for Hamilton only. and then Hairspray. And then In the Heights. and then Les Miserables, etc.;	
quit;

 For example, the above code is wrong because I end up getting this: 

 

data ATISSUE.PEERS;
  infile datalines dsd truncover;
  input Fund_name:$14. Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Hamilton,Rubella,2005,GG,3,1,CW,1
Hamilton,Rubella,2006,NK,7,0,NK,0
Hamilton,Rubella,2007,NK,9,0,NK,0
Hamilton,Smallpox,2007,NK,4,0,NK,0
Hamilton,Rubella,2008,CW,2,1,NK,1
Hamilton,Rubella,2010,NK,9,0,GG,0
Hairspray,Rubella,2007,NK,9,0,NK,0
Hairspray,Smallpox,2007,NK,4,0,NK,0
Hairspray,Smallpox,2008,NK,2,0,NK,0
Hairspray,Mumps,2008,NK,9,0,NK,0
Hairspray,Smallpox,2009,NK,5,0,HC,0
In the Heights,Measles,2007,HC,5,1,CW,0
In the Heights,Mumps,2009,GG,7,1,NK,1
In the Heights,Mumps,2010,NK,5,0,NK,0
Les Miserables,Measles,2008,HC,6,1,GG,1
Les Miserables,Measles,2009,HC,2,0,CW,0
Les Miserables,Measles,2010,GG,3,1,CW,0
;;;;

But the first line (Hamilton, Rubella, 2005, GG, 3, 1, CW, 1), should not have been outputted because "GG" class is not in any of the Hamilton years. 

christinakwang
Obsidian | Level 7

Still not sure if I'm getting a solution... 😞 

Reeza
Super User

@christinakwang Not sure if you still need a solution, but I think this works:

 

proc sql; 
	create table want as 
		select a.fund_name, b.*
		from atissue a
		inner join potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load 
		inner join (select distinct z.fund_name, z.class from atissue z) z_x on
		z_x.fund_name=a.fund_name and
		z_x.class=b.class;
quit;
christinakwang
Obsidian | Level 7

Thanks Reeza!! This works!!!!!!!!!! 🙂 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2170 views
  • 2 likes
  • 3 in conversation