DATA Step, Macro, Functions and more

Repeat code for unique values in a column

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Repeat code for unique values in a column

[ Edited ]

Dataset A:

 

data ATISSUE;
  infile datalines dsd truncover;
  input Fund_name:$14. Year:BEST. Class:$2. Sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Hamilton,2005,CW,1,1,CW,1
Hamilton,2006,CW,5,0,NK,0
Hamilton,2007,CW,4,0,NK,0
Hamilton,2008,NK,8,1,NK,1
Hamilton,2009,NK,2,0,GG,1
Hamilton,2010,NK,9,0,GG,0
Hairspray,2005,GG,3,1,HC,0
Hairspray,2006,GG,7,1,HC,0
Hairspray,2007,HC,4,0,NK,0
Hairspray,2008,HC,5,0,NK,0
Hairspray,2009,NK,2,0,HC,0
Hairspray,2010,NK,1,0,HC,1
In the Heights,2005,HC,8,0,GG,1
In the Heights,2006,HC,9,0,NK,1
In the Heights,2007,GG,4,1,CW,0
In the Heights,2008,NK,1,1,NK,0
In the Heights,2009,CW,5,1,NK,1
In the Heights,2010,NK,3,0,NK,0
Les Miserables,2005,NK,1,0,NK,1
Les Miserables,2006,NK,4,1,CW,0
Les Miserables,2007,NK,5,0,CW,1
Les Miserables,2008,CW,6,1,GG,1
Les Miserables,2009,CW,1,0,CW,0
Les Miserables,2010,GG,3,1,CW,0
;;;;

 

Dataset B: 

 

data POTENTIALPEERS;
  infile datalines dsd truncover;
  input Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Rubella,2005,GG,3,1,CW,1
Rubella,2006,NK,7,0,NK,0
Rubella,2007,NK,9,0,NK,0
Rubella,2008,CW,2,1,NK,1
Rubella,2009,CW,5,1,GG,1
Rubella,2010,NK,9,0,GG,0
Smallpox,2005,GG,6,0,HC,1
Smallpox,2006,NK,3,0,HC,0
Smallpox,2007,NK,4,0,NK,0
Smallpox,2008,NK,2,0,NK,0
Smallpox,2009,NK,5,0,HC,0
Smallpox,2010,NK,5,1,HC,1
Mumps,2005,NK,7,1,GG,1
Mumps,2006,CW,2,1,NK,1
Mumps,2007,CW,6,0,CW,0
Mumps,2008,NK,9,0,NK,0
Mumps,2009,GG,7,1,NK,1
Mumps,2010,NK,5,0,NK,0
Measles,2005,NK,3,1,NK,1
Measles,2006,HC,2,0,CW,0
Measles,2007,HC,5,1,CW,0
Measles,2008,HC,6,1,GG,1
Measles,2009,HC,2,0,CW,0
Measles,2010,GG,3,1,CW,0
;;;;

 

 

 

I want to output every row from Dataset B, for a given Fund_name in Dataset A that 

 

1) has a "class" value that appears for any row in a fund_name AND 

2) has the exact year - index_flag - load - FoF flag

 

The following code lets me do this by each Fund_name: 

 

%MACRO findpeers(fundname);

proc sql; 
	create table atissue.&fundname._peers as 
		select a.fund_name, b.*
		from atissue.atissue a
		inner join atissue.potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load AND
			a.fof_flag = b.fof_flag
		where b.class in (select distinct class from atissue.atissue where a.fund_name = "&fundname");
quit;

%MEND findpeers;

%findpeers (Hamilton);
%findpeers (Hairspray);
%findpeers (In the Heights);
%findpeers (Les Miserables);

 

Questions: 

 

1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that? 

 

2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?

 

3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically. 

 

Thank you! 


Accepted Solutions
Solution
‎02-19-2017 07:19 PM
Super User
Posts: 17,818

Re: Repeat code for unique values in a column

@christinakwang Not sure if you still need a solution, but I think this works:

 

proc sql; 
	create table want as 
		select a.fund_name, b.*
		from atissue a
		inner join potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load 
		inner join (select distinct z.fund_name, z.class from atissue z) z_x on
		z_x.fund_name=a.fund_name and
		z_x.class=b.class;
quit;

View solution in original post


All Replies
Super User
Posts: 17,818

Re: Repeat code for unique values in a column


christinakwang wrote:


Questions: 

 

1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that? 

 

 

Yes the spaces are an issue because of how you create the table name: &fund_name._peers.  Now replace &fund_name with your parameter -> In the heights_peers -> this is not a valid data set name. You need to either remove spaces or come up with a different naming convention.

2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?

Do a join based on the value and don't use a macro. There is no need for a macro here.

 

3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically. 

See #2 and then you won't have this issue in the first place.

 

If you post data that we can code someone can help provide working code. We can't work off pictures and I'm not typing out your data Smiley Happy.

 

 

Occasional Contributor
Posts: 16

Re: Repeat code for unique values in a column

Thanks! I've added the data to this post! Digesting the rest of your reply... 

 

Can we add more than one file? Smiley Surprised 

Super User
Posts: 10,497

Re: Repeat code for unique values in a column

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

You can make data step code from an existing SAS data set using the instructions found here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Subset your existing data to enough observations to exercise your requirement and only include the variables you need. Use the instructions and either paste the code into a code box in the forum opened with the {i} icon or attach as a text file.

Occasional Contributor
Posts: 16

Re: Repeat code for unique values in a column

Thanks! I added what I think is the correctly-formatted data in the original post Smiley Very Happy. Going to try to update my code... but not sure how to do this without a macro. ...

Occasional Contributor
Posts: 16

Re: Repeat code for unique values in a column

[ Edited ]

Still not sure how I can use a sql join? 

 

proc sql; 
	create table atissue.peers as 
		select a.fund_name, b.*
		from atissue.atissue a
		inner join atissue.potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load AND
			a.fof_flag = b.fof_flag
		where b.class in (select distinct class from atissue.atissue);
		*Used to have (select distinct class from at.issue.atissue where fund_name = "XXXX"
		Not sure how we can make the code go through all XXX *separately* i.e. assemble all the matches
		for Hamilton only. and then Hairspray. And then In the Heights. and then Les Miserables, etc.;	
quit;

 For example, the above code is wrong because I end up getting this: 

 

data ATISSUE.PEERS;
  infile datalines dsd truncover;
  input Fund_name:$14. Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Hamilton,Rubella,2005,GG,3,1,CW,1
Hamilton,Rubella,2006,NK,7,0,NK,0
Hamilton,Rubella,2007,NK,9,0,NK,0
Hamilton,Smallpox,2007,NK,4,0,NK,0
Hamilton,Rubella,2008,CW,2,1,NK,1
Hamilton,Rubella,2010,NK,9,0,GG,0
Hairspray,Rubella,2007,NK,9,0,NK,0
Hairspray,Smallpox,2007,NK,4,0,NK,0
Hairspray,Smallpox,2008,NK,2,0,NK,0
Hairspray,Mumps,2008,NK,9,0,NK,0
Hairspray,Smallpox,2009,NK,5,0,HC,0
In the Heights,Measles,2007,HC,5,1,CW,0
In the Heights,Mumps,2009,GG,7,1,NK,1
In the Heights,Mumps,2010,NK,5,0,NK,0
Les Miserables,Measles,2008,HC,6,1,GG,1
Les Miserables,Measles,2009,HC,2,0,CW,0
Les Miserables,Measles,2010,GG,3,1,CW,0
;;;;

But the first line (Hamilton, Rubella, 2005, GG, 3, 1, CW, 1), should not have been outputted because "GG" class is not in any of the Hamilton years. 

Occasional Contributor
Posts: 16

Re: Repeat code for unique values in a column

Still not sure if I'm getting a solution... Smiley Sad 

Solution
‎02-19-2017 07:19 PM
Super User
Posts: 17,818

Re: Repeat code for unique values in a column

@christinakwang Not sure if you still need a solution, but I think this works:

 

proc sql; 
	create table want as 
		select a.fund_name, b.*
		from atissue a
		inner join potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load 
		inner join (select distinct z.fund_name, z.class from atissue z) z_x on
		z_x.fund_name=a.fund_name and
		z_x.class=b.class;
quit;
Occasional Contributor
Posts: 16

Re: Repeat code for unique values in a column

Thanks Reeza!! This works!!!!!!!!!! Smiley Happy 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 225 views
  • 2 likes
  • 3 in conversation