Solved: Repeat code for unique values in a column

christinakwang · Posted 01-31-2017 05:20 PM

Dataset A:

data ATISSUE;
  infile datalines dsd truncover;
  input Fund_name:$14. Year:BEST. Class:$2. Sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Hamilton,2005,CW,1,1,CW,1
Hamilton,2006,CW,5,0,NK,0
Hamilton,2007,CW,4,0,NK,0
Hamilton,2008,NK,8,1,NK,1
Hamilton,2009,NK,2,0,GG,1
Hamilton,2010,NK,9,0,GG,0
Hairspray,2005,GG,3,1,HC,0
Hairspray,2006,GG,7,1,HC,0
Hairspray,2007,HC,4,0,NK,0
Hairspray,2008,HC,5,0,NK,0
Hairspray,2009,NK,2,0,HC,0
Hairspray,2010,NK,1,0,HC,1
In the Heights,2005,HC,8,0,GG,1
In the Heights,2006,HC,9,0,NK,1
In the Heights,2007,GG,4,1,CW,0
In the Heights,2008,NK,1,1,NK,0
In the Heights,2009,CW,5,1,NK,1
In the Heights,2010,NK,3,0,NK,0
Les Miserables,2005,NK,1,0,NK,1
Les Miserables,2006,NK,4,1,CW,0
Les Miserables,2007,NK,5,0,CW,1
Les Miserables,2008,CW,6,1,GG,1
Les Miserables,2009,CW,1,0,CW,0
Les Miserables,2010,GG,3,1,CW,0
;;;;

Dataset B:

data POTENTIALPEERS;
  infile datalines dsd truncover;
  input Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Rubella,2005,GG,3,1,CW,1
Rubella,2006,NK,7,0,NK,0
Rubella,2007,NK,9,0,NK,0
Rubella,2008,CW,2,1,NK,1
Rubella,2009,CW,5,1,GG,1
Rubella,2010,NK,9,0,GG,0
Smallpox,2005,GG,6,0,HC,1
Smallpox,2006,NK,3,0,HC,0
Smallpox,2007,NK,4,0,NK,0
Smallpox,2008,NK,2,0,NK,0
Smallpox,2009,NK,5,0,HC,0
Smallpox,2010,NK,5,1,HC,1
Mumps,2005,NK,7,1,GG,1
Mumps,2006,CW,2,1,NK,1
Mumps,2007,CW,6,0,CW,0
Mumps,2008,NK,9,0,NK,0
Mumps,2009,GG,7,1,NK,1
Mumps,2010,NK,5,0,NK,0
Measles,2005,NK,3,1,NK,1
Measles,2006,HC,2,0,CW,0
Measles,2007,HC,5,1,CW,0
Measles,2008,HC,6,1,GG,1
Measles,2009,HC,2,0,CW,0
Measles,2010,GG,3,1,CW,0
;;;;

I want to output every row from Dataset B, for a given Fund_name in Dataset A that

1) has a "class" value that appears for any row in a fund_name AND

2) has the exact year - index_flag - load - FoF flag

The following code lets me do this by each Fund_name:

%MACRO findpeers(fundname);

proc sql; 
	create table atissue.&fundname._peers as 
		select a.fund_name, b.*
		from atissue.atissue a
		inner join atissue.potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load AND
			a.fof_flag = b.fof_flag
		where b.class in (select distinct class from atissue.atissue where a.fund_name = "&fundname");
quit;

%MEND findpeers;

%findpeers (Hamilton);
%findpeers (Hairspray);
%findpeers (In the Heights);
%findpeers (Les Miserables);

Questions:

1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that?

2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?

3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically.

Thank you!

Reeza · Posted 02-11-2017 11:04 PM

@christinakwang Not sure if you still need a solution, but I think this works:

proc sql; 
	create table want as 
		select a.fund_name, b.*
		from atissue a
		inner join potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load 
		inner join (select distinct z.fund_name, z.class from atissue z) z_x on
		z_x.fund_name=a.fund_name and
		z_x.class=b.class;
quit;

View solution in original post

Reeza · Posted 01-31-2017 05:57 PM

@christinakwang wrote:

Questions:

1) The above code seem to work for Hamilton and Hairspray, but why not In the Heights and Les Miserables? Is that because there are spaces? If so, how can I get around that?

Yes the spaces are an issue because of how you create the table name: &fund_name._peers. Now replace &fund_name with your parameter -> In the heights_peers -> this is not a valid data set name. You need to either remove spaces or come up with a different naming convention.

2) How can I simplify the last four lines such that, instead of doing %findpeers four times, I can "do over unique values in the fund_name column?

Do a join based on the value and don't use a macro. There is no need for a macro here.

3) Do I have to then stack all these datasets from each fund_name in a separate step? I want to stack them vertically.

See #2 and then you won't have this issue in the first place.

If you post data that we can code someone can help provide working code. We can't work off pictures and I'm not typing out your data :).

christinakwang · Posted 01-31-2017 06:04 PM

Thanks! I've added the data to this post! Digesting the rest of your reply...

Can we add more than one file? 😮

ballardw · Posted 01-31-2017 06:20 PM

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

You can make data step code from an existing SAS data set using the instructions found here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Subset your existing data to enough observations to exercise your requirement and only include the variables you need. Use the instructions and either paste the code into a code box in the forum opened with the {i} icon or attach as a text file.

christinakwang · Posted 01-31-2017 06:33 PM

Thanks! I added what I think is the correctly-formatted data in the original post :D. Going to try to update my code... but not sure how to do this without a macro. ...

christinakwang · Posted 02-01-2017 01:27 PM

Still not sure how I can use a sql join?

proc sql; 
	create table atissue.peers as 
		select a.fund_name, b.*
		from atissue.atissue a
		inner join atissue.potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load AND
			a.fof_flag = b.fof_flag
		where b.class in (select distinct class from atissue.atissue);
		*Used to have (select distinct class from at.issue.atissue where fund_name = "XXXX"
		Not sure how we can make the code go through all XXX *separately* i.e. assemble all the matches
		for Hamilton only. and then Hairspray. And then In the Heights. and then Les Miserables, etc.;	
quit;

For example, the above code is wrong because I end up getting this:

data ATISSUE.PEERS;
  infile datalines dsd truncover;
  input Fund_name:$14. Peer_name:$8. Year:BEST. Class:$2. Peer_sales:BEST. Index_Flag:BEST. Load:$2. FoF_Flag:BEST.;
datalines4;
Hamilton,Rubella,2005,GG,3,1,CW,1
Hamilton,Rubella,2006,NK,7,0,NK,0
Hamilton,Rubella,2007,NK,9,0,NK,0
Hamilton,Smallpox,2007,NK,4,0,NK,0
Hamilton,Rubella,2008,CW,2,1,NK,1
Hamilton,Rubella,2010,NK,9,0,GG,0
Hairspray,Rubella,2007,NK,9,0,NK,0
Hairspray,Smallpox,2007,NK,4,0,NK,0
Hairspray,Smallpox,2008,NK,2,0,NK,0
Hairspray,Mumps,2008,NK,9,0,NK,0
Hairspray,Smallpox,2009,NK,5,0,HC,0
In the Heights,Measles,2007,HC,5,1,CW,0
In the Heights,Mumps,2009,GG,7,1,NK,1
In the Heights,Mumps,2010,NK,5,0,NK,0
Les Miserables,Measles,2008,HC,6,1,GG,1
Les Miserables,Measles,2009,HC,2,0,CW,0
Les Miserables,Measles,2010,GG,3,1,CW,0
;;;;

But the first line (Hamilton, Rubella, 2005, GG, 3, 1, CW, 1), should not have been outputted because "GG" class is not in any of the Hamilton years.

christinakwang · Posted 02-06-2017 02:43 PM

Still not sure if I'm getting a solution... 😞

Reeza · Posted 02-11-2017 11:04 PM

@christinakwang Not sure if you still need a solution, but I think this works:

proc sql; 
	create table want as 
		select a.fund_name, b.*
		from atissue a
		inner join potentialpeers b
		on a.year = b.year AND
			a.index_flag = b.index_flag AND 
			a.load = b.load 
		inner join (select distinct z.fund_name, z.class from atissue z) z_x on
		z_x.fund_name=a.fund_name and
		z_x.class=b.class;
quit;

christinakwang · Posted 02-19-2017 07:19 PM

Thanks Reeza!! This works!!!!!!!!!! 🙂

Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

Re: Repeat code for unique values in a column

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away