BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tomcmacdonald
Quartz | Level 8

These are my test datasets:

 

data a;
	input id variety $;
	datalines;
1 jazz
2 gala
3 empire
4 rome
5 mcintosh
;
run;

data b;
	input id place $;
	datalines;
1 binghamton
2 westchester
3 scranton
4 utica
5 albany
;
run;

Sorting boilerplate:

 

proc sort data=a; by id; run;
proc sort data=b; by id; run;

I'm trying to do something like this but with the DATA STEP:

 

proc sql;
	create table foo as 
	select *
	from a join b on a.id = b.id 
	where prxmatch('/^a/i', b.place);
quit;

This is my attempt but it throws syntax errors:

 

data foo;
	merge a(in=a) b(in=a);
	by id;
	where prxmatch('/^a/i', b.place);
run;

How would I refer to b.place in the data step? 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@tomcmacdonald wrote:

This is the error message I get when executing your first code snippet:

ERROR: Variable place is not on file WORK.A.

The purpose of the WHERE statement is to limit the data that is read into the data step.  If you use a WHERE statement in a data step it is applied to all of the input dataset.  

If you want to limit the observations that are written out you could just convert to a subsetting IF statement instead or convert the logic to use DELETE or OUTPUT statements to determine which observations are written out.

data want;
  merge A B ;
  by ID ;
  if (....) ;
run;

Or you could use the WHERE= dataset option on the dataset it applies to to limit the records read from that dataset.

data want;
  merge A B(where=(....)) ;
  by ID ;
run;

 

View solution in original post

4 REPLIES 4
ballardw
Super User

In this case since PLACE only exists in one data set there is no need to attempt to specify the set, it is already in the data vector as the variable Place. Just use: where prxmatch('/^a/i', place);

 

Assuming that you actually meant (otherwise both data sets contributed the same in variable)

data foo;
	merge a(in=a) b(in=b);
	by id;
	where prxmatch('/^a/i', b.place);
run;

 

 

MERGE combines the records before the executable steps following the merge are executed. So the concept of a variable belonging to as specific data set is gone.

You can reference records that contributed to the current record with

 

If b then <do something>

 

However to see a real difference between data step merge and sql join use this data b:

data b;
	input id place $ variety $;
	datalines;
1 binghamton  a
2 westchester b
3 scranton    c
4 utica       d
5 albany      e
;
run;

and see what happens in data foo.

 

Then do the same data foo but reverse the order of a and b on the merge statement.

tomcmacdonald
Quartz | Level 8

This is the error message I get when executing your first code snippet:

ERROR: Variable place is not on file WORK.A.
Tom
Super User Tom
Super User

@tomcmacdonald wrote:

This is the error message I get when executing your first code snippet:

ERROR: Variable place is not on file WORK.A.

The purpose of the WHERE statement is to limit the data that is read into the data step.  If you use a WHERE statement in a data step it is applied to all of the input dataset.  

If you want to limit the observations that are written out you could just convert to a subsetting IF statement instead or convert the logic to use DELETE or OUTPUT statements to determine which observations are written out.

data want;
  merge A B ;
  by ID ;
  if (....) ;
run;

Or you could use the WHERE= dataset option on the dataset it applies to to limit the records read from that dataset.

data want;
  merge A B(where=(....)) ;
  by ID ;
run;

 

ballardw
Super User

Please show code executed. I did not intend you to run

data foo;
	merge a(in=a) b(in=b);
	by id;
	where prxmatch('/^a/i', b.place);
run;

because the b.place is an error, that was to point out you had used (in=a) for both set a and b.

 

Executable code would be

data foo;
	merge a b;
	by id;
	where prxmatch('/^a/i',place);
run;

Since the In variables really weren't of use. And assumes the prxmatch is valid.

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 4645 views
  • 0 likes
  • 3 in conversation