DATA Step, Macro, Functions and more

How to refer to columns in a particular dataset in a DATA STEP merge?

Accepted Solution Solved
Reply
Contributor
Posts: 71
Accepted Solution

How to refer to columns in a particular dataset in a DATA STEP merge?

These are my test datasets:

 

data a;
	input id variety $;
	datalines;
1 jazz
2 gala
3 empire
4 rome
5 mcintosh
;
run;

data b;
	input id place $;
	datalines;
1 binghamton
2 westchester
3 scranton
4 utica
5 albany
;
run;

Sorting boilerplate:

 

proc sort data=a; by id; run;
proc sort data=b; by id; run;

I'm trying to do something like this but with the DATA STEP:

 

proc sql;
	create table foo as 
	select *
	from a join b on a.id = b.id 
	where prxmatch('/^a/i', b.place);
quit;

This is my attempt but it throws syntax errors:

 

data foo;
	merge a(in=a) b(in=a);
	by id;
	where prxmatch('/^a/i', b.place);
run;

How would I refer to b.place in the data step? 


Accepted Solutions
Solution
‎04-04-2018 03:14 PM
Super User
Super User
Posts: 7,860

Re: How to refer to columns in a particular dataset in a DATA STEP merge?

[ Edited ]
Posted in reply to tomcmacdonald

@tomcmacdonald wrote:

This is the error message I get when executing your first code snippet:

ERROR: Variable place is not on file WORK.A.

The purpose of the WHERE statement is to limit the data that is read into the data step.  If you use a WHERE statement in a data step it is applied to all of the input dataset.  

If you want to limit the observations that are written out you could just convert to a subsetting IF statement instead or convert the logic to use DELETE or OUTPUT statements to determine which observations are written out.

data want;
  merge A B ;
  by ID ;
  if (....) ;
run;

Or you could use the WHERE= dataset option on the dataset it applies to to limit the records read from that dataset.

data want;
  merge A B(where=(....)) ;
  by ID ;
run;

 

View solution in original post


All Replies
Super User
Posts: 13,084

Re: How to refer to columns in a particular dataset in a DATA STEP merge?

Posted in reply to tomcmacdonald

In this case since PLACE only exists in one data set there is no need to attempt to specify the set, it is already in the data vector as the variable Place. Just use: where prxmatch('/^a/i', place);

 

Assuming that you actually meant (otherwise both data sets contributed the same in variable)

data foo;
	merge a(in=a) b(in=b);
	by id;
	where prxmatch('/^a/i', b.place);
run;

 

 

MERGE combines the records before the executable steps following the merge are executed. So the concept of a variable belonging to as specific data set is gone.

You can reference records that contributed to the current record with

 

If b then <do something>

 

However to see a real difference between data step merge and sql join use this data b:

data b;
	input id place $ variety $;
	datalines;
1 binghamton  a
2 westchester b
3 scranton    c
4 utica       d
5 albany      e
;
run;

and see what happens in data foo.

 

Then do the same data foo but reverse the order of a and b on the merge statement.

Contributor
Posts: 71

Re: How to refer to columns in a particular dataset in a DATA STEP merge?

[ Edited ]

This is the error message I get when executing your first code snippet:

ERROR: Variable place is not on file WORK.A.
Solution
‎04-04-2018 03:14 PM
Super User
Super User
Posts: 7,860

Re: How to refer to columns in a particular dataset in a DATA STEP merge?

[ Edited ]
Posted in reply to tomcmacdonald

@tomcmacdonald wrote:

This is the error message I get when executing your first code snippet:

ERROR: Variable place is not on file WORK.A.

The purpose of the WHERE statement is to limit the data that is read into the data step.  If you use a WHERE statement in a data step it is applied to all of the input dataset.  

If you want to limit the observations that are written out you could just convert to a subsetting IF statement instead or convert the logic to use DELETE or OUTPUT statements to determine which observations are written out.

data want;
  merge A B ;
  by ID ;
  if (....) ;
run;

Or you could use the WHERE= dataset option on the dataset it applies to to limit the records read from that dataset.

data want;
  merge A B(where=(....)) ;
  by ID ;
run;

 

Super User
Posts: 13,084

Re: How to refer to columns in a particular dataset in a DATA STEP merge?

Posted in reply to tomcmacdonald

Please show code executed. I did not intend you to run

data foo;
	merge a(in=a) b(in=b);
	by id;
	where prxmatch('/^a/i', b.place);
run;

because the b.place is an error, that was to point out you had used (in=a) for both set a and b.

 

Executable code would be

data foo;
	merge a b;
	by id;
	where prxmatch('/^a/i',place);
run;

Since the In variables really weren't of use. And assumes the prxmatch is valid.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 122 views
  • 0 likes
  • 3 in conversation