These are my test datasets:
data a;
input id variety $;
datalines;
1 jazz
2 gala
3 empire
4 rome
5 mcintosh
;
run;
data b;
input id place $;
datalines;
1 binghamton
2 westchester
3 scranton
4 utica
5 albany
;
run;
Sorting boilerplate:
proc sort data=a; by id; run;
proc sort data=b; by id; run;
I'm trying to do something like this but with the DATA STEP:
proc sql;
create table foo as
select *
from a join b on a.id = b.id
where prxmatch('/^a/i', b.place);
quit;
This is my attempt but it throws syntax errors:
data foo;
merge a(in=a) b(in=a);
by id;
where prxmatch('/^a/i', b.place);
run;
How would I refer to b.place in the data step?
@tomcmacdonald wrote:
This is the error message I get when executing your first code snippet:
ERROR: Variable place is not on file WORK.A.
The purpose of the WHERE statement is to limit the data that is read into the data step. If you use a WHERE statement in a data step it is applied to all of the input dataset.
If you want to limit the observations that are written out you could just convert to a subsetting IF statement instead or convert the logic to use DELETE or OUTPUT statements to determine which observations are written out.
data want;
merge A B ;
by ID ;
if (....) ;
run;
Or you could use the WHERE= dataset option on the dataset it applies to to limit the records read from that dataset.
data want;
merge A B(where=(....)) ;
by ID ;
run;
In this case since PLACE only exists in one data set there is no need to attempt to specify the set, it is already in the data vector as the variable Place. Just use: where prxmatch('/^a/i', place);
Assuming that you actually meant (otherwise both data sets contributed the same in variable)
data foo; merge a(in=a) b(in=b); by id; where prxmatch('/^a/i', b.place); run;
MERGE combines the records before the executable steps following the merge are executed. So the concept of a variable belonging to as specific data set is gone.
You can reference records that contributed to the current record with
If b then <do something>
However to see a real difference between data step merge and sql join use this data b:
data b; input id place $ variety $; datalines; 1 binghamton a 2 westchester b 3 scranton c 4 utica d 5 albany e ; run;
and see what happens in data foo.
Then do the same data foo but reverse the order of a and b on the merge statement.
This is the error message I get when executing your first code snippet:
ERROR: Variable place is not on file WORK.A.
@tomcmacdonald wrote:
This is the error message I get when executing your first code snippet:
ERROR: Variable place is not on file WORK.A.
The purpose of the WHERE statement is to limit the data that is read into the data step. If you use a WHERE statement in a data step it is applied to all of the input dataset.
If you want to limit the observations that are written out you could just convert to a subsetting IF statement instead or convert the logic to use DELETE or OUTPUT statements to determine which observations are written out.
data want;
merge A B ;
by ID ;
if (....) ;
run;
Or you could use the WHERE= dataset option on the dataset it applies to to limit the records read from that dataset.
data want;
merge A B(where=(....)) ;
by ID ;
run;
Please show code executed. I did not intend you to run
data foo; merge a(in=a) b(in=b); by id; where prxmatch('/^a/i', b.place); run;
because the b.place is an error, that was to point out you had used (in=a) for both set a and b.
Executable code would be
data foo; merge a b; by id; where prxmatch('/^a/i',place); run;
Since the In variables really weren't of use. And assumes the prxmatch is valid.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.