Assume I've two datasets Have1 and Have2. In Have1 dataset I've two variables ERR_MSG_TXT and FLT_ID. In Have2 dataset I've two variables called RUN_ID and STAT_CD.
Now I want to create a new dataset WANT which should have a variables called RUN_ID and FLT_ID.So in the new dataset WANT, I've to look for ERR_MSG_TXT in dataset Have1 and STAT_CD variable in dataset Have2. If it matches, then I've to get the respective variables RUN_ID and FLT_ID from datasets Have1 and Have2. I'm not certain which method (look up or join?) works better here. Appericiate if someone of you help me with the code to accomplish this Task.
Dataset Have1:
FLT_ID | ERR_MSG_TXT |
148 | Syntax error |
149 | Not a valid procedure |
150 | Warning |
Dataset Have2:
RUN_ID | STAT_CD |
1 | Syntax error |
2 | Warning |
3 | Not a valid procedure |
Desired Output: Dataset WANT
RUN_ID | FLT_ID |
1 | 148 |
2 | 150 |
3 | 149 |
@David_Billa hi and welcome to the SAS Community 🙂
There are multiple ways to do this, but here is one
data have1;
input FLT_ID ERR_MSG_TXT $50.;
datalines;
148 Syntax error
149 Not a valid procedure
150 Warning
;
data have2;
input RUN_ID STAT_CD $50.;
datalines;
1 Syntax error
2 Warning
3 Not a valid procedure
;
data want(keep=RUN_ID FLT_ID);
if _N_=1 then do;
declare hash h(dataset:'have1(rename=(ERR_MSG_TXT=STAT_CD))');
h.definekey('STAT_CD');
h.definedata('FLT_ID');
h.definedone();
end;
set have2;
FLT_ID=.;
rc=h.find();
run;
Great! May I know why you've assigned missing value for FLT_ID?
Also could you please tell me about the method using join to complete this task? If not join, then using any other method.
Here a few more ways how this can be done.
Please note: In the moment there is a 1:1 relationship between the tables in your sample data. If that's different and especially if there are also non-matching cases possible then you will eventually need to amend the logic to get to your desired result.
data have1;
input FLT_ID ERR_MSG_TXT $50.;
datalines;
148 Syntax error
149 Not a valid procedure
150 Warning
;
data have2;
input RUN_ID STAT_CD $50.;
datalines;
1 Syntax error
2 Warning
3 Not a valid procedure
;
/*** option 1 ***/
proc sql;
/* create table want1 as*/
select r.run_id, l.flt_id
from have1 l inner join have2 r
on l.ERR_MSG_TXT=r.STAT_CD
;
quit;
/*** option 2 ***/
/* create format from data set */
data cntl;
set have1(rename=(ERR_MSG_TXT=start FLT_ID=label));
retain fmtname '$FLT_ID' type 'c';
run;
proc format cntlin=cntl;
run;
/* option 2, variant 1: Use format to populate a new variable */
data want2;
set have2;
FLT_ID=put(STAT_CD,$FLT_ID3.);
keep run_id FLT_ID;
run;
proc print;
run;
/* option 2, variant 2: Apply format for printing values of existing variable */
proc print data=have2;
format STAT_CD $FLT_ID3.;
run;
@David_Billa wrote:
May I know how to amend the logic for non matching cases? One simple
example should be OK
That really depends how your desired result needs to look like, i.e. do you want to include or exclude non-matching cases, do you want to include all non matching rows from one table or from both ...and so on.
Using the code already provided why don't you give it a go yourself first and then eventually come back and ask targeted questions for things where you get stuck with your code.
You need of course first to create additional sample data for such cases.
With SQL: Look up in the documentation SQL Left Join, Right Join, Full Outer Join.... and for formats how to add an OTHER case
Yet another way to do this, and to get the non-matches output as well.
Assuming that HAVE1 is your lookup dataset, and HAVE2 is the result of various runs, which you want to check:
data have1;
input FLT_ID ERR_MSG_TXT $50.;
datalines;
148 Syntax error
149 Not a valid procedure
150 Warning
;
data have2;
input RUN_ID STAT_CD $50.;
datalines;
1 Syntax error
2 Warning
3 Not a valid procedure
;
proc sql;
create index ERR_MSG_TXT on have1(ERR_MSG_TXT);
quit;
data want(keep=RUN_ID FLT_ID) notfound(keep=RUN_ID STAT_CD);
set have2;
ERR_MSG_TXT=STAT_CD;
set have1 key=ERR_MSG_TXT/unique;
if _iorc_ then do;
output notfound;
_error_=0;
end;
else output want;
run;
If you omit the "_error_=0" line, you get a message in the log when a value is not in the lookup dataset, which may be nice.
Left join can handle mismatches in a reasonable manner:
data have1;
input FLT_ID ERR_MSG_TXT $50.;
datalines;
148 Syntax error
149 Not a valid procedure
150 Warning
999 Unknown message
;
data have2;
input RUN_ID STAT_CD $50.;
datalines;
1 Syntax error
2 Warning
3 Not a valid procedure
4 Bravo!
;
proc sql;
create table want as
select
a.RUN_ID,
coalesce(b.FLT_ID, 999) as FLT_ID
from
have2 as a left join
have1 as b on b.ERR_MSG_TXT = a.STAT_CD;
select * from want;
quit;
RUN_ID FLT_ID 4 999 3 149 1 148 2 150
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.