MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file

reubens — Thu, 05 Oct 2023 11:07:38 GMT

Hi there,

I have two files (MAIN) and (ADDON) that I'd like to merge based on the variable (reprisk_ID) that is shared by BOTH FILES

The MAIN file has two columns (1) reprisk_ID (2) value

The ADDON file has three columns: (1) reprisk_ID, (2) xxx

Both files are merged to produce WANT_2 (based on my code below) that

1) Merges MAIN & ADDON (based on the variable "reprisk_ID)

2) Keeps the merged file where only xxx=1

WANT_2 is that dataset output I'd like to keep, but I'm very much hoping that my SAS syntax can be minimized so that I don't need to use the syntax in RED but can only use the syntax in BLUE with minor adjustments. (Reason is the the MAIN files that I'm dealing with can be 100GB) and this process can be very slow.

The MAIN file is based on this code:

/* ---------------------------------------------------------------------------------------------------------------------------- */
/* MAIN FILE */
/* ---------------------------------------------------------------------------------------------------------------------------- */

data MAIN;
input reprisk_ID $ value;
Format reprisk_ID $5.;

datalines;
ABC 1000
BCF 2000
ING 3000
DEF 4000
GHI 5000
ING 3500
BCF 5900
;
run;

The ADDON file is based on this code:
/* ---------------------------------------------------------------------------------------------------------------------------- */
/* ADDON FILE */
/* ---------------------------------------------------------------------------------------------------------------------------- */

data ADDON;
input reprisk_ID $ xxx;
format reprisk_ID $5.;

datalines;
ABC 1
BCF 1
ING 1

run;

I'd like to get the following WANT output

/* ---------------------------------------------------------------------------------------------------------------------------- */
/* WANT FILE (How do I get this want file?) */
/* ---------------------------------------------------------------------------------------------------------------------------- */

data WANT;
input reprisk_ID $ value xxx;

format reprisk_ID $5.;

datalines;
ABC 1000 1
BCF 5900 1
BCF 2000 1
ING 3000 1
ING 3500 1
run;

The current SAS syntax I'm using is this, where I get WANT from doing an extra step in getting WANT_1 and then WANT_2 (where WANT_2 is effectively WANT)

/* ---------------------------------------------------------------------------------------------------------------------------- */
/* CODE TO MERGE - MAIN & ADDON = WANT */
/* ---------------------------------------------------------------------------------------------------------------------------- */

%macro prep_tbl(inds,outview);
proc sql;
create view &outview as
select *
from &inds
/* where xxx =1 */
order by reprisk_id
;
quit;
%mend;
%prep_tbl(MAIN,v_1);
%prep_tbl(ADDON,v_2);

data WANT_1;
merge v_1 (in=v1) v_2;
by reprisk_id;
if v1;
run;

/* ---------------------------------------------------------------------------------------------------------------------------- */

data WANT_2;
set WANT_1;
if xxx=. then delete;
run;

My question

Is there a simpler way to get WANT by not doing the step highlighted in red above, but doing something simpler by manipulating the lines in blue only?

Thanks,

Reuben

Re: MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file

andreas_lds — Thu, 05 Oct 2023 11:41:32 GMT

if you don't need the result sorted, using a hash object provides you with dataset want in one step:

data want; 
   set main;

   if _n_ = 1 then do;
      if 0 then set addon;

      declare hash h(dataset: 'addon');
      h.defineKey('reprisk_id');
      h.defineData('xxx');
      h.defineDone();
   end;

   if h.find() = 0;
run;

or change the merging step, so that an in-variable is defined for the addon dataset, too:

data want_ds;
   merge main(in = inMain) addon(in = inAddon);
   by reprisk_id;

   if inMain and inAddon;
run;

I was not able to sort the data in the order of your want dataset.

Re: MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file

ballardw — Thu, 05 Oct 2023 14:52:52 GMT

You do not say exactly why you are having to use that "code in red".

If it is because you have some observations in the first set that don't match

data WANT_1;
merge v_1 (in=v1) v_2 (in=v2);
by reprisk_id;
if v1 and v2;
run;

restricts output to results from both data sets.

Unless you have missing values in your source data which you did not show.

Re: MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file

reubens — Thu, 05 Oct 2023 17:17:04 GMT

Thanks so much! That's resolves my issue.

Another solution, which just occurred to me is just use a where= statement that let's you choose what you want to collate.

data WANT_1 (where=(xxx=1));
merge v_1 (in=v1) v_2;
/* by DSCD year; */
by reprisk_id;
if v1;
run;

topic Re: MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file in SAS Programming

MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file

Re: MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file

Re: MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file

Re: MERGE multiple files (based on a particular column) and RECEIVE a conditional MERGED output file