BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
reubens
Calcite | Level 5

Hi there, 

 

I have two files (MAIN) and (ADDON) that I'd like to merge based on the variable (reprisk_ID) that is shared by BOTH FILES

 

The MAIN file has two columns (1) reprisk_ID (2) value

The ADDON file has three columns: (1) reprisk_ID, (2) xxx 

 

Both files are merged to produce WANT_2 (based on my code below) that

1) Merges MAIN & ADDON (based on the variable "reprisk_ID)

2) Keeps the merged file where only xxx=1

 

WANT_2 is that dataset output I'd like to keep, but I'm very much hoping that my SAS syntax can be minimized so that I don't need to use the syntax in RED but can only use the syntax in BLUE with minor adjustments. (Reason is the the MAIN files that I'm dealing with can be 100GB) and this process can be very slow. 

 

The MAIN file is based on this code:

 

/* ---------------------------------------------------------------------------------------------------------------------------- */
/* MAIN FILE */
/* ---------------------------------------------------------------------------------------------------------------------------- */

data MAIN;
input reprisk_ID $ value;
Format reprisk_ID $5.;

datalines;
ABC 1000
BCF 2000
ING 3000
DEF 4000
GHI 5000
ING 3500
BCF 5900
;
run;

 

The ADDON file is based on this code:
/* ---------------------------------------------------------------------------------------------------------------------------- */
/* ADDON FILE */
/* ---------------------------------------------------------------------------------------------------------------------------- */

data ADDON;
input reprisk_ID $ xxx;
format reprisk_ID $5.;

datalines;
ABC 1
BCF 1
ING 1

run;

 

I'd like to get the following WANT output

/* ---------------------------------------------------------------------------------------------------------------------------- */
/* WANT FILE (How do I get this want file?) */
/* ---------------------------------------------------------------------------------------------------------------------------- */

data WANT;
input reprisk_ID $ value xxx;

format reprisk_ID $5.;

datalines;
ABC 1000 1
BCF 5900 1
BCF 2000 1
ING 3000 1
ING 3500 1
run;

 

The current SAS syntax I'm using is this, where I get WANT from doing an extra step in getting WANT_1 and then WANT_2 (where WANT_2 is effectively WANT)

 

/* ---------------------------------------------------------------------------------------------------------------------------- */
/* CODE TO MERGE - MAIN & ADDON = WANT */
/* ---------------------------------------------------------------------------------------------------------------------------- */

%macro prep_tbl(inds,outview);
proc sql;
create view &outview as
select *
from &inds
/* where xxx =1 */
order by reprisk_id
;
quit;
%mend;
%prep_tbl(MAIN,v_1);
%prep_tbl(ADDON,v_2);

data WANT_1;
merge v_1 (in=v1) v_2;
by reprisk_id;
if v1;
run;

/* ---------------------------------------------------------------------------------------------------------------------------- */

data WANT_2;
set WANT_1;
if xxx=. then delete;
run;

 

My question

Is there a simpler way to get WANT by not doing the step highlighted in red above, but doing something simpler by manipulating the lines in blue only?

 

Thanks,

Reuben

 

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

You do not say exactly why you are having to use that "code in red".

If it is because you have some observations in the first set that don't match

 

data WANT_1;
merge v_1 (in=v1) v_2 (in=v2);
by reprisk_id;
if v1 and v2;
run;

restricts output to results from both data sets.

Unless you have missing values in your source data which you did not show.

 

 

 

 

View solution in original post

3 REPLIES 3
andreas_lds
Jade | Level 19

if you don't need the result sorted, using a hash object provides you with dataset want in one step:

data want; 
   set main;

   if _n_ = 1 then do;
      if 0 then set addon;

      declare hash h(dataset: 'addon');
      h.defineKey('reprisk_id');
      h.defineData('xxx');
      h.defineDone();
   end;

   if h.find() = 0;
run;

or change the merging step, so that an in-variable is defined for the addon dataset, too:

data want_ds;
   merge main(in = inMain) addon(in = inAddon);
   by reprisk_id;

   if inMain and inAddon;
run;

I was not able to sort the data in the order of your want dataset.

ballardw
Super User

You do not say exactly why you are having to use that "code in red".

If it is because you have some observations in the first set that don't match

 

data WANT_1;
merge v_1 (in=v1) v_2 (in=v2);
by reprisk_id;
if v1 and v2;
run;

restricts output to results from both data sets.

Unless you have missing values in your source data which you did not show.

 

 

 

 

reubens
Calcite | Level 5

Thanks so much! That's resolves my issue. 

Another solution, which just occurred to me is just use a where= statement that let's you choose what you want to collate.

 

data WANT_1 (where=(xxx=1));
merge v_1 (in=v1) v_2;
/* by DSCD year; */
by reprisk_id;
if v1;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 566 views
  • 1 like
  • 3 in conversation