The SAS Output Delivery System and reporting techniques

Need help in Merge in Datastep or Join in Proc SQL please

Accepted Solution Solved
Reply
Contributor
Posts: 59
Accepted Solution

Need help in Merge in Datastep or Join in Proc SQL please

I have 2 datasets.

Dataset 1 has 150,000 records with the following columns,

Id , measure1, measure2    

Notes: The dataset 1 has records of many repeats for ids and for all ids, so lot of duplicates.

Dataset2 has 3000 recrods with following columns

Id, measure3, measure4

Notes: The dataset2 has records with unique ids and no repeats or dupicates.

I want to join or merge the two datasets with the condtion the joined dataset shoud be picked based on the condition measure4 values of dataset2=measure2 values of dataset1.

So, the final merged dataset shoud only have 3000 records.


Accepted Solutions
Solution
‎10-14-2012 10:48 AM
Respected Advisor
Posts: 3,156

Re: Need help in Merge in Datastep or Join in Proc SQL please

If under the same id, there is only ONE measure4=measure2, then the following could work for you:

data want;

set have;

merge dataset1(in=a) dataset2 (in=b);

by id;

if b and measure4=measure2 ;

run;

However, if there is a chance that you may have multiple measure2 that match measure4 within the same id and you don't want duplicates, then you may try proc sql:

proc sql;

  create table want as

   select distinct * from dataset1 a

  left join

    datset2 b

on a.id=b.id and measure4=measure2;

quit;

Note: you can ignore the warning saying 'id' already existed.

Haikuo

View solution in original post


All Replies
SAS Super FREQ
Posts: 8,868

Re: Need help in Merge in Datastep or Join in Proc SQL please


Hi:

  I'm not sure that this is really an ODS question, or even a reporting question. Generally, DATA step questions get posted in either the SAS Procedure forum (since you're asking about PROC SQL) or the Macro Facility and DATA step program (since you are asking about a merge).

  However, given what you've posted, I don't know that anyone could provide you much help. You describe the MEASURE4=MEASURE2 condition, but what about ID? Do the ID values have to be part of the matching or joining?  How do you want duplicates to be handled -- if there are dups in dataset1 with matches on the MEASURE variables? Do you want to keep duplicates or not? Do you want to take the first, the last, keep all?

  What code have you tried? Have you tried PROC SQL? If so, what was your code? What kind of join condition did you code? Did you try a DATA step program? Did you use IN= options and BY group processing?

  My general recommendation for problems of this nature is that you get your logic working on a small subset of data, where you can manually do your matching in order to verify the point at which your program logic will produce the correct results. So, if you have these 2 hypothetical files:

Data set 1

ID M1 M2

1  11 22

1  12 23

2  13 14

2  15 16

3  17 17

3  18 17

3  19 14

  

Data set 2

ID M3 M4

1  1  17

2  2  14

3  3  17

  Can you show what results you expect to get? And, what should happen to ID 1, which is in both files, but has no matches on M4=M2.  Also, ID=3 has 2 matches for M2=17 -- do you want to keep both of them?

  Since you posted your question in the Reporting forum, the next logical question is: what kind of report do you need to produce? How do you plan to report on the results? ODS HTML, ODS RTF, ODS PDF. Were you going to use PROC PRINT, PROC REPORT???

cynthia

Contributor
Posts: 59

Re: Need help in Merge in Datastep or Join in Proc SQL please

Posted in reply to Cynthia_sas

hi

Please accept my very SINCERE APOLOGIES FOR  posting in the incorrect forum. i am not looking to report specially using ODS or anything, i am just simply stuck in getting the ouput right to my code mentioned above. I will make a note not to incorrectly post a discussion next time without fail. SO sorry.

Contributor
Posts: 59

Re: Need help in Merge in Datastep or Join in Proc SQL please

after sorting both datasets by id,

i did this step but did not get the desired output

data want;

set have;

merge dataset1(in=a) dataset2 (in=b);

by id;

if a and b;

where measure4=measure2;

run;


Solution
‎10-14-2012 10:48 AM
Respected Advisor
Posts: 3,156

Re: Need help in Merge in Datastep or Join in Proc SQL please

If under the same id, there is only ONE measure4=measure2, then the following could work for you:

data want;

set have;

merge dataset1(in=a) dataset2 (in=b);

by id;

if b and measure4=measure2 ;

run;

However, if there is a chance that you may have multiple measure2 that match measure4 within the same id and you don't want duplicates, then you may try proc sql:

proc sql;

  create table want as

   select distinct * from dataset1 a

  left join

    datset2 b

on a.id=b.id and measure4=measure2;

quit;

Note: you can ignore the warning saying 'id' already existed.

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 293 views
  • 0 likes
  • 3 in conversation