BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Andygray
Quartz | Level 8

I have 2 datasets.

Dataset 1 has 150,000 records with the following columns,

Id , measure1, measure2    

Notes: The dataset 1 has records of many repeats for ids and for all ids, so lot of duplicates.

Dataset2 has 3000 recrods with following columns

Id, measure3, measure4

Notes: The dataset2 has records with unique ids and no repeats or dupicates.

I want to join or merge the two datasets with the condtion the joined dataset shoud be picked based on the condition measure4 values of dataset2=measure2 values of dataset1.

So, the final merged dataset shoud only have 3000 records.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

If under the same id, there is only ONE measure4=measure2, then the following could work for you:

data want;

set have;

merge dataset1(in=a) dataset2 (in=b);

by id;

if b and measure4=measure2 ;

run;

However, if there is a chance that you may have multiple measure2 that match measure4 within the same id and you don't want duplicates, then you may try proc sql:

proc sql;

  create table want as

   select distinct * from dataset1 a

  left join

    datset2 b

on a.id=b.id and measure4=measure2;

quit;

Note: you can ignore the warning saying 'id' already existed.

Haikuo

View solution in original post

4 REPLIES 4
Cynthia_sas
SAS Super FREQ


Hi:

  I'm not sure that this is really an ODS question, or even a reporting question. Generally, DATA step questions get posted in either the SAS Procedure forum (since you're asking about PROC SQL) or the Macro Facility and DATA step program (since you are asking about a merge).

  However, given what you've posted, I don't know that anyone could provide you much help. You describe the MEASURE4=MEASURE2 condition, but what about ID? Do the ID values have to be part of the matching or joining?  How do you want duplicates to be handled -- if there are dups in dataset1 with matches on the MEASURE variables? Do you want to keep duplicates or not? Do you want to take the first, the last, keep all?

  What code have you tried? Have you tried PROC SQL? If so, what was your code? What kind of join condition did you code? Did you try a DATA step program? Did you use IN= options and BY group processing?

  My general recommendation for problems of this nature is that you get your logic working on a small subset of data, where you can manually do your matching in order to verify the point at which your program logic will produce the correct results. So, if you have these 2 hypothetical files:

Data set 1

ID M1 M2

1  11 22

1  12 23

2  13 14

2  15 16

3  17 17

3  18 17

3  19 14

  

Data set 2

ID M3 M4

1  1  17

2  2  14

3  3  17

  Can you show what results you expect to get? And, what should happen to ID 1, which is in both files, but has no matches on M4=M2.  Also, ID=3 has 2 matches for M2=17 -- do you want to keep both of them?

  Since you posted your question in the Reporting forum, the next logical question is: what kind of report do you need to produce? How do you plan to report on the results? ODS HTML, ODS RTF, ODS PDF. Were you going to use PROC PRINT, PROC REPORT???

cynthia

Andygray
Quartz | Level 8

hi

Please accept my very SINCERE APOLOGIES FOR  posting in the incorrect forum. i am not looking to report specially using ODS or anything, i am just simply stuck in getting the ouput right to my code mentioned above. I will make a note not to incorrectly post a discussion next time without fail. SO sorry.

Andygray
Quartz | Level 8

after sorting both datasets by id,

i did this step but did not get the desired output

data want;

set have;

merge dataset1(in=a) dataset2 (in=b);

by id;

if a and b;

where measure4=measure2;

run;


Haikuo
Onyx | Level 15

If under the same id, there is only ONE measure4=measure2, then the following could work for you:

data want;

set have;

merge dataset1(in=a) dataset2 (in=b);

by id;

if b and measure4=measure2 ;

run;

However, if there is a chance that you may have multiple measure2 that match measure4 within the same id and you don't want duplicates, then you may try proc sql:

proc sql;

  create table want as

   select distinct * from dataset1 a

  left join

    datset2 b

on a.id=b.id and measure4=measure2;

quit;

Note: you can ignore the warning saying 'id' already existed.

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 886 views
  • 0 likes
  • 3 in conversation