Hi SAS experts,
I have a question regards merging datasets.
dataset1 have one observation for each subject (total subject #=60,000):
subjectid registration_date age gender weight
1 5/20/1990 45 M 180
2 4/23/1995 50 M 160
3 3/2/1992 33 F 130
5 6/5/1990 65 F 150
....
dataset2 have multiple observations for each subject:
subjectid testdate test1 test2 test3
1 5/21/1990 50 40 40
1 6/4/1992 45 78 45
3 4/25/1996 45 23 34
3 8/5/1998 45 56 77
3 9/3/1999 12 12 67
4 4/1/1990 23 24 23
4 6/5/1990 23 23 25
....
dataset1 has subjects that are not in dataset2 and vice versa.
I wanted to distinguish date type (registration_date or test_date) so I created new variables date1 (which equals to registration_date in dataset1 and testdate in dataset2) and datetype (which equals to "registration" in dataset1 and "test" in dataset2).
Then I merged the 2 datasets by subjectid and date1, and the merged dataset looked like this:
subjectid date1 datetype registration_date age gender weight testdate test1 test2 test3
1 5/20/1990 registration 5/20/1990 45 M 180
1 5/21/1990 test 5/21/1990 50 40 40
1 6/4/1992 test 6/4/1992 45 78 45
2 4/23/1995 registration 4/23/1995 50 M 160
3 3/2/1992 registration 3/2/1992 33 F 130
3 4/25/1996 test 4/25/1996 45 23 34
3 8/5/1998 test 8/5/1998 45 56 77
3 9/3/1999 test 9/3/1999 12 12 67
4 4/1/1990 test 4/1/1990 23 24 23
4 6/5/1990 test 6/5/1990 23 23 25
5 6/5/1990 registration 6/5/1990 65 F 150
.....
Now I need to exclude patients who only appear in dataset2 and not in dataset1 (so merged dataset will not have subject 4....).
I tried the in operator but I must not have the syntax right because it did not come out the way I want it to.
Please help and thank you very much in advance for any advice!
Please show us your current merge code so we can suggest how to modify it. For starters, did you use SQL or a data step?
IN data set option (not operator) in a data step should work as expected, assuming it was implemented correctly.
data merged;
merge a (in=inA) b (in=inB);
by ID date;
if inA; *keeps only records that are in A;
run;
@michan22 wrote:
Hi SAS experts,
I have a question regards merging datasets.
dataset1 have one observation for each subject (total subject #=60,000):
subjectid registration_date age gender weight
1 5/20/1990 45 M 180
2 4/23/1995 50 M 160
3 3/2/1992 33 F 130
5 6/5/1990 65 F 150
....
dataset2 have multiple observations for each subject:
subjectid testdate test1 test2 test3
1 5/21/1990 50 40 40
1 6/4/1992 45 78 45
3 4/25/1996 45 23 34
3 8/5/1998 45 56 77
3 9/3/1999 12 12 67
4 4/1/1990 23 24 23
4 6/5/1990 23 23 25
....
dataset1 has subjects that are not in dataset2 and vice versa.
I wanted to distinguish date type (registration_date or test_date) so I created new variables date1 (which equals to registration_date in dataset1 and testdate in dataset2) and datetype (which equals to "registration" in dataset1 and "test" in dataset2).
Then I merged the 2 datasets by subjectid and date1, and the merged dataset looked like this:
subjectid date1 datetype registration_date age gender weight testdate test1 test2 test3
1 5/20/1990 registration 5/20/1990 45 M 180
1 5/21/1990 test 5/21/1990 50 40 40
1 6/4/1992 test 6/4/1992 45 78 45
2 4/23/1995 registration 4/23/1995 50 M 160
3 3/2/1992 registration 3/2/1992 33 F 130
3 4/25/1996 test 4/25/1996 45 23 34
3 8/5/1998 test 8/5/1998 45 56 77
3 9/3/1999 test 9/3/1999 12 12 67
4 4/1/1990 test 4/1/1990 23 24 23
4 6/5/1990 test 6/5/1990 23 23 25
5 6/5/1990 registration 6/5/1990 65 F 150
.....
Now I need to exclude patients who only appear in dataset2 and not in dataset1 (so merged dataset will not have subject 4....).
I tried the in operator but I must not have the syntax right because it did not come out the way I want it to.
Please help and thank you very much in advance for any advice!
Hi!
My code right now:
data merge_nodup_droplabonly;
format subjectid $9. date1 mmddyy10.;
merge dataset1 (in=a) artdate tbdate hivdate clinicalnodup dataset2 (in=b drop=_name_);
by subjectid date1;
run;
I tried if a, if a=1, and it only kept one data line per subject, which is not what I want.
Thank you!!
You might be looking for a left join.
....................
FROM Dataset1
LEFT JOIN Dataset2 ON ........
Show us the code you assist you more in detail.
So I actually found a solution...
Created another dataset that tagged every subject in dataset1, and combined this dataset with the previously merged dataset and deleted anyone without the tag.
I know this is probably not the smartest or cleanest way to do this, so any other suggestions are appreciated!!
Thank you all!!
Why wouldn't below simple merge not work for you?
data merged;
merge dataset1 (in=ds1) dataset2 (in=ds2);
by subjectid;
if ds1; *keeps only records that are in dataset1;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.