I'm trying to merge two datasets in which the same patient may have been tested once for TEST1 and twice for TEST2.
So if the observations/variables are as follows:
FIRST DATASET
Obs ID TEST1
1 A POS
SECOND DATASET
Obs ID TEST2
1 A POS
2 A POS
The resulting merged set returns this:
Obs ID TEST1 TEST2
1 A POS POS
2 A POS POS
How do I keep the second observation for this patient blank for TEST1?
Try this:
data want;
if 0 then set first second; /* only here to populate the PDV */
call missing (of _all_);
merge
first
second
;
by id;
run;
A simple Merge will do 🙂
EDIT: Did not catch the missing test1 part first.
data one;
input ID $ TEST1 $;
datalines;
A POS
;
data two;
input ID $ TEST2 $;
datalines;
A POS
A POS
;
data want;
if 0 then set one two;
test1 = '';
merge one two;
by ID;
run;
Result:
ID TEST1 TEST2 A POS POS A POS
Thanks. Perhaps I should have been clearer. These datasets have been imported from Excel, and each has over 2,000 observations. Is there a method which doesn't require manual data entry? I am much obliged.
@PeterClemmensen two initial data steps are just there to create data for the third data step. Instead of the two fake datasets created here, use your datasets in the MERGE statement.
Try this:
data want;
if 0 then set first second; /* only here to populate the PDV */
call missing (of _all_);
merge
first
second
;
by id;
run;
That did the trick. Thanks!
Sorry to be responding after 2 months but it appears that this bit of code results in the deletion of any observations with both tests. Final dataset shows only those exclusive to one test variable or the other. I'm lost as to how to fix it. Thanks!
Post examples for your datasets as data steps with datalines (as we have done) and your merge code.
I should rephrase: I'm not sure what happened to these observations. The listed variables are common to both datasets with one exception: TESTA had no Btest variable, and TESTB had no Atest variable. The resulting dataset does have observations with entries for both tests, as well as those with entries in only one test or the other, but other observations were dropped entirely and I've no idea how to do the detective work. Here's the code:
DATA TESTCHECK;
RETAIN Client_ID Site_Type Reporting_Month Reporting_Quarter Reporting_Year Year_of_Birth Race Ethnicity Gender Atest Btest;
IF 0 THEN SET TESTA TESTB;
CALL missing (of _all_);
MERGE TESTA TESTB;
BY Client_ID Reporting_Month;
RUN;
I appreciate your help!
UPDATE - got it to work. All good here. Thanks for responding.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.