Dataset a: Dataset b:
ID Recpos AE ID Rec AE_adj
001 1 fever,chills 001 1 rejection,fever,chills
001 2 artery rupture 002 1 rejection
002 0
ID-Subject ID-Character ID-Subject ID-Character
Recpos-Record-Numeric Rec-Record-Numeric
AE-AEterm-character AE_adj-AEterm-character
Output Expected:
ID Recpos Rec AE AE_adj
001 1 1 fever,chills rejection,fever,chills
001 2 artery rupture
002 0
002 1 rejection
Output that I got after running the below code:
ID Recpos AE AE_adj
001 1 fever,chills rejection,fever,chills
001 2 artery rupture
002 0
002 1 rejection
Can you please suggest what is going wrong with in the merging? Is it the renaming thing that is causing this issue?
Thank you,
data step_01_a;
set a;
keep ID Recpos AE;
run;
proc sort data = step_01_a;
by ID Recpos;
run;
data step_01_b;
set b;
keep ID Rec AE_adj;
run;
proc sort data=step_01_b out=step_01_b(rename=(Rec=Recpos));
by ID Recpos;
run;
data final;
merge step_01_a(in=a) step_01_b(in=b);
if a or b;
run;
You post your example data as dataset a and dataset b, and not as usable data in the form of data step, and then do not use those names in the code. So there is some possibility of confusion right there.
Second, we do not have any actual surety of the contents of any of your data sets, names of variables or such.
You renamed REC to RECPOSE in the second sort so REC went away and unless there is a REC variable in the A set then you shifted the value to the RECPOSE column.
Did you actually intend to match the records by sequence or by the value of ID?
Hi,
Sorting is done by both ID and Record which has different labels in dataset a and b. Dataset a has Record as Recpos and b as Rec and hence I have done the renaming.
Is this causing any issue of Record information being over written or something?
Is there a way we can merge two different dataset and still have all the values in the output, even the missing values?
Thank you.
@Nasya wrote:
Hi,
Sorting is done by both ID and Record which has different labels in dataset a and b. Dataset a has Record as Recpos and b as Rec and hence I have done the renaming.
Is this causing any issue of Record information being over written or something?
Is there a way we can merge two different dataset and still have all the values in the output, even the missing values?
Thank you.
From your wanted and actual code the difference is that REC does not appear as a variable and the value appears in the wrong variable:
proc sort data=step_01_b out=step_01_b(rename=(Rec=Recpos)); by ID Recpos; run;
which is why the variable REC is not in the output. There is no variable. How can it "have a missing value"?
I think you need to expand on the logic of rules involving the role of ID RECPOS and REC in the merge process.
Your merge did not nave any BY statement, which was somewhat odd in that you sorted two data sets before merging, made a common variable (recpos). Without a BY statement MERGE will match the first record from one set with the first of the second, only using the order of the records in the two sets.
Without matching to ensure a common sort order there may be other issues that can arise. And since missing comes before 0 in sort orders you will have some difficulty getting the order you show with 0 before missing for recpos on id 002
Yes, its a rename issue. Don't rename it if you don't want it merged.
Your MERGE is also missing a BY statement which you likely want.
See the coloured text below. If you want it merged by REC and ID then you will need a different approach than outlined here.
One thing that can help avoid these issues is if you comment your code. Before each step add a single comment that explains the logic, ie:
1. create a smaller data set to merge and test
2. Sort before a merge
3. join data from adverse events to the main trial data
data step_01_a; set a; keep ID Recpos AE; run; proc sort data = step_01_a; by ID Recpos; run; data step_01_b; set b; keep ID Rec AE_adj; run; proc sort data=step_01_b out=step_01_b (rename=(Rec=Recpos)); by ID Recpos; run; data final; merge step_01_a(in=a) step_01_b(in=b); BY ID; if a or b; run;
Yes, merging is expected with both ID and REC(from a and b).
Can you please suggest/guide me so that I can try that approach?
Thank you
This gives what you want, but I'd double check that logic, it seems a bit weird, specifically the REC field as indicated by @ballardw
*create sample data1;
data tableA;
infile cards dlm=',' truncover dsd;
informat ID $4. RecPos 8. ae $50.;
input ID $ Recpos AE $;
cards;
001, 1, "fever,chills"
001, 2, "artery rupture"
002, 0,
;;;;
run;
*create sample data2;
data tableB;
infile cards dlm=',' truncover dsd;
informat ID $4. Rec 8. ae_adj $50.;
input ID $ Rec AE_adj $;
cards;
001, 1, "rejection, fever,chills"
002, 1, "rejection"
;;;;
run;
*create a new join ID for table A;
data tableA2;
set tableA;
RecJoin=RecPOs;
run;
*create a new join ID for table B;
data tableB2;
set tableB;
RecJoin=Rec;
run;
*sort to merge;
proc sort data=tableA2;
by ID recJoin;
run;
proc sort data=TableB2;
by ID RecJoin;
run;
*merge;
data want;
merge tableA2 (in= a)
tableB2 (in= b) ;
by ID recJoin;
run;
PS I really really hate typing out data, so if you could provide data as a sample, similar to what I did at the top of my code in future questions that's helpful.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.