BookmarkSubscribeRSS Feed
Nasya
Obsidian | Level 7

Dataset a:                                                     Dataset b:

ID     Recpos    AE                                        ID     Rec   AE_adj

001      1           fever,chills                           001      1     rejection,fever,chills

001      2           artery rupture                     002       1    rejection           

002      0

                                                                     

                                                                     

ID-Subject ID-Character                             ID-Subject ID-Character                     

Recpos-Record-Numeric                           Rec-Record-Numeric

AE-AEterm-character                                 AE_adj-AEterm-character

 

Output Expected:

 

ID     Recpos  Rec     AE                                           AE_adj

001      1         1        fever,chills                              rejection,fever,chills

001      2                   artery rupture                                

002      0

002                  1                                               rejection

 

Output that I got after running the below code:

 

ID     Recpos    AE                                           AE_adj

001      1           fever,chills                              rejection,fever,chills

001      2           artery rupture                                

002      0

002      1                                                           rejection

 

Can you please suggest what is going wrong with in the merging? Is it the renaming thing that is causing this issue?

Thank you,                                                                  

data step_01_a;
  set a;
  keep ID Recpos AE;
run;

proc sort data = step_01_a;
by ID Recpos;
run;
data step_01_b; set b; keep ID Rec AE_adj; run;

proc sort data=step_01_b out=step_01_b(rename=(Rec=Recpos));
by ID Recpos;
run;

data final;
merge step_01_a(in=a) step_01_b(in=b);
if a or b;
run;

 

6 REPLIES 6
ballardw
Super User

You post your example data as dataset a and dataset b, and not as usable data in the form of data step, and then do not use those names in the code. So there is some possibility of confusion right there.

Second, we do not have any actual surety of the contents of any of your data sets, names of variables or such.

 

You renamed REC to RECPOSE in the second sort so REC went away and unless there is a REC variable in the A set then you shifted the value to the RECPOSE column.

 

Did you actually intend to match the records by sequence or by the value of ID?

Nasya
Obsidian | Level 7

Hi,

 

Sorting is done by both ID and Record which has different labels in dataset a and b. Dataset a has Record as Recpos and b as Rec and hence I have done the renaming.

Is this causing any issue of Record information being over written or something?

Is there a way we can merge two different dataset and still have all the values in the output, even the missing values?

Thank you.

ballardw
Super User

@Nasya wrote:

Hi,

 

Sorting is done by both ID and Record which has different labels in dataset a and b. Dataset a has Record as Recpos and b as Rec and hence I have done the renaming.

Is this causing any issue of Record information being over written or something?

Is there a way we can merge two different dataset and still have all the values in the output, even the missing values?

Thank you.


From your wanted and actual code the difference is that REC does not appear as a variable and the value appears in the wrong variable:

proc sort data=step_01_b 
     out=step_01_b(rename=(Rec=Recpos));
by ID Recpos;
run;

which is why the variable REC is not in the output. There is no variable. How can it "have a missing value"?

 

 

I think you need to expand on the logic of rules involving the role of ID RECPOS and REC in the merge process.

 

Your merge did not nave any BY statement, which was somewhat odd in that you sorted two data sets before merging, made a common variable (recpos). Without a BY statement MERGE will match the first record from one set with the first of the second, only using the order of the records in the two sets.

Without matching to ensure a common sort order there may be other issues that can arise. And since missing comes before 0 in sort orders you will have some difficulty getting the order you show with 0 before missing for recpos on id 002

Reeza
Super User

Yes, its a rename issue. Don't rename it if you don't want it merged. 

 

Your MERGE is also missing a BY statement which you likely want.

 

 See the coloured text below. If you want it merged by REC and ID then you will need a different approach than outlined here. 

 

One thing that can help avoid these issues is if you comment your code. Before each step add a single comment that explains the logic, ie:

 

1. create a smaller data set to merge and test

2. Sort before a merge

3. join data from adverse events to the main trial data

 

 

data step_01_a;
	set a;
	keep ID Recpos AE;
run;

proc sort data = step_01_a;
	by ID Recpos;
run;

data step_01_b;
	set b;
	keep ID Rec AE_adj;
run;

proc sort data=step_01_b out=step_01_b (rename=(Rec=Recpos));
	by ID Recpos;
run;

data final;
	merge step_01_a(in=a) step_01_b(in=b);
    BY ID;
	if a or b;
run;

Nasya
Obsidian | Level 7

Yes, merging is expected with both ID and REC(from a and b).

Can you please suggest/guide me so that I can try that approach?

Thank you

Reeza
Super User

This gives what you want, but I'd double check that logic, it seems a bit weird, specifically the REC field as indicated by @ballardw

 

 

*create sample data1;
data tableA;
infile cards dlm=',' truncover dsd;
informat ID $4. RecPos 8. ae $50.;
input ID  $    Recpos    AE  $;
cards;
001, 1, "fever,chills" 
001, 2, "artery rupture"  
002, 0,
;;;;
run; 

*create sample data2;
data tableB;
infile cards dlm=',' truncover dsd;
informat ID $4. Rec 8. ae_adj $50.;
input ID  $    Rec    AE_adj  $;
cards;
001, 1, "rejection, fever,chills" 
002, 1,   "rejection"
;;;;
run; 


*create a new join ID for table A;
data tableA2;
set tableA;
RecJoin=RecPOs;
run;

*create a new join ID for table B;
data tableB2;
set tableB;
RecJoin=Rec;
run;

*sort to merge;
proc sort data=tableA2;
by ID recJoin;
run;

proc sort data=TableB2;
by ID RecJoin;
run;

*merge;
data want;
merge tableA2 (in= a) 
      tableB2 (in= b) ;
by ID recJoin;
run;

PS I really really hate typing out data, so if you could provide data as a sample, similar to what I did at the top of my code in future questions that's helpful. 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 823 views
  • 0 likes
  • 3 in conversation