BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dregerator
Obsidian | Level 7

Hi,

I'm trying to create a master dataset from two different datasets. (Dataset A and Dataset B). However, Dataset A has different values for two variables and I want those values to match what I have in dataset B.

 

For example: Dataset A

 

STUDYID     USUBJID            SUBJID VSBLFL

C01347    C01347-36501       36501     Y

C01347    C01347-36502      36502      Y

C01347    C01347-36503      36503      Y

C01347    C01347-36504      36504      Y

C01347    C01347-36505      36505      Y

 

Dataset B

STUDYID     USUBJID            SUBJID

SUP-9005    SUP-9005-36501   36501

SUP-9005    SUP-9005-36502   36502

SUP-9005    SUP-9005-36503   36503

SUP-9005    SUP-9005-36504   36504

SUP-9005    SUP-9005-36505   36505

 

 

What i would like is Master dataset:

STUDYID     USUBJID            SUBJID VSBLFL

SUP-9005    SUP-9005-36501     36501     Y

SUP-9005    SUP-9005-36501     36501

SUP-9005    SUP-9005-36502     36502      Y

SUP-9005    SUP-9005-36502     36502 

SUP-9005    SUP-9005-36503     36503      Y

SUP-9005    SUP-9005-36503     36503

SUP-9005    SUP-9005-36504     36504      Y

SUP-9005    SUP-9005-36504     36504

SUP-9005    SUP-9005-36505     36505      Y

SUP-9005    SUP-9005-36505     36505

 

I need baseline information from Dataset 1, however, it has different USUBJID's and STUDYID's. I know I could hardcode and change them, but I'm trying to data derive these values. Was hoping anyone had suggestions how to do this?

 

Please let me know if you need more information. 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Will you ever have multiple subjid in either data set? If so, this may not work.

*combine data sets - adding source variable;
data combined;
set DataA (in=A) DataB (in=B) indsname=source;
dataSource = source;
run;

*Sort to control the order of subjid based on input data source. May need to change if your data sets are not named A/B as in your example;
proc sort data=combined;
by subjid dataSource; 
run;

data want;
set combined;
by subjid;
*keeps values throughout the subjID;
retain new_usubjid new_StudyID;

*sets first record as values that will then be carried through;
if first.subjid then do;
    new_usubjid = usubjid;
    new_studyID = studyID;
end;


run;



You should be able to derive the VSBFL from the dataSource variable via an IF statement.

 


@Dregerator wrote:

Hi,

I'm trying to create a master dataset from two different datasets. (Dataset A and Dataset B). However, Dataset A has different values for two variables and I want those values to match what I have in dataset B.

 

For example: Dataset A

 

STUDYID     USUBJID            SUBJID VSBLFL

C01347    C01347-36501       36501     Y

C01347    C01347-36502      36502      Y

C01347    C01347-36503      36503      Y

C01347    C01347-36504      36504      Y

C01347    C01347-36505      36505      Y

 

Dataset B

STUDYID     USUBJID            SUBJID

SUP-9005    SUP-9005-36501   36501

SUP-9005    SUP-9005-36502   36502

SUP-9005    SUP-9005-36503   36503

SUP-9005    SUP-9005-36504   36504

SUP-9005    SUP-9005-36505   36505

 

 

What i would like is Master dataset:

STUDYID     USUBJID            SUBJID VSBLFL

SUP-9005    SUP-9005-36501     36501     Y

SUP-9005    SUP-9005-36501     36501

SUP-9005    SUP-9005-36502     36502      Y

SUP-9005    SUP-9005-36502     36502 

SUP-9005    SUP-9005-36503     36503      Y

SUP-9005    SUP-9005-36503     36503

SUP-9005    SUP-9005-36504     36504      Y

SUP-9005    SUP-9005-36504     36504

SUP-9005    SUP-9005-36505     36505      Y

SUP-9005    SUP-9005-36505     36505

 

I need baseline information from Dataset 1, however, it has different USUBJID's and STUDYID's. I know I could hardcode and change them, but I'm trying to data derive these values. Was hoping anyone had suggestions how to do this?

 

Please let me know if you need more information. 

 

 

 

 


 

View solution in original post

4 REPLIES 4
Dregerator
Obsidian | Level 7
No, I need all the information from Dataset A and set it to Dataset B, however, Dataset A has different STUDYID and USUBJID's. The only common variable between them is SUBJID. I am trying to find a way to match the STUDYID and USUBJID from Dataset A to Dataset B without hardcoding these values.
Kurt_Bremser
Super User

Your "master" dataset does not have a value from dataset A, aside from those in vsblfl.

So this code creates your intended master dataset without even reading A:

data b;
input STUDYID  $   USUBJID  :$14. SUBJID :$5.;
datalines;
SUP-9005    SUP-9005-36501   36501
SUP-9005    SUP-9005-36502   36502
SUP-9005    SUP-9005-36503   36503
SUP-9005    SUP-9005-36504   36504
SUP-9005    SUP-9005-36505   36505
;

data master;
set B;
vsblfl = "Y";
output;
vsblfl = "";
output;
run;

proc print data=master noobs;
run;

Result:

STUDYID        USUBJID        SUBJID    vsblfl

SUP-9005    SUP-9005-36501    36501       Y   
SUP-9005    SUP-9005-36501    36501           
SUP-9005    SUP-9005-36502    36502       Y   
SUP-9005    SUP-9005-36502    36502           
SUP-9005    SUP-9005-36503    36503       Y   
SUP-9005    SUP-9005-36503    36503           
SUP-9005    SUP-9005-36504    36504       Y   
SUP-9005    SUP-9005-36504    36504           
SUP-9005    SUP-9005-36505    36505       Y   
SUP-9005    SUP-9005-36505    36505           

So please tell us how the values in A influence the outcome.

Reeza
Super User

Will you ever have multiple subjid in either data set? If so, this may not work.

*combine data sets - adding source variable;
data combined;
set DataA (in=A) DataB (in=B) indsname=source;
dataSource = source;
run;

*Sort to control the order of subjid based on input data source. May need to change if your data sets are not named A/B as in your example;
proc sort data=combined;
by subjid dataSource; 
run;

data want;
set combined;
by subjid;
*keeps values throughout the subjID;
retain new_usubjid new_StudyID;

*sets first record as values that will then be carried through;
if first.subjid then do;
    new_usubjid = usubjid;
    new_studyID = studyID;
end;


run;



You should be able to derive the VSBFL from the dataSource variable via an IF statement.

 


@Dregerator wrote:

Hi,

I'm trying to create a master dataset from two different datasets. (Dataset A and Dataset B). However, Dataset A has different values for two variables and I want those values to match what I have in dataset B.

 

For example: Dataset A

 

STUDYID     USUBJID            SUBJID VSBLFL

C01347    C01347-36501       36501     Y

C01347    C01347-36502      36502      Y

C01347    C01347-36503      36503      Y

C01347    C01347-36504      36504      Y

C01347    C01347-36505      36505      Y

 

Dataset B

STUDYID     USUBJID            SUBJID

SUP-9005    SUP-9005-36501   36501

SUP-9005    SUP-9005-36502   36502

SUP-9005    SUP-9005-36503   36503

SUP-9005    SUP-9005-36504   36504

SUP-9005    SUP-9005-36505   36505

 

 

What i would like is Master dataset:

STUDYID     USUBJID            SUBJID VSBLFL

SUP-9005    SUP-9005-36501     36501     Y

SUP-9005    SUP-9005-36501     36501

SUP-9005    SUP-9005-36502     36502      Y

SUP-9005    SUP-9005-36502     36502 

SUP-9005    SUP-9005-36503     36503      Y

SUP-9005    SUP-9005-36503     36503

SUP-9005    SUP-9005-36504     36504      Y

SUP-9005    SUP-9005-36504     36504

SUP-9005    SUP-9005-36505     36505      Y

SUP-9005    SUP-9005-36505     36505

 

I need baseline information from Dataset 1, however, it has different USUBJID's and STUDYID's. I know I could hardcode and change them, but I'm trying to data derive these values. Was hoping anyone had suggestions how to do this?

 

Please let me know if you need more information. 

 

 

 

 


 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1436 views
  • 2 likes
  • 3 in conversation