Hi,
I'm trying to create a master dataset from two different datasets. (Dataset A and Dataset B). However, Dataset A has different values for two variables and I want those values to match what I have in dataset B.
For example: Dataset A
STUDYID USUBJID SUBJID VSBLFL
C01347 C01347-36501 36501 Y
C01347 C01347-36502 36502 Y
C01347 C01347-36503 36503 Y
C01347 C01347-36504 36504 Y
C01347 C01347-36505 36505 Y
Dataset B
STUDYID USUBJID SUBJID
SUP-9005 SUP-9005-36501 36501
SUP-9005 SUP-9005-36502 36502
SUP-9005 SUP-9005-36503 36503
SUP-9005 SUP-9005-36504 36504
SUP-9005 SUP-9005-36505 36505
What i would like is Master dataset:
STUDYID USUBJID SUBJID VSBLFL
SUP-9005 SUP-9005-36501 36501 Y
SUP-9005 SUP-9005-36501 36501
SUP-9005 SUP-9005-36502 36502 Y
SUP-9005 SUP-9005-36502 36502
SUP-9005 SUP-9005-36503 36503 Y
SUP-9005 SUP-9005-36503 36503
SUP-9005 SUP-9005-36504 36504 Y
SUP-9005 SUP-9005-36504 36504
SUP-9005 SUP-9005-36505 36505 Y
SUP-9005 SUP-9005-36505 36505
I need baseline information from Dataset 1, however, it has different USUBJID's and STUDYID's. I know I could hardcode and change them, but I'm trying to data derive these values. Was hoping anyone had suggestions how to do this?
Please let me know if you need more information.
Will you ever have multiple subjid in either data set? If so, this may not work.
*combine data sets - adding source variable;
data combined;
set DataA (in=A) DataB (in=B) indsname=source;
dataSource = source;
run;
*Sort to control the order of subjid based on input data source. May need to change if your data sets are not named A/B as in your example;
proc sort data=combined;
by subjid dataSource;
run;
data want;
set combined;
by subjid;
*keeps values throughout the subjID;
retain new_usubjid new_StudyID;
*sets first record as values that will then be carried through;
if first.subjid then do;
new_usubjid = usubjid;
new_studyID = studyID;
end;
run;
You should be able to derive the VSBFL from the dataSource variable via an IF statement.
@Dregerator wrote:
Hi,
I'm trying to create a master dataset from two different datasets. (Dataset A and Dataset B). However, Dataset A has different values for two variables and I want those values to match what I have in dataset B.
For example: Dataset A
STUDYID USUBJID SUBJID VSBLFL
C01347 C01347-36501 36501 Y
C01347 C01347-36502 36502 Y
C01347 C01347-36503 36503 Y
C01347 C01347-36504 36504 Y
C01347 C01347-36505 36505 Y
Dataset B
STUDYID USUBJID SUBJID
SUP-9005 SUP-9005-36501 36501
SUP-9005 SUP-9005-36502 36502
SUP-9005 SUP-9005-36503 36503
SUP-9005 SUP-9005-36504 36504
SUP-9005 SUP-9005-36505 36505
What i would like is Master dataset:
STUDYID USUBJID SUBJID VSBLFL
SUP-9005 SUP-9005-36501 36501 Y
SUP-9005 SUP-9005-36501 36501
SUP-9005 SUP-9005-36502 36502 Y
SUP-9005 SUP-9005-36502 36502
SUP-9005 SUP-9005-36503 36503 Y
SUP-9005 SUP-9005-36503 36503
SUP-9005 SUP-9005-36504 36504 Y
SUP-9005 SUP-9005-36504 36504
SUP-9005 SUP-9005-36505 36505 Y
SUP-9005 SUP-9005-36505 36505
I need baseline information from Dataset 1, however, it has different USUBJID's and STUDYID's. I know I could hardcode and change them, but I'm trying to data derive these values. Was hoping anyone had suggestions how to do this?
Please let me know if you need more information.
So, for every subjid found in A, you want to duplicate the observation in B, one with and one without a value in vsblfl?
Your "master" dataset does not have a value from dataset A, aside from those in vsblfl.
So this code creates your intended master dataset without even reading A:
data b;
input STUDYID $ USUBJID :$14. SUBJID :$5.;
datalines;
SUP-9005 SUP-9005-36501 36501
SUP-9005 SUP-9005-36502 36502
SUP-9005 SUP-9005-36503 36503
SUP-9005 SUP-9005-36504 36504
SUP-9005 SUP-9005-36505 36505
;
data master;
set B;
vsblfl = "Y";
output;
vsblfl = "";
output;
run;
proc print data=master noobs;
run;
Result:
STUDYID USUBJID SUBJID vsblfl SUP-9005 SUP-9005-36501 36501 Y SUP-9005 SUP-9005-36501 36501 SUP-9005 SUP-9005-36502 36502 Y SUP-9005 SUP-9005-36502 36502 SUP-9005 SUP-9005-36503 36503 Y SUP-9005 SUP-9005-36503 36503 SUP-9005 SUP-9005-36504 36504 Y SUP-9005 SUP-9005-36504 36504 SUP-9005 SUP-9005-36505 36505 Y SUP-9005 SUP-9005-36505 36505
So please tell us how the values in A influence the outcome.
Will you ever have multiple subjid in either data set? If so, this may not work.
*combine data sets - adding source variable;
data combined;
set DataA (in=A) DataB (in=B) indsname=source;
dataSource = source;
run;
*Sort to control the order of subjid based on input data source. May need to change if your data sets are not named A/B as in your example;
proc sort data=combined;
by subjid dataSource;
run;
data want;
set combined;
by subjid;
*keeps values throughout the subjID;
retain new_usubjid new_StudyID;
*sets first record as values that will then be carried through;
if first.subjid then do;
new_usubjid = usubjid;
new_studyID = studyID;
end;
run;
You should be able to derive the VSBFL from the dataSource variable via an IF statement.
@Dregerator wrote:
Hi,
I'm trying to create a master dataset from two different datasets. (Dataset A and Dataset B). However, Dataset A has different values for two variables and I want those values to match what I have in dataset B.
For example: Dataset A
STUDYID USUBJID SUBJID VSBLFL
C01347 C01347-36501 36501 Y
C01347 C01347-36502 36502 Y
C01347 C01347-36503 36503 Y
C01347 C01347-36504 36504 Y
C01347 C01347-36505 36505 Y
Dataset B
STUDYID USUBJID SUBJID
SUP-9005 SUP-9005-36501 36501
SUP-9005 SUP-9005-36502 36502
SUP-9005 SUP-9005-36503 36503
SUP-9005 SUP-9005-36504 36504
SUP-9005 SUP-9005-36505 36505
What i would like is Master dataset:
STUDYID USUBJID SUBJID VSBLFL
SUP-9005 SUP-9005-36501 36501 Y
SUP-9005 SUP-9005-36501 36501
SUP-9005 SUP-9005-36502 36502 Y
SUP-9005 SUP-9005-36502 36502
SUP-9005 SUP-9005-36503 36503 Y
SUP-9005 SUP-9005-36503 36503
SUP-9005 SUP-9005-36504 36504 Y
SUP-9005 SUP-9005-36504 36504
SUP-9005 SUP-9005-36505 36505 Y
SUP-9005 SUP-9005-36505 36505
I need baseline information from Dataset 1, however, it has different USUBJID's and STUDYID's. I know I could hardcode and change them, but I'm trying to data derive these values. Was hoping anyone had suggestions how to do this?
Please let me know if you need more information.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.