Hi guys, I have another #SASBeginners question. Well, maybe it's not a beginner's questions, but it's a question a beginner is having trouble answering.
I'm trying to update my original_dataset with data from another dataset ('data_to_add'), however I need to look at a flag and a numeric variable in the secondary dataset, to decide what variable to affect in the original_dataset.
In the contrived example below, the original_dataset has all PERSON_IDs and all variables initialised to null. In the data_to_add, we have numerous instances of each PERSON_ID, with an associated column number for each ob, however only one of those obs associated for that PERSON_ID has the KEEP_BL flag set to 'Y'. So each PERSON_ID in the data_to_add has only one 'Y'. In one instance, P0004, there is no flag set at all.
In this contrived example, I've used five columns, but could someone please show me how to do this for N columns? Thank you!
Ta,
JOE
original_dataset
PERSON_ID COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID
P0001 . . . . .
P0002 . . . . .
P0003 . . . . .
P0004 . . . . .
data_to_add
PERSON_ID COL_NO KEEP_BL
P0001 1 .
P0001 2 Y
P0001 2 .
P0001 4 .
P0002 5 .
P0002 3 Y
P0004 1 .
P0004 4 .
P0003 5 .
P0005 1 Y
P0003 1 Y
Desired_dataset:
PERSON_ID COL1_ID COL2_ID COL3_ID COL4_ID COL5_ID
P0001 . Y . . .
P0002 . . Y . .
P0003 Y . . . .
P0004 . . . . .
Looks like a case of using the UPDATE statement, but first you have to convert your transaction dataset to use the same variables as the master dataset.
Note that your transaction data has a record P0005 that is not in the master dataset. If you need to delete that then you might need extra code.
data original_dataset ;
length PERSON_ID $5 COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID $1;
input PERSON_ID COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID ;
cards;
P0001 . . . . .
P0002 . . . . .
P0003 . . . . .
P0004 . . . . .
run;
data data_to_add ;
length PERSON_ID $5 COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID $1;
length COL_NO 8 KEEP_BL $1 ;
input PERSON_ID COL_NO KEEP_BL ;
array cols col01_id -- col05_id ;
cols(col_no) = keep_bl ;
keep person_id -- col05_id;
cards;
P0001 1 .
P0001 2 Y
P0001 2 .
P0001 4 .
P0002 5 .
P0002 3 Y
P0004 1 .
P0004 4 .
P0003 5 .
P0005 1 Y
P0003 1 Y
run;
proc sort data=data_to_add;
by person_id;
run;
data want ;
update original_dataset data_to_add;
by person_id;
run;
PERSON_
Obs ID COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID
1 P0001 Y
2 P0002 Y
3 P0003 Y
4 P0004
5 P0005 Y
Looks like a case of using the UPDATE statement, but first you have to convert your transaction dataset to use the same variables as the master dataset.
Note that your transaction data has a record P0005 that is not in the master dataset. If you need to delete that then you might need extra code.
data original_dataset ;
length PERSON_ID $5 COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID $1;
input PERSON_ID COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID ;
cards;
P0001 . . . . .
P0002 . . . . .
P0003 . . . . .
P0004 . . . . .
run;
data data_to_add ;
length PERSON_ID $5 COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID $1;
length COL_NO 8 KEEP_BL $1 ;
input PERSON_ID COL_NO KEEP_BL ;
array cols col01_id -- col05_id ;
cols(col_no) = keep_bl ;
keep person_id -- col05_id;
cards;
P0001 1 .
P0001 2 Y
P0001 2 .
P0001 4 .
P0002 5 .
P0002 3 Y
P0004 1 .
P0004 4 .
P0003 5 .
P0005 1 Y
P0003 1 Y
run;
proc sort data=data_to_add;
by person_id;
run;
data want ;
update original_dataset data_to_add;
by person_id;
run;
PERSON_
Obs ID COL01_ID COL02_ID COL03_ID COL04_ID COL05_ID
1 P0001 Y
2 P0002 Y
3 P0003 Y
4 P0004
5 P0005 Y
Thanks for your response Tom! So it looks like I prep the transactional data first to essentially be in the same format (ie. variable names) as the original_dataset, and then run an update on the original_dataset with the data_to_add -- in a way, joining on the BY variable.
Quick follow up question: my actual data doesn't use PERSON_ID alone, and it actually uses a composite of two variables (PERSON_ID and SALE_ID) to effectively give the row/ob its 'primary-ness'. Can I say:
data want ;
update original_dataset data_to_add;
by person_id sale_id;
run;
instead? Will that work?
Ta,
JOE
Sure. Read up about UPDATE in the SAS manuals.
Thanks for the heads up!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.