Hello,
I am trying to find a way to combine two datasets efficiently.
Both A and B have the same 5000 IDs. B has 49 columns with the same names as A, and one extra 'Record_updated' column. In addition, B has all the updated records in those 49 columns. I would like to combine A and B with a result, that contains 1001 columns and 20000 obs, including 49 updated B dataset columns, and a new 'Record_updated' column. Please advise how to approach it, thanks.
Here is an example of a data step update with data set you should have that changes the values of some of the variables and adds an "updated" column, which you didn't bother to describe very well.
data work.base; set sashelp.class; run; data work.update; input name :$8. age height updated; datalines; Alfred 16 73 1 Jane 14 60.5 1 ; data new; update work.base work.update ; by name; run;
Limitations on this approach:
The base data set can have exactly 1 record with the by variable value(s).
Both the base data set, work.class in this case, and the update set must be sorted in the order of the by variables as usual for the by to work.
Any new variables in the update set are added to the result. By default, missing values in the Update set do not replace non-missing values in the base data set. If you need that behavior use the option UPDATEMODE=NOMISSINGCHECK on the update statement.
If you data does not match the description above then you need to provide a better description of the data and possibly the expected result.
Data step or sql update
Both A and B have the same 5000 IDs.
How is this combine supposed to work? It sounds like you want a many-to-one combine based upon ID, but you haven't actually stated that. Its an assumption and I'd like to know if it is a correct assumption.
@ybz12003 wrote:
The A's 20000 IDs have B's 5000 IDs.
You have not answered my question. How is this "combine" supposed to happen? Is it many-to-one, or something else? Be specific. Provide details.
I expect the final combined dataset C = 20000 IDs (A) = 5000 IDs (B) + 15000 extra IDs
Here is an example of a data step update with data set you should have that changes the values of some of the variables and adds an "updated" column, which you didn't bother to describe very well.
data work.base; set sashelp.class; run; data work.update; input name :$8. age height updated; datalines; Alfred 16 73 1 Jane 14 60.5 1 ; data new; update work.base work.update ; by name; run;
Limitations on this approach:
The base data set can have exactly 1 record with the by variable value(s).
Both the base data set, work.class in this case, and the update set must be sorted in the order of the by variables as usual for the by to work.
Any new variables in the update set are added to the result. By default, missing values in the Update set do not replace non-missing values in the base data set. If you need that behavior use the option UPDATEMODE=NOMISSINGCHECK on the update statement.
If you data does not match the description above then you need to provide a better description of the data and possibly the expected result.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.