BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello, 

I am trying to find a way to combine two datasets efficiently.   

  • Dataset A contains around 1000 variables (Numeric and Characters) and 20000 obs.
  • Dataset B contains around 50 variables (Numeric and Characters) and 5000 obs.

Both A and B have the same 5000 IDs.  B has 49 columns with the same names as A, and one extra 'Record_updated' column.  In addition, B has all the updated records in those 49 columns.  I would like to combine A and B with a result, that contains 1001 columns and 20000 obs,  including 49 updated B dataset columns, and a new 'Record_updated' column.  Please advise how to approach it, thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here is an example of a data step update with data set you should have that changes the values of some of the variables and adds an "updated" column, which you didn't bother to describe very well.

 

data work.base;
   set sashelp.class;
run;

data work.update;
   input name :$8. age height updated;
datalines;
Alfred   16  73 1
Jane     14  60.5 1
;

data new;
   update work.base
          work.update
   ;
   by name;
run; 

Limitations on this approach:

The base data set can have exactly 1 record with the by variable value(s).

Both the base data set, work.class in this case, and the update set must be sorted in the order of the by variables as usual for the by to work.

Any new variables in the update set are added to the result. By default, missing values in the Update set do not replace non-missing values in the base data set. If you need that behavior use the option UPDATEMODE=NOMISSINGCHECK on the update statement.

 

If you data does not match the description above then you need to provide a better description of the data and possibly the expected result.

View solution in original post

8 REPLIES 8
ballardw
Super User

Data step or sql update

PaigeMiller
Diamond | Level 26

Both A and B have the same 5000 IDs.

 

How is this combine supposed to work? It sounds like you want a many-to-one combine based upon ID, but you haven't actually stated that. Its an assumption and I'd like to know if it is a correct assumption.

--
Paige Miller
ybz12003
Rhodochrosite | Level 12
The A's 20000 IDs have B's 5000 IDs.
PaigeMiller
Diamond | Level 26

@ybz12003 wrote:
The A's 20000 IDs have B's 5000 IDs.

You have not answered my question. How is this "combine" supposed to happen? Is it many-to-one, or something else? Be specific. Provide details.

--
Paige Miller
ybz12003
Rhodochrosite | Level 12

I expect the final combined dataset C = 20000 IDs (A) = 5000 IDs (B) + 15000 extra IDs

ybz12003
Rhodochrosite | Level 12
Also,
the final combined dataset C = 1001 columns = 1950 old columns (A) + 49 updated infor. columns (B) + 1 'Record_updated' column (B)
ballardw
Super User

Here is an example of a data step update with data set you should have that changes the values of some of the variables and adds an "updated" column, which you didn't bother to describe very well.

 

data work.base;
   set sashelp.class;
run;

data work.update;
   input name :$8. age height updated;
datalines;
Alfred   16  73 1
Jane     14  60.5 1
;

data new;
   update work.base
          work.update
   ;
   by name;
run; 

Limitations on this approach:

The base data set can have exactly 1 record with the by variable value(s).

Both the base data set, work.class in this case, and the update set must be sorted in the order of the by variables as usual for the by to work.

Any new variables in the update set are added to the result. By default, missing values in the Update set do not replace non-missing values in the base data set. If you need that behavior use the option UPDATEMODE=NOMISSINGCHECK on the update statement.

 

If you data does not match the description above then you need to provide a better description of the data and possibly the expected result.

ybz12003
Rhodochrosite | Level 12
I didn't use your code but would like to try the 'Update' statement to compare mine to see if there is any difference. Although my codes had ten more steps, I feel safer getting a dataset I want.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 774 views
  • 1 like
  • 3 in conversation