Solved: Joing/ Merge/ Append??

jenim514 · Posted 02-12-2016 04:15 PM

Hi! I am joing to join/merge/ append...not sure which!?

I have five tables, and I want to merge all table- all rows and all variables. There are some variables that overlap, but some variables that are unique to that table. For example, PatientID is in all tables, but varialbe NDC is only in one table that I am merging (but I need NDC populated in the final table with any missing values assigned to missing). I need all rows and all variables in the final table...with values assigned missing if variable was not in the merged table. Make sense?

I've tried append... but some many values assigned to missing when they should have been populated.

Proc datasets force;

Append base=asthma.FULLDX_FY11to14_IP_OP data=tedi.fulldx_FY11to14_MIL_crgF;

Append base=asthma.FULLDX_FY11to14_IP_OP data= tedni.ALLDX_OPER_FY11to14_MIL_CRGF;

Append base= asthma.FULLDX_FY11to14_IP_OP data= caper.ALLDX_OPER_FY11to14_MIL_CRGF;

Append base=asthma.FULLDX_FY11to14_IP_OP data=sidr.fulldx_FY11to14_MIL_crgF;

run;

I've tried UNION, UNION ALL...both gave tons of errors and did not process.

I tried MERGE, but did not include a BY statment. SHould I have?

TIA for your help!!

FreelanceReinh · Posted 02-12-2016 05:21 PM

Hello @jenim514,

You haven't mentioned the SET statement yet, although the result of a data step of the form shown below would be similar to what you have described as your intended result.

data want;
set have1
    have2
    have3
    have4
    have5;
run;

Dataset WANT would contain the data from all rows of HAVE1 - HAVE5. The sum of the five numbers of observations of HAVE1 - HAVE5 would equal the number of observations in WANT.
Dataset WANT would contain all variables found in HAVE1 - HAVE5. Common variables would be aligned. So, the number of variables in WANT would be less than the sum of the five individual numbers of variables.
Variables occurring only in some of the five input datasets would be assigned missing values in the observations coming from the other datasets.

Of course, the above data step could be enhanced. For example, if HAVE1 - HAVE5 were sorted by PatientID, the statement by PatientID; could be inserted in order to keep observations from the same patient together. A variable SOURCE could be introduced indicating which of the five input datasets contributed the respective observation. ...

That said, the main question is: What will be the purpose of the new dataset?

This question should govern the choice of the most appropriate concatenation/interleaving/merging technique.

For example: If both HAVE 2 and HAVE4 contribute one observation from PatientID=1234, would it make sense to have these combined into one observation of WANT?

View solution in original post

ballardw · Posted 02-12-2016 05:05 PM

Append in a number of forms such as Proc Append as well as Proc datasets, Union and data step SET stack the data "vertically" as in each row of each contributing set is added sequentially.

It appears that you likely want a MERGE (data step) or JOIN (Proc Sql).

Since you have multiple tables the orders of steps can be very important. It would help if you could show a few rows of each data set and exactly what the output data set should look like as well as any known rules for which "overlapping" varibles end up in the final data.

It sounds like PatientId is likely to be a key in tying things together but we need to know if it appears multiple times in any of the contributing data sets. If so, then we need to know what rule there may be for matching the different rows in one set to those in another.

FreelanceReinh · Posted 02-12-2016 05:21 PM

Hello @jenim514,

You haven't mentioned the SET statement yet, although the result of a data step of the form shown below would be similar to what you have described as your intended result.

data want;
set have1
    have2
    have3
    have4
    have5;
run;

Dataset WANT would contain the data from all rows of HAVE1 - HAVE5. The sum of the five numbers of observations of HAVE1 - HAVE5 would equal the number of observations in WANT.
Dataset WANT would contain all variables found in HAVE1 - HAVE5. Common variables would be aligned. So, the number of variables in WANT would be less than the sum of the five individual numbers of variables.
Variables occurring only in some of the five input datasets would be assigned missing values in the observations coming from the other datasets.

Of course, the above data step could be enhanced. For example, if HAVE1 - HAVE5 were sorted by PatientID, the statement by PatientID; could be inserted in order to keep observations from the same patient together. A variable SOURCE could be introduced indicating which of the five input datasets contributed the respective observation. ...

That said, the main question is: What will be the purpose of the new dataset?

This question should govern the choice of the most appropriate concatenation/interleaving/merging technique.

For example: If both HAVE 2 and HAVE4 contribute one observation from PatientID=1234, would it make sense to have these combined into one observation of WANT?

Joing/ Merge/ Append??

Re: Joing/ Merge/ Append??

Re: Joing/ Merge/ Append??

Re: Joing/ Merge/ Append??

Joing/ Merge/ Append??

Re: Joing/ Merge/ Append??

Re: Joing/ Merge/ Append??

Re: Joing/ Merge/ Append??

Registration is open

SAS Training: Just a Click Away