BookmarkSubscribeRSS Feed
Jannie_D
Calcite | Level 5

I have a four very big (500 000+ observations) dataset. They all have the same variables but are from different years.

what happens when I use a merge statement to merge them.

like:

 

data DRGa;

merge DRG2014a DRG2015a DRG2016a DRG2017a;

by pnr;

run;

 

my worry is that data will disappear if there are repeated observations in the same variable.

for instance date of hospitalization. For some it might only have happened in 2014 but for some it might have happened several times. 

I guess I am wondering what happens to the data if there is repetetive observations over the years?

and if there is a better way to manage it?

3 REPLIES 3
Kurt_Bremser
Super User

Your problem is rather a problem of data "appearing" than of data "disappearing".

Run this for reference:

data a;
input id $ val1;
datalines;
A 1
A 2
A 3
;

data b;
input id $ val2;
datalines;
A 1
;

data c;

input id $ val3;
datalines;
A 5
A 6
;

data all;
merge a b c;
by id;
run;

So you need to make a decision how to treat data that would be "multiplied" to fill up the columns. If you want to have missing values in such a case, you can modify the MERGE:

data all;
if 0 then set a b c;
call missing (of _all_);
merge a b c;
by id;
run;

If, OTOH, you want build all possible combinations (called a cartesian join), you need to use SQL.

 

If you need more help, please post some illustrative example data (in data steps with datalines) and what you expect to get out of that.

Kurt_Bremser
Super User

Oh, wait, I see you said "They have the same variables".

Then you should not MERGE them, but stack them with a SET statement:

data DRGa;
set DRG2014a DRG2015a DRG2016a DRG2017a;
by pnr;
run;

This will result in a dataset where you have all yearly observations grouped by pnr.

Reeza
Super User
You need to stack your data, via an APPEND or SET not merge in this case. You may or may not need the BY statement but I'm guessing not really. If you want to identify which record comes from which data set look at the INDSNAME option.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1667 views
  • 0 likes
  • 3 in conversation