Solved: MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

wagdy · Posted 11-23-2018 11:18 AM

NO CONTRY 1 & CONTRY 4 IN DATA YAYA ??

MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

8 LIBNAME XXX 'c:\users\wagdy';

NOTE: Libref XXX was successfully assigned as follows:

Engine: V9

Physical Name: c:\users\wagdy

129 DATA XXX.YAYA;

130 MERGE XXX.orignal4 XXX.KARIMI2; BY SS_; RUN;

NOTE: There were 856 observations read from the data set XXX.ORIGNAL4.

NOTE: There were 388 observations read from the data set XXX.KARIMI2.

NOTE: The data set XXX.YAYA has 857 observations and 402 variables.

NOTE: DATA statement used (Total process time):

real time 0.06 seconds

cpu time 0.03 seconds

WHEY THE TOTAL IN XXX.YAYA IS 857 NOT 1244? REGARDS. WAGDY

mkeintz · Posted 11-23-2018 01:22 PM

As @Tom suggests, you appear to want to interleave records from 2 data sets instead of match-merging them.

I.e. you want the analog of the second program below:

data mostrecent_asof_2018OCT;
  merge mostrecent_asof_2018SEP  updates_for_2018OCT;
  by id;
run;


** versus **;
data complete_history_through_2018OCT;
  set complete_history_through_2018SEP upodates_for_2018OCT;
  by id;
run;

The first program REPLACES data in mostrecent_asof_2018SEP (the "old master") with matching observations in updates_for_2018OCT (the "update data set"), and only increases the number of obs in the resulting dataset for (1) records in the update data set having id's absent from the old master, and (2) instances in which a given ID has more records in the update data set than in the old master. BTW, condition (2) is usually a reason why you would only want to use this approach when there is never more than one record per BY-variable (or per list of BY variables) in both data sets.

The second program INTERLEAVES observations, your apparent goal. So the new number of observations exactly equals the sum of the number of obs in the incoming data sets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

VDD · Posted 11-23-2018 11:31 AM

@wagdy how many records do you think you should have?

129 DATA XXX.YAYA;

130 MERGE XXX.orignal4 XXX.KARIMI2; BY SS_; RUN;

NOTE: There were 856 observations read from the data set XXX.ORIGNAL4.

NOTE: There were 388 observations read from the data set XXX.KARIMI2.

NOTE: The data set XXX.YAYA has 857 observations and 402 variables.

try this is that what you are expecting?

DATA XXX.YAYA; 
MERGE XXX.orignal4 (in=ina)
XXX.KARIMI2; 
BY SS_; 
if ina;
RUN;

I don't think you want to do this but if you are just trying to stack the 2 datasets together

data xxx.yaya;
set xxx.orignal4 xxx.karimi2;
run;
proc sort data=xxx.yaya;
by SS_;
run;

PaigeMiller · Posted 11-23-2018 11:39 AM

There's no way we can answer your question about omitting observations unless we see your code and a reasonable portion of your data.

Click on the running man icon and paste your code in that window. Show us a portion of your data via creating SAS code to create the data sets, as shown here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

--
Paige Miller

saslove · Posted 11-23-2018 12:05 PM

1. Merge combines two observations based on common grounds. Here, make sure you are sorting the "by" variable _ss.

But, since it's merging ok without any errors, you are okay there.

Step 2: The number of observations is based on the "common variables". So, it keeps only that's common between the two. Please use in=dataset in your merge statement to make sure you end up with what you want.

Step 3: There are several resources online to understand the difference between set and merge. Basically, set combines both datasets together (doesn't need common "by" variable) and puts them on top of one another. This will result in what you are looking for. But make sure that's what you need.

Tom · Posted 11-23-2018 12:07 PM

WHY THE TOTAL IN XXX.YAYA IS 857 NOT 1244?

Because you MERGEd the datasets instead of just stacking them.

NOTE: There were 856 observations read from the data set XXX.ORIGNAL4.
NOTE: There were 388 observations read from the data set XXX.KARIMI2.

NOTE: The data set XXX.YAYA has 857 observations and 402 variables.

It looks like there was one value of SS_ in KARIMI2 that was not in ORIGNAL4 so the number of records increased by one. Another posibility is that one of the values of SS_ in KARIMI2 had one more observation than that value of SS_ had in ORIGNAL4.

If you just want to stack the datasets together instead of merging them then use the SET statement.

data xxx.yaya;
  set xxx.orignal4 xxx.karimi2;
  by ss_;
run;

mkeintz · Posted 11-23-2018 01:22 PM

As @Tom suggests, you appear to want to interleave records from 2 data sets instead of match-merging them.

I.e. you want the analog of the second program below:

data mostrecent_asof_2018OCT;
  merge mostrecent_asof_2018SEP  updates_for_2018OCT;
  by id;
run;


** versus **;
data complete_history_through_2018OCT;
  set complete_history_through_2018SEP upodates_for_2018OCT;
  by id;
run;

The first program REPLACES data in mostrecent_asof_2018SEP (the "old master") with matching observations in updates_for_2018OCT (the "update data set"), and only increases the number of obs in the resulting dataset for (1) records in the update data set having id's absent from the old master, and (2) instances in which a given ID has more records in the update data set than in the old master. BTW, condition (2) is usually a reason why you would only want to use this approach when there is never more than one record per BY-variable (or per list of BY variables) in both data sets.

The second program INTERLEAVES observations, your apparent goal. So the new number of observations exactly equals the sum of the number of obs in the incoming data sets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

Re: MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

Re: MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

Re: MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

Re: MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

Re: MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

Re: MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!