BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wagdy
Obsidian | Level 7

NO CONTRY 1 & CONTRY 4 IN DATA YAYA ??

 

MERGED TWO FILES BUT THE NEW FILE OMTING MANY OBSERVATIONS

8 LIBNAME XXX 'c:\users\wagdy';

NOTE: Libref XXX was successfully assigned as follows:

     Engine:       V9

     Physical Name: c:\users\wagdy

129   DATA XXX.YAYA;

130   MERGE XXX.orignal4 XXX.KARIMI2; BY SS_; RUN;

 

NOTE: There were 856 observations read from the data set XXX.ORIGNAL4.

NOTE: There were 388 observations read from the data set XXX.KARIMI2.

 

NOTE: The data set XXX.YAYA has 857 observations and 402 variables.

 

NOTE: DATA statement used (Total process time):

     real time           0.06 seconds

     cpu time           0.03 seconds

 

 

WHEY THE TOTAL IN XXX.YAYA IS 857 NOT 1244? REGARDS. WAGDY

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

As @Tom suggests, you appear to want to interleave records from 2 data sets instead of match-merging them.

 

I.e. you want the analog of the second program below:

 

data mostrecent_asof_2018OCT;
  merge mostrecent_asof_2018SEP  updates_for_2018OCT;
  by id;
run;


** versus **;
data complete_history_through_2018OCT;
  set complete_history_through_2018SEP upodates_for_2018OCT;
  by id;
run;

 

The first program REPLACES data in mostrecent_asof_2018SEP (the "old master") with matching observations in updates_for_2018OCT (the "update data set"), and only increases the number of obs in the resulting dataset for (1) records in the update data set having id's absent from the old master, and (2) instances in which a given ID has more records in the update data set than in the old master.  BTW, condition (2) is usually a reason why you would only want to use this approach when there is never more than one record per BY-variable (or per list of BY variables) in both data sets.

 

 

The second program INTERLEAVES observations, your apparent goal.  So the new number of observations exactly equals the sum of the number of obs in the incoming data sets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

@wagdy  how many records do you think you should have?

 

129   DATA XXX.YAYA;

130   MERGE XXX.orignal4 XXX.KARIMI2; BY SS_; RUN;

 

NOTE: There were 856 observations read from the data set XXX.ORIGNAL4.

NOTE: There were 388 observations read from the data set XXX.KARIMI2.

 

NOTE: The data set XXX.YAYA has 857 observations and 402 variables.

try this is that what you are expecting?  

DATA XXX.YAYA; 
MERGE XXX.orignal4 (in=ina)
XXX.KARIMI2; 
BY SS_; 
if ina;
RUN;

 

I don't think you want to do this but if you are just trying to stack the 2 datasets together 

data xxx.yaya;
set xxx.orignal4 xxx.karimi2;
run;
proc sort data=xxx.yaya;
by SS_;
run;

 

PaigeMiller
Diamond | Level 26

There's no way we can answer your question about omitting observations unless we see your code and a reasonable portion of your data.

 

Click on the running man icon and paste your code in that window. Show us a portion of your data via creating SAS code to create the data sets, as shown here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

--
Paige Miller
saslove
Quartz | Level 8

1. Merge combines two observations based on common grounds. Here, make sure you are sorting the "by" variable _ss.

 

But, since it's merging ok without any errors, you are okay there.

 

Step 2: The number of observations is based on the "common variables". So, it keeps only that's common between the two. Please use in=dataset in your merge statement to make sure you end up with what you want.

 

Step 3: There are several resources online to understand the difference between set and merge. Basically, set combines both datasets together (doesn't need common "by" variable) and puts them on top of one another. This will result in what you are looking for. But make sure that's what you need.

Tom
Super User Tom
Super User

WHY THE TOTAL IN XXX.YAYA IS 857 NOT 1244? 

Because you MERGEd the datasets instead of just stacking them.

NOTE: There were 856 observations read from the data set XXX.ORIGNAL4.
NOTE: There were 388 observations read from the data set XXX.KARIMI2.

NOTE: The data set XXX.YAYA has 857 observations and 402 variables.

It looks like there was one value of SS_ in KARIMI2 that was not in ORIGNAL4 so the number of records increased by one. Another posibility is that one of the values of SS_ in KARIMI2 had one more observation than that value of SS_ had in ORIGNAL4.

 

If you just want to stack the datasets together instead of merging them then use the SET statement.

data xxx.yaya;
  set xxx.orignal4 xxx.karimi2;
  by ss_;
run;

 

 

 

mkeintz
PROC Star

As @Tom suggests, you appear to want to interleave records from 2 data sets instead of match-merging them.

 

I.e. you want the analog of the second program below:

 

data mostrecent_asof_2018OCT;
  merge mostrecent_asof_2018SEP  updates_for_2018OCT;
  by id;
run;


** versus **;
data complete_history_through_2018OCT;
  set complete_history_through_2018SEP upodates_for_2018OCT;
  by id;
run;

 

The first program REPLACES data in mostrecent_asof_2018SEP (the "old master") with matching observations in updates_for_2018OCT (the "update data set"), and only increases the number of obs in the resulting dataset for (1) records in the update data set having id's absent from the old master, and (2) instances in which a given ID has more records in the update data set than in the old master.  BTW, condition (2) is usually a reason why you would only want to use this approach when there is never more than one record per BY-variable (or per list of BY variables) in both data sets.

 

 

The second program INTERLEAVES observations, your apparent goal.  So the new number of observations exactly equals the sum of the number of obs in the incoming data sets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 939 views
  • 3 likes
  • 6 in conversation