Updating dataset with duplicates

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 18
Accepted Solution

Updating dataset with duplicates

I want to Merge or Update these two files with duplicate “by variable”. Assume that there are 100 addition variables and 1,000 members with different Member_ID and each has duplicate observations.  How can you merge the two files?

Author once helped me with similar stuff, but I think it was just tailored for then particular incident with few variables.

Note: I am a beginner and not familiar with Macros. Please avoid macro codes

Dataset_1
Member_IDNameLasnameproduct_namequantityProduceCodeDate
1001zzxxtt2001001/1/2012
1001zzxxtt-122001/2/2012
1001zzxxnn-103001/3/2012
1001zzxxnn2001001/4/2012
1001zzxxnn2002001/5/2012
1001zzxxnn2005001/6/2012
1002yysstt1001005/10/2012
1002yysstt4002005/10/2012
1002yyssnn1003005/10/2012
1002yyssnn1001005/10/2012
1002yyssnn1002005/10/2012
1002yyssnn1005005/10/2012

Dataset_2
Member_IDNameGenderAGEContry_of originCountryregionEthnicFamily size
1001zzm40IndiaUSABambaW10
1002yyf30CanadaMexicoCanaB50

Merged _Dataset_1_2
Member_IDNameLasnameproduct_namequantityProduceCodeDateGenderAGEContry_of originCountryregionEthnicFamily size
1001zzxxtt2001001/1/2012m40IndiaUSABambaW10
1001zzxxtt-122001/2/2012m40IndiaUSABambaW10
1001zzxxnn-103001/3/2012m40IndiaUSABambaW10
1001zzxxnn2001001/4/2012m40IndiaUSABambaW10
1001zzxxnn2002001/5/2012m40IndiaUSABambaW10
1001zzxxnn2005001/6/2012m40IndiaUSABambaW10
1002yysstt1001005/10/2012f30CanadaMexicoCanaB50
1002yysstt4002005/10/2012f30CanadaMexicoCanaB50
1002yyssnn1003005/10/2012f30CanadaMexicoCanaB50
1002yyssnn1001005/10/2012f30CanadaMexicoCanaB50
1002yyssnn1002005/10/2012f30CanadaMexicoCanaB50
1002yyssnn1005005/10/2012f30CanadaMexicoCanaB50

Data Merged _Dataset_1_2;

Merge Dataset_1 Dataset_2;

By Member_ID;

Run;


Accepted Solutions
Solution
‎08-06-2012 02:27 PM
PROC Star
Posts: 7,356

Re: Updating dataset with duplicates

As long as you use correct variable names (i.e., make sure they don't include spaces like the ones shown in your example), and your files are sorted by your by variable(s), your code should work.  However, since name is on both files, I'd include it it as a by variable.  e.g.:

Data Merged_Dataset_1_2;

  Merge Dataset_1 Dataset_2;

  By Member_ID name;

Run;

View solution in original post


All Replies
Solution
‎08-06-2012 02:27 PM
PROC Star
Posts: 7,356

Re: Updating dataset with duplicates

As long as you use correct variable names (i.e., make sure they don't include spaces like the ones shown in your example), and your files are sorted by your by variable(s), your code should work.  However, since name is on both files, I'd include it it as a by variable.  e.g.:

Data Merged_Dataset_1_2;

  Merge Dataset_1 Dataset_2;

  By Member_ID name;

Run;

PROC Star
Posts: 7,356

Re: Updating dataset with duplicates

Your code won't work.  You create table2 before you even create the file.  Additionally, why do you only sort the first dataset by id and not name as well?

What are you trying to accomplish with the update statement?

Respected Advisor
Posts: 3,124

Re: Updating dataset with duplicates

First I have to agree with Art, I have no clue on your purpose. I guess you are just playing 'set, merge, update and modify', it is fun, isn't? By default, Update only outputs after by variable is done, if you need it somehow act like 'merge', you have to order your output explicitly.  See below:

data Dataset_1;

input (Member_ID    Name    Lasname    product_name) (:$8.)    quantity    ProduceCode :$    Date mmddyy10.;

cards;

1001    zz    xx    tt    200    100    1/1/2012

1001    zz    xx    tt    -12    200    1/2/2012

1001    zz    xx    nn    -10    300    1/3/2012

1001    zz    xx    nn    200    100    1/4/2012

1001    zz    xx    nn    200    200    1/5/2012

1001    zz    xx    nn    200    500    1/6/2012

1002    yy    ss    tt    100    100    5/10/2012

1002    yy    ss    tt    400    200    5/10/2012

1002    yy    ss    nn    100    300    5/10/2012

1002    yy    ss    nn    100    100    5/10/2012

1002    yy    ss    nn    100    200    5/10/2012

1002    yy    ss    nn    100    500    5/10/2012

;

data Dataset_2;

input (Member_ID    Name    Gender) (:$) AGE    (Contry_of_origin    Country region    Ethnic) (:$)    Family_size;

cards;

1001    zz    m    40    India    USA    Bamba    W    10

1002    yy    f    30    Canada    Mexico    Cana    B    50

;

/*merge*/

data want_merge;

merge  dataset_2 dataset_1 ;

by member_id;

run;

/*update*/

data want;

update dataset_2 dataset_1 ;

by member_id;

output; /*try to omit this one and see what happens*/

run;

Haikuo

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 239 views
  • 0 likes
  • 3 in conversation