BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JJ_83
Obsidian | Level 7

I'm not sure what's going on.

 

I have these two datasets

 

data have1;
input id value1 $;
cards;
1 a
2 a
;
data have2;
input id zipcode;
cards;
1 88888
1 99999
2 88888
2 99999
;

 

 

And I want this

data want;
input id value1 $ zipcode;
cards;
1 a 88888
1 a 99999
2 a 88888
2 a 99999
;

 

I tried merging the two datasets by id, but it is not repeating the way i want. I feel like I'm missing something obvious.

1 ACCEPTED SOLUTION

Accepted Solutions
JJ_83
Obsidian | Level 7
proc sql;
    create table working.p42 as
    select a.*, b.zipcode
    from working.ucmr3 as a
    inner join working.ucmr3_zip as b
    on a.pwsid = b.pwsid;
quit; 

Chat GPT helped me with this. This code worked.

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26
data want;
    merge have1 have2;
    by id;
run;
--
Paige Miller
SASJedi
Ammonite | Level 13

Show your merge code, please. When I ran this:

 

data want;
	merge have1 have2;
	by id;
run;
proc print;
run;

The result was this:

 

Obs id value1 zipcode
1 1 a 88888
2 1 a 99999
3 2 a 88888
4 2 a 99999

 

Isn't that exactly what you wanted?

 

Check out my Jedi SAS Tricks for SAS Users
JJ_83
Obsidian | Level 7

For some reason that merge statement isn't doing what I want it to do.

 

I want each zipcode value to repeat for each pwsid entry, but it's not doing that.

 

JJ_83_0-1741032467507.png

 

JJ_83
Obsidian | Level 7
proc sql;
    create table working.p42 as
    select a.*, b.zipcode
    from working.ucmr3 as a
    inner join working.ucmr3_zip as b
    on a.pwsid = b.pwsid;
quit; 

Chat GPT helped me with this. This code worked.

ballardw
Super User

You would have to provide examples of the data sets and the code.

You don't even tell us exactly what is wrong with that "example" data.

IF your "merge" was by PWSID then to get all Zipcodes duplicated from a merged data set then PWSID could only appear once in the base set and then have the value of PWSID with each desired Zip code in the other set. Once you have duplicates of a BY variable in two or more sets in a data step merge the results often get much more problematic because of the way the Merge statement works. Did you read the log? Did it have anything about "more than one data set with repeats of BY values" ? That warning tells you that you likely don't have the desired result because if there are   N duplicates of a By variable valie in one set and M duplicates of the same value in the other the resulting set will generally have one of N or M resulting by variable values depending on the data sets.

 

Example:

Data ex1;
  input id  x $;
datalines;
1  a
1  b 
1  c
2  q
;

data ex2;
  input id y $;
datalines;
1   zz
1   yy
2   qq
2   hh
;

data combined;
  merge ex1 ex2;
  by id;
run;

Note that the output data set Combined will have 5 observation not the 8 you were wanting, if I understand: 3 values of Id=1 (not the 6 that a 3x2 cross would generate) and 2 values of Id=2.

Do note the LOG (emphasis added):

19   data combined;
20     merge ex1 ex2;
21     by id;
22   run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 4 observations read from the data set WORK.EX1.
NOTE: There were 4 observations read from the data set WORK.EX2.
NOTE: The data set WORK.COMBINED has 5 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

Also note that the single Id=2 did get both the other values added.

 


@JJ_83 wrote:

For some reason that merge statement isn't doing what I want it to do.

 

I want each zipcode value to repeat for each pwsid entry, but it's not doing that.

 

JJ_83_0-1741032467507.png

 


 

JJ_83
Obsidian | Level 7

I don't need more feedback, it's been solved. Thank you.

Tom
Super User Tom
Super User

So that picture looks nothing like your original example.  Which variables corresponds with ID and VALUE1 that you had in your example?

 

What do you mean you want ZIPCODE to "repeat"?  In your picture there are many values or ZIPCODE already shown.  How is it not repeating?

 

Note that your example showed a 1 to N match.  Each observation in HAVE1 was merged with zero or more observations in HAVE2.  That is what a MERGE can handle.

 

If you instead want an N to M match then you cannot use a MERGE.  Instead you will need to preform a cartesian product, like what the INNER JOIN of SQL produces.

JJ_83
Obsidian | Level 7

I don't need more feedback, it's been solved. Thank you.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2234 views
  • 12 likes
  • 5 in conversation