Solved: Re: Merging By Variable

JJ_83 · Posted 03-03-2025 02:12 PM

I'm not sure what's going on.

I have these two datasets

data have1;
input id value1 $;
cards;
1 a
2 a
;

data have2;
input id zipcode;
cards;
1 88888
1 99999
2 88888
2 99999
;

And I want this

data want;
input id value1 $ zipcode;
cards;
1 a 88888
1 a 99999
2 a 88888
2 a 99999
;

I tried merging the two datasets by id, but it is not repeating the way i want. I feel like I'm missing something obvious.

JJ_83 · Posted 03-03-2025 03:25 PM

proc sql;
    create table working.p42 as
    select a.*, b.zipcode
    from working.ucmr3 as a
    inner join working.ucmr3_zip as b
    on a.pwsid = b.pwsid;
quit;

Chat GPT helped me with this. This code worked.

View solution in original post

PaigeMiller · Posted 03-03-2025 02:57 PM

data want;
    merge have1 have2;
    by id;
run;

--
Paige Miller

SASJedi · Posted 03-03-2025 02:58 PM

Show your merge code, please. When I ran this:

data want;
	merge have1 have2;
	by id;
run;
proc print;
run;

The result was this:

Obs	id	value1	zipcode
1	1	a	88888
2	1	a	99999
3	2	a	88888
4	2	a	99999

Isn't that exactly what you wanted?

Check out my Jedi SAS Tricks for SAS Users

JJ_83 · Posted 03-03-2025 03:08 PM

For some reason that merge statement isn't doing what I want it to do.

I want each zipcode value to repeat for each pwsid entry, but it's not doing that.

JJ_83 · Posted 03-03-2025 03:25 PM

proc sql;
    create table working.p42 as
    select a.*, b.zipcode
    from working.ucmr3 as a
    inner join working.ucmr3_zip as b
    on a.pwsid = b.pwsid;
quit;

Chat GPT helped me with this. This code worked.

ballardw · Posted 03-03-2025 07:44 PM

You would have to provide examples of the data sets and the code.

You don't even tell us exactly what is wrong with that "example" data.

IF your "merge" was by PWSID then to get all Zipcodes duplicated from a merged data set then PWSID could only appear once in the base set and then have the value of PWSID with each desired Zip code in the other set. Once you have duplicates of a BY variable in two or more sets in a data step merge the results often get much more problematic because of the way the Merge statement works. Did you read the log? Did it have anything about "more than one data set with repeats of BY values" ? That warning tells you that you likely don't have the desired result because if there are N duplicates of a By variable valie in one set and M duplicates of the same value in the other the resulting set will generally have one of N or M resulting by variable values depending on the data sets.

Example:

Data ex1;
  input id  x $;
datalines;
1  a
1  b 
1  c
2  q
;

data ex2;
  input id y $;
datalines;
1   zz
1   yy
2   qq
2   hh
;

data combined;
  merge ex1 ex2;
  by id;
run;

Note that the output data set Combined will have 5 observation not the 8 you were wanting, if I understand: 3 values of Id=1 (not the 6 that a 3x2 cross would generate) and 2 values of Id=2.

Do note the LOG (emphasis added):

19   data combined;
20     merge ex1 ex2;
21     by id;
22   run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 4 observations read from the data set WORK.EX1.
NOTE: There were 4 observations read from the data set WORK.EX2.
NOTE: The data set WORK.COMBINED has 5 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

Also note that the single Id=2 did get both the other values added.

@JJ_83 wrote:

For some reason that merge statement isn't doing what I want it to do.

I want each zipcode value to repeat for each pwsid entry, but it's not doing that.

JJ_83 · Posted 03-04-2025 10:47 AM

I don't need more feedback, it's been solved. Thank you.

Tom · Posted 03-04-2025 10:44 AM

So that picture looks nothing like your original example. Which variables corresponds with ID and VALUE1 that you had in your example?

What do you mean you want ZIPCODE to "repeat"? In your picture there are many values or ZIPCODE already shown. How is it not repeating?

Note that your example showed a 1 to N match. Each observation in HAVE1 was merged with zero or more observations in HAVE2. That is what a MERGE can handle.

If you instead want an N to M match then you cannot use a MERGE. Instead you will need to preform a cartesian product, like what the INNER JOIN of SQL produces.

JJ_83 · Posted 03-04-2025 10:48 AM

I don't need more feedback, it's been solved. Thank you.

The 2025 SAS Hackathon has begun!