BookmarkSubscribeRSS Feed
wheddingsjr
Pyrite | Level 9

Hi all. I am trying to duplicate SAS logic and get the same results that some else did previously merging two datasets. The first dataset (HPHC_EOL_FY20), has 85k lines and the second (EOL_COHORT), has 135 lines. In the original results that I am trying to duplicate the final result has 55k lines, when I run the below code I get 130k. I am no sure what I am doing wrong. Any advise would be appreciated.

PROC SQL;
create table HPHC_FY20_EOL_combined_v2 as
	select med.*,eol.*
		from Dataset1 as med
				left join
			 Dataset2 as eol
			 	on med.member_id=eol.INSUREDSIDNUMBER
		;
Quit;
13 REPLIES 13
Jagadishkatam
Amethyst | Level 16

Without looking at the data, it is difficult to predict. However i assume the previous merge was done if both datasets had med.member_id=eol.INSUREDSIDNUMBER, considering this i would try inner join instead if left join. please try and let me know if it works

 

PROC SQL;
create table HPHC_FY20_EOL_combined_v2 as
	select med.*,eol.*
		from Dataset1 as med
				inner join
			 Dataset2 as eol
			 	on med.member_id=eol.INSUREDSIDNUMBER
		;
Quit;
Thanks,
Jag
wheddingsjr
Pyrite | Level 9
Thanks for the response Jag. I tried the inner join and got the same results. I think I have to delete dupes, but not sure how to do that.
Kurt_Bremser
Super User

@wheddingsjr wrote:
Thanks for the response Jag. I tried the inner join and got the same results. I think I have to delete dupes, but not sure how to do that.
proc sort
  data=dataset2
  out=dedup
  nodupkey
;
by INSUREDSIDNUMBER;
run;
wheddingsjr
Pyrite | Level 9
Thanks Kurt for the response, but that did not work
Kurt_Bremser
Super User

Please describe in detail what happened. "Did not work" is not helpful at all, and worthy of the proverbial blonde secretary.

  • how many observations are in the first dataset?
  • how many are in the second dataset before and after deduplicating (read the log)?
  • how many are in the result?

Mind that a left join will ALWAYS have at least the number of observations of the "left" dataset. To build the intersection set, you have to use an inner join.

wheddingsjr
Pyrite | Level 9
Hi Kurt

This is what I included in my original post:

Hi all. I am trying to duplicate SAS logic and get the same results that some else did previously merging two datasets. The first dataset (HPHC_EOL_FY20), has 85k lines and the second (EOL_COHORT), has 135 lines. In the original results that I am trying to duplicate the final result has 55k lines, when I run the below code I get 130k. I am no sure what I am doing wrong. Any advise would be appreciated.
wheddingsjr
Pyrite | Level 9
Hi Kurt

This the log:
NOTE: There were 136 observations read from the data set WORK.dataset2.
NOTE: 33 observations with duplicate key values were deleted.
NOTE: The data set WORK.DEDUP has 103 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
Kurt_Bremser
Super User

And how many observations resulted from the join of the large table with your deduped dataset?

Do you need the deduped dataset to create a subset of the large table, or should all observations from the large table be kept?

wheddingsjr
Pyrite | Level 9
There were still 130k lines..What I believe needs to happen is to delete all the dupes from the 130k
wheddingsjr
Pyrite | Level 9
138 PROC SQL;
139 create table HPHC_FY20_EOL_combined_v2 as
140 select med.*,eol.*
141 from HPHC_EOL_FY20 as med
142 inner join
143 EOL_Cohort as eol
144 on med.member_id = eol.INSUREDSIDNUMBER
145 ;
NOTE: Table WORK.HPHC_FY20_EOL_COMBINED_V2 created, with 130170 rows and 35 columns.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2238 views
  • 0 likes
  • 3 in conversation