BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jdr1
Obsidian | Level 7

Dear SAS community, 

 

I would like to merge two datasets: Daily ED visits per county

 

Dataset #1 (ED.EDallcounty): ED visits (50million records for 10 years)- I have daily ED visits for a number of individuals per county for multiple years. The following variables include date, FIPS ID, Mental Health ED visit (1=yes, 0=no), unique visit ID, and county label that matches FIPS ID.

 

Date               ID         MH   VisitID    County

01/01/2009   37097    1       1xxxxxxx  Iredell

01/01/2009   37097    1       1xxxxxxx  Iredell

01/01/2009   37125    1       1xxxxxxx  Moore

01/01/2009   37097    1       1xxxxxxx  Iredell

...

 

Dataset #2 (ED.Tempall): Temp data - Daily temperature data for each county for the year over multiple years. Variables include date of county temperature recording, unique FIPS code for each county, daily average temperature, max temper, and minimum temperature. 

Date               ID          Tavg .  Tmax .  Tmin 

01/01/2009 .   31001    1.5 .  4.9 .        -8.2

01/02/2009 .   31001    1.2 .  4.9 .        -8.2

01/03/2009 .   31001    1.3 .  4.7 .        -8.2

01/04/2009 .   31001    1.1 .  4.2 .        -8.2

01/05/2009 .   31001    1.7 .  4.9 .        -8.2

....

 

I've tried merging both datasets by ID (unique identifier for each county) and date, but am doing something wrong. 

 

Proc sort data =ED.EDallcounty; by ID date; run;

Proc sort data =ED.Tempall; by ID date; run;

Data ED.EDTempAll;

merge ED.EDallcounty (in=a)  ED.Tempall (in=b);

by id date;

if a and b;

run;

 

From SAS log: 

 

NOTE: There were 47274662 observations read from the data set ED.EDALLCOUNTY.

NOTE: There were 783790 observations read from the data set ED.TEMPALL.

NOTE: The data set ED.EDTEMPALL has 0 observations and 23 variables.

NOTE: DATA statement used (Total process time):

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Simple debugging — you need to determine if there are actual matching records in the two data sets. What happens if you take out 

 

if a and b;

and run the code?

 

If there really are no matches, you will see a pattern in the resulting data set that whenever MH is present, Tavg is not present, and vice versa. If there are indeed matches, you will see that too.

 

Also,

 

data what_happened;
    set ED.EDTempAll(where=(not missing(mh) and not missing(tavg)));
run;

Are there observations in this new data set what_happened?

 

 

--
Paige Miller

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

Simple debugging — you need to determine if there are actual matching records in the two data sets. What happens if you take out 

 

if a and b;

and run the code?

 

If there really are no matches, you will see a pattern in the resulting data set that whenever MH is present, Tavg is not present, and vice versa. If there are indeed matches, you will see that too.

 

Also,

 

data what_happened;
    set ED.EDTempAll(where=(not missing(mh) and not missing(tavg)));
run;

Are there observations in this new data set what_happened?

 

 

--
Paige Miller
jdr1
Obsidian | Level 7

Thanks so much for the quick reply, Paige!

 

Essentially there are no matches when I take away "if a and b;" And my two datasets stack on top of each other. 

 

NOTE: There were 47274662 observations read from the data set ED.EDALLCOUNTY.

NOTE: There were 783790 observations read from the data set ED.TEMPALL.

NOTE: The data set ED.EDTEMPALL has 48058452 observations and 23 variables.

 

One curious thing I notice is that any left or right or full join I try, my temp variables are always blank??? But I can empirically see that my daily ED visits match the time frame for daily temp data?

 

Here are results from 

 

 

122  data what;

123      set ED.EDTempAll(where=(not missing(MentalIll) and not missing(tavg)));

124  run;

 

NOTE: There were 0 observations read from the data set ED.EDTEMPALL.

      WHERE (not MISSING(MentalIll)) and (not MISSING(tavg));

NOTE: The data set WORK.WHAT has 0 observations and 23 variables.

 

 

PaigeMiller
Diamond | Level 26

Well, that's the answer. You are getting 0 observations because the two data sets don't match.

--
Paige Miller
ballardw
Super User

Given that your ID value is supposed to a county level FIPS and didn't show any errors in the merge attempt (common: mix match of Character and Numeric values for such codes) I would go back to where your sets you attempt to merge come from to see if any filters based on state may have been done.

 

Something else I might be tempted to do since you show one of the sets as having what appears to be text of a county name would be to see if the County Text matches the name. You should be able to get state and county FIPS and County name from the SASHELP.ZIPCODE data set and see if perhaps there was an issue with coding from County to FIPS. State and county in the Zipcode data set are numeric so depending on whether your ID is character or numeric you would combine them for comparison with either:

character: ZIPid = cats(put(state,z2.),put(county,z3.)

numeric: ZIPid = state*1000 + county;

The case of the Countynm variable is proper so you might have to do something with your county names to get them for comparisons.

 

Your example data with no common ID values was suspect to begin with IMHO.

jdr1
Obsidian | Level 7

My apologies for the delay. Thanks to everyone who provided really helpful suggestions! After taking a closer look, I noted that my Temp ID codes (i.e., county FIPS codes) were incorrectly matching Nebraska FIPS codes with North Carolina FIPS codes. After a further drill down, our team realized that the source of the error is the government dataset in which we derived daily temperature at the county-level that has incorrectly labeled each FIPS code. 


After we corrected the problem, our datasets finally merged. Woohoo!

 

Thanks again. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 560 views
  • 0 likes
  • 3 in conversation