Re: Merging/Combining

CatPaws · Posted 02-15-2022 08:56 PM

I have two excel files I need to merge. I need to add the FIPS Code and 2013 Code to my data set. The variable they have in common is County and State. However, they each have a different number of observations, so when I try to merge them, it is distorted. I need to add the FIPS Code and the 2013 Code to each County in my dataset, how do I do this?

SASKiwi · Posted 02-15-2022 09:15 PM

Please post your complete SAS log including code and notes. Do a copy and paste using the </> menu option as screenshots aren't useful.

Tom · Posted 02-15-2022 09:18 PM

I don't understand. You are saying that the file you posted the photograph is not unique when grouped by STATE and COUNTY?

How are they duplicated? What do the duplicates mean?

CatPaws · Posted 02-15-2022 09:37 PM

When I merge them, some are blank, and I know that to be incorrect because the blanks one have a county listed.

proc sort data=Z out=One;
by County;
run;
proc sort data=X out=Two;
by County;
run;
data Combined;
merge One (in=a) Two (in=b);
by County;
run;

Tom · Posted 02-15-2022 10:01 PM

You cannot merge by COUNTY if you data comes from multiple states. There are many counties that appear in more than one state.

To test if your dataset with the FIPS code is unique you need to do that BEFORE you try to merge it with the other dataset.

I don't know what Z and ONE are so let's call the table with FIPS codes FIPS instead.

So to create a dataset named FIPS_DUPS that has all of the records where the combination of STATE and COUNTY are not unique you would run something like this:

proc sort data=fips;
  by state county;
run;
data fips_dups;
  set fips;
  by state county;
  if not (first.county and last.county);
run;

CatPaws · Posted 02-15-2022 10:11 PM

I tried to code you suggested and got this:

NOTE: The data set WORK.SORTED has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

I should note that there are not multiple states, there is only one State. One data set has multiple variables that include the state and county, and the other data set also has the state and county. I need to match the FIPS Code and the 2013 Code to each county listed in my first data set. Does this make more sense?

Tom · Posted 02-15-2022 10:27 PM

You seem to be saying one dataset variables STATE and COUNTY and also FIPS and CODE_2013. It has data for all of the states. Let's call that FIPS.

Another dataset has STATE and COUNTY variables also and perhaps some other variables of interest. But STATE value is a constant so it has the same value on every observations. Let's call that MYDATA.

So if the two datasets are sorted by STATE and COUNTY then it sounds like to attach FIPS and CODE_2013 to the data in MYDATA you would use a data step like this:

data want;
  merge mydata (in=in1) fips ;
  by state county;
  if in1;
run;

Now the result will exclude the records from FIPS that do not match any of the records in MYDATA.

If that description is not right then provide more information on what you are doing. For example print a few observations from both datasets. Or better yet provide the data as a data step we can run to re-create your data so we can provide you with code that is tested and uses your variable names and data.

CatPaws · Posted 02-15-2022 10:49 PM

Here are some observations from my two data set (separated by the yellow column). The data set on the Left has a FIPS code & 2013 code assigned to each County. The data set on the right has multiple observations, and what I need to do is assign the FIPS code and 2013 code to each county on the right. For example, all observations that have Bartow county will also have a FIPS code of 13015 and a 2013 code as 2.

Tom · Posted 02-15-2022 11:37 PM

Not sure why you posted a photograph of a spreadsheet. Why not post the actual data? Note that data is text, not images.

Is the data really in one dataset like that spreadsheet? If so the the first thing you need to do is split it out into two datasets.

If the variables in your actual datasets are really named STATE and COUNTY as the row one headers in your spreadsheet indicate then the code I posted should work. The three observations on the right will merge with the one observation on the left so that the value of "FIPS code" (whatever the actual variable name you created to hold those values since the header name is not a valid SAS variable name) from the left will be replicated onto to all three observations.

What problem did you have with it? Make sure that variables you want to CREATE from the dataset on the left do not already exist in the dataset on the right. Otherwise the MERGE cannot work properly as when the second and third observation from the right a read in the value of "FIPS code" they have in that dataset will overwrite anything that was read from the dataset on the left.

CatPaws · Posted 02-16-2022 12:51 AM

I had a mistake on my end. I did not know that a merge was case-sensitive. They have merged, however, I am getting this error:

WARNING: Multiple lengths were specified for the BY variable County by input data sets. This might
cause unexpected results.
NOTE: There were 159 observations read from the data set WORK.ONE.
NOTE: There were 778604 observations read from the data set WORK.TWO.
NOTE: The data set WORK.COMBINED has 778604 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.49 seconds
cpu time 0.46 seconds

When I try to switch them around, the Warning goes away, but it adds additional observations to my data set. (See below)

NOTE: There were 778604 observations read from the data set WORK.TWO.
NOTE: There were 159 observations read from the data set WORK.ONE.
NOTE: The data set WORK.COMBINED has 778618 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.48 seconds
cpu time 0.46 seconds

Kurt_Bremser · Posted 02-16-2022 04:49 AM

Start by saving your Excel spreadsheets to a format that allows consistent importing into SAS. PROC IMPORT makes guesses about variable attributes (because Excel does not have the concept of fixed column attributes) depending on the content.

Save as csv (which will result in semicolon-delimited files), and import those with DATA steps where you set the correct (and consistent!) attributes for all columns, so that further processing does not meet with stumbling blocks.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SAS Innovate 2025: Save the Date