BookmarkSubscribeRSS Feed
romanbilan99
Fluorite | Level 6

I have two datasets. One is every county in NC (with the FIPS code) and the "matching" SC county. Some SC counties are are included multiple time in this dataset. In the second dataset, I have COVID-19 cases and deaths for every county in both states for everyday in 2020. I need to merge the datasets in a way that keeps the same frequency of counties as the first dataset for every day in 2020. So far, everything I try drops the repeated SC observations. Below is a representation of the code I've been trying. 

 

*Sort; 
proc sort data = coviddata;
	by FIPS;
run;

proc sort data = PSdata;
	by FIPS;
run;


*Merge case and PS data into one dataset;
data try;
	merge PSdata coviddata ;
	by FIPS;
run;
5 REPLIES 5
Reeza
Super User
That's weird, the default behaviour is usually to include the multiple records.
Can you provide a small data example that illustrates this behaviour? Fake data is fine.

romanbilan99
Fluorite | Level 6

@Reeza in the county dataset there are 89 observations for NC and 89 for SC. In the COVID dataset there are 100 NC observations and 46 SC observations for everyday. When I try to merge, I get 89 NC observations and 33 SC observations after clearing any excess matches. And I just went ahead and attached the three actual datasets. 

Tom
Super User Tom
Super User

So you have one NC+SC dataset and one NC dataset and one SC dataset.  You want to add the NC and SC specific fields to the NC+SC dataset.  So you need to have key variable(s) in that uniquely identifies the observations in the NC dataset so you can merge. Similarly for the SC dataset.

 

You need to merge the NC+SC with the NC dataset by whatever the key variable is.

Then merge the result with the SC dataset by whatever the key variable is for that merge.  Note that you need re-sort the intermediate result first if you are using data step merge.

 

Take care that the variables are you are trying to add from the NC and SC dataset do not use the same names.

ballardw
Super User

If you have multiple values of a BY variable in only one set then you will get multiple records in the output for matches. Period.

You would have to show some example data that "gets dropped" for us to diagnose. Provide data in the form of a data step.

Most likely cause is values not as you expect or a more complex merge on multiple variables with multiple values in both sets.

 

So to get "same frequency" you have to decide what to do with the multiple records. Since you have "everyday in 2020" then you likely want some sort of summary. So you could either summarize one set before the merge or after the merge.

romanbilan99
Fluorite | Level 6

@ballardw I included some fake data below 

 

County data
FIPS    Name     State  Indicator 
x          xx           NC     1
w          ww           NC    1
y          yy          SC      0
z         zz            SC      0
z         zz            SC      0

COVID data
FIPS    Name     State  Day   Cases
x          xx           NC     3/1     5
w          ww           NC     3/1      6
y          yy          SC      3/1      14
z         zz            SC      3/1     4
 
Combined data
FIPS    Name     State  Day   Cases Indicator
x          xx           NC     3/1     5           1
w          ww           NC     3/1      6       1
y          yy          SC      3/1      14       0
z         zz            SC      3/1     4         0

Notice county ZZ did not repeat in the combined data set. And when it comes to the repeated counties, I just want to attach their COVID data for that day to them. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1806 views
  • 3 likes
  • 4 in conversation