I have two datasets. One is every county in NC (with the FIPS code) and the "matching" SC county. Some SC counties are are included multiple time in this dataset. In the second dataset, I have COVID-19 cases and deaths for every county in both states for everyday in 2020. I need to merge the datasets in a way that keeps the same frequency of counties as the first dataset for every day in 2020. So far, everything I try drops the repeated SC observations. Below is a representation of the code I've been trying.
*Sort; proc sort data = coviddata; by FIPS; run; proc sort data = PSdata; by FIPS; run; *Merge case and PS data into one dataset; data try; merge PSdata coviddata ; by FIPS; run;
@Reeza in the county dataset there are 89 observations for NC and 89 for SC. In the COVID dataset there are 100 NC observations and 46 SC observations for everyday. When I try to merge, I get 89 NC observations and 33 SC observations after clearing any excess matches. And I just went ahead and attached the three actual datasets.
So you have one NC+SC dataset and one NC dataset and one SC dataset. You want to add the NC and SC specific fields to the NC+SC dataset. So you need to have key variable(s) in that uniquely identifies the observations in the NC dataset so you can merge. Similarly for the SC dataset.
You need to merge the NC+SC with the NC dataset by whatever the key variable is.
Then merge the result with the SC dataset by whatever the key variable is for that merge. Note that you need re-sort the intermediate result first if you are using data step merge.
Take care that the variables are you are trying to add from the NC and SC dataset do not use the same names.
If you have multiple values of a BY variable in only one set then you will get multiple records in the output for matches. Period.
You would have to show some example data that "gets dropped" for us to diagnose. Provide data in the form of a data step.
Most likely cause is values not as you expect or a more complex merge on multiple variables with multiple values in both sets.
So to get "same frequency" you have to decide what to do with the multiple records. Since you have "everyday in 2020" then you likely want some sort of summary. So you could either summarize one set before the merge or after the merge.
@ballardw I included some fake data below
County data FIPS Name State Indicator x xx NC 1 w ww NC 1 y yy SC 0 z zz SC 0 z zz SC 0 COVID data FIPS Name State Day Cases x xx NC 3/1 5 w ww NC 3/1 6 y yy SC 3/1 14 z zz SC 3/1 4 Combined data FIPS Name State Day Cases Indicator x xx NC 3/1 5 1 w ww NC 3/1 6 1 y yy SC 3/1 14 0 z zz SC 3/1 4 0
Notice county ZZ did not repeat in the combined data set. And when it comes to the repeated counties, I just want to attach their COVID data for that day to them.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
