Hello!
I am working with healthcare data and have a dataset for each year of data. So, for example, the 2019 and 2020 datasets have all the same variables/variable names for their respective year.
I need to combine/aggregate the 2016-2021 datasets but am having a hard time doing so, or rather, knowing the most succinct way to do so.
There are about 10 or so variables I need to let SAS know are the equal to each other across the datasets.
Thanks!
Can you please make up a small example for two years, showing the data for the two years and the desired result? The example doesn't have to have 10 variables, 3 would be fine. Data can be completely made up, as long as it illustrates the problem.
Example: I need to join the 2016 (made up of 100 hospitalizations), 2017 (made up of 200 hospitalizations), and 2018 (made up of 300 hospitalizations) datasets. I am looking at the diabetes, COPD, and CVD variables (which have the same variable name across the datasets).
So my end product/dataset would be 2016-2018 datasets combined with a total of 600 hospitalizations. After this new dataset was created, I would be able to run a proc freq statement by whether or not any of those 600 hospitalizations between 2016-2018 were for diabetes, COPD, or CVD.
Sounds like you may want something along the lines of
data want; set data2016 (keep=<your desired variables go here>) data2017 (keep=<your desired variables go here>) data2018 (keep=<your desired variables go here>) ; run;
I would suggest keeping all the likely identification variables like hospital, patient and date of visit related stuff if you hadn't been planning on it.
If you're really lazy you could likely skip the KEEP as the default behavior would to include all of the variables from all the sets.
Caveat: the variables really should have the same properties: numeric/character and length in all of the data sets.
Adding on to @ballardw answer.
You may need to rename some variables (RENAME option), and want to indicate which dataset it came from (INDSNAME option).
data want;
length src source $50.;
set
data2016 (keep=<your desired variables go here> rename=(oldname1=newname1 oldname2=newname2))
data2017 (keep=<your desired variables go here>)
data2018 (keep=<your desired variables go here>)
INDSNAME = SRC;
Source = src;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.