BookmarkSubscribeRSS Feed
steph_
Calcite | Level 5

Hello!

 

I am working with healthcare data and have a dataset for each year of data. So, for example, the 2019 and 2020 datasets have all the same variables/variable names for their respective year.

 

I need to combine/aggregate the 2016-2021 datasets but am having a hard time doing so, or rather, knowing the most succinct way to do so.

 

There are about 10 or so variables I need to let SAS know are the equal to each other across the datasets.

 

Thanks!

4 REPLIES 4
PaigeMiller
Diamond | Level 26

Can you please make up a small example for two years, showing the data for the two years and the desired result? The example doesn't have to have 10 variables, 3 would be fine. Data can be completely made up, as long as it illustrates the problem.

--
Paige Miller
steph_
Calcite | Level 5

Example: I need to join the 2016 (made up of 100 hospitalizations), 2017 (made up of 200 hospitalizations), and 2018 (made up of 300 hospitalizations) datasets. I am looking at the diabetes, COPD, and CVD variables (which have the same variable name across the datasets).

 

So my end product/dataset would be 2016-2018 datasets combined with a total of 600 hospitalizations. After this new dataset was created, I would be able to run a proc freq statement by whether or not any of those 600 hospitalizations between 2016-2018 were for diabetes, COPD, or CVD.

ballardw
Super User

Sounds like you may want something along the lines of

 

data want;
    set 
        data2016 (keep=<your desired variables go here>)
        data2017 (keep=<your desired variables go here>)
        data2018 (keep=<your desired variables go here>)
   ;
run;

I would suggest keeping all the likely identification variables like hospital, patient and date of visit related stuff if you hadn't been planning on it.

 

If you're really lazy you could likely skip the KEEP as the default behavior would to include all of the variables from all the sets.

Caveat: the variables really should have the same properties: numeric/character and length in all of the data sets.

 

Reeza
Super User

Adding on to @ballardw answer. 

 

You may need to rename some variables (RENAME option), and want to indicate which dataset it came from (INDSNAME option).

 

data want;
length src source $50.;
    set 
        data2016 (keep=<your desired variables go here> rename=(oldname1=newname1 oldname2=newname2))
        data2017 (keep=<your desired variables go here>)
        data2018 (keep=<your desired variables go here>)
        INDSNAME = SRC;

Source = src;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 528 views
  • 0 likes
  • 4 in conversation