Hello all. So currently I am trying to create a table that will merge data from 4 datasets (say Y1, Y2, Y3, Y4).
The 4 datasets are panel data from year 1, 2, 3, and 4 respectively. I have merged the four data sets using the data merge by ID. When I viewed the resulting data, I realized that there were some IDs that are not present in all four datasets. I would like to be able to make a dataset that merges only the IDs that are present in all four years.
After doing so, I would like to create a new variable that adds all expenditures that were made by each ID in all 4 years, if they meet a separate condition. (i.e. if their zip code starts with the number 5).
Y1
| ID | Expenditure | Zipcode |
| 1 | 50 | 20000 |
| 2 | 140 | 30000 |
| 3 | 30 | 51000 |
| 4 | 40 | 52000 |
| 5 | 50 | 50000 |
Y2
| ID | Expenditure | Zipcode |
| 6 | 100 | 50000 |
| 2 | 110 | 30000 |
| 3 | 120 | 51000 |
| 4 | 130 | 52000 |
| 5 | 140 | 50000 |
Y3
| ID | Expenditure | Zipcode |
| 1 | 50 | 20000 |
| 2 | 140 | 30000 |
| 3 | 30 | 51000 |
| 4 | 40 | 52000 |
| 5 | 50 | 50000 |
Y4
| ID | Expenditure | Zipcode |
| 1 | 50 | 20000 |
| 2 | 140 | 30000 |
| 3 | 30 | 51000 |
| 4 | 40 | 52000 |
If the datasets look something like this, the first step would create a data table with IDs 2, 3, 4 since they are the only ones present in all 4 datasets. Then the second part would create a new table with only IDs 3, 4 and would have an additional column sum_expenditure.
I'm sorry my question is so all over the place. Thank you all in advance!
data all;
merge y1(in=in1)
y2(in=in2)
y3(in=in3)
y4(in=in4)
; by ID;
if in1 and in2 and in3 and in4;
run;
data all;
merge y1(in=in1)
y2(in=in2)
y3(in=in3)
y4(in=in4)
; by ID;
if in1 and in2 and in3 and in4;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.