Hi all. I am currently working with a panel data and would like to find the total expenditure of each individuals for each year The data has a list of all purchases and the expenditure for each purchase is listed separately.
ID | Expenditure_year1 | Case | Expenditure_year2 | Case | Expenditure_year3 | Case |
1 | 200 | 1 | 500 | 9 | 100 | 17 |
1 | 300 | 2 | 200 | 10 | 200 | 18 |
1 | 100 | 3 | 100 | 11 | 300 | 19 |
2 | 400 | 4 | 100 | 12 | 400 | 20 |
2 | . | 5 | 200 | 13 | 500 | 21 |
2 | 200 | 6 | 300 | 14 | 600 | 22 |
3 | 300 | 7 | . | 15 | . | 23 |
4 | 100 | 8 | . | 16 | 200 | 24 |
ID | Total_expenditure_year1 | Total_expenditure_year2 | Total_expenditure_year3 |
1 | 600 | 800 | 600 |
2 | 600 | 600 | 1500 |
4 | 100 | 0 | 200 |
Since ID 3 does not have any observation in year 2 and year 3, I would like ID 3 excluded. However for the missing value in year 1 for ID 2, I would like to skip over just that observation and still have the rest added. The resulting table would look something like the second table. If the ID is like ID 4, in which it has an observation for 2 years but maybe didn't make any purchase in year 2 to have missing value, I would like it to replaced with 0 and have the yearly expenditure show up as 0 in the final result.
If possible I would like to set up a plot for each ID their trend in expenditure by year afterwards. The X-axis would be years (1,2,3) and U=Y-axis would be total_expenditure for each year.
Thank you.
When we ask you to post data in a data step with datalines, we mean this:
data have;
input ID $ Year Case $ Expenditure Type $;
datalines;
1 1 1 100 1
1 1 2 200 1
1 1 3 300 2
2 1 4 400 1
2 1 5 . .
2 1 6 500 2
3 1 7 600 2
4 1 8 100 1
1 2 9 100 1
1 2 10 200 2
1 2 11 300 1
2 2 12 400 1
2 2 13 500 2
2 2 14 600 2
3 2 15 . .
4 2 16 . .
1 3 17 100 1
1 3 18 200 1
1 3 19 300 2
2 3 20 400 2
2 3 21 500 1
2 3 22 600 2
3 3 23 . .
4 3 24 100 1
;
Why do we want this?
It's not rocket science, and the skill to create datasets on the fly in code is very useful, as you will find out in your future work. Help us to help you.
From this dataset, I did this:
/* summarize by id and year */
proc summary data=have nway;
class id year;
var expenditure;
output out=want sum()=;
run;
/* filter out entries with too many missing values */
proc sql;
create table filtered as
select *
from want
where id in (
select id from want group by id having count(expenditure) > 1
)
;
quit;
/* create a prefix for the column headers */
proc format;
picture header
low-high = "09" (prefix="Total_expenditure_year")
;
run;
/* create the report */
proc report data=filtered;
column id expenditure,year;
define id / group;
define year / "" across format=header.;
define expenditure / "" analysis;
run;
Please post data in usable form.
Is the structure of the first table your starting point or is this a result after some steps of processing?
This is after merging each year's data by ID.
Your "have" dataset is not possible. You cannot have three columns with the same name (Case).
Please post your real dataset in its real layout in a data step with datalines (do not skip this!), so we really know what you are talking about.
In particular, I suspect that you start with untransposed data, which is much easier to handle. If not, transposing to a long layout will help in further processing.
ID | Year | Case | Expenditure | Type |
1 | 1 | 1 | 100 | 1 |
1 | 1 | 2 | 200 | 1 |
1 | 1 | 3 | 300 | 2 |
2 | 1 | 4 | 400 | 1 |
2 | 1 | 5 | . | . |
2 | 1 | 6 | 500 | 2 |
3 | 1 | 7 | 600 | 2 |
4 | 1 | 8 | 100 | 1 |
1 | 2 | 9 | 100 | 1 |
1 | 2 | 10 | 200 | 2 |
1 | 2 | 11 | 300 | 1 |
2 | 2 | 12 | 400 | 1 |
2 | 2 | 13 | 500 | 2 |
2 | 2 | 14 | 600 | 2 |
3 | 2 | 15 | . | . |
4 | 2 | 16 | . | . |
1 | 3 | 17 | 100 | 1 |
1 | 3 | 18 | 200 | 1 |
1 | 3 | 19 | 300 | 2 |
2 | 3 | 20 | 400 | 2 |
2 | 3 | 21 | 500 | 1 |
2 | 3 | 22 | 600 | 2 |
3 | 3 | 23 | . | . |
4 | 3 | 24 | 100 | 1 |
Sorry. I was trying to simplify my data and didn't realize I uploaded it that way. I included cases because when I attempted, I realized that instead of giving me a data that had one cost observation for each year, it gave me the total cost value on each observation. I was hoping there would be a way to simplify this. The reason I kept case in was because I was hoping to identify the total expenditure by the types of purchase later on and it was on a separate data table that was sorted by case and did not have the ID variable.
The code I tried looked something like this.
proc sql;
create table A as
select ID, year, case, expenditure, type, sum(expenditure) as total_expenditure
from original
group by ID;
quit;
When we ask you to post data in a data step with datalines, we mean this:
data have;
input ID $ Year Case $ Expenditure Type $;
datalines;
1 1 1 100 1
1 1 2 200 1
1 1 3 300 2
2 1 4 400 1
2 1 5 . .
2 1 6 500 2
3 1 7 600 2
4 1 8 100 1
1 2 9 100 1
1 2 10 200 2
1 2 11 300 1
2 2 12 400 1
2 2 13 500 2
2 2 14 600 2
3 2 15 . .
4 2 16 . .
1 3 17 100 1
1 3 18 200 1
1 3 19 300 2
2 3 20 400 2
2 3 21 500 1
2 3 22 600 2
3 3 23 . .
4 3 24 100 1
;
Why do we want this?
It's not rocket science, and the skill to create datasets on the fly in code is very useful, as you will find out in your future work. Help us to help you.
From this dataset, I did this:
/* summarize by id and year */
proc summary data=have nway;
class id year;
var expenditure;
output out=want sum()=;
run;
/* filter out entries with too many missing values */
proc sql;
create table filtered as
select *
from want
where id in (
select id from want group by id having count(expenditure) > 1
)
;
quit;
/* create a prefix for the column headers */
proc format;
picture header
low-high = "09" (prefix="Total_expenditure_year")
;
run;
/* create the report */
proc report data=filtered;
column id expenditure,year;
define id / group;
define year / "" across format=header.;
define expenditure / "" analysis;
run;
Thank you. I will also keep that in mind in the future.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.