Solved: How to add all values of a variable within each ID by year?

ercksh8 · Posted 04-23-2021 01:38 AM

Hi all. I am currently working with a panel data and would like to find the total expenditure of each individuals for each year The data has a list of all purchases and the expenditure for each purchase is listed separately.

ID	Expenditure_year1	Case	Expenditure_year2	Case	Expenditure_year3	Case
1	200	1	500	9	100	17
1	300	2	200	10	200	18
1	100	3	100	11	300	19
2	400	4	100	12	400	20
2	.	5	200	13	500	21
2	200	6	300	14	600	22
3	300	7	.	15	.	23
4	100	8	.	16	200	24

ID	Total_expenditure_year1	Total_expenditure_year2	Total_expenditure_year3
1	600	800	600
2	600	600	1500
4	100	0	200

Since ID 3 does not have any observation in year 2 and year 3, I would like ID 3 excluded. However for the missing value in year 1 for ID 2, I would like to skip over just that observation and still have the rest added. The resulting table would look something like the second table. If the ID is like ID 4, in which it has an observation for 2 years but maybe didn't make any purchase in year 2 to have missing value, I would like it to replaced with 0 and have the yearly expenditure show up as 0 in the final result.

If possible I would like to set up a plot for each ID their trend in expenditure by year afterwards. The X-axis would be years (1,2,3) and U=Y-axis would be total_expenditure for each year.

Thank you.

Kurt_Bremser · Posted 04-23-2021 05:34 AM

When we ask you to post data in a data step with datalines, we mean this:

data have;
input ID $ Year Case $ Expenditure Type $;
datalines;
1 1 1 100 1
1 1 2 200 1
1 1 3 300 2
2 1 4 400 1
2 1 5 . .
2 1 6 500 2
3 1 7 600 2
4 1 8 100 1
1 2 9 100 1
1 2 10  200 2
1 2 11  300 1
2 2 12  400 1
2 2 13  500 2
2 2 14  600 2
3 2 15  . .
4 2 16  . .
1 3 17  100 1
1 3 18  200 1
1 3 19  300 2
2 3 20  400 2
2 3 21  500 1
2 3 22  600 2
3 3 23  . .
4 3 24  100 1
;

Why do we want this?

It is easy to recreate the dataset by just copy/pasting the code
all attributes are defined clearly and leave no doubt about types, formats, ...
because of this, the solution process will be speeded up significantly

It's not rocket science, and the skill to create datasets on the fly in code is very useful, as you will find out in your future work. Help us to help you.

From this dataset, I did this:

/* summarize by id and year */
proc summary data=have nway;
class id year;
var expenditure;
output out=want sum()=;
run;

/* filter out entries with too many missing values */
proc sql;
create table filtered as
  select *
  from want
  where id in (
    select id from want group by id having count(expenditure) > 1
  )
;
quit;

/* create a prefix for the column headers */
proc format;
picture header
  low-high = "09" (prefix="Total_expenditure_year")
;
run;

/* create the report */
proc report data=filtered;
column id expenditure,year;
define id / group;
define year / "" across format=header.;
define expenditure / "" analysis;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

andreas_lds · Posted 04-23-2021 02:01 AM

Please post data in usable form.

Is the structure of the first table your starting point or is this a result after some steps of processing?

ercksh8 · Posted 04-23-2021 02:04 AM

This is after merging each year's data by ID.

Kurt_Bremser · Posted 04-23-2021 03:36 AM

Your "have" dataset is not possible. You cannot have three columns with the same name (Case).

Please post your real dataset in its real layout in a data step with datalines (do not skip this!), so we really know what you are talking about.

In particular, I suspect that you start with untransposed data, which is much easier to handle. If not, transposing to a long layout will help in further processing.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ercksh8 · Posted 04-23-2021 04:17 AM

ID	Year	Case	Expenditure	Type
1	1	1	100	1
1	1	2	200	1
1	1	3	300	2
2	1	4	400	1
2	1	5	.	.
2	1	6	500	2
3	1	7	600	2
4	1	8	100	1
1	2	9	100	1
1	2	10	200	2
1	2	11	300	1
2	2	12	400	1
2	2	13	500	2
2	2	14	600	2
3	2	15	.	.
4	2	16	.	.
1	3	17	100	1
1	3	18	200	1
1	3	19	300	2
2	3	20	400	2
2	3	21	500	1
2	3	22	600	2
3	3	23	.	.
4	3	24	100	1

Sorry. I was trying to simplify my data and didn't realize I uploaded it that way. I included cases because when I attempted, I realized that instead of giving me a data that had one cost observation for each year, it gave me the total cost value on each observation. I was hoping there would be a way to simplify this. The reason I kept case in was because I was hoping to identify the total expenditure by the types of purchase later on and it was on a separate data table that was sorted by case and did not have the ID variable.

The code I tried looked something like this.

proc sql;
     create table A as
     select ID, year, case, expenditure, type, sum(expenditure) as total_expenditure
     from original
     group by ID;
quit;

Kurt_Bremser · Posted 04-23-2021 05:34 AM

When we ask you to post data in a data step with datalines, we mean this:

data have;
input ID $ Year Case $ Expenditure Type $;
datalines;
1 1 1 100 1
1 1 2 200 1
1 1 3 300 2
2 1 4 400 1
2 1 5 . .
2 1 6 500 2
3 1 7 600 2
4 1 8 100 1
1 2 9 100 1
1 2 10  200 2
1 2 11  300 1
2 2 12  400 1
2 2 13  500 2
2 2 14  600 2
3 2 15  . .
4 2 16  . .
1 3 17  100 1
1 3 18  200 1
1 3 19  300 2
2 3 20  400 2
2 3 21  500 1
2 3 22  600 2
3 3 23  . .
4 3 24  100 1
;

Why do we want this?

It is easy to recreate the dataset by just copy/pasting the code
all attributes are defined clearly and leave no doubt about types, formats, ...
because of this, the solution process will be speeded up significantly

It's not rocket science, and the skill to create datasets on the fly in code is very useful, as you will find out in your future work. Help us to help you.

From this dataset, I did this:

/* summarize by id and year */
proc summary data=have nway;
class id year;
var expenditure;
output out=want sum()=;
run;

/* filter out entries with too many missing values */
proc sql;
create table filtered as
  select *
  from want
  where id in (
    select id from want group by id having count(expenditure) > 1
  )
;
quit;

/* create a prefix for the column headers */
proc format;
picture header
  low-high = "09" (prefix="Total_expenditure_year")
;
run;

/* create the report */
proc report data=filtered;
column id expenditure,year;
define id / group;
define year / "" across format=header.;
define expenditure / "" analysis;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ercksh8 · Posted 04-26-2021 03:08 AM

Thank you. I will also keep that in mind in the future.

How to add all values of a variable within each ID by year?

Re: How to add all values of a variable within each ID by year?

Re: How to add all values of a variable within each ID by year?

Re: How to add all values of a variable within each ID by year?

Re: How to add all values of a variable within each ID by year?

Re: How to add all values of a variable within each ID by year?

Re: How to add all values of a variable within each ID by year?

Re: How to add all values of a variable within each ID by year?

Registration is open

SAS Training: Just a Click Away