I have a dataset as the following below
ID Year Indicator
1 1991 1
1 1992 0
1 1992 1
2 1993 2
3 1991 0
3 1991 1
Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.
ID Year No0 No1 No2
1 1991 0 1 0
1 1992 1 1 0
1 1993 0 0 0
2 1991 0 0 0
2 1992 0 0 0
2 1993 0 0 1
3 1991 1 1 0
3 1992 0 0 0
3 1993 0 0 0
Can anyone tell me what code do I need to write? Thanks.
Hi:
I would be tempted to use either PROC TABULATE:
Or use PROC REPORT:
But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.
Hope this helps,
cynthia
data have;
input ID Year Indicator;
cards;
1 1991 1
1 1992 0
1 1992 1
2 1993 2
3 1991 0
3 1991 1
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),(select distinct year from have)) as a
left join have as b
on a.id=b.id and a.year=b.year
group by a.id,a.year;
quit;
Hi:
I would be tempted to use either PROC TABULATE:
Or use PROC REPORT:
But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.
Hope this helps,
cynthia
Thanks. Another question is about missing value. Do missing values affect the count results? Is a missing value counted as 0 or shown as a dot? Thanks.
I also found that after I use proc report to run the data, there are four variables _C3_ _C4_ _C5_ _C6_. what is _C6_?
Hi, When you have indicator as missing, then it becomes a new possible value for the ACROSS columns, so that now you have 4 possible values for Indicator instead of 3. See the screenshot below:
PROC REPORT is correctly showing that only year 1991 and ID=4 have a value of . for INDICATOR. With the MISSING option and COMPLETEROWS, you see that no other ID/YEAR combos have . for INDICATOR.
cynthia
Thank you so much Cynthia!
There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.
@dapenDaniel wrote:
There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.
Show which code you used.
Perhaps some example data including missing values and what the final result you want would look like.
I used the Proc Report code offered by Cynthia and also Proc SQL code by Ksharp.
The dataset I have. ID = 4 is missing data.
ID Year Indicator
1 1991 1
1 1992 0
1 1992 1
2 1993 2
3 1991 0
3 1991 1
4 1991 .
Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.
This is the expected dataset. I would like to keep the missing value as missing value rather than 0
ID Year No0 No1 No2
1 1991 0 1 0
1 1992 1 1 0
1 1993 0 0 0
2 1991 0 0 0
2 1992 0 0 0
2 1993 0 0 1
3 1991 1 1 0
3 1992 0 0 0
3 1993 0 0 0
4 1991 . . .
Hi:
What you're describing/showing doesn't make sense to me.
There is clearly something about your data that I don't understand. If ID=4/YEAR=1991 is a row in your data with a value of . for INDICATOR, then you have 4 possible values for INDICATOR: 0, 1, 2 and .(missing). You can't have a . for INDICATOR and want it to mean one thing for all the other rows and another thing for ID=4.
I could understand if your data didn't have anything at all for ID=4 -- but you know that you need to include ID=4 on the report, but if it's not in the data PROC REPORT won't add it to the report without special options. This is where you might use PRELOADFMT:
But as I said, I don't understand why you want that one row for ID=4 to be treated any differently than the other rows.
Cynthia
Hi Cynthia,
thanks for your comment. 0 means that this event does not happen in a specific year and Missing value means that this event indeed happened in that year but I cannot tell which category it belongs to due to unavailable information.Does that make sense? Thanks.
Hi:
By that logic "0 means that this event does not happen in a specific year.", the data for ID=1 and YEAR=1992 does not make sense to me.
How can 1992 have a value for both 0 and 1 for INDICATOR when ID=1??
It doesn't make sense to me along with your explanation that "Missing value means that this event indeed happened in that year but I cannot tell which category".
cynthia
data have;
input ID Year Indicator;
cards;
1 1991 1
1 1992 0
1 1992 1
2 1993 2
3 1991 0
3 1991 1
4 1991 .
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),
(select distinct year from have)) as a
left join have as b
on a.id=b.id and a.year=b.year
where catx(' ',a.id,a.year) not in (select catx(' ',id,year) from have where indicator is missing)
group by a.id,a.year
union
select id,year from have where indicator is missing;
quit;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.