Solved: Count data

dapenDaniel · Posted 04-13-2018 12:07 AM

I have a dataset as the following below

ID Year Indicator

1 1991 1

1 1992 0

1 1992 1

2 1993 2

3 1991 0

3 1991 1

Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.

ID Year No0 No1 No2

1 1991 0 1 0

1 1992 1 1 0

1 1993 0 0 0

2 1991 0 0 0

2 1992 0 0 0

2 1993 0 0 1

3 1991 1 1 0

3 1992 0 0 0

3 1993 0 0 0

Can anyone tell me what code do I need to write? Thanks.

Cynthia_sas · Posted 04-13-2018 11:30 AM

Hi:

I would be tempted to use either PROC TABULATE:

Or use PROC REPORT:

But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.

Hope this helps,

cynthia

View solution in original post

Ksharp · Posted 04-13-2018 09:25 AM

data have;
input ID        Year           Indicator;
cards;
1          1991             1
1          1992             0
1          1992             1
2          1993             2
3          1991             0
3          1991             1
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),(select distinct year from have)) as a
 left join have as b
 on a.id=b.id and a.year=b.year
  group by a.id,a.year;
quit;

Cynthia_sas · Posted 04-13-2018 11:30 AM

Hi:

I would be tempted to use either PROC TABULATE:

Or use PROC REPORT:

But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.

Hope this helps,

cynthia

dapenDaniel · Posted 04-13-2018 11:44 AM

Thanks. Another question is about missing value. Do missing values affect the count results? Is a missing value counted as 0 or shown as a dot? Thanks.

dapenDaniel · Posted 04-13-2018 11:54 AM

I also found that after I use proc report to run the data, there are four variables _C3_ _C4_ _C5_ _C6_. what is _C6_?

Cynthia_sas · Posted 04-13-2018 01:28 PM

Hi, When you have indicator as missing, then it becomes a new possible value for the ACROSS columns, so that now you have 4 possible values for Indicator instead of 3. See the screenshot below:

PROC REPORT is correctly showing that only year 1991 and ID=4 have a value of . for INDICATOR. With the MISSING option and COMPLETEROWS, you see that no other ID/YEAR combos have . for INDICATOR.

cynthia

dapenDaniel · Posted 04-13-2018 01:38 PM

Thank you so much Cynthia!

dapenDaniel · Posted 04-13-2018 12:03 PM

There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.

ballardw · Posted 04-13-2018 12:24 PM

@dapenDaniel wrote:

There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.

Show which code you used.

Perhaps some example data including missing values and what the final result you want would look like.

dapenDaniel · Posted 04-13-2018 12:34 PM

I used the Proc Report code offered by Cynthia and also Proc SQL code by Ksharp.

The dataset I have. ID = 4 is missing data.

ID Year Indicator

1 1991 1

1 1992 0

1 1992 1

2 1993 2

3 1991 0

3 1991 1

4 1991 .

Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.

This is the expected dataset. I would like to keep the missing value as missing value rather than 0

ID Year No0 No1 No2

1 1991 0 1 0

1 1992 1 1 0

1 1993 0 0 0

2 1991 0 0 0

2 1992 0 0 0

2 1993 0 0 1

3 1991 1 1 0

3 1992 0 0 0

3 1993 0 0 0

4 1991 . . .

Cynthia_sas · Posted 04-13-2018 01:54 PM

Hi:

What you're describing/showing doesn't make sense to me.

There is clearly something about your data that I don't understand. If ID=4/YEAR=1991 is a row in your data with a value of . for INDICATOR, then you have 4 possible values for INDICATOR: 0, 1, 2 and .(missing). You can't have a . for INDICATOR and want it to mean one thing for all the other rows and another thing for ID=4.

I could understand if your data didn't have anything at all for ID=4 -- but you know that you need to include ID=4 on the report, but if it's not in the data PROC REPORT won't add it to the report without special options. This is where you might use PRELOADFMT:

But as I said, I don't understand why you want that one row for ID=4 to be treated any differently than the other rows.

Cynthia

dapenDaniel · Posted 04-13-2018 02:38 PM

Hi Cynthia,

thanks for your comment. 0 means that this event does not happen in a specific year and Missing value means that this event indeed happened in that year but I cannot tell which category it belongs to due to unavailable information.Does that make sense? Thanks.

Cynthia_sas · Posted 04-13-2018 06:30 PM

Hi:

By that logic "0 means that this event does not happen in a specific year.", the data for ID=1 and YEAR=1992 does not make sense to me.

How can 1992 have a value for both 0 and 1 for INDICATOR when ID=1??

It doesn't make sense to me along with your explanation that "Missing value means that this event indeed happened in that year but I cannot tell which category".

cynthia

Ksharp · Posted 04-14-2018 06:49 AM

data have;
input ID        Year           Indicator;
cards;
1          1991             1
1          1992             0
1          1992             1
2          1993             2
3          1991             0
3          1991             1
4          1991             .
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),
     (select distinct year from have)) as a
 left join have as b
 on a.id=b.id and a.year=b.year
 where catx(' ',a.id,a.year) not in (select catx(' ',id,year) from have where indicator is missing)
  group by a.id,a.year
union
select id,year from have where indicator is missing;
quit;

Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Re: Count data

Registration is open

SAS Training: Just a Click Away