Help using Base SAS procedures

Count data

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Count data

I have a dataset as the following below

 

ID        Year           Indicator

1          1991             1

1          1992             0

1          1992             1

2          1993             2

3          1991             0

3          1991             1

 

Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.

 

ID           Year            No0          No1          No2

1             1991             0               1              0

1             1992             1               1              0 

1             1993             0               0              0

2             1991             0               0              0 

2             1992             0               0              0 

2             1993             0               0              1

3             1991             1               1              0 

3             1992             0               0              0 

3             1993             0               0              0           

 

Can anyone tell me what code do I need to write? Thanks.


Accepted Solutions
Solution
‎04-13-2018 01:38 PM
SAS Super FREQ
Posts: 9,370

Re: Count data

Hi:

  I would be tempted to use either PROC TABULATE:

tabulate_table.png

 

  Or use PROC REPORT:

report_table.png

 

But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.

 

Hope this helps,

 

cynthia

View solution in original post


All Replies
Super User
Posts: 10,784

Re: Count data

Posted in reply to dapenDaniel
data have;
input ID        Year           Indicator;
cards;
1          1991             1
1          1992             0
1          1992             1
2          1993             2
3          1991             0
3          1991             1
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),(select distinct year from have)) as a
 left join have as b
 on a.id=b.id and a.year=b.year
  group by a.id,a.year;
quit;
Solution
‎04-13-2018 01:38 PM
SAS Super FREQ
Posts: 9,370

Re: Count data

Hi:

  I would be tempted to use either PROC TABULATE:

tabulate_table.png

 

  Or use PROC REPORT:

report_table.png

 

But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.

 

Hope this helps,

 

cynthia

Contributor
Posts: 24

Re: Count data

Posted in reply to Cynthia_sas

Thanks. Another question is about missing value. Do missing values affect the count results? Is a missing value counted as 0 or shown as a dot? Thanks.

Contributor
Posts: 24

Re: Count data

Posted in reply to Cynthia_sas

I also found that after I use proc report to run the data, there are four variables _C3_ _C4_ _C5_ _C6_. what is _C6_?

SAS Super FREQ
Posts: 9,370

Re: Count data

[ Edited ]
Posted in reply to dapenDaniel

Hi, When you have indicator as missing, then it becomes a new possible value for the ACROSS columns, so that now you have 4 possible values for Indicator instead of 3. See the screenshot below:

report_ind_missing.png

PROC REPORT is correctly showing that only year 1991 and ID=4 have a value of . for INDICATOR. With the MISSING option and COMPLETEROWS, you see that no other ID/YEAR combos have . for INDICATOR.

 

cynthia

Contributor
Posts: 24

Re: Count data

Posted in reply to Cynthia_sas

Thank you so much Cynthia!

Contributor
Posts: 24

Re: Count data

There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.

Super User
Posts: 13,563

Re: Count data

Posted in reply to dapenDaniel

@dapenDaniel wrote:

There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.


Show which code you used.

Perhaps some example data including missing values and what the final result you want would look like.

 

 

 

 

 

Contributor
Posts: 24

Re: Count data

[ Edited ]

I used the Proc Report code offered by Cynthia and also Proc SQL code by Ksharp. 

 

The dataset I have. ID = 4 is missing data.

ID        Year           Indicator

1          1991             1

1          1992             0

1          1992             1

2          1993             2

3          1991             0

3          1991             1

4          1991             .

 

Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.

 

This is the expected dataset. I would like to keep the missing value as missing value rather than 0

ID           Year            No0          No1          No2

1             1991             0               1              0

1             1992             1               1              0 

1             1993             0               0              0

2             1991             0               0              0 

2             1992             0               0              0 

2             1993             0               0              1

3             1991             1               1              0 

3             1992             0               0              0 

3             1993             0               0              0           

4             1991              .                .               .

SAS Super FREQ
Posts: 9,370

Re: Count data

Posted in reply to dapenDaniel

Hi:

  What you're describing/showing doesn't make sense to me.

desired_output.png

 

There is clearly something about your data that I don't understand. If ID=4/YEAR=1991 is a row in your data with a value of . for INDICATOR, then you have 4 possible values for INDICATOR: 0, 1, 2 and .(missing). You can't have a . for INDICATOR and want it to mean one thing for all the other rows and another thing for ID=4.

 

  I could understand if your data didn't have anything at all for ID=4 -- but you know that you need to include ID=4 on the report, but if it's not in the data PROC REPORT won't add it to the report without special options. This is where you might use PRELOADFMT:

preloadfmt.png

 

But as I said, I don't understand why you want that one row for ID=4 to be treated any differently than the other rows.

 

Cynthia

Contributor
Posts: 24

Re: Count data

Posted in reply to Cynthia_sas

Hi Cynthia,

 

thanks for your comment. 0 means that this event does not happen in a specific year and Missing value means that this event indeed happened in that year but I cannot tell which category it belongs to due to unavailable information.Does that make sense? Thanks.

SAS Super FREQ
Posts: 9,370

Re: Count data

Posted in reply to dapenDaniel

Hi:

  By that logic "0 means that this event does not happen in a specific year.", the data for ID=1 and YEAR=1992 does not make sense to me.

Not_make_sense.png

 

How can 1992 have a value for both 0 and 1 for INDICATOR when ID=1??

 

It doesn't make sense to me along with your explanation that "Missing value means that this event indeed happened in that year but I cannot tell which category".

 

cynthia

Super User
Posts: 10,784

Re: Count data

Posted in reply to dapenDaniel
data have;
input ID        Year           Indicator;
cards;
1          1991             1
1          1992             0
1          1992             1
2          1993             2
3          1991             0
3          1991             1
4          1991             .
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),
     (select distinct year from have)) as a
 left join have as b
 on a.id=b.id and a.year=b.year
 where catx(' ',a.id,a.year) not in (select catx(' ',id,year) from have where indicator is missing)
  group by a.id,a.year
union
select id,year from have where indicator is missing;
quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 256 views
  • 0 likes
  • 4 in conversation