BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

I have a dataset as the following below

 

ID        Year           Indicator

1          1991             1

1          1992             0

1          1992             1

2          1993             2

3          1991             0

3          1991             1

 

Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.

 

ID           Year            No0          No1          No2

1             1991             0               1              0

1             1992             1               1              0 

1             1993             0               0              0

2             1991             0               0              0 

2             1992             0               0              0 

2             1993             0               0              1

3             1991             1               1              0 

3             1992             0               0              0 

3             1993             0               0              0           

 

Can anyone tell me what code do I need to write? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Cynthia_sas
SAS Super FREQ

Hi:

  I would be tempted to use either PROC TABULATE:

tabulate_table.png

 

  Or use PROC REPORT:

report_table.png

 

But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.

 

Hope this helps,

 

cynthia

View solution in original post

13 REPLIES 13
Ksharp
Super User
data have;
input ID        Year           Indicator;
cards;
1          1991             1
1          1992             0
1          1992             1
2          1993             2
3          1991             0
3          1991             1
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),(select distinct year from have)) as a
 left join have as b
 on a.id=b.id and a.year=b.year
  group by a.id,a.year;
quit;
Cynthia_sas
SAS Super FREQ

Hi:

  I would be tempted to use either PROC TABULATE:

tabulate_table.png

 

  Or use PROC REPORT:

report_table.png

 

But it depends on whether you want a report or an output dataset. If you just need a report, I'd probably pick TABULATE. If you need a report and/or a dataset, I'd pick PROC REPORT.

 

Hope this helps,

 

cynthia

dapenDaniel
Obsidian | Level 7

Thanks. Another question is about missing value. Do missing values affect the count results? Is a missing value counted as 0 or shown as a dot? Thanks.

dapenDaniel
Obsidian | Level 7

I also found that after I use proc report to run the data, there are four variables _C3_ _C4_ _C5_ _C6_. what is _C6_?

Cynthia_sas
SAS Super FREQ

Hi, When you have indicator as missing, then it becomes a new possible value for the ACROSS columns, so that now you have 4 possible values for Indicator instead of 3. See the screenshot below:

report_ind_missing.png

PROC REPORT is correctly showing that only year 1991 and ID=4 have a value of . for INDICATOR. With the MISSING option and COMPLETEROWS, you see that no other ID/YEAR combos have . for INDICATOR.

 

cynthia

dapenDaniel
Obsidian | Level 7

Thank you so much Cynthia!

dapenDaniel
Obsidian | Level 7

There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.

ballardw
Super User

@dapenDaniel wrote:

There are some missing values in my dataset. I would like to treat these missing value as dot in the dataset rather than treating them as 0. Can you tell me how to do that? Thanks.


Show which code you used.

Perhaps some example data including missing values and what the final result you want would look like.

 

 

 

 

 

dapenDaniel
Obsidian | Level 7

I used the Proc Report code offered by Cynthia and also Proc SQL code by Ksharp. 

 

The dataset I have. ID = 4 is missing data.

ID        Year           Indicator

1          1991             1

1          1992             0

1          1992             1

2          1993             2

3          1991             0

3          1991             1

4          1991             .

 

Indicator is a categorical variable. It has three categories (0,1,2). I would like to count the number of each categories for each year for each firm (Panel Data). The expected dataset is following below.

 

This is the expected dataset. I would like to keep the missing value as missing value rather than 0

ID           Year            No0          No1          No2

1             1991             0               1              0

1             1992             1               1              0 

1             1993             0               0              0

2             1991             0               0              0 

2             1992             0               0              0 

2             1993             0               0              1

3             1991             1               1              0 

3             1992             0               0              0 

3             1993             0               0              0           

4             1991              .                .               .

Cynthia_sas
SAS Super FREQ

Hi:

  What you're describing/showing doesn't make sense to me.

desired_output.png

 

There is clearly something about your data that I don't understand. If ID=4/YEAR=1991 is a row in your data with a value of . for INDICATOR, then you have 4 possible values for INDICATOR: 0, 1, 2 and .(missing). You can't have a . for INDICATOR and want it to mean one thing for all the other rows and another thing for ID=4.

 

  I could understand if your data didn't have anything at all for ID=4 -- but you know that you need to include ID=4 on the report, but if it's not in the data PROC REPORT won't add it to the report without special options. This is where you might use PRELOADFMT:

preloadfmt.png

 

But as I said, I don't understand why you want that one row for ID=4 to be treated any differently than the other rows.

 

Cynthia

dapenDaniel
Obsidian | Level 7

Hi Cynthia,

 

thanks for your comment. 0 means that this event does not happen in a specific year and Missing value means that this event indeed happened in that year but I cannot tell which category it belongs to due to unavailable information.Does that make sense? Thanks.

Cynthia_sas
SAS Super FREQ

Hi:

  By that logic "0 means that this event does not happen in a specific year.", the data for ID=1 and YEAR=1992 does not make sense to me.

Not_make_sense.png

 

How can 1992 have a value for both 0 and 1 for INDICATOR when ID=1??

 

It doesn't make sense to me along with your explanation that "Missing value means that this event indeed happened in that year but I cannot tell which category".

 

cynthia

Ksharp
Super User
data have;
input ID        Year           Indicator;
cards;
1          1991             1
1          1992             0
1          1992             1
2          1993             2
3          1991             0
3          1991             1
4          1991             .
;
run;
proc sql;
select a.id,a.year,sum(Indicator=0) as No0,sum(Indicator=1) as No1,sum(Indicator=2) as No2
from (select * from (select distinct id from have),
     (select distinct year from have)) as a
 left join have as b
 on a.id=b.id and a.year=b.year
 where catx(' ',a.id,a.year) not in (select catx(' ',id,year) from have where indicator is missing)
  group by a.id,a.year
union
select id,year from have where indicator is missing;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1290 views
  • 0 likes
  • 4 in conversation