01-11-2014 04:24 PM
I have this dataset:
year firm industry value status
2000 a I 8 A
2000 b II 1 A
2001 a I 7 A
2001 a I 0 N
2001 b II 6 A
2002 a I 5 A
2003 b II 2 A
2003 c III 9 A
2003 d III 0 N
I need to do these steps:
1. create a variable status= A(for value greater than 0) and N otherwise. I have done that above
2. need to delete if a firm appears twice in any year, preferably retain the firm observation with the positive value.
3.For each industry each year, I need to count the number of firms that have a positive value(count=A) and the total number of firms.
I am getting stuck on the 2nd step.
For the 3rd step, this is the code I wrote:
If first.status='A' then Acount=1;
If first.status='N' then Ncount=1;
by year industry_new;
But these codes are not working. The error message that comes is: ERROR: BY variables are not properly sorted on data set
Would appreciate help.
01-11-2014 06:27 PM
Hi Try this ,
input year firm :$ industry :$ value;
if value>0 then Status='A' ;else Status='N';
2000 a I 8
2000 b II 1
2001 a I 7
2001 a I 0
2001 b II 6
2002 a I 5
2002 a I 0
2002 b II 5
2003 b II 2
2003 c III 9
2003 d III 0
/*query3:gives count of firms per year*/
select year,count(*) as count_firm
/* query2:gives count of firms per year per industry wise having status ='A' */
select year,firm,industry,status,count(*) as A_firms
/* query1 start: gives solution to your point2*/
select distinct year,firm,industry,value,status
from test group by year,firm having value=max(value)
where status='A' group by year,industry
group by year;
01-12-2014 02:32 AM
Please check the below code
/*need to delete if a firm appears twice in any year, preferably retain the firm observation with the positive value.*/
create table status as select distinct year,firm,industry, status, count(status) as count from test where status='A' and value > 0 group by year,firm,industry;
/*.For each industry each year, I need to count the number of firms that have a positive value(count=A) and the total number of firms.*/
create table count as select distinct status,year,industry,firm,count(status) as count_status,count(firm) as count_firm from status group by year,industry;
The dataset count has the desired output. if i understood your requirement correctly you are expecting the below output. could you please check and let me know if this is not the desired output.
01-12-2014 02:09 PM
Thank you for your suggestion.
I need the output slightly different. I need two count variables
- the first count variable contains the number of firms per industry. Each indsutry row will contain the total number of firms in that industry.
- the second count variable contains the number of firms with positive value per industry. Each industry row will contain the number of firms with positive values in that industry.
I need these two count variables so as to construct a ratio of # of positives divided by the total #.
Using Pradeep's code, I have been able to construct the first variable and now, am trying to do the second.
As far as retaining the firm with positive values each year, to avoid duplicate firms each year, I do not think that should be a problem, I had a look at the dataset.
01-12-2014 02:52 PM
This should do what you want: For reference I nested the creation of the sum variables in the joins in testb, but this could just as easily be done in two separate tables and then joined in, in the same manner. I maxed the values in testa bc you only wanted the best value for duplicates. Hope this helps.
create table testa as
select distinct year, firm, industry, (max(value)) as value, (case when (max(value)) >0 then 'A' else 'N' end) as status
group by year, firm, industry;
create table testb as
select distinct a.*, b.acount, c.numfirm
from testa a left join (select distinct year, firm, industry, (sum(case when status = 'A' then 1 else 0 end)) as acount
from testa group by year, firm, industry) b on (a.year=b.year and a.firm=b.firm and a.industry=b.industry)
left join (select distinct year, count(firm) as numfirm, industry
from testa group by year, industry) c on (a.year=c.year and a.industry=c.industry);
01-12-2014 03:13 PM
Thank you, overmar.
I was still working on earlier codes and I think I got the desired output.
I had to use the sum() statement. That worked!
Thank you, everyone.
As always, this forum is a blessing for SAS newbies