BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
joon1
Quartz | Level 8

Dear Madam/Sir,

I have firm identifier (gvkey), year (cyear) and auditor key (auditor_fkey) and like to define common auditor as a dummy variable. In other words, gvkey 1161 and 1380 share the common auditor in year 2003 and these firm-years are coded as 1.  Any coding idea will be greatly appreciated.

 

Obs cyear gvkey AUDITOR_FKEY1234567891011121314151617181920
200311612
200411612
200511612
200611612
200213802
200313802
200413802
200513802
200613802
200713802
200813802
200913802
201013802
201113802
201213802
201313802
200019131
200119131
200219131
200319131

Thanks

Joon1

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

I think you should post the desired output to clarify your question better.

Assuming I understood what you mean.

 

data have;
    input cyear gvkey AUDITOR_FKEY;
    datalines;
2003 1161 2
2004 1161 2
2005 1161 2
2006 1161 2
2002 1380 2
2003 1380 2
2004 1380 2
2005 1380 2
2006 1380 2
2007 1380 2
2008 1380 2
2009 1380 2
2010 1380 2
2011 1380 2
2012 1380 2
2013 1380 2
2000 1913 1
2001 1913 1
2002 1913 1
2003 1913 1
;
run;
proc sql;
create table want as
select *,count(distinct gvkey) ne 1 as flag
 from have
  group by AUDITOR_FKEY,cyear
   order by gvkey,cyear;
quit;

View solution in original post

14 REPLIES 14
Patrick
Opal | Level 21

I don't really understand what you're asking for. Does below return what you're after?

data have;
    input cyear gvkey AUDITOR_FKEY;
    datalines;
2003 1161 2
2004 1161 2
2005 1161 2
2006 1161 2
2002 1380 2
2003 1380 2
2004 1380 2
2005 1380 2
2006 1380 2
2007 1380 2
2008 1380 2
2009 1380 2
2010 1380 2
2011 1380 2
2012 1380 2
2013 1380 2
2000 1913 1
2001 1913 1
2002 1913 1
2003 1913 1
;
run;

proc sql;
	select 
		l.cyear 
		,l.gvkey
		,l.AUDITOR_FKEY
		,r.n_gvkeys
		,case
			when r.n_gvkeys >1 then 1
			else 0
			end as shared_auditor_flg
	from have l
	
	left join
		(
			select 
				cyear 
				,AUDITOR_FKEY
				,count(*) as n_gvkeys
			from have
			group by 
			cyear,AUDITOR_FKEY
	) r
	on l.cyear=r.cyear and l.auditor_fkey=r.auditor_fkey
	order by l.gvkey, l.cyear
	;
quit;

Patrick_0-1734656397527.png

 

 

Ksharp
Super User

I think you should post the desired output to clarify your question better.

Assuming I understood what you mean.

 

data have;
    input cyear gvkey AUDITOR_FKEY;
    datalines;
2003 1161 2
2004 1161 2
2005 1161 2
2006 1161 2
2002 1380 2
2003 1380 2
2004 1380 2
2005 1380 2
2006 1380 2
2007 1380 2
2008 1380 2
2009 1380 2
2010 1380 2
2011 1380 2
2012 1380 2
2013 1380 2
2000 1913 1
2001 1913 1
2002 1913 1
2003 1913 1
;
run;
proc sql;
create table want as
select *,count(distinct gvkey) ne 1 as flag
 from have
  group by AUDITOR_FKEY,cyear
   order by gvkey,cyear;
quit;
joon1
Quartz | Level 8

Thanks for your kind reply, Ksharp. My desired output is the last column (dummy variable coded 1 for firms (gvkey) with the same auditor (auditor_fkey) in the same year (cyear), otherwise 0). Your suggested code does not work. Your help will be highly appreicated.

 

2003 1161 2     1
2004 1161 2     1
2005 1161 2     1
2006 1161 2     1
2002 1380 2     0
2003 1380 2     1
2004 1380 2     1 
2005 1380 2     1
2006 1380 2     1
2007 1380 2     0
2008 1380 2     0
2009 1380 2     0
2010 1380 2     0
2011 1380 2     0
2012 1380 2     0
2013 1380 2     0
2000 1913 1     0
2001 1913 1     0
2002 1913 1     0
2003 1913 1     0

output  

Ksharp
Super User

Then tell me where is different between mine and yours ? Here is my output.

Ksharp_0-1734674750748.png

 

joon1
Quartz | Level 8

I am sorry, Ksharp. Your code assigned all 1s for 'flag'. Your help will be highly appreciated.

 

proc sql;
create table cc2 as
select *,count(distinct gvkey) ne 1 as flag
from cc1
group by AUDITOR_FKEY,cyear
order by gvkey,cyear;
quit;

 

 

Obs cyear gvkey AUDITOR_FKEY flag1234567891011121314151617181920
2003116121
2004116121
2005116121
2006116121
2002138021
2003138021
2004138021
2005138021
2006138021
2007138021
2008138021
2009138021
2010138021
2011138021
2012138021
2013138021
2000191311
2001191311
2002191311
2003191311
ballardw
Super User

Since I run the example data set and get the same result as @Ksharp 

                  AUDITOR_
cyear    gvkey      FKEY      flag

 2003     1161        2         1
 2004     1161        2         1
 2005     1161        2         1
 2006     1161        2         1
 2002     1380        2         0
 2003     1380        2         1
 2004     1380        2         1
 2005     1380        2         1
 2006     1380        2         1
 2007     1380        2         0
 2008     1380        2         0
 2009     1380        2         0
 2010     1380        2         0
 2011     1380        2         0
 2012     1380        2         0
 2013     1380        2         0
 2000     1913        1         0
 2001     1913        1         0
 2002     1913        1         0
 2003     1913        1         0


Then one suspects something different about YOUR CC1 data set that we cannot distinguish because you are not providing example data as a working data step.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

joon1
Quartz | Level 8

Thanks for your help, Balladw. I was not clear on my question. Ksharp's code works.

joon1
Quartz | Level 8

Thanks, Balladw and Ksharp. The working file is just 394 observations, so I display my working dataset (Actually, I don't understand the link you sent). Flags are assigned 1 using your suggested code. Your help will be greatly appreciated.

 

cyeargvkeyAUDITOR_FKEY
200311612
200411612
200511612
200611612
200213802
200313802
200413802
200513802
200613802
200713802
200813802
200913802
201013802
201113802
201213802
201313802
200019131
200119131
200219131
200319131
200122301
200222302
200322302
200422302
200522302
200125931
200225931
200028125
200128125
200228123
201232263
201332263
201432263
201532263
200032783
200232783
200432783
200732783
200832783
200932783
201032783
201132783
201232783
200034395
200134395
200234392
200334392
200434392
200138132
200238132
200338132
200139643
200239643
200339643
200146114
200246114
200346114
200050465
200150465
200250464
200350464
200450464
200550464
200650464
200750464
200850464
200950464
201050464
201150464
200054395
200154395
200254394
200354394
200454394
200255974
200355974
200455974
200555974
200655974
200755974
200457232
200557232
200657232
200757232
200857232
200957232
201057232
201157232
201257232
200357913
200457913
200557913
200657913
200757913
200857913
200957913
201057913
201157913
201257913
201357913
201457913
201557913
200458603
200558603
200658603
200758603
200858603
200958603
201058783
201158783
201258783
201358783
201458783
201558783
201658783
200268213
200368213
200468213
200568213
200668213
200768213
201168451
200171275
200271275
200371273
200871632
200971632
201071632
201171632
201271632
201371632
200375854
200475854
200575854
200675854
200179223
200279223
200379223
200183581
200283581
200383581
200483581
200583581
200683581
200783581
200793722
200893722
200993722
201093722
201193722
201293722
201393722
201493722
201593722
200096672
201198822
201298822
201398822
201498822
201598822
200010386346
200110386346
200210386346
2003103861
2004103861
2005103861
2006103861
2007103866
2008103866
2004105812
2005105812
2006105812
2007105812
2008105812
2009105812
2010105812
2011105812
2001106392
2002106392
2003106392
2004106392
2005106392
2008107932
2009107932
2011108011
2012108011
2013108011
2014108011
2015112643
2000113001
2001113001
2002113001
2003113001
2004113001
2005113001
2006113001
2001121223
2002121223
2003121223
2006121413
2007121413
2008121413
2009121413
2010121413
2011121413
2000122151
2001122151
2002122151
2003122151
2004122151
2005122151
2000127571
2001127571
2002127571
2003127571
2004127571
2005127571
2006127571
2008127852
2009127852
2000134072
2001134072
2013143042
2014143044
2000144123
2005152474
2006152474
2007152474
2013161012
2014161012
2015161012
2016161012
2017161012
2008211862
2009211862
2010211862
2011211862
2003237671
2004237671
2005237671
2005239781
2006239781
2007239781
2008239781
2009239781
2010239781
2011239781
2012239781
2013239781
2014239781
2001247833
2002247833
2003247833
2004247831
2005247831
2006247831
2007247831
2008247831
2009247831
2013251103
2014251103
2015251103
2016251103
2000251194
2001251194
2002251194
2003251194
2004251194
2005251194
2006251194
2007251194
2008251194
2009251194
2010251194
2007252792
2008252792
2009252792
2010252792
2011252792
2012252792
2013252792
2014252792
2015252792
2011258802
2012258802
2013258802
2014258802
2015258802
2003287423
2004287423
2005287423
2006287423
2007287423
2008287423
2009287421
2010287421
2011287421
2012287421
2013287421
2014287421
2015287421
2016287421
2004289302
2005289302
2006289302
2007289302
2008289302
2009289302
2010289302
2005299683
2006299683
2007299683
2008299683
2009299683
2010299683
2011299683
2012300322
2013300322
2001314882
2002314882
2003314882
2004314882
2005314882
2001611555
2005611552
2006611552
2002630515
2003630512
20031177684
20041177684
20051177684
20061177681
20021265541
20031265541
20041265541
20051265541
20061265541
20071265541
20081265541
20091265541
20101265541
20111265541
20121265541
20041289782
20051289782
20061289782
20071289782
20081289782
20091289782
20101289782
20081417202
20091417202
20101417202
20111417202
20121417202
20131417206
20141417206
20011445591
20021445591
20031445591
20041445591
20051445591
20071479882
20081479882
20091479882
20041573541
20051573541
20061573541
20071573541
20081573541
20091573541
20101573541
20111573541
20121573541
20131573541
20071768991
20081768991
20101768991
20111768991
20121768991
20071779252
20081779252
20091779252
20101779252
20111779252
20121779252
20131779252
20141779252
20151779252
20161779252
20091833663
20101833663
20111833663
20121833663
Tom
Super User Tom
Super User

Most of those should be flagged based on your explanation of the logic.

Of the 394 observations there are only 13 where the auditor only audited one firm in that year.

 

                          AUDITOR_
Obs    cyear     gvkey      FKEY      n_firms    flag

  9     2000     25119         4         1         0
 14     2000     10386       346         1         0
 39     2001     10386       346         1         0
 68     2002     10386       346         1         0
217     2007     10386         6         1         0
245     2008     10386         6         1         0
321     2011      5046         4         1         0
361     2013    141720         6         1         0
375     2014     14304         4         1         0
376     2014    141720         6         1         0
377     2015     28742         1         1         0
389     2016     28742         1         1         0
394     2017     16101         2         1         0

PS:  Notice how much easier it is to share plain text instead of tables?  Not only can it be copied into code directly it display using much less screen space on this forum.

joon1
Quartz | Level 8

Thanks for your help, Tom. Your code works. I was not clear about my question. Have a great day.

Tom
Super User Tom
Super User

Perhaps it would be easier to understand if you used normal SAS code instead of SQL logic?

Sort the data by YEAR and AUDITOR (which way you nest them does not really matter) and FIRM.  Then in a data step you can use two DO loops to read through the year X auditor groups of records twice.  Once to count how many distinct firms are listed. And the second time to re-read the data so the detailed observations can be written out.

proc sort data=have;
  by cyear AUDITOR_FKEY gvkey ;
run;

data want;
  do _n_=1 by 1 until(last.AUDITOR_FKEY);
    set have;
    by cyear AUDITOR_FKEY gvkey;
    n_firms = sum(n_firms,first.gvkey);
  end;
  flag = n_firms > 1 ;
  do _n_=1 to _n_ ;
    set have;
    output;
  end;
run;
                         AUDITOR_
Obs    cyear    gvkey      FKEY      n_firms    flag

  1     2000     1913        1          1         0
  2     2001     1913        1          1         0
  3     2002     1913        1          1         0
  4     2002     1380        2          1         0
  5     2003     1913        1          1         0
  6     2003     1161        2          2         1
  7     2003     1380        2          2         1
  8     2004     1161        2          2         1
  9     2004     1380        2          2         1
 10     2005     1161        2          2         1
 11     2005     1380        2          2         1
 12     2006     1161        2          2         1
 13     2006     1380        2          2         1
 14     2007     1380        2          1         0
 15     2008     1380        2          1         0
 16     2009     1380        2          1         0
 17     2010     1380        2          1         0
 18     2011     1380        2          1         0
 19     2012     1380        2          1         0
 20     2013     1380        2          1         0

 

joon1
Quartz | Level 8
I am sorry, Tom. Your code also produces almost 1 for 'flag'. I just posted
the whole dataset in another posting. Thanks.
Tom
Super User Tom
Super User

@joon1 wrote:
I am sorry, Tom. Your code also produces almost 1 for 'flag'. I just posted
the whole dataset in another posting. Thanks.

Show exactly what observations have the wrong FLAG setting. And explain why.  Your new data has only 13/394 observations where the auditor only audited one firm in that year.  If that is NOT the situation you are trying to detect then please explain (with concrete examples) what you do want to flag.

joon1
Quartz | Level 8

Thanks a lot, Ksharp. Actually, your code works. I sincerely apologize for your confusion. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 588 views
  • 2 likes
  • 5 in conversation