BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abdulla
Pyrite | Level 9
RankTDC1GVKEYYearPercentage of TDC1
15237.743100420100.4265
22554.46710042010 
31578.10410042010 
51314.07910042010 
41596.90210042010 
15786.410042011 
22781.15610042011 
41696.43110042011 
31727.06910042011 
14182.832100420120.4386
22068.55410042012 
5841.20410042012 
31312.85210042012 
41131.17610042012 
13700.925101320100.4832
21099.78610132010 
5848.65710132010 
31067.56610132010 
4941.74810132010 
15231.393104520100.3532
42372.1410452010 
51776.06910452010 
32288.90810452010 
23142.08210452010 

 

Hi, I need help for the following.

1. The ranks in the above data are created based on the TDC1. There should be 5 TDC1 observations based on each year. But if you look at 2011, it misses one observation. So, I want to delete all the observations in that year. I have a huge data set. In that data set, if I have less than 5 TDC1 observations in each year or GVKEY, I have to remove all the observations for that particular year and GVKEY. I want to be consistent throughout so that I will have 5 TDC1 observations for each year for different GVKEY. Can anyone please help me by giving me SAS code for that?

 

2. I need to find the percentage of TDC1 for each year and for Rank 1. An example is given in the last column above. 

the percentage for the first row is (5237.743/(5237.743+2554.467+1578.104+1314.079+1596.902).  Please give me SAS code for this too. 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

HI @abdulla  Straight forward SQL

 

data have;
infile cards truncover;
input Rank	TDC1	GVKEY	Year;*	PercentageofTDC1;
cards;
1	5237.743	1004	2010	0.4265
2	2554.467	1004	2010	 
3	1578.104	1004	2010	 
5	1314.079	1004	2010	 
4	1596.902	1004	2010	 
1	5786.4	1004	2011	 
2	2781.156	1004	2011	 
4	1696.431	1004	2011	 
3	1727.069	1004	2011	 
1	4182.832	1004	2012	0.4386
2	2068.554	1004	2012	 
5	841.204	1004	2012	 
3	1312.852	1004	2012	 
4	1131.176	1004	2012	 
1	3700.925	1013	2010	0.4832
2	1099.786	1013	2010	 
5	848.657	1013	2010	 
3	1067.566	1013	2010	 
4	941.748	1013	2010	 
1	5231.393	1045	2010	0.3532
4	2372.14	1045	2010	 
5	1776.069	1045	2010	 
3	2288.908	1045	2010	 
2	3142.082	1045	2010	 
;


proc sql;
create table want as
select *,ifn(rank=1,TDC1/sum(TDC1),.) as Pct
from have
group by GVKEY,Year
having count(*)=5
order by GVKEY,year,rank;
quit;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20

HI @abdulla  Straight forward SQL

 

data have;
infile cards truncover;
input Rank	TDC1	GVKEY	Year;*	PercentageofTDC1;
cards;
1	5237.743	1004	2010	0.4265
2	2554.467	1004	2010	 
3	1578.104	1004	2010	 
5	1314.079	1004	2010	 
4	1596.902	1004	2010	 
1	5786.4	1004	2011	 
2	2781.156	1004	2011	 
4	1696.431	1004	2011	 
3	1727.069	1004	2011	 
1	4182.832	1004	2012	0.4386
2	2068.554	1004	2012	 
5	841.204	1004	2012	 
3	1312.852	1004	2012	 
4	1131.176	1004	2012	 
1	3700.925	1013	2010	0.4832
2	1099.786	1013	2010	 
5	848.657	1013	2010	 
3	1067.566	1013	2010	 
4	941.748	1013	2010	 
1	5231.393	1045	2010	0.3532
4	2372.14	1045	2010	 
5	1776.069	1045	2010	 
3	2288.908	1045	2010	 
2	3142.082	1045	2010	 
;


proc sql;
create table want as
select *,ifn(rank=1,TDC1/sum(TDC1),.) as Pct
from have
group by GVKEY,Year
having count(*)=5
order by GVKEY,year,rank;
quit;
abdulla
Pyrite | Level 9

Thank you very much. Could you give me a video link from which I can learn necessary SAS coding for Finance student

novinosrin
Tourmaline | Level 20

"SAS coding for Finance student"--- My oh my!, those words seem scary and intimidating. I just do SAS like  video games and that's as far as it goes. My recommendation would be to read anything/all posts by

 

1. @hashman   World's renowned super star who is also known as Paul Dorfman aka Don of the SAS world and a SAS author

2. @data_null__  King of SAS, with a prefix John

3. @FreelanceReinh Kolmogorov reborn - This man is prolly your best bet to offer advice

4. @Tom  The sage who transcends generations

5. @Ksharp My mother's favorite

6. @Reeza  There isn't a thing  that Reeza doesn't know about

7. @PaigeMiller  The statsman -champion of Regression,Canonical correlation analysis etc

8. @ballardw  The champ of manipulating time series without using INTERVALDS, can custom code anything he likes

9. @mkeintz  With him you will never LAG and will always LEAD plus you would SET things conditionally

10. @PGStats  First and last captures the heart, and at last Prodigy Genius stats  aka Pierre has no Peers

 

Please reach out to them and hopefully you get started. All the best bro! Have fun learning!

 

 

 


@abdulla wrote:

Thank you very much. Could you give me a video link from which I can learn necessary SAS coding for Finance student


 

hashman
Ammonite | Level 13

@abdulla:

Assuming that the input is sorted by [gvkey,year], you can interleave the files by this key. Then in the first pass through each BY group do the summation and in the second pass decide whether to keep the group and if yes, compute the percentage for the record with rank=1.

In SAS words:

data have ;                                                                                                                             
  input rank tdc1 gvkey year ;                                                                                                          
cards ;                                                                                                                                 
1  5237.743  1004  2010                                                                                                                 
2  2554.467  1004  2010                                                                                                                 
3  1578.104  1004  2010                                                                                                                 
5  1314.079  1004  2010                                                                                                                 
4  1596.902  1004  2010                                                                                                                 
1  5786.4    1004  2011                                                                                                                 
2  2781.156  1004  2011                                                                                                                 
4  1696.431  1004  2011                                                                                                                 
3  1727.069  1004  2011                                                                                                                 
1  4182.832  1004  2012                                                                                                                 
2  2068.554  1004  2012                                                                                                                 
5   841.204  1004  2012                                                                                                                 
3  1312.852  1004  2012                                                                                                                 
4  1131.176  1004  2012                                                                                                                 
1  3700.925  1013  2010                                                                                                                 
2  1099.786  1013  2010                                                                                                                 
5   848.657  1013  2010                                                                                                                 
3  1067.566  1013  2010                                                                                                                 
4   941.748  1013  2010                                                                                                                 
1  5231.393  1045  2010                                                                                                                 
4  2372.14   1045  2010                                                                                                                 
5  1776.069  1045  2010                                                                                                                 
3  2288.908  1045  2010                                                                                                                 
2  3142.082  1045  2010                                                                                                                 
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  set have (in=h) have ;                                                                                                                
  by gvkey year ;                                                                                                                       
  if h then do ;                                                                                                                        
    if first.year then call missing (_s, _n) ;                                                                                          
    _s + tdc1 ;                                                                                                                         
    _n + 1 ;                                                                                                                            
  end ;                                                                                                                                 
  else if _n = 5 then do ;                                                                                                              
    if rank = 1 then tdc1_pct = divide (tdc1, _s) ;                                                                                     
    output ;                                                                                                                            
  end ;                                                                                                                                 
run ;                                                  

Kind regards

Paul D. 

mkeintz
PROC Star

Here's a slight variation on @hashman's.  It also uses SET with two HAVE's, but doesn't accumulate TDC1 in the first pass.  It retieves the TDC1 total only when PCT is calculated.  I offer this just as a specimen of a particular style (using subsetting IF to eliminate the need for an explicit OUTPUT): 

 

data have;
infile cards truncover;
input Rank	TDC1	GVKEY	Year;*	PercentageofTDC1;
cards;
1	5237.743	1004	2010	0.4265
2	2554.467	1004	2010	 
3	1578.104	1004	2010	 
5	1314.079	1004	2010	 
4	1596.902	1004	2010	 
1	5786.4	1004	2011	 
2	2781.156	1004	2011	 
4	1696.431	1004	2011	 
3	1727.069	1004	2011	 
1	4182.832	1004	2012	0.4386
2	2068.554	1004	2012	 
5	841.204	1004	2012	 
3	1312.852	1004	2012	 
4	1131.176	1004	2012	 
1	3700.925	1013	2010	0.4832
2	1099.786	1013	2010	 
5	848.657	1013	2010	 
3	1067.566	1013	2010	 
4	941.748	1013	2010	 
1	5231.393	1045	2010	0.3532
4	2372.14	1045	2010	 
5	1776.069	1045	2010	 
3	2288.908	1045	2010	 
2	3142.082	1045	2010	 
;

data want;
  set have (in=firstpass) have (in=secondpass);
  by gvkey year;
  array t {5} _temporary_;
  if first.year then call missing(of t{*});
  if firstpass then t{rank}=tdc1;
  
  if secondpass=1 and n(of t{*})=5;
  if rank=1 then pct=t{1}/sum(of t{*});  
run;

 

Note it doesn't protect against having extra obs for ranks 1 through 5 (i.e. it doesn't count records, it counts distinct ranks).  And it relies on data being sorted by GVKEY/YEAR (but not neccessarily RANK).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 582 views
  • 7 likes
  • 4 in conversation