Rank | TDC1 | GVKEY | Year | Percentage of TDC1 |
1 | 5237.743 | 1004 | 2010 | 0.4265 |
2 | 2554.467 | 1004 | 2010 | |
3 | 1578.104 | 1004 | 2010 | |
5 | 1314.079 | 1004 | 2010 | |
4 | 1596.902 | 1004 | 2010 | |
1 | 5786.4 | 1004 | 2011 | |
2 | 2781.156 | 1004 | 2011 | |
4 | 1696.431 | 1004 | 2011 | |
3 | 1727.069 | 1004 | 2011 | |
1 | 4182.832 | 1004 | 2012 | 0.4386 |
2 | 2068.554 | 1004 | 2012 | |
5 | 841.204 | 1004 | 2012 | |
3 | 1312.852 | 1004 | 2012 | |
4 | 1131.176 | 1004 | 2012 | |
1 | 3700.925 | 1013 | 2010 | 0.4832 |
2 | 1099.786 | 1013 | 2010 | |
5 | 848.657 | 1013 | 2010 | |
3 | 1067.566 | 1013 | 2010 | |
4 | 941.748 | 1013 | 2010 | |
1 | 5231.393 | 1045 | 2010 | 0.3532 |
4 | 2372.14 | 1045 | 2010 | |
5 | 1776.069 | 1045 | 2010 | |
3 | 2288.908 | 1045 | 2010 | |
2 | 3142.082 | 1045 | 2010 |
Hi, I need help for the following.
1. The ranks in the above data are created based on the TDC1. There should be 5 TDC1 observations based on each year. But if you look at 2011, it misses one observation. So, I want to delete all the observations in that year. I have a huge data set. In that data set, if I have less than 5 TDC1 observations in each year or GVKEY, I have to remove all the observations for that particular year and GVKEY. I want to be consistent throughout so that I will have 5 TDC1 observations for each year for different GVKEY. Can anyone please help me by giving me SAS code for that?
2. I need to find the percentage of TDC1 for each year and for Rank 1. An example is given in the last column above.
the percentage for the first row is (5237.743/(5237.743+2554.467+1578.104+1314.079+1596.902). Please give me SAS code for this too.
HI @abdulla Straight forward SQL
data have;
infile cards truncover;
input Rank TDC1 GVKEY Year;* PercentageofTDC1;
cards;
1 5237.743 1004 2010 0.4265
2 2554.467 1004 2010
3 1578.104 1004 2010
5 1314.079 1004 2010
4 1596.902 1004 2010
1 5786.4 1004 2011
2 2781.156 1004 2011
4 1696.431 1004 2011
3 1727.069 1004 2011
1 4182.832 1004 2012 0.4386
2 2068.554 1004 2012
5 841.204 1004 2012
3 1312.852 1004 2012
4 1131.176 1004 2012
1 3700.925 1013 2010 0.4832
2 1099.786 1013 2010
5 848.657 1013 2010
3 1067.566 1013 2010
4 941.748 1013 2010
1 5231.393 1045 2010 0.3532
4 2372.14 1045 2010
5 1776.069 1045 2010
3 2288.908 1045 2010
2 3142.082 1045 2010
;
proc sql;
create table want as
select *,ifn(rank=1,TDC1/sum(TDC1),.) as Pct
from have
group by GVKEY,Year
having count(*)=5
order by GVKEY,year,rank;
quit;
HI @abdulla Straight forward SQL
data have;
infile cards truncover;
input Rank TDC1 GVKEY Year;* PercentageofTDC1;
cards;
1 5237.743 1004 2010 0.4265
2 2554.467 1004 2010
3 1578.104 1004 2010
5 1314.079 1004 2010
4 1596.902 1004 2010
1 5786.4 1004 2011
2 2781.156 1004 2011
4 1696.431 1004 2011
3 1727.069 1004 2011
1 4182.832 1004 2012 0.4386
2 2068.554 1004 2012
5 841.204 1004 2012
3 1312.852 1004 2012
4 1131.176 1004 2012
1 3700.925 1013 2010 0.4832
2 1099.786 1013 2010
5 848.657 1013 2010
3 1067.566 1013 2010
4 941.748 1013 2010
1 5231.393 1045 2010 0.3532
4 2372.14 1045 2010
5 1776.069 1045 2010
3 2288.908 1045 2010
2 3142.082 1045 2010
;
proc sql;
create table want as
select *,ifn(rank=1,TDC1/sum(TDC1),.) as Pct
from have
group by GVKEY,Year
having count(*)=5
order by GVKEY,year,rank;
quit;
Thank you very much. Could you give me a video link from which I can learn necessary SAS coding for Finance student
"SAS coding for Finance student"--- My oh my!, those words seem scary and intimidating. I just do SAS like video games and that's as far as it goes. My recommendation would be to read anything/all posts by
1. @hashman World's renowned super star who is also known as Paul Dorfman aka Don of the SAS world and a SAS author
2. @data_null__ King of SAS, with a prefix John
3. @FreelanceReinh Kolmogorov reborn - This man is prolly your best bet to offer advice
4. @Tom The sage who transcends generations
5. @Ksharp My mother's favorite
6. @Reeza There isn't a thing that Reeza doesn't know about
7. @PaigeMiller The statsman -champion of Regression,Canonical correlation analysis etc
8. @ballardw The champ of manipulating time series without using INTERVALDS, can custom code anything he likes
9. @mkeintz With him you will never LAG and will always LEAD plus you would SET things conditionally
10. @PGStats First and last captures the heart, and at last Prodigy Genius stats aka Pierre has no Peers
Please reach out to them and hopefully you get started. All the best bro! Have fun learning!
@abdulla wrote:
Thank you very much. Could you give me a video link from which I can learn necessary SAS coding for Finance student
Assuming that the input is sorted by [gvkey,year], you can interleave the files by this key. Then in the first pass through each BY group do the summation and in the second pass decide whether to keep the group and if yes, compute the percentage for the record with rank=1.
In SAS words:
data have ;
input rank tdc1 gvkey year ;
cards ;
1 5237.743 1004 2010
2 2554.467 1004 2010
3 1578.104 1004 2010
5 1314.079 1004 2010
4 1596.902 1004 2010
1 5786.4 1004 2011
2 2781.156 1004 2011
4 1696.431 1004 2011
3 1727.069 1004 2011
1 4182.832 1004 2012
2 2068.554 1004 2012
5 841.204 1004 2012
3 1312.852 1004 2012
4 1131.176 1004 2012
1 3700.925 1013 2010
2 1099.786 1013 2010
5 848.657 1013 2010
3 1067.566 1013 2010
4 941.748 1013 2010
1 5231.393 1045 2010
4 2372.14 1045 2010
5 1776.069 1045 2010
3 2288.908 1045 2010
2 3142.082 1045 2010
;
run ;
data want (drop = _:) ;
set have (in=h) have ;
by gvkey year ;
if h then do ;
if first.year then call missing (_s, _n) ;
_s + tdc1 ;
_n + 1 ;
end ;
else if _n = 5 then do ;
if rank = 1 then tdc1_pct = divide (tdc1, _s) ;
output ;
end ;
run ;
Kind regards
Paul D.
Here's a slight variation on @hashman's. It also uses SET with two HAVE's, but doesn't accumulate TDC1 in the first pass. It retieves the TDC1 total only when PCT is calculated. I offer this just as a specimen of a particular style (using subsetting IF to eliminate the need for an explicit OUTPUT):
data have;
infile cards truncover;
input Rank TDC1 GVKEY Year;* PercentageofTDC1;
cards;
1 5237.743 1004 2010 0.4265
2 2554.467 1004 2010
3 1578.104 1004 2010
5 1314.079 1004 2010
4 1596.902 1004 2010
1 5786.4 1004 2011
2 2781.156 1004 2011
4 1696.431 1004 2011
3 1727.069 1004 2011
1 4182.832 1004 2012 0.4386
2 2068.554 1004 2012
5 841.204 1004 2012
3 1312.852 1004 2012
4 1131.176 1004 2012
1 3700.925 1013 2010 0.4832
2 1099.786 1013 2010
5 848.657 1013 2010
3 1067.566 1013 2010
4 941.748 1013 2010
1 5231.393 1045 2010 0.3532
4 2372.14 1045 2010
5 1776.069 1045 2010
3 2288.908 1045 2010
2 3142.082 1045 2010
;
data want;
set have (in=firstpass) have (in=secondpass);
by gvkey year;
array t {5} _temporary_;
if first.year then call missing(of t{*});
if firstpass then t{rank}=tdc1;
if secondpass=1 and n(of t{*})=5;
if rank=1 then pct=t{1}/sum(of t{*});
run;
Note it doesn't protect against having extra obs for ranks 1 through 5 (i.e. it doesn't count records, it counts distinct ranks). And it relies on data being sorted by GVKEY/YEAR (but not neccessarily RANK).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.