Descriptive Statistics in large database

Frequent Contributor
Posts: 75

Descriptive Statistics in large database

Hi all,

I would like to obtain descriptive statistics for a large database (I am attaching a small sample) per quarter for all the investors and also for every type of investor seperately( type code: 1-5). In addition, I need descriptive for various thresholds (securities traded by more that 1,5,10 and 20 investors). What I need is the following:

Average All periods                    Average(1980-4 till 1981-3)

Panel A: Number of Investors

Total :

1  :

2  :

3  :

4  :

5  :

Panel B: Average number of securities with:

≥ 1 investors  :

≥ 5 investors  :

≥ 10 investors:

≥ 20 investors:

≥ 5 (1)  :

≥ 5(2)   :

≥ 5(3)   :

≥ 5(4)   :

≥ 5 (5)  :

Panel C: Average number of securities held by each manager:

All institutions  :

1  :

2  :

3  :

4  :

5 :

Could anyone help on this? Many thanks in advance!

Posts: 4,736

Re: Descriptive Statistics in large database

Normally people here are very open to help if they see that you've already done some work. You would therefore need to post some code and tell us where you got stuck.

What's the database and what means "large". I'm asking this question because if "large" is "really large" then one has also to consider performance meaning that it's important to try and push execution to the database.

The stuff you're asking for: Proc SQL, Proc Univariate and Proc Freq come to mind. Proc Univariate and Proc Freq run for a lot of databases "in-database".

Frequent Contributor
Posts: 75

Re: Descriptive Statistics in large database

Hi Patrick,

The database is portfolio holdings of institutional investors and is approximately 5 gb.

I managed to do all the descriptives but the ones for the thresholds. For example the number of stocks traded per quarter by more than 1, 5,10,20 investors. At the column with label "position" there is either  '+','-' or '0', depending on whether the investor's position on the security has increased, decreased or remained constant. So, in order let's say to calculate the average number of stocks per quarter traded by  ≥1 investor, then it should have at least one '+' or '-'.

To calculate the average number of stocks per quarter traded by  ≥5 investor, then it should have at least five '+' or '-' in total (could be five '-' or five '+').

I am not that experienced in SAS; usually I run these calculations in excel but due to the large size of data I have to do it in SAS. i would be grateful if someone could help.

Posts: 4,736

Re: Descriptive Statistics in large database

You need to give it a go first and then post the code you can come up with telling us what's not working. This way we will be able to support you.

Just writing all the code for you from scratch bears the risk that you'll get something you don't understand and possibly doesn't do 100% what you need. But then as you don't understand the code you won't be able to amend it and continue asking for more and more which is frustrating for everybody.

If you start with a piece of code which needs fixing then people also understand on which skill level you are and give you appropriate help.

What works best normally is a data step creating a work table with sample data, then a description what you need (as accurate as possible, may be also posting a screenshot or mocking up a report output), and the code you've already developed demonstrating where you are and where you're stuck.

Discussion stats
• 3 replies
• 218 views
• 0 likes
• 2 in conversation