BookmarkSubscribeRSS Feed
SeanZ
Obsidian | Level 7

Hi, I have a data set of over 10 million observations, which is huge. I only want to choose useful information from it. The rules are fairly complicated. First let me show how the data set looks like

Dataset1

manager    quarter     firm          assets          type          s1          s2          s3                         LINE

100               5          1                 10               1               1          1            1                              1

100               6          2                 5                 1               1          1            2                              2

100               7          1                 12              1               1          2            1                               3

101               5          1                 10               2               1          1            1                              4

101               6          2                10               2               1          1            1                               5

101               6          2                 10               2               1          1            2                              6

101               7          1                 10               2               1          1            1                              7

102               5          1                 10               1               1          1            1                              8

102               6          1                 10               1               1          1            1                              9

Dataset2

firm          qtr1               qtr2

1               5                    6

Variables:

manager: ID number of a manager, I have three manager with ID 100, 101 and 102

quarter: I converted date(quarter) into numbers. 5 means the fifth quarter

firm: ID number for a firm that manager buys.

assets: some variable I want to use to calculate something.

type: type of manager. manager 100 is a type 1 manager. 101 is a type 2 manager. Each manager's type cannot change. Different manager can be the same type.

s1, s2, and s3: Three different firm's type. They are specific to firms and different date(quarter).

qtr1: start quarter for the firm.

qtr2: end quarter for the firm.

Ok. Now I would like to show what I want to accomplish.

The firms in dataset2 are a subset of firms in dataset1.

First, I want to keep all observations that are related to dataset2. By that I mean I want to keep any observations for firm #1 quarter 5 and quarter 6, and firm#2 quarter 6 and quarter 7. AND any observations that have the SAME firms' type s1, s2 and s3 (must match exactly) for the each quarter.

For example. firm#1 in quarter 5 has s1=1, s2=1 and s3=1. In that specific quarter (5), and same firm type s1=s2=s3. Line 5 and 8 have the same value. Therefore, I need to keep line 5 and line 8 besides line 1.

In that sense, for the second observation firm1 with qtr2 (quarter=6). There is only one observation (line5), that meets the criterion. (quarter=6 and s1=s2=s3) besides line 9(original line).

Finally, I will have observations line 1, 5, 8, 9

With these observations, I will do the following job.

For the first observation, firm 1 at quarter 5, I will calculate the mean of asset variable of line 1 and 8, which is (10+10)/2.

For the second observation, firm1 at quarter 6, I will calculate the mean of asset variable of line 5, which is just 5.

Anyone help?

2 REPLIES 2
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

It looks like you have a reasonable spec above, and have not pointed out any specific questions.  Could I suggest you start coding the problem you described above and come back with any specific questions.  I suggest coding a datastep to each row you have written above.

Ksharp
Super User

If I understood what you mean .Obs4 is also what you need ?

data Dataset1;
input manager    quarter     firm          assets          type          s1          s2          s3                         LINE ;
cards;
100               5          1                 10               1               1          1            1                              1
100               6          2                 5                 1               1          1            2                              2
100               7          1                 12              1               1          2            1                               3
101               5          1                 10               2               1          1            1                              4
101               6          2                10               2               1          1            1                               5
101               6          2                 10               2               1          1            2                              6
101               7          1                 10               2               1          1            1                              7
102               5          1                 10               1               1          1            1                              8
102               6          1                 10               1               1          1            1                              9
;
run;
data Dataset2 ;
input firm          qtr1               qtr2     ;
cards;
1               5                    6
2 6 7
;
run;
data key(drop=qtr1 qtr2);
 set dataset2;
 quarter=qtr1;output;
 quarter=qtr2;output;
run;
data want;
 if _n_ eq 1 then do;
  if 0 then set key;
  declare hash ha(dataset:'key');
   ha.definekey(all:'y');
   ha.definedone();
end;
set dataset1;
obs+1;
if ha.check()=0 and range(of s:)=0;
run;




Xia Keshan

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 355 views
  • 0 likes
  • 3 in conversation