/* I have a dataset with 4 data columns, COL 1: firm identifier 1 COL 2: worker identifier 1 (period 1), COL3: firms identifer 2, COL 4 worker identifier2 (period 2). It could look like this for 5 firms and 100 workers: */ data datain; input Firmid1 $ w1 $ Firmid2 $ w2 $; datalines; 030 32 010 11 030 45 010 79 030 64 020 05 040 07 030 21 040 21 030 32 040 72 030 45 050 12 040 07 050 13 040 72 050 87 040 81 050 99 050 13 050 98 ; /* Note that worker 21 switches from firm 4 to firm 3. Worker 64 i only employed in period 1 and worker 79 is only employed in period 2, and firms 1 and 2 do not exist in period 1. I want to do the following: pick the firm indicator in period 1 (COL1) and count how many workers are attached to that firm in period 1, then go check for the existence of the same firm identifier in COL 3 and look in column 4 how many workers are in this particular firm in both periods. The output of this routine would be data looking like this: firm i.d., number of workers in period 1, number of workers in period 2, number of workers that stayed in the firm from period 1 to period 2. For the example above (where we are counting also from the perspective of column 3 if we can't find the firm in column 1) we get */ data dataout; input Firmid $ n1 $ n2 $ nstay $; datalines; 010 0 2 0 020 0 2 0 030 3 3 2 040 3 3 2 050 4 2 1 ; /* It is easy to do a frequency table for the periods separately. It is the counting of workers that remain across periods that is causing me trouble. Regards J PS (SAS version 9.4) */
... View more