/* I have a dataset with 4 data columns, COL 1: firm identifier 1 COL 2: worker identifier 1 (period 1), COL3: firms identifer 2, COL 4 worker identifier2 (period 2).
It could look like this for 5 firms and 100 workers:
*/
data datain;
input Firmid1 $ w1 $ Firmid2 $ w2 $;
datalines;
030 32 010 11
030 45 010 79
030 64 020 05
040 07 030 21
040 21 030 32
040 72 030 45
050 12 040 07
050 13 040 72
050 87 040 81
050 99 050 13
050 98
;
/* Note that worker 21 switches from firm 4 to firm 3. Worker 64 i only employed in period 1 and worker 79 is only employed in period 2, and firms 1 and 2 do not exist in period 1.
I want to do the following: pick the firm indicator in period 1 (COL1) and count how many workers are attached to that firm in period 1, then go check for the existence of the same firm identifier in COL 3 and look in column 4 how many workers are in this particular firm in both periods.
The output of this routine would be data looking like this: firm i.d., number of workers in period 1, number of workers in period 2, number of workers that stayed in the firm from period 1 to period 2. For the example above (where we are counting also from the perspective of column 3 if we can't find the firm in column 1) we get
*/
data dataout;
input Firmid $ n1 $ n2 $ nstay $;
datalines;
010 0 2 0
020 0 2 0
030 3 3 2
040 3 3 2
050 4 2 1
;
/*
It is easy to do a frequency table for the periods separately. It is the counting of workers that remain across periods that is causing me trouble.
Regards
J
PS (SAS version 9.4)
*/
Hello @ejarquejm and welcome to the SAS Support Communities!
You can derive n1, n2 and (in particular) nstay with PROC SQL queries and join the three resulting tables:
proc sql;
create table dataout as
select firmid1 as firmid, max(n1,0) as n1, max(n2,0) as n2, max(nstay,0) as nstay
from ((select firmid1, count(distinct w1) as n1
from datain
where firmid1
group by firmid1)
natural full join
(select firmid2 as firmid1, count(distinct w2) as n2
from datain
where firmid2
group by firmid2))
natural left join
(select firmid1, count(w1) as nstay from
(select *
from (select firmid1, w1 from datain)
intersect
(select firmid2, w2 from datain))
group by firmid1);
quit;
I found it more natural to create n1, n2 and nstay as numeric variables, but feel free to convert them to character in the SELECT statement (using the PUT function), if needed.
Note that n2=1 (not 2) for firmid '020'.
Hello @ejarquejm and welcome to the SAS Support Communities!
You can derive n1, n2 and (in particular) nstay with PROC SQL queries and join the three resulting tables:
proc sql;
create table dataout as
select firmid1 as firmid, max(n1,0) as n1, max(n2,0) as n2, max(nstay,0) as nstay
from ((select firmid1, count(distinct w1) as n1
from datain
where firmid1
group by firmid1)
natural full join
(select firmid2 as firmid1, count(distinct w2) as n2
from datain
where firmid2
group by firmid2))
natural left join
(select firmid1, count(w1) as nstay from
(select *
from (select firmid1, w1 from datain)
intersect
(select firmid2, w2 from datain))
group by firmid1);
quit;
I found it more natural to create n1, n2 and nstay as numeric variables, but feel free to convert them to character in the SELECT statement (using the PUT function), if needed.
Note that n2=1 (not 2) for firmid '020'.
brilliant. i will give it a go and get back with the result.
It of course works. Thank you.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.