I want to create ED social ties dummy variable (education, two individuals are socially connected through their education network if they went to the same school and graduated within one year of each other tie now). As the first screenshot shown, Is it possible to set the condition that award date 1 = 2, director id 3 NE 4, and company ID 5 = 6? and then create another table that First column is director id1, second column is director id2, the third column is award date, and the final column is ED dummy which equal 1, which means they have social ties (Just like the second screenshot shown). Many thanks for your help.
@Olynn See if below returns what you're after. I've used this time a SQL because I find it often easier to implement the logic when it comes to joins over date ranges.
If you don't want the last case with companyid 9999 then change the SQL to an inner join - but then column ed_dummy wouldn't make much sense anymore as it would always be 1.
data have;
infile datalines truncover dsd;
input companyid $ awarddate:date9. directorid $;
format awarddate date9.;
datalines;
111,01jan2017,0001
111,01jan2018,0002
111,01jan2018,0003
111,01jan2023,0004
111,01jan2023,0005
111,01jan2023,0006
222,01jan2023,0007
222,01jan2023,0008
333,01jan2023,9999
;
proc sql;
select distinct
l.companyid,
case
when l.directorid<r.directorid or missing(r.directorid) then l.directorid
else r.directorid
end as directorid_1,
case
when l.directorid<r.directorid and not missing(r.directorid) then r.directorid
when not missing(r.directorid) then l.directorid
else ' '
end as directorid_2,
min(l.awarddate,r.awarddate) as overlapped_awarddate format=date9.,
not missing(r.directorid) as ed_dummy,
case
when l.directorid<r.directorid or missing(r.directorid) then l.awarddate
else r.awarddate
end as awarddate_1 format=date9.,
case
when l.directorid<r.directorid and not missing(r.directorid) then r.awarddate
when not missing(r.directorid) then l.awarddate
else .
end as awarddate_2 format=date9.
from have l left join have r
on
l.companyid=r.companyid
and l.directorid ne r.directorid
and l.awarddate between intnx('year',r.awarddate,-1,'s') and r.awarddate
order by companyid, overlapped_awarddate, directorid_1
;
quit;
Going forward please help us help you and provide sample HAVE data via a tested SAS datastep as done in below code, show us the desired result using the sample data and explain the logic to get from have to want.
Screenshots are hard to work with and always require extra effort for the ones answering your questions free of charge which often leads to less people being willing to put in this extra effort.
I'm not sure if I understood your question right. If below code doesn't return what you're after then please amend the sample data and show us the desired result.
data have;
infile datalines truncover dsd;
input companyid $ awarddate:date9. directorid $;
format awarddate date9.;
datalines;
111,01jan2023,0001
111,01jan2023,0001
111,01jan2023,0002
222,01jan2023,0001
;
proc sort data=have out=inter nodupkey;
by companyid awarddate directorid;
run;
data want(keep=companyid awarddate director_id_: ed_dummy);
set inter;
by companyid awarddate;
array director_id_ {4} $10;
retain director_id_ ed_dummy;
if first.awarddate then
do;
call missing(_cnt, of director_id_[*]);
ed_dummy= not last.awarddate;
end;
_cnt+1;
director_id_[_cnt]=directorid;
if last.awarddate then output;
run;
proc print data=want;
run;
If there can never be more than two directors in the same group then just change in the code the number of array elements from 4 to 2. ....and should there ever be more directors in the same group than defined for the array then you will get an error "array subscript out of range".
data have;
infile datalines truncover dsd;
input companyid $ awarddate:date9. directorid $;
format awarddate date9.;
datalines;
111,01jan2017,0001
111,01jan2018,0002
111,01jan2018,0003
111,01jan2023,0004
111,01jan2023,0005
111,01jan2023,0006
222,01jan2023,0007
222,01jan2023,0008
;
Hi Patric,
Sorry for providing without specific data set, and make you misunderstand the question. I will use the data set you provide and try to explain it one more time.
1. Companyid is the school the directors attain.
2. Awarddate is the date they graduated.
3. Every director has a unique directorid so every ID will be different.
To make a social ties, if two director graduated in same school within one year and share the same companyid, then the ED dummy will be one. For example, director 0001 and 0002 have social ties (They attain same school and Director 0001 graduation period 01Jan2016~01Jan2017 overlap Director 0002 graduation period 01Jan2017~01Jan2018 on 01Jan2017); Director 0004 and Director 0005 have social ties because their graduate period have overlapped and graduate in same school.
The final form will be like this screenshot.
The solution you provide before is pretty good!
You are so smart and kind.
Thanks for your help.
@Olynn See if below returns what you're after. I've used this time a SQL because I find it often easier to implement the logic when it comes to joins over date ranges.
If you don't want the last case with companyid 9999 then change the SQL to an inner join - but then column ed_dummy wouldn't make much sense anymore as it would always be 1.
data have;
infile datalines truncover dsd;
input companyid $ awarddate:date9. directorid $;
format awarddate date9.;
datalines;
111,01jan2017,0001
111,01jan2018,0002
111,01jan2018,0003
111,01jan2023,0004
111,01jan2023,0005
111,01jan2023,0006
222,01jan2023,0007
222,01jan2023,0008
333,01jan2023,9999
;
proc sql;
select distinct
l.companyid,
case
when l.directorid<r.directorid or missing(r.directorid) then l.directorid
else r.directorid
end as directorid_1,
case
when l.directorid<r.directorid and not missing(r.directorid) then r.directorid
when not missing(r.directorid) then l.directorid
else ' '
end as directorid_2,
min(l.awarddate,r.awarddate) as overlapped_awarddate format=date9.,
not missing(r.directorid) as ed_dummy,
case
when l.directorid<r.directorid or missing(r.directorid) then l.awarddate
else r.awarddate
end as awarddate_1 format=date9.,
case
when l.directorid<r.directorid and not missing(r.directorid) then r.awarddate
when not missing(r.directorid) then l.awarddate
else .
end as awarddate_2 format=date9.
from have l left join have r
on
l.companyid=r.companyid
and l.directorid ne r.directorid
and l.awarddate between intnx('year',r.awarddate,-1,'s') and r.awarddate
order by companyid, overlapped_awarddate, directorid_1
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Ready to level-up your skills? Choose your own adventure.