BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Olynn
Calcite | Level 5

I want to create ED social ties dummy variable (education, two individuals are socially connected through their education network if they went to the same school and graduated within one year of each other tie now).  As the first screenshot shown, Is it possible to set the condition that award date 1 = 2, director id 3 NE 4, and company ID 5 = 6? and then create another table that First column is director id1, second column is director id2, the third column is award date, and the final column is ED dummy which equal 1, which means they have social ties (Just like the second screenshot shown). Many thanks for your help.

Olynn_0-1697906342030.png

Olynn_1-1697906638012.png

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Olynn See if below returns what you're after. I've used this time a SQL because I find it often easier to implement the logic when it comes to joins over date ranges.

If you don't want the last case with companyid 9999 then change the SQL to an inner join - but then column ed_dummy wouldn't make much sense anymore as it would always be 1.

data have;
  infile datalines truncover dsd;
  input companyid $ awarddate:date9. directorid $;
  format awarddate date9.;
  datalines;
111,01jan2017,0001
111,01jan2018,0002
111,01jan2018,0003
111,01jan2023,0004
111,01jan2023,0005
111,01jan2023,0006
222,01jan2023,0007
222,01jan2023,0008
333,01jan2023,9999
;

proc sql;
  select distinct
    l.companyid,

    case
      when l.directorid<r.directorid or missing(r.directorid) then l.directorid
      else r.directorid
      end as directorid_1,
    case
      when l.directorid<r.directorid and not missing(r.directorid) then r.directorid
      when not missing(r.directorid) then l.directorid
      else ' '
      end as directorid_2,

    min(l.awarddate,r.awarddate) as overlapped_awarddate format=date9.,

    not missing(r.directorid) as ed_dummy,

    case
      when l.directorid<r.directorid or missing(r.directorid) then l.awarddate
      else r.awarddate
      end as awarddate_1 format=date9.,
    case
      when l.directorid<r.directorid and not missing(r.directorid) then r.awarddate
      when not missing(r.directorid) then l.awarddate
      else .
      end as awarddate_2 format=date9.

  from have l left join have r
    on 
      l.companyid=r.companyid 
      and l.directorid ne r.directorid
      and l.awarddate between intnx('year',r.awarddate,-1,'s') and r.awarddate

  order by companyid, overlapped_awarddate, directorid_1
  ;
quit;

Patrick_0-1698014921157.png

 

 

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Going forward please help us help you and provide sample HAVE data via a tested SAS datastep as done in below code, show us the desired result using the sample data and explain the logic to get from have to want.

Screenshots are hard to work with and always require extra effort for the ones answering your questions free of charge which often leads to less people being willing to put in this extra effort.

 

I'm not sure if I understood your question right. If below code doesn't return what you're after then please amend the sample data and show us the desired result.

data have;
  infile datalines truncover dsd;
  input companyid $ awarddate:date9. directorid $;
  format awarddate date9.;
  datalines;
111,01jan2023,0001
111,01jan2023,0001
111,01jan2023,0002
222,01jan2023,0001
;

proc sort data=have out=inter nodupkey;
  by companyid awarddate directorid;
run;

data want(keep=companyid awarddate director_id_: ed_dummy);
  set inter;
  by companyid awarddate;
  array director_id_ {4} $10;
  retain director_id_ ed_dummy;
  if first.awarddate then
    do;
      call missing(_cnt, of director_id_[*]);
      ed_dummy= not last.awarddate;
    end;
  _cnt+1;
  director_id_[_cnt]=directorid;
  if last.awarddate then output;
run;

proc print data=want;
run;

Patrick_0-1697944287684.png

If there can never be more than two directors in the same group then just change in the code the number of array elements from 4 to 2. ....and should there ever be more directors in the same group than defined for the array then you will get an error "array subscript out of range".

 

 

Olynn
Calcite | Level 5

 

 

data have;
  infile datalines truncover dsd;
  input companyid $ awarddate:date9. directorid $;
  format awarddate date9.;
  datalines;
111,01jan2017,0001
111,01jan2018,0002
111,01jan2018,0003
111,01jan2023,0004
111,01jan2023,0005
111,01jan2023,0006
222,01jan2023,0007
222,01jan2023,0008
;

 

 

Hi Patric,

Sorry for providing without specific data set, and make you misunderstand the question. I will use the data set you provide and try to  explain it one more time. 

1. Companyid is the school the directors attain.

2. Awarddate is the date they graduated.

3. Every director has a unique directorid so every ID will be different.

To make a social ties, if two director graduated in same school within one year and share the same companyid, then the  ED dummy will be one. For example, director 0001 and 0002 have social ties (They attain same school and Director 0001 graduation period 01Jan2016~01Jan2017 overlap Director 0002  graduation period 01Jan2017~01Jan2018 on 01Jan2017); Director 0004 and Director 0005 have social ties because their graduate period have overlapped and graduate in same school.

The final form will be like this screenshot.

Olynn_0-1697952332118.png

The solution you provide before is pretty good!

You are so smart and kind.

Thanks for your help.

 

 

Patrick
Opal | Level 21

@Olynn See if below returns what you're after. I've used this time a SQL because I find it often easier to implement the logic when it comes to joins over date ranges.

If you don't want the last case with companyid 9999 then change the SQL to an inner join - but then column ed_dummy wouldn't make much sense anymore as it would always be 1.

data have;
  infile datalines truncover dsd;
  input companyid $ awarddate:date9. directorid $;
  format awarddate date9.;
  datalines;
111,01jan2017,0001
111,01jan2018,0002
111,01jan2018,0003
111,01jan2023,0004
111,01jan2023,0005
111,01jan2023,0006
222,01jan2023,0007
222,01jan2023,0008
333,01jan2023,9999
;

proc sql;
  select distinct
    l.companyid,

    case
      when l.directorid<r.directorid or missing(r.directorid) then l.directorid
      else r.directorid
      end as directorid_1,
    case
      when l.directorid<r.directorid and not missing(r.directorid) then r.directorid
      when not missing(r.directorid) then l.directorid
      else ' '
      end as directorid_2,

    min(l.awarddate,r.awarddate) as overlapped_awarddate format=date9.,

    not missing(r.directorid) as ed_dummy,

    case
      when l.directorid<r.directorid or missing(r.directorid) then l.awarddate
      else r.awarddate
      end as awarddate_1 format=date9.,
    case
      when l.directorid<r.directorid and not missing(r.directorid) then r.awarddate
      when not missing(r.directorid) then l.awarddate
      else .
      end as awarddate_2 format=date9.

  from have l left join have r
    on 
      l.companyid=r.companyid 
      and l.directorid ne r.directorid
      and l.awarddate between intnx('year',r.awarddate,-1,'s') and r.awarddate

  order by companyid, overlapped_awarddate, directorid_1
  ;
quit;

Patrick_0-1698014921157.png

 

 

Olynn
Calcite | Level 5
Many thanks!!!!!!!!!!!!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 581 views
  • 1 like
  • 2 in conversation