Conditional programming (tricky example)

zishaq · Posted 10-27-2020 08:51 AM

Hi,

I have a very specific example I am trying to work on which I think may involve complex conditional processing. I have the following dataset:

data test;
input individual $ first_one last_one total_length sector $;
datalines;
 A 1 72 72 Sports
 A 3 60 58 Care
 A 50 55 6 Sports
 A 51 71 21 Teaching
 B 5 40 36 Unknown
 B 7 47 41 Teaching
 B 30 47 11 Unknown
 B 49 52 4 Care
 C 50 52 3 Sports
 C 59 66 8 Sports
;

- Individual is unique for each person (A,B and C are the three individuals in this example)

- First_one is the month in which the person started their job

- Last_one is the month in which the person finished their job

- Total_length is the difference between the first_one and last_one (plus 1 to include their end month)

- Sector is the area in which the person works in for each job

Task

Effectively for each individual (A,B and C in this example), where the 'first_one' is between 49 and 60 (i.e. job started in between these months), I want to assign these rows to a new variable called 'category'. The 'category' variable should take the following 4 values:

'No previous record'

'Gap longer than 3 months'

'Gap less than 3 months - different sector'

'Gap less than 3 months - same sector'.

The idea is that I am trying to compare a job with another but the added complexity is determining which job I choose as the comparator.

Below is a set of rules that I want to apply for each job that starts between months 49 and 60 (note that an individual can have multiple jobs that start in these months so they will need to go through the following process separately) for determining the 'comparator' job (i.e. the job we want to compare).

Are there any jobs running concurrently to the job start (i.e. started earlier, but seen as active in the same month as the job start)?

IF YES (and only one concurrent job) – this is the comparator job
IF YES (and multiple concurrent jobs) – whichever of the concurrent jobs has the latest start month is the comparator job
IF YES (and all the multiple concurrent jobs have the same start month) – the remaining tie decider should be longest job length to decide comparator

IF NO – are there any jobs with latest recorded activity within 3 months prior to the job start?

IF YES (and only one job end in the 3 month window) – that is the comparator job
IF YES (and multiple job ends in the 3 month window) – whichever job has the latest end date would be the comparator job
IF YES (and multiple jobs in the 3 month window with the same end month) – the remaining tie decider should be longest job length to decide comparator

If none of the above get a YES then is there any previous recorded job history?

IF YES - goes in ‘employment gap longer than 3 months’ category
IF NO – goes in ‘no previous record’ category

The final output for the sample dataset I created above should look like this:

data outcome;
input individual $ first_one last_one total_length sector $ category $;
datalines;
A 1 72 72 Sports .
A 1 60 58 Care .
A 50 55 6 Sports 'Gap less than 3 months - same sector'
A 51 71 21 Teaching 'Gap less than 3 months - different sector'
B 5 40 36 Unknown .
B 7 47 41 Teaching .
B 30 47 11 Unknown .
B 49 52 4 Care . 'Gap less than 3 months - different sector'
C 50 52 3 Sports 'No previous employment'
C 59 66 8 Sports 'Gap longer than 3 months'
;

For individual A, row 3 in dataset has been assigned a category 'Gap less than 3 months - same sector' as the comparator job based on the rules above were row 1 (row 1 and row 2 are concurrent jobs to row 3 and both start in the same month, however, row 1 has the longer job length).

For individual A, row 4 in dataset has been assigned a category 'Gap less than 3 months - different sector' as the comparator job based on the rules above were row 3 (rows 1,2 and 3 all run concurrently with row 4, however they don't share the same start month so we choose row 3 as this has the latest start month).

For individual B, row 8 in dataset is the only job which started in months between 49 and 60. So the comparator job for this job is row 6 and it should be assigned the value 'Gap less than 3 months - different sector' (neither rows 5, 6 or 7 run concurrently with row 8, however, rows 6 and 7 started within three months of row 8. As both rows 6 and 7 both have the same end month, we choose the row 6 as it has the longer job length).

For individual C, row 9 in dataset is assigned 'No previous employment' as there are no jobs for this person before they started this job. Row 10 is assigned 'Gap longer than 3 months' as we compare this job to row 9 and we realise their is a longer gap than 3 months.

Appreciate this is a complex example so happy to take any tips!

Thanks

ChrisNZ · Posted 10-29-2020 11:31 PM

Why nothing for A row 2?

High-Performance SAS Coding - Third Edition

zishaq · Posted 10-30-2020 05:04 AM

hi, because the job didn't start in between 49 and 60

ChrisNZ · Posted 11-01-2020 08:26 PM

> row 1 and row 2 are concurrent jobs to row 3 and both start in the same month

No they're not. ROW 1 start on 1 and ROW 3 start on 3.

Unsure what exactly you want, as some details are still unclear, but his should be pretty close, and you can tweak to match your exact requirements.

data HAVE;
  input INDIVIDUAL $ FIRST_ONE LAST_ONE TOTAL_LENGTH SECTOR $;   
  ROW=_N_;
cards;
 A 1 72 72 Sports
 A 3 60 58 Care
 A 50 55 6 Sports
 A 51 71 21 Teaching
 B 5 40 36 Unknown
 B 7 47 41 Teaching
 B 30 47 11 Unknown
 B 49 52 4 Care
 C 50 52 3 Sports
 C 59 66 8 Sports
;

proc sql;
  create table JOIN as
  select a.* 
        ,b.FIRST_ONE as F, b.LAST_ONE as L, b.TOTAL_LENGTH as T, b.SECTOR as S
        ,(a.FIRST_ONE between b.FIRST_ONE and b.LAST_ONE)  as OVERLAP
        ,(a.FIRST_ONE-b.LAST_ONE between 1 and 3)          as THREE_MONTH_GAP
        ,a.FIRST_ONE-b.LAST_ONE                            as DIFF                                  
        ,b.ROW                                             as R
  from HAVE a
         left join 
       HAVE b
         on  a.FIRST_ONE  between 49 and 60
         and a.INDIVIDUAL eq b.INDIVIDUAL 
         and a.ROW        ne b.ROW
  order by a.INDIVIDUAL, a.FIRST_ONE, a.LAST_ONE 
         , calculated OVERLAP descending 
         , b.FIRST_ONE descending, b.TOTAL_LENGTH descending      %* order for overlap;
         , calculated THREE_MONTH_GAP descending
         , b.LAST_ONE descending, b.TOTAL_LENGTH descending       %* order 3-month gap;
  ;
quit;

data WANT;
  set JOIN;
  by ROW;
  if first.ROW;
  CATEGORY=ifc(DIFF eq .                  , '-                                        '
          ,ifc(OVERLAP         & SECTOR =S, 'Gap less than 3 months - same sector     '
          ,ifc(OVERLAP         & SECTOR^=S, 'Gap less than 3 months - different sector'
          ,ifc(THREE_MONTH_GAP & SECTOR =S, 'Gap less than 3 months - same sector     '
          ,ifc(THREE_MONTH_GAP & SECTOR =S, 'Gap less than 3 months - different sector'
          ,ifc(DIFF > 0                   , 'Gap longer than 3 months                 '
          ,                                 'No previous employment                   '))))));
 * keep FIRST_ONE LAST_ONE TOTAL_LENGTH CATEGORY;
run;

High-Performance SAS Coding - Third Edition

Conditional programming (tricky example)

Re: Conditional programming (tricky example)

Re: Conditional programming (tricky example)

Re: Conditional programming (tricky example)

Registration is open

Conditional programming (tricky example)

Re: Conditional programming (tricky example)

Re: Conditional programming (tricky example)

Re: Conditional programming (tricky example)

Registration is open

SAS Training: Just a Click Away