BookmarkSubscribeRSS Feed
zishaq
Obsidian | Level 7

Hi,

 

I have a very specific example I am trying to work on which I think may involve complex conditional processing. I have the following dataset:

 

data test;
input individual $ first_one last_one total_length sector $;
datalines;
 A 1 72 72 Sports
 A 3 60 58 Care
 A 50 55 6 Sports
 A 51 71 21 Teaching
 B 5 40 36 Unknown
 B 7 47 41 Teaching
 B 30 47 11 Unknown
 B 49 52 4 Care
C 50 52 3 Sports
C 59 66 8 Sports ;

 

- Individual is unique for each person (A,B and C are the three individuals in this example)

- First_one is the month in which the person started their job

- Last_one is the month in which the person finished their job

- Total_length is the difference between the first_one and last_one (plus 1 to include their end month)

- Sector is the area in which the person works in for each job

 

Task

Effectively for each individual (A,B and C in this example), where the 'first_one' is between 49 and 60 (i.e. job started in between these months), I want to assign these rows to a new variable called 'category'. The 'category' variable should take the following 4 values:

'No previous record'

'Gap longer than 3 months'

'Gap less than 3 months - different sector'

'Gap less than 3 months - same sector'.

 

The idea is that I am trying to compare a job with another but the added complexity is determining which job I choose as the comparator. 

 

Below is a set of rules that I want to apply for each job that starts between months 49 and 60 (note that an individual can have multiple jobs that start in these months so they will need to go through the following process separately) for determining the 'comparator' job (i.e. the job we want to compare).

 

    • Are there any jobs running concurrently to the job start (i.e. started earlier, but seen as active in the same month as the job start)?
      • IF YES (and only one concurrent job) – this is the comparator job
      • IF YES (and multiple concurrent jobs) – whichever of the concurrent jobs has the latest start month is the comparator job
      • IF YES (and all the multiple concurrent jobs have the same start month) – the remaining tie decider should be longest job length to decide comparator
    • IF NO – are there any jobs with latest recorded activity within 3 months prior to the job start?
      • IF YES (and only one job end in the 3 month window) – that is the comparator job
      • IF YES (and multiple job ends in the 3 month window) – whichever job has the latest end date would be the comparator job 
      • IF YES (and multiple jobs in the 3 month window with the same end month) – the remaining tie decider should be longest job length to decide comparator
    • If none of the above get a YES then is there any previous recorded job history?
      • IF YES - goes in ‘employment gap longer than 3 months’ category
      • IF NO – goes in ‘no previous record’ category

The final output for the sample dataset I created above should look like this:

 

 

data outcome;
input individual $ first_one last_one total_length sector $ category $;
datalines;
A 1 72 72 Sports .
A 1 60 58 Care .
A 50 55 6 Sports 'Gap less than 3 months - same sector'
A 51 71 21 Teaching 'Gap less than 3 months - different sector'
B 5 40 36 Unknown .
B 7 47 41 Teaching .
B 30 47 11 Unknown .
B 49 52 4 Care . 'Gap less than 3 months - different sector'
C 50 52 3 Sports 'No previous employment'
C 59 66 8 Sports 'Gap longer than 3 months' ;

For individual A, row 3 in dataset has been assigned a category 'Gap less than 3 months - same sector' as the comparator job based on the rules above were row 1 (row 1 and row 2 are concurrent jobs to row 3 and both start in the same month, however, row 1 has the longer job length).

 

For individual A, row 4 in dataset has been assigned a category 'Gap less than 3 months - different sector'  as the comparator job based on the rules above were row 3 (rows 1,2 and 3 all run concurrently with row 4, however they don't share the same start month so we choose row 3 as this has the latest start month).

For individual B, row 8 in dataset is the only job which started in months between 49 and 60. So the comparator job for this job is row 6 and it should be assigned the value 'Gap less than 3 months - different sector' (neither rows 5, 6 or 7 run concurrently with row 8, however, rows 6 and 7 started within three months of row 8. As both rows 6 and 7 both have the same end month, we choose the row 6 as it has the longer job length).

For individual C, row 9 in dataset is assigned 'No previous employment' as there are no jobs for this person before they started this job. Row 10 is assigned 'Gap longer than 3 months' as we compare this job to row 9 and we realise their is a longer gap than 3 months.

 

Appreciate this is a complex example so happy to take any tips!

 

Thanks

3 REPLIES 3
zishaq
Obsidian | Level 7
hi, because the job didn't start in between 49 and 60
ChrisNZ
Tourmaline | Level 20

> row 1 and row 2 are concurrent jobs to row 3 and both start in the same month

No they're not. ROW 1 start on 1 and ROW 3 start on 3.

 

Unsure what exactly you want, as some details are still unclear, but his should be pretty close, and you can tweak to match your exact requirements.

data HAVE;
  input INDIVIDUAL $ FIRST_ONE LAST_ONE TOTAL_LENGTH SECTOR $;   
  ROW=_N_;
cards;
 A 1 72 72 Sports
 A 3 60 58 Care
 A 50 55 6 Sports
 A 51 71 21 Teaching
 B 5 40 36 Unknown
 B 7 47 41 Teaching
 B 30 47 11 Unknown
 B 49 52 4 Care
 C 50 52 3 Sports
 C 59 66 8 Sports
;

proc sql;
  create table JOIN as
  select a.* 
        ,b.FIRST_ONE as F, b.LAST_ONE as L, b.TOTAL_LENGTH as T, b.SECTOR as S
        ,(a.FIRST_ONE between b.FIRST_ONE and b.LAST_ONE)  as OVERLAP
        ,(a.FIRST_ONE-b.LAST_ONE between 1 and 3)          as THREE_MONTH_GAP
        ,a.FIRST_ONE-b.LAST_ONE                            as DIFF                                  
        ,b.ROW                                             as R
  from HAVE a
         left join 
       HAVE b
         on  a.FIRST_ONE  between 49 and 60
         and a.INDIVIDUAL eq b.INDIVIDUAL 
         and a.ROW        ne b.ROW
  order by a.INDIVIDUAL, a.FIRST_ONE, a.LAST_ONE 
         , calculated OVERLAP descending 
         , b.FIRST_ONE descending, b.TOTAL_LENGTH descending      %* order for overlap;
         , calculated THREE_MONTH_GAP descending
         , b.LAST_ONE descending, b.TOTAL_LENGTH descending       %* order 3-month gap;
  ;
quit;

data WANT;
  set JOIN;
  by ROW;
  if first.ROW;
  CATEGORY=ifc(DIFF eq .                  , '-                                        '
          ,ifc(OVERLAP         & SECTOR =S, 'Gap less than 3 months - same sector     '
          ,ifc(OVERLAP         & SECTOR^=S, 'Gap less than 3 months - different sector'
          ,ifc(THREE_MONTH_GAP & SECTOR =S, 'Gap less than 3 months - same sector     '
          ,ifc(THREE_MONTH_GAP & SECTOR =S, 'Gap less than 3 months - different sector'
          ,ifc(DIFF > 0                   , 'Gap longer than 3 months                 '
          ,                                 'No previous employment                   '))))));
 * keep FIRST_ONE LAST_ONE TOTAL_LENGTH CATEGORY;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1050 views
  • 0 likes
  • 2 in conversation