Hi,
I have a very specific example I am trying to work on which I think may involve complex conditional processing. I have the following dataset:
data test; input individual $ first_one last_one total_length sector $; datalines; A 1 72 72 Sports A 3 60 58 Care A 50 55 6 Sports A 51 71 21 Teaching B 5 40 36 Unknown B 7 47 41 Teaching B 30 47 11 Unknown B 49 52 4 Care
C 50 52 3 Sports
C 59 66 8 Sports ;
- Individual is unique for each person (A,B and C are the three individuals in this example)
- First_one is the month in which the person started their job
- Last_one is the month in which the person finished their job
- Total_length is the difference between the first_one and last_one (plus 1 to include their end month)
- Sector is the area in which the person works in for each job
Task
Effectively for each individual (A,B and C in this example), where the 'first_one' is between 49 and 60 (i.e. job started in between these months), I want to assign these rows to a new variable called 'category'. The 'category' variable should take the following 4 values:
'No previous record'
'Gap longer than 3 months'
'Gap less than 3 months - different sector'
'Gap less than 3 months - same sector'.
The idea is that I am trying to compare a job with another but the added complexity is determining which job I choose as the comparator.
Below is a set of rules that I want to apply for each job that starts between months 49 and 60 (note that an individual can have multiple jobs that start in these months so they will need to go through the following process separately) for determining the 'comparator' job (i.e. the job we want to compare).
The final output for the sample dataset I created above should look like this:
data outcome; input individual $ first_one last_one total_length sector $ category $; datalines; A 1 72 72 Sports . A 1 60 58 Care . A 50 55 6 Sports 'Gap less than 3 months - same sector' A 51 71 21 Teaching 'Gap less than 3 months - different sector' B 5 40 36 Unknown . B 7 47 41 Teaching . B 30 47 11 Unknown . B 49 52 4 Care . 'Gap less than 3 months - different sector'
C 50 52 3 Sports 'No previous employment'
C 59 66 8 Sports 'Gap longer than 3 months' ;
For individual A, row 3 in dataset has been assigned a category 'Gap less than 3 months - same sector' as the comparator job based on the rules above were row 1 (row 1 and row 2 are concurrent jobs to row 3 and both start in the same month, however, row 1 has the longer job length).
For individual A, row 4 in dataset has been assigned a category 'Gap less than 3 months - different sector' as the comparator job based on the rules above were row 3 (rows 1,2 and 3 all run concurrently with row 4, however they don't share the same start month so we choose row 3 as this has the latest start month).
For individual B, row 8 in dataset is the only job which started in months between 49 and 60. So the comparator job for this job is row 6 and it should be assigned the value 'Gap less than 3 months - different sector' (neither rows 5, 6 or 7 run concurrently with row 8, however, rows 6 and 7 started within three months of row 8. As both rows 6 and 7 both have the same end month, we choose the row 6 as it has the longer job length).
For individual C, row 9 in dataset is assigned 'No previous employment' as there are no jobs for this person before they started this job. Row 10 is assigned 'Gap longer than 3 months' as we compare this job to row 9 and we realise their is a longer gap than 3 months.
Appreciate this is a complex example so happy to take any tips!
Thanks
Why nothing for A row 2?
> row 1 and row 2 are concurrent jobs to row 3 and both start in the same month
No they're not. ROW 1 start on 1 and ROW 3 start on 3.
Unsure what exactly you want, as some details are still unclear, but his should be pretty close, and you can tweak to match your exact requirements.
data HAVE;
input INDIVIDUAL $ FIRST_ONE LAST_ONE TOTAL_LENGTH SECTOR $;
ROW=_N_;
cards;
A 1 72 72 Sports
A 3 60 58 Care
A 50 55 6 Sports
A 51 71 21 Teaching
B 5 40 36 Unknown
B 7 47 41 Teaching
B 30 47 11 Unknown
B 49 52 4 Care
C 50 52 3 Sports
C 59 66 8 Sports
;
proc sql;
create table JOIN as
select a.*
,b.FIRST_ONE as F, b.LAST_ONE as L, b.TOTAL_LENGTH as T, b.SECTOR as S
,(a.FIRST_ONE between b.FIRST_ONE and b.LAST_ONE) as OVERLAP
,(a.FIRST_ONE-b.LAST_ONE between 1 and 3) as THREE_MONTH_GAP
,a.FIRST_ONE-b.LAST_ONE as DIFF
,b.ROW as R
from HAVE a
left join
HAVE b
on a.FIRST_ONE between 49 and 60
and a.INDIVIDUAL eq b.INDIVIDUAL
and a.ROW ne b.ROW
order by a.INDIVIDUAL, a.FIRST_ONE, a.LAST_ONE
, calculated OVERLAP descending
, b.FIRST_ONE descending, b.TOTAL_LENGTH descending %* order for overlap;
, calculated THREE_MONTH_GAP descending
, b.LAST_ONE descending, b.TOTAL_LENGTH descending %* order 3-month gap;
;
quit;
data WANT;
set JOIN;
by ROW;
if first.ROW;
CATEGORY=ifc(DIFF eq . , '- '
,ifc(OVERLAP & SECTOR =S, 'Gap less than 3 months - same sector '
,ifc(OVERLAP & SECTOR^=S, 'Gap less than 3 months - different sector'
,ifc(THREE_MONTH_GAP & SECTOR =S, 'Gap less than 3 months - same sector '
,ifc(THREE_MONTH_GAP & SECTOR =S, 'Gap less than 3 months - different sector'
,ifc(DIFF > 0 , 'Gap longer than 3 months '
, 'No previous employment '))))));
* keep FIRST_ONE LAST_ONE TOTAL_LENGTH CATEGORY;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.