Hello all,
I would like to calculate durations (jobyrout-jobyrin) for overlap and non overlap years without using macro.
The macro didn’t work well with large datasets having long arrays; it worked for a sample (small) dataset, so, I am obliged to find another method that will achieve the same aim, even if the method is long, I would prefer that to macro.
I would appreciate if someone knows of any other method to achieve my aim. The dataset, my objectives and expected output are given below.
I really need help; I have no clue, please.
Thanks in advance for your help.
/**Data set*/
data full_tdata;
input id$ 1-6 job 7-8 jobyrin 9-11 jobyrout 18-20 icode$ 23-27 lung$ 29-35;
datalines;
OSa13 3 73 78 6191 cacase
OSa30 1 39 46 7181 cacase
OSa30 3 56 64 6191 cacase
OSa73 1 23 31 7181 popcon
OSa73 2 31 42 5130 popcon
OSf26 2 70 70 6191 popcon
OSj17 3 60 75 6191 cacase
OSj17 4 65 70 6191 cacase
OSj17 3 60 75 6191 cacase
OSj17 4 65 70 6191 cacase
;
Objectives
1) No overlap at all ie. .ids osa13 (duration=78-73=5) and osa30 (duration=46-39=7 for icode7181 and 64-56=8 for icode 6191).
2) The END of the previous matches the START of the current ie. id osa73 ; duration=31-23=8 for icode 7181 and 42-31=11 for icode 5130.
3) The START and END are the same ie. osf26: duration is automatically assigned 0.5 (70 to 70, not 0 but 0.5).
4) The START AND END of current is between START and END of previous. Eg. osj 17 jobs 3 and 4.; duration will be from 60 to 75 (ie. 75-60=15).
Expected output:
id janitor_dur janitor_ever baker_dur baker_ever plumber_dur plumber_ever
osa13 5 1 0 0 0 0
osa30 8 1 7 1 0 0
osa73 0 0 8 1 11 1
osf26 0.5 1 0 0 0 0
osj17 15 1 0 0 0 0
Icode 6191=janitor: if janitor(6191) is present (janitor_ever=1, otherwise janitor_ever=0) ; janitor_dur is the duration (jobyrout-jobyrin); icode 7181=baker, 5130=plumber.
1) No overlap at all ie. .ids osa13 (duration=78-73=5) and osa30 (duration=46-39=7 for icode7181 and 64-56=8 for icode 6191).
2) The END of the previous matches the START of the current ie. id osa73 ; duration=31-23=8 for icode 7181 and 42-31=11 for icode 5130.
3) The START and END are the same ie. osf26: duration is automatically assigned 0.5 (70 to 70, not 0 but 0.5).
4) The START AND END of current is between START and END of previous. Eg. osj 17 jobs 3 and 4.; duration will be from 60 to 75 (ie. 75-60=15).
Thanks very much in advance for your assistance.
ak.
One suspects that part of the problem is pointed out in the way you phrase this:
I would like to calculate durations (jobyrout-jobyrin) for overlap and non overlap years without using macro.
Your data no where shows evidence of anything resembling a "year" from names or values.
If 73 is a year I strongly suggest you use a 4 digit year because as soon as a value like 10 appears it is no longer clear whether you may be dealing with 1910 or 2010. I think this is a valid concern since your very small example shows 23 as a value. So if you have "recent" data then you may have an in value like 99 and out of 05. Which your example calculation would yield 5-99 = -94. Which I strongly suspect screws things up for what ever you are attempting. And might have had part of an issue with the "macro".
You list objectives but do not state how to show them, or at least not in a way that makes much sense to me.
Perhaps not the cleanest way of doing it, but does this work?
data have;
input id$ 1-6 job 7-8 jobyrin 9-11 jobyrout 18-20 icode$ 23-27 lung$ 29-35;
datalines;
OSa13 3 73 78 6191 cacase
OSa30 1 39 46 7181 cacase
OSa30 3 56 64 6191 cacase
OSa73 1 23 31 7181 popcon
OSa73 2 31 42 5130 popcon
OSf26 2 70 70 6191 popcon
OSj17 3 60 75 6191 cacase
OSj17 4 65 70 6191 cacase
;
run;
data codes;
input icode $ code $;
datalines;
6191 janitor
7181 baker
5130 plumber
;
run;
proc sql;
create table have_with_codes as
select h.*
,c.code
,jobYrOut - jobYrIn as duration
from have h
left join codes c
on h.icode = c.icode
order by id, job, jobYrIn;
quit;
data have_modify_duration;
set have_with_codes;
by id;
retain in_prev out_prev;
if duration = 0
then duration = 0.5;
if first.id
then do;
in_prev = jobYrIn;
out_prev = jobYrOut;
end;
else
do;
if (in_prev < JobYrIn and JobYrOut < out_prev)
then duration = .;
end;
if duration ne .;
drop in_prev out_prev lung job jobYrIn jobYrOut;
run;
proc transpose data=have_modify_duration
out=transposed (drop=_name_) suffix=_dur;
by id;
id code;
run;
proc sql;
create table want as
select id
,coalesce(janitor_dur,0) as janitor_dur
,case when janitor_dur = . then 0 else 1 end as janitor_ever
,coalesce(baker_dur,0) as baker_dur
,case when baker_dur = . then 0 else 1 end as baker_ever
,coalesce(plumber_dur,0) as plumber_dur
,case when plumber_dur = . then 0 else 1 end as plumber_ever
from transposed;
quit;
One suspects that part of the problem is pointed out in the way you phrase this:
I would like to calculate durations (jobyrout-jobyrin) for overlap and non overlap years without using macro.
Your data no where shows evidence of anything resembling a "year" from names or values.
If 73 is a year I strongly suggest you use a 4 digit year because as soon as a value like 10 appears it is no longer clear whether you may be dealing with 1910 or 2010. I think this is a valid concern since your very small example shows 23 as a value. So if you have "recent" data then you may have an in value like 99 and out of 05. Which your example calculation would yield 5-99 = -94. Which I strongly suspect screws things up for what ever you are attempting. And might have had part of an issue with the "macro".
You list objectives but do not state how to show them, or at least not in a way that makes much sense to me.
Can there be more than one id/icode combination for a given year?
I eliminated the duplicate records in your original dataset.
Using LAG() to find the previous end year and look ahead to find the next start year you can decide which observations need to have their duration adjusted by 0.5. Then simple SUMMARY and TRANSPOSE will generate the wide dataset. If you really need those additional EVER flag varaibles you can add those and also set zeros for the jobs never held by an individual.
data full_tdata;
input id$ job jobyrin jobyrout icode$ lung$ ;
datalines;
OSa13 3 1973 1978 6191 cacase
OSa30 1 1939 1946 7181 cacase
OSa30 3 1956 1964 6191 cacase
OSa73 1 1923 1931 7181 popcon
OSa73 2 1931 1942 5130 popcon
OSf26 2 1970 1970 6191 popcon
OSj17 3 1960 1975 6191 cacase
OSj17 4 1965 1970 6191 cacase
;
proc sort ;
by id jobyrin jobyrout ;
run;
proc format ;
value $icode '6191'='janitor' '7181'='baker' '5130'='plumber' other='other';
run;
data duration ;
set full_tdata;
by id jobyrin;
set full_tdata(firstobs=2 keep=jobyrin rename=(jobyrin=next_yr)) full_tdata(obs=1 drop=_all_);
prev_yr=lag(jobyrout);
if first.id then prev_yr=.;
if last.id then next_yr=.;
duration = jobyrout - jobyrin + 1 - 0.5*(prev_yr=jobyrin or next_yr=jobyrout);
job_name = put(icode,$icode.);
run;
proc print;
run;
proc summary data=duration nway;
by id;
class job_name ;
var duration ;
output out=summary sum=;
run;
proc print;
run;
proc transpose data=summary out=wide(drop=_name_) suffix=_dur;
by id ;
id job_name ;
var duration ;
run;
proc print;
run;
data want;
set wide ;
array dur janitor_dur baker_dur plumber_dur;
array ever janitor_ever baker_ever plumber_ever;
do over dur;
dur = sum(0,dur);
ever = dur > 0;
end;
run;
proc print;
run;
Is this what you wanted from that input? If not then explain what you want to do differently.
Obs id job jobyrin jobyrout icode lung next_yr prev_yr duration job_name 1 OSa13 3 1973 1978 6191 cacase . . 6.0 janitor 2 OSa30 1 1939 1946 7181 cacase 1956 . 8.0 baker 3 OSa30 3 1956 1964 6191 cacase . 1946 9.0 janitor 4 OSa73 1 1923 1931 7181 popcon 1931 . 8.5 baker 5 OSa73 2 1931 1942 5130 popcon . 1931 11.5 plumber 6 OSf26 2 1970 1970 6191 popcon . . 1.0 janitor 7 OSj17 3 1960 1975 6191 cacase 1965 . 16.0 janitor 8 OSj17 4 1965 1970 6191 cacase . 1975 6.0 janitor Obs id job_name _TYPE_ _FREQ_ duration 1 OSa13 janitor 1 1 6.0 2 OSa30 baker 1 1 8.0 3 OSa30 janitor 1 1 9.0 4 OSa73 baker 1 1 8.5 5 OSa73 plumber 1 1 11.5 6 OSf26 janitor 1 1 1.0 7 OSj17 janitor 1 2 22.0 janitor_ baker_ plumber_ Obs id dur dur dur 1 OSa13 6 . . 2 OSa30 9 8.0 . 3 OSa73 . 8.5 11.5 4 OSf26 1 . . 5 OSj17 22 . . janitor_ baker_ plumber_ janitor_ baker_ plumber_ Obs id dur dur dur ever ever ever 1 OSa13 6 0.0 0.0 1 0 0 2 OSa30 9 8.0 0.0 1 1 0 3 OSa73 0 8.5 11.5 0 1 1 4 OSf26 1 0.0 0.0 1 0 0 5 OSj17 22 0.0 0.0 1 0 0
Hello,
Thanks for your time to come out with that output. It is not what I am looking for but it is very close to the answer.
For eg. for id osa13, the duration should be 78-73 = 5, instead of 6. For osf26, it should be 70-70= 0.5 instead of 1. I will try to manipulate your code to see if I will obtain the expected output.
I appreciate your assistance. Your approach look quite straightforward.
ak.
@ak2011 wrote:
Hello,
Thanks for your time to come out with that output. It is not what I am looking for but it is very close to the answer.
For eg. for id osa13, the duration should be 78-73 = 5, instead of 6. For osf26, it should be 70-70= 0.5 instead of 1. I will try to manipulate your code to see if I will obtain the expected output.
I appreciate your assistance. Your approach look quite straightforward.
ak.
Not sure I understand the logic of that way of counting but try:
duration = jobyrout - jobyrin + 0.5*(prev_yr=jobyrin or next_yr=jobyrout or jobyrin=jobyrout);
But that seems to treat the same time period differently if it is split between two jobs.
Should the time preiod from 1923 to 1942 counted as 19 years or 20 years?
What if we split it into 5 different jobs then how many years should it be counted as?
data full_tdata;
input id$ job jobyrin jobyrout icode$ lung$ ;
datalines;
OSa73 1 1923 1931 7181 popcon
OSa73 2 1931 1942 5130 popcon
OSa73a 1 1923 1931 7181 popcon
OSa73b 2 1931 1942 5130 popcon
OSa73c 2 1923 1942 5130 popcon
OSf26 2 1970 1970 6191 popcon
;
baker_ plumber_ janitor_ Obs id dur dur dur 1 OSa73 8.5 11.5 . 2 OSa73a 8.0 . . 3 OSa73b . 11.0 . 4 OSa73c . 19.0 . 5 OSf26 . . 0.5
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.