BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ak2011
Fluorite | Level 6

 

Hello all,

I would like to calculate durations  (jobyrout-jobyrin) for overlap and non overlap years without using macro.

The macro didn’t work well with large datasets having long arrays; it worked  for a sample (small) dataset, so, I am obliged to find another method that will achieve the same aim, even if the method is long, I would prefer that to macro.

I would appreciate if someone knows of any other method to achieve my aim. The dataset, my objectives and expected output  are given below.

 I really need help; I have no clue, please.

Thanks in advance for your help.


/**Data set*/

data full_tdata;
input id$ 1-6 job 7-8  jobyrin 9-11  jobyrout 18-20  icode$ 23-27 lung$ 29-35;
datalines;
OSa13         3                  73                      78            6191         cacase
OSa30         1                  39                      46            7181         cacase
OSa30         3                  56                      64            6191         cacase
OSa73         1                  23                      31            7181         popcon
OSa73         2                  31                      42            5130         popcon
OSf26         2                  70                      70            6191         popcon
OSj17         3                  60                      75            6191         cacase
OSj17         4                  65                      70            6191         cacase

OSj17         3                  60                      75            6191         cacase
OSj17         4                  65                      70            6191         cacase
;


Objectives

1) No overlap at all  ie. .ids osa13 (duration=78-73=5) and osa30 (duration=46-39=7 for icode7181 and 64-56=8 for icode 6191).

2) The END of the previous matches the START of the current ie. id osa73 ; duration=31-23=8 for icode 7181 and 42-31=11 for icode 5130.

3) The START and END are the same  ie. osf26:  duration is automatically assigned 0.5 (70 to 70, not 0 but 0.5).

4) The START AND END of current is between START and END of previous. Eg. osj 17 jobs 3 and 4.; duration will be from 60 to 75 (ie. 75-60=15).

 

Expected output:

id                        janitor_dur    janitor_ever   baker_dur      baker_ever  plumber_dur     plumber_ever
osa13                   5                       1               0                           0                   0                     0              
osa30                   8                       1               7                           1                   0                     0
osa73                   0                       0               8                           1                  11                    1
osf26                   0.5                    1                0                           0                   0                     0
osj17                   15                     1                0                           0                   0                      0

 

Icode 6191=janitor:  if janitor(6191) is present (janitor_ever=1, otherwise janitor_ever=0)  ; janitor_dur is the duration (jobyrout-jobyrin);  icode 7181=baker, 5130=plumber.

1) No overlap at all  ie. .ids osa13 (duration=78-73=5) and osa30 (duration=46-39=7 for icode7181 and 64-56=8 for icode 6191).

2) The END of the previous matches the START of the current ie. id osa73 ; duration=31-23=8 for icode 7181 and 42-31=11 for icode 5130.

3) The START and END are the same  ie. osf26:  duration is automatically assigned 0.5 (70 to 70, not 0 but 0.5).

4) The START AND END of current is between START and END of previous. Eg. osj 17 jobs 3 and 4.; duration will be from 60 to 75 (ie. 75-60=15).

 

Thanks very much in advance for your assistance.

ak.

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One suspects that part of the problem is pointed out in the way you phrase this:

I would like to calculate durations (jobyrout-jobyrin) for overlap and non overlap years without using macro.

Your data no where shows evidence of anything resembling a "year" from names or values.

 

If 73 is a year I strongly suggest you use a 4 digit year because as soon as a value like 10 appears it is no longer clear whether you may be dealing with 1910 or 2010. I think this is a valid concern since your very small example shows 23 as a value. So if you have "recent" data then you may have an in value like 99 and out of 05. Which your example calculation would yield 5-99 = -94. Which I strongly suspect screws things up for what ever you are attempting. And might have had part of an issue with the "macro".

 

You list objectives but do not state how to show them, or at least not in a way that makes much sense to me.

 

View solution in original post

10 REPLIES 10
mklangley
Lapis Lazuli | Level 10

Perhaps not the cleanest way of doing it, but does this work?

data have;
    input id$ 1-6 job 7-8  jobyrin 9-11  jobyrout 18-20  icode$ 23-27 lung$ 29-35;
    datalines;
OSa13 3 73       78   6191  cacase
OSa30 1 39       46   7181  cacase
OSa30 3 56       64   6191  cacase
OSa73 1 23       31   7181  popcon
OSa73 2 31       42   5130  popcon
OSf26 2 70       70   6191  popcon
OSj17 3 60       75   6191  cacase
OSj17 4 65       70   6191  cacase
    ;
run;

data codes;
    input icode $ code $;
    datalines;
6191 janitor
7181 baker
5130 plumber
    ;
run;

proc sql;
    create table have_with_codes as
    select h.*
          ,c.code
          ,jobYrOut - jobYrIn as duration
    from have h
    left join codes c
        on h.icode = c.icode
    order by id, job, jobYrIn;
quit;

data have_modify_duration;
    set have_with_codes;
    by id;
    retain in_prev out_prev;
    if duration = 0
        then duration = 0.5;
    if first.id
        then do;
            in_prev = jobYrIn;
            out_prev = jobYrOut;
        end;
    else
        do;
            if (in_prev < JobYrIn and JobYrOut < out_prev)
                then duration = .;
        end;
    if duration ne .;
    drop in_prev out_prev lung job jobYrIn jobYrOut;
run;

proc transpose data=have_modify_duration
    out=transposed (drop=_name_) suffix=_dur;
    by id;
    id code;
run;

proc sql;
    create table want as
    select id
          ,coalesce(janitor_dur,0) as janitor_dur
          ,case when janitor_dur = . then 0 else 1 end as janitor_ever
          ,coalesce(baker_dur,0) as baker_dur
          ,case when baker_dur = . then 0 else 1 end as baker_ever
          ,coalesce(plumber_dur,0) as plumber_dur
          ,case when plumber_dur = . then 0 else 1 end as plumber_ever
    from transposed;
quit;
ak2011
Fluorite | Level 6
Hello,
You method works! Thanks very much.

ak.
ballardw
Super User

One suspects that part of the problem is pointed out in the way you phrase this:

I would like to calculate durations (jobyrout-jobyrin) for overlap and non overlap years without using macro.

Your data no where shows evidence of anything resembling a "year" from names or values.

 

If 73 is a year I strongly suggest you use a 4 digit year because as soon as a value like 10 appears it is no longer clear whether you may be dealing with 1910 or 2010. I think this is a valid concern since your very small example shows 23 as a value. So if you have "recent" data then you may have an in value like 99 and out of 05. Which your example calculation would yield 5-99 = -94. Which I strongly suspect screws things up for what ever you are attempting. And might have had part of an issue with the "macro".

 

You list objectives but do not state how to show them, or at least not in a way that makes much sense to me.

 

PGStats
Opal | Level 21

Can there be more than one id/icode combination for a given year?

PG
ak2011
Fluorite | Level 6
Hello,
Yes if the year overlaps. Eg. id osh77 job 1 jobyrin 63 jobyrout 66 icode 6121
osh77 job 2 jobyrin 66 jobyrout 70 icode 6121
Thank you.
ak.
Tom
Super User Tom
Super User

I eliminated the duplicate records in your original dataset.

Using LAG() to find the previous end year and look ahead to find the next start year you can decide which observations need to have their duration adjusted by 0.5.  Then simple SUMMARY and TRANSPOSE will generate the wide dataset.  If you really need those additional EVER flag varaibles you can add those and also set zeros for the jobs never held by an individual.

data full_tdata;
  input id$ job  jobyrin  jobyrout  icode$ lung$ ;
datalines;
OSa13 3 1973 1978 6191 cacase
OSa30 1 1939 1946 7181 cacase
OSa30 3 1956 1964 6191 cacase
OSa73 1 1923 1931 7181 popcon
OSa73 2 1931 1942 5130 popcon
OSf26 2 1970 1970 6191 popcon
OSj17 3 1960 1975 6191 cacase
OSj17 4 1965 1970 6191 cacase
;

proc sort  ;
  by id jobyrin jobyrout ;
run;

proc format ;
 value $icode '6191'='janitor' '7181'='baker' '5130'='plumber' other='other';
run;

data duration ;
  set full_tdata;
  by id jobyrin;
  set full_tdata(firstobs=2 keep=jobyrin rename=(jobyrin=next_yr)) full_tdata(obs=1 drop=_all_);
  prev_yr=lag(jobyrout);
  if first.id then prev_yr=.;
  if last.id then next_yr=.;
  duration = jobyrout - jobyrin + 1 - 0.5*(prev_yr=jobyrin or next_yr=jobyrout);
  job_name = put(icode,$icode.);
run;

proc print;
run;

proc summary data=duration nway;
  by id;
  class job_name ;
  var duration ;
  output out=summary sum=;
run;

proc print;
run;

proc transpose data=summary out=wide(drop=_name_) suffix=_dur;
  by id ;
  id job_name ;
  var duration ;
run;

proc print;
run;

data want;
  set wide ;
  array dur janitor_dur baker_dur plumber_dur;
  array ever  janitor_ever baker_ever plumber_ever;
  do over dur;
    dur = sum(0,dur);
    ever = dur > 0;
  end;
run;

proc print;
run;

 Is this what you wanted from that input?  If not then explain what you want to do differently.

Obs    id     job   jobyrin   jobyrout   icode    lung    next_yr   prev_yr   duration   job_name

 1    OSa13    3      1973      1978     6191    cacase        .         .       6.0     janitor
 2    OSa30    1      1939      1946     7181    cacase     1956         .       8.0     baker
 3    OSa30    3      1956      1964     6191    cacase        .      1946       9.0     janitor
 4    OSa73    1      1923      1931     7181    popcon     1931         .       8.5     baker
 5    OSa73    2      1931      1942     5130    popcon        .      1931      11.5     plumber
 6    OSf26    2      1970      1970     6191    popcon        .         .       1.0     janitor
 7    OSj17    3      1960      1975     6191    cacase     1965         .      16.0     janitor
 8    OSj17    4      1965      1970     6191    cacase        .      1975       6.0     janitor


Obs     id      job_name    _TYPE_    _FREQ_    duration

 1     OSa13    janitor        1         1         6.0
 2     OSa30    baker          1         1         8.0
 3     OSa30    janitor        1         1         9.0
 4     OSa73    baker          1         1         8.5
 5     OSa73    plumber        1         1        11.5
 6     OSf26    janitor        1         1         1.0
 7     OSj17    janitor        1         2        22.0


                janitor_    baker_    plumber_
Obs     id         dur        dur        dur

 1     OSa13        6          .          .
 2     OSa30        9         8.0         .
 3     OSa73        .         8.5       11.5
 4     OSf26        1          .          .
 5     OSj17       22          .          .


                janitor_    baker_    plumber_    janitor_    baker_    plumber_
Obs     id         dur        dur        dur        ever       ever       ever

 1     OSa13        6         0.0        0.0          1          0          0
 2     OSa30        9         8.0        0.0          1          1          0
 3     OSa73        0         8.5       11.5          0          1          1
 4     OSf26        1         0.0        0.0          1          0          0
 5     OSj17       22         0.0        0.0          1          0          0

ak2011
Fluorite | Level 6

Hello,

Thanks for your time to come out with that output. It is not what I am looking for but it is very close to the answer.

For eg. for id osa13, the duration should be 78-73 = 5, instead of 6. For osf26, it should be 70-70= 0.5 instead of 1. I will try to manipulate your code to see if I will obtain the expected output.

I appreciate your assistance. Your approach look quite straightforward. 

 

ak.

Tom
Super User Tom
Super User

@ak2011 wrote:

Hello,

Thanks for your time to come out with that output. It is not what I am looking for but it is very close to the answer.

For eg. for id osa13, the duration should be 78-73 = 5, instead of 6. For osf26, it should be 70-70= 0.5 instead of 1. I will try to manipulate your code to see if I will obtain the expected output.

I appreciate your assistance. Your approach look quite straightforward. 

 

ak.


Not sure I understand the logic of that way of counting but try:

  duration = jobyrout - jobyrin + 0.5*(prev_yr=jobyrin or next_yr=jobyrout or jobyrin=jobyrout);

But that seems to treat the same time period differently if it is split between two jobs.

Should the time preiod from 1923 to 1942 counted as 19 years or 20 years?

What if we split it into 5 different jobs then how many years should it be counted as?

data full_tdata;
  input id$ job  jobyrin  jobyrout  icode$ lung$ ;
datalines;
OSa73 1 1923 1931 7181 popcon
OSa73 2 1931 1942 5130 popcon
OSa73a 1 1923 1931 7181 popcon
OSa73b 2 1931 1942 5130 popcon
OSa73c 2 1923 1942 5130 popcon
OSf26 2 1970 1970 6191 popcon
;
                 baker_    plumber_    janitor_
Obs      id        dur        dur         dur

 1     OSa73       8.5       11.5          .
 2     OSa73a      8.0         .           .
 3     OSa73b       .        11.0          .
 4     OSa73c       .        19.0          .
 5     OSf26        .          .          0.5
ak2011
Fluorite | Level 6
Hello Tom,
Thanks again for your time. The duration from 1923 to 1942 equals 19. If split between 5 different jobs, it will still be counted as 19 years: from 1923 to 1942 = 19 years.
ak.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 767 views
  • 0 likes
  • 5 in conversation