Solved: Re: Counting overlap years as a single observation

ak2011 · Posted 02-21-2021 07:38 AM

Hi,

Can someone please help me with the code to count records of overlap years reported in two or more different jobs.

Eg with id os24, jobyrout 78 is reported in two different jobs 2 and 3. Same with Os35: jobyrout 69 is repeated for jobs 5 and 6. Same with os84: jobyrout 75 is reported for jobs 7 and 8,etc.

I want a code to pull out these overlap records and count them.

I really don’t have any clue to this question. I checked SAS community forum for questions and solutions on overlap but they did not address my question.

My data, code, log and results are found below.

Thanks in advance for your help.

ak.


data jobyr;
input id$ job jobyrin jobyrout;
datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79 
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
os84 8 75 81
os84 9 81 88
os94 2 14 21
os94 3 42 51
os94 4 51 68
os94 5 80 85
;

proc print data=jobyr;
title 'Table 1: Records with overlapyrs';
 run;

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

72

73

74 data jobyr;

75 input id$ job jobyrin jobyrout;

76 datalines;

NOTE: The data set WORK.JOBYR has 15 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

92 ;

93

94 proc print data=jobyr;

95 title 'Table 1: Records with overlapyrs';

96 run;

NOTE: There were 15 observations read from the data set WORK.JOBYR.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.20 seconds

cpu time 0.20 seconds

97

98

99 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

111

Table 1: Records with overlapyrs


Obs	id	job	jobyrin	jobyrout
1	os24	1	63	73
2	os24	2	74	78
3	os24	3	78	79
4	os24	4	80	85
5	os24	5	86	89
6	os35	4	61	64
7	os35	5	65	69
8	os35	6	69	79
9	os84	7	67	75
10	os84	8	75	81
11	os84	9	81	88
12	os94	2	14	21
13	os94	3	42	51
14	os94	4	51	68
15	os94	5	80	85

andreas_lds · Posted 02-23-2021 04:57 AM

I am tempted to merge this question back into the old one. It is not useful at all to repost an unanswered question.

And please show what you have tried so far, @mkeintz suggested some changes to your code.

View solution in original post

PaigeMiller · Posted 02-21-2021 07:49 AM

Eg with id os24, jobyrout 78 is reported in two different jobs 2 and 3. Same with Os35: jobyrout 69 is repeated for jobs 5 and 6. Same with os84: jobyrout 75 is reported for jobs 7 and 8,etc.

I do not see this in your data. Jobyrout is 78 once in your data set.

--
Paige Miller

ak2011 · Posted 02-21-2021 04:38 PM

Hello,

Actually, 78 appear appears twice as shown below:

os24:Job 2: 74-78

os24 Job3: 78-79

Thanks.

ak2011 · Posted 02-21-2021 09:20 AM

Hello,

Could someone assist me with the code to count overlap year records as a single observation?

For eg. I have a dataset with 9 observations, with id os24 having an overlap duration(jobyrin jobyout for job 2 and 3):74-78; 78-79; Same thing with os35: jobs 5 and 6 have overlap duration 65-69;69-79.

I need to have 1 single observation for the overlap duration. My expected output (table 2) is shown below.

Eventually I need to have 7 records as the overlap duration was counted as 1 obs.

Expected output: (Table 2)

id job jobyrin jobyrout jobyrdur indust_code;

os24 1 63 73 10 6191
os24 2_3 74 79 5 6191
os24 4 80 85 5 6191
os24 5 86 89 3 6191
os35 4 61 64 3 6191
os35 5_6 65 79 14 6191
os84 7 67 75 8 6191
;

My data, code, log and results are found below.

I would appreciate if you help me wit the code. I have no clue; I searched SAS online documentation, SAS forum,etc, but still I have no idea.

Please help.

Thanks.

ak.





data jobyr;
input id$ job jobyrin jobyrout indust_code;
datalines;
os24 1 63 73 6191
os24 2 74 78 6191
os24 3 78 79 6191
os24 4 80 85 6191
os24 5 86 89 6191
os35 4 61 64 6191
os35 5 65 69 6191
os35 6 69 79 6191
os84 7 67 75 6191
;

/*proc print data=jobyr;
title 'Table 1: Records with overlapyrs';
 run;*/

/* Calculate job duration in years and count indust_code*/
data jobdur; set jobyr;
jobyrdur=jobyrout-jobyrin;
run;

/* Count overall indust_code; both overlap and non-overlap*/
proc freq data=jobdur;
tables indust_code;
title 'Table 1: Indust_code count-overlap joboutyr or not'; run;



/* Counting indust_code: Consider overlap years per id as one duration*/

/* No clue here: Expected output (Table 2) is shown below*/

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

72

73 data jobyr;

74 input id$ job jobyrin jobyrout indust_code;

75 datalines;

NOTE: The data set WORK.JOBYR has 9 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.02 seconds

85 ;

86

87 /*proc print data=jobyr;

88 title 'Table 1: Records with overlapyrs';

89 run;*/

90

91 /* Calculate job duration in years and count indust_code*/

92 data jobdur; set jobyr;

93 jobyrdur=jobyrout-jobyrin;

94 run;

NOTE: There were 9 observations read from the data set WORK.JOBYR.

NOTE: The data set WORK.JOBDUR has 9 observations and 6 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.01 seconds

95

96 /* Count overall indust_code; both overlap and non-overlap*/

97 proc freq data=jobdur;

98 tables indust_code;

99 title 'Table 1: Indust_code count-overlap joboutyr or not'; run;

NOTE: There were 9 observations read from the data set WORK.JOBDUR.

NOTE: PROCEDURE FREQ used (Total process time):

real time 0.29 seconds

cpu time 0.27 seconds

100

101

102

103 /* Counting indust_code: Consider overlap years per id as one duration*/

104

105 /* No clue here: Expected output (Table 2) is shown below*/

106

107 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

119

Table 1: Indust_code count-overlap joboutyr or not

The FREQ Procedure


indust_code	Frequency	Percent	Cumulative Frequency	Cumulative Percent
6191	9	100.00	9	100.00

mkeintz · Posted 02-21-2021 06:48 PM

What about the indust_code? Can it change during a sequence of overlapping years? If so, what do you want to do about it?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ak2011 · Posted 02-22-2021 12:07 AM

No the indust_code does not change; it remains the same.

Thank you.

ak.

ak2011 · Posted 02-22-2021 12:10 AM

I just want to count the indust_code as it is; the 6191 remains 6191 but I just want to know how many of them ie. I just need the count.

Thank you.

mkeintz · Posted 02-22-2021 11:19 AM

In your data step you need to decide whether an observation is to be skipped, due to the fact that it overlaps with the next observation.

To do so, you can do a self-merge with offset, as here:

merge jobyr jobyr (firstobs=2 keep=jobyrin rename=(jobyrin=nxt_jobyrin));

which tells sas to read an observation, and also one variable (renamed) from the next observation (because it has "firstobs=2"). This will let you know whether the current obs overlaps with the next, by comparing JOBYROUT with NXT_JOBYRIN.

Of course, this doesn't protect against the end of one id overlapping with the start of the next. To detect that condition, you can use a SET with a BY statement:

  set jobyr (keep=id);
  by id;

which generates dummy variables first.id and last.id, indicating whether the current obs is at the start or end of an id. So if you are at the end of an id, you should output the current obs, even if the years overlap with the next obs.

data want (drop=first_: nxt_: job);
  set jobyr (keep=id);
  by id;
  merge jobyr jobyr (firstobs=2 keep=jobyrin rename=(jobyrin=nxt_jobyrin));

  retain first_yrin first_job .;

  if first_yrin=. then first_yrin=jobyrin;
  if first_job=. then first_job=job;

  if last.id or nxt_jobyrin>jobyrout;
  jobyrin=first_yrin;
  jobs=ifc(job^=first_job,catx('_',first_job,job),cats(job));
  call missing(of first_:);
run;

Then the only other task is the preserve FIRST_YRIN, which is the JOBYRIN at the start of an overlapping series. And also similarly preserve FIRST_JOB.

The subsetting if statement

  if last.id or nxt_jobyrin>jobyrout;

allows only the last record of each id, or any record that does not overlap with the upcoming record.

Note that this will accommodate an overlapping series of any length - not just two overlapping records.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

JackHamilton · Posted 02-22-2021 03:25 PM

My goto paper for overlapping observations is

Starts and Stops: Processing Episode Data With Beginning and Ending Dates

Mike Rhoads

https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/260-29.pdf

Note that there are two separate programs in this paper. I think you want Example 2, not Example 1.

ak2011 · Posted 02-23-2021 04:01 AM

Thanks for your reference. I read it but I didn't understand it. My question still remains unanswered.

I still need help.

Kurt_Bremser · Posted 02-23-2021 04:19 AM

And do not (NEVER) store years with two digits, unless you consider 2000 to come before 1999.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ak2011 · Posted 02-22-2021 01:28 AM

Hello,

I have done several search to obtain the right code to achieve this objective, but I have found nothing:

Help needed,please:

From the dataset have below, I would like to do the following:

For each id job, if the jobyrout of the current observation is the same as the jobyrin of the next observation, then retain/keep that observation. Thus, for os24 jobs 2,3 jobyrout 78 (job 2) appears as jobyrin (78) for job 3. That is, 78 appears in 2 different jobs for id os24. Same logic with os35: 69 appears in jobs 5 and 6.

So I want to retain these observations as shown in my expected output below.

Thanks in advance.

ak.

My expected output is also shown below.

data have;
input id$ job jobyrin jobyrout;
datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
;

Expected output

os24 2 74 78
os24 3 78 79
os35 5 65 69
os35 6 69 79
;

qoit · Posted 02-22-2021 03:39 AM

Hope the below helps:

data have;
	input id$ job jobyrin jobyrout;
	jobyrout_lag = lag(jobyrout);
	datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
;
run;

DATA want (drop=jobyrout_lag jobyrin_);
	_N_ ++ 1;

	IF _N_ <= N THEN
		DO;
			Set have POINT=_N_;
			jobyrin_ = jobyrin;
		END;
	ELSE jobyrin = .;
	Set have nobs = n;

	if jobyrout = jobyrin_ or jobyrin = jobyrout_lag then
		output;
RUN;

qoit · Posted 02-22-2021 03:44 AM

sorry got a type when pasting:

IF _N_ <= N THEN
DO;
Set have POINT=_N_;
jobyrin_ = jobyrin;
END;
ELSE jobyrin_ = .; /* CHANGE HERE */
Set have nobs = n;

Ksharp · Posted 02-22-2021 06:10 AM

data have;
	input id$ job jobyrin jobyrout;
	datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
;
run;
data want;
 merge have have(keep=id jobyrin rename=(id=_id jobyrin=_jobyrin ) firstobs=2);
 if (id=_id and jobyrout=_jobyrin) or (id=lag(id) and jobyrin=lag(jobyrout));
 drop _:;
run;

Displaying/printing overlap records

Re: Counting overlap durations as one observation

Re: Displaying/printing overlap records

Re: Displaying/printing overlap records

Counting overlap years as a single observation

Re: Counting overlap years as a single observation

Re: Counting overlap years as a single observation

Re: Counting overlap years as a single observation

Re: Counting overlap years as a single observation

Re: Counting overlap years as a single observation

Re: Counting overlap years as a single observation

Re: Counting overlap years as a single observation

Retain obs if current jobyrout equals jobyrin of next obs

Re: Retain obs if current jobyrout equals jobyrin of next obs

Re: Retain obs if current jobyrout equals jobyrin of next obs

Re: Retain obs if current jobyrout equals jobyrin of next obs

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away