BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ak2011
Fluorite | Level 6

Hi,

Can someone please help me with the code to count records of overlap years reported in two or more different jobs.

Eg  with id os24, jobyrout 78 is reported in two different jobs 2 and 3. Same with Os35: jobyrout 69 is repeated for jobs 5 and 6. Same with os84: jobyrout 75 is reported for jobs 7 and 8,etc.

I want a code to pull out these overlap records and count them.

I really don’t have any clue to this question. I checked SAS community forum for questions  and solutions on overlap but they did not address my question.

 

My data, code, log and results  are found below.

 

Thanks in advance for your help.

 

ak.

 


data jobyr;
input id$ job jobyrin jobyrout;
datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
os84 8 75 81
os84 9 81 88
os94 2 14 21
os94 3 42 51
os94 4 51 68
os94 5 80 85
;

proc print data=jobyr;
title 'Table 1: Records with overlapyrs';
run;

 

 

 

 

 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73
74 data jobyr;
75 input id$ job jobyrin jobyrout;
76 datalines;
 
NOTE: The data set WORK.JOBYR has 15 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
92 ;
93
94 proc print data=jobyr;
95 title 'Table 1: Records with overlapyrs';
96 run;
 
NOTE: There were 15 observations read from the data set WORK.JOBYR.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.20 seconds
cpu time 0.20 seconds
 
 
97
98
99 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
111

 

Table 1: Records with overlapyrs

 
Obs id job jobyrin jobyrout
1 os24 1 63 73
2 os24 2 74 78
3 os24 3 78 79
4 os24 4 80 85
5 os24 5 86 89
6 os35 4 61 64
7 os35 5 65 69
8 os35 6 69 79
9 os84 7 67 75
10 os84 8 75 81
11 os84 9 81 88
12 os94 2 14 21
13 os94 3 42 51
14 os94 4 51 68
15 os94 5 80 85

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

I am tempted to merge this question back into the old one. It is not useful at all to repost an unanswered question.

And please show what you have tried so far, @mkeintz suggested some changes to your code.

 

 

View solution in original post

22 REPLIES 22
PaigeMiller
Diamond | Level 26

Eg with id os24, jobyrout 78 is reported in two different jobs 2 and 3. Same with Os35: jobyrout 69 is repeated for jobs 5 and 6. Same with os84: jobyrout 75 is reported for jobs 7 and 8,etc.

 

I do not see this in your data. Jobyrout is 78 once in your data set.

--
Paige Miller
ak2011
Fluorite | Level 6

Hello,

Actually, 78 appear appears twice as shown below:

os24:Job 2: 74-78

os24 Job3: 78-79

 

Thanks.

 

 

 

 

ak2011
Fluorite | Level 6
 

Hello,

Could someone assist me with the code to count overlap year records as a single observation?

For eg. I have a dataset with 9 observations, with id os24 having an overlap duration(jobyrin jobyout  for  job 2 and 3):74-78; 78-79; Same thing with os35: jobs 5 and 6 have overlap  duration 65-69;69-79.

I need to have 1 single observation for the overlap duration. My expected  output (table 2) is shown below.

 

Eventually I need to have 7 records as the overlap duration was counted as 1 obs.

Expected output: (Table 2)

id job jobyrin jobyrout jobyrdur indust_code;

os24 1 63 73 10 6191
os24 2_3 74 79 5 6191
os24 4 80 85 5 6191
os24 5 86 89 3 6191
os35 4 61 64 3 6191
os35 5_6 65 79 14 6191
os84 7 67 75 8 6191
;

 

 

My data, code, log and results are found below.

 

I would appreciate if you help me wit the code. I have no clue; I searched SAS online documentation, SAS forum,etc, but still I have no idea.

Please help.

Thanks.

ak.





data jobyr;
input id$ job jobyrin jobyrout indust_code;
datalines;
os24 1 63 73 6191
os24 2 74 78 6191
os24 3 78 79 6191
os24 4 80 85 6191
os24 5 86 89 6191
os35 4 61 64 6191
os35 5 65 69 6191
os35 6 69 79 6191
os84 7 67 75 6191
;

/*proc print data=jobyr;
title 'Table 1: Records with overlapyrs';
run;*/

/* Calculate job duration in years and count indust_code*/
data jobdur; set jobyr;
jobyrdur=jobyrout-jobyrin;
run;

/* Count overall indust_code; both overlap and non-overlap*/
proc freq data=jobdur;
tables indust_code;
title 'Table 1: Indust_code count-overlap joboutyr or not'; run;



/* Counting indust_code: Consider overlap years per id as one duration*/

/* No clue here: Expected output (Table 2) is shown below*/
 
 
 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data jobyr;
74 input id$ job jobyrin jobyrout indust_code;
75 datalines;
 
NOTE: The data set WORK.JOBYR has 9 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
 
 
85 ;
86
87 /*proc print data=jobyr;
88 title 'Table 1: Records with overlapyrs';
89 run;*/
90
91 /* Calculate job duration in years and count indust_code*/
92 data jobdur; set jobyr;
93 jobyrdur=jobyrout-jobyrin;
94 run;
 
NOTE: There were 9 observations read from the data set WORK.JOBYR.
NOTE: The data set WORK.JOBDUR has 9 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
 
 
95
96 /* Count overall indust_code; both overlap and non-overlap*/
97 proc freq data=jobdur;
98 tables indust_code;
99 title 'Table 1: Indust_code count-overlap joboutyr or not'; run;
NOTE: There were 9 observations read from the data set WORK.JOBDUR.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.29 seconds
cpu time 0.27 seconds
 
 
100
101
102
103 /* Counting indust_code: Consider overlap years per id as one duration*/
104
105 /* No clue here: Expected output (Table 2) is shown below*/
106
107 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
119

 

Table 1: Indust_code count-overlap joboutyr or not

The FREQ Procedure

 
indust_code Frequency Percent Cumulative
Frequency
Cumulative
Percent
6191 9 100.00 9 100.00

 

 

 

mkeintz
PROC Star

What about the indust_code?  Can it change during a sequence of overlapping years?  If so, what do you want to do about it?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ak2011
Fluorite | Level 6

No the indust_code does not change; it remains the same.

Thank you.

ak.

ak2011
Fluorite | Level 6

I just want to count the indust_code as it is; the 6191 remains 6191 but I just want to know how many of them ie. I just need the count.

Thank you.

mkeintz
PROC Star

In your data step you need to decide whether an observation is to be skipped, due to the fact that it overlaps with the next observation. 

 

To do so, you can do a self-merge with offset, as here:

merge jobyr jobyr (firstobs=2 keep=jobyrin rename=(jobyrin=nxt_jobyrin));

which tells sas to read an observation, and also one variable (renamed) from the next observation (because it has "firstobs=2").  This will let you know whether the current obs overlaps with the next, by comparing JOBYROUT with NXT_JOBYRIN.

 

Of course, this doesn't protect against the end of one id overlapping with the start of the next.  To detect that condition, you can use a SET with a BY statement:

  set jobyr (keep=id);
  by id;

which generates dummy variables first.id and last.id, indicating whether the current obs is at the start or end of an id.  So if you are at the end of an id, you should output the current obs, even if the years overlap with the next obs.

 

data want (drop=first_: nxt_: job);
  set jobyr (keep=id);
  by id;
  merge jobyr jobyr (firstobs=2 keep=jobyrin rename=(jobyrin=nxt_jobyrin));

  retain first_yrin first_job .;

  if first_yrin=. then first_yrin=jobyrin;
  if first_job=. then first_job=job;

  if last.id or nxt_jobyrin>jobyrout;
  jobyrin=first_yrin;
  jobs=ifc(job^=first_job,catx('_',first_job,job),cats(job));
  call missing(of first_:);
run;

Then the only other task is the preserve FIRST_YRIN, which is the JOBYRIN at the start of an overlapping series.  And also similarly preserve FIRST_JOB.  

 

The subsetting if statement

  if last.id or nxt_jobyrin>jobyrout;

allows only the last record of each id, or any record that does not overlap with the upcoming record.

 

Note that this will accommodate an overlapping series of any length - not just two overlapping records.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JackHamilton
Lapis Lazuli | Level 10

My goto paper for overlapping observations is

 

Starts and Stops: Processing Episode Data With Beginning and Ending Dates

Mike Rhoads

https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/260-29.pdf

 

Note that there are two separate programs in this paper.  I think you want Example 2, not Example 1.

ak2011
Fluorite | Level 6

Thanks for your reference. I read it  but I didn't understand it. My question still remains unanswered.

I still need help.

ak2011
Fluorite | Level 6

 

Hello,

I have done several search to obtain the right code to achieve this objective, but I have found nothing:

Help needed,please:

From the dataset have below, I would like to do the following:

For each id job, if the jobyrout of the current observation is the same as the jobyrin of the next observation, then retain/keep that observation. Thus, for os24 jobs 2,3 jobyrout 78 (job 2) appears as jobyrin (78) for job 3. That is, 78 appears in 2 different jobs for id os24. Same logic with os35: 69 appears in jobs 5 and 6.

So I want to retain these observations as shown in my expected output below.

Thanks in advance.

ak.

 

My expected output is also shown below.

 

data have;
input id$ job jobyrin jobyrout;
datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
;


Expected output

os24 2 74 78
os24 3 78 79
os35 5 65 69
os35 6 69 79
;

qoit
Pyrite | Level 9

Hope the below helps:

data have;
	input id$ job jobyrin jobyrout;
	jobyrout_lag = lag(jobyrout);
	datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
;
run;

DATA want (drop=jobyrout_lag jobyrin_);
	_N_ ++ 1;

	IF _N_ <= N THEN
		DO;
			Set have POINT=_N_;
			jobyrin_ = jobyrin;
		END;
	ELSE jobyrin = .;
	Set have nobs = n;

	if jobyrout = jobyrin_ or jobyrin = jobyrout_lag then
		output;
RUN;
qoit
Pyrite | Level 9
sorry got a type when pasting:

IF _N_ <= N THEN
DO;
Set have POINT=_N_;
jobyrin_ = jobyrin;
END;
ELSE jobyrin_ = .; /* CHANGE HERE */
Set have nobs = n;
Ksharp
Super User
data have;
	input id$ job jobyrin jobyrout;
	datalines;
os24 1 63 73
os24 2 74 78
os24 3 78 79
os24 4 80 85
os24 5 86 89
os35 4 61 64
os35 5 65 69
os35 6 69 79
os84 7 67 75
;
run;
data want;
 merge have have(keep=id jobyrin rename=(id=_id jobyrin=_jobyrin ) firstobs=2);
 if (id=_id and jobyrout=_jobyrin) or (id=lag(id) and jobyrin=lag(jobyrout));
 drop _:;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 22 replies
  • 1780 views
  • 0 likes
  • 8 in conversation