Solved: Re: Uexpected merged results from two macro datasets

ak2011 · Posted 03-05-2021 06:45 PM

Hello,

Thanks to all who provided solutions/comments to my question of merging two macro results. However, the problem has still not been solved, so I would appreciate if someone could help me.

I am reposting the same question. The unexpected merged results (17 observations) this time is given for your perusal.

In my previous posts,I mentioned that I wanted to merge 2 macro results datasets :Table 1 (janitor)
and Table 2 (Baker). Janitor dataset is called jan and baker is bake.
The merged file mjb produced unexpected results (Table 3-17 observations).
I have still not found the solution to
my problem inspite of reading extensively on the subject from SAS Forum answers, SAS online documentation and so on.
I would appreciate your help,please.

My code, data and results are found below. My expected output is placed after results.
Thanks in advance. [The log has no errors but too long so not shown].
ak.

/**MACRO*/

data full_tdata;
input id$ 1-6 job 7-8 jobyrin 9-11 jobyrout 18-20  icode$ 23-27  lung$ 29-35;
datalines;
OSa13	3	73	      78	  6191	 cacase
OSa30	1	39	      46	  7181	 cacase
OSa30	3	56	      64	  6191	 cacase
OSa73	1	23	      31	  7181	 popcon
OSa73	2	31	      42	  5130	 popcon
OSa86	3	46	      60	  6198	 cacon
OSa86	4	60	      70	  5130	 cacon
OSa93	3	68	      72	  6121	 popcon
OSf26	1	54	      54	  6198	 popcon
OSf26	2	70	      70	  6191	 popcon
OSh77	1	63	      66	  6121	 cacon
OSh77	2	66	      70	  6121	 cacon
OSh77	3	70	      71	  6121	 cacon
OSi84	1	67	      75	  6191	 cacase
OSi84	2	75	      81	  6191	 cacase
OSj17	3	60	      75	  6191	 cacase
OSj17	4	65	      70	  6191	 cacase
;

proc print data=full_tdata;
title 'Table O: Original full_tdata file'; run;

data small_tdata; set full_tdata; drop lung;


*max 4 jobs per subject;


%macro duration(database=small_tdata, icode=6191, Job_title=janitor);


*Creating a dataset containing all the jobs with selected icode;

data &Job_title;
set &database;
if icode = &icode;
run;


*transposing dataset so that the YEARIN and YEAROUT of all jobs held by a subject are in the same row;


*YEARIN;

proc transpose data = &Job_title out = &Job_title._yearin prefix = YEARIN;
by ID;
var jobyrin;
run;

proc sort data = &Job_title._yearin;
by id;
run;

*YEAROUT;

proc transpose data = &Job_title out = &Job_title._yearout prefix = YEAROUT;
by ID;
var jobyrout;
run;

proc sort data = &Job_title._yearout;
by id;
run;



data &Job_title._final;
merge &Job_title._yearin &Job_title._yearout;
by ID;

array YEARIN (3) YEARIN1 YEARIN2 YEARIN3;
array YEAROUT (3) YEAROUT1 YEAROUT2 YEAROUT3;

array YEARINv2 (3) YEARIN2 YEARIN3 YEARIN4 ;
array YEAROUTv2 (3) YEAROUT2 YEAROUT3 YEAROUT4;

array YEARIN_F (4) YEARIN1 YEARIN2 YEARIN3 YEARIN4;
array YEAROUT_F (4) YEAROUT1 YEAROUT2 YEAROUT3 YEAROUT4;

Array duration (4) Dur1 Dur2 Dur3 Dur4;



do i = 1 to 3;

if yearin(i) ne . and  yearinv2(i) ne . and yearinv2(i) le yearout(i) and yearoutv2(i) ge yearout(i) then do; yearinv2(i) = yearin(i); yearin(i) = .;yearout(i) = .;end;
else if  yearin(i) ne . and  yearinv2(i) ne . and yearinv2(i) le yearout(i) and yearoutv2(i) lt yearout(i) then do; yearinv2(i) = yearin(i);yearoutv2(i) = yearout(i); yearin(i) = .;yearout(i) = .;end;

end;

do i = 1 to 4;

if YEARIN_F(i) = . then  YEARIN_F(i) = 0;
if YEAROUT_F(i) = . then  YEAROUT_F(i) = 0;

duration(i) = YEAROUT_F(i) - YEARIN_F(i);
if YEARIN_F(i) ne 0 and YEAROUT_F(i) = YEARIN_F(i) then duration(i) = 0.5;
end;

&Job_title._dur = Dur1 + Dur2 + Dur3 + Dur4;

&Job_title._ever = 1;

keep ID &Job_title._dur &Job_title._ever;
run;

%MEND duration;





/* 1st macro-janitor icode 6191*/

data jan; set small_tdata;
 
 
%duration(database=small_tdata, icode=6191, Job_title=janitor);
  

proc print; 
title 'Table 1: janitor records';
run;


/* 2nd macro-baker icode 7181*/

data bake; set small_tdata;
 
%duration(database=small_tdata,icode=7181, Job_title=baker);

proc print; 
title 'Table 2: baker records';
run;


/*Merging the janitor (Table 1) and baker(Table 2) files*/

proc sort data=jan; by id;

proc sort data=bake; by id;

data mjb; merge  jan bake; by id; run;


proc print data=mjb;
Title 'Table 3: Merged Table 1(janitor) and Table 2 (baker)-Unexpected merge
This is not what I want';
run;

Table O: Original full_tdata file


Obs	id	job	jobyrin	jobyrout	icode	lung
1	OSa13	3	73	78	6191	cacase
2	OSa30	1	39	46	7181	cacase
3	OSa30	3	56	64	6191	cacase
4	OSa73	1	23	31	7181	popcon
5	OSa73	2	31	42	5130	popcon
6	OSa86	3	46	60	6198	cacon
7	OSa86	4	60	70	5130	cacon
8	OSa93	3	68	72	6121	popcon
9	OSf26	1	54	54	6198	popcon
10	OSf26	2	70	70	6191	popcon
11	OSh77	1	63	66	6121	cacon
12	OSh77	2	66	70	6121	cacon
13	OSh77	3	70	71	6121	cacon
14	OSi84	1	67	75	6191	cacase
15	OSi84	2	75	81	6191	cacase
16	OSj17	3	60	75	6191	cacase
17	OSj17	4	65	70	6191	cacase

Table 1: janitor records


Obs	id	janitor_dur	janitor_ever
1	OSa13	5.0	1
2	OSa30	8.0	1
3	OSf26	0.5	1
4	OSi84	14.0	1
5	OSj17	15.0	1

Table 2: baker records


Obs	id	baker_dur	baker_ever
1	OSa30	7	1
2	OSa73	8	1

Table 3: Merged Table 1(janitor) and Table 2 (baker)-Unexpected mergeThis is not what I want


Obs	id	job	jobyrin	jobyrout
1	OSa13	3	73	78
2	OSa30	1	39	46
3	OSa30	3	56	64
4	OSa73	1	23	31
5	OSa73	2	31	42
6	OSa86	3	46	60
7	OSa86	4	60	70
8	OSa93	3	68	72
9	OSf26	1	54	54
10	OSf26	2	70	70
11	OSh77	1	63	66
12	OSh77	2	66	70
13	OSh77	3	70	71
14	OSi84	1	67	75
15	OSi84	2	75	81
16	OSj17	3	60	75
17	OSj17	4	65	70

Expected output:

id janitor_dur janitor_ever baker_dur baker_ever
osa13 5 1 0 0
osa30 8 1 7 1
osa73 0 0 8 1
osf26 0.5 1 0 0
osi84 14 1 0 0
osj17 15 1 0 0

Tables 1 and 2 are ok, Table 3 is not good: My aim is to merge Tables 1 and 2 to obtain the expected output above.

I would finally merge janitor and baker files (merged Tables 1&2) with table 0: original file.

Please help. Thanks.

ak.

ballardw · Posted 03-05-2021 07:01 PM

READ YOUR BLINKING LOG.

Your log will show the names of the data sets created and how many records.

Your printed output is not from bake nor jan. The output data set names from your macro are janitor_final and baker_final

View solution in original post

ballardw · Posted 03-05-2021 07:01 PM

READ YOUR BLINKING LOG.

Your log will show the names of the data sets created and how many records.

Your printed output is not from bake nor jan. The output data set names from your macro are janitor_final and baker_final

Reeza · Posted 03-05-2021 07:55 PM

Your macro is generating tables with the final names of &Job_title._final, so baker_final and janitor_final. And using Janitor and Baker, not Jan/Bake so the references seem inconsistent.

Where are the Jan/baker files at the end coming from?

ballardw · Posted 03-06-2021 04:47 AM

@Reeza wrote:
Your macro is generating tables with the final names of &Job_title._final, so baker_final and janitor_final. And using Janitor and Baker, not Jan/Bake so the references seem inconsistent.

Where are the Jan/baker files at the end coming from?

OP had two bits like

data jan; set small_tdata;

called immediately before the %duration macro call.

ak2011 · Posted 03-09-2021 03:43 AM

Sorry, data jan; set small_data; is not right.

Thank you.
ak.

Tom · Posted 03-06-2021 11:40 AM

Just read the SAS log to see what the actual names of the datasets you macro generates.

Note it is best not to start using macro code to generate SAS code until you understand how to generate SAS code yourself. Otherwise you will just confuse yourself.

What are you trying to accomplish? Do you even need macro code? To find the duration it is easier with your original structure.

data full_tdata;
  input id $ job jobyrin jobyrout icode $ lung $;
datalines;
OSa13 3 73 78 6191 cacase
OSa30 1 39 46 7181 cacase
OSa30 3 56 64 6191 cacase
OSa73 1 23 31 7181 popcon
OSa73 2 31 42 5130 popcon
OSa86 3 46 60 6198 cacon
OSa86 4 60 70 5130 cacon
OSa93 3 68 72 6121 popcon
OSf26 1 54 54 6198 popcon
OSf26 2 70 70 6191 popcon
OSh77 1 63 66 6121 cacon
OSh77 2 66 70 6121 cacon
OSh77 3 70 71 6121 cacon
OSi84 1 67 75 6191 cacase
OSi84 2 75 81 6191 cacase
OSj17 3 60 75 6191 cacase
OSj17 4 65 70 6191 cacase
;

data job_duration ;
  set full_tdata;
  length job_name $10 ;
  if icode='6191' then job_name='janitor';
  else if icode='7181' then job_name='baker';
  else job_name='other';

  duration = jobyrout - jobyrin;

run;


proc summary data=job_duration nway ;
  by id;
  class job_name ;
  var duration ;
  output out=want sum=;
run;

options missing='0';
proc report data=want ;
  column id duration,job_name ;
  define id / group;
  define job_name / across ' ';
  define duration / ' ';
run;

   id            baker    janitor      other
  OSa13             0          5          0
  OSa30             7          8          0
  OSa73             8          0         11
  OSa86             0          0         24
  OSa93             0          0          4
  OSf26             0          0          0
  OSh77             0          0          8
  OSi84             0         14          0
  OSj17             0         20          0

ak2011 · Posted 03-07-2021 07:18 PM

Thanks Tom,
Actually, I prefer your approach to macro. The above dataset is a sample of a larger one and when I ran the larger macro I usually receive errors about quoted strings,etc. like shown below:

NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.

NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release. Inserting white space
between a quoted string and the succeeding identifier is recommended.
*************************************************************************************************

So I would like to try your approach:
Basically, these are my objectives:

-Find duration in years (jobyrout-jobyrin), if no overlap in years-calculate it direct:
- If overlap eg:osa73 jobs 1 (jobyrout 31) and 2 (jobyrin 31): then final duration will be from 23 to 42 ie. 42 minus 23, so osa73 will have one line. Same with osh77 jobs 1 and 2: 66 overlaps so final duration will be from 63-70 ie. 70 minus 63.
-if jobyrin=jobyrout assign 0.5.
-osj17 job 3 duration :60-75 and job 4 duraiton 65-70: another special overlap: Here duration for osj17 will be 60 to 75 ie. 75-60.
Please read the summary of the task below:

**************************************************************************************************************

Please, this is what I want to achieve:

1) Verify if job 1 overlaps with job 2. If not, do nothing. If yes, assign job 1 start year to job 2 and, when **necessary (see when necessary situation below), also assign job 1 end year to job 2.
Once this is done, job 1 start and end years are recoded as missing to eliminate this job from the duration calculation (to avoid counting twice the same years)

** When necessary: eg. Please see osj 17 jobs 3 and 4:

If job 1 go from 1960 to 1975 and job 2 go from 1965 to 1970,
then the duration for those two jobs should be from 1960 to 1975.
Since the code work by “deleting” the earliest jobs and modifying the later job dates,
we need to recode job 2 YEARIN and YEAROUT to the YEARIN and YEAROUT of job 1.

2) Repeat the process with the newly recoded job 2 (or original job 2 if it didn't overlap with job 1) and job 3. This process is repeated until all jobs are examined (up to the maximum number of jobs specified by the array length)

3) Assign a value of 0 the any missing start or stop year and calculate the duration of each remaining job (jobs with same start and stop years are coded as
lasting 0.5 year

4) Sum all the durations to obtain the final duration variable

5) Create the final dataset including IDs, the total duration variable, and the ever held job variable

* So What you did is right, two more things to do, however,please:
-Assigning 0.5 to same jobyrout = 70 and jobyrin= 70 for eg. osf26 job 6. 70-70 not zero but 0.5 in this situation.
-the special overlap situation: osj17 jobs 3 and 4.
Thanks very much for your assistance.
I would appreciate if you could help me fix the 2 tasks in your code above.

Thanks again.

ak.

Tom · Posted 03-08-2021 12:18 AM

In general you should be able to do this without transposing the data, just process it as you proceed through the dataset. But that might depend on on how complex the situation is. It is a little hard to follow the logic of how you want to handle overlaps. Can you clarify with simple examples? Here are some questions and suggested approach.

First do we care about the job type or code? Do we want to calculate the duration periods for different job codes separately? If not then what does it mean when two records for different job codes overlap? Do we truncate the time in the earlier job to end when the new job starts? Do we create a new job code to indicate there was a period of time with two (or more) jobs?

First let's assume the data is sorted by ID, START, END. (let's ignore the job code issue for now)

Then let's try to enumerate the possible relationships between the current line and the previous one.

1) No overlap at all.

2) The END of the previous matches the START of the current.

3) The END of the previous is between the START and END of the current.

4) The END of the previous is on or after the END of the current. (The previous period contains the current period.)

5) Some weird crossing of three or more overlapping records.

In general if you have two records where one stops and the other starts and you have no indication of when in the year the change happened you need to have a rule to handle that. You could assume that it always happened in the middle of the year and give each period half of the overlapping year.

ID START STOP DURATION
1  1975  1980   5.5
1  1980  1985   5.5
2  1975  1985   11

If there is a longer overlap then truncate the earlier one at the point where the next one starts.

So you should be able to calculate like this:

ID START STOP PREV NEXT DURATION
1  1975  1982    . 1980  5.5
1  1980  1985 1982    .  5.5
2  1975  1985    .    .  11

You can use LAG() to generate PREV value from STOP of the previous observation. You can use any of a number of methods to simulate the non-existent LEAD() function to generate NEXT form the START of the next observation.

ak2011 · Posted 03-08-2021 08:02 PM

Hello Tom,
Thanks again for your time. Please, I have provided a sample dataset, the objectives and the expected output.
Kindly check to see if you can help. What you suggested about trancation,etc, sorry, I have no clue.
Thanks in advance.
ak.
Calculating duration for overlaps and non-overlaps without using macro?
Hello all,
I would like to calculate durations (jobyrout-jobyrin) for overlap and non overlap years without using macro.
The macro didn’t work well with large datasets with long arrays; it worked good for a sample (small) dataset, so
I am obliged to find another method that will achieve the same aim, even if the method is long, I would prefer that to macro.
I would appreciate if someone knows of any other method to achieve my aim. The dataset, my objectives and expected output are given below.
I really need help; I have no clue, please.

Thanks in advance for your help.
ak.

/**Data set*/
data full_tdata;
input id$ 1-6 job 7-8 jobyrin 9-11 jobyrout 18-20 icode$ 23-27 lung$ 29-35;
datalines;
OSa13 3 73 78 6191 cacase
OSa30 1 39 46 7181 cacase
OSa30 3 56 64 6191 cacase
OSa73 1 23 31 7181 popcon
OSa73 2 31 42 5130 popcon
OSf26 2 70 70 6191 popcon
OSj17 3 60 75 6191 cacase
OSj17 4 65 70 6191 cacase
;

1) No overlap at all ie. .ids osa13 (duration=78-73=5) and osa30 (duration=46-39=7 for icode7181 and 64-56=8 for icode 6191).
2) The END of the previous matches the START of the current ie. id osa73 ; duration=31-23=8 for icode 7181 and 42-31=11 for icode 5130.
3) The START and END are the same ie. osf26: duration is automatically assigned 0.5 (70 to 70, not 0 but 0.5).
4) The START AND END of current is between START and END of previous. Eg. osj 17 jobs 3 and 4.; duration will be from 60 to 75 (ie. 75-60=15).

Expected output:
Icode 6191=janitor: if janitor(6191) is present (janitor_ever=1, otherwise janitor_ever=0) ; janitor_dur is the duration (jobyrout-jobyrin); icode 7181=baker, 5130=plumber.
id janitor_dur janitor_ever baker_dur baker_ever plumber_dur plumber_ever
osa13 5 1 0 0 0 0
osa30 8 1 7 1 0 0
osa73 0 0 8 1 11 1
osf26 0.5 1 0 0 0 0
osj17 15 1 0 0 0 0

SAS Innovate 2025: Call for Content

Classroom Training Available!