Hello,
Thanks to all who provided solutions/comments to my question of merging two macro results. However, the problem has still not been solved, so I would appreciate if someone could help me.
I am reposting the same question. The unexpected merged results (17 observations) this time is given for your perusal.
In my previous posts,I mentioned that I wanted to merge 2 macro results datasets :Table 1 (janitor)
and Table 2 (Baker). Janitor dataset is called jan and baker is bake.
The merged file mjb produced unexpected results (Table 3-17 observations).
I have still not found the solution to
my problem inspite of reading extensively on the subject from SAS Forum answers, SAS online documentation and so on.
I would appreciate your help,please.
My code, data and results are found below. My expected output is placed after results.
Thanks in advance. [The log has no errors but too long so not shown].
ak.
/**MACRO*/
data full_tdata;
input id$ 1-6 job 7-8 jobyrin 9-11 jobyrout 18-20 icode$ 23-27 lung$ 29-35;
datalines;
OSa13 3 73 78 6191 cacase
OSa30 1 39 46 7181 cacase
OSa30 3 56 64 6191 cacase
OSa73 1 23 31 7181 popcon
OSa73 2 31 42 5130 popcon
OSa86 3 46 60 6198 cacon
OSa86 4 60 70 5130 cacon
OSa93 3 68 72 6121 popcon
OSf26 1 54 54 6198 popcon
OSf26 2 70 70 6191 popcon
OSh77 1 63 66 6121 cacon
OSh77 2 66 70 6121 cacon
OSh77 3 70 71 6121 cacon
OSi84 1 67 75 6191 cacase
OSi84 2 75 81 6191 cacase
OSj17 3 60 75 6191 cacase
OSj17 4 65 70 6191 cacase
;
proc print data=full_tdata;
title 'Table O: Original full_tdata file'; run;
data small_tdata; set full_tdata; drop lung;
*max 4 jobs per subject;
%macro duration(database=small_tdata, icode=6191, Job_title=janitor);
*Creating a dataset containing all the jobs with selected icode;
data &Job_title;
set &database;
if icode = &icode;
run;
*transposing dataset so that the YEARIN and YEAROUT of all jobs held by a subject are in the same row;
*YEARIN;
proc transpose data = &Job_title out = &Job_title._yearin prefix = YEARIN;
by ID;
var jobyrin;
run;
proc sort data = &Job_title._yearin;
by id;
run;
*YEAROUT;
proc transpose data = &Job_title out = &Job_title._yearout prefix = YEAROUT;
by ID;
var jobyrout;
run;
proc sort data = &Job_title._yearout;
by id;
run;
data &Job_title._final;
merge &Job_title._yearin &Job_title._yearout;
by ID;
array YEARIN (3) YEARIN1 YEARIN2 YEARIN3;
array YEAROUT (3) YEAROUT1 YEAROUT2 YEAROUT3;
array YEARINv2 (3) YEARIN2 YEARIN3 YEARIN4 ;
array YEAROUTv2 (3) YEAROUT2 YEAROUT3 YEAROUT4;
array YEARIN_F (4) YEARIN1 YEARIN2 YEARIN3 YEARIN4;
array YEAROUT_F (4) YEAROUT1 YEAROUT2 YEAROUT3 YEAROUT4;
Array duration (4) Dur1 Dur2 Dur3 Dur4;
do i = 1 to 3;
if yearin(i) ne . and yearinv2(i) ne . and yearinv2(i) le yearout(i) and yearoutv2(i) ge yearout(i) then do; yearinv2(i) = yearin(i); yearin(i) = .;yearout(i) = .;end;
else if yearin(i) ne . and yearinv2(i) ne . and yearinv2(i) le yearout(i) and yearoutv2(i) lt yearout(i) then do; yearinv2(i) = yearin(i);yearoutv2(i) = yearout(i); yearin(i) = .;yearout(i) = .;end;
end;
do i = 1 to 4;
if YEARIN_F(i) = . then YEARIN_F(i) = 0;
if YEAROUT_F(i) = . then YEAROUT_F(i) = 0;
duration(i) = YEAROUT_F(i) - YEARIN_F(i);
if YEARIN_F(i) ne 0 and YEAROUT_F(i) = YEARIN_F(i) then duration(i) = 0.5;
end;
&Job_title._dur = Dur1 + Dur2 + Dur3 + Dur4;
&Job_title._ever = 1;
keep ID &Job_title._dur &Job_title._ever;
run;
%MEND duration;
/* 1st macro-janitor icode 6191*/
data jan; set small_tdata;
%duration(database=small_tdata, icode=6191, Job_title=janitor);
proc print;
title 'Table 1: janitor records';
run;
/* 2nd macro-baker icode 7181*/
data bake; set small_tdata;
%duration(database=small_tdata,icode=7181, Job_title=baker);
proc print;
title 'Table 2: baker records';
run;
/*Merging the janitor (Table 1) and baker(Table 2) files*/
proc sort data=jan; by id;
proc sort data=bake; by id;
data mjb; merge jan bake; by id; run;
proc print data=mjb;
Title 'Table 3: Merged Table 1(janitor) and Table 2 (baker)-Unexpected merge
This is not what I want';
run;
Table O: Original full_tdata file
|
||||||
Obs |
id |
job |
jobyrin |
jobyrout |
icode |
lung |
1 |
OSa13 |
3 |
73 |
78 |
6191 |
cacase |
2 |
OSa30 |
1 |
39 |
46 |
7181 |
cacase |
3 |
OSa30 |
3 |
56 |
64 |
6191 |
cacase |
4 |
OSa73 |
1 |
23 |
31 |
7181 |
popcon |
5 |
OSa73 |
2 |
31 |
42 |
5130 |
popcon |
6 |
OSa86 |
3 |
46 |
60 |
6198 |
cacon |
7 |
OSa86 |
4 |
60 |
70 |
5130 |
cacon |
8 |
OSa93 |
3 |
68 |
72 |
6121 |
popcon |
9 |
OSf26 |
1 |
54 |
54 |
6198 |
popcon |
10 |
OSf26 |
2 |
70 |
70 |
6191 |
popcon |
11 |
OSh77 |
1 |
63 |
66 |
6121 |
cacon |
12 |
OSh77 |
2 |
66 |
70 |
6121 |
cacon |
13 |
OSh77 |
3 |
70 |
71 |
6121 |
cacon |
14 |
OSi84 |
1 |
67 |
75 |
6191 |
cacase |
15 |
OSi84 |
2 |
75 |
81 |
6191 |
cacase |
16 |
OSj17 |
3 |
60 |
75 |
6191 |
cacase |
17 |
OSj17 |
4 |
65 |
70 |
6191 |
cacase |
Table 1: janitor records
|
|||
Obs |
id |
janitor_dur |
janitor_ever |
1 |
OSa13 |
5.0 |
1 |
2 |
OSa30 |
8.0 |
1 |
3 |
OSf26 |
0.5 |
1 |
4 |
OSi84 |
14.0 |
1 |
5 |
OSj17 |
15.0 |
1 |
Table 2: baker records
|
|||
Obs |
id |
baker_dur |
baker_ever |
1 |
OSa30 |
7 |
1 |
2 |
OSa73 |
8 |
1 |
Table 3: Merged Table 1(janitor) and Table 2 (baker)-Unexpected mergeThis is not what I want
|
||||
Obs |
id |
job |
jobyrin |
jobyrout |
1 |
OSa13 |
3 |
73 |
78 |
2 |
OSa30 |
1 |
39 |
46 |
3 |
OSa30 |
3 |
56 |
64 |
4 |
OSa73 |
1 |
23 |
31 |
5 |
OSa73 |
2 |
31 |
42 |
6 |
OSa86 |
3 |
46 |
60 |
7 |
OSa86 |
4 |
60 |
70 |
8 |
OSa93 |
3 |
68 |
72 |
9 |
OSf26 |
1 |
54 |
54 |
10 |
OSf26 |
2 |
70 |
70 |
11 |
OSh77 |
1 |
63 |
66 |
12 |
OSh77 |
2 |
66 |
70 |
13 |
OSh77 |
3 |
70 |
71 |
14 |
OSi84 |
1 |
67 |
75 |
15 |
OSi84 |
2 |
75 |
81 |
16 |
OSj17 |
3 |
60 |
75 |
17 |
OSj17 |
4 |
65 |
70 |
Expected output:
id janitor_dur janitor_ever baker_dur baker_ever
osa13 5 1 0 0
osa30 8 1 7 1
osa73 0 0 8 1
osf26 0.5 1 0 0
osi84 14 1 0 0
osj17 15 1 0 0
Tables 1 and 2 are ok, Table 3 is not good: My aim is to merge Tables 1 and 2 to obtain the expected output above.
I would finally merge janitor and baker files (merged Tables 1&2) with table 0: original file.
Please help. Thanks.
ak.
READ YOUR BLINKING LOG.
Your log will show the names of the data sets created and how many records.
Your printed output is not from bake nor jan. The output data set names from your macro are janitor_final and baker_final
READ YOUR BLINKING LOG.
Your log will show the names of the data sets created and how many records.
Your printed output is not from bake nor jan. The output data set names from your macro are janitor_final and baker_final
@Reeza wrote:
Your macro is generating tables with the final names of &Job_title._final, so baker_final and janitor_final. And using Janitor and Baker, not Jan/Bake so the references seem inconsistent.
Where are the Jan/baker files at the end coming from?
OP had two bits like
data jan; set small_tdata;
called immediately before the %duration macro call.
Just read the SAS log to see what the actual names of the datasets you macro generates.
Note it is best not to start using macro code to generate SAS code until you understand how to generate SAS code yourself. Otherwise you will just confuse yourself.
What are you trying to accomplish? Do you even need macro code? To find the duration it is easier with your original structure.
data full_tdata;
input id $ job jobyrin jobyrout icode $ lung $;
datalines;
OSa13 3 73 78 6191 cacase
OSa30 1 39 46 7181 cacase
OSa30 3 56 64 6191 cacase
OSa73 1 23 31 7181 popcon
OSa73 2 31 42 5130 popcon
OSa86 3 46 60 6198 cacon
OSa86 4 60 70 5130 cacon
OSa93 3 68 72 6121 popcon
OSf26 1 54 54 6198 popcon
OSf26 2 70 70 6191 popcon
OSh77 1 63 66 6121 cacon
OSh77 2 66 70 6121 cacon
OSh77 3 70 71 6121 cacon
OSi84 1 67 75 6191 cacase
OSi84 2 75 81 6191 cacase
OSj17 3 60 75 6191 cacase
OSj17 4 65 70 6191 cacase
;
data job_duration ;
set full_tdata;
length job_name $10 ;
if icode='6191' then job_name='janitor';
else if icode='7181' then job_name='baker';
else job_name='other';
duration = jobyrout - jobyrin;
run;
proc summary data=job_duration nway ;
by id;
class job_name ;
var duration ;
output out=want sum=;
run;
options missing='0';
proc report data=want ;
column id duration,job_name ;
define id / group;
define job_name / across ' ';
define duration / ' ';
run;
id baker janitor other OSa13 0 5 0 OSa30 7 8 0 OSa73 8 0 11 OSa86 0 0 24 OSa93 0 0 4 OSf26 0 0 0 OSh77 0 0 8 OSi84 0 14 0 OSj17 0 20 0
In general you should be able to do this without transposing the data, just process it as you proceed through the dataset. But that might depend on on how complex the situation is. It is a little hard to follow the logic of how you want to handle overlaps. Can you clarify with simple examples? Here are some questions and suggested approach.
First do we care about the job type or code? Do we want to calculate the duration periods for different job codes separately? If not then what does it mean when two records for different job codes overlap? Do we truncate the time in the earlier job to end when the new job starts? Do we create a new job code to indicate there was a period of time with two (or more) jobs?
First let's assume the data is sorted by ID, START, END. (let's ignore the job code issue for now)
Then let's try to enumerate the possible relationships between the current line and the previous one.
1) No overlap at all.
2) The END of the previous matches the START of the current.
3) The END of the previous is between the START and END of the current.
4) The END of the previous is on or after the END of the current. (The previous period contains the current period.)
5) Some weird crossing of three or more overlapping records.
In general if you have two records where one stops and the other starts and you have no indication of when in the year the change happened you need to have a rule to handle that. You could assume that it always happened in the middle of the year and give each period half of the overlapping year.
ID START STOP DURATION 1 1975 1980 5.5 1 1980 1985 5.5 2 1975 1985 11
If there is a longer overlap then truncate the earlier one at the point where the next one starts.
So you should be able to calculate like this:
ID START STOP PREV NEXT DURATION
1 1975 1982 . 1980 5.5
1 1980 1985 1982 . 5.5
2 1975 1985 . . 11
You can use LAG() to generate PREV value from STOP of the previous observation. You can use any of a number of methods to simulate the non-existent LEAD() function to generate NEXT form the START of the next observation.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.