Hello,
I would appreciate if someone could help me with the code to reshape my long format exposure file(Table 3)
into a wide one to obtain one line per subject. It is seen that id OSa78 is repeated
because of the different jobs but I want to know if there is a way to transpose the table to have one line
for each subject. cot(idchem=1401) and bes(idchem=1108) are pollutants coded 0(uexposed)
and 1(exposed) from their respective idchems and their risks on lung cancer are being evaluated.
Please find attached the SAS code, log and results.
Thanks in advance for your help for me to obtain a wide format for the attached Table 3 or Table 1
ak.
/*Job file*/
data d1;
input id$ 1-6 job 7-8 idchem 9-13 conc 14-15 dur 16-17 confid 18-19 industcode 20-24;
datalines;
OSa01 1 1401 2 3 3 4153
OSa22 2 1108 2 2 3 2165
OSa23 1 1108 2 2 2 8733
OSa50 2 1401 2 3 3 9318
OSa53 4 1401 3 2 3 6191
OSa56 2 1401 1 1 3 1149
OSa78 1 1401 3 3 3 8563
OSa78 2 1401 3 3 3 8563
OSa78 3 1401 2 3 3 8563
OSa78 4 1401 2 1 3 6191
OSb17 1 1108 2 2 3 8791
OSb44 1 1108 1 2 2 8131
OSc35 2 1108 1 2 2 9533
OSc35 3 1108 1 2 2 9533
;
proc sort data=d1; by id idchem;
/* Subject file*/
data d2;
/*input id$ 1-5 job 6-7 agentid 8-12 conc 13-14 edur 15-16 econfi 17-18 indid 19-23;*/
input id$ 1-6 idchem 7-11 level 12-13 final_dur 14-15 final_conc 16-17 final_confid 18-19;
datalines;
OSa01 1401 2 4 2 3
OSa22 1108 2 3 2 3
OSa23 1108 2 7 2 2
OSa50 1401 1 1 2 3
OSa56 1401 1 35 1 3
OSa78 1401 2 48 3 3
OSb17 1108 2 46 2 3
OSb24 1401 2 25 2 3
OSac35 1108 1 3 1 1
;
proc sort data=d2; by id idchem;
/*Step 1: Merge d1 and d2*/
data d3; merge d1(in=a) d2(in=b); by id idchem; if b;
run;
proc print data=d3; title 'Table 1: merged d1 and d2'; run;
/*step 2: Create dummy variables(0=uexposed,1=exposed)*/
proc format;
value idchem
1401 = "cot"
1108= "bes";
run;
data temp1;
set d3;
dum = 1;
format idchem idchem.;
id_job=catx('_', id, job);
put _all_;
run;
proc sort data=temp1; by id job id_job; run;
proc transpose data=temp1 out=temp1b(drop=_name_);
by id job id_job;
id idchem;
var dum;
/*REPLACING MISSING VALUES(IE. DOTS(.)) WITH ZEROS(UNEXPOSED)*/
data m1;
set temp1b;
proc stdize out=m1u reponly missing=0;
run;
/* Finding unique ids from data d1*/
Proc summary data= m1u nway ;
class id;
var cot bes; output out=d4(drop=_type_ _freq_) max=;
run;
proc print data=d4;
title 'Table 2: Unique ids with cot and pollutant bes exposures(0,1)'; run;
run;
/* Step 3: Merge Table 2 with Table 1 (data d3) in step 1*/
proc sort data= d3; by id; run;
proc sort data=d4; by id;run;
data d3d4; merge d3 d4; by id; run;
proc sort data=d3d4; by id; run;
proc print data= d3d4; title ' Table 3: Merged job(d1 file) and subject(d2 file) data
showing dummy variables(0,1)';
run;
OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 /*Job file*/
74
75 data d1;
76 input id$ 1-6 job 7-8 idchem 9-13 conc 14-15 dur 16-17 confid 18-19 industcode 20-24;
77 datalines;
NOTE: The data set WORK.D1 has 14 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
92 ;
93
94 proc sort data=d1; by id idchem;
95
96 /* Subject file*/
97
NOTE: There were 14 observations read from the data set WORK.D1.
NOTE: The data set WORK.D1 has 14 observations and 7 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
98 data d2;
99 /*input id$ 1-5 job 6-7 agentid 8-12 conc 13-14 edur 15-16 econfi 17-18 indid 19-23;*/
100 input id$ 1-6 idchem 7-11 level 12-13 final_dur 14-15 final_conc 16-17 final_confid 18-19;
101 datalines;
NOTE: The data set WORK.D2 has 9 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
111 ;
112
113 proc sort data=d2; by id idchem;
114
115 /*Step 1: Merge d1 and d2*/
116
NOTE: There were 9 observations read from the data set WORK.D2.
NOTE: The data set WORK.D2 has 9 observations and 6 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
117 data d3; merge d1(in=a) d2(in=b); by id idchem; if b;
118 run;
NOTE: There were 14 observations read from the data set WORK.D1.
NOTE: There were 9 observations read from the data set WORK.D2.
NOTE: The data set WORK.D3 has 12 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
119
120 proc print data=d3; title 'Table 1: merged d1 and d2'; run;
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.23 seconds
cpu time 0.24 seconds
121
122
123 /*step 2: Create dummy variables(0=uexposed,1=exposed)*/
124 proc format;
125 value idchem
126 1401 = "cot"
127 1108= "bes";
NOTE: Format IDCHEM is already on the library WORK.FORMATS.
NOTE: Format IDCHEM has been output.
128 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
129
130 data temp1;
131 set d3;
132 dum = 1;
133 format idchem idchem.;
134 id_job=catx('_', id, job);
135 put _all_;
136 run;
id=OSa01 job=1 idchem=cot conc=2 dur=3 confid=3 industcode=4153 level=2 final_dur=4 final_conc=2 final_confid=3 dum=1
id_job=OSa01_1 _ERROR_=0 _N_=1
id=OSa22 job=2 idchem=bes conc=2 dur=2 confid=3 industcode=2165 level=2 final_dur=3 final_conc=2 final_confid=3 dum=1
id_job=OSa22_2 _ERROR_=0 _N_=2
id=OSa23 job=1 idchem=bes conc=2 dur=2 confid=2 industcode=8733 level=2 final_dur=7 final_conc=2 final_confid=2 dum=1
id_job=OSa23_1 _ERROR_=0 _N_=3
id=OSa50 job=2 idchem=cot conc=2 dur=3 confid=3 industcode=9318 level=1 final_dur=1 final_conc=2 final_confid=3 dum=1
id_job=OSa50_2 _ERROR_=0 _N_=4
id=OSa56 job=2 idchem=cot conc=1 dur=1 confid=3 industcode=1149 level=1 final_dur=35 final_conc=1 final_confid=3 dum=1
id_job=OSa56_2 _ERROR_=0 _N_=5
id=OSa78 job=1 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_1 _ERROR_=0 _N_=6
id=OSa78 job=2 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_2 _ERROR_=0 _N_=7
id=OSa78 job=3 idchem=cot conc=2 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_3 _ERROR_=0 _N_=8
id=OSa78 job=4 idchem=cot conc=2 dur=1 confid=3 industcode=6191 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_4 _ERROR_=0 _N_=9
id=OSac35 job=. idchem=bes conc=. dur=. confid=. industcode=. level=1 final_dur=3 final_conc=1 final_confid=1 dum=1 id_job=OSac35_.
_ERROR_=0 _N_=10
id=OSb17 job=1 idchem=bes conc=2 dur=2 confid=3 industcode=8791 level=2 final_dur=46 final_conc=2 final_confid=3 dum=1
id_job=OSb17_1 _ERROR_=0 _N_=11
id=OSb24 job=. idchem=cot conc=. dur=. confid=. industcode=. level=2 final_dur=25 final_conc=2 final_confid=3 dum=1 id_job=OSb24_.
_ERROR_=0 _N_=12
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
137
138 proc sort data=temp1; by id job id_job; run;
NOTE: There were 12 observations read from the data set WORK.TEMP1.
NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
139
140 proc transpose data=temp1 out=temp1b(drop=_name_);
141 by id job id_job;
142 id idchem;
143 var dum;
144
145 /*REPLACING MISSING VALUES(IE. DOTS(.)) WITH ZEROS(UNEXPOSED)*/
NOTE: There were 12 observations read from the data set WORK.TEMP1.
NOTE: The data set WORK.TEMP1B has 12 observations and 5 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
146 data m1;
147 set temp1b;
148
NOTE: There were 12 observations read from the data set WORK.TEMP1B.
NOTE: The data set WORK.M1 has 12 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
149 proc stdize out=m1u reponly missing=0;
150 run;
NOTE: No VAR statement is given. All numerical variables not named elsewhere make up the first set of variables.
NOTE: There were 12 observations read from the data set WORK.M1.
NOTE: The data set WORK.M1U has 12 observations and 5 variables.
NOTE: PROCEDURE STDIZE used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
151
152 /* Finding unique ids from data d1*/
153
154 Proc summary data= m1u nway ;
155 class id;
156 var cot bes; output out=d4(drop=_type_ _freq_) max=;
157 run;
NOTE: There were 12 observations read from the data set WORK.M1U.
NOTE: The data set WORK.D4 has 9 observations and 3 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
158
159 proc print data=d4;
160 title 'Table 2: Unique ids with cot and pollutant bes exposures(0,1)'; run;
NOTE: There were 9 observations read from the data set WORK.D4.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.12 seconds
cpu time 0.13 seconds
161 run;
162
163 /* Step 3: Merge Table 2 with Table 1 (data d3) in step 1*/
164
165 proc sort data= d3; by id; run;
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: The data set WORK.D3 has 12 observations and 11 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
166
167 proc sort data=d4; by id;run;
NOTE: There were 9 observations read from the data set WORK.D4.
NOTE: The data set WORK.D4 has 9 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
168
169 data d3d4; merge d3 d4; by id; run;
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: There were 9 observations read from the data set WORK.D4.
NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
170
171 proc sort data=d3d4; by id; run;
NOTE: There were 12 observations read from the data set WORK.D3D4.
NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
172
173 proc print data= d3d4; title ' Table 3: Merged job(d1 file) and subject(d2 file) data
174 showing dummy variables(0,1)';
175 run;
NOTE: There were 12 observations read from the data set WORK.D3D4.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.20 seconds
cpu time 0.20 seconds
176
177
178 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
190
OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 /*Job file*/
74
75 data d1;
76 input id$ 1-6 job 7-8 idchem 9-13 conc 14-15 dur 16-17 confid 18-19 industcode 20-24;
77 datalines;
NOTE: The data set WORK.D1 has 14 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
92 ;
93
94 proc sort data=d1; by id idchem;
95
96 /* Subject file*/
97
NOTE: There were 14 observations read from the data set WORK.D1.
NOTE: The data set WORK.D1 has 14 observations and 7 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
98 data d2;
99 /*input id$ 1-5 job 6-7 agentid 8-12 conc 13-14 edur 15-16 econfi 17-18 indid 19-23;*/
100 input id$ 1-6 idchem 7-11 level 12-13 final_dur 14-15 final_conc 16-17 final_confid 18-19;
101 datalines;
NOTE: The data set WORK.D2 has 9 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
111 ;
112
113 proc sort data=d2; by id idchem;
114
115 /*Step 1: Merge d1 and d2*/
116
NOTE: There were 9 observations read from the data set WORK.D2.
NOTE: The data set WORK.D2 has 9 observations and 6 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
117 data d3; merge d1(in=a) d2(in=b); by id idchem; if b;
118 run;
NOTE: There were 14 observations read from the data set WORK.D1.
NOTE: There were 9 observations read from the data set WORK.D2.
NOTE: The data set WORK.D3 has 12 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
119
120 proc print data=d3; title 'Table 1: merged d1 and d2'; run;
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.23 seconds
cpu time 0.24 seconds
121
122
123 /*step 2: Create dummy variables(0=uexposed,1=exposed)*/
124 proc format;
125 value idchem
126 1401 = "cot"
127 1108= "bes";
NOTE: Format IDCHEM is already on the library WORK.FORMATS.
NOTE: Format IDCHEM has been output.
128 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
129
130 data temp1;
131 set d3;
132 dum = 1;
133 format idchem idchem.;
134 id_job=catx('_', id, job);
135 put _all_;
136 run;
id=OSa01 job=1 idchem=cot conc=2 dur=3 confid=3 industcode=4153 level=2 final_dur=4 final_conc=2 final_confid=3 dum=1
id_job=OSa01_1 _ERROR_=0 _N_=1
id=OSa22 job=2 idchem=bes conc=2 dur=2 confid=3 industcode=2165 level=2 final_dur=3 final_conc=2 final_confid=3 dum=1
id_job=OSa22_2 _ERROR_=0 _N_=2
id=OSa23 job=1 idchem=bes conc=2 dur=2 confid=2 industcode=8733 level=2 final_dur=7 final_conc=2 final_confid=2 dum=1
id_job=OSa23_1 _ERROR_=0 _N_=3
id=OSa50 job=2 idchem=cot conc=2 dur=3 confid=3 industcode=9318 level=1 final_dur=1 final_conc=2 final_confid=3 dum=1
id_job=OSa50_2 _ERROR_=0 _N_=4
id=OSa56 job=2 idchem=cot conc=1 dur=1 confid=3 industcode=1149 level=1 final_dur=35 final_conc=1 final_confid=3 dum=1
id_job=OSa56_2 _ERROR_=0 _N_=5
id=OSa78 job=1 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_1 _ERROR_=0 _N_=6
id=OSa78 job=2 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_2 _ERROR_=0 _N_=7
id=OSa78 job=3 idchem=cot conc=2 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_3 _ERROR_=0 _N_=8
id=OSa78 job=4 idchem=cot conc=2 dur=1 confid=3 industcode=6191 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1
id_job=OSa78_4 _ERROR_=0 _N_=9
id=OSac35 job=. idchem=bes conc=. dur=. confid=. industcode=. level=1 final_dur=3 final_conc=1 final_confid=1 dum=1 id_job=OSac35_.
_ERROR_=0 _N_=10
id=OSb17 job=1 idchem=bes conc=2 dur=2 confid=3 industcode=8791 level=2 final_dur=46 final_conc=2 final_confid=3 dum=1
id_job=OSb17_1 _ERROR_=0 _N_=11
id=OSb24 job=. idchem=cot conc=. dur=. confid=. industcode=. level=2 final_dur=25 final_conc=2 final_confid=3 dum=1 id_job=OSb24_.
_ERROR_=0 _N_=12
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
137
138 proc sort data=temp1; by id job id_job; run;
NOTE: There were 12 observations read from the data set WORK.TEMP1.
NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
139
140 proc transpose data=temp1 out=temp1b(drop=_name_);
141 by id job id_job;
142 id idchem;
143 var dum;
144
145 /*REPLACING MISSING VALUES(IE. DOTS(.)) WITH ZEROS(UNEXPOSED)*/
NOTE: There were 12 observations read from the data set WORK.TEMP1.
NOTE: The data set WORK.TEMP1B has 12 observations and 5 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
146 data m1;
147 set temp1b;
148
NOTE: There were 12 observations read from the data set WORK.TEMP1B.
NOTE: The data set WORK.M1 has 12 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
149 proc stdize out=m1u reponly missing=0;
150 run;
NOTE: No VAR statement is given. All numerical variables not named elsewhere make up the first set of variables.
NOTE: There were 12 observations read from the data set WORK.M1.
NOTE: The data set WORK.M1U has 12 observations and 5 variables.
NOTE: PROCEDURE STDIZE used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
151
152 /* Finding unique ids from data d1*/
153
154 Proc summary data= m1u nway ;
155 class id;
156 var cot bes; output out=d4(drop=_type_ _freq_) max=;
157 run;
NOTE: There were 12 observations read from the data set WORK.M1U.
NOTE: The data set WORK.D4 has 9 observations and 3 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
158
159 proc print data=d4;
160 title 'Table 2: Unique ids with cot and pollutant bes exposures(0,1)'; run;
NOTE: There were 9 observations read from the data set WORK.D4.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.12 seconds
cpu time 0.13 seconds
161 run;
162
163 /* Step 3: Merge Table 2 with Table 1 (data d3) in step 1*/
164
165 proc sort data= d3; by id; run;
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: The data set WORK.D3 has 12 observations and 11 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
166
167 proc sort data=d4; by id;run;
NOTE: There were 9 observations read from the data set WORK.D4.
NOTE: The data set WORK.D4 has 9 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
168
169 data d3d4; merge d3 d4; by id; run;
NOTE: There were 12 observations read from the data set WORK.D3.
NOTE: There were 9 observations read from the data set WORK.D4.
NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
170
171 proc sort data=d3d4; by id; run;
NOTE: There were 12 observations read from the data set WORK.D3D4.
NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
172
173 proc print data= d3d4; title ' Table 3: Merged job(d1 file) and subject(d2 file) data
174 showing dummy variables(0,1)';
175 run;
NOTE: There were 12 observations read from the data set WORK.D3D4.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.20 seconds
cpu time 0.20 seconds
176
177
178 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
190
I (and others) will not download attachments. Please show us the desired output via placing a screen capture in your message (use the camera icon).
I also point out that this transposing long to wide is rarely a good idea, and makes coding the next steps (usually) much harder, you would be much better off leaving this as a long data set. What do you intend to do once you transpose this to a wide data set?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.