Long format data to wide format to obtain 1 line per id

ak2011 · Posted 11-15-2020 12:07 PM

 
Hello,
I would appreciate if someone could help me with the code to reshape my long format exposure file(Table 3)
into a wide one to obtain one line per subject. It is seen that id OSa78 is repeated 
because of the different jobs but I want to know if there is a way to transpose the table to have one line
for each subject. cot(idchem=1401) and bes(idchem=1108) are pollutants coded 0(uexposed)
 and 1(exposed) from their respective idchems and their risks on lung cancer are being evaluated.
Please find attached the SAS code, log and results.

Thanks in advance for your help for me to obtain a wide format for the attached Table 3 or Table 1
ak.

/*Job file*/
data d1;
input id$ 1-6  job 7-8 idchem 9-13  conc 14-15  dur 16-17  confid 18-19  industcode 20-24;
datalines;
OSa01	1	1401	2	3	3	4153
OSa22	2	1108	2	2	3	2165
OSa23	1	1108	2	2	2	8733
OSa50	2	1401	2	3	3	9318
OSa53	4	1401	3	2	3	6191
OSa56	2	1401	1	1	3	1149
OSa78	1	1401	3	3	3	8563
OSa78	2	1401	3	3	3	8563
OSa78	3	1401	2	3	3	8563
OSa78	4	1401	2	1	3	6191
OSb17	1	1108	2	2	3	8791
OSb44	1	1108	1	2	2	8131
OSc35	2	1108	1	2	2	9533
OSc35	3	1108	1	2	2	9533
;
                 
proc sort data=d1; by id idchem;
 
  /* Subject file*/
 
data d2;
/*input id$ 1-5 job 6-7 agentid 8-12  conc 13-14 edur 15-16  econfi 17-18 indid 19-23;*/
input id$ 1-6  idchem 7-11 level 12-13 final_dur 14-15 final_conc 16-17 final_confid 18-19;
datalines;  
OSa01	1401	2	4	2	3
OSa22	1108	2	3	2	3
OSa23	1108	2	7	2	2
OSa50	1401	1	1	2	3
OSa56	1401	1	35	1	3
OSa78	1401	2	48	3	3
OSb17	1108	2	46	2	3
OSb24	1401	2	25	2	3
OSac35	1108	1	3	1	1
;

proc sort data=d2; by id idchem;

/*Step 1: Merge d1 and d2*/

data d3; merge d1(in=a) d2(in=b); by id idchem; if b;
run;

proc print data=d3; title 'Table 1: merged d1 and d2'; run;


/*step 2: Create dummy variables(0=uexposed,1=exposed)*/
proc format;
value idchem
1401 = "cot"
1108= "bes";
run;

data temp1;
set d3;
dum = 1;
format idchem idchem.;
id_job=catx('_', id, job); 
put _all_; 
run;

proc sort data=temp1; by id job id_job; run;

proc transpose data=temp1 out=temp1b(drop=_name_);
by id job id_job;
id idchem;
var dum;

/*REPLACING MISSING VALUES(IE. DOTS(.)) WITH ZEROS(UNEXPOSED)*/
data m1;
set temp1b;

proc stdize out=m1u reponly missing=0;
run;
 
/* Finding unique ids from data d1*/

Proc summary data= m1u nway ;
class id;
var cot bes; output out=d4(drop=_type_ _freq_) max=;
run;

proc print data=d4;
title 'Table 2: Unique ids with cot and pollutant bes exposures(0,1)'; run;
run;
 
/* Step 3: Merge Table 2 with Table 1 (data d3) in step 1*/ 
 
 proc sort data= d3; by id; run;
 
 proc sort data=d4; by id;run;
 
 data d3d4; merge d3 d4; by id; run;
 
 proc sort data=d3d4; by id; run;
 
proc print data= d3d4; title ' Table 3: Merged job(d1 file) and subject(d2 file) data 
showing dummy variables(0,1)';
run;

OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

72

73 /*Job file*/

74

75 data d1;

76 input id$ 1-6 job 7-8 idchem 9-13 conc 14-15 dur 16-17 confid 18-19 industcode 20-24;

77 datalines;

NOTE: The data set WORK.D1 has 14 observations and 7 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

92 ;

93

94 proc sort data=d1; by id idchem;

95

96 /* Subject file*/

97

NOTE: There were 14 observations read from the data set WORK.D1.

NOTE: The data set WORK.D1 has 14 observations and 7 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

98 data d2;

99 /*input id$ 1-5 job 6-7 agentid 8-12 conc 13-14 edur 15-16 econfi 17-18 indid 19-23;*/

100 input id$ 1-6 idchem 7-11 level 12-13 final_dur 14-15 final_conc 16-17 final_confid 18-19;

101 datalines;

NOTE: The data set WORK.D2 has 9 observations and 6 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

111 ;

112

113 proc sort data=d2; by id idchem;

114

115 /*Step 1: Merge d1 and d2*/

116

NOTE: There were 9 observations read from the data set WORK.D2.

NOTE: The data set WORK.D2 has 9 observations and 6 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

117 data d3; merge d1(in=a) d2(in=b); by id idchem; if b;

118 run;

NOTE: There were 14 observations read from the data set WORK.D1.

NOTE: There were 9 observations read from the data set WORK.D2.

NOTE: The data set WORK.D3 has 12 observations and 11 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

119

120 proc print data=d3; title 'Table 1: merged d1 and d2'; run;

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.23 seconds

cpu time 0.24 seconds

121

122

123 /*step 2: Create dummy variables(0=uexposed,1=exposed)*/

124 proc format;

125 value idchem

126 1401 = "cot"

127 1108= "bes";

NOTE: Format IDCHEM is already on the library WORK.FORMATS.

NOTE: Format IDCHEM has been output.

128 run;

NOTE: PROCEDURE FORMAT used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

129

130 data temp1;

131 set d3;

132 dum = 1;

133 format idchem idchem.;

134 id_job=catx('_', id, job);

135 put _all_;

136 run;

id=OSa01 job=1 idchem=cot conc=2 dur=3 confid=3 industcode=4153 level=2 final_dur=4 final_conc=2 final_confid=3 dum=1

id_job=OSa01_1 _ERROR_=0 _N_=1

id=OSa22 job=2 idchem=bes conc=2 dur=2 confid=3 industcode=2165 level=2 final_dur=3 final_conc=2 final_confid=3 dum=1

id_job=OSa22_2 _ERROR_=0 _N_=2

id=OSa23 job=1 idchem=bes conc=2 dur=2 confid=2 industcode=8733 level=2 final_dur=7 final_conc=2 final_confid=2 dum=1

id_job=OSa23_1 _ERROR_=0 _N_=3

id=OSa50 job=2 idchem=cot conc=2 dur=3 confid=3 industcode=9318 level=1 final_dur=1 final_conc=2 final_confid=3 dum=1

id_job=OSa50_2 _ERROR_=0 _N_=4

id=OSa56 job=2 idchem=cot conc=1 dur=1 confid=3 industcode=1149 level=1 final_dur=35 final_conc=1 final_confid=3 dum=1

id_job=OSa56_2 _ERROR_=0 _N_=5

id=OSa78 job=1 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_1 _ERROR_=0 _N_=6

id=OSa78 job=2 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_2 _ERROR_=0 _N_=7

id=OSa78 job=3 idchem=cot conc=2 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_3 _ERROR_=0 _N_=8

id=OSa78 job=4 idchem=cot conc=2 dur=1 confid=3 industcode=6191 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_4 _ERROR_=0 _N_=9

id=OSac35 job=. idchem=bes conc=. dur=. confid=. industcode=. level=1 final_dur=3 final_conc=1 final_confid=1 dum=1 id_job=OSac35_.

_ERROR_=0 _N_=10

id=OSb17 job=1 idchem=bes conc=2 dur=2 confid=3 industcode=8791 level=2 final_dur=46 final_conc=2 final_confid=3 dum=1

id_job=OSb17_1 _ERROR_=0 _N_=11

id=OSb24 job=. idchem=cot conc=. dur=. confid=. industcode=. level=2 final_dur=25 final_conc=2 final_confid=3 dum=1 id_job=OSb24_.

_ERROR_=0 _N_=12

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

137

138 proc sort data=temp1; by id job id_job; run;

NOTE: There were 12 observations read from the data set WORK.TEMP1.

NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

139

140 proc transpose data=temp1 out=temp1b(drop=_name_);

141 by id job id_job;

142 id idchem;

143 var dum;

144

145 /*REPLACING MISSING VALUES(IE. DOTS(.)) WITH ZEROS(UNEXPOSED)*/

NOTE: There were 12 observations read from the data set WORK.TEMP1.

NOTE: The data set WORK.TEMP1B has 12 observations and 5 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

real time 0.02 seconds

cpu time 0.02 seconds

146 data m1;

147 set temp1b;

148

NOTE: There were 12 observations read from the data set WORK.TEMP1B.

NOTE: The data set WORK.M1 has 12 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

149 proc stdize out=m1u reponly missing=0;

150 run;

NOTE: No VAR statement is given. All numerical variables not named elsewhere make up the first set of variables.

NOTE: There were 12 observations read from the data set WORK.M1.

NOTE: The data set WORK.M1U has 12 observations and 5 variables.

NOTE: PROCEDURE STDIZE used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

151

152 /* Finding unique ids from data d1*/

153

154 Proc summary data= m1u nway ;

155 class id;

156 var cot bes; output out=d4(drop=_type_ _freq_) max=;

157 run;

NOTE: There were 12 observations read from the data set WORK.M1U.

NOTE: The data set WORK.D4 has 9 observations and 3 variables.

NOTE: PROCEDURE SUMMARY used (Total process time):

real time 0.02 seconds

cpu time 0.02 seconds

158

159 proc print data=d4;

160 title 'Table 2: Unique ids with cot and pollutant bes exposures(0,1)'; run;

NOTE: There were 9 observations read from the data set WORK.D4.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.12 seconds

cpu time 0.13 seconds

161 run;

162

163 /* Step 3: Merge Table 2 with Table 1 (data d3) in step 1*/

164

165 proc sort data= d3; by id; run;

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: The data set WORK.D3 has 12 observations and 11 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

166

167 proc sort data=d4; by id;run;

NOTE: There were 9 observations read from the data set WORK.D4.

NOTE: The data set WORK.D4 has 9 observations and 3 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.03 seconds

cpu time 0.00 seconds

168

169 data d3d4; merge d3 d4; by id; run;

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: There were 9 observations read from the data set WORK.D4.

NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.03 seconds

170

171 proc sort data=d3d4; by id; run;

NOTE: There were 12 observations read from the data set WORK.D3D4.

NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

172

173 proc print data= d3d4; title ' Table 3: Merged job(d1 file) and subject(d2 file) data

174 showing dummy variables(0,1)';

175 run;

NOTE: There were 12 observations read from the data set WORK.D3D4.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.20 seconds

cpu time 0.20 seconds

176

177

178 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

190

OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

72

73 /*Job file*/

74

75 data d1;

76 input id$ 1-6 job 7-8 idchem 9-13 conc 14-15 dur 16-17 confid 18-19 industcode 20-24;

77 datalines;

NOTE: The data set WORK.D1 has 14 observations and 7 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

92 ;

93

94 proc sort data=d1; by id idchem;

95

96 /* Subject file*/

97

NOTE: There were 14 observations read from the data set WORK.D1.

NOTE: The data set WORK.D1 has 14 observations and 7 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

98 data d2;

99 /*input id$ 1-5 job 6-7 agentid 8-12 conc 13-14 edur 15-16 econfi 17-18 indid 19-23;*/

100 input id$ 1-6 idchem 7-11 level 12-13 final_dur 14-15 final_conc 16-17 final_confid 18-19;

101 datalines;

NOTE: The data set WORK.D2 has 9 observations and 6 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

111 ;

112

113 proc sort data=d2; by id idchem;

114

115 /*Step 1: Merge d1 and d2*/

116

NOTE: There were 9 observations read from the data set WORK.D2.

NOTE: The data set WORK.D2 has 9 observations and 6 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

117 data d3; merge d1(in=a) d2(in=b); by id idchem; if b;

118 run;

NOTE: There were 14 observations read from the data set WORK.D1.

NOTE: There were 9 observations read from the data set WORK.D2.

NOTE: The data set WORK.D3 has 12 observations and 11 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

119

120 proc print data=d3; title 'Table 1: merged d1 and d2'; run;

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.23 seconds

cpu time 0.24 seconds

121

122

123 /*step 2: Create dummy variables(0=uexposed,1=exposed)*/

124 proc format;

125 value idchem

126 1401 = "cot"

127 1108= "bes";

NOTE: Format IDCHEM is already on the library WORK.FORMATS.

NOTE: Format IDCHEM has been output.

128 run;

NOTE: PROCEDURE FORMAT used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

129

130 data temp1;

131 set d3;

132 dum = 1;

133 format idchem idchem.;

134 id_job=catx('_', id, job);

135 put _all_;

136 run;

id=OSa01 job=1 idchem=cot conc=2 dur=3 confid=3 industcode=4153 level=2 final_dur=4 final_conc=2 final_confid=3 dum=1

id_job=OSa01_1 _ERROR_=0 _N_=1

id=OSa22 job=2 idchem=bes conc=2 dur=2 confid=3 industcode=2165 level=2 final_dur=3 final_conc=2 final_confid=3 dum=1

id_job=OSa22_2 _ERROR_=0 _N_=2

id=OSa23 job=1 idchem=bes conc=2 dur=2 confid=2 industcode=8733 level=2 final_dur=7 final_conc=2 final_confid=2 dum=1

id_job=OSa23_1 _ERROR_=0 _N_=3

id=OSa50 job=2 idchem=cot conc=2 dur=3 confid=3 industcode=9318 level=1 final_dur=1 final_conc=2 final_confid=3 dum=1

id_job=OSa50_2 _ERROR_=0 _N_=4

id=OSa56 job=2 idchem=cot conc=1 dur=1 confid=3 industcode=1149 level=1 final_dur=35 final_conc=1 final_confid=3 dum=1

id_job=OSa56_2 _ERROR_=0 _N_=5

id=OSa78 job=1 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_1 _ERROR_=0 _N_=6

id=OSa78 job=2 idchem=cot conc=3 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_2 _ERROR_=0 _N_=7

id=OSa78 job=3 idchem=cot conc=2 dur=3 confid=3 industcode=8563 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_3 _ERROR_=0 _N_=8

id=OSa78 job=4 idchem=cot conc=2 dur=1 confid=3 industcode=6191 level=2 final_dur=48 final_conc=3 final_confid=3 dum=1

id_job=OSa78_4 _ERROR_=0 _N_=9

id=OSac35 job=. idchem=bes conc=. dur=. confid=. industcode=. level=1 final_dur=3 final_conc=1 final_confid=1 dum=1 id_job=OSac35_.

_ERROR_=0 _N_=10

id=OSb17 job=1 idchem=bes conc=2 dur=2 confid=3 industcode=8791 level=2 final_dur=46 final_conc=2 final_confid=3 dum=1

id_job=OSb17_1 _ERROR_=0 _N_=11

id=OSb24 job=. idchem=cot conc=. dur=. confid=. industcode=. level=2 final_dur=25 final_conc=2 final_confid=3 dum=1 id_job=OSb24_.

_ERROR_=0 _N_=12

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

137

138 proc sort data=temp1; by id job id_job; run;

NOTE: There were 12 observations read from the data set WORK.TEMP1.

NOTE: The data set WORK.TEMP1 has 12 observations and 13 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

139

140 proc transpose data=temp1 out=temp1b(drop=_name_);

141 by id job id_job;

142 id idchem;

143 var dum;

144

145 /*REPLACING MISSING VALUES(IE. DOTS(.)) WITH ZEROS(UNEXPOSED)*/

NOTE: There were 12 observations read from the data set WORK.TEMP1.

NOTE: The data set WORK.TEMP1B has 12 observations and 5 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

real time 0.02 seconds

cpu time 0.02 seconds

146 data m1;

147 set temp1b;

148

NOTE: There were 12 observations read from the data set WORK.TEMP1B.

NOTE: The data set WORK.M1 has 12 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

149 proc stdize out=m1u reponly missing=0;

150 run;

NOTE: No VAR statement is given. All numerical variables not named elsewhere make up the first set of variables.

NOTE: There were 12 observations read from the data set WORK.M1.

NOTE: The data set WORK.M1U has 12 observations and 5 variables.

NOTE: PROCEDURE STDIZE used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

151

152 /* Finding unique ids from data d1*/

153

154 Proc summary data= m1u nway ;

155 class id;

156 var cot bes; output out=d4(drop=_type_ _freq_) max=;

157 run;

NOTE: There were 12 observations read from the data set WORK.M1U.

NOTE: The data set WORK.D4 has 9 observations and 3 variables.

NOTE: PROCEDURE SUMMARY used (Total process time):

real time 0.02 seconds

cpu time 0.02 seconds

158

159 proc print data=d4;

160 title 'Table 2: Unique ids with cot and pollutant bes exposures(0,1)'; run;

NOTE: There were 9 observations read from the data set WORK.D4.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.12 seconds

cpu time 0.13 seconds

161 run;

162

163 /* Step 3: Merge Table 2 with Table 1 (data d3) in step 1*/

164

165 proc sort data= d3; by id; run;

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: The data set WORK.D3 has 12 observations and 11 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

166

167 proc sort data=d4; by id;run;

NOTE: There were 9 observations read from the data set WORK.D4.

NOTE: The data set WORK.D4 has 9 observations and 3 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.03 seconds

cpu time 0.00 seconds

168

169 data d3d4; merge d3 d4; by id; run;

NOTE: There were 12 observations read from the data set WORK.D3.

NOTE: There were 9 observations read from the data set WORK.D4.

NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.03 seconds

170

171 proc sort data=d3d4; by id; run;

NOTE: There were 12 observations read from the data set WORK.D3D4.

NOTE: The data set WORK.D3D4 has 12 observations and 13 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

172

173 proc print data= d3d4; title ' Table 3: Merged job(d1 file) and subject(d2 file) data

174 showing dummy variables(0,1)';

175 run;

NOTE: There were 12 observations read from the data set WORK.D3D4.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.20 seconds

cpu time 0.20 seconds

176

177

178 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

190

PaigeMiller · Posted 11-15-2020 01:02 PM

I (and others) will not download attachments. Please show us the desired output via placing a screen capture in your message (use the camera icon).

I also point out that this transposing long to wide is rarely a good idea, and makes coding the next steps (usually) much harder, you would be much better off leaving this as a long data set. What do you intend to do once you transpose this to a wide data set?

--
Paige Miller

Long format data to wide format to obtain 1 line per id

Re: Long format data to wide format to obtain 1 line per id

Click image to register for webinar

Classroom Training Available!