I have created a data set (have) using following code. Code may not work properly. The print shows as table-1 below. I would like to make the dataset wide, so it shows like table-2.
data have;
input Name $
age
gender $
test1_dt date9.
test1_result $
test2_dt date9.
test2_result $
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;
Table-1
Name | age | gender | test1_dt | test1_result | test2_dt | test2_result | test3_dt | test3_result |
ABC | 17 | M | 4-Aug-20 | fail | 8-Aug-20 | fail | 12-Aug-20 | pass |
BCD | 19 | F | 2-May-20 | fail | 8-May-20 | fail | 12-May-20 | pass |
CDE | 21 | F | 1-May-20 | pass | ||||
DEF | 15 | M | 4-May-20 | fail | 4-Jun-20 | pass | ||
EFG | 20 | F | 2-May-20 | fail | 12-May-20 | fail | 19-May-20 | pass |
FGH | 21 | M | 3-May-20 | fail | 4-May-20 | fail | 12-May-20 | pass |
GHI | 17 | M | 6-May-20 | pass | ||||
HIJ | 16 | F | 5-May-20 | fail | 15-May-20 | fail | 22-May-20 | pass |
IJK | 20 | M | 8-May-20 | fail | 12-May-20 | pass | ||
JKL | 18 | F | 10-Jun-20 | fail | 20-Jun-20 | fail | 24-Jun-20 | pass |
KLM | 18 | M | 9-Jul-20 | fail | 19-Jul-20 | fail | 29-Jul-20 | pass |
LMN | 20 | M | 11-May-20 | fail | 19-May-20 | pass |
Table-2
Name | age | gender | test_number | test_date | test_result |
ABC | 17 | M | test1 | 4-Aug-20 | fail |
ABC | 17 | M | test2 | 8-Aug-20 | fail |
ABC | 17 | M | test3 | 12-Aug-20 | pass |
BCD | 19 | F | test1 | 2-May-20 | fail |
BCD | 19 | F | test2 | 8-May-20 | fail |
BCD | 19 | F | test3 | 12-May-20 | pass |
CDE | 21 | F | test1 | 1-May-20 | pass |
DEF | 15 | M | test1 | 4-May-20 | fail |
DEF | 15 | M | test2 | 4-Jun-20 | pass |
EFG | 20 | F | test1 | 2-May-20 | fail |
EFG | 20 | F | test2 | 12-May-20 | fail |
EFG | 20 | F | test3 | 19-May-20 | pass |
FGH | 21 | M | test1 | 3-May-20 | fail |
FGH | 21 | M | test2 | 4-May-20 | fail |
FGH | 21 | M | test3 | 12-May-20 | pass |
GHI | 17 | M | test1 | 6-May-20 | pass |
HIJ | 16 | F | test1 | 5-May-20 | fail |
HIJ | 16 | F | test2 | 15-May-20 | fail |
HIJ | 16 | F | test3 | 22-May-20 | pass |
IJK | 20 | M | test1 | 8-May-20 | fail |
IJK | 20 | M | test2 | 12-May-20 | pass |
JKL | 18 | F | test1 | 10-Jun-20 | fail |
JKL | 18 | F | test2 | 20-Jun-20 | fail |
JKL | 18 | F | test3 | 24-Jun-20 | pass |
KLM | 18 | M | test1 | 9-Jul-20 | fail |
KLM | 18 | M | test2 | 19-Jul-20 | fail |
KLM | 18 | M | test3 | 29-Jul-20 | pass |
LMN | 20 | M | test1 | 11-May-20 | fail |
LMN | 20 | M | test2 | 19-May-20 | pass |
data want;
set have;
array _dates(3) test1_dt test2_dt test3_dt;
array _results(3) test1_result test2_result test3_result;
do i=1 to 3;
test_date = _dates(i);
test_result = _results(i);
test = i;
if not missing(test_result) then output;
end;
drop test1_dt test2_dt test3_dt test1_result test2_result test3_result;
run;
Untested
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
@Barkat wrote:
I have created a data set (have) using following code. Code may not work properly. The print shows as table-1 below. I would like to make the dataset wide, so it shows like table-2.
data have;
input Name $
age
gender $
test1_dt date9.
test1_result $
test2_dt date9.
test2_result $
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;
Table-1
Name age gender test1_dt test1_result test2_dt test2_result test3_dt test3_result ABC 17 M 4-Aug-20 fail 8-Aug-20 fail 12-Aug-20 pass BCD 19 F 2-May-20 fail 8-May-20 fail 12-May-20 pass CDE 21 F 1-May-20 pass DEF 15 M 4-May-20 fail 4-Jun-20 pass EFG 20 F 2-May-20 fail 12-May-20 fail 19-May-20 pass FGH 21 M 3-May-20 fail 4-May-20 fail 12-May-20 pass GHI 17 M 6-May-20 pass HIJ 16 F 5-May-20 fail 15-May-20 fail 22-May-20 pass IJK 20 M 8-May-20 fail 12-May-20 pass JKL 18 F 10-Jun-20 fail 20-Jun-20 fail 24-Jun-20 pass KLM 18 M 9-Jul-20 fail 19-Jul-20 fail 29-Jul-20 pass LMN 20 M 11-May-20 fail 19-May-20 pass
Table-2
Name age gender test_number test_date test_result ABC 17 M test1 4-Aug-20 fail ABC 17 M test2 8-Aug-20 fail ABC 17 M test3 12-Aug-20 pass BCD 19 F test1 2-May-20 fail BCD 19 F test2 8-May-20 fail BCD 19 F test3 12-May-20 pass CDE 21 F test1 1-May-20 pass DEF 15 M test1 4-May-20 fail DEF 15 M test2 4-Jun-20 pass EFG 20 F test1 2-May-20 fail EFG 20 F test2 12-May-20 fail EFG 20 F test3 19-May-20 pass FGH 21 M test1 3-May-20 fail FGH 21 M test2 4-May-20 fail FGH 21 M test3 12-May-20 pass GHI 17 M test1 6-May-20 pass HIJ 16 F test1 5-May-20 fail HIJ 16 F test2 15-May-20 fail HIJ 16 F test3 22-May-20 pass IJK 20 M test1 8-May-20 fail IJK 20 M test2 12-May-20 pass JKL 18 F test1 10-Jun-20 fail JKL 18 F test2 20-Jun-20 fail JKL 18 F test3 24-Jun-20 pass KLM 18 M test1 9-Jul-20 fail KLM 18 M test2 19-Jul-20 fail KLM 18 M test3 29-Jul-20 pass LMN 20 M test1 11-May-20 fail LMN 20 M test2 19-May-20 pass
data want;
set have;
array _dates(3) test1_dt test2_dt test3_dt;
array _results(3) test1_result test2_result test3_result;
do i=1 to 3;
test_date = _dates(i);
test_result = _results(i);
test = i;
if not missing(test_result) then output;
end;
drop test1_dt test2_dt test3_dt test1_result test2_result test3_result;
run;
Untested
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
@Barkat wrote:
I have created a data set (have) using following code. Code may not work properly. The print shows as table-1 below. I would like to make the dataset wide, so it shows like table-2.
data have;
input Name $
age
gender $
test1_dt date9.
test1_result $
test2_dt date9.
test2_result $
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;
Table-1
Name age gender test1_dt test1_result test2_dt test2_result test3_dt test3_result ABC 17 M 4-Aug-20 fail 8-Aug-20 fail 12-Aug-20 pass BCD 19 F 2-May-20 fail 8-May-20 fail 12-May-20 pass CDE 21 F 1-May-20 pass DEF 15 M 4-May-20 fail 4-Jun-20 pass EFG 20 F 2-May-20 fail 12-May-20 fail 19-May-20 pass FGH 21 M 3-May-20 fail 4-May-20 fail 12-May-20 pass GHI 17 M 6-May-20 pass HIJ 16 F 5-May-20 fail 15-May-20 fail 22-May-20 pass IJK 20 M 8-May-20 fail 12-May-20 pass JKL 18 F 10-Jun-20 fail 20-Jun-20 fail 24-Jun-20 pass KLM 18 M 9-Jul-20 fail 19-Jul-20 fail 29-Jul-20 pass LMN 20 M 11-May-20 fail 19-May-20 pass
Table-2
Name age gender test_number test_date test_result ABC 17 M test1 4-Aug-20 fail ABC 17 M test2 8-Aug-20 fail ABC 17 M test3 12-Aug-20 pass BCD 19 F test1 2-May-20 fail BCD 19 F test2 8-May-20 fail BCD 19 F test3 12-May-20 pass CDE 21 F test1 1-May-20 pass DEF 15 M test1 4-May-20 fail DEF 15 M test2 4-Jun-20 pass EFG 20 F test1 2-May-20 fail EFG 20 F test2 12-May-20 fail EFG 20 F test3 19-May-20 pass FGH 21 M test1 3-May-20 fail FGH 21 M test2 4-May-20 fail FGH 21 M test3 12-May-20 pass GHI 17 M test1 6-May-20 pass HIJ 16 F test1 5-May-20 fail HIJ 16 F test2 15-May-20 fail HIJ 16 F test3 22-May-20 pass IJK 20 M test1 8-May-20 fail IJK 20 M test2 12-May-20 pass JKL 18 F test1 10-Jun-20 fail JKL 18 F test2 20-Jun-20 fail JKL 18 F test3 24-Jun-20 pass KLM 18 M test1 9-Jul-20 fail KLM 18 M test2 19-Jul-20 fail KLM 18 M test3 29-Jul-20 pass LMN 20 M test1 11-May-20 fail LMN 20 M test2 19-May-20 pass
@Barkat wrote:
Thanks. I forgot to mention that there are many observations in my data set who have all test dates and all test results data missing. They only have name, age, and gender. When I run this code they are automatically excluded. How can I solve that?
You could remove the IF/THEN so OUTPUT statement executes every time through the DO loop.
Check for a missing DATE instead of a missing value.
Or assuming that you never have missing for the first value and then non-missing later just always output the first one.
if i=1 or not missing(test_date) then output;
Your datelines is missing the missover option. you must include this in order to read in the missing values correctly. also, you may do proc transpose twice and merge those 2 datasets together, although it will be more lines of code than what @Reeza has shown you.
data have;
format test1_dt test2_dt test3_dt date9.;
infile datalines missover;
input Name $
age
gender $
test1_dt date9.
test1_result $
test2_dt date9.
test2_result $
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;
proc print data=have;
run;
proc sort data=have;
by name age gender;
run;
proc transpose data=have out=have_t;
var test1_dt test2_dt test3_dt;
by name age gender;
run;
data dates(rename=(col1=test_date));
set have_t;
where col1 ne .;
test_number=substr(_name_,1,5);
drop _name_;
run;
proc transpose data=have out=have_t2;
var test1_result test2_result test3_result;
by name age gender;
run;
data results(rename=(col1=test_result));
set have_t2;
where ^missing(col1);
test_number=substr(_name_,1,5);
drop _name_;
run;
proc sort data=results;
by name test_number;
run;
proc sort data=dates;
by name test_number;
run;
data merged;
length test_number $12;
merge dates results;
by name test_number;
run;
Simplify your data, make your future coding even easier. Test number should be numeric only, not a character string beginning with Test. The test_result field should be 0 or 1, which is not only simpler to type, but simpler to handle in future analyses.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.