BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barkat
Pyrite | Level 9

I have created a data set (have) using following code. Code may not work properly. The print shows as table-1 below. I would like to make the dataset wide, so it shows like table-2. 

data have;
input Name $
age
gender $
test1_dt date9.
test1_result $
test2_dt date9.
test2_result $
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;

 

Table-1

Name age gender test1_dt test1_result test2_dt test2_result test3_dt test3_result
ABC 17 M 4-Aug-20 fail 8-Aug-20 fail 12-Aug-20 pass
BCD 19 F 2-May-20 fail 8-May-20 fail 12-May-20 pass
CDE 21 F 1-May-20 pass        
DEF 15 M 4-May-20 fail 4-Jun-20 pass    
EFG 20 F 2-May-20 fail 12-May-20 fail 19-May-20 pass
FGH 21 M 3-May-20 fail 4-May-20 fail 12-May-20 pass
GHI 17 M 6-May-20 pass        
HIJ 16 F 5-May-20 fail 15-May-20 fail 22-May-20 pass
IJK 20 M 8-May-20 fail 12-May-20 pass    
JKL 18 F 10-Jun-20 fail 20-Jun-20 fail 24-Jun-20 pass
KLM 18 M 9-Jul-20 fail 19-Jul-20 fail 29-Jul-20 pass
LMN 20 M 11-May-20 fail 19-May-20 pass    

 

Table-2

Name age gender test_number test_date test_result
ABC 17 M test1 4-Aug-20 fail
ABC 17 M test2 8-Aug-20 fail
ABC 17 M test3 12-Aug-20 pass
BCD 19 F test1 2-May-20 fail
BCD 19 F test2 8-May-20 fail
BCD 19 F test3 12-May-20 pass
CDE 21 F test1 1-May-20 pass
DEF 15 M test1 4-May-20 fail
DEF 15 M test2 4-Jun-20 pass
EFG 20 F test1 2-May-20 fail
EFG 20 F test2 12-May-20 fail
EFG 20 F test3 19-May-20 pass
FGH 21 M test1 3-May-20 fail
FGH 21 M test2 4-May-20 fail
FGH 21 M test3 12-May-20 pass
GHI 17 M test1 6-May-20 pass
HIJ 16 F test1 5-May-20 fail
HIJ 16 F test2 15-May-20 fail
HIJ 16 F test3 22-May-20 pass
IJK 20 M test1 8-May-20 fail
IJK 20 M test2 12-May-20 pass
JKL 18 F test1 10-Jun-20 fail
JKL 18 F test2 20-Jun-20 fail
JKL 18 F test3 24-Jun-20 pass
KLM 18 M test1 9-Jul-20 fail
KLM 18 M test2 19-Jul-20 fail
KLM 18 M test3 29-Jul-20 pass
LMN 20 M test1 11-May-20 fail
LMN 20 M test2 19-May-20 pass

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data want;
set have;

array _dates(3) test1_dt test2_dt test3_dt;
array _results(3) test1_result test2_result test3_result;

do i=1 to 3;
test_date = _dates(i);
test_result = _results(i);
test = i; if not missing(test_result) then output; end; drop test1_dt test2_dt test3_dt test1_result test2_result test3_result; run;

Untested

 

Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/


@Barkat wrote:

I have created a data set (have) using following code. Code may not work properly. The print shows as table-1 below. I would like to make the dataset wide, so it shows like table-2. 

data have;
input Name $
age
gender $
test1_dt date9.
test1_result $
test2_dt date9.
test2_result $
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;

 

Table-1

Name age gender test1_dt test1_result test2_dt test2_result test3_dt test3_result
ABC 17 M 4-Aug-20 fail 8-Aug-20 fail 12-Aug-20 pass
BCD 19 F 2-May-20 fail 8-May-20 fail 12-May-20 pass
CDE 21 F 1-May-20 pass        
DEF 15 M 4-May-20 fail 4-Jun-20 pass    
EFG 20 F 2-May-20 fail 12-May-20 fail 19-May-20 pass
FGH 21 M 3-May-20 fail 4-May-20 fail 12-May-20 pass
GHI 17 M 6-May-20 pass        
HIJ 16 F 5-May-20 fail 15-May-20 fail 22-May-20 pass
IJK 20 M 8-May-20 fail 12-May-20 pass    
JKL 18 F 10-Jun-20 fail 20-Jun-20 fail 24-Jun-20 pass
KLM 18 M 9-Jul-20 fail 19-Jul-20 fail 29-Jul-20 pass
LMN 20 M 11-May-20 fail 19-May-20 pass    

 

Table-2

Name age gender test_number test_date test_result
ABC 17 M test1 4-Aug-20 fail
ABC 17 M test2 8-Aug-20 fail
ABC 17 M test3 12-Aug-20 pass
BCD 19 F test1 2-May-20 fail
BCD 19 F test2 8-May-20 fail
BCD 19 F test3 12-May-20 pass
CDE 21 F test1 1-May-20 pass
DEF 15 M test1 4-May-20 fail
DEF 15 M test2 4-Jun-20 pass
EFG 20 F test1 2-May-20 fail
EFG 20 F test2 12-May-20 fail
EFG 20 F test3 19-May-20 pass
FGH 21 M test1 3-May-20 fail
FGH 21 M test2 4-May-20 fail
FGH 21 M test3 12-May-20 pass
GHI 17 M test1 6-May-20 pass
HIJ 16 F test1 5-May-20 fail
HIJ 16 F test2 15-May-20 fail
HIJ 16 F test3 22-May-20 pass
IJK 20 M test1 8-May-20 fail
IJK 20 M test2 12-May-20 pass
JKL 18 F test1 10-Jun-20 fail
JKL 18 F test2 20-Jun-20 fail
JKL 18 F test3 24-Jun-20 pass
KLM 18 M test1 9-Jul-20 fail
KLM 18 M test2 19-Jul-20 fail
KLM 18 M test3 29-Jul-20 pass
LMN 20 M test1 11-May-20 fail
LMN 20 M test2 19-May-20 pass

 




View solution in original post

6 REPLIES 6
Reeza
Super User
data want;
set have;

array _dates(3) test1_dt test2_dt test3_dt;
array _results(3) test1_result test2_result test3_result;

do i=1 to 3;
test_date = _dates(i);
test_result = _results(i);
test = i; if not missing(test_result) then output; end; drop test1_dt test2_dt test3_dt test1_result test2_result test3_result; run;

Untested

 

Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/


@Barkat wrote:

I have created a data set (have) using following code. Code may not work properly. The print shows as table-1 below. I would like to make the dataset wide, so it shows like table-2. 

data have;
input Name $
age
gender $
test1_dt date9.
test1_result $
test2_dt date9.
test2_result $
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;

 

Table-1

Name age gender test1_dt test1_result test2_dt test2_result test3_dt test3_result
ABC 17 M 4-Aug-20 fail 8-Aug-20 fail 12-Aug-20 pass
BCD 19 F 2-May-20 fail 8-May-20 fail 12-May-20 pass
CDE 21 F 1-May-20 pass        
DEF 15 M 4-May-20 fail 4-Jun-20 pass    
EFG 20 F 2-May-20 fail 12-May-20 fail 19-May-20 pass
FGH 21 M 3-May-20 fail 4-May-20 fail 12-May-20 pass
GHI 17 M 6-May-20 pass        
HIJ 16 F 5-May-20 fail 15-May-20 fail 22-May-20 pass
IJK 20 M 8-May-20 fail 12-May-20 pass    
JKL 18 F 10-Jun-20 fail 20-Jun-20 fail 24-Jun-20 pass
KLM 18 M 9-Jul-20 fail 19-Jul-20 fail 29-Jul-20 pass
LMN 20 M 11-May-20 fail 19-May-20 pass    

 

Table-2

Name age gender test_number test_date test_result
ABC 17 M test1 4-Aug-20 fail
ABC 17 M test2 8-Aug-20 fail
ABC 17 M test3 12-Aug-20 pass
BCD 19 F test1 2-May-20 fail
BCD 19 F test2 8-May-20 fail
BCD 19 F test3 12-May-20 pass
CDE 21 F test1 1-May-20 pass
DEF 15 M test1 4-May-20 fail
DEF 15 M test2 4-Jun-20 pass
EFG 20 F test1 2-May-20 fail
EFG 20 F test2 12-May-20 fail
EFG 20 F test3 19-May-20 pass
FGH 21 M test1 3-May-20 fail
FGH 21 M test2 4-May-20 fail
FGH 21 M test3 12-May-20 pass
GHI 17 M test1 6-May-20 pass
HIJ 16 F test1 5-May-20 fail
HIJ 16 F test2 15-May-20 fail
HIJ 16 F test3 22-May-20 pass
IJK 20 M test1 8-May-20 fail
IJK 20 M test2 12-May-20 pass
JKL 18 F test1 10-Jun-20 fail
JKL 18 F test2 20-Jun-20 fail
JKL 18 F test3 24-Jun-20 pass
KLM 18 M test1 9-Jul-20 fail
KLM 18 M test2 19-Jul-20 fail
KLM 18 M test3 29-Jul-20 pass
LMN 20 M test1 11-May-20 fail
LMN 20 M test2 19-May-20 pass

 




Barkat
Pyrite | Level 9
Thanks. I forgot to mention that there are many observations in my data set who have all test dates and all test results data missing. They only have name, age, and gender. When I run this code they are automatically excluded. How can I solve that?
Tom
Super User Tom
Super User

@Barkat wrote:
Thanks. I forgot to mention that there are many observations in my data set who have all test dates and all test results data missing. They only have name, age, and gender. When I run this code they are automatically excluded. How can I solve that?

You could remove the IF/THEN so OUTPUT statement executes every time through the DO loop.

Check for a missing DATE instead of a missing value.

Or assuming that you never have missing for the first value and then non-missing later just always output the first one.

if i=1 or not missing(test_date) then output;

 

tarheel13
Rhodochrosite | Level 12

Your datelines is missing the missover option. you must include this in order to read in the missing values correctly. also, you may do proc transpose twice and merge those 2 datasets together, although it will be more lines of code than what @Reeza has shown you. 

data have;
format test1_dt test2_dt test3_dt date9.;
infile datalines missover;
input Name $ 
age 
gender $ 
test1_dt date9. 
test1_result $ 
test2_dt date9.
test2_result $ 
test3_dt date9.
test3_result $;
datalines;
ABC 17 M 04AUG2020 fail 08AUG2020 fail 12AUG2020 pass
BCD 19 F 02MAY2020 fail 08MAY2020 fail 12MAY2020 pass
CDE 21 F 01MAY2020 pass
DEF 15 M 04MAY2020 fail 04JUN2020 pass
EFG 20 F 02MAY2020 fail 12MAY2020 fail 19MAY2020 pass
FGH 21 M 03MAY2020 fail 04MAY2020 fail 12MAY2020 pass
GHI 17 M 06MAY2020 pass
HIJ 16 F 05MAY2020 fail 15MAY2020 fail 22MAY2020 pass
IJK 20 M 08MAY2020 fail 12MAY2020 pass
JKL 18 F 10JUN2020 fail 20JUN2020 fail 24JUN2020 pass
KLM 18 M 09JUL2020 fail 19JUL2020 fail 29JUL2020 pass
LMN 20 M 11MAY2020 fail 19MAY2020 pass
;
run;

proc print data=have;
run;

proc sort data=have;
	by name age gender;
run;

proc transpose data=have out=have_t;
	var test1_dt test2_dt test3_dt;
	by name age gender;
run;

data dates(rename=(col1=test_date));
	set have_t;
	where col1 ne .;
	test_number=substr(_name_,1,5);
	
	drop _name_;
run;

proc transpose data=have out=have_t2;
	var test1_result test2_result test3_result;
	by name age gender;
run;

data results(rename=(col1=test_result));
	set have_t2;
	where ^missing(col1);
	test_number=substr(_name_,1,5);
	
	drop _name_;
run;

proc sort data=results;
	by name test_number;
run;

proc sort data=dates;
	by name test_number;
run;

data merged;
	length test_number $12;
	merge dates results;
	by name test_number;
run;
Barkat
Pyrite | Level 9
Thank you for the solution. Thanks for correcting table creating step by adding missover.
PaigeMiller
Diamond | Level 26

Simplify your data, make your future coding even easier. Test number should be numeric only, not a character string beginning with Test. The test_result field should be 0 or 1, which is not only simpler to type, but simpler to handle in future analyses.

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 743 views
  • 4 likes
  • 5 in conversation