My data is currently organized as:
Dr. | Letters | Date |
John | C | 1-Jan-17 |
John | F | 1-Jan-17 |
John | R | 1-Jan-17 |
Ben | C | 16-Feb-17 |
Ben | R | 16-Feb-17 |
Alex | C | 2-Mar-17 |
Alex | R | 2-Mar-17 |
Alex | F | 2-Mar-17 |
John | C | 25-Dec-16 |
John | R | 25-Dec-16 |
John | F | 25-Dec-16 |
I would like it to look like
Dr | Date | C | F | R |
John | 1-Jan-17 | yes | yes | yes |
Ben | 16-Feb-17 | yes | no | yes |
Alex | 2-Mar-17 | yes | yes | yes |
John | 25-Dec-16 | yes | yes | yes |
I have a feeling that this is a proc transpose, but I'm not sure where to start in terms of what the variable/ID are. Thank you!
You're correct, this is a transpose. You also need to add a variable for the main data set that you can use to fill in the Yes/No.
This is untested, but should give you the idea:
*Add variable for main section of report;
data have;
set have;
varT = 'Yes';
run;
*Transpose;
proc transpose data=have out=want prefix = Letter_;
by dr date;
id letters;
var varT;
run;
*Add in No's;
data want;
set want;
array _let(*) $ Letter_:;
do i=1 to dim(_let);
if missing(_let(i)) then _let(i)='No';
end;
run;
I would suggest thinking in terms of PROC TABULATE to count the occurances inside LETTER -- first --then a PROC TRANSPOSE. My two cents:
proc format;
value YesNo
.="No"
Low-0="No"
0<-high="Yes";
PROC TABULATE DATA=WORK.HAVE OUT=counts ;
CLASS Dr Letters Date ;
TABLE Dr*Date,Letters*N*F=YESNO. ;
PROC TRANSPOSE DATA=counts OUT=Want(drop=_Name_ );
format N YesNo.;
BY Dr Date;
ID Letters;
VAR N;
RUN; QUIT;
Could have used PROC FREQ instead of PROG TABULATE, same same.
Proc transpose, indeed, plus a data step to define "yes" as a value, and another data step to generate the "no" values:
data have;
input Dr $ Letters $ Date :date9.;
datalines;
John C 1-Jan-17
John F 1-Jan-17
John R 1-Jan-17
Ben C 16-Feb-17
Ben R 16-Feb-17
Alex C 2-Mar-17
Alex R 2-Mar-17
Alex F 2-Mar-17
John C 25-Dec-16
John R 25-Dec-16
John F 25-Dec-16
;
data temp;
set have;
value = "yes";
run;
proc transpose data=temp out=tbl(drop=_name_);
var value;
id Letters;
by dr notsorted;
run;
data want;
set tbl;
array _a $ C -- R;
do i = 1 to dim(_a);
if missing(_a{i}) then _a{i} = "no";
end;
drop i;
run;
You already have a PROC TRANSPOSE solution, so I'll go a different route. If your sample data is a realistic representation of the complexity, you can do this in one step, using a DATA step.
data want;
set have;
by doctor date notsorted;
length C F R $ 3;
if first.date then do;
C='no';
F='no';
R='no';
end;
retain C F R;
if letters='C' then C='yes';
else if letters='F' then F='yes';
else if letters='R' then R='yes';
if last.date;
drop letters;
run;
General Proc Transpose
https://goo.gl/f6ZvY0
https://communities.sas.com/t5/General-SAS-Programming/General-Proc-Transpose/m-p/344744
I like 'proc corresp' the best for this kind of problem, it identifies John
with 4 levels, C twice.
HAVE
Up to 40 obs WORK.HAVE total obs=11
Obs DR LETTERS DATE
1 John C 01JAN2017
2 John F 01JAN2017
3 John R 01JAN2017
4 Ben C 16FEB2017
5 Ben R 16FEB2017
6 Alex C 02MAR2017
7 Alex R 02MAR2017
8 Alex F 02MAR2017
9 John C 25DEC2016
10 John R 25DEC2016
11 John F 25DEC2016
WANT
obs DR C F R
1 Alex yes yes yes
2 Ben yes no yes
3 John yes yes yes
Simple solutions solutions
\
PROC FREQ (just a display)
=========
Table of DR by LETTERS
DR LETTERS
Frequency|C |F |R | Total
---------+--------+--------+--------+
Alex | 1 | 1 | 1 | 3
---------+--------+--------+--------+
Ben | 1 | 0 | 1 | 2
---------+--------+--------+--------+
John | 2 | 2 | 2 | 6
---------+--------+--------+--------+
Total 4 3 4 11
PROC CORRESP
Obs LABEL C F R SUM
1 Alex 1 1 1 3
2 Ben 1 0 1 2
3 John 2 2 2 6
4 Sum 4 3 4 11
* _ _ _
_ __ ___ __ _| | _____ __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \_____ / _` |/ _` | __/ _` |
| | | | | | (_| | < __/_____| (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___| \__,_|\__,_|\__\__,_|
;
data have;
informat Dr $4. Letters $1. Date date10.;
input Dr $ Letters $ Date ;
cards4;
John C 1-Jan-17
John F 1-Jan-17
John R 1-Jan-17
Ben C 16-Feb-17
Ben R 16-Feb-17
Alex C 2-Mar-17
Alex R 2-Mar-17
Alex F 2-Mar-17
John C 25-Dec-16
John R 25-Dec-16
John F 25-Dec-16
;;;;
run;quit;
*_
| |_ _ __ __ _ _ __ ___ _ __ ___ ___ ___
| __| '__/ _` | '_ \/ __| '_ \ / _ \/ __|/ _ \
| |_| | | (_| | | | \__ \ |_) | (_) \__ \ __/
\__|_| \__,_|_| |_|___/ .__/ \___/|___/\___|
|_|
;
* get rid of dups (ie 2 Cs);
proc sort data=have out=havunq nodupkey;
by dr letters;
run;quit;
* create variable with 'yes' drop date;
data havchr(drop=date);
set havunq;
yesno='yes';
run;quit;
/*
Up to 40 obs WORK.HAVCHR total obs=8
Obs DR LETTERS YESNO
1 Alex C yes
2 Alex F yes
3 Alex R yes
4 Ben C yes
5 Ben R yes
6 John C yes
7 John F yes
8 John R yes
*/
proc transpose data=havchr out=havxpo;
by dr ;
id letters;
var yesno;
run;quit;
/*
Up to 40 obs WORK.HAVXPO total obs=3
Obs DR _NAME_ C F R
1 Alex YESNO yes yes yes
2 Ben YESNO yes yes
3 John YESNO yes yes yes
*/
data want;
set havxpo;
array chrs _character_;
do over chrs;
if chrs=' ' then chrs='no';
end;
run;quit;
/*
Up to 40 obs WORK.WANT total obs=3
Obs DR _NAME_ C F R
1 Alex YESNO yes yes yes
2 Ben YESNO yes no yes
3 John YESNO yes yes yes
*/
proc print data=want;
run;quit;
* couple of possibilities;
*
___ ___ _ __ _ __ ___ ___ _ __
/ __/ _ \| '__| '__/ _ \/ __| '_ \
| (_| (_) | | | | | __/\__ \ |_) |
\___\___/|_| |_| \___||___/ .__/
|_|
;
Ods Exclude All;
Ods Output Observed=want;
Proc Corresp Data=have Observed dim=1;
Table dr, letters;
Run;
Ods Select All;
proc print data=want;run;
If you want counts;
Obs LABEL C F R SUM
1 Alex 1 1 1 3
2 Ben 1 0 1 2
3 John 2 2 2 6
4 Sum 4 3 4 11
* __
/ _|_ __ ___ __ _
| |_| '__/ _ \/ _` |
| _| | | __/ (_| |
|_| |_| \___|\__, |
|_|
;
proc freq data=have;
tables dr*letters/sparse norow nocolumn nopercent;
run;quit;
Table of DR by LETTERS
DR LETTERS
Frequency|C |F |R | Total
---------+--------+--------+--------+
Alex | 1 | 1 | 1 | 3
---------+--------+--------+--------+
Ben | 1 | 0 | 1 | 2
---------+--------+--------+--------+
John | 2 | 2 | 2 | 6
---------+--------+--------+--------+
Total 4 3 4 11
/*
Obs DR _NAME_ C F R
1 Alex DATE yes yes yes
2 Ben DATE yes no yes
3 John DATE yes yes yes
*/
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.