My data is currently organized as:
Dr. | Letters | Date |
John | C | 1-Jan-17 |
John | F | 1-Jan-17 |
John | R | 1-Jan-17 |
Ben | C | 16-Feb-17 |
Ben | R | 16-Feb-17 |
Alex | C | 2-Mar-17 |
Alex | R | 2-Mar-17 |
Alex | F | 2-Mar-17 |
John | C | 25-Dec-16 |
John | R | 25-Dec-16 |
John | F | 25-Dec-16 |
I would like it to look like
Dr | Date | C | F | R |
John | 1-Jan-17 | yes | yes | yes |
Ben | 16-Feb-17 | yes | no | yes |
Alex | 2-Mar-17 | yes | yes | yes |
John | 25-Dec-16 | yes | yes | yes |
I have a feeling that this is a proc transpose, but I'm not sure where to start in terms of what the variable/ID are. Thank you!
You're correct, this is a transpose. You also need to add a variable for the main data set that you can use to fill in the Yes/No.
This is untested, but should give you the idea:
*Add variable for main section of report;
data have;
set have;
varT = 'Yes';
run;
*Transpose;
proc transpose data=have out=want prefix = Letter_;
by dr date;
id letters;
var varT;
run;
*Add in No's;
data want;
set want;
array _let(*) $ Letter_:;
do i=1 to dim(_let);
if missing(_let(i)) then _let(i)='No';
end;
run;
I would suggest thinking in terms of PROC TABULATE to count the occurances inside LETTER -- first --then a PROC TRANSPOSE. My two cents:
proc format;
value YesNo
.="No"
Low-0="No"
0<-high="Yes";
PROC TABULATE DATA=WORK.HAVE OUT=counts ;
CLASS Dr Letters Date ;
TABLE Dr*Date,Letters*N*F=YESNO. ;
PROC TRANSPOSE DATA=counts OUT=Want(drop=_Name_ );
format N YesNo.;
BY Dr Date;
ID Letters;
VAR N;
RUN; QUIT;
Could have used PROC FREQ instead of PROG TABULATE, same same.
Proc transpose, indeed, plus a data step to define "yes" as a value, and another data step to generate the "no" values:
data have;
input Dr $ Letters $ Date :date9.;
datalines;
John C 1-Jan-17
John F 1-Jan-17
John R 1-Jan-17
Ben C 16-Feb-17
Ben R 16-Feb-17
Alex C 2-Mar-17
Alex R 2-Mar-17
Alex F 2-Mar-17
John C 25-Dec-16
John R 25-Dec-16
John F 25-Dec-16
;
data temp;
set have;
value = "yes";
run;
proc transpose data=temp out=tbl(drop=_name_);
var value;
id Letters;
by dr notsorted;
run;
data want;
set tbl;
array _a $ C -- R;
do i = 1 to dim(_a);
if missing(_a{i}) then _a{i} = "no";
end;
drop i;
run;
You already have a PROC TRANSPOSE solution, so I'll go a different route. If your sample data is a realistic representation of the complexity, you can do this in one step, using a DATA step.
data want;
set have;
by doctor date notsorted;
length C F R $ 3;
if first.date then do;
C='no';
F='no';
R='no';
end;
retain C F R;
if letters='C' then C='yes';
else if letters='F' then F='yes';
else if letters='R' then R='yes';
if last.date;
drop letters;
run;
General Proc Transpose
https://goo.gl/f6ZvY0
https://communities.sas.com/t5/General-SAS-Programming/General-Proc-Transpose/m-p/344744
I like 'proc corresp' the best for this kind of problem, it identifies John
with 4 levels, C twice.
HAVE
Up to 40 obs WORK.HAVE total obs=11
Obs DR LETTERS DATE
1 John C 01JAN2017
2 John F 01JAN2017
3 John R 01JAN2017
4 Ben C 16FEB2017
5 Ben R 16FEB2017
6 Alex C 02MAR2017
7 Alex R 02MAR2017
8 Alex F 02MAR2017
9 John C 25DEC2016
10 John R 25DEC2016
11 John F 25DEC2016
WANT
obs DR C F R
1 Alex yes yes yes
2 Ben yes no yes
3 John yes yes yes
Simple solutions solutions
\
PROC FREQ (just a display)
=========
Table of DR by LETTERS
DR LETTERS
Frequency|C |F |R | Total
---------+--------+--------+--------+
Alex | 1 | 1 | 1 | 3
---------+--------+--------+--------+
Ben | 1 | 0 | 1 | 2
---------+--------+--------+--------+
John | 2 | 2 | 2 | 6
---------+--------+--------+--------+
Total 4 3 4 11
PROC CORRESP
Obs LABEL C F R SUM
1 Alex 1 1 1 3
2 Ben 1 0 1 2
3 John 2 2 2 6
4 Sum 4 3 4 11
* _ _ _
_ __ ___ __ _| | _____ __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \_____ / _` |/ _` | __/ _` |
| | | | | | (_| | < __/_____| (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___| \__,_|\__,_|\__\__,_|
;
data have;
informat Dr $4. Letters $1. Date date10.;
input Dr $ Letters $ Date ;
cards4;
John C 1-Jan-17
John F 1-Jan-17
John R 1-Jan-17
Ben C 16-Feb-17
Ben R 16-Feb-17
Alex C 2-Mar-17
Alex R 2-Mar-17
Alex F 2-Mar-17
John C 25-Dec-16
John R 25-Dec-16
John F 25-Dec-16
;;;;
run;quit;
*_
| |_ _ __ __ _ _ __ ___ _ __ ___ ___ ___
| __| '__/ _` | '_ \/ __| '_ \ / _ \/ __|/ _ \
| |_| | | (_| | | | \__ \ |_) | (_) \__ \ __/
\__|_| \__,_|_| |_|___/ .__/ \___/|___/\___|
|_|
;
* get rid of dups (ie 2 Cs);
proc sort data=have out=havunq nodupkey;
by dr letters;
run;quit;
* create variable with 'yes' drop date;
data havchr(drop=date);
set havunq;
yesno='yes';
run;quit;
/*
Up to 40 obs WORK.HAVCHR total obs=8
Obs DR LETTERS YESNO
1 Alex C yes
2 Alex F yes
3 Alex R yes
4 Ben C yes
5 Ben R yes
6 John C yes
7 John F yes
8 John R yes
*/
proc transpose data=havchr out=havxpo;
by dr ;
id letters;
var yesno;
run;quit;
/*
Up to 40 obs WORK.HAVXPO total obs=3
Obs DR _NAME_ C F R
1 Alex YESNO yes yes yes
2 Ben YESNO yes yes
3 John YESNO yes yes yes
*/
data want;
set havxpo;
array chrs _character_;
do over chrs;
if chrs=' ' then chrs='no';
end;
run;quit;
/*
Up to 40 obs WORK.WANT total obs=3
Obs DR _NAME_ C F R
1 Alex YESNO yes yes yes
2 Ben YESNO yes no yes
3 John YESNO yes yes yes
*/
proc print data=want;
run;quit;
* couple of possibilities;
*
___ ___ _ __ _ __ ___ ___ _ __
/ __/ _ \| '__| '__/ _ \/ __| '_ \
| (_| (_) | | | | | __/\__ \ |_) |
\___\___/|_| |_| \___||___/ .__/
|_|
;
Ods Exclude All;
Ods Output Observed=want;
Proc Corresp Data=have Observed dim=1;
Table dr, letters;
Run;
Ods Select All;
proc print data=want;run;
If you want counts;
Obs LABEL C F R SUM
1 Alex 1 1 1 3
2 Ben 1 0 1 2
3 John 2 2 2 6
4 Sum 4 3 4 11
* __
/ _|_ __ ___ __ _
| |_| '__/ _ \/ _` |
| _| | | __/ (_| |
|_| |_| \___|\__, |
|_|
;
proc freq data=have;
tables dr*letters/sparse norow nocolumn nopercent;
run;quit;
Table of DR by LETTERS
DR LETTERS
Frequency|C |F |R | Total
---------+--------+--------+--------+
Alex | 1 | 1 | 1 | 3
---------+--------+--------+--------+
Ben | 1 | 0 | 1 | 2
---------+--------+--------+--------+
John | 2 | 2 | 2 | 6
---------+--------+--------+--------+
Total 4 3 4 11
/*
Obs DR _NAME_ C F R
1 Alex DATE yes yes yes
2 Ben DATE yes no yes
3 John DATE yes yes yes
*/
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.