I have the following dataset. I am struggling to create a table shown below.
[Edited the previous question. The changes are in the output table. I added last two rows in the output table. Row A&E is sum of the values in rows A and E. Similarly Row D&E is the sum of the values in rows D and E.]
data have;
infile datalines ;
input
ID
Park $
Month $ ;
datalines;
001 A Jan
002 A Feb
003 B Mar
004 C Apr
005 B Jan
006 D May
007 E Jun
008 A Feb
009 C Dec
010 A Nov
011 D Sep
012 C Aug
013 B Oct
014 B Jul
015 E Sep
016 C Nov
017 D Jul
018 E Oct
019 E Mar
020 C Apr
;
run;
Park | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
A | 1 | 2 | 1 | |||||||||
B | 1 | 1 | 1 | 1 | ||||||||
C | 2 | 1 | 1 | 1 | ||||||||
D | 1 | 1 | 1 | |||||||||
E | 1 | 1 | 1 | 1 | ||||||||
A&E | 1 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 2 | 0 |
D&E | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
Basic report:
Proc report data=have; columns park Month; define park/group; define month/ across " " order=data; run;
If you don't want the default . to appear where missing then place before the procedure.
options missing=' ';
and then reset the default missing behavior with: options missing='.'; after.
If you need a data set then say so.
Also, you may find using actual date values in the long run more flexible than character values like "Jan" and "Feb". For one thing you may have a hard time getting things in the correct order because default behavior in many places will place "Apr" first. With an actual date there are several formats such as Monname3. that will display just the month abbreviated name. Plus with a data a change of format could create calendar quarter or year count/totals without changing the data at all.
the less intuitive one 🙂
proc sql;
select park,
sum(case month when "Jan" then 1 else 0 end) as Jan,
sum(case month when "Feb" then 1 else 0 end) as Feb,
sum(case month when "Mar" then 1 else 0 end) as Mar,
sum(case month when "Apr" then 1 else 0 end) as Apr,
sum(case month when "May" then 1 else 0 end) as May,
sum(case month when "Jun" then 1 else 0 end) as Jun,
sum(case month when "Jul" then 1 else 0 end) as Jul,
sum(case month when "Aug" then 1 else 0 end) as Aug,
sum(case month when "Sep" then 1 else 0 end) as Sep,
sum(case month when "Oct" then 1 else 0 end) as Oct,
sum(case month when "Nov" then 1 else 0 end) as Nov,
sum(case month when "Dec" then 1 else 0 end) as Dec
from have
group by park;
quit;
an alternative approach
data have1;
set have;
dummy=1;
run;
proc transpose data=have1 out=want;
by park notsorted;
var dummy;
id month;
run;
proc summary data=want nway;
class park;
var _numeric_;
output out=want1(drop=_:) sum= / autolabel;
run;
and the IML version of it
proc iml;
use have;
read all var {'park'};
read all var {'month'};
close;
u1=unique(park) ;
u2=unique(month);
order={5, 4, 8, 1, 9, 7, 6, 2, 12, 11, 10, 3};
u2=u2[order]`;
res=j(ncol(u1)*ncol(u2),1,.);
do i=1 to ncol(u1);
do j=1 to ncol(u2);
res[j+(i-1)*ncol(u2),1]=t(park=u1[i])*(month=u2[j]);
end;
end;
res1=shape(res, ncol(u1));
print res1 [rowname=u1 colname=u2];
or with an array approach plus the goodie to have the months correctly sorted.
proc sql;
select distinct month into :oks separated by " "
from have order by
input(cats('01',month,'2021'), date9.);
quit;
data want;
set have;
array temp &oks;
do over temp;
temp=(month=vname(temp));
end;
drop month id;
run;
proc summary data=want nway;
class park;
var _numeric_;
output out=want1(drop=_:) sum= / autolabel;
run;
or transreg design for one hot encoding
proc transreg data=have design;
MODEL IDENTITY(id)
CLASS(month / zero=none );
ID park;
OUTPUT OUT=want(DROP=_TYPE_ _NAME_ INTERCEPT id);
RUN;
proc sql;
select distinct name into :oks separated by " "
from dictionary.columns where libname="WORK" and memname="WANT" and type="num" order by
input(cats('01',substr(name,6,3),'2021'), date9.);
quit;
data want;
format &oks best12.;
set want;
run;
proc summary data=want nway;
class park;
var _numeric_;
output out=want1(drop=_:) sum= / autolabel;
run;
BIG hint: whenever you have date-related values in SAS, store them as date values, and use the appropriate format to get your intended appearance. It is then much easier to have correct sorting, or do calculations.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.