I have a very large data set similar to the following sample data set 'have'. Some weeks are missing (12/10/22, and 01/07/23 in the sample data set) in the data set since there are no values on those weeks. However, I need to show all the weeks during the whole period (12/3/22 to 1/21/23) in an order in the output tables or bar chart (missing weeks will have zero value in the output). How can I do that?
data have; informat Week mmddyy8.; input ID Test $ Week value $ ; Format week date7.; cards; 1 x 12/03/22 Pos 1 y 12/03/22 pos 2 z 12/17/22 Neg 1 x 12/17/22 Pos 2 y 12/24/22 pos 3 z 12/24/22 Neg 3 x 12/24/22 Pos 2 y 12/31/22 pos 3 z 01/14/23 Neg 3 x 01/14/23 Pos 2 y 01/21/23 pos 3 z 01/21/23 Neg ;
Depending on the bar char you want to create, you may not need the obs for missing weeks:
proc sql noprint;
select min(Week), max(Week)
into :first, :last
from have;
quit;
proc sgplot data=WORK.HAVE;
vbar Week /;
format Week weeku5.;
yaxis grid;
xaxis interval= week values=(&first to &last by week);
run;
Do you need extra observations/lines for each possible test, or is it sufficient to create a single observation for a week where both test and value are missing?
Please read my post again and answer my question.
Depending on the bar char you want to create, you may not need the obs for missing weeks:
proc sql noprint;
select min(Week), max(Week)
into :first, :last
from have;
quit;
proc sgplot data=WORK.HAVE;
vbar Week /;
format Week weeku5.;
yaxis grid;
xaxis interval= week values=(&first to &last by week);
run;
@Barkat wrote:
Wonderful!
Is it also possible to keep the weeks with no values in the output tables?
The following code does not include the weeks with no values in the table.
PROC TABULATE data=have;
CLASS Test Week;
TABLE Test="" all="Total", week * N='' all="Total"* N='';
RUN;
This is one of the cases where a PRELOADFMT might be appropriate.
Look at my other post to create a dateinterval format.
The Preloadfmt option requires use of a format with a known finite list of values to display, so you can't use the date7 format.
Then :
PROC TABULATE data=have; CLASS Test; class Week/missing preloadfmt; format week dateinterval. ; TABLE Test="" all="Total", week * N='' all="Total"* N=' /printmiss'; RUN;
You should show an example of how you expect to use those missing dates in a report or "output table". Since all of the other variables would be missing it is not obvious to me exactly what you expect to appear any table.
A possible approach with some uses is to create a custom format for the expected date values and use a Preloadfmt option with the variable that need those dates. Such a format might be made as:
data cntlinset; fmtname="Dateinterval"; type='N'; do date='03DEC2022'd to '31Jan2023'd by 7; start=date; label=put(date,mmddyy8.); output; end; run; proc format cntlin=cntlinset; run;
Strongly suggest if you have any choice to use 4-digit years. It reduces any possibly ambiguity in data that may have long history of values (economic or political) for example, and is just plain easier to read in general. Not to mention that the values become dependent on the setting of system options and might result if different values if code is run on someone else's system that is using a different value for the Yearcutoff option.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.