I have daily transactional data for 2 years and I need a summary view like the one below. The status code is always "B" and the memo code is always "M".
Status represents the initial status of the ID; Memo represents the final status of the ID.
Data looks like:
ID | Status | Status Date | Memo | Memo_Date | Status Week | Memo_Week |
1 | B | 01-Jan | M | 01-Jan | 1 | 1 |
2 | B | 01-Jan | M | 02-Jan | 1 | 1 |
3 | B | 01-Jan | M | 02-Jan | 1 | 1 |
4 | B | 01-Jan | M | 10-Jan | 1 | 2 |
5 | B | 02-Jan | M | 10-Jan | 1 | 2 |
6 | B | 02-Jan | M | 14-Jan | 1 | 3 |
7 | B | 02-Jan | | | 1 | |
8 | B | 02-Jan | M | 03-Jan | 1 | 1 |
9 | B | 10-Jan | | | 2 | |
10 | B | 10-Jan | M | 14-Jan | 2 | 3 |
What I need to do is summarize it like the table below. The intent is to show the % of IDs with status B in a given week that move to status M in week 1, 2, 3, etc.
Status Week | ID count | Memo_Week_1 | Memo_Week_2 | Memo_Week_3 |
1 | 8 | 3 | 2 | 2 |
2 | 2 | 0 | 0 | 1 |
What I am thinking of doing is writing PROC SQL like below:

proc sql;
  select status_week,
         count(ID) as ID_count,
         count(case when memo_week = 1 then ID end) as Memo_Week_1,
         count(case when memo_week = 2 then ID end) as Memo_Week_2
         /* ... and so forth, one column per week ... */
  from have
  group by status_week;
quit;

But this means I have to hardcode all 52 weeks. There has to be a more efficient way to do this?
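One way to avoid writing 52 CASE expressions by hand is to aggregate in long form first, grouping by both week columns, and pivot only at the end if a wide layout is really required. A rough sketch, assuming the detail table is named HAVE and both week columns already exist:

```sas
/* Long-form summary: one row per status_week/memo_week pair,   */
/* counting only rows that actually reached status M, plus a    */
/* separate total per status_week.                              */
proc sql;
  create table cohort_long as
  select status_week,
         memo_week,
         count(ID) as moved_count
  from have
  where memo_week is not null
  group by status_week, memo_week;

  create table cohort_totals as
  select status_week,
         count(ID) as total_ids
  from have
  group by status_week;
quit;
```

PROC TRANSPOSE, or PROC REPORT with an ACROSS variable, can then turn cohort_long wide without any hardcoded week list.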
Thanks,
If you had actual years (e.g. 01-Jan-2021) along with day and month, then you could use a week format, and everything would be simple. Also, make your output vertical (LONG) rather than horizontal (WIDE).
If you absolutely don't have years, the easiest thing to do is still to add a year; really, you should have years as part of any date.
Example:
data fakedata;
input id status_date :date9. memo_date :date9.;
format status_date memo_date weeku6.;
cards;
1 01jan2021 01jan2021
2 09jan2021 13jan2021
3 04jan2021 01feb2021
4 16jan2021 03feb2021
;
proc freq data=fakedata;
tables status_date memo_date/nocum nopercent;
run;
Thank you Paige,
Could you explain how this solution will lead me to the desired output? The transactional data has about 25 million records over 2 years.
              | Total ID count | Memo_Week_1 | Memo_Week_2 | Memo_Week_3 |
Status week 1 | 8 | 3 | 2 | 2 |
Status week 2 | 2 | 0 | 0 | 1 |
Do you really want 52 columns in the output?
The code I gave was an example of using week formats; it wasn't designed to produce your exact output, it was meant to get you started.
You could probably use PROC FREQ to get closer to your desired output.
proc freq data=have;
tables status_date*memo_date/nocol norow nopercent;
format status_date memo_date weeku6.;
run;
@TheNovice wrote:
Hi Paige, yes, I would need 52 columns. I can't think of another way to correlate the initial status volume to the change in status through the weeks. The PROC FREQ would only get me partway there; it is still missing the total ID count for each status week.
Remove the NOROW option in PROC FREQ.
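For reference, a sketch of the adjusted PROC FREQ, against the same table named HAVE, with the row totals restored:

```sas
proc freq data=have;
  /* NOROW removed so each row now shows its total count */
  tables status_date*memo_date / nocol nopercent;
  format status_date memo_date weeku6.;
run;
```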
Hi Paige,
Thank you, but that is not the solution I need. That will give a total count of the IDs. I only need a count of the IDs where memo_date is not blank.
I am trying to show how much of the initial count resolved in subsequent weeks.
You have yet to fully describe your problem. For example, which definition of week do you want to use?
SAS provides at least three different definitions here.
Also, you say you have two years of data; that means at least 104 weekly columns in wide format. What happens at year boundaries?
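The three definitions can be compared side by side with the WEEK function's 'U', 'V', and 'W' descriptors; a quick sketch (variable names are mine):

```sas
/* Compare SAS's week-numbering rules for the same date. */
data week_defs;
  d = '01jan2021'd;
  week_u = week(d, 'U'); /* U: weeks start Sunday; days before the first Sunday are week 0 */
  week_v = week(d, 'V'); /* V: ISO 8601; weeks start Monday, week 1 contains Jan 4 */
  week_w = week(d, 'W'); /* W: weeks start Monday; days before the first Monday are week 0 */
  put week_u= week_v= week_w=;
run;
```

Note that under the ISO ('V') rule, early-January dates can belong to the last week of the previous year, which matters at exactly the year boundaries asked about above.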
I am using the WEEK function. Using the latest suggestion from Kurt, my output looks like this. It's almost what I need: the ID count is the total count, but the counts in the memo_week columns need to be only for IDs where memo_date is not null.
There are x number of IDs that hit status B in a given week, and then some move to status M in subsequent weeks. I am trying to show that distribution by week against the initial total. Not all IDs move into status M.
STATUS_WEEK | id (n) | MEMO_WEEK: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
0 | 56954 | 1004 | 33579 | 9354 | 5128 | 2923 | 1353 | 734 | 826 | 460 | 302
1 | 268372 | . | 82860 | 93011 | 37385 | 19232 | 12206 | 4428 | 4373 | 3376 | 2145
2 | 199854 | . | . | 50941 | 80656 | 26287 | 14966 | 8869 | 3139 | 3455 | 2835
3 | 222380 | . | . | . | 62169 | 82358 | 30440 | 14696 | 10530 | 4194 | 4085
4 | 257609 | . | . | . | . | 49375 | 114797 | 34623 | 19441 | 14087 | 5197
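One way to get both numbers per status week, the total over all IDs and a count restricted to IDs that actually reached status M, is to derive a 0/1 resolved flag and sum it. A sketch, assuming the detail table is named CURE2:

```sas
/* Flag rows that actually moved to status M. */
data cure2_flagged;
  set cure2;
  resolved = not missing(memo_date);
run;

proc sql;
  create table summary as
  select status_week,
         count(ID)     as total_ids,    /* every ID that hit status B    */
         sum(resolved) as resolved_ids  /* only IDs with a memo date set */
  from cure2_flagged
  group by status_week;
quit;
```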
@TheNovice wrote:
that is not the solution I need. That will give a total count of the IDs. I only need a count of the IDs where memo_date is not blank.
I am trying to show how much of the initial count resolved in subsequent weeks.
As far as I know, the PROC FREQ code I showed in my 2nd reply does exactly this. When either variable is missing, that record is not used in the table.
If you disagree, produce a small example showing why the PROC FREQ code does not give you what you want, and make clear exactly what you do want.
A simple PROC REPORT:
proc report data=have;
column status_week n,memo_week;
define status_week / group;
define memo_week / across;
run;
Hi Kurt,
Thanks for the response. I modified the query to look like this, but I need count(ID) in the ID column and a count of IDs where memo_date is not null in the memo week columns.
Is there a way to apply a WHERE statement just to the memo columns?
proc report data=cure2;
  column status_week id n,memo_week;
  define status_week / group;
  define id / n;
  define memo_week / across;
run;
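PROC REPORT has no per-column WHERE clause, but one workaround is to build the ACROSS counts from a WHERE-filtered pass and the totals from an unfiltered pass, then merge the two. A sketch against CURE2 (the OUT= across column names, e.g. _C2_ and up, are machine-generated):

```sas
/* Pass 1: total ID count per status week, no filter. */
proc sql;
  create table totals as
  select status_week, count(ID) as id_count
  from cure2
  group by status_week;
quit;

/* Pass 2: across counts only for IDs that reached status M. */
proc report data=cure2 out=across_counts(drop=_break_);
  where not missing(memo_date);
  column status_week n,memo_week;
  define status_week / group;
  define memo_week / across;
run;

/* Combine: one row per status week, total plus per-week counts. */
data want;
  merge totals across_counts;
  by status_week;
run;
```

Because status_week is a GROUP variable, the OUT= dataset from PROC REPORT comes out already sorted by it, so the MERGE works without an extra sort.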