Re: Aggregating data by week

TheNovice · Posted 09-08-2021 05:58 PM

I have a daily transactional data for 2 years and I need to have a summary view like below. The status code is always "B" and the Memo code is always "M"

Status represents initial status of the ID. Memo represents final status of the ID

Data looks like:

ID	Status	Status Date	Memo	Memo_Date	Status Week	Memo_Week
1	B	01-Jan	M	01-Jan	1	1
2	B	01-Jan	M	02-Jan	1	1
3	B	01-Jan	M	02-Jan	1	1
4	B	01-Jan	M	10-Jan	1	2
5	B	02-Jan	M	10-Jan	1	2
6	B	02-Jan	M	14-Jan	1	3
7	B	02-Jan			1
8	B	02-Jan	M	03-Jan	1	1
9	B	10-Jan			2
10	B	10-Jan	M	14-Jan	2	3

What i need to do is summarize like this below. the intent is to show the % of status B from a given week that move to status B in week 1,2,3 etc.

	ID count	Memo_Week_1	Memo_Week_2	Memo_Week_3
1	8	3	2	2
2	2	0	0	1

What i am thinking of doing is writing a proc sql like below

select status_week, count(ID) as ID_ count, case when (memo_week = 1 then ID end) as Memo_Week_1,

case when (memo_week = 2 then ID end) as Memo_Week_2

and so forth. But this mean i have to hardcode all 52 weeks. There has to be a more efficient way to do this?

Thanks,

PaigeMiller · Posted 09-08-2021 06:32 PM

If you had actual years (01-Jan-2021) along with day and month, then you can use a Week format, and everything is simple. And make your output vertical (LONG) rather than horizontal (WIDE).

If you absolutely don't have years, the easiest thing to do is still to add a year, but really, you should have years as part of any date.

Example:

data fakedata;
     input id status_date :date9. memo_date :date9.;
     format status_date memo_date weeku6.;
     cards;
 1 01jan2021 01jan2021
 2 09jan2021 13jan2021
 3 04jan2021 01feb2021
 4 16jan2021 03feb2021
 ;
 
 proc freq data=fakedata;
     tables status_date memo_date/nocum nopercent;
 run;

--
Paige Miller

TheNovice · Posted 09-08-2021 09:10 PM

Thank you Paige,

Could you explain how the solution will lead me to the desired output? The transactional data is about 25 million records for 2 years.

	Total ID count	Memo_Week_1	Memo_Week_2	Memo_Week_3
Status week1	8	3	2	2
Status week 2	2	0	0	1

PaigeMiller · Posted 09-08-2021 09:15 PM

Do you really want 52 columns in the output?

The code I gave was an example of using week formats, it wasn't designed to produce the exact output, it was designed to hopefully get you started.

You could probably use PROC FREQ to get closer to your desired output.

proc freq data=have;
    tables status_date*memo_date/nocol norow nopercent;
    format status_date memo_date weeku6.;
run;

--
Paige Miller

TheNovice · Posted 09-08-2021 09:47 PM

Hi Paige, yes I would need 52 columns. I can't think of another way to correlate the initial status volume to the change in status through the weeks. The proc freq would only get me there partly... It is still missing the total ID count for each status week.

So, I am trying this, but was wondering if there is a faster method to get to the same kind of output

PROC SQL;
CREATE TABLE TT2 AS
SELECT Year,Status_WEEK,COUNT(ID) AS TOTAL,
CASE WHEN (MEMO_WEEK = 1 AND MEMO_TYPE = 'M' THEN ID END) AS MEMO_WEEK_1,
CASE WHEN (MEMO_WEEK = 2 AND MEMO_TYPE = 'M' THEN ID END) AS MEMO_WEEK_2,
CASE WHEN (MEMO_WEEK = 3 AND MEMO_TYPE = 'M' THEN ID END) AS MEMO_WEEK_3,
... AND SO ON FOR ALL 52 WEEKS
from tt1
group by Year,Status_week;
quit;

PaigeMiller · Posted 09-09-2021 07:18 AM

@TheNovice wrote:
Hi Paige, yes I would need 52 columns. I can't think of another way to correlate the initial status volume to the change in status through the weeks. The proc freq would only get me there partly... It is still missing the total ID count for each status week.

Remove the NOROW option in PROC FREQ.

--
Paige Miller

TheNovice · Posted 09-11-2021 12:00 AM

Hi Paige,

Thank you but that is not the solution i need. That will give a total count of the ID. I only need a count of the ID where memo_date is not blank

I am trying to show how much of the initial count resolved by subsequent weeks.

SASKiwi · Posted 09-11-2021 12:33 AM

You have yet to fully describe your problem. For example, what definition of week do you want to use?

SAS provides at least three different definitions here.

Also you say you have two years of data. That will mean at least 104 weekly columns in wide format. What happens at year boundaries?

TheNovice · Posted 09-11-2021 12:56 AM

I am using the week function. Using the latest suggestion from Kurt my output looks like this. It's almost what i need. the ID count is the total count but the count in the memo_week column needs to be for ID where the memo_Date is not null

There are x number of Id's that hit status B in a given week and then some move to status M in subsequent weeks. I am trying to show that distribution by week against the initial total. Not all Id's move into status M

	n
	n	MEMO_WEEK
STATUS_WEEK	id	0	1	2	3	4	5	6	7	8	9
0	56954	1004	33579	9354	5128	2923	1353	734	826	460	302
1	268372	.	82860	93011	37385	19232	12206	4428	4373	3376	2145
2	199854	.	.	50941	80656	26287	14966	8869	3139	3455	2835
3	222380	.	.	.	62169	82358	30440	14696	10530	4194	4085
4	257609	.	.	.	.	49375	114797	34623	19441	14087	5197

PaigeMiller · Posted 09-11-2021 02:29 PM

@TheNovice wrote:

that is not the solution i need. That will give a total count of the ID. I only need a count of the ID where memo_date is not blank

I am trying to show how much of the initial count resolved by subsequent weeks.

As far as I know, the PROC FREQ code I showed in my 2nd reply does exactly this. When either variable is missing, that record is not used in the table.

If you disagree, produce a small example and show us why the PROC FREQ code does not give you what you want; and make clear exactly what you do want

--
Paige Miller

Kurt_Bremser · Posted 09-09-2021 02:34 AM

A simple PROC REPORT:

proc report data=have;
column status_week n,memo_week;
define status_week / group;
define memo_week / across;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

TheNovice · Posted 09-11-2021 12:49 AM

Hi Kurt,

Thanks for the response. I modified the query to look like this:

But I need the count(id) in the ID column

count(ID when memo_date is not null) in the memo weeks

Is there a way to apply a where statement just to the memo columns

proc report data=cure2;
column status_week n,idn ,memo_week;
define status_week / group;
define id / n;
define memo_week / across;

run;

SAS Innovate 2025: Call for Content

Classroom Training Available!