BookmarkSubscribeRSS Feed
TheNovice
Quartz | Level 8

I have a daily transactional data for 2 years and I need to have a summary view like below. The status code is always "B" and the Memo code is always "M"

Status represents initial status of the ID. Memo represents final status of the ID

 

Data looks like: 

 

IDStatusStatus DateMemoMemo_DateStatus WeekMemo_Week
1B01-JanM01-Jan                    1                      1
2B01-JanM02-Jan                    1                      1
3B01-JanM02-Jan                    1                      1
4B01-JanM10-Jan                    1                      2
5B02-JanM10-Jan                    1                      2
6B02-JanM14-Jan                    1                      3
7B02-Jan                      1 
8B02-JanM03-Jan                    1                      1
9B10-Jan                      2 
10B10-JanM14-Jan                    2                      3

 

What i need to do is summarize like this below. the intent is to show the % of status B from a given week that move to status B in week 1,2,3 etc. 

 

 ID countMemo_Week_1Memo_Week_2Memo_Week_3
 1 8322
  2 2001

 

What i am thinking of doing is writing a proc sql like below

 

select status_week, count(ID) as ID_ count, case when (memo_week = 1 then ID end) as Memo_Week_1,

case when (memo_week = 2 then ID end) as Memo_Week_2

 

and so forth. But this mean i have to hardcode all 52 weeks. There has to be a more efficient way to do this?

 

Thanks,

 

11 REPLIES 11
PaigeMiller
Diamond | Level 26

If you had actual years (01-Jan-2021) along with day and month, then you can use a Week format, and everything is simple. And make your output vertical (LONG) rather than horizontal (WIDE).

 

If you absolutely don't have years, the easiest thing to do is still to add a year, but really, you should have years as part of any date.

 

Example:

 

data fakedata;
     input id status_date :date9. memo_date :date9.;
     format status_date memo_date weeku6.;
     cards;
 1 01jan2021 01jan2021
 2 09jan2021 13jan2021
 3 04jan2021 01feb2021
 4 16jan2021 03feb2021
 ;
 
 proc freq data=fakedata;
     tables status_date memo_date/nocum nopercent;
 run;

 

 

 

 

--
Paige Miller
TheNovice
Quartz | Level 8

Thank you Paige,

 

Could you explain how the solution will lead me to the desired output? The transactional data is about 25 million records for 2 years.

 

 Total ID countMemo_Week_1Memo_Week_2Memo_Week_3
Status week1 8322
Status week 2 2001
PaigeMiller
Diamond | Level 26

Do you really want 52 columns in the output?

 

The code I gave was an example of using week formats, it wasn't designed to produce the exact output, it was designed to hopefully get you started.

 

You could probably use PROC FREQ to get closer to your desired output.

 

proc freq data=have;
    tables status_date*memo_date/nocol norow nopercent;
    format status_date memo_date weeku6.;
run;
--
Paige Miller
TheNovice
Quartz | Level 8
Hi Paige, yes I would need 52 columns. I can't think of another way to correlate the initial status volume to the change in status through the weeks. The proc freq would only get me there partly... It is still missing the total ID count for each status week.

So, I am trying this, but was wondering if there is a faster method to get to the same kind of output

PROC SQL;
CREATE TABLE TT2 AS
SELECT Year,Status_WEEK,COUNT(ID) AS TOTAL,
CASE WHEN (MEMO_WEEK = 1 AND MEMO_TYPE = 'M' THEN ID END) AS MEMO_WEEK_1,
CASE WHEN (MEMO_WEEK = 2 AND MEMO_TYPE = 'M' THEN ID END) AS MEMO_WEEK_2,
CASE WHEN (MEMO_WEEK = 3 AND MEMO_TYPE = 'M' THEN ID END) AS MEMO_WEEK_3,
... AND SO ON FOR ALL 52 WEEKS
from tt1
group by Year,Status_week;
quit;
PaigeMiller
Diamond | Level 26

@TheNovice wrote:
Hi Paige, yes I would need 52 columns. I can't think of another way to correlate the initial status volume to the change in status through the weeks. The proc freq would only get me there partly... It is still missing the total ID count for each status week.

Remove the NOROW option in PROC FREQ.

--
Paige Miller
TheNovice
Quartz | Level 8

Hi Paige,

 

Thank you but that is not the solution i need. That will give a total count of the ID. I only need a count of the ID where memo_date is not blank

 

I am trying to show how much of the initial count resolved by subsequent weeks. 

SASKiwi
PROC Star

You have yet to fully describe your problem. For example, what definition of week do you want to use?

 

SAS provides at least three different definitions here.

 

Also you say you have two years of data. That will mean at least 104 weekly columns in wide format. What happens at year boundaries?

TheNovice
Quartz | Level 8

I am using the week function. Using the latest suggestion from Kurt my output looks like this. It's almost what i need. the ID count is the total count but the count in the memo_week column needs to be for ID where the memo_Date is not null

 

There are x number of Id's that hit status B in a given week and then some move to status M in subsequent weeks. I am trying to show that distribution by week against the initial total. Not all Id's move into status M 

 

 n         
 nMEMO_WEEK         
STATUS_WEEKid0123456789
0569541004335799354512829231353734826460302
1268372.82860930113738519232122064428437333762145
2199854..509418065626287149668869313934552835
3222380...621698235830440146961053041944085
4257609....493751147973462319441140875197
PaigeMiller
Diamond | Level 26

@TheNovice wrote:

 

that is not the solution i need. That will give a total count of the ID. I only need a count of the ID where memo_date is not blank

 

I am trying to show how much of the initial count resolved by subsequent weeks. 


As far as I know, the PROC FREQ code I showed in my 2nd reply does exactly this. When either variable is missing, that record is not used in the table.


If you disagree, produce a small example and show us why the PROC FREQ code does not give you what you want; and make clear exactly what you do want

--
Paige Miller
TheNovice
Quartz | Level 8

Hi Kurt,

 

Thanks for the response. I modified the query to look like this:

 

But I need the count(id) in the ID column

count(ID when memo_date is not null) in the memo weeks 

 

Is there a way to apply a where statement just to the memo columns 

 

proc report data=cure2;
column status_week n,idn ,memo_week;
define status_week / group;
define id / n;
define memo_week / across;

run;

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 942 views
  • 0 likes
  • 4 in conversation