BookmarkSubscribeRSS Feed
_MVB_
Obsidian | Level 7

Hi All,

I posted similar couple weeks ago, however these days I need to perform slightly different exercise.

I attached 'have' & 'want' in the excel, please have a look.

This is just a snapshot of what I have - my data is over 10 years with over a hundred million records

 

 

What I have: several years of historical data (month) of accounts with an indicator of a late payment for a respective month and a rate (how many accounts had late payment in a given month). In each month there are unique accounts, such as Aug-10 has 1,270,000 unique accounts, while entire dataset is 33M non-unique accounts. Another point to mention, some account get closed overtime (once it happens, they won't appear on a following month), however new account added. This is why MoM total accounts go up, but not all accounts from previous month would appear on the following month.

This is why in table 'want' I would have 1,647,283 unique accounts (combination of all unique accounts that appear at least 1 time during observation period).

 

What I need is following:

I need to reduce data to unique account level, while not loosing insights of monthly late payment rate and overall payment rate.

 

Until now, I tried two approaches:

1. SurveySelect with STRATA by account and SAMPSIZE=1

2. rand('Uniform') with following sorting and NODUPKEY by account  

In both cases, I could achieve reducing accounts to unique account, however when running frequency for each month and total, rates where not close to rates in 'have' - such as, oldest months came back with rate ~6.5% and earliest with ~3.5%, while total rate ~ 5.2.

 

Can anyone suggest procedure/steps to be taken to handle this through random selection/exclusion to reduce data and keep all insightful information?

 

Thanks in advance!

 

1 REPLY 1
ChrisNZ
Tourmaline | Level 20

Modify your post so you don't use an excel file. This might entice more replies.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 361 views
  • 0 likes
  • 2 in conversation