BookmarkSubscribeRSS Feed
_MVB_
Obsidian | Level 7

Hi All,

I posted similar couple weeks ago, however these days I need to perform slightly different exercise.

I attached 'have' & 'want' in the excel, please have a look.

This is just a snapshot of what I have - my data is over 10 years with over a hundred million records

 

 

What I have: several years of historical data (month) of accounts with an indicator of a late payment for a respective month and a rate (how many accounts had late payment in a given month). In each month there are unique accounts, such as Aug-10 has 1,270,000 unique accounts, while entire dataset is 33M non-unique accounts. Another point to mention, some account get closed overtime (once it happens, they won't appear on a following month), however new account added. This is why MoM total accounts go up, but not all accounts from previous month would appear on the following month.

This is why in table 'want' I would have 1,647,283 unique accounts (combination of all unique accounts that appear at least 1 time during observation period).

 

What I need is following:

I need to reduce data to unique account level, while not loosing insights of monthly late payment rate and overall payment rate.

 

Until now, I tried two approaches:

1. SurveySelect with STRATA by account and SAMPSIZE=1

2. rand('Uniform') with following sorting and NODUPKEY by account  

In both cases, I could achieve reducing accounts to unique account, however when running frequency for each month and total, rates where not close to rates in 'have' - such as, oldest months came back with rate ~6.5% and earliest with ~3.5%, while total rate ~ 5.2.

 

Can anyone suggest procedure/steps to be taken to handle this through random selection/exclusion to reduce data and keep all insightful information?

 

Thanks in advance!

 

1 REPLY 1
ChrisNZ
Tourmaline | Level 20

Modify your post so you don't use an excel file. This might entice more replies.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 455 views
  • 0 likes
  • 2 in conversation