Hi SAS Community,
I have a different data set (attached).
I have got the sum of variables named “principal” and “write_off_total” by account_number using below sql code.
(PGStats and SGB have improved this code).
proc sql;
create table want as
select
min(bank_number) as bank_number ,
left((put(min (ACCOUNT_NUMBER),7.))) as ACCOUNT_NUMBER ,
min(country) as country ,
left((put(min(branch_number),15.))) as branch_number ,
min(currency_code) as currency_code ,
max (POST_DATE) as POST_DATE Format=date9. ,
max (EFFECTIVE_DATE) as EFFECTIVE_DATE Format=date9. ,
sum (PRINCIPAL) as sum_PRINCIPAL ,
sum (Write_off_Total) as net_Write_off_Total
from a.post_this_data
group by account_number
;
quit;
Problems:
i) Variable “ACCOUNT_NUMBER” in the created data set “want” has awkward values (e.g. 7.59E10)
ii) Variable “branch_number” seemingly have more space than necessary.
Question:
i) Is there any method to decide the optimum length of these variables (or can we decide optimum length of any variable in a given data set at once – without trail and error)?
ii) I need “Post_date” and “effective_date” variables in date9. format but code doesn’t give it?
Your help is highly appreciated.
Thanks
Mirisiage
Optimum for file size or readability? Would using best32. for those variables suffice?
If you are not doing arithmetic with a "number" such as account number it probably should not be a numeric but should be a string. Then you won't get the supprises when you try to format a long number into a shorter string. You have an account number with 10 significant digits and you tried to make it fit into 7.
Similar, your branch number probably shouldn't be numeric either. Then the "problem" related to those goes back to whoever designs the original table to make sure they are big enough. But if you have codes that vary from 1234567890 to 12 you are going to have excess space for some.
What do you get for your date values? Hard to trouble shoot without examples.
Hi Art297,
Thank you very much.
I have replaced the lengths with best32., it worked. No more awkward values for “ACCOUNT_NUMBER” in the created data set “want”.
BTW, what is the difference between optimum for file size and readability?
proc sql;
create table want as
select
min(bank_number) as bank_number ,
left((put(min (ACCOUNT_NUMBER),best32.))) as ACCOUNT_NUMBER ,
min(country) as country ,
left((put(min(branch_number),best32.))) as branch_number ,
min(currency_code) as currency_code ,
max (POST_DATE) as POST_DATE Format=date9. ,
max (EFFECTIVE_DATE) as EFFECTIVE_DATE Format=date9. ,
sum (PRINCIPAL) as sum_PRINCIPAL ,
sum (Write_off_Total) as net_Write_off_Total
from a.post_this_data
group by account_number
;
quit;
Hi Ballardw,
When I ran the above code on the attached data set, the values I get for two "date-related" variables are like below. How could I get date9. format?
post_date effective_date
********* ***********
********* ***********
********* ***********
Thanks
Mirisage
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.