BookmarkSubscribeRSS Feed
Mirisage
Obsidian | Level 7

Hi Community,

I was given a text file named have_2.txt with 37 variables and over million records (a sample is attached).

I need to read it into SAS.

Some variable names of have_2.txt file were having two or more standing alone words. So, I have manually changed those names by putting an “underscore” in between like below.

Originally

I changed

Loan number

Loan_number

------

----

Days Star Onn (oldest Amb)

Days_Star_Onn_oldest_Amb

The revised file is named as have_3.txt (attached).

Q1: Does this manual name changes in text file affect my final SAS data file

Q2: To read the data into SAS, I have tried below code but honestly not having full knowledge about what is happening internally.

This code reads the data into SAS but values of some variables are truncated.

I would really appreciate it if anyone of you could help me to develop the most accurate code.

libname a "C:\MNS\";

Data a.test;

LENGTH      

Categor $ 18 /*length of this variable was set at $18 by trial and error, otherwise values under this variable are truncated*/

ABBRV_name  $ 41; /*length of this variable was set at $41 by trial and error, otherwise values under this variable are truncated*/

  Infile "C:\MNS\have_3.txt" firstobs=2 dlm="," dsd LRECL=4096 truncover TERMSTR=CRLF;

  Informat Account_date ddmmyy10.    Date_cancel  ddmmyy10. Date_Of_Firs_Deposit ddmmyy10.;

  Input

Loan_number

Account_date

ABCDEF_number

Current_number $

ABBRV_name $

Notifcation_type $

Regulati_code

Office_code $

XXXX_code $

Seconda_Nameser_code $

Country_Name $

AXZREST_code

Product_Code $

Localnal_nominclacation_code $

Date_cancel

Date_Of_Firs_Deposit

Days_Star_Onn_oldest_Amb

Officerv_longe_name $

Closing_balance $

Locatio $

Station $

XYZ_Limit

XYZ_Balance

Exitt

Outstan

Categor $

Rurald $

Dividee_DD_Names $

Opennq

Regional_Grade_Numb $

Delinquency_Band $

Fiscal_Quarter $

Fiscal_Year

Krrears_Sand $

BpplicationNumber

SkkkkUsed

Scccc_Band $

;

  Format Account_date date9.  Date_cancel date9. Date_Of_Firs_Deposit date9.;

run;

  

  Mirisage

7 REPLIES 7
Ksharp
Super User

Oh, It is CSV file.You can use proc import to get it conveniently.

Then change some variables into DATE .

proc import datafile='c:\temp\have_2.txt' out=have dbms=csv replace;
 getnames=yes;run;

Ksharp

shivas
Pyrite | Level 9

Hi,

You can directly import the csv file using SAS Import wizard,and you can also see the code what sas is using internally in the log file or once import is done if you press F4 you will get the complete code what SAS has used internally in your editor.

Hope it helps...

    data WORK.ss                                      ;

     infile 'F:\Community_SAS\have_3.txt' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

       informat Loan_number best32. ;

       informat Account_date $19. ;

       informat ABCDEF_number best32. ;

       informat Current_number $12. ;

       informat ABBRV_name $16. ;

       informat Notifcation_type $4. ;

       informat Regulati_code best32. ;

       informat Office_code $4. ;

       informat XXXX_code $5. ;

       informat Seconda_Nameser_code $5. ;

       informat Country_Name $19. ;

       informat AXZREST_code best32. ;

       informat Product_Code $5. ;

       informat Localnal_nominclacation_code $3. ;

       informat Date_cancel anydtdtm40. ;

       informat Date_Of_Firs_Deposit anydtdtm40. ;

       informat Days_Star_Onn_oldest_Amb best32. ;

       informat Officerv_longe_name $5. ;

       informat Closing_balance comma32. ;

       informat Locatio $10. ;

       informat Station $8. ;

       informat XYZ_Limit best32. ;

       informat XYZ_Balance best32. ;

       informat Exitt best32. ;

       informat Outstan best32. ;

       informat Categor $16. ;

       informat Rurald $33. ;

       informat Dividee_DD_Names $11. ;

       informat Opennq best32. ;

       informat Regional_Grade_Numb $10. ;

       informat Delinquency_Band $9. ;

       informat Fiscal_Quarter $9. ;

       informat Fiscal_Year best32. ;

       informat Krrears_Sand $9. ;

       informat BpplicationNumber $1. ;

       informat SkkkkUsed $1. ;

       informat Scccc_Band $1. ;

       format Loan_number best12. ;

       format Account_date $19. ;

       format ABCDEF_number best12. ;

       format Current_number $12. ;

       format ABBRV_name $16. ;

       format Notifcation_type $4. ;

       format Regulati_code best12. ;

       format Office_code $4. ;

       format XXXX_code $5. ;

       format Seconda_Nameser_code $5. ;

       format Country_Name $19. ;

       format AXZREST_code best12. ;

       format Product_Code $5. ;

       format Localnal_nominclacation_code $3. ;

       format Date_cancel datetime. ;

       format Date_Of_Firs_Deposit datetime. ;

       format Days_Star_Onn_oldest_Amb best12. ;

       format Officerv_longe_name $5. ;

       format Closing_balance comma12. ;

       format Locatio $10. ;

       format Station $8. ;

       format XYZ_Limit best12. ;

       format XYZ_Balance best12. ;

       format Exitt best12. ;

       format Outstan best12. ;

       format Categor $16. ;

       format Rurald $33. ;

       format Dividee_DD_Names $11. ;

       format Opennq best12. ;

       format Regional_Grade_Numb $10. ;

       format Delinquency_Band $9. ;

       format Fiscal_Quarter $9. ;

       format Fiscal_Year best12. ;

       format Krrears_Sand $9. ;

       format BpplicationNumber $1. ;

       format SkkkkUsed $1. ;

       format Scccc_Band $1. ;

    input

                Loan_number

                Account_date $

                ABCDEF_number

                Current_number $

                ABBRV_name $

                Notifcation_type $

                Regulati_code

                Office_code $

                XXXX_code $

                Seconda_Nameser_code $

                Country_Name $

                AXZREST_code

                Product_Code $

                Localnal_nominclacation_code $

                Date_cancel

                Date_Of_Firs_Deposit

                Days_Star_Onn_oldest_Amb

                Officerv_longe_name $

                Closing_balance

                Locatio $

                Station $

                XYZ_Limit

                XYZ_Balance

                Exitt

                Outstan

                Categor $

                Rurald $

                Dividee_DD_Names $

                Opennq

                Regional_Grade_Numb $

                Delinquency_Band $

                Fiscal_Quarter $

                Fiscal_Year

                Krrears_Sand $

                BpplicationNumber $

                SkkkkUsed $

                Scccc_Band $

    ;

    if _ERROR_ then call symputx('_EFIERR_',1); 

    run;

Thanks,

Shiva

ballardw
Super User

The onlly affect your row header changes would have would be on the names of variables if using PROC IMPORT to generate code to read the data.

You get truncated values because PROC IMPORT, or the code generated there by or that you write is either not allowing enough characters $ xx or BESTXX., or in some cases they shouldn't be numeric variables but strings as they exceed the length of numeric precision for your OS.

I would read the data and then do a proc freq (possibly limiting the the number of observations to some extent) excluding unique variables such as IDs if such exist to see what the I have. In the case of truncated fields make the informat larger and in some cases change from numeric to string.

At which point I go through the code created by proc import an look for things that may not work the way I want or expect.

Such as should loan_number, ABCDEF_number, current_number and such actually be numeric (am i going to do arithmatic with them? probably not numeric, are they longer than the number of significant digits my system supports- probably not numeric).

If this read code is going to be used more than once I would look to get varaible names that are somewhat more meaningful and assign labels. For instance you have a variable "Locatio" which makes me think that it would likely be referring to a Location which probably makes a better variable name as whole words do when possible. But what "location" does it refer to: Mortgage property, branch office, business address? Use a label to make the use of these variables easier in the future.

You also have a variable  Localnal_nominclacation_code which makes me think a search and replace was done on the row of variable names and you got some changes you did not expect or want.

Typically at this stage I remove FORMATS except for date, datetime or time variables as they will default to the format equivalent of the informat mostly.

Mirisage
Obsidian | Level 7

Hi Ksharp, Shivas and Ballardw,

Thank all of you for these excellent help.

I have used both Ksharp’s and Shivas’s approaches and they produce same results except the fact  Shivas’s appraoch for importing using proc import wizard produce variable names as standing alone words for variables having more than one word.

Could you also help in these 3 specific issues:
1). I wanted to convert two date variables named Date_cancel and Date_Of_Firs_Deposit into date9. format. I created the code very below but it doesn’t do the job
2). The variables named Account_date, Dividee_DD_Names were imported as character variables but I need to convert them also into a date9. format. Could help me to code this?
3). Isn’t there any way to avoid a further data step and incorporate the date conversions into Ksharp’s proc import code?

data have_date_revised;

set have;

Informat  Date_cancel ddmmyy10. Date_Of_Firs_Deposit ddmmyy10.;

Format  Date_cancel date9. Date_Of_Firs_Deposit date9.;

run;

I really appreciate your help.

Mirisage

ballardw
Super User

For 1) You would be better off modifying the FORMAT statement in the original program, if using a datastep and not basic proc import. When you say what you tried didn't do the job how did it not do the job? Failure, syntax warning, unexpected date values when printing?

For 2) try changing the INFORMAT to ANYDTDTE10. and remove the $ in the INPUT statement following the variables. Assign your format DATE9. as well.

For 3) Not in proc import. That's why I usually run proc import and then pull the generated code from the LOG to modify.

Ksharp
Super User

Unfortuantely, proc import doesn't offer us a more manipulation .

I would like to use the third item proposed by ballardw, just like Shivas did.

Mirisage
Obsidian | Level 7

I got it done using your clues, ha ha.

Big thanks to Ballardw, Ksharp and  Shivas again!

Best regards

Mirisage

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1147 views
  • 0 likes
  • 4 in conversation