BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I'm having a lot of difficulty reading a .csv file into SAS. The file is large (~900,000 observations) and has an ID column that is a 14 digit number. When I read it into SAS, it comes in as a number in scientific notation with length 8 (i.e., I lose the last part of the ID number). I tried using Excel to change the format of the column, but my version of SAS won't read a .xlsx file and .xls files are limited to 65K observations. I really don't know what to try next. Any suggestions?

Thanks!
16 REPLIES 16
RickM
Fluorite | Level 6
Even though ID is a number it is a good idea to read it in as a chacter, unless Im missing something else about your process.

Good luck!
deleted_user
Not applicable
I agree. My problem is that SAS automatically reads it in as a number. Not sure how to get around that.
RickM
Fluorite | Level 6
What is the code that you are using to read in the csv? It is hard to give any other suggestions without it.
deleted_user
Not applicable
I'm just using the Import Data function under the File menu. It uses the following code:

PROC IMPORT OUT= WORK.temp
DATAFILE= "E:\f55.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
RickM
Fluorite | Level 6
When you run proc import it should generate a data step in the log. You could copy and paste this into your program and change the informat there. Check the documentation for more info.
deleted_user
Not applicable
I did that (trying several different informat lines), but each time I get an error saying I'm trying to read in a numeric field as a character field and then SAS overrules the command and reads it as an 8 digit number in scientific notation.
RickM
Fluorite | Level 6
Are you also adding $ to the input statement as well? Also both the informat and format statements?
deleted_user
Not applicable
Yes. Here are the relevant lines of the code as I modified them:

informat ID $14. ;
format ID $14. ;
input
ID $;

Even with that, it reads in (for example)
84037175123753
as
8.4037175E13
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
What you are suggesting with the "relevant lines of the code as I modified them" doesn't make sense from the SAS processing perspective -- the key statements you have is with the INPUT and INFORMAT statement and how you have defined your variable -- which will be a CHARACTER type variable, not numeric.

There's something you're not sharing here....Suggest you post a reply with the exact SAS-generated log for a more useful reply and guidance.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
I apologize for not being clear. I was trying to simplify things, which apparently had the opposite effect.

I used the Import Data procedure from the File menu. That generated the log pasted at the end of this message. The problem variable is FILING_ID, which SAS reads in as a numeric field rather than a character field. I then created a program using the code embedded in the log. In doing so, I modified the three lines dealing with FILING_ID (lines 853, 937, and 1022 below) to read as follows.

informat FILING_ID $14. ;

format FILING_ID $14. ;

FILING_ID $

I then ran that program to read in the data. The imported dataset looks fine except that FILING_ID was again read in as a numeric field.

Here's the log:


842 /**********************************************************************
843 * PRODUCT: SAS
844 * VERSION: 9.1
845 * CREATOR: External File Interface
846 * DATE: 04AUG10
847 * DESC: Generated SAS Datastep Code
848 * TEMPLATE SOURCE: (None Specified.)
849 ***********************************************************************/
850 data WORK.TEMP ;
851 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
852 infile 'E:\Pension\Full Data\f_5500_2003.csv' delimiter = ',' MISSOVER DSD lrecl=32767
852! firstobs=2 ;
853 informat FILING_ID best32. ;
854 informat SPONS_DFE_EIN $11. ;
855 informat SPONS_DFE_PN $5. ;
856 informat FORM_PLAN_YEAR_BEGIN_DATE $10. ;
857 informat FORM_TAX_PRD best32. ;
858 informat TYPE_PLAN_ENTITY_IND $3. ;
859 informat TYPE_DFE_PLAN_ENTITY $1. ;
860 informat TYPE_PLAN_FILING_IND $5. ;
861 informat COLLECTIVE_BARGAIN_IND $1. ;
862 informat EXT_APPLICATION_FILED_IND $1. ;
863 informat PLAN_NAME $80. ;
864 informat PLAN_EFF_DATE $10. ;
865 informat SPONSOR_DFE_NAME $48. ;
866 informat SPONS_DFE_DBA_NAME $1. ;
867 informat SPONS_DFE_CARE_OF_NAME $1. ;
868 informat SPONS_DFE_MAIL_STR_ADDRESS $30. ;
869 informat SPONS_DFE_LOC_ADDR $24. ;
870 informat SPONS_DFE_FOREIGN_ROUTE_CD $1. ;
871 informat SP_DFE_FGN_MAIL_CNTRY $1. ;
872 informat SPONS_DFE_CITY $20. ;
873 informat SPONS_DFE_STATE $4. ;
874 informat SPONS_DFE_ZIP_CODE $11. ;
875 informat SPONS_DFE_PHONE_NUM $12. ;
876 informat BUSINESS_CODE $8. ;
877 informat ADMIN_NAME $49. ;
878 informat ADMIN_CARE_OF_NAME $1. ;
879 informat ADMIN_STR_ADDRESS $42. ;
880 informat ADMIN_FOREIGN_ROUTE_CD $1. ;
881 informat ADMIN_FOREIGN_MAILING_CNTRY $1. ;
882 informat ADMIN_CITY $12. ;
883 informat ADMIN_STATE $5. ;
884 informat ADMIN_ZIP_CODE $14. ;
885 informat ADMIN_EIN $14. ;
886 informat ADMIN_PHONE_NUM $16. ;
887 informat LAST_RPT_SPONS_NAME $40. ;
888 informat LAST_RPT_SPONS_EIN $14. ;
889 informat LAST_RPT_PLAN_NUM $7. ;
890 informat PREPARER_NAME $74. ;
891 informat PREPARER_STR_ADDRESS $33. ;
892 informat PREPARER_FOREIGN_ROUTE_CD $1. ;
893 informat PREPARER_FRGN_MAILING_CNTRY $1. ;
894 informat PREPARER_CITY $22. ;
895 informat PREPARER_STATE $5. ;
896 informat PREPARER_ZIP_CODE $9. ;
897 informat PREPARER_EIN $14. ;
898 informat PREPARER_PHONE_NUM $16. ;
899 informat ADMIN_SIGNATURE_IND $3. ;
900 informat ADMIN_SIGNED_DATE $10. ;
901 informat ADMIN_SIGNED_NAME $39. ;
902 informat SPONS_SIGNATURE_IND $3. ;
903 informat SPONS_SIGNED_DATE $13. ;
904 informat SPONS_SIGNED_NAME $43. ;
905 informat TOT_PARTCP_BOY_CNT best32. ;
906 informat TOT_ACTIVE_PARTCP_CNT best32. ;
907 informat RTD_SEP_PARTCP_RCVG_CNT best32. ;
908 informat RTD_SEP_PARTCP_FUT_CNT best32. ;
909 informat SUBTL_ACT_RTD_SEP_CNT best32. ;
910 informat BENEF_RCVG_BNFT_CNT best32. ;
911 informat TOT_ACT_RTD_SEP_BENEF_CNT best32. ;
912 informat PARTCP_ACCOUNT_BAL_CNT best32. ;
913 informat SEP_PARTCP_PARTL_VSTD_CNT best32. ;
914 informat SSA_FILER_PARTCP_CNT best32. ;
915 informat PENSION_BENEFIT_PLAN_IND $3. ;
916 informat TYPE_PENSION_BNFT_CODE $17. ;
917 informat WELFARE_BENEFIT_PLAN_IND $4. ;
918 informat TYPE_WELFARE_BNFT_CODE $5. ;
919 informat FUNDING_ARRANGEMENT_CODE $5. ;
920 informat BENEFIT_CODE $3. ;
921 informat SCH_R_ATTACHED_IND $3. ;
922 informat SCH_T_ATTACHED_IND $3. ;
923 informat NUM_SCH_T_ATTACHED_CNT best32. ;
924 informat SCH_T_PNDG_INFO_PRIOR_YR_DATE $1. ;
925 informat SCH_B_ATTACHED_IND $1. ;
926 informat SCH_E_ATTACHED_IND $1. ;
927 informat SCH_SSA_ATTACHED_IND $1. ;
928 informat SCH_H_ATTACHED_IND $1. ;
929 informat SCH_I_ATTACHED_IND $3. ;
930 informat SCH_A_ATTACHED_IND $4. ;
931 informat NUM_SCH_A_ATTACHED_CNT best32. ;
932 informat SCH_C_ATTACHED_IND $1. ;
933 informat SCH_D_ATTACHED_IND $4. ;
934 informat SCH_G_ATTACHED_IND $1. ;
935 informat SCH_P_ATTACHED_IND $3. ;
936 informat NUM_SCH_P_ATTACHED_CNT best32. ;
937 format FILING_ID best12. ;
938 format SPONS_DFE_EIN $11. ;
939 format SPONS_DFE_PN $5. ;
940 format FORM_PLAN_YEAR_BEGIN_DATE $10. ;
941 format FORM_TAX_PRD best12. ;
942 format TYPE_PLAN_ENTITY_IND $3. ;
943 format TYPE_DFE_PLAN_ENTITY $1. ;
944 format TYPE_PLAN_FILING_IND $5. ;
945 format COLLECTIVE_BARGAIN_IND $1. ;
946 format EXT_APPLICATION_FILED_IND $1. ;
947 format PLAN_NAME $80. ;
948 format PLAN_EFF_DATE $10. ;
949 format SPONSOR_DFE_NAME $48. ;
950 format SPONS_DFE_DBA_NAME $1. ;
951 format SPONS_DFE_CARE_OF_NAME $1. ;
952 format SPONS_DFE_MAIL_STR_ADDRESS $30. ;
953 format SPONS_DFE_LOC_ADDR $24. ;
954 format SPONS_DFE_FOREIGN_ROUTE_CD $1. ;
955 format SP_DFE_FGN_MAIL_CNTRY $1. ;
956 format SPONS_DFE_CITY $20. ;
957 format SPONS_DFE_STATE $4. ;
958 format SPONS_DFE_ZIP_CODE $11. ;
959 format SPONS_DFE_PHONE_NUM $12. ;
960 format BUSINESS_CODE $8. ;
961 format ADMIN_NAME $49. ;
962 format ADMIN_CARE_OF_NAME $1. ;
963 format ADMIN_STR_ADDRESS $42. ;
964 format ADMIN_FOREIGN_ROUTE_CD $1. ;
965 format ADMIN_FOREIGN_MAILING_CNTRY $1. ;
966 format ADMIN_CITY $12. ;
967 format ADMIN_STATE $5. ;
968 format ADMIN_ZIP_CODE $14. ;
969 format ADMIN_EIN $14. ;
970 format ADMIN_PHONE_NUM $16. ;
971 format LAST_RPT_SPONS_NAME $40. ;
972 format LAST_RPT_SPONS_EIN $14. ;
973 format LAST_RPT_PLAN_NUM $7. ;
974 format PREPARER_NAME $74. ;
975 format PREPARER_STR_ADDRESS $33. ;
976 format PREPARER_FOREIGN_ROUTE_CD $1. ;
977 format PREPARER_FRGN_MAILING_CNTRY $1. ;
978 format PREPARER_CITY $22. ;
979 format PREPARER_STATE $5. ;
980 format PREPARER_ZIP_CODE $9. ;
981 format PREPARER_EIN $14. ;
982 format PREPARER_PHONE_NUM $16. ;
983 format ADMIN_SIGNATURE_IND $3. ;
984 format ADMIN_SIGNED_DATE $10. ;
985 format ADMIN_SIGNED_NAME $39. ;
986 format SPONS_SIGNATURE_IND $3. ;
987 format SPONS_SIGNED_DATE $13. ;
988 format SPONS_SIGNED_NAME $43. ;
989 format TOT_PARTCP_BOY_CNT best12. ;
990 format TOT_ACTIVE_PARTCP_CNT best12. ;
991 format RTD_SEP_PARTCP_RCVG_CNT best12. ;
992 format RTD_SEP_PARTCP_FUT_CNT best12. ;
993 format SUBTL_ACT_RTD_SEP_CNT best12. ;
994 format BENEF_RCVG_BNFT_CNT best12. ;
995 format TOT_ACT_RTD_SEP_BENEF_CNT best12. ;
996 format PARTCP_ACCOUNT_BAL_CNT best12. ;
997 format SEP_PARTCP_PARTL_VSTD_CNT best12. ;
998 format SSA_FILER_PARTCP_CNT best12. ;
999 format PENSION_BENEFIT_PLAN_IND $3. ;
1000 format TYPE_PENSION_BNFT_CODE $17. ;
1001 format WELFARE_BENEFIT_PLAN_IND $4. ;
1002 format TYPE_WELFARE_BNFT_CODE $5. ;
1003 format FUNDING_ARRANGEMENT_CODE $5. ;
1004 format BENEFIT_CODE $3. ;
1005 format SCH_R_ATTACHED_IND $3. ;
1006 format SCH_T_ATTACHED_IND $3. ;
1007 format NUM_SCH_T_ATTACHED_CNT best12. ;
1008 format SCH_T_PNDG_INFO_PRIOR_YR_DATE $1. ;
1009 format SCH_B_ATTACHED_IND $1. ;
1010 format SCH_E_ATTACHED_IND $1. ;
1011 format SCH_SSA_ATTACHED_IND $1. ;
1012 format SCH_H_ATTACHED_IND $1. ;
1013 format SCH_I_ATTACHED_IND $3. ;
1014 format SCH_A_ATTACHED_IND $4. ;
1015 format NUM_SCH_A_ATTACHED_CNT best12. ;
1016 format SCH_C_ATTACHED_IND $1. ;
1017 format SCH_D_ATTACHED_IND $4. ;
1018 format SCH_G_ATTACHED_IND $1. ;
1019 format SCH_P_ATTACHED_IND $3. ;
1020 format NUM_SCH_P_ATTACHED_CNT best12. ;
1021 input
1022 FILING_ID
1023 SPONS_DFE_EIN $
1024 SPONS_DFE_PN $
1025 FORM_PLAN_YEAR_BEGIN_DATE $
1026 FORM_TAX_PRD
1027 TYPE_PLAN_ENTITY_IND $
1028 TYPE_DFE_PLAN_ENTITY $
1029 TYPE_PLAN_FILING_IND $
1030 COLLECTIVE_BARGAIN_IND $
1031 EXT_APPLICATION_FILED_IND $
1032 PLAN_NAME $
1033 PLAN_EFF_DATE $
1034 SPONSOR_DFE_NAME $
1035 SPONS_DFE_DBA_NAME $
1036 SPONS_DFE_CARE_OF_NAME $
1037 SPONS_DFE_MAIL_STR_ADDRESS $
1038 SPONS_DFE_LOC_ADDR $
1039 SPONS_DFE_FOREIGN_ROUTE_CD $
1040 SP_DFE_FGN_MAIL_CNTRY $
1041 SPONS_DFE_CITY $
1042 SPONS_DFE_STATE $
1043 SPONS_DFE_ZIP_CODE $
1044 SPONS_DFE_PHONE_NUM $
1045 BUSINESS_CODE $
1046 ADMIN_NAME $
1047 ADMIN_CARE_OF_NAME $
1048 ADMIN_STR_ADDRESS $
1049 ADMIN_FOREIGN_ROUTE_CD $
1050 ADMIN_FOREIGN_MAILING_CNTRY $
1051 ADMIN_CITY $
1052 ADMIN_STATE $
1053 ADMIN_ZIP_CODE $
1054 ADMIN_EIN $
1055 ADMIN_PHONE_NUM $
1056 LAST_RPT_SPONS_NAME $
1057 LAST_RPT_SPONS_EIN $
1058 LAST_RPT_PLAN_NUM $
1059 PREPARER_NAME $
1060 PREPARER_STR_ADDRESS $
1061 PREPARER_FOREIGN_ROUTE_CD $
1062 PREPARER_FRGN_MAILING_CNTRY $
1063 PREPARER_CITY $
1064 PREPARER_STATE $
1065 PREPARER_ZIP_CODE $
1066 PREPARER_EIN $
1067 PREPARER_PHONE_NUM $
1068 ADMIN_SIGNATURE_IND $
1069 ADMIN_SIGNED_DATE $
1070 ADMIN_SIGNED_NAME $
1071 SPONS_SIGNATURE_IND $
1072 SPONS_SIGNED_DATE $
1073 SPONS_SIGNED_NAME $
1074 TOT_PARTCP_BOY_CNT
1075 TOT_ACTIVE_PARTCP_CNT
1076 RTD_SEP_PARTCP_RCVG_CNT
1077 RTD_SEP_PARTCP_FUT_CNT
1078 SUBTL_ACT_RTD_SEP_CNT
1079 BENEF_RCVG_BNFT_CNT
1080 TOT_ACT_RTD_SEP_BENEF_CNT
1081 PARTCP_ACCOUNT_BAL_CNT
1082 SEP_PARTCP_PARTL_VSTD_CNT
1083 SSA_FILER_PARTCP_CNT
1084 PENSION_BENEFIT_PLAN_IND $
1085 TYPE_PENSION_BNFT_CODE $
1086 WELFARE_BENEFIT_PLAN_IND $
1087 TYPE_WELFARE_BNFT_CODE $
1088 FUNDING_ARRANGEMENT_CODE $
1089 BENEFIT_CODE $
1090 SCH_R_ATTACHED_IND $
1091 SCH_T_ATTACHED_IND $
1092 NUM_SCH_T_ATTACHED_CNT
1093 SCH_T_PNDG_INFO_PRIOR_YR_DATE $
1094 SCH_B_ATTACHED_IND $
1095 SCH_E_ATTACHED_IND $
1096 SCH_SSA_ATTACHED_IND $
1097 SCH_H_ATTACHED_IND $
1098 SCH_I_ATTACHED_IND $
1099 SCH_A_ATTACHED_IND $
1100 NUM_SCH_A_ATTACHED_CNT
1101 SCH_C_ATTACHED_IND $
1102 SCH_D_ATTACHED_IND $
1103 SCH_G_ATTACHED_IND $
1104 SCH_P_ATTACHED_IND $
1105 NUM_SCH_P_ATTACHED_CNT
1106 ;
1107 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
1108 run;
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
You have limited control so I suggest you take what you get and use SAS to convert the numeric value back to a SAS CHARACTER variable, using PUT - see example below;


1 data _null_;
2 filing_id = input('84037175123753',best32.);
3 filing_id_c = put(filing_id,14.);
4 putlog _all_;
5 run;

filing_id=8.4037175E13 filing_id_c=84037175123753 _ERROR_=0 _N_=1
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds


Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
@sbb: The problem is that the field gets read in with 8 significant digits, so the last 6 digits are not there at all when I do the conversion.
data_null__
Jade | Level 19
You do understand that the code you extract from the log and modify becomes your NEW program and you don't use PROC IMPORT again. Right?
deleted_user
Not applicable
@data_null_: Yes, I do understand that.

I do have a workaround in which I pull the data into Excel, insert a character in front of the number, and write the file out in .csv format. SAS reads that fine, but it's a bit cumbersome because I have more than a few files to load. The workaround is find for this project, but I'd like to figure out an easier approach for future reference.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1794 views
  • 0 likes
  • 5 in conversation