Help using Base SAS procedures

data input problem

Reply
N/A
Posts: 0

data input problem

I'm having a lot of difficulty reading a .csv file into SAS. The file is large (~900,000 observations) and has an ID column that is a 14 digit number. When I read it into SAS, it comes in as a number in scientific notation with length 8 (i.e., I lose the last part of the ID number). I tried using Excel to change the format of the column, but my version of SAS won't read a .xlsx file and .xls files are limited to 65K observations. I really don't know what to try next. Any suggestions?

Thanks!
Regular Contributor
Posts: 165

Re: data input problem

Even though ID is a number it is a good idea to read it in as a chacter, unless Im missing something else about your process.

Good luck!
N/A
Posts: 0

Re: data input problem

I agree. My problem is that SAS automatically reads it in as a number. Not sure how to get around that.
Regular Contributor
Posts: 165

Re: data input problem

What is the code that you are using to read in the csv? It is hard to give any other suggestions without it.
N/A
Posts: 0

Re: data input problem

I'm just using the Import Data function under the File menu. It uses the following code:

PROC IMPORT OUT= WORK.temp
DATAFILE= "E:\f55.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
Regular Contributor
Posts: 165

Re: data input problem

When you run proc import it should generate a data step in the log. You could copy and paste this into your program and change the informat there. Check the documentation for more info.
N/A
Posts: 0

Re: data input problem

I did that (trying several different informat lines), but each time I get an error saying I'm trying to read in a numeric field as a character field and then SAS overrules the command and reads it as an 8 digit number in scientific notation.
Regular Contributor
Posts: 165

Re: data input problem

Are you also adding $ to the input statement as well? Also both the informat and format statements?
N/A
Posts: 0

Re: data input problem

Yes. Here are the relevant lines of the code as I modified them:

informat ID $14. ;
format ID $14. ;
input
ID $;

Even with that, it reads in (for example)
84037175123753
as
8.4037175E13
Super Contributor
Super Contributor
Posts: 3,174

Re: data input problem

What you are suggesting with the "relevant lines of the code as I modified them" doesn't make sense from the SAS processing perspective -- the key statements you have is with the INPUT and INFORMAT statement and how you have defined your variable -- which will be a CHARACTER type variable, not numeric.

There's something you're not sharing here....Suggest you post a reply with the exact SAS-generated log for a more useful reply and guidance.

Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: data input problem

I apologize for not being clear. I was trying to simplify things, which apparently had the opposite effect.

I used the Import Data procedure from the File menu. That generated the log pasted at the end of this message. The problem variable is FILING_ID, which SAS reads in as a numeric field rather than a character field. I then created a program using the code embedded in the log. In doing so, I modified the three lines dealing with FILING_ID (lines 853, 937, and 1022 below) to read as follows.

informat FILING_ID $14. ;

format FILING_ID $14. ;

FILING_ID $

I then ran that program to read in the data. The imported dataset looks fine except that FILING_ID was again read in as a numeric field.

Here's the log:


842 /**********************************************************************
843 * PRODUCT: SAS
844 * VERSION: 9.1
845 * CREATOR: External File Interface
846 * DATE: 04AUG10
847 * DESC: Generated SAS Datastep Code
848 * TEMPLATE SOURCE: (None Specified.)
849 ***********************************************************************/
850 data WORK.TEMP ;
851 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
852 infile 'E:\Pension\Full Data\f_5500_2003.csv' delimiter = ',' MISSOVER DSD lrecl=32767
852! firstobs=2 ;
853 informat FILING_ID best32. ;
854 informat SPONS_DFE_EIN $11. ;
855 informat SPONS_DFE_PN $5. ;
856 informat FORM_PLAN_YEAR_BEGIN_DATE $10. ;
857 informat FORM_TAX_PRD best32. ;
858 informat TYPE_PLAN_ENTITY_IND $3. ;
859 informat TYPE_DFE_PLAN_ENTITY $1. ;
860 informat TYPE_PLAN_FILING_IND $5. ;
861 informat COLLECTIVE_BARGAIN_IND $1. ;
862 informat EXT_APPLICATION_FILED_IND $1. ;
863 informat PLAN_NAME $80. ;
864 informat PLAN_EFF_DATE $10. ;
865 informat SPONSOR_DFE_NAME $48. ;
866 informat SPONS_DFE_DBA_NAME $1. ;
867 informat SPONS_DFE_CARE_OF_NAME $1. ;
868 informat SPONS_DFE_MAIL_STR_ADDRESS $30. ;
869 informat SPONS_DFE_LOC_ADDR $24. ;
870 informat SPONS_DFE_FOREIGN_ROUTE_CD $1. ;
871 informat SP_DFE_FGN_MAIL_CNTRY $1. ;
872 informat SPONS_DFE_CITY $20. ;
873 informat SPONS_DFE_STATE $4. ;
874 informat SPONS_DFE_ZIP_CODE $11. ;
875 informat SPONS_DFE_PHONE_NUM $12. ;
876 informat BUSINESS_CODE $8. ;
877 informat ADMIN_NAME $49. ;
878 informat ADMIN_CARE_OF_NAME $1. ;
879 informat ADMIN_STR_ADDRESS $42. ;
880 informat ADMIN_FOREIGN_ROUTE_CD $1. ;
881 informat ADMIN_FOREIGN_MAILING_CNTRY $1. ;
882 informat ADMIN_CITY $12. ;
883 informat ADMIN_STATE $5. ;
884 informat ADMIN_ZIP_CODE $14. ;
885 informat ADMIN_EIN $14. ;
886 informat ADMIN_PHONE_NUM $16. ;
887 informat LAST_RPT_SPONS_NAME $40. ;
888 informat LAST_RPT_SPONS_EIN $14. ;
889 informat LAST_RPT_PLAN_NUM $7. ;
890 informat PREPARER_NAME $74. ;
891 informat PREPARER_STR_ADDRESS $33. ;
892 informat PREPARER_FOREIGN_ROUTE_CD $1. ;
893 informat PREPARER_FRGN_MAILING_CNTRY $1. ;
894 informat PREPARER_CITY $22. ;
895 informat PREPARER_STATE $5. ;
896 informat PREPARER_ZIP_CODE $9. ;
897 informat PREPARER_EIN $14. ;
898 informat PREPARER_PHONE_NUM $16. ;
899 informat ADMIN_SIGNATURE_IND $3. ;
900 informat ADMIN_SIGNED_DATE $10. ;
901 informat ADMIN_SIGNED_NAME $39. ;
902 informat SPONS_SIGNATURE_IND $3. ;
903 informat SPONS_SIGNED_DATE $13. ;
904 informat SPONS_SIGNED_NAME $43. ;
905 informat TOT_PARTCP_BOY_CNT best32. ;
906 informat TOT_ACTIVE_PARTCP_CNT best32. ;
907 informat RTD_SEP_PARTCP_RCVG_CNT best32. ;
908 informat RTD_SEP_PARTCP_FUT_CNT best32. ;
909 informat SUBTL_ACT_RTD_SEP_CNT best32. ;
910 informat BENEF_RCVG_BNFT_CNT best32. ;
911 informat TOT_ACT_RTD_SEP_BENEF_CNT best32. ;
912 informat PARTCP_ACCOUNT_BAL_CNT best32. ;
913 informat SEP_PARTCP_PARTL_VSTD_CNT best32. ;
914 informat SSA_FILER_PARTCP_CNT best32. ;
915 informat PENSION_BENEFIT_PLAN_IND $3. ;
916 informat TYPE_PENSION_BNFT_CODE $17. ;
917 informat WELFARE_BENEFIT_PLAN_IND $4. ;
918 informat TYPE_WELFARE_BNFT_CODE $5. ;
919 informat FUNDING_ARRANGEMENT_CODE $5. ;
920 informat BENEFIT_CODE $3. ;
921 informat SCH_R_ATTACHED_IND $3. ;
922 informat SCH_T_ATTACHED_IND $3. ;
923 informat NUM_SCH_T_ATTACHED_CNT best32. ;
924 informat SCH_T_PNDG_INFO_PRIOR_YR_DATE $1. ;
925 informat SCH_B_ATTACHED_IND $1. ;
926 informat SCH_E_ATTACHED_IND $1. ;
927 informat SCH_SSA_ATTACHED_IND $1. ;
928 informat SCH_H_ATTACHED_IND $1. ;
929 informat SCH_I_ATTACHED_IND $3. ;
930 informat SCH_A_ATTACHED_IND $4. ;
931 informat NUM_SCH_A_ATTACHED_CNT best32. ;
932 informat SCH_C_ATTACHED_IND $1. ;
933 informat SCH_D_ATTACHED_IND $4. ;
934 informat SCH_G_ATTACHED_IND $1. ;
935 informat SCH_P_ATTACHED_IND $3. ;
936 informat NUM_SCH_P_ATTACHED_CNT best32. ;
937 format FILING_ID best12. ;
938 format SPONS_DFE_EIN $11. ;
939 format SPONS_DFE_PN $5. ;
940 format FORM_PLAN_YEAR_BEGIN_DATE $10. ;
941 format FORM_TAX_PRD best12. ;
942 format TYPE_PLAN_ENTITY_IND $3. ;
943 format TYPE_DFE_PLAN_ENTITY $1. ;
944 format TYPE_PLAN_FILING_IND $5. ;
945 format COLLECTIVE_BARGAIN_IND $1. ;
946 format EXT_APPLICATION_FILED_IND $1. ;
947 format PLAN_NAME $80. ;
948 format PLAN_EFF_DATE $10. ;
949 format SPONSOR_DFE_NAME $48. ;
950 format SPONS_DFE_DBA_NAME $1. ;
951 format SPONS_DFE_CARE_OF_NAME $1. ;
952 format SPONS_DFE_MAIL_STR_ADDRESS $30. ;
953 format SPONS_DFE_LOC_ADDR $24. ;
954 format SPONS_DFE_FOREIGN_ROUTE_CD $1. ;
955 format SP_DFE_FGN_MAIL_CNTRY $1. ;
956 format SPONS_DFE_CITY $20. ;
957 format SPONS_DFE_STATE $4. ;
958 format SPONS_DFE_ZIP_CODE $11. ;
959 format SPONS_DFE_PHONE_NUM $12. ;
960 format BUSINESS_CODE $8. ;
961 format ADMIN_NAME $49. ;
962 format ADMIN_CARE_OF_NAME $1. ;
963 format ADMIN_STR_ADDRESS $42. ;
964 format ADMIN_FOREIGN_ROUTE_CD $1. ;
965 format ADMIN_FOREIGN_MAILING_CNTRY $1. ;
966 format ADMIN_CITY $12. ;
967 format ADMIN_STATE $5. ;
968 format ADMIN_ZIP_CODE $14. ;
969 format ADMIN_EIN $14. ;
970 format ADMIN_PHONE_NUM $16. ;
971 format LAST_RPT_SPONS_NAME $40. ;
972 format LAST_RPT_SPONS_EIN $14. ;
973 format LAST_RPT_PLAN_NUM $7. ;
974 format PREPARER_NAME $74. ;
975 format PREPARER_STR_ADDRESS $33. ;
976 format PREPARER_FOREIGN_ROUTE_CD $1. ;
977 format PREPARER_FRGN_MAILING_CNTRY $1. ;
978 format PREPARER_CITY $22. ;
979 format PREPARER_STATE $5. ;
980 format PREPARER_ZIP_CODE $9. ;
981 format PREPARER_EIN $14. ;
982 format PREPARER_PHONE_NUM $16. ;
983 format ADMIN_SIGNATURE_IND $3. ;
984 format ADMIN_SIGNED_DATE $10. ;
985 format ADMIN_SIGNED_NAME $39. ;
986 format SPONS_SIGNATURE_IND $3. ;
987 format SPONS_SIGNED_DATE $13. ;
988 format SPONS_SIGNED_NAME $43. ;
989 format TOT_PARTCP_BOY_CNT best12. ;
990 format TOT_ACTIVE_PARTCP_CNT best12. ;
991 format RTD_SEP_PARTCP_RCVG_CNT best12. ;
992 format RTD_SEP_PARTCP_FUT_CNT best12. ;
993 format SUBTL_ACT_RTD_SEP_CNT best12. ;
994 format BENEF_RCVG_BNFT_CNT best12. ;
995 format TOT_ACT_RTD_SEP_BENEF_CNT best12. ;
996 format PARTCP_ACCOUNT_BAL_CNT best12. ;
997 format SEP_PARTCP_PARTL_VSTD_CNT best12. ;
998 format SSA_FILER_PARTCP_CNT best12. ;
999 format PENSION_BENEFIT_PLAN_IND $3. ;
1000 format TYPE_PENSION_BNFT_CODE $17. ;
1001 format WELFARE_BENEFIT_PLAN_IND $4. ;
1002 format TYPE_WELFARE_BNFT_CODE $5. ;
1003 format FUNDING_ARRANGEMENT_CODE $5. ;
1004 format BENEFIT_CODE $3. ;
1005 format SCH_R_ATTACHED_IND $3. ;
1006 format SCH_T_ATTACHED_IND $3. ;
1007 format NUM_SCH_T_ATTACHED_CNT best12. ;
1008 format SCH_T_PNDG_INFO_PRIOR_YR_DATE $1. ;
1009 format SCH_B_ATTACHED_IND $1. ;
1010 format SCH_E_ATTACHED_IND $1. ;
1011 format SCH_SSA_ATTACHED_IND $1. ;
1012 format SCH_H_ATTACHED_IND $1. ;
1013 format SCH_I_ATTACHED_IND $3. ;
1014 format SCH_A_ATTACHED_IND $4. ;
1015 format NUM_SCH_A_ATTACHED_CNT best12. ;
1016 format SCH_C_ATTACHED_IND $1. ;
1017 format SCH_D_ATTACHED_IND $4. ;
1018 format SCH_G_ATTACHED_IND $1. ;
1019 format SCH_P_ATTACHED_IND $3. ;
1020 format NUM_SCH_P_ATTACHED_CNT best12. ;
1021 input
1022 FILING_ID
1023 SPONS_DFE_EIN $
1024 SPONS_DFE_PN $
1025 FORM_PLAN_YEAR_BEGIN_DATE $
1026 FORM_TAX_PRD
1027 TYPE_PLAN_ENTITY_IND $
1028 TYPE_DFE_PLAN_ENTITY $
1029 TYPE_PLAN_FILING_IND $
1030 COLLECTIVE_BARGAIN_IND $
1031 EXT_APPLICATION_FILED_IND $
1032 PLAN_NAME $
1033 PLAN_EFF_DATE $
1034 SPONSOR_DFE_NAME $
1035 SPONS_DFE_DBA_NAME $
1036 SPONS_DFE_CARE_OF_NAME $
1037 SPONS_DFE_MAIL_STR_ADDRESS $
1038 SPONS_DFE_LOC_ADDR $
1039 SPONS_DFE_FOREIGN_ROUTE_CD $
1040 SP_DFE_FGN_MAIL_CNTRY $
1041 SPONS_DFE_CITY $
1042 SPONS_DFE_STATE $
1043 SPONS_DFE_ZIP_CODE $
1044 SPONS_DFE_PHONE_NUM $
1045 BUSINESS_CODE $
1046 ADMIN_NAME $
1047 ADMIN_CARE_OF_NAME $
1048 ADMIN_STR_ADDRESS $
1049 ADMIN_FOREIGN_ROUTE_CD $
1050 ADMIN_FOREIGN_MAILING_CNTRY $
1051 ADMIN_CITY $
1052 ADMIN_STATE $
1053 ADMIN_ZIP_CODE $
1054 ADMIN_EIN $
1055 ADMIN_PHONE_NUM $
1056 LAST_RPT_SPONS_NAME $
1057 LAST_RPT_SPONS_EIN $
1058 LAST_RPT_PLAN_NUM $
1059 PREPARER_NAME $
1060 PREPARER_STR_ADDRESS $
1061 PREPARER_FOREIGN_ROUTE_CD $
1062 PREPARER_FRGN_MAILING_CNTRY $
1063 PREPARER_CITY $
1064 PREPARER_STATE $
1065 PREPARER_ZIP_CODE $
1066 PREPARER_EIN $
1067 PREPARER_PHONE_NUM $
1068 ADMIN_SIGNATURE_IND $
1069 ADMIN_SIGNED_DATE $
1070 ADMIN_SIGNED_NAME $
1071 SPONS_SIGNATURE_IND $
1072 SPONS_SIGNED_DATE $
1073 SPONS_SIGNED_NAME $
1074 TOT_PARTCP_BOY_CNT
1075 TOT_ACTIVE_PARTCP_CNT
1076 RTD_SEP_PARTCP_RCVG_CNT
1077 RTD_SEP_PARTCP_FUT_CNT
1078 SUBTL_ACT_RTD_SEP_CNT
1079 BENEF_RCVG_BNFT_CNT
1080 TOT_ACT_RTD_SEP_BENEF_CNT
1081 PARTCP_ACCOUNT_BAL_CNT
1082 SEP_PARTCP_PARTL_VSTD_CNT
1083 SSA_FILER_PARTCP_CNT
1084 PENSION_BENEFIT_PLAN_IND $
1085 TYPE_PENSION_BNFT_CODE $
1086 WELFARE_BENEFIT_PLAN_IND $
1087 TYPE_WELFARE_BNFT_CODE $
1088 FUNDING_ARRANGEMENT_CODE $
1089 BENEFIT_CODE $
1090 SCH_R_ATTACHED_IND $
1091 SCH_T_ATTACHED_IND $
1092 NUM_SCH_T_ATTACHED_CNT
1093 SCH_T_PNDG_INFO_PRIOR_YR_DATE $
1094 SCH_B_ATTACHED_IND $
1095 SCH_E_ATTACHED_IND $
1096 SCH_SSA_ATTACHED_IND $
1097 SCH_H_ATTACHED_IND $
1098 SCH_I_ATTACHED_IND $
1099 SCH_A_ATTACHED_IND $
1100 NUM_SCH_A_ATTACHED_CNT
1101 SCH_C_ATTACHED_IND $
1102 SCH_D_ATTACHED_IND $
1103 SCH_G_ATTACHED_IND $
1104 SCH_P_ATTACHED_IND $
1105 NUM_SCH_P_ATTACHED_CNT
1106 ;
1107 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
1108 run;
Super Contributor
Super Contributor
Posts: 3,174

Re: data input problem

You have limited control so I suggest you take what you get and use SAS to convert the numeric value back to a SAS CHARACTER variable, using PUT - see example below;


1 data _null_;
2 filing_id = input('84037175123753',best32.);
3 filing_id_c = put(filing_id,14.);
4 putlog _all_;
5 run;

filing_id=8.4037175E13 filing_id_c=84037175123753 _ERROR_=0 _N_=1
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds


Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: data input problem

@sbb: The problem is that the field gets read in with 8 significant digits, so the last 6 digits are not there at all when I do the conversion.
Respected Advisor
Posts: 3,777

Re: data input problem

You do understand that the code you extract from the log and modify becomes your NEW program and you don't use PROC IMPORT again. Right?
N/A
Posts: 0

Re: data input problem

@data_null_: Yes, I do understand that.

I do have a workaround in which I pull the data into Excel, insert a character in front of the number, and write the file out in .csv format. SAS reads that fine, but it's a bit cumbersome because I have more than a few files to load. The workaround is find for this project, but I'd like to figure out an easier approach for future reference.
Ask a Question
Discussion stats
  • 16 replies
  • 247 views
  • 0 likes
  • 5 in conversation