DATA Step, Macro, Functions and more

CSV vs XLSX Import. How to reformat specific variables.

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

CSV vs XLSX Import. How to reformat specific variables.

I imported these files first as CSVs and then had an issue with the formatting so I saved them all as XLSX and reimported but had different issues with formatting. I went back and reimported them all again as CSVs and have this issue with the formatting. I want to keep the CSV imports but need to fix the formatting issues in variables: QUAL_CD, SALE_PRC1, SALE_YR1.

 

Here is a screen shot of my SAS file from the CSV import:

 

bigger (2).jpg

 

Here is a screen shot from the SAS file of my XLSX import:other (2).jpg

 

The CSV import has them all as Character and only keeps the first digit. The XLSX file has them as numeric and has the full value. All of them should be character except for sale_prc1. I am thinking all I need to do is some sort of format change, but am not quite sure how. I want to change the sale_prc1 to a numeric variable and accommodate for up to 10 digits. I want to keep those other variables at character but get all the digits to show. 


Accepted Solutions
Solution
‎10-12-2017 03:01 PM
Super User
Posts: 13,942

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

And what specifically is "wrong" when you use that code?

  Does a variable need to be longer? Change the number in the informat. Changing $16. to $25. will read upto 25 characters instead of 16.

  Does the variable to be numeric but is coming in as character? Change the informat for the variable to a numeric such as Best16. or Best32.

Is a variable coming in as numeric but should be character? Change the informat to a $.

Either remove the format or change to match the informat to insure you can see the results.

 

You probably needed to set guessing rows in this:

proc import datafile= 'c:\users\myname\documents\propertydata\csv\palmbeach\nal_2015_60palm beach_f' 
dbms=csv out=propcsv.palmbeach15; run;

so that more than a few rows are examined. By default SAS determines variable characteristics using the first 20 rows.

 

Add: Guessingrows=32000; to tell SAS to use information from 32000 rows to guess the lengths and variable types. Very large values for guessingrows may add some time to read the data during the scan but generally isn't too noticeable.

View solution in original post


All Replies
Super User
Posts: 24,026

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

When you imported the CSV, check the log. It has the code. Copy the code and edit as needed. Hold down ALT to select the code in the log without line numbers.

Contributor
Posts: 32

Re: CSV vs XLSX Import. How to reformat specific variables.

proc import datafile= 'c:\users\myname\documents\propertydata\csv\palmbeach\nal_2015_60palm beach_f' dbms=csv out=propcsv.palmbeach15;
run;

This is the code I used.

Super User
Posts: 10,600

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

andrewfau wrote:
proc import datafile= 'c:\users\myname\documents\propertydata\csv\palmbeach\nal_2015_60palm beach_f' dbms=csv out=propcsv.palmbeach15;
run;

This is the code I used.


Then copy the data step code from the log and adapt it, as @Reeza said.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 32

Re: CSV vs XLSX Import. How to reformat specific variables.

Sorry. Here is the code from the log.

 

proc import datafile=
'c:\users\avanvalin2017\documents\propertydata\csv\palmbeach\nal_2015_60palm beach_f.csv'
dbms=csv out=propcsv.palmbeach15;
run;

/**********************************************************************
* PRODUCT: SAS
* VERSION: 9.4
* CREATOR: External File Interface
* DATE: 11OCT17
* DESC: Generated SAS Datastep Code
* TEMPLATE SOURCE: (None Specified.)
***********************************************************************/
data PROPCSV.PALMBEACH15 ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'c:\users\avanvalin2017\documents\propertydata\csv\palmbeach\nal_2015_60palm
beach_f.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat CO_NO best32. ;
informat PARCEL_ID $19. ;
informat FILE_T $3. ;
informat ASMNT_YR best32. ;
informat BAS_STRT $4. ;
informat ATV_STRT $3. ;
informat GRP_NO $3. ;
informat DOR_UC $5. ;
informat PA_UC $4. ;
informat SPASS_CD $1. ;
informat JV best32. ;
informat JV_CHNG $1. ;
informat JV_CHNG_CD $1. ;
informat AV_SD best32. ;
informat AV_NSD best32. ;
informat TV_SD best32. ;
informat TV_NSD best32. ;
informat JV_HMSTD $1. ;
informat AV_HMSTD $1. ;
informat JV_NON_HMSTD_RESD best32. ;
informat AV_NON_HMSTD_RESD best32. ;
informat JV_RESD_NON_RESD best32. ;
informat AV_RESD_NON_RESD best32. ;
informat JV_CLASS_USE best32. ;
informat AV_CLASS_USE best32. ;
informat JV_H2O_RECHRGE $1. ;
informat AV_H2O_RECHRGE $1. ;
informat JV_CONSRV_LND $1. ;
informat AV_CONSRV_LND $1. ;
informat JV_HIST_COM_PROP $1. ;
informat AV_HIST_COM_PROP $1. ;
informat JV_HIST_SIGNF $1. ;
informat AV_HIST_SIGNF $1. ;
informat JV_WRKNG_WTRFNT $1. ;
informat AV_WRKNG_WTRFNT $1. ;
informat NCONST_VAL best32. ;
informat DEL_VAL best32. ;
informat PAR_SPLT $1. ;
informat DISTR_CD $1. ;
informat DISTR_YR $1. ;
informat LND_VAL best32. ;
informat LND_UNTS_CD best32. ;
informat NO_LND_UNTS best32. ;
informat LND_SQFOOT best32. ;
informat DT_LAST_INSPT best32. ;
informat IMP_QUAL $3. ;
informat CONST_CLASS best32. ;
informat EFF_YR_BLT best32. ;
informat ACT_YR_BLT best32. ;
informat TOT_LVG_AREA best32. ;
informat NO_BULDNG best32. ;
informat NO_RES_UNTS best32. ;
informat SPEC_FEAT_VAL best32. ;
informat MULTI_PAR_SAL1 $1. ;
informat SALE_TYPE $1. ;
informat QUAL_CD1 $1. ;
informat VI_CD1 $1. ;
informat SALE_PRC1 $1. ;
informat SALE_YR1 $1. ;
informat SALE_MO1 $1. ;
informat OR_BOOK1 $1. ;
informat OR_PAGE1 $1. ;
informat CLERK_NO1 $1. ;
informat SAL_CHNG_CD1 $1. ;
informat MULTI_PAR_SAL2 $1. ;
informat SALE2_TYPE $1. ;
informat QUAL_CD2 $1. ;
informat VI_CD2 $1. ;
informat SALE_PRC2 $1. ;
informat SALE_YR2 $1. ;
informat SALE_MO2 $1. ;
informat OR_BOOK2 $1. ;
informat OR_PAGE2 $1. ;
informat CLERK_NO2 $1. ;
informat SAL_CHNG_CD2 $1. ;
informat OWN_NAME $32. ;
informat OWN_ADDR1 $23. ;
informat OWN_ADDR2 $1. ;
informat OWN_CITY $14. ;
informat OWN_STATE $4. ;
informat OWN_ZIPCD best32. ;
informat OWN_STATE_DOM $4. ;
informat FIDU_NAME $1. ;
informat FIDU_ADDR1 $1. ;
informat FIDU_ADDR2 $1. ;
informat FIDU_CITY $1. ;
informat FIDU_STATE $1. ;
informat FIDU_ZIPPCD $1. ;
informat FIDU_CD $1. ;
informat S_LEGAL $32. ;
informat APP_STAT $1. ;
informat CO_APP_STAT $1. ;
informat MKT_AR $4. ;
informat NBRHD_CD $10. ;
informat PUBLIC_LND $3. ;
informat TAX_AUTH_CD $6. ;
informat TWN $5. ;
informat RNG $5. ;
informat SEC best32. ;
informat CENSUS_BK $14. ;
informat PHY_ADDR1 $25. ;
informat PHY_ADDR2 $1. ;
informat PHY_CITY $22. ;
informat PHY_ZIPCD best32. ;
informat ASS_TRNSFR_FG $1. ;
informat PREV_HMSTD_OWN $1. ;
informat ASS_DIF_TRNS $1. ;
informat CONO_PRV_HM $1. ;
informat PARCEL_ID_PRV_HMSTD $1. ;
informat YR_VAL_TRNSF $1. ;
informat EXEMPTIONS $7. ;
informat EXEMPT_FLAG $1. ;
informat EXEMPT_OTHER $1. ;
informat HOMESTEAD $1. ;
informat WIDOW $1. ;
informat DISABLE $1. ;
informat RNWL_ENERGY $1. ;
informat SEQ_NO best32. ;
informat RS_ID $6. ;
informat MP_ID $10. ;
informat STATE_PARCEL_ID $20. ;
format CO_NO best12. ;
format PARCEL_ID $19. ;
format FILE_T $3. ;
format ASMNT_YR best12. ;
format BAS_STRT $4. ;
format ATV_STRT $3. ;
format GRP_NO $3. ;
format DOR_UC $5. ;
format PA_UC $4. ;
format SPASS_CD $1. ;
format JV best12. ;
format JV_CHNG $1. ;
format JV_CHNG_CD $1. ;
format AV_SD best12. ;
format AV_NSD best12. ;
format TV_SD best12. ;
format TV_NSD best12. ;
format JV_HMSTD $1. ;
format AV_HMSTD $1. ;
format JV_NON_HMSTD_RESD best12. ;
format AV_NON_HMSTD_RESD best12. ;
format JV_RESD_NON_RESD best12. ;
format AV_RESD_NON_RESD best12. ;
format JV_CLASS_USE best12. ;
format AV_CLASS_USE best12. ;
format JV_H2O_RECHRGE $1. ;
format AV_H2O_RECHRGE $1. ;
format JV_CONSRV_LND $1. ;
format AV_CONSRV_LND $1. ;
format JV_HIST_COM_PROP $1. ;
format AV_HIST_COM_PROP $1. ;
format JV_HIST_SIGNF $1. ;
format AV_HIST_SIGNF $1. ;
format JV_WRKNG_WTRFNT $1. ;
format AV_WRKNG_WTRFNT $1. ;
format NCONST_VAL best12. ;
format DEL_VAL best12. ;
format PAR_SPLT $1. ;
format DISTR_CD $1. ;
format DISTR_YR $1. ;
format LND_VAL best12. ;
format LND_UNTS_CD best12. ;
format NO_LND_UNTS best12. ;
format LND_SQFOOT best12. ;
format DT_LAST_INSPT best12. ;
format IMP_QUAL $3. ;
format CONST_CLASS best12. ;
format EFF_YR_BLT best12. ;
format ACT_YR_BLT best12. ;
format TOT_LVG_AREA best12. ;
format NO_BULDNG best12. ;
format NO_RES_UNTS best12. ;
format SPEC_FEAT_VAL best12. ;
format MULTI_PAR_SAL1 $1. ;
format SALE_TYPE $1. ;
format QUAL_CD1 $1. ;
format VI_CD1 $1. ;
format SALE_PRC1 $1. ;
format SALE_YR1 $1. ;
format SALE_MO1 $1. ;
format OR_BOOK1 $1. ;
format OR_PAGE1 $1. ;
format CLERK_NO1 $1. ;
format SAL_CHNG_CD1 $1. ;
format MULTI_PAR_SAL2 $1. ;
format SALE2_TYPE $1. ;
format QUAL_CD2 $1. ;
format VI_CD2 $1. ;
format SALE_PRC2 $1. ;
format SALE_YR2 $1. ;
format SALE_MO2 $1. ;
format OR_BOOK2 $1. ;
format OR_PAGE2 $1. ;
format CLERK_NO2 $1. ;
format SAL_CHNG_CD2 $1. ;
format OWN_NAME $32. ;
format OWN_ADDR1 $23. ;
format OWN_ADDR2 $1. ;
format OWN_CITY $14. ;
format OWN_STATE $4. ;
format OWN_ZIPCD best12. ;
format OWN_STATE_DOM $4. ;
format FIDU_NAME $1. ;
format FIDU_ADDR1 $1. ;
format FIDU_ADDR2 $1. ;
format FIDU_CITY $1. ;
format FIDU_STATE $1. ;
format FIDU_ZIPPCD $1. ;
format FIDU_CD $1. ;
format S_LEGAL $32. ;
format APP_STAT $1. ;
format CO_APP_STAT $1. ;
format MKT_AR $4. ;
format NBRHD_CD $10. ;
format PUBLIC_LND $3. ;
format TAX_AUTH_CD $6. ;
format TWN $5. ;
format RNG $5. ;
format SEC best12. ;
format CENSUS_BK $14. ;
format PHY_ADDR1 $25. ;
format PHY_ADDR2 $1. ;
format PHY_CITY $22. ;
format PHY_ZIPCD best12. ;
format ASS_TRNSFR_FG $1. ;
format PREV_HMSTD_OWN $1. ;
format ASS_DIF_TRNS $1. ;
format CONO_PRV_HM $1. ;
format PARCEL_ID_PRV_HMSTD $1. ;
format YR_VAL_TRNSF $1. ;
format EXEMPTIONS $7. ;
format EXEMPT_FLAG $1. ;
format EXEMPT_OTHER $1. ;
format HOMESTEAD $1. ;
format WIDOW $1. ;
format DISABLE $1. ;
format RNWL_ENERGY $1. ;
format SEQ_NO best12. ;
format RS_ID $6. ;
format MP_ID $10. ;
format STATE_PARCEL_ID $20. ;
input
CO_NO
PARCEL_ID $
FILE_T $
ASMNT_YR
BAS_STRT $
ATV_STRT $
GRP_NO $
DOR_UC $
PA_UC $
SPASS_CD $
JV
JV_CHNG $
JV_CHNG_CD $
AV_SD
AV_NSD
TV_SD
TV_NSD
JV_HMSTD $
AV_HMSTD $
JV_NON_HMSTD_RESD
AV_NON_HMSTD_RESD
JV_RESD_NON_RESD
AV_RESD_NON_RESD
JV_CLASS_USE
AV_CLASS_USE
JV_H2O_RECHRGE $
AV_H2O_RECHRGE $
JV_CONSRV_LND $
AV_CONSRV_LND $
JV_HIST_COM_PROP $
AV_HIST_COM_PROP $
JV_HIST_SIGNF $
AV_HIST_SIGNF $
JV_WRKNG_WTRFNT $
AV_WRKNG_WTRFNT $
NCONST_VAL
DEL_VAL
PAR_SPLT $
DISTR_CD $
DISTR_YR $
LND_VAL
LND_UNTS_CD
NO_LND_UNTS
LND_SQFOOT
DT_LAST_INSPT
IMP_QUAL $
CONST_CLASS
EFF_YR_BLT
ACT_YR_BLT
TOT_LVG_AREA
NO_BULDNG
NO_RES_UNTS
SPEC_FEAT_VAL
MULTI_PAR_SAL1 $
SALE_TYPE $
QUAL_CD1 $
VI_CD1 $
SALE_PRC1 $
SALE_YR1 $
SALE_MO1 $
OR_BOOK1 $
OR_PAGE1 $
CLERK_NO1 $
SAL_CHNG_CD1 $
MULTI_PAR_SAL2 $
SALE2_TYPE $
QUAL_CD2 $
VI_CD2 $
SALE_PRC2 $
SALE_YR2 $
SALE_MO2 $
OR_BOOK2 $
OR_PAGE2 $
CLERK_NO2 $
SAL_CHNG_CD2 $
OWN_NAME $
OWN_ADDR1 $
OWN_ADDR2 $
OWN_CITY $
OWN_STATE $
OWN_ZIPCD
OWN_STATE_DOM $
FIDU_NAME $
FIDU_ADDR1 $
FIDU_ADDR2 $
FIDU_CITY $
FIDU_STATE $
FIDU_ZIPPCD $
FIDU_CD $
S_LEGAL $
APP_STAT $
CO_APP_STAT $
MKT_AR $
NBRHD_CD $
PUBLIC_LND $
TAX_AUTH_CD $
TWN $
RNG $
SEC
CENSUS_BK $
PHY_ADDR1 $
PHY_ADDR2 $
PHY_CITY $
PHY_ZIPCD
ASS_TRNSFR_FG $
PREV_HMSTD_OWN $
ASS_DIF_TRNS $
CONO_PRV_HM $
PARCEL_ID_PRV_HMSTD $
YR_VAL_TRNSF $
EXEMPTIONS $
EXEMPT_FLAG $
EXEMPT_OTHER $
HOMESTEAD $
WIDOW $
DISABLE $
RNWL_ENERGY $
SEQ_NO
RS_ID $
MP_ID $
STATE_PARCEL_ID $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;

Solution
‎10-12-2017 03:01 PM
Super User
Posts: 13,942

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

And what specifically is "wrong" when you use that code?

  Does a variable need to be longer? Change the number in the informat. Changing $16. to $25. will read upto 25 characters instead of 16.

  Does the variable to be numeric but is coming in as character? Change the informat for the variable to a numeric such as Best16. or Best32.

Is a variable coming in as numeric but should be character? Change the informat to a $.

Either remove the format or change to match the informat to insure you can see the results.

 

You probably needed to set guessing rows in this:

proc import datafile= 'c:\users\myname\documents\propertydata\csv\palmbeach\nal_2015_60palm beach_f' 
dbms=csv out=propcsv.palmbeach15; run;

so that more than a few rows are examined. By default SAS determines variable characteristics using the first 20 rows.

 

Add: Guessingrows=32000; to tell SAS to use information from 32000 rows to guess the lengths and variable types. Very large values for guessingrows may add some time to read the data during the scan but generally isn't too noticeable.

Contributor
Posts: 32

Re: CSV vs XLSX Import. How to reformat specific variables.

I posted screen shots showing what is happening in the initial post. They are read in as character values which is fine for all but one. The sale price variable (sale_prc1) needs to be numerics and then all of the variables are only reading the first character. So 2014 is reading 2; 11 is reading 1; 53400 is reading 5, and so on. I need the years and month and price to read the full value and the price to be changed to numeric.

Super User
Posts: 13,942

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

When asking about how to read a file then examples of the input file with difficult cases is important. For a CSV file that is simple: copy a few lines of the text and paste it into a code box opened using the forum {I} menu icon.

 

Pictures are awkward and since you displayed something in the 21,000 row area it is hard to tell that the issue is likely to be many rows of blank values (which SAS will generally assign a 1 character format to) at the top of the file before encountering the rows with actual values for some or many of the variables.

 

If your data is all as sparse as shown you may want even larger values for the Guessingrows parameter.

Respected Advisor
Posts: 4,797

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

@andrewfau

"The sale price variable (sale_prc1) needs to be numerics"

Then either use GUESSINGROWS=MAX as already suggested so SAS analyzes many more source rows before defining the variables and importing the data.

 

Or... copy/paste the generated data step to a code window and amend the code as required and like already suggested by @Reeza.

I.e. for sal_prc1...

what you have in the code: informat SALE_PRC1 $1.;

change to: informat SALE_PRC1 best32.;

Contributor
Posts: 32

Re: CSV vs XLSX Import. How to reformat specific variables.

I changed the informat of my sale_prc and the log is telling me "the format $BEST was not found or could not be loaded." 

 

It seemed to work when I changed the guessing rows, but I want to do the best option that is going to work best when doing descriptive statistics later on. Does it need to be numeric for that?

 

Sorry. I am very new to SAS and am learning it all in a trial by fire situation. Thanks everyone for your patience and help. 

Contributor
Posts: 32

Re: CSV vs XLSX Import. How to reformat specific variables.

It looks like guessingrows=32000 may be the solution. I will have to look a bit more closely and try it on others to see.
Super User
Posts: 24,026

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

@andrewfau YOU need to fix your code. So find the name of the variable that is 'incorrect', change the FORMAT/INFORMAT as needed to what you need it to be. 

Contributor
Posts: 32

Re: CSV vs XLSX Import. How to reformat specific variables.

@Reeza So guessingrows is not something that I want to do? I changed the informat but it is saying "The format $BEST was not found or could not be loaded. 

Super User
Posts: 24,026

Re: CSV vs XLSX Import. How to reformat specific variables.

Posted in reply to andrewfau

GuessingRows change is a great idea. Then if it still has issues, ie reading in something as character that you want as numeric, you need to change it in three locations, the FORMAT, INFORMAT and INPUT statement.

 

Format and Informats are pretty self explanatory. What you also want to do is delete the dollar ($) signs from the INPUT line. Then the type is derived based on the format/informat. 

 

 

Contributor
Posts: 32

Re: CSV vs XLSX Import. How to reformat specific variables.

Thanks
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 255 views
  • 4 likes
  • 5 in conversation