BookmarkSubscribeRSS Feed
MaryA_Marion
Lapis Lazuli | Level 10

 

I am using SAS Studio. Code is not reading data with commas enclosed by "s. I tried two ways 1)format  2)informat.

I am reading a csv file with long column headings which I changed to ABCDEFGHIJK. I  will deal with the long variable names next but first how should I read this in?

A=A B=. ....K=.  Error message says I have invalid data for B--K.

How to do? Please see details below. Thank you.

 

Mary A. Marion

 

Data (first three lines):

A,B,C,D,E,F,G,H,I,J,K
Albany County,"294,565","289,565","245,060","32,624",605,"8,090",84,"3,102","5,000","9,079"
Allegany County,"49,927","49,487","48,444",361,139,358,2,183,440,454 ...

 

data ny;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD;
input A B C D E F G H I J K;
format A $30. B comma8. C comma8. D comma8. E comma8.
F comma8. G comma8. H comma8. I comma8.
J comma8. K comma8.;
run;

data ny;
informat A $30. B comma8. C comma8. D comma8. E comma8.
F comma8. G comma8. H comma8. I comma8.
J comma8. K comma8.;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD;
input A B C D E F G H I J K;
run;

 

 

14 REPLIES 14
Reeza
Super User
Can you show the actual log with one or two lines of the details?
MaryA_Marion
Lapis Lazuli | Level 10

I added format statements as well as informat statements on my latest try. Please see log below. Thank you.  MM

 

73 data ny;
74 informat A $30. B comma8. C comma8. D comma8. E comma8.
75 F comma8. G comma8. H comma8. I comma8.
76 J comma8. K comma8.;
77 infile '/folders/myshortcuts/611/data/Week
77 ! 03/HWdata/nypopulation.mm.csv' DSD;
78 input A B C D E F G H I J K;
79 format A $30. B comma8. C comma8. D comma8. E comma8.
80 F comma8. G comma8. H comma8. I comma8.
81 J comma8. K comma8.;
82 run;
 
NOTE: The infile '/folders/myshortcuts/611/data/Week
03/HWdata/nypopulation.mm.csv' is:
Filename=/folders/myshortcuts/611/data/Week
03/HWdata/nypopulation.mm.csv,
Owner Name=root,Group Name=vboxsf,
Access Permission=-rwxrwx---,
Last Modified=30Sep2019:14:54:11,
File Size (bytes)=5312
 
NOTE: Invalid data for B in line 1 3-3.
NOTE: Invalid data for C in line 1 5-5.
NOTE: Invalid data for D in line 1 7-7.
NOTE: Invalid data for E in line 1 9-9.
NOTE: Invalid data for F in line 1 11-11.
NOTE: Invalid data for G in line 1 13-13.
NOTE: Invalid data for H in line 1 15-15.
NOTE: Invalid data for I in line 1 17-17.
NOTE: Invalid data for J in line 1 19-19.
NOTE: Invalid data for K in line 1 21-21.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----
1 A,B,C,D,E,F,G,H,I,J,K 21
A=A B=. C=. D=. E=. F=. G=. H=. I=. J=. K=. _ERROR_=1 _N_=1
NOTE: 63 records were read from the infile '/folders/myshortcuts/611/data/Week
03/HWdata/nypopulation.mm.csv'.
The minimum record length was 21.
The maximum record length was 115.
NOTE: The data set WORK.NY has 63 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 

 

 

MaryA_Marion
Lapis Lazuli | Level 10
NOTE: Invalid data for K in line 1 21-21.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----
1 A,B,C,D,E,F,G,H,I,J,K 21
A=A B=. C=. D=. E=. F=. G=. H=. I=. J=. K=. _ERROR_=1 _N_=1
NOTE: 63 records were read from the infile '/folders/myshortcuts/611/data/Week
Reeza
Super User
Do you get one error or multiples? If just one then you need to add FIRSTOBS=2 to your INFILE statement to tell SAS to not read the first line of data since its the variable names.
MaryA_Marion
Lapis Lazuli | Level 10

Hello,

 

I successfully read in the data as A--K but since I have to merge this data set with 3 other datasets with different variables ( not B--K), I am not sure this is the best way to go. How to read in this data using long file names as opposed to using letters? I am using label statements. If I use this technique for other files there will be multiple  B's ..K's whose contents differ.  Current working code is given below.

 

Thank you. MM

 

data nypop ;
informat A $30. B C D E F G H I J K comma10.;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD firstobs=2;
input A B C D E F G H I J K ;
format A $30. B C D E F G H I J K comma10.;
label A='Geographical Area'
B='Total Population'
C='Total One Race'
D='White'
E='Black or African American'
F='American Indian and Alaska Native'
G='Asian'
H='Native Hawaiian and Other Pacific Islander'
I='Some other race'
J='Two or more Races'
K='Hispanic or Latino(of any race)';
run;
proc print data=nypop label; run;

 

 

novinosrin
Tourmaline | Level 20

Hi @MaryA_Marion   The problem is simple as we request your attention to the LOG:

 

1. You are trying to read variable A as character, so no problem

2. You are trying to read from B--K as numeric with an informat commaw.

3. So when the informat is applied on the char record i.e firstobs=1 (in other word when firstobs is read, SAS is unable to apply the informat commaw. to convert char to num. Therefore it triggers _ERROR_=1 for _N_=1.. @Reeza  pointed this too

4. Your 2nd record should have been fine. Did you check your output?

 

So

 

infile FIRSTOBS=2;
informat A $30. B C D E F G H I J K comma10.;
input A B C D E F G H I J K;
format A $30. B C C E F G H I J K comma8.;
run;

should do

 

MaryA_Marion
Lapis Lazuli | Level 10

hi,

 

The following code ran:

data ny;
informat A $30. B C D E F G H I J K comma10.;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD firstobs=2;
input A B C D E F G H I J K ;
format A $30. B C D E F G H I J K comma10.;
run;

Thank you. I will be deleting this post tomorrow.  MM

 

novinosrin
Tourmaline | Level 20

We are glad. All the best!

Reeza
Super User

@MaryA_Marion wrote:



Thank you. I will be deleting this post tomorrow.  MM

 


Please don't delete your question.  The point of the forum is to save posts and such for future users to search when they have the same problem. Deleting your questions solves your problem but doesn't contribute to a body of knowledge for future users. 

MaryA_Marion
Lapis Lazuli | Level 10
I am in a classroom and don't want to affect my professor's class. I need
the data sent to SAS communities to be deleted. Can you help with that? I
have great books on SAS but sometimes the syntax eludes me.

Mary A. Marion
Reeza
Super User

There's no data set attached that I see. If you're referring to the data in your posts you can click each post at the top right and delete what you need. 

 

In the future I'd highly recommend not posting actual data and adding fake records instead of working with your real data. 

 


@MaryA_Marion wrote:
I am in a classroom and don't want to affect my professor's class. I need
the data sent to SAS communities to be deleted. Can you help with that? I
have great books on SAS but sometimes the syntax eludes me.

Mary A. Marion

 

mkeintz
PROC Star

The problem with the data your present is that some of the CSV numeric fields (apparently those valued at 1,000 or more) have embedded commas and are in quotes.  So, except for the first field with county names, you have to read the subsequent fields one segment (i.e. on comma-separated set of characters) at a time.  If the segment has a leading quote, then you have to concatenate further segments to it (stored in character variable COMPLETE) until you encounter a segment with a trailing quote.  Then just convert that complete value to a number.  If the segment has no leading quote then you have a complete value that can be converted to a number.

 

To enable the approach above you need to parse the automatic variable _INFILE_, which has a copy of the raw data line in it.  That way you can look at one segment at a time, and dynamically change how to read in each value:

 

data ny (drop=_:);
infile datalines dlm=',';
input A :$30. @;
array var {*} b c d e f g h i j k;
length _complete $12. ;

  do v=1 to dim(var);
    call missing(_complete);
    input _segment :$5. @;
    if (_segment=:'"') then do until(indexc(_segment,'"'));
      _complete=cats(_complete,_segment);
      input _segment :$5. @;
    end;
    _complete=cats(_complete,_segment);
	_complete=translate(_complete,'','"');
    var{v}=input(_complete,best12.);
  end;
  format b--k  comma12.0;
datalines;
Albany County,	"294,565",	"289,565",	"245,060",	"32,624",	605,	"8,090",	84,	"3,102",	"5,000",	"9,079"
Allegany County,"49,927",	"49,487",	"48,444",	361,		139,	358,		2,	183,		440,		454
run;

The "do until indexc(_segment,'"') is a loop that will continue until a newly-read _segment has its own quote (i.e. the trailing quote).  BTW, this will accomodate quoted numbers with more than 2 segments (i.e. like "123,436,789")

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Huh?  The quotes are the solution to the embedded delimiter problem.  Using the DSD option handles that for you. To handle the numeric values with commas use the COMMA informat.

data test;
 infile cards dsd truncover ;
 length name $20 var1-var10 8;
 informat var1-var10 comma.;
 input name var1-var10;
cards;
Albany County,	"294,565",	"289,565",	"245,060",	"32,624",	605,	"8,090",	84,	"3,102",	"5,000",	"9,079"
Allegany County,"49,927",	"49,487",	"48,444",	361,		139,	358,		2,	183,		440,		454
;

proc print;
run;
Obs        name          var1     var2     var3     var4   var5   var6   var7   var8   var9   var10

 1    Albany County     294565   289565   245060   32624    605   8090    84    3102   5000    9079
 2    Allegany County    49927    49487    48444     361    139    358     2     183    440     454

 

mkeintz
PROC Star

Cool.  I didn't realize that.

 

regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 4791 views
  • 2 likes
  • 5 in conversation