I am using SAS Studio. Code is not reading data with commas enclosed by "s. I tried two ways 1)format 2)informat.
I am reading a csv file with long column headings which I changed to ABCDEFGHIJK. I will deal with the long variable names next but first how should I read this in?
A=A B=. ....K=. Error message says I have invalid data for B--K.
How to do? Please see details below. Thank you.
Mary A. Marion
Data (first three lines):
A,B,C,D,E,F,G,H,I,J,K
Albany County,"294,565","289,565","245,060","32,624",605,"8,090",84,"3,102","5,000","9,079"
Allegany County,"49,927","49,487","48,444",361,139,358,2,183,440,454 ...
data ny;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD;
input A B C D E F G H I J K;
format A $30. B comma8. C comma8. D comma8. E comma8.
F comma8. G comma8. H comma8. I comma8.
J comma8. K comma8.;
run;
data ny;
informat A $30. B comma8. C comma8. D comma8. E comma8.
F comma8. G comma8. H comma8. I comma8.
J comma8. K comma8.;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD;
input A B C D E F G H I J K;
run;
I added format statements as well as informat statements on my latest try. Please see log below. Thank you. MM
Hello,
I successfully read in the data as A--K but since I have to merge this data set with 3 other datasets with different variables ( not B--K), I am not sure this is the best way to go. How to read in this data using long file names as opposed to using letters? I am using label statements. If I use this technique for other files there will be multiple B's ..K's whose contents differ. Current working code is given below.
Thank you. MM
data nypop ;
informat A $30. B C D E F G H I J K comma10.;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD firstobs=2;
input A B C D E F G H I J K ;
format A $30. B C D E F G H I J K comma10.;
label A='Geographical Area'
B='Total Population'
C='Total One Race'
D='White'
E='Black or African American'
F='American Indian and Alaska Native'
G='Asian'
H='Native Hawaiian and Other Pacific Islander'
I='Some other race'
J='Two or more Races'
K='Hispanic or Latino(of any race)';
run;
proc print data=nypop label; run;
Hi @MaryA_Marion The problem is simple as we request your attention to the LOG:
1. You are trying to read variable A as character, so no problem
2. You are trying to read from B--K as numeric with an informat commaw.
3. So when the informat is applied on the char record i.e firstobs=1 (in other word when firstobs is read, SAS is unable to apply the informat commaw. to convert char to num. Therefore it triggers _ERROR_=1 for _N_=1.. @Reeza pointed this too
4. Your 2nd record should have been fine. Did you check your output?
So
infile FIRSTOBS=2;
informat A $30. B C D E F G H I J K comma10.;
input A B C D E F G H I J K;
format A $30. B C C E F G H I J K comma8.;
run;
should do
hi,
The following code ran:
data ny;
informat A $30. B C D E F G H I J K comma10.;
infile '/folders/myshortcuts/611/data/Week 03/HWdata/nypopulation.mm.csv' DSD firstobs=2;
input A B C D E F G H I J K ;
format A $30. B C D E F G H I J K comma10.;
run;
Thank you. I will be deleting this post tomorrow. MM
We are glad. All the best!
@MaryA_Marion wrote:
Thank you. I will be deleting this post tomorrow. MM
Please don't delete your question. The point of the forum is to save posts and such for future users to search when they have the same problem. Deleting your questions solves your problem but doesn't contribute to a body of knowledge for future users.
There's no data set attached that I see. If you're referring to the data in your posts you can click each post at the top right and delete what you need.
In the future I'd highly recommend not posting actual data and adding fake records instead of working with your real data.
@MaryA_Marion wrote:
I am in a classroom and don't want to affect my professor's class. I need
the data sent to SAS communities to be deleted. Can you help with that? I
have great books on SAS but sometimes the syntax eludes me.
Mary A. Marion
The problem with the data your present is that some of the CSV numeric fields (apparently those valued at 1,000 or more) have embedded commas and are in quotes. So, except for the first field with county names, you have to read the subsequent fields one segment (i.e. on comma-separated set of characters) at a time. If the segment has a leading quote, then you have to concatenate further segments to it (stored in character variable COMPLETE) until you encounter a segment with a trailing quote. Then just convert that complete value to a number. If the segment has no leading quote then you have a complete value that can be converted to a number.
To enable the approach above you need to parse the automatic variable _INFILE_, which has a copy of the raw data line in it. That way you can look at one segment at a time, and dynamically change how to read in each value:
data ny (drop=_:);
infile datalines dlm=',';
input A :$30. @;
array var {*} b c d e f g h i j k;
length _complete $12. ;
do v=1 to dim(var);
call missing(_complete);
input _segment :$5. @;
if (_segment=:'"') then do until(indexc(_segment,'"'));
_complete=cats(_complete,_segment);
input _segment :$5. @;
end;
_complete=cats(_complete,_segment);
_complete=translate(_complete,'','"');
var{v}=input(_complete,best12.);
end;
format b--k comma12.0;
datalines;
Albany County, "294,565", "289,565", "245,060", "32,624", 605, "8,090", 84, "3,102", "5,000", "9,079"
Allegany County,"49,927", "49,487", "48,444", 361, 139, 358, 2, 183, 440, 454
run;
The "do until indexc(_segment,'"') is a loop that will continue until a newly-read _segment has its own quote (i.e. the trailing quote). BTW, this will accomodate quoted numbers with more than 2 segments (i.e. like "123,436,789")
Huh? The quotes are the solution to the embedded delimiter problem. Using the DSD option handles that for you. To handle the numeric values with commas use the COMMA informat.
data test;
infile cards dsd truncover ;
length name $20 var1-var10 8;
informat var1-var10 comma.;
input name var1-var10;
cards;
Albany County, "294,565", "289,565", "245,060", "32,624", 605, "8,090", 84, "3,102", "5,000", "9,079"
Allegany County,"49,927", "49,487", "48,444", 361, 139, 358, 2, 183, 440, 454
;
proc print;
run;
Obs name var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 1 Albany County 294565 289565 245060 32624 605 8090 84 3102 5000 9079 2 Allegany County 49927 49487 48444 361 139 358 2 183 440 454
Cool. I didn't realize that.
regards,
Mark
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.