BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wgonzalez
Calcite | Level 5

I was tasked to find a data set for an assignment.  While trying to read in the csv, I wanted to take a look at it before I proceeded with my relatively simple assignment.  Here is the code that I first used:

 

data basketball;

infile "C:\Users\wonha\Desktop\SMU Class Material\6308 SAS II and Databases\mens-professional-basketball\basketball_player_allstar.csv" dlm=","
firstobs=2 dsd;

input ID $ L_Name $ F_Name $ Season $ Conference $ League $ Games Minutes Points Off_Rebounds Def_Rebounds Rebounds Assists Steals Blocks Turnovers
Fouls FG_Attempt FG_Made FT_Attempt FT_Made Threes_Attempt Threes_Made;

 

proc print data=basketball (Firstobs=1 obs=5);
run;

 

Here is what appears in the results viewer:image.png

 

I thought that I should then use an informat for the ID, first name, and last name variables, so I use the $12. option on each.

data basketball;

infile "C:\Users\wonha\Desktop\SMU Class Material\6308 SAS II and Databases\mens-professional-basketball\basketball_player_allstar.csv" dlm=","
firstobs=2 dsd;

input ID $12. L_Name $12. F_Name $12. Season $ Conference $ League $ Games Minutes Points Off_Rebounds Def_Rebounds Rebounds Assists Steals Blocks Turnovers
Fouls FG_Attempt FG_Made FT_Attempt FT_Made Threes_Attempt Threes_Made;

 

proc print data=basketball (Firstobs=1 obs=5);
run;

 

Which produced the second set of results: image.png.

 

I took a look at my csv file in a text file and found that each variable and value were in double quotation marks. Here is a short portion of the txt file of the dataset:

"player_id","last_name","first_name","season_id","conference","league_id","games_played","minutes","points","o_rebounds","d_rebounds","rebounds","assists","steals","blocks","turnovers","personal_fouls","fg_attempted","fg_made","ft_attempted","ft_made","three_attempted","three_made"
"abdulka01","Abdul-Jabbar","Kareem","1978","West","NBA","1","28","11",,,"8","3",,,,,"12","5","2","1",,
"abdulka01","Abdul-Jabbar","Kareem","1969","East","NBA","1","18","10",,,"11","4",,,,,"8","4","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1988","West","NBA","1","13","4",,,"3","0",,,,,"6","1","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1987","West","NBA","1","14","10",,,"4","0",,,,,"9","4","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1986","West","NBA","1","27","10",,,"8","3",,,,,"9","4","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1985","West","NBA","1","32","21",,,"7","2",,,,,"15","9","4","3",,
"abdulka01","Abdul-Jabbar","Kareem","1984","West","NBA","1","23","11",,,"6","1",,,,,"10","5","2","1",,
"abdulka01","Abdul-Jabbar","Kareem","1983","West","NBA","1","37","25",,,"13","2",,,,,"19","11","4","3",,
"abdulka01","Abdul-Jabbar","Kareem","1982","West","NBA","1","32","20",,,"6","5",,,,,"12","9","3","2",,
"abdulka01","Abdul-Jabbar","Kareem","1980","West","NBA","1","23","15",,,"6","4",,,,,"9","6","3","3",,
"abdulka01","Abdul-Jabbar","Kareem","1979","West","NBA","1","30","17",,,"16","9",,,,,"17","6","6","5",,
"abdulka01","Abdul-Jabbar","Kareem","1976","West","NBA","1","23","21",,,"4","2",,,,,"14","8","6","5",,
"abdulka01","Abdul-Jabbar","Kareem","1975","West","NBA","1","36","22",,,"5","3",,,,,"16","9","4","4",,
"abdulka01","Abdul-Jabbar","Kareem","1974","West","NBA","1","19","7",,,"10","3",,,,,"10","3","2","1",,
"abdulka01","Abdul-Jabbar","Kareem","1973","West","NBA","1","23","14",,,"8","6",,,,,"11","7","0","0",,
"abdulka01","Abdul-Jabbar","Kareem","1972","West","NBA","1","98",,,,,,,,,,,,,,,
"abdulka01","Abdul-Jabbar","Kareem","1971","West","NBA","1","19","12",,,"7","2",,,,,"10","5","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1970","West","NBA","1","30","19",,,"14","1",,,,,"16","8","4","3",,
"abdulka01","Abdul-Jabbar","Kareem","1981","West","NBA","1","22","2","1","2","3","1","3","2","1","3","10","1","0","0","0","0"
"abdursh01","A-Rahim","Shareef","2001","East","NBA","1","21","9","1","5","6","0","0","0","0","2","4","4","0","0","1","1"

 

I have tried various informat and format options, but get about the same as picture above.  Is there any way to remove all of the quotation marks while trying to read in the .csv file?

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Replace 

input ID $12. L_Name:$12. F_Name $12.

with 

input ID :$12. L_Name :$12. F_Name :$12. 

View solution in original post

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Replace 

input ID $12. L_Name:$12. F_Name $12.

with 

input ID :$12. L_Name :$12. F_Name :$12. 

Reeza
Super User

This is actually standard in a CSV, at least for text variables. I'm surprised the numerics are all in quotes but it can still be dealt with. The DSD option handles the quotes, you don't need do explicitly deal with it. 

 

You should apply your informants ahead of the INPUT statement to get it to work correctly, rather than on the INPUT statement. 

 

informat ID L_Name F_Name $12.;

input ..... ;

Give that a shot first. If that doesn't work, you may want to read everything in as a character and then convert to numeric. 

 


@wgonzalez wrote:

I was tasked to find a data set for an assignment.  While trying to read in the csv, I wanted to take a look at it before I proceeded with my relatively simple assignment.  Here is the code that I first used:

 

data basketball;

infile "C:\Users\wonha\Desktop\SMU Class Material\6308 SAS II and Databases\mens-professional-basketball\basketball_player_allstar.csv" dlm=","
firstobs=2 dsd;

input ID $ L_Name $ F_Name $ Season $ Conference $ League $ Games Minutes Points Off_Rebounds Def_Rebounds Rebounds Assists Steals Blocks Turnovers
Fouls FG_Attempt FG_Made FT_Attempt FT_Made Threes_Attempt Threes_Made;

 

proc print data=basketball (Firstobs=1 obs=5);
run;

 

Here is what appears in the results viewer:image.png

 

I thought that I should then use an informat for the ID, first name, and last name variables, so I use the $12. option on each.

data basketball;

infile "C:\Users\wonha\Desktop\SMU Class Material\6308 SAS II and Databases\mens-professional-basketball\basketball_player_allstar.csv" dlm=","
firstobs=2 dsd;

input ID $12. L_Name $12. F_Name $12. Season $ Conference $ League $ Games Minutes Points Off_Rebounds Def_Rebounds Rebounds Assists Steals Blocks Turnovers
Fouls FG_Attempt FG_Made FT_Attempt FT_Made Threes_Attempt Threes_Made;

 

proc print data=basketball (Firstobs=1 obs=5);
run;

 

Which produced the second set of results: image.png.

 

I took a look at my csv file in a text file and found that each variable and value were in double quotation marks. Here is a short portion of the txt file of the dataset:

"player_id","last_name","first_name","season_id","conference","league_id","games_played","minutes","points","o_rebounds","d_rebounds","rebounds","assists","steals","blocks","turnovers","personal_fouls","fg_attempted","fg_made","ft_attempted","ft_made","three_attempted","three_made"
"abdulka01","Abdul-Jabbar","Kareem","1978","West","NBA","1","28","11",,,"8","3",,,,,"12","5","2","1",,
"abdulka01","Abdul-Jabbar","Kareem","1969","East","NBA","1","18","10",,,"11","4",,,,,"8","4","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1988","West","NBA","1","13","4",,,"3","0",,,,,"6","1","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1987","West","NBA","1","14","10",,,"4","0",,,,,"9","4","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1986","West","NBA","1","27","10",,,"8","3",,,,,"9","4","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1985","West","NBA","1","32","21",,,"7","2",,,,,"15","9","4","3",,
"abdulka01","Abdul-Jabbar","Kareem","1984","West","NBA","1","23","11",,,"6","1",,,,,"10","5","2","1",,
"abdulka01","Abdul-Jabbar","Kareem","1983","West","NBA","1","37","25",,,"13","2",,,,,"19","11","4","3",,
"abdulka01","Abdul-Jabbar","Kareem","1982","West","NBA","1","32","20",,,"6","5",,,,,"12","9","3","2",,
"abdulka01","Abdul-Jabbar","Kareem","1980","West","NBA","1","23","15",,,"6","4",,,,,"9","6","3","3",,
"abdulka01","Abdul-Jabbar","Kareem","1979","West","NBA","1","30","17",,,"16","9",,,,,"17","6","6","5",,
"abdulka01","Abdul-Jabbar","Kareem","1976","West","NBA","1","23","21",,,"4","2",,,,,"14","8","6","5",,
"abdulka01","Abdul-Jabbar","Kareem","1975","West","NBA","1","36","22",,,"5","3",,,,,"16","9","4","4",,
"abdulka01","Abdul-Jabbar","Kareem","1974","West","NBA","1","19","7",,,"10","3",,,,,"10","3","2","1",,
"abdulka01","Abdul-Jabbar","Kareem","1973","West","NBA","1","23","14",,,"8","6",,,,,"11","7","0","0",,
"abdulka01","Abdul-Jabbar","Kareem","1972","West","NBA","1","98",,,,,,,,,,,,,,,
"abdulka01","Abdul-Jabbar","Kareem","1971","West","NBA","1","19","12",,,"7","2",,,,,"10","5","2","2",,
"abdulka01","Abdul-Jabbar","Kareem","1970","West","NBA","1","30","19",,,"14","1",,,,,"16","8","4","3",,
"abdulka01","Abdul-Jabbar","Kareem","1981","West","NBA","1","22","2","1","2","3","1","3","2","1","3","10","1","0","0","0","0"
"abdursh01","A-Rahim","Shareef","2001","East","NBA","1","21","9","1","5","6","0","0","0","0","2","4","4","0","0","1","1"

 

I have tried various informat and format options, but get about the same as picture above.  Is there any way to remove all of the quotation marks while trying to read in the .csv file?

 

Thank you.


 

Tom
Super User Tom
Super User

Two things.

The DSD option on the INFILE statement  will automatically remove the optional quotes around values. It will also properly handle any null values.

 

Second when reading delimited data you need to use what SAS calls LIST mode input style instead of FORMATTED input style.  So either totally remove the informats from the INPUT statement (or add the colon modifier in front of the informat specifications).  If you slip into formatted input mode then you might accidentally read past  the delimiter and mess up the parsing of the input line.

 

It is usually best to define you variables before you use them (otherwise SAS will guess how to define them based on how you first use them).  For example you can just use a LENGTH statement to define the type and length of each variable. Numeric variables should be length 8 since SAS stores numbers using 8 byte floating point.  Make the character variables long enough for the maximum number of character you expect them to hold.

 

infile "...." dlm="," firstobs=2 dsd truncover;

length ID $12 L_Name $12 F_Name $12 Season $12 Conference $12 League $12 
Games Minutes Points Off_Rebounds Def_Rebounds Rebounds Assists Steals Blocks 
Turnovers Fouls FG_Attempt FG_Made FT_Attempt FT_Made Threes_Attempt Threes_Made 8
;

input id -- Threes_Made ;

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 9011 views
  • 2 likes
  • 4 in conversation