BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HwGoh
Calcite | Level 5

I'm having a problem reading this dataset 

here's my code :

data uiui;
infile ''
dlm=',' dsd firstobs=2;
length VAR1 8 rank 8 author $39 book $128 ratings 8 reviews 8 type $10 Price 8 ;
format VAR1 8.;
format author $39.;
format book $128.;
format price 8.2;
format rank 8.;
format ratings 8.2;
format reviews 8.;
format type $10.;
input
VAR1
rank
book $
author $
ratings
reviews
type $
price;
if _n_ = 18 then book = 'You Are a Badass®';
if _n_ = 92 then author ='Brené Brown';
run;
proc print data =uiui;
run;
proc contents data=uiui;
run;

 if single line observation is fine but I have no idea how to deal with multiple line observation.

 

Appreciate your help thks in advance

 
 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

I have checked the input file using noetpad++ utility.

Each book-line is terminated by CRLF characters.

When the book-line is splitted into two lines there is a LF only at the end of the line.

 

Next code reads the file correctly. You need just to correct the path to the input file.

filename booklist '/folders/myfolders/flat/top_100_books.txt';

data uiui;
	infile booklist dlm='09'x dsd firstobs=2 TERMSTR=crlf truncover;
	length VAR1 8 rank 8 book $128 author $39  ratings 8 reviews 8 type $10 Price 8;
	format VAR1 8. book $128. author $39. price 8.2 rank 8. ratings 8.2
	       reviews 8. type $10.;
   	input @1 VAR1 rank book $ author $ ratings reviews type $ price;	
	if _n_ = 18 then book = 'You Are a Badass®';
	if _n_ = 92 then author ='Brené Brown';
run;
proc print data =uiui;
run;

proc contents data=uiui;
run;

View solution in original post

4 REPLIES 4
ebowen
Quartz | Level 8

This looks like a poor output from a pdf or something that put carriage returns in the middle of several observations. Unfortunately, I would say you're just going to have to fix your data so that each observation comes on only one line. It's short enough that you should just be able to open it up in a text editor and delete the extra line breaks.
Also, this looks like a tab-delimited file, not a comma delimited. So your delimiter should be:
dlm='09'x;

ballardw
Super User

Since your example data basically does not have any commas in it, why does your infile statement include DLM=','?

 

If you did something like 1) open a CSV file in a spreadsheet and 2) pasted the result of that as "data" then I hope you have an original copy of the CSV from before opening with the spreadsheet, especially if did any editing of the file in the spreadsheet. Spreadsheets can sometimes seriously change values of cells.

 

There is very interesting pattern. Every one of the "data lines" that extends to a second line in the file starts with a " character. Making one suspect that perhaps something happened when those quotes were entered. Not everywhere " at the end of phrase but that is common.

59	59	"Brown Bear, Brown Bear, What Do You See?"	Bill Martin Jr	4.8	3047	Board book	7.23
60	60	Beneath A Scarlet Sky: A Novel	Mark Sullivan	4.7	25842	Paperback	11.99
61	61	What to Expect When You're Expecting	Heidi Murkoff	4.4	3587	Paperback	10.77
62	62	Principles: Life and Work	Ray Dalio	4.4	966	Hardcover	17.95
63	63	"Astrophysics for People in a Hurry
        "	Neil deGrasse Tyson	4.6	3971	Hardcover	13.29
64	64	"Laugh-Out-Loud Jokes for Kids
        "	Rob Elliott	4.6	4537	Paperback	3.61
65	65	Medical Medium Liver Rescue	Anthony William	4.8	1575	Hardcover	20.99
66	66	"The Official SAT Study Guide, 2018 Edition (Official Study Guide for the New Sat)   "	The College Board	4.4	409	Paperback	23
67	67	Calm the F*ck Down	Sasha O'Hara	4.4	2027	Paperback	4.99
68	68	The Care and Keeping of You		4.8	2861	Paperback	9.73
69	69	The Outsider	Stephen King	4.4	3994	Hardcover	17.4
70	70	"Where the Crawdads Sing
        "	Delia Owens	4.8	15153	Hardcover	15.6
71	71	"Dungeons & Dragons Dungeon Master's Guide (Core Rulebook, D&D Roleplaying Game)
        "	Wizards RPG Team	4.6	1534	Hardcover	33.35
72	72	Sapiens	Yuval Noah Harari	4.5	5984	Paperback	13.56

One strongly suspects the posted file is TAB delimited but no clue about the " continuing on the next line.

Shmuel
Garnet | Level 18

I have checked the input file using noetpad++ utility.

Each book-line is terminated by CRLF characters.

When the book-line is splitted into two lines there is a LF only at the end of the line.

 

Next code reads the file correctly. You need just to correct the path to the input file.

filename booklist '/folders/myfolders/flat/top_100_books.txt';

data uiui;
	infile booklist dlm='09'x dsd firstobs=2 TERMSTR=crlf truncover;
	length VAR1 8 rank 8 book $128 author $39  ratings 8 reviews 8 type $10 Price 8;
	format VAR1 8. book $128. author $39. price 8.2 rank 8. ratings 8.2
	       reviews 8. type $10.;
   	input @1 VAR1 rank book $ author $ ratings reviews type $ price;	
	if _n_ = 18 then book = 'You Are a Badass®';
	if _n_ = 92 then author ='Brené Brown';
run;
proc print data =uiui;
run;

proc contents data=uiui;
run;
HwGoh
Calcite | Level 5
Actually, save the file into xlsx for importing on sas studio is easy than using the data step but I wanna try. Your code is worked, really appreciate your help. THANKS

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 568 views
  • 2 likes
  • 4 in conversation