I'm having a problem reading this dataset
here's my code :
data uiui;
infile ''
dlm=',' dsd firstobs=2;
length VAR1 8 rank 8 author $39 book $128 ratings 8 reviews 8 type $10 Price 8 ;
format VAR1 8.;
format author $39.;
format book $128.;
format price 8.2;
format rank 8.;
format ratings 8.2;
format reviews 8.;
format type $10.;
input
VAR1
rank
book $
author $
ratings
reviews
type $
price;
if _n_ = 18 then book = 'You Are a Badass®';
if _n_ = 92 then author ='Brené Brown';
run;
proc print data =uiui;
run;
proc contents data=uiui;
run;
if single line observation is fine but I have no idea how to deal with multiple line observation.
Appreciate your help thks in advance
I have checked the input file using noetpad++ utility.
Each book-line is terminated by CRLF characters.
When the book-line is splitted into two lines there is a LF only at the end of the line.
Next code reads the file correctly. You need just to correct the path to the input file.
filename booklist '/folders/myfolders/flat/top_100_books.txt';
data uiui;
infile booklist dlm='09'x dsd firstobs=2 TERMSTR=crlf truncover;
length VAR1 8 rank 8 book $128 author $39 ratings 8 reviews 8 type $10 Price 8;
format VAR1 8. book $128. author $39. price 8.2 rank 8. ratings 8.2
reviews 8. type $10.;
input @1 VAR1 rank book $ author $ ratings reviews type $ price;
if _n_ = 18 then book = 'You Are a Badass®';
if _n_ = 92 then author ='Brené Brown';
run;
proc print data =uiui;
run;
proc contents data=uiui;
run;
This looks like a poor output from a pdf or something that put carriage returns in the middle of several observations. Unfortunately, I would say you're just going to have to fix your data so that each observation comes on only one line. It's short enough that you should just be able to open it up in a text editor and delete the extra line breaks.
Also, this looks like a tab-delimited file, not a comma delimited. So your delimiter should be:
dlm='09'x;
Since your example data basically does not have any commas in it, why does your infile statement include DLM=','?
If you did something like 1) open a CSV file in a spreadsheet and 2) pasted the result of that as "data" then I hope you have an original copy of the CSV from before opening with the spreadsheet, especially if did any editing of the file in the spreadsheet. Spreadsheets can sometimes seriously change values of cells.
There is very interesting pattern. Every one of the "data lines" that extends to a second line in the file starts with a " character. Making one suspect that perhaps something happened when those quotes were entered. Not everywhere " at the end of phrase but that is common.
59 59 "Brown Bear, Brown Bear, What Do You See?" Bill Martin Jr 4.8 3047 Board book 7.23 60 60 Beneath A Scarlet Sky: A Novel Mark Sullivan 4.7 25842 Paperback 11.99 61 61 What to Expect When You're Expecting Heidi Murkoff 4.4 3587 Paperback 10.77 62 62 Principles: Life and Work Ray Dalio 4.4 966 Hardcover 17.95 63 63 "Astrophysics for People in a Hurry " Neil deGrasse Tyson 4.6 3971 Hardcover 13.29 64 64 "Laugh-Out-Loud Jokes for Kids " Rob Elliott 4.6 4537 Paperback 3.61 65 65 Medical Medium Liver Rescue Anthony William 4.8 1575 Hardcover 20.99 66 66 "The Official SAT Study Guide, 2018 Edition (Official Study Guide for the New Sat) " The College Board 4.4 409 Paperback 23 67 67 Calm the F*ck Down Sasha O'Hara 4.4 2027 Paperback 4.99 68 68 The Care and Keeping of You 4.8 2861 Paperback 9.73 69 69 The Outsider Stephen King 4.4 3994 Hardcover 17.4 70 70 "Where the Crawdads Sing " Delia Owens 4.8 15153 Hardcover 15.6 71 71 "Dungeons & Dragons Dungeon Master's Guide (Core Rulebook, D&D Roleplaying Game) " Wizards RPG Team 4.6 1534 Hardcover 33.35 72 72 Sapiens Yuval Noah Harari 4.5 5984 Paperback 13.56
One strongly suspects the posted file is TAB delimited but no clue about the " continuing on the next line.
I have checked the input file using noetpad++ utility.
Each book-line is terminated by CRLF characters.
When the book-line is splitted into two lines there is a LF only at the end of the line.
Next code reads the file correctly. You need just to correct the path to the input file.
filename booklist '/folders/myfolders/flat/top_100_books.txt';
data uiui;
infile booklist dlm='09'x dsd firstobs=2 TERMSTR=crlf truncover;
length VAR1 8 rank 8 book $128 author $39 ratings 8 reviews 8 type $10 Price 8;
format VAR1 8. book $128. author $39. price 8.2 rank 8. ratings 8.2
reviews 8. type $10.;
input @1 VAR1 rank book $ author $ ratings reviews type $ price;
if _n_ = 18 then book = 'You Are a Badass®';
if _n_ = 92 then author ='Brené Brown';
run;
proc print data =uiui;
run;
proc contents data=uiui;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.