BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bondtk
Quartz | Level 8

 

Using SAS studio, 

my system is windows 10, 64 bit

 

raw txt file:

 

Ranch,1250,2,1,sheppard Avenue,"64,000"
Split,1190,1,1,Rand Street,"65,850"
Condo,1400,2,1.5,Market Street,"80,050"
TwoStory,1810,4 3,Garris Street,"107,250"
Ranch,1500,3,3,Kemble Avenue,"86,650"
Split,1615,4,3,West Drive,"94,450"
Split,1305,3,1.5,Graham Avenue,"73,650"

 

 

My code to import raw txt file:

 

data orion.locationrawfile;
infile "/folders/myfolders/pg2/location.txt" dlm=',' dsd ;
length price $15;
length Address $20;
input Style $  Zip Bedroom $ Baths $ Address Price $ ;

 

proc print data = orion.locationrawfile noobs;
var Style Zip Bedroom Baths Address Price;
run;

 

Output error:

 

First three observations has no $ dollar sign with price variable.

4th observation has bedroom, bath as dot , price comes in the address field and Ranch is listed as price.

6th and 7th observation have no dollar signs, 5th observation is missing.

 

Attaching output as pdf file.

1 ACCEPTED SOLUTION

Accepted Solutions
jklaverstijn
Rhodochrosite | Level 12

The short answer is: your approach is the more efficient.

 

The longer one is: it depends. It is also different. Depending on your requirements you may or may not want to retain the value of Price unchanged from how it was provided in the csv. The source is, judging by the enclosing quotation marks, character and you convert that to a number while reading. In the (no matter how unlikely) case this field would contain something other that a number you get a missing and you do not record the original value. I have worked in many projects where that would be regarded as a problem. So ultimately the choice between your approach and mine would not only be one of efficiency but also based on existing guidelines and business decisions.

 

My choices and experiences all come from a practice where CVS files are thrown at you and there is little or no saying over their structure. As CSV files lack metadata on the columns inside other occasionally columns names, they are a constant source of headaches and sub-optimal solutions.

 

With regards to the datatype of Bedrooms and Baths I have merely copied your code and didn't give it any thought. As these are numeric in the source (no quotes around them) they should also be numeric in your data. So read them as numeric.

 

Wow lots to say about a seven line input file. I hope you find some value in it.

 

Regards,

- Jan

View solution in original post

9 REPLIES 9
jklaverstijn
Rhodochrosite | Level 12

Your fourth line misses a comma between the "bathroom" and 'baths' columns. As a result the input statement gets out of sync and every effect you see is a result of that.

 

The dollar sign will only be added if you apply the proper format. For this you must convert the variable Price to numeric The input format with the quotes around it dictate it to be a string. Use the input function to convert and the DOLLAR format to present the data.

 

Hope this helps,

- Jan.

jklaverstijn
Rhodochrosite | Level 12
My sample code to do this:

data orion.locationrawfile;
infile "raw.txt" dlm=',' dsd ;
length price $15;
length price_num 8;
length Address $20;
input Style $ Zip Bedroom $ Baths $ Address Price $;
price_num=input(price, comma18.);
run;


proc print data = orion.locationrawfile noobs;
var Style Zip Bedroom Baths Address Price_num;
format price_num dollar14.2;
run;

bondtk
Quartz | Level 8

Hi

 

Thanks for your help, your code worked perfectly.  I just need to ask for my learning as I am new to SAS.

 

I have used this code to get the exact outcome , instead of creating new variable Price_num , I used informat for price.

as my code is:

 

data orion.locationrawfile;

infile "/folders/myfolders/pg2/location.txt" dlm=',' dsd ;

length Address $20;

input Style $  Zip Bedroom $  Baths $ Address  Price  ;

informat price dollar8.;

 

proc print data = orion.locationrawfile noobs;

var Style Zip Bedroom Baths Address Price;

format price dollar8.;

run;

 

Please advise which is the most efficient way in terms of processing or handling large data.

 

Also noticed you have used $ sign with Bedroom and Baths, so do we take them as character variable,  please advise

as it will really help in understanding these basic concepts.

 

 

jklaverstijn
Rhodochrosite | Level 12

The short answer is: your approach is the more efficient.

 

The longer one is: it depends. It is also different. Depending on your requirements you may or may not want to retain the value of Price unchanged from how it was provided in the csv. The source is, judging by the enclosing quotation marks, character and you convert that to a number while reading. In the (no matter how unlikely) case this field would contain something other that a number you get a missing and you do not record the original value. I have worked in many projects where that would be regarded as a problem. So ultimately the choice between your approach and mine would not only be one of efficiency but also based on existing guidelines and business decisions.

 

My choices and experiences all come from a practice where CVS files are thrown at you and there is little or no saying over their structure. As CSV files lack metadata on the columns inside other occasionally columns names, they are a constant source of headaches and sub-optimal solutions.

 

With regards to the datatype of Bedrooms and Baths I have merely copied your code and didn't give it any thought. As these are numeric in the source (no quotes around them) they should also be numeric in your data. So read them as numeric.

 

Wow lots to say about a seven line input file. I hope you find some value in it.

 

Regards,

- Jan

Tom
Super User Tom
Super User

@jan

Actually the quotes in the CSV text just indicate that the value contains the delimiter. The quotes are there to prevent the comma from signalling a new value on the line. Normal rules for CSV files (and certainly the rules that SAS uses) only add the quotes when they are needed, they are NOT added to all text.  For example try converting the SASHELP.CLASS dataset.

 

285  data _null_;
286    set sashelp.class (obs=3);
287    file log dsd ;
288    put (_all_) (+0);
289  run;

Alfred,M,14,69,112.5
Alice,F,13,56.5,84
Barbara,F,13,65.3,98

 

Tom
Super User Tom
Super User

Efficiency can be tricky. 

 

Reading the PRICE column directly as a number will be slightly more efficient than reading it into a character variable and then converting it to a number. 

 

Also not keeping two variables for the PRICE column will save space in the dataset. But then if some of the values are invalid numbers you will not have any record in your dataset of the actual text that was in the source file.

ballardw
Super User

You may want to consider how long your informat is. Your example works with your data but if you have a value like "1,234,567" it will generate an "invalid data for Price in line xx" because it needs to read 9 characters.

 

Always consider the largest likely value that may be encountered for your data. Since this looks like real estate you might want dollar10. or if you get into commercial values even more digits.

bondtk
Quartz | Level 8
Thanks guys for your help , really appreciate it.
bondtk
Quartz | Level 8
Thanks Jan for your help

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1245 views
  • 3 likes
  • 4 in conversation