Reading in difficult raw data

Occasional Contributor
Posts: 10

Hi,

I have some data in an Excel csv file (attached) which I scraped off the web. The scraping software captures everything on the page for each camera: the overview section, the features section, and the specifications section. I am only interested in the specifications.

My goal is to eventually create a separate variable for each specification: for example, digital zoom, which equals 10 for camera model X, effective pixels = 15, and so on. Is it possible to do this in SAS? For example, can I search for a specific phrase such as "digital zoom" and extract the next numeric value?

Anyway, I think my first step is to combine all the columns into one, as the software has dumped the data into different columns. Then I need to delete everything before the word "specifications", which marks the start of the specifications section, and everything after the phrase "display languages", which marks its end.

Then I can begin the extraction process. I would appreciate it if you could let me know what SAS code to use. If you have any suggestions for a better approach, please let me know.

Note: the scraping software uses "|" as its delimiter, and only to separate the model name from the rest of the data.

Look forward to hearing from you,

Thanks very much in advance,

Kelly

Respected Advisor
Posts: 3,777

Re: Reading in difficult raw data

The file is not comma delimited. There are only two fields, ModelName and specs, and they are pipe delimited.
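As a rough sketch, reading two pipe-delimited fields could look something like the step below. The file path, the data set name cameras, and the variable lengths are assumptions for illustration; I have not run this against the attached file. If the file has a header row, FIRSTOBS=2 on the INFILE statement would skip it.

```sas
/* Sketch only: the path, data set name, and lengths are assumed, not taken from the attachment. */
data cameras;
    /* DLM='|' makes the pipe the only delimiter, so blanks and commas stay inside the fields. */
    infile 'C:\temp\cameras.csv' dlm='|' truncover lrecl=32767;
    /* A character variable holds at most 32,767 bytes; longer text would be truncated. */
    length ModelName $100 specs $32000;
    input ModelName specs;
run;
```

Because there is only one real delimiter per record, all of the scraped text lands in specs and there are no columns left to combine.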

You can use the power of the INPUT statement and/or SAS character functions to ferret out the data you seek.
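For instance, here is one possible sketch using FIND and SUBSTR to keep only the specifications section, and Perl regular expression functions (PRXPARSE, PRXMATCH, PRXPOSN) to pull the first number after a keyword. It assumes the file has already been read into a data set, hypothetically named cameras, with character variables ModelName and specs. The output names (camera_specs, spec_text, digital_zoom, effective_pixels) are made up for the example, and the patterns assume the number follows the keyword directly, which may not hold for every model.

```sas
/* Sketch only: assumes data set CAMERAS with character variables ModelName and specs. */
data camera_specs;
    set cameras;
    length spec_text $32000;

    /* Case-insensitive search; FIND returns 0 when the text is absent. */
    start = find(specs, 'specifications', 'i');
    stop  = find(specs, 'display languages', 'i');

    /* Keep the text from "specifications" through "display languages". */
    if start > 0 and stop > start then
        spec_text = substr(specs, start, stop - start + length('display languages'));
    else if start > 0 then
        spec_text = substr(specs, start);

    /* Compile each pattern once; group 1 captures the first number after the keyword. */
    retain rx_zoom rx_pix;
    if _n_ = 1 then do;
        rx_zoom = prxparse('/digital zoom\D*(\d+(?:\.\d+)?)/i');
        rx_pix  = prxparse('/effective pixels\D*(\d+(?:\.\d+)?)/i');
    end;

    /* The variables stay missing when a keyword is not found. */
    if prxmatch(rx_zoom, spec_text) then
        digital_zoom = input(prxposn(rx_zoom, 1, spec_text), best32.);
    if prxmatch(rx_pix, spec_text) then
        effective_pixels = input(prxposn(rx_pix, 1, spec_text), best32.);

    drop start stop rx_zoom rx_pix;
run;
```

Note that FIND returns the first occurrence, so if the word "specifications" also appears in the overview or features text, the start position will be too early and a more specific marker would be needed.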
