Hi everyone, I have scraped an article from a website. The website had its paragraphs written in a series of <p> tags, so the dataset I created has one paragraph per line.
I want to create a variable of approximately 5000 words, in a column called article, made up of the various paragraphs, but I am unable to figure out how to read multiple dataset lines into one single column.
The example dataset looks something like this:
1) Alphabet's Google has reached licensing deals with over 600 news outlets around the world and is seeing a "huge increase" in users requesting more content from specific publications as part of a new programme, it said on Wednesday
2) The update comes as big Internet service providers including Facebook have been locked in bitter disputes over fair compensation to publishers
Expected result
para = Alphabet's Google has reached licensing deals with over 600 news outlets around the world and is seeing a "huge increase" in users requesting more content from specific publications as part of a new programme, it said on Wednesday. The update comes as big Internet service providers including Facebook have been locked in bitter disputes over fair compensation to publishers.
data paragraphs;
input story & $800.;
datalines;
Alphabet's Google has reached licensing deals with over 600 news outlets around the world and is seeing a "huge increase" in users requesting more content from specific publications as part of a new programme, it said on Wednesday
The update comes as big Internet service providers including Facebook have been locked in bitter disputes over fair compensation to publishers.
;
Here's one way, I believe. There are certainly other solutions that are crafty and innovative, but I think this one is reasonably intuitive. I'd imagine you're going to need to give these some grouping ID or sequence ID if you're dealing with a lot of <p> tags.
data paragraphs;
input story & $800.;
datalines;
Alphabet's Google has reached licensing deals with over 600 news outlets around the world and is seeing a "huge increase" in users requesting more content from specific publications as part of a new programme, it said on Wednesday
The update comes as big Internet service providers including Facebook have been locked in bitter disputes over fair compensation to publishers.
;
proc transpose data = paragraphs out = paragraphs_t;
var story;
run;
data paragraphs_want;
length want_catx $800; /* Set the length explicitly. Otherwise the variable receiving the CATX result defaults to $200, which is too short here. */
set paragraphs_t;
want_catx = catx(". ", col1, col2);
run;
Obs    want_catx
1      Alphabet's Google has reached licensing deals with over 600 news outlets around the world and is seeing a "huge increase" in users requesting more content from specific publications as part of a new programme, it said on Wednesday. The update comes as big Internet service providers including Facebook have been locked in bitter disputes over fair compensation to publishers.
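If the number of paragraphs isn't known in advance, listing col1, col2, ... by hand gets awkward. One possible alternative is to skip the transpose and accumulate the text in a retained variable. This is only a rough sketch: the names article and article_want are made up for illustration, the two datalines are shortened versions of the sample paragraphs, and the punctuation check is my own assumption about how you'd want sentences joined.

```sas
/* Sample input: one paragraph per line (shortened, illustrative text). */
data paragraphs;
  infile datalines truncover;
  input story $800.;
datalines;
Alphabet's Google has reached licensing deals with over 600 news outlets
The update comes amid disputes over fair compensation to publishers.
;

/* Build one long value by appending each paragraph to a retained variable. */
data article_want (keep = article);
  length article $32767;          /* 32767 is the maximum length of a character variable */
  retain article;                 /* keep the running text from one row to the next */
  set paragraphs end = last;      /* LAST is 1 on the final observation */

  /* Assumption: add a period when a paragraph has no closing punctuation. */
  if substr(story, length(story), 1) not in ('.', '!', '?') then
    story = cats(story, '.');

  article = catx(' ', article, story);   /* CATX skips the blank initial value */

  if last then output;            /* write a single row holding the full text */
run;
```

Keep in mind that a character variable tops out at 32,767 bytes, so 5000 words may be close to the limit depending on word length. If you do add a grouping ID as suggested above, the same idea should work with a BY statement: reset article to blank on first.<id> and output on last.<id>.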