Solved: Word list search - Page 2

PGStats · Posted 06-03-2015 02:51 PM

Please post your new idea as a new discussion...

PG

Zachary · Posted 06-03-2015 04:23 PM

Sorry - did not see your response there...

Zachary · Posted 06-03-2015 04:12 PM

I am very close to a final solution:. I think I can take care of it with a lot of labels:

proc sql;
create table scoresL as
select CLAIMNO, COMMENTTEXT, Text4Matching, NumberForLabels
    from
    SRS_Comments500 inner join
    TextFromExcel on indexw(COMMENTTEXT, Text4Matching)>0
order by CLAIMNO, COMMENTTEXT, Text4Matching;
quit;
proc print data=scoresL noobs; run;

proc transpose data=scoresL out=FlaggedCommentsT(drop=_:) prefix=flag_;
by CLAIMNO;
id NumberForLabels;
run;
proc print data=FlaggedCommentsT noobs; run;

I just need to create a routine to label each of the numbered fields.

But, I am getting lots of errors involved with this:

The ID value, "flag_6" occurs twice in the same BY group.

I cannot figure what is happening here. Any suggestions?

gergely_batho · Posted 06-03-2015 04:33 PM

Have you tried to use the tables that are created by the Text Mining Nodes? Those create exactly the same tables (usually in long format, not transposed).

Also, if you have ideas, I would suggest posting into: "Text and Content Analytics" forum. (I wonder what text miners think about measuring pairwise correlations and selecting the best ~100 variables for predictive modeling.)

If you have technical questions (like this one), this forum is adequate.

Zachary · Posted 06-03-2015 04:46 PM

Thank you Gergely. Yes, I have tried using the tables created using the Text Mining Nodes.

The unfortunate part about the Tex and Content Analytics forum is that there is not usually a lot of traffic in there.

But I promise to keep my questions within this forum on the technical side. Thank you so much!

Zachary · Posted 06-04-2015 11:56 AM

Below is the final code that accomplishes what I needed. I have a few comments interspersed throughout here, but please let me know if you have any further questions or concerns.

And thank you to everyone who chimed in here:

proc import out = TextFromExcel datafile = "\\SAS01\sasusers\Shared\PredictiveModeling\PredictiveModelingCPM\TextToSAS.xlsx" dbms=xlsx replace;
sheet="Sheet1";
getnames=yes;
run;

data TextFromExcel;
set TextFromExcel;
Text = lowcase(Text);
rename Text = Text4Matching;
run;

* Duplicates were removed in Excel;

libname Comments "\\SAS01\sasusers\Shared\PredictiveModeling\PredictiveModelingCPM\CommentsAnalysis";

* Long version, instead of wide;
proc sql;
create table scoresL as
select CLAIMNO, COMMENTTEXT, Text4Matching, NumberForLabels
    from
    SRS_Comments500 inner join
    TextFromExcel on indexw(COMMENTTEXT, Text4Matching)>0
order by CLAIMNO, COMMENTTEXT, Text4Matching;
quit;
proc print data=scoresL noobs; run;

* Removing situations where a particular code duplicates across comments - Be sure to also remove the comment so it is not an issue now either;
proc sort data = scoresL (drop = COMMENTTEXT Text4Matching) nodupkey;
by CLAIMNO NumberForLabels;
run;

* Getting a column of 1s in here so that when the data is transposed it has the pattern of 1s for each CLAIMNO;
data scoresL;
set scoresL;
Dichotomous_Counter = 1;
run;

* But I need to reorganize this to be able to spread it out to be wide now;
proc transpose data=scoresL out=FlaggedCommentsT prefix=flag_ name=Transposed_Column;
by CLAIMNO;
id NumberForLabels;
run;
proc print data=FlaggedCommentsT noobs; run;

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Re: Word list search

Registration is open

SAS Training: Just a Click Away