Please post your new idea as a new discussion...
PG
Sorry - did not see your response there...
I am very close to a final solution:. I think I can take care of it with a lot of labels:
proc sql;
create table scoresL as
select CLAIMNO, COMMENTTEXT, Text4Matching, NumberForLabels
from
SRS_Comments500 inner join
TextFromExcel on indexw(COMMENTTEXT, Text4Matching)>0
order by CLAIMNO, COMMENTTEXT, Text4Matching;
quit;
proc print data=scoresL noobs; run;
proc transpose data=scoresL out=FlaggedCommentsT(drop=_:) prefix=flag_;
by CLAIMNO;
id NumberForLabels;
run;
proc print data=FlaggedCommentsT noobs; run;
I just need to create a routine to label each of the numbered fields.
But, I am getting lots of errors involved with this:
The ID value, "flag_6" occurs twice in the same BY group.
I cannot figure what is happening here. Any suggestions?
Have you tried to use the tables that are created by the Text Mining Nodes? Those create exactly the same tables (usually in long format, not transposed).
Also, if you have ideas, I would suggest posting into: "Text and Content Analytics" forum. (I wonder what text miners think about measuring pairwise correlations and selecting the best ~100 variables for predictive modeling.)
If you have technical questions (like this one), this forum is adequate.
Thank you Gergely. Yes, I have tried using the tables created using the Text Mining Nodes.
The unfortunate part about the Tex and Content Analytics forum is that there is not usually a lot of traffic in there.
But I promise to keep my questions within this forum on the technical side. Thank you so much!
Below is the final code that accomplishes what I needed. I have a few comments interspersed throughout here, but please let me know if you have any further questions or concerns.
And thank you to everyone who chimed in here:
proc import out = TextFromExcel datafile = "\\SAS01\sasusers\Shared\PredictiveModeling\PredictiveModelingCPM\TextToSAS.xlsx" dbms=xlsx replace;
sheet="Sheet1";
getnames=yes;
run;
data TextFromExcel;
set TextFromExcel;
Text = lowcase(Text);
rename Text = Text4Matching;
run;
* Duplicates were removed in Excel;
libname Comments "\\SAS01\sasusers\Shared\PredictiveModeling\PredictiveModelingCPM\CommentsAnalysis";
* Long version, instead of wide;
proc sql;
create table scoresL as
select CLAIMNO, COMMENTTEXT, Text4Matching, NumberForLabels
from
SRS_Comments500 inner join
TextFromExcel on indexw(COMMENTTEXT, Text4Matching)>0
order by CLAIMNO, COMMENTTEXT, Text4Matching;
quit;
proc print data=scoresL noobs; run;
* Removing situations where a particular code duplicates across comments - Be sure to also remove the comment so it is not an issue now either;
proc sort data = scoresL (drop = COMMENTTEXT Text4Matching) nodupkey;
by CLAIMNO NumberForLabels;
run;
* Getting a column of 1s in here so that when the data is transposed it has the pattern of 1s for each CLAIMNO;
data scoresL;
set scoresL;
Dichotomous_Counter = 1;
run;
* But I need to reorganize this to be able to spread it out to be wide now;
proc transpose data=scoresL out=FlaggedCommentsT prefix=flag_ name=Transposed_Column;
by CLAIMNO;
id NumberForLabels;
run;
proc print data=FlaggedCommentsT noobs; run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.