Hi there,
I have a problem, my problem is that I have a open text column with data that I have to find all distincts but the date has diferences in many rows, for example:
Description
AMK 100 MG AMP CAT4000173 |
AMK 100 MG AMP REF4000173 |
AMK 500 MG AMP 2 MLREF 4000174 |
AMK 500 MG AMP 2 MLREF4000174 |
AMK 500 MGAMP 2 MLREF4000174 |
AMILASA EN SANGRE |
AMILASA EN SANGRE P |
AGUA PIRRIGAR 1000 BOTE PLASTICO PISA P REF 4000 |
AGUA PIRRIGAR 1000 BOTE PLATICO PISA P REF 40002 |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF400025 |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF4000255 |
and I need to a group all of them.
Can you show how that data should be grouped?
Generally one approach would be to "extract" the values of interest into another variable or variables and then work with the extracted values.
You haven't provided much in the way to identify what would need to be extracted or what you would expect the result to look like when finished.
I´m looking for something like this, the problem is that I have more than 60,000 of rows and I need to find all similars
Description | Description_grouped |
AMK 100 MG AMP CAT4000173 | AMK 100 MG AMP |
AMK 100 MG AMP REF4000173 | |
AMK 500 MG AMP 2 MLREF 4000174 | AMK 500 MG AMP 2 |
AMK 500 MG AMP 2 MLREF4000174 | |
AMK 500 MGAMP 2 MLREF4000174 | |
AMILASA EN SANGRE | AMILASA EN SANGRE |
AMILASA EN SANGRE P | |
AGUA PIRRIGAR 1000 BOTE PLASTICO PISA P REF 4000 | AGUA PIRRIGAR 1000 BOTE PLASTICO PISA |
AGUA PIRRIGAR 1000 BOTE PLATICO PISA P REF 40002 | |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF400025 | AGUA PIRRIGAR 500 BOTE PLASTICO PISA |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF4000255 |
Thanks.
Do you have any rules involved? Such as ignore everything after MLREF, CAT, REF or " P "?
No, I don´t have any rul jey, I it will help, but not with those that miss an space in the meddile
data WANT;
set HAE;
length GROUP $50;
re=prxparse('s/REF/""/');
POS=prxmatch(RE,DESC);
if POS=0 then do;
re=prxparse('s/CAT/""/');
POS=prxmatch(RE,DESC);
if POS=0 then do;
re=prxparse('s/sP/""/');
POS=prxmatch(RE,DESC);
end;
end;
GROUP=sbsustr(DESC,sum(POS,-1));
if scan(GROUP,-1,"")='P' then GROUP=substr(GROUP,1,length(GROUP)-1);
run;
I bet some contributors can help and make this code even better. But it does the work, for now.
this program will give you a list of the words in your long text string;
the frequency listing will show you you how many occurrences.
DATA unique(keep = word);
attrib word length = $32;*need to increase?;
drop _:; *_temp vars;
set long_text_strings;
do _i = 1 to countw(text,' ');
word = scan(text,_i);
output;
end;
run;
PROC freq data = &syslast
order = freq;
tables word /list /* noprint */
out = list_words;
run;
then make a list of the words you want to find.
make a variable for each word
run a similar loop where you indicate found in the variable
and then do a cross-tabulation.
a start, eh?
Ron Fehd search, find, xtab maven
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.