Hi there,
I have a problem, my problem is that I have a open text column with data that I have to find all distincts but the date has diferences in many rows, for example:
Description
AMK 100 MG AMP CAT4000173 |
AMK 100 MG AMP REF4000173 |
AMK 500 MG AMP 2 MLREF 4000174 |
AMK 500 MG AMP 2 MLREF4000174 |
AMK 500 MGAMP 2 MLREF4000174 |
AMILASA EN SANGRE |
AMILASA EN SANGRE P |
AGUA PIRRIGAR 1000 BOTE PLASTICO PISA P REF 4000 |
AGUA PIRRIGAR 1000 BOTE PLATICO PISA P REF 40002 |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF400025 |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF4000255 |
and I need to a group all of them.
Can you show how that data should be grouped?
Generally one approach would be to "extract" the values of interest into another variable or variables and then work with the extracted values.
You haven't provided much in the way to identify what would need to be extracted or what you would expect the result to look like when finished.
I´m looking for something like this, the problem is that I have more than 60,000 of rows and I need to find all similars
Description | Description_grouped |
AMK 100 MG AMP CAT4000173 | AMK 100 MG AMP |
AMK 100 MG AMP REF4000173 | |
AMK 500 MG AMP 2 MLREF 4000174 | AMK 500 MG AMP 2 |
AMK 500 MG AMP 2 MLREF4000174 | |
AMK 500 MGAMP 2 MLREF4000174 | |
AMILASA EN SANGRE | AMILASA EN SANGRE |
AMILASA EN SANGRE P | |
AGUA PIRRIGAR 1000 BOTE PLASTICO PISA P REF 4000 | AGUA PIRRIGAR 1000 BOTE PLASTICO PISA |
AGUA PIRRIGAR 1000 BOTE PLATICO PISA P REF 40002 | |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF400025 | AGUA PIRRIGAR 500 BOTE PLASTICO PISA |
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF4000255 |
Thanks.
Do you have any rules involved? Such as ignore everything after MLREF, CAT, REF or " P "?
No, I don´t have any rul jey, I it will help, but not with those that miss an space in the meddile
data WANT;
set HAE;
length GROUP $50;
re=prxparse('s/REF/""/');
POS=prxmatch(RE,DESC);
if POS=0 then do;
re=prxparse('s/CAT/""/');
POS=prxmatch(RE,DESC);
if POS=0 then do;
re=prxparse('s/sP/""/');
POS=prxmatch(RE,DESC);
end;
end;
GROUP=sbsustr(DESC,sum(POS,-1));
if scan(GROUP,-1,"")='P' then GROUP=substr(GROUP,1,length(GROUP)-1);
run;
I bet some contributors can help and make this code even better. But it does the work, for now.
this program will give you a list of the words in your long text string;
the frequency listing will show you you how many occurrences.
DATA unique(keep = word);
attrib word length = $32;*need to increase?;
drop _:; *_temp vars;
set long_text_strings;
do _i = 1 to countw(text,' ');
word = scan(text,_i);
output;
end;
run;
PROC freq data = &syslast
order = freq;
tables word /list /* noprint */
out = list_words;
run;
then make a list of the words you want to find.
make a variable for each word
run a similar loop where you indicate found in the variable
and then do a cross-tabulation.
a start, eh?
Ron Fehd search, find, xtab maven
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.