Re: Compare values in the same column

CEHA55 · Posted 09-07-2017 03:25 PM

Hi there,

I have a problem, my problem is that I have a open text column with data that I have to find all distincts but the date has diferences in many rows, for example:

Description

AMK 100 MG AMP CAT4000173

AMK 100 MG AMP REF4000173

AMK 500 MG AMP 2 MLREF 4000174

AMK 500 MG AMP 2 MLREF4000174

AMK 500 MGAMP 2 MLREF4000174

AMILASA EN SANGRE

AMILASA EN SANGRE P

AGUA PIRRIGAR 1000 BOTE PLASTICO PISA P REF 4000

AGUA PIRRIGAR 1000 BOTE PLATICO PISA P REF 40002

AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF400025

AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF4000255

and I need to a group all of them.

ballardw · Posted 09-07-2017 03:29 PM

Can you show how that data should be grouped?

Generally one approach would be to "extract" the values of interest into another variable or variables and then work with the extracted values.

You haven't provided much in the way to identify what would need to be extracted or what you would expect the result to look like when finished.

CEHA55 · Posted 09-07-2017 03:39 PM

I´m looking for something like this, the problem is that I have more than 60,000 of rows and I need to find all similars

Description	Description_grouped
AMK 100 MG AMP CAT4000173	AMK 100 MG AMP
AMK 100 MG AMP REF4000173
AMK 500 MG AMP 2 MLREF 4000174	AMK 500 MG AMP 2
AMK 500 MG AMP 2 MLREF4000174
AMK 500 MGAMP 2 MLREF4000174
AMILASA EN SANGRE	AMILASA EN SANGRE
AMILASA EN SANGRE P
AGUA PIRRIGAR 1000 BOTE PLASTICO PISA P REF 4000	AGUA PIRRIGAR 1000 BOTE PLASTICO PISA
AGUA PIRRIGAR 1000 BOTE PLATICO PISA P REF 40002
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF400025	AGUA PIRRIGAR 500 BOTE PLASTICO PISA
AGUA PIRRIGAR 500 BOTE PLASTICO PISA REF4000255

Thanks.

ballardw · Posted 09-07-2017 04:00 PM

Do you have any rules involved? Such as ignore everything after MLREF, CAT, REF or " P "?

CEHA55 · Posted 09-07-2017 05:12 PM

No, I don´t have any rul jey, I it will help, but not with those that miss an space in the meddile

ShiroAmada · Posted 09-08-2017 01:50 AM

data WANT;

set HAE;

length GROUP $50;

re=prxparse('s/REF/""/');

POS=prxmatch(RE,DESC);

if POS=0 then do;

re=prxparse('s/CAT/""/');

POS=prxmatch(RE,DESC);

if POS=0 then do;

re=prxparse('s/sP/""/');

POS=prxmatch(RE,DESC);

end;

GROUP=sbsustr(DESC,sum(POS,-1));

if scan(GROUP,-1,"")='P' then GROUP=substr(GROUP,1,length(GROUP)-1);

run;

I bet some contributors can help and make this code even better. But it does the work, for now.

Ron_MacroMaven · Posted 09-07-2017 03:39 PM

this program will give you a list of the words in your long text string;

the frequency listing will show you you how many occurrences.

DATA unique(keep = word);
     attrib word length = $32;*need to increase?;
     drop _:; *_temp vars;
set long_text_strings;
do _i = 1 to countw(text,' ');
   word = scan(text,_i);
   output;
   end;
run;
PROC freq data   = &syslast
          order  = freq;
          tables   word /list /* noprint */
             out = list_words;
run;

then make a list of the words you want to find.

make a variable for each word

run a similar loop where you indicate found in the variable

and then do a cross-tabulation.

a start, eh?

Ron Fehd search, find, xtab maven

Classroom Training Available!