BookmarkSubscribeRSS Feed
CEHA55
Calcite | Level 5

Hi there,

 

I have a problem, my problem is that I have a open text column with data that I have to find all distincts but the date has diferences in many rows, for example:

 

Description 

AMK 100 MG AMP CAT4000173
AMK 100 MG AMP REF4000173
AMK 500 MG AMP 2 MLREF 4000174
AMK 500 MG AMP 2 MLREF4000174
AMK 500 MGAMP 2 MLREF4000174
AMILASA EN SANGRE
AMILASA EN SANGRE P
AGUA PIRRIGAR 1000  BOTE PLASTICO  PISA P REF 4000
AGUA PIRRIGAR 1000  BOTE PLATICO  PISA P REF 40002
AGUA PIRRIGAR 500  BOTE PLASTICO  PISA REF400025
AGUA PIRRIGAR 500  BOTE PLASTICO  PISA REF4000255

 

and I need to a group all of them.

 

 

6 REPLIES 6
ballardw
Super User

Can you show how that data should be grouped?

Generally one approach would be to "extract" the values of interest into another variable or variables and then work with the extracted values.

You haven't provided much in the way to identify what would need to be extracted or what you would expect the result to look like when finished.

CEHA55
Calcite | Level 5

I´m looking for something like this, the problem is that I have more than 60,000 of rows and I need to find all similars 

 

 

 

Description Description_grouped
AMK 100 MG AMP CAT4000173AMK 100 MG AMP 
AMK 100 MG AMP REF4000173
AMK 500 MG AMP 2 MLREF 4000174AMK 500 MG AMP 2 
AMK 500 MG AMP 2 MLREF4000174
AMK 500 MGAMP 2 MLREF4000174
AMILASA EN SANGREAMILASA EN SANGRE
AMILASA EN SANGRE P
AGUA PIRRIGAR 1000  BOTE PLASTICO  PISA P REF 4000AGUA PIRRIGAR 1000  BOTE PLASTICO  PISA 
AGUA PIRRIGAR 1000  BOTE PLATICO  PISA P REF 40002
AGUA PIRRIGAR 500  BOTE PLASTICO  PISA REF400025AGUA PIRRIGAR 500  BOTE PLASTICO  PISA 
AGUA PIRRIGAR 500  BOTE PLASTICO  PISA REF4000255

 

Thanks.

ballardw
Super User

Do you have any rules involved? Such as ignore everything after MLREF, CAT, REF or " P "?

CEHA55
Calcite | Level 5

No, I don´t have any rul jey, I it will help, but not with those that miss an space in the meddile

ShiroAmada
Lapis Lazuli | Level 10

data WANT;

  set HAE;

  length GROUP $50;

re=prxparse('s/REF/""/');

POS=prxmatch(RE,DESC);

  if POS=0 then do;

    re=prxparse('s/CAT/""/');

    POS=prxmatch(RE,DESC);

      if POS=0 then do;

        re=prxparse('s/sP/""/');

        POS=prxmatch(RE,DESC);

      end;

  end;

GROUP=sbsustr(DESC,sum(POS,-1));

 

if scan(GROUP,-1,"")='P' then GROUP=substr(GROUP,1,length(GROUP)-1);

run;

 

I bet some contributors can help and make this code even better.  But it does the work, for now.

Ron_MacroMaven
Lapis Lazuli | Level 10

this program will give you a list of the words in your long text string;

the frequency listing will show you you how many occurrences.

 

DATA unique(keep = word);
     attrib word length = $32;*need to increase?;
     drop _:; *_temp vars;
set long_text_strings;
do _i = 1 to countw(text,' ');
   word = scan(text,_i);
   output;
   end;
run;
PROC freq data   = &syslast
          order  = freq;
          tables   word /list /* noprint */
             out = list_words;
run;

then make a list of the words you want to find.

make a variable for each word

 

run a similar loop where you indicate found in the variable

and then do a cross-tabulation.

 

a start, eh?

 

Ron Fehd search, find, xtab maven

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1802 views
  • 1 like
  • 4 in conversation