The SAS Output Delivery System and reporting techniques

lookup based on wildcard

Reply
Frequent Contributor
Posts: 78

lookup based on wildcard

I have two datasets one is Main and another is lookup where variable is Jobname ...Now the requirement is jobname can be a wildcard so example it can be ZZZ1

MAIN

ID DATASET JOBNAME

1  CONF        ZZZ1TEMP

2. CONF        XXX1TEMP

..

LOOKUP

JOBNAME

ZZZ1

I have to do lookup using the variable jobname from lookup and match (like or colon : function ) with Main datast to bring ONLY 1st variable in the output ...I have tried with SQL but that doesn't seems to work ...SET KEY is not going to work as lookup is just starting string of the jobname ...Could somebody share his thought how can we do it?

Super User
Posts: 11,343

Re: lookup based on wildcard

Does the string always start with the look up value?

What should your output look like?

What did you try that that didn't work and what was the symptom of not working: no output, error, incorrect output?

Super User
Posts: 5,518

Re: lookup based on wildcard

A single DATA step should do it.

1. Load LOOKUP into a hash table, where JOBNAME is the key.

2. Read records from MAIN, and look for a match.  More specifically ...

Take all of JOBNAME from MAIN and look for a match.  If there's no match, take all except for last character and look for a match.  If still no match, take all except the last two characters and look for a match.  Continue chopping off a character from the end of JOBNAME in MAIN until you either find a match or you run out of characters to chop off.

Good luck.

Frequent Contributor
Posts: 78

Re: lookup based on wildcard

Posted in reply to Astounding

Tried the Hash method as below it's bringing only one record as below you see there should have been two match for STMYF10C However only with volser K63045 came out instead of both M02163 so not sure why it dropped M02163 match for the Jobname ....

M02163 ,30APR2013 ,30DEC2000 ,18:35 ,NNNNDDNN ,Y ,N ,N ,600F ,600F ,C ,SGVTS00 ,MCVTSC19 ,SMSTAPE1 ,TAPMED02 ,17844313 ,ONLY_REMOTE ,STMYF10C ,CHECKCC ,dataset,9999999

K63045 ,30MAY2013 ,01JAN2005 ,12:22 ,NNNNDDNN ,Y ,N ,N ,600F ,600F ,C ,SGVTS00 ,MCVTSC19 ,SMSTAPE1 ,TAPMED02 ,86323916 ,ONLY_REMOTE ,STMYF10C ,CHECKCC ,dataset,9999999

DATA VTSREP_HASH;                                                      

  IF _N_ = 1 THEN DO;                                                  

        IF 0 THEN                                                      

                SET MERGE_FINAL(KEEP=VOLSER                            

                                     DATE_CACHE                        

                                     TIME_IN_CACHE                     

                                     CCM                               

                                     DATA_INCONSISTENT                 

                                     PROPERTIES_INCONSISTENT           

                                     VOLUME_DAMAGED                    

                                     PENDING_CATEGORY                  

                                     CURRENT_CATEGORY                  

                                     CONSISTENT_TYPE                   

                                     STORAGE_GROUP                     

                                     MANAGEMENT_CLASS                  

                                     STORAGE_CLASS                     

                                     DATA_CLASS                        

                                     SIZE                              

                                     FLAG                              

                                     DSN                               

                                     DATE) ;                           

DECLARE HASH HH(DATASET:"MERGE_FINAL");                               

RC=HH.DEFINEKEY("CJOB");                                              

RC=HH.DEFINEDATA("VOLSER","DATE_CACHE","TIME_IN_CACHE","CCM",         

               "DATA_INCONSISTENT","PROPERTIES_INCONSISTENT",          

               "VOLUME_DAMAGED","PENDING_CATEGORY","CURRENT_CATEGORY", 

               "CONSISTENT_TYPE","STORAGE_GROUP","MANAGEMENT_CLASS",   

               "STORAGE_CLASS","DATA_CLASS","SIZE","FLAG",             

               "DSN","DATE");                                          

RC=HH.DEFINEDONE();                                                    

END;                                                                  

DO UNTIL(EOF);                                                         

SET VTSREP END=EOF;                                                   

RC=HH.FIND(KEY:CJOB);                                                 

IF RC=0 THEN                                                          

DO;                                                                   

OUTPUT VTSREP_HASH;                                                   

END;                                                                  

END;                

PROC PRINT DATA=VTSREP_HASH;  

Respected Advisor
Posts: 4,935

Re: lookup based on wildcard

Apart from the requirement " bring ONLY 1st variable in the output " which I do not understand, the following query should do the basic lookup:

proc sql;

create table want as

select a.*

from Main as a inner join

Lookup as b on a.jobname eqt b.jobname;

quit;

eqt operator matches strings up to the length of the shortest.

PG

PG
Frequent Contributor
Posts: 78

Re: lookup based on wildcard

Inner join seems to bring all posible match for example my main had 1329 observation and lookup had 2 observation now in the WANT table i see :-

NOTE: TABLE WORK.WANT CREATED, WITH 2656 ROWS AND 74 COLUMNS.       

Respected Advisor
Posts: 4,173

Re: lookup based on wildcard

Actually solution is pretty cool and not a lot of us would have thought about the "eqt" operator. I assume what you observe is that multiple lookup strings match your jobnames (something you haven't provided in your sample data).

A possible approach to get around this would be to filter the result for the "best fit" (=the longest string matching).

data main;

  infile datalines truncover dsd;

  input (ID DATASET JOBNAME) (:$10.);

  datalines;

1,CONF,ZZZ1TEMP

2,CONF,ZZZ2TEMP

3,CONF,Z2TEMP

;

run;

data LOOKUP;

  input JOBNAME::$10.;

  datalines;

ZZZ1

ZZ

ABC

ZZZ1TEMPYY

;

run;

proc sql;

  create table want as

    select

      a.*,

      b.jobname as best_fit_string,

      min(length(a.jobname), length(b.jobname)) as length_match_string

      from Main as a inner join

        Lookup as b on a.jobname eqt b.jobname

      group by a.JOBNAME

      having max(length_match_string) = length_match_string

    ;

quit;

Frequent Contributor
Posts: 78

Re: lookup based on wildcard

Thanks Patrick that was very nice solution except I need it for NOT MATCH(basically exclude those job names)  I tried using NE(not equal) in that case that doesn't seems to work(i.e. still result the matching jobname provided via lookup) ...any thought?

Frequent Contributor
Posts: 78

Re: lookup based on wildcard

hold on a second guess it work ...Thanks much Patrick, verifying the results however didn't want you to start looking again unless i see some glitch and need some more help...thanks again..

Respected Advisor
Posts: 4,173

Re: lookup based on wildcard

You will still get duplicates in cases where the string (jobname) in your main table is shorter than 2 or more partially matching strings in your lookup table. You will need to make a decision which lookup record you want to select in such a case and amend the code I've posted accordingly.

If you just want to get a "match/non match" result then you could simply use a left join with the condition Pierre provided and then use a DISTINCT as done in the code below:

proc sql;

  create table want as

    select DISTINCT

      a.*,

      not missing(b.jobname) as match_flg

      from 

        Main as a

        left join

        Lookup as b

        on a.jobname eqt b.jobname

    ;

quit;

Frequent Contributor
Posts: 78

Re: lookup based on wildcard

Look like it doesn't work still :-(

Respected Advisor
Posts: 4,173

Re: lookup based on wildcard

For everything in Main with no match to the lookup table some code like below might work:

proc sql;

  create table no_match_to_lookup as

    select *

    from main

    where JOBNAME not in

      (

        select  distinct a.jobname     

        from Main as a, Lookup as b

        where a.jobname eqt b.jobname

      )

    ;

quit;

Frequent Contributor
Posts: 78

Re: lookup based on wildcard

This may work for lookup between absolute jobname between both MAIN and LOOKUP dataset however since LOOKUP dataset may have both condition(Partial Jobname and Complete Jobname) I don't see how it will work ....

In my scenario MAIN dataset has JOBNAME however via LOOKUP users can provide either Partial name OR Complete Jobname ....

Respected Advisor
Posts: 4,173

Re: lookup based on wildcard

To build the list of matches I'm using the "EQT" operator as suggested. The list will also contain matches where a user only provided a partial job name. After that it's just excluding all job names in the list with matches - and there we can use exact matching as the job name in the match list is also sourced from "main".

If the lookup table will always contain the partial job name or the full job name then instead of "EQT" we could eventually also use a "contains" operator. What's right depends on your exact requirement.

Frequent Contributor
Posts: 78

Re: lookup based on wildcard

Thanks Patrick intiall look says seems to work will try couple of option...thanks so much

Ask a Question
Discussion stats
  • 16 replies
  • 1029 views
  • 1 like
  • 6 in conversation