I have two datasets one is Main and another is lookup where variable is Jobname ...Now the requirement is jobname can be a wildcard so example it can be ZZZ1
MAIN
ID DATASET JOBNAME
1 CONF ZZZ1TEMP
2. CONF XXX1TEMP
..
LOOKUP
JOBNAME
ZZZ1
I have to do lookup using the variable jobname from lookup and match (like or colon : function ) with Main datast to bring ONLY 1st variable in the output ...I have tried with SQL but that doesn't seems to work ...SET KEY is not going to work as lookup is just starting string of the jobname ...Could somebody share his thought how can we do it?
Does the string always start with the look up value?
What should your output look like?
What did you try that that didn't work and what was the symptom of not working: no output, error, incorrect output?
A single DATA step should do it.
1. Load LOOKUP into a hash table, where JOBNAME is the key.
2. Read records from MAIN, and look for a match. More specifically ...
Take all of JOBNAME from MAIN and look for a match. If there's no match, take all except for last character and look for a match. If still no match, take all except the last two characters and look for a match. Continue chopping off a character from the end of JOBNAME in MAIN until you either find a match or you run out of characters to chop off.
Good luck.
Tried the Hash method as below it's bringing only one record as below you see there should have been two match for STMYF10C However only with volser K63045 came out instead of both M02163 so not sure why it dropped M02163 match for the Jobname ....
M02163 ,30APR2013 ,30DEC2000 ,18:35 ,NNNNDDNN ,Y ,N ,N ,600F ,600F ,C ,SGVTS00 ,MCVTSC19 ,SMSTAPE1 ,TAPMED02 ,17844313 ,ONLY_REMOTE ,STMYF10C ,CHECKCC ,dataset,9999999
K63045 ,30MAY2013 ,01JAN2005 ,12:22 ,NNNNDDNN ,Y ,N ,N ,600F ,600F ,C ,SGVTS00 ,MCVTSC19 ,SMSTAPE1 ,TAPMED02 ,86323916 ,ONLY_REMOTE ,STMYF10C ,CHECKCC ,dataset,9999999
DATA VTSREP_HASH;
IF _N_ = 1 THEN DO;
IF 0 THEN
SET MERGE_FINAL(KEEP=VOLSER
DATE_CACHE
TIME_IN_CACHE
CCM
DATA_INCONSISTENT
PROPERTIES_INCONSISTENT
VOLUME_DAMAGED
PENDING_CATEGORY
CURRENT_CATEGORY
CONSISTENT_TYPE
STORAGE_GROUP
MANAGEMENT_CLASS
STORAGE_CLASS
DATA_CLASS
SIZE
FLAG
DSN
DATE) ;
DECLARE HASH HH(DATASET:"MERGE_FINAL");
RC=HH.DEFINEKEY("CJOB");
RC=HH.DEFINEDATA("VOLSER","DATE_CACHE","TIME_IN_CACHE","CCM",
"DATA_INCONSISTENT","PROPERTIES_INCONSISTENT",
"VOLUME_DAMAGED","PENDING_CATEGORY","CURRENT_CATEGORY",
"CONSISTENT_TYPE","STORAGE_GROUP","MANAGEMENT_CLASS",
"STORAGE_CLASS","DATA_CLASS","SIZE","FLAG",
"DSN","DATE");
RC=HH.DEFINEDONE();
END;
DO UNTIL(EOF);
SET VTSREP END=EOF;
RC=HH.FIND(KEY:CJOB);
IF RC=0 THEN
DO;
OUTPUT VTSREP_HASH;
END;
END;
PROC PRINT DATA=VTSREP_HASH;
Apart from the requirement " bring ONLY 1st variable in the output " which I do not understand, the following query should do the basic lookup:
proc sql;
create table want as
select a.*
from Main as a inner join
Lookup as b on a.jobname eqt b.jobname;
quit;
eqt operator matches strings up to the length of the shortest.
PG
Inner join seems to bring all posible match for example my main had 1329 observation and lookup had 2 observation now in the WANT table i see :-
NOTE: TABLE WORK.WANT CREATED, WITH 2656 ROWS AND 74 COLUMNS.
Actually solution is pretty cool and not a lot of us would have thought about the "eqt" operator. I assume what you observe is that multiple lookup strings match your jobnames (something you haven't provided in your sample data).
A possible approach to get around this would be to filter the result for the "best fit" (=the longest string matching).
data main;
infile datalines truncover dsd;
input (ID DATASET JOBNAME) (:$10.);
datalines;
1,CONF,ZZZ1TEMP
2,CONF,ZZZ2TEMP
3,CONF,Z2TEMP
;
run;
data LOOKUP;
input JOBNAME::$10.;
datalines;
ZZZ1
ZZ
ABC
ZZZ1TEMPYY
;
run;
proc sql;
create table want as
select
a.*,
b.jobname as best_fit_string,
min(length(a.jobname), length(b.jobname)) as length_match_string
from Main as a inner join
Lookup as b on a.jobname eqt b.jobname
group by a.JOBNAME
having max(length_match_string) = length_match_string
;
quit;
Thanks Patrick that was very nice solution except I need it for NOT MATCH(basically exclude those job names) I tried using NE(not equal) in that case that doesn't seems to work(i.e. still result the matching jobname provided via lookup) ...any thought?
hold on a second guess it work ...Thanks much Patrick, verifying the results however didn't want you to start looking again unless i see some glitch and need some more help...thanks again..
You will still get duplicates in cases where the string (jobname) in your main table is shorter than 2 or more partially matching strings in your lookup table. You will need to make a decision which lookup record you want to select in such a case and amend the code I've posted accordingly.
If you just want to get a "match/non match" result then you could simply use a left join with the condition Pierre provided and then use a DISTINCT as done in the code below:
proc sql;
create table want as
select DISTINCT
a.*,
not missing(b.jobname) as match_flg
from
Main as a
left join
Lookup as b
on a.jobname eqt b.jobname
;
quit;
Look like it doesn't work still 😞
For everything in Main with no match to the lookup table some code like below might work:
proc sql;
create table no_match_to_lookup as
select *
from main
where JOBNAME not in
(
select distinct a.jobname
from Main as a, Lookup as b
where a.jobname eqt b.jobname
)
;
quit;
This may work for lookup between absolute jobname between both MAIN and LOOKUP dataset however since LOOKUP dataset may have both condition(Partial Jobname and Complete Jobname) I don't see how it will work ....
In my scenario MAIN dataset has JOBNAME however via LOOKUP users can provide either Partial name OR Complete Jobname ....
To build the list of matches I'm using the "EQT" operator as suggested. The list will also contain matches where a user only provided a partial job name. After that it's just excluding all job names in the list with matches - and there we can use exact matching as the job name in the match list is also sourced from "main".
If the lookup table will always contain the partial job name or the full job name then instead of "EQT" we could eventually also use a "contains" operator. What's right depends on your exact requirement.
Thanks Patrick intiall look says seems to work will try couple of option...thanks so much
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.