Dear Experts,
I share the sample dataset which matches my requirement for the study.
I have two datasets contains 2 columns varX varY.
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;
The above code list the alphabets individually refers to C_value.
The below code refers to the combination of alphabets, So according to alphabets I need to match the C_values on varY1.
data have2;
input varX1$;
cards;
abc/def
ghi/mno
jkl/def
;
Required Output like:
abc/def | C1/C2 |
ghi/mno | C3/C5 |
jkl/def | C4/C2 |
Kindly suggest a code to solve the task.
Thanks in advance!
Hash-object to the rescue ... maybe ...
Not sure that i understood the c0-problem at all, updated have-datesets would be useful.
data want;
if 0 then set have1;
set have2;
length varY1 $ 10;
if _n_ = 1 then do;
declare hash h(dataset: 'have1');
h.defineKey('varx');
h.defineData('vary');
h.defineDone();
end;
varY1 = '';
do i = 1 to countw(varx1, '/');
varX = scan(varX1, i, '/');
if h.find() = 0 then do;
varY1 = catx('/', varY1, varY);
end;
else do;
varY1 = catx('/', varY1, 'C0');
end;
end;
keep VarX1 VarY1;
run;
You can try this code:
data _null_;
set have1;
call symputx(varX,varY);
run;
data want;
set have2;
varY1 = catx("/",symget(scan(varX1,1,"/")),symget(scan(varX1,2,"/")));
run;
Thank you so much @ed_sas_member @gamotte
Both of your code works well.
Additionally, I required one more condition for this task.
Assume that I mentioned any new combination of 3alphabets in (have1(varX1)) which not listed in (varX(have))
How could I get the result as 'C0' in varY1.
For example :
abc/mno/uvx - C1/C5/C0
ghi/pqr/jkl - C3/C0/C4
It required because Once I found the C0 so I could add that new value in my list manually like
uvx - C6
pqr - C7
You can use the coalescec function :
proc sql noprint;
CREATE TABLE want AS
SELECT a.varX1, catx('/', coalescec(b.varY,'C0'), coalescec(c.varY,'C0'), coalescec(d.varY,'C0')) AS newVar
FROM have2 a
LEFT JOIN have1 b
ON b.varX=scan(a.varX1,1,'/')
LEFT JOIN have1 c
ON c.varX=scan(a.varX1,2,'/')
LEFT JOIN have1 d
ON d.varX=scan(a.varX1,3,'/')
ORDER BY a.varX1
;
quit;
I appreciate your response @gamotte
It results in C0 for the new value but also at the end of all observation were it not requires.
like:
abc/def - C1/C2/C0
ghi/mno - C3/C5/C0
Could you please help me to sort out the C0 at the unrequired Observations.
Much Thanks!
Here is another approach with a format generated by the dataset have1 :
data ctrl;
length label $ 11;
set have1(rename=(varX=start varY=label)) end=last;
end=start;
retain fmtname 'CN' type 'c';
output;
if last then do;
hlo='O';
label='C0';
output;
end;
run;
proc format library=work cntlin=ctrl;
run;
data want;
set have2;
length newvar $20;
do i=1 to countw(varX1,'/');
call catx('/', newVar, put(scan(varX1,i,'/'),CN.));
end;
run;
Hash-object to the rescue ... maybe ...
Not sure that i understood the c0-problem at all, updated have-datesets would be useful.
data want;
if 0 then set have1;
set have2;
length varY1 $ 10;
if _n_ = 1 then do;
declare hash h(dataset: 'have1');
h.defineKey('varx');
h.defineData('vary');
h.defineDone();
end;
varY1 = '';
do i = 1 to countw(varx1, '/');
varX = scan(varX1, i, '/');
if h.find() = 0 then do;
varY1 = catx('/', varY1, varY);
end;
else do;
varY1 = catx('/', varY1, 'C0');
end;
end;
keep VarX1 VarY1;
run;
You can generate codes for the missing combinations as follows :
data temp;
if _N_=0 then set have1; /* retrieve column formats */
set have2;
keep varX varY;
do i=1 to countw(varX1,'/');
varX=scan(varX1,i,'/');
output;
end;
run;
proc sql noprint;
CREATE TABLE new_vals AS
SELECT DISTINCT * FROM temp
WHERE varX NOT IN SELECT varX FROM have1;
SELECT max(input(substr(varY,2),best.))+1 INTO :maxindex
FROM have1;
quit;
data new_vals (drop=index);
set new_vals;
if _N_=1 then index=symgetn('maxindex');
varY=cats('C', index);
index+1;
run;
proc append base=have1 data=new_vals;
run;
Here is another solution: Use SET with KEY= in a datastep:
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;run;
data have2;
lengh varx1 $20;
input varX1;
cards;
abc/def
ghi/mno
jkl/def
abc/mno/uvx
ghi/pqr/jkl
;run;
proc sql;
create index varx on have1(varx);
quit;
data want;
set have2;
do _N_=1 to countw(varX1,'/');
varx=scan(varx1,_N_,'/');
set have1 key=varx/unique;
length varY1 $20;
if _iorc_ then do;
_error_=0;
call catx('/',varY1,'C0');
end;
else call catx('/',VarY1,VarY);
end;
keep varX1 VarY1;
run;
I have update the code using macro variables. Let me know if it works!
Best,
/* Retrieve the list of words that are mentionned in have1 but not in have1 */
/* Put each value in a macrovariable with value = "C0" */
data have2_tr (keep = VarX);
if _n_=1 then set have1 (keep=VarX);
set have2;
do i=1 to countw(varX1,"/");
VarX1_tr = scan(varX1,i,"/");
VarX = VarX1_tr;
output;
end;
run;
proc sort data=have2_tr nodupkey;
by VarX;
run;
data have2_not_in_have1;
merge have1 (in=x keep=VarX) have2_tr (in=y);
by VarX;
if x=0 and y=1;
call symputx(VarX,"C0");
run;
/* Retrieve the list of words that are mentionned in have1 */
/* Put each value in a macrovariable with the value = varY's value */
data _null_;
set have1;
call symputx(varX,varY);
run;
/* Output the results */
data want;
set have2;
varY1 = symget(scan(varX1,1));
do i = 1 to countw(varX1,"/")-1;
varY1 = catx("/",varY1,symget(scan(varX1,i+1,"/")));
end;
drop i;
run;
NB: The dataset have2_not_in_have1 will give you the list of values not referenced in have 1 and mentioned as C0 in the final dataset
Hello,
proc sql noprint;
CREATE TABLE want AS
SELECT a.varX1, cats(b.varY,'/',c.varY)
FROM have2 a
LEFT JOIN have1 b
ON b.varX=scan(a.varX1,1,'/')
LEFT JOIN have1 c
ON c.varX=scan(a.varX1,2,'/')
ORDER BY a.varX1
;
quit;
Sorry, Late to the party
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;
data have2;
input varX1$;
cards;
abc/def
ghi/mno
jkl/def
;
data want ;
if _n_=1 then do;
if 0 then set have1;
dcl hash H (dataset:'have1',ordered:'y') ;
h.definekey ("vary") ;
h.definedata ("varX","varY") ;
h.definedone () ;
dcl hiter hi('h');
end;
set have2;
length want $50;
do while(hi.next()=0);
if index(varx1, strip(varx))>0 then want=catx('/',want,varY);
end;
keep varx1 want;
run;
another Hash direct find
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;
data have2;
input varX1$;
cards;
abc/def
ghi/mno
jkl/def
;
data want ;
if _n_=1 then do;
if 0 then set have1;
dcl hash H (dataset:'have1') ;
h.definekey ("varx") ;
h.definedata ("varY") ;
h.definedone();
end;
set have2;
length want $50;
do _n_=1 to countw(varX1,'/');
_t=scan(varx1,_n_,'/');
if h.find(key:_t)=0 then want=catx('/',want,varY);
end;
keep varx1 want;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.