Dear Experts,
I share the sample dataset which matches my requirement for the study.
I have two datasets contains 2 columns varX varY.
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;
The above code list the alphabets individually refers to C_value.
The below code refers to the combination of alphabets, So according to alphabets I need to match the C_values on varY1.
data have2;
input varX1$;
cards;
abc/def
ghi/mno
jkl/def
;
Required Output like:
abc/def | C1/C2 |
ghi/mno | C3/C5 |
jkl/def | C4/C2 |
Kindly suggest a code to solve the task.
Thanks in advance!
Hash-object to the rescue ... maybe ...
Not sure that i understood the c0-problem at all, updated have-datesets would be useful.
data want;
if 0 then set have1;
set have2;
length varY1 $ 10;
if _n_ = 1 then do;
declare hash h(dataset: 'have1');
h.defineKey('varx');
h.defineData('vary');
h.defineDone();
end;
varY1 = '';
do i = 1 to countw(varx1, '/');
varX = scan(varX1, i, '/');
if h.find() = 0 then do;
varY1 = catx('/', varY1, varY);
end;
else do;
varY1 = catx('/', varY1, 'C0');
end;
end;
keep VarX1 VarY1;
run;
You can try this code:
data _null_;
set have1;
call symputx(varX,varY);
run;
data want;
set have2;
varY1 = catx("/",symget(scan(varX1,1,"/")),symget(scan(varX1,2,"/")));
run;
Thank you so much @ed_sas_member @gamotte
Both of your code works well.
Additionally, I required one more condition for this task.
Assume that I mentioned any new combination of 3alphabets in (have1(varX1)) which not listed in (varX(have))
How could I get the result as 'C0' in varY1.
For example :
abc/mno/uvx - C1/C5/C0
ghi/pqr/jkl - C3/C0/C4
It required because Once I found the C0 so I could add that new value in my list manually like
uvx - C6
pqr - C7
You can use the coalescec function :
proc sql noprint;
CREATE TABLE want AS
SELECT a.varX1, catx('/', coalescec(b.varY,'C0'), coalescec(c.varY,'C0'), coalescec(d.varY,'C0')) AS newVar
FROM have2 a
LEFT JOIN have1 b
ON b.varX=scan(a.varX1,1,'/')
LEFT JOIN have1 c
ON c.varX=scan(a.varX1,2,'/')
LEFT JOIN have1 d
ON d.varX=scan(a.varX1,3,'/')
ORDER BY a.varX1
;
quit;
I appreciate your response @gamotte
It results in C0 for the new value but also at the end of all observation were it not requires.
like:
abc/def - C1/C2/C0
ghi/mno - C3/C5/C0
Could you please help me to sort out the C0 at the unrequired Observations.
Much Thanks!
Here is another approach with a format generated by the dataset have1 :
data ctrl;
length label $ 11;
set have1(rename=(varX=start varY=label)) end=last;
end=start;
retain fmtname 'CN' type 'c';
output;
if last then do;
hlo='O';
label='C0';
output;
end;
run;
proc format library=work cntlin=ctrl;
run;
data want;
set have2;
length newvar $20;
do i=1 to countw(varX1,'/');
call catx('/', newVar, put(scan(varX1,i,'/'),CN.));
end;
run;
Hash-object to the rescue ... maybe ...
Not sure that i understood the c0-problem at all, updated have-datesets would be useful.
data want;
if 0 then set have1;
set have2;
length varY1 $ 10;
if _n_ = 1 then do;
declare hash h(dataset: 'have1');
h.defineKey('varx');
h.defineData('vary');
h.defineDone();
end;
varY1 = '';
do i = 1 to countw(varx1, '/');
varX = scan(varX1, i, '/');
if h.find() = 0 then do;
varY1 = catx('/', varY1, varY);
end;
else do;
varY1 = catx('/', varY1, 'C0');
end;
end;
keep VarX1 VarY1;
run;
You can generate codes for the missing combinations as follows :
data temp;
if _N_=0 then set have1; /* retrieve column formats */
set have2;
keep varX varY;
do i=1 to countw(varX1,'/');
varX=scan(varX1,i,'/');
output;
end;
run;
proc sql noprint;
CREATE TABLE new_vals AS
SELECT DISTINCT * FROM temp
WHERE varX NOT IN SELECT varX FROM have1;
SELECT max(input(substr(varY,2),best.))+1 INTO :maxindex
FROM have1;
quit;
data new_vals (drop=index);
set new_vals;
if _N_=1 then index=symgetn('maxindex');
varY=cats('C', index);
index+1;
run;
proc append base=have1 data=new_vals;
run;
Here is another solution: Use SET with KEY= in a datastep:
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;run;
data have2;
lengh varx1 $20;
input varX1;
cards;
abc/def
ghi/mno
jkl/def
abc/mno/uvx
ghi/pqr/jkl
;run;
proc sql;
create index varx on have1(varx);
quit;
data want;
set have2;
do _N_=1 to countw(varX1,'/');
varx=scan(varx1,_N_,'/');
set have1 key=varx/unique;
length varY1 $20;
if _iorc_ then do;
_error_=0;
call catx('/',varY1,'C0');
end;
else call catx('/',VarY1,VarY);
end;
keep varX1 VarY1;
run;
I have update the code using macro variables. Let me know if it works!
Best,
/* Retrieve the list of words that are mentionned in have1 but not in have1 */
/* Put each value in a macrovariable with value = "C0" */
data have2_tr (keep = VarX);
if _n_=1 then set have1 (keep=VarX);
set have2;
do i=1 to countw(varX1,"/");
VarX1_tr = scan(varX1,i,"/");
VarX = VarX1_tr;
output;
end;
run;
proc sort data=have2_tr nodupkey;
by VarX;
run;
data have2_not_in_have1;
merge have1 (in=x keep=VarX) have2_tr (in=y);
by VarX;
if x=0 and y=1;
call symputx(VarX,"C0");
run;
/* Retrieve the list of words that are mentionned in have1 */
/* Put each value in a macrovariable with the value = varY's value */
data _null_;
set have1;
call symputx(varX,varY);
run;
/* Output the results */
data want;
set have2;
varY1 = symget(scan(varX1,1));
do i = 1 to countw(varX1,"/")-1;
varY1 = catx("/",varY1,symget(scan(varX1,i+1,"/")));
end;
drop i;
run;
NB: The dataset have2_not_in_have1 will give you the list of values not referenced in have 1 and mentioned as C0 in the final dataset
Hello,
proc sql noprint;
CREATE TABLE want AS
SELECT a.varX1, cats(b.varY,'/',c.varY)
FROM have2 a
LEFT JOIN have1 b
ON b.varX=scan(a.varX1,1,'/')
LEFT JOIN have1 c
ON c.varX=scan(a.varX1,2,'/')
ORDER BY a.varX1
;
quit;
Sorry, Late to the party
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;
data have2;
input varX1$;
cards;
abc/def
ghi/mno
jkl/def
;
data want ;
if _n_=1 then do;
if 0 then set have1;
dcl hash H (dataset:'have1',ordered:'y') ;
h.definekey ("vary") ;
h.definedata ("varX","varY") ;
h.definedone () ;
dcl hiter hi('h');
end;
set have2;
length want $50;
do while(hi.next()=0);
if index(varx1, strip(varx))>0 then want=catx('/',want,varY);
end;
keep varx1 want;
run;
another Hash direct find
data have1;
input varX$ varY$;
cards;
abc C1
def C2
ghi C3
jkl C4
mno C5
;
data have2;
input varX1$;
cards;
abc/def
ghi/mno
jkl/def
;
data want ;
if _n_=1 then do;
if 0 then set have1;
dcl hash H (dataset:'have1') ;
h.definekey ("varx") ;
h.definedata ("varY") ;
h.definedone();
end;
set have2;
length want $50;
do _n_=1 to countw(varX1,'/');
_t=scan(varx1,_n_,'/');
if h.find(key:_t)=0 then want=catx('/',want,varY);
end;
keep varx1 want;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.