Hi Everyone,
I have the below data set (forgive me for the arrow, it will make sense later)
This can be created with the below code:
data Hierarchy;
INFILE DATALINES DLM='' missover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $ L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
TASKS:
My tasks are:
1. First, grab the last variable value of each row and prefix it with a 'XXX' as shown in the diagram. So for observation 1, L7 will be XXXA80.
2. Then (using row one as an example), loop through every other row and where the value A80 (last value of row one) is found, replace it with XXXA80.
Another example of step 2: For observation 4, the last value is A70 (which will be replaced with XXXA70 as in step 1). Wherever A70 is found in the dataset e.g. in observation 1, replace it with XXXA70.
EXPECTED OUTCOME:
Thanks,
I have now played around with your data a little, and implemented my suggestion:
data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $ L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;
/* transpose */
proc transpose
data=hierarchy
out=trans (where=(col1 ne ''))
;
by custid;
var l:;
run;
/* create a format */
data cntlin;
set trans (rename=(col1=start)) end=eof;
by custid;
if last.custid;
label = 'XXX' !! start;
type = 'C';
fmtname = 'convert';
keep fmtname type start label;
run;
proc sort data=cntlin nodupkey;
by start;
run;
proc format library=work cntlin=cntlin;
run;
/* create wanted dataset in long format */
data want;
set trans;
col1 = put(col1,convert.);
run;
/* re-transpose to wide format */
proc transpose data=want out=want1;
by custid;
var col1;
id _name_;
run;
Note that the code is quite simple, and makes no assumptions apart from the variable names starting with 'L'. Otherwise, it is completely data-driven.
Hi @frupaul,
Try this:
data want(drop=a i);
length a $20;
dcl hash h();
h.definekey('a');
h.definedone();
a=' ';
do until(eof1);
set hierarchy end=eof1;
array l[8];
do i=8 to 1 by -1 while(l[i]=' ');
end;
if i then h.ref(key: l[i], data: l[i]);
end;
do until(eof2);
set hierarchy end=eof2;
do i=1 to 8;
if h.check(key: l[i])=0 then l[i]='XXX'||l[i];
end;
output;
end;
stop;
run;
Please note that the result does not match your "expected outcome" because it does meet your requirement "Wherever A70 is found in the dataset e.g. in observation 1, replace it with XXXA70".
Edit: Replaced find with check, which should be a bit faster.
You can also use arrays:
data want; set hierachy; array l{20}; first=0; do i=20 to 1 step -1; if first=0 and l{i} ne "" then do; first=1; l{i}=cats("XXX",l{i}); end; else if l{i} in ("A70","A80") then l{i}=cats("XXX",l{i}); end; run;
Do note that as per @Kurt_Bremser, a long structure with few columns and more obs would be a better data structure for this and other programming.
I have now played around with your data a little, and implemented my suggestion:
data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $ L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;
/* transpose */
proc transpose
data=hierarchy
out=trans (where=(col1 ne ''))
;
by custid;
var l:;
run;
/* create a format */
data cntlin;
set trans (rename=(col1=start)) end=eof;
by custid;
if last.custid;
label = 'XXX' !! start;
type = 'C';
fmtname = 'convert';
keep fmtname type start label;
run;
proc sort data=cntlin nodupkey;
by start;
run;
proc format library=work cntlin=cntlin;
run;
/* create wanted dataset in long format */
data want;
set trans;
col1 = put(col1,convert.);
run;
/* re-transpose to wide format */
proc transpose data=want out=want1;
by custid;
var col1;
id _name_;
run;
Note that the code is quite simple, and makes no assumptions apart from the variable names starting with 'L'. Otherwise, it is completely data-driven.
@Kurt_Bremser: Just a minor remark: I think variable L8 gets lost in the PROC TRANSPOSE step because it has only missing values in the test data. As a result, dataset WANT1 doesn't contain it.
Correct. That can be fixed by adding a final step:
data final;
set
hierarchy (obs=0)
want1
;
run;
But I view the dropping of unused columns as a feature 😉
If you want to give a try here is another way,
data Hierarchy;
INFILE DATALINES DLM='' missover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $ L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;
data test(keep=L_);
format L_ $50.;
set Hierarchy;
L_=scan(strip(CATX(' ',of L:)),-1);
run;
proc sql noprint;
select distinct quote(strip(L_)) into: L_ separated by ","
from test;
quit;
%put &L_;
data want;
set Hierarchy;
array All_L{*} L:;
do i=1 to dim(All_L);
if strip(All_L(i)) in (&L_) then All_L(i)=cats('XXX',Strip(All_L(i)));
end;
run;
Sorry too late, was sleeping.
Assuming i understood your requirement:
data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $ L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;
data want;
do n=1 by 1 until(l);
set Hierarchy end=l;
array t(*) l8-l1;
array j(100) $ _temporary_;
j(n)=coalescec(of t(*));
end;
l=0;
do until(l);
set Hierarchy end=l;
do _n_=1 to dim(t);
if not missing(t(_n_)) and t(_n_) in j then t(_n_)=cats('XXX',t(_n_));
end;
output;
end;
keep custid l:;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.