BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
frupaul
Quartz | Level 8

Hi Everyone,

 

I have the below data set (forgive me for the arrow, it will make sense later)

loop1.PNG

This can be created with the below code:

data Hierarchy;
 INFILE DATALINES DLM='' missover;
 length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
 input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
 cards;
 1 A20 A30 A40 A50 A60 A70 A80
 2 A40 A70 A80 A90
 3 A70 A80 A90 A100
 4 A30 A60 A70
 ;

TASKS:
My tasks are:

 

1. First, grab the last variable value of each row and prefix it with a 'XXX' as shown in the diagram. So for observation 1, L7 will be     XXXA80.

2. Then (using row one as an example), loop through every other row and where the value A80 (last value of row one) is found, replace it with XXXA80.

Another example of step 2: For observation 4, the last value is A70 (which will be replaced with XXXA70 as in step 1). Wherever A70 is found in the dataset e.g. in observation 1, replace it with XXXA70.

 

EXPECTED OUTCOME:

 

loop2.PNG

 

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I have now played around with your data a little, and implemented my suggestion:

data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;

/* transpose */
proc transpose
  data=hierarchy
  out=trans (where=(col1 ne ''))
;
by custid;
var l:;
run;

/* create a format */
data cntlin;
set trans (rename=(col1=start)) end=eof;
by custid;
if last.custid;
label = 'XXX' !! start;
type = 'C';
fmtname = 'convert';
keep fmtname type start label;
run;

proc sort data=cntlin nodupkey;
by start;
run;

proc format library=work cntlin=cntlin;
run;

/* create wanted dataset in long format */
data want;
set trans;
col1 = put(col1,convert.);
run;

/* re-transpose to wide format */
proc transpose data=want out=want1;
by custid;
var col1;
id _name_;
run;

Note that the code is quite simple, and makes no assumptions apart from the variable names starting with 'L'. Otherwise, it is completely data-driven.

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User
  1. transpose into a long format.
  2. create a second dataset where only the last for every ID is kept
  3. create a format from that (that prepends the XXX), where all other values remain unchanged
  4. apply that format to the dataset
  5. if needed, transpose back to wide format

 

FreelanceReinh
Jade | Level 19

Hi @frupaul,

 

Try this:

data want(drop=a i);
length a $20;
dcl hash h();
h.definekey('a');
h.definedone();
a=' ';

do until(eof1);
  set hierarchy end=eof1;
  array l[8];
  do i=8 to 1 by -1 while(l[i]=' ');
  end;
  if i then h.ref(key: l[i], data: l[i]);
end;

do until(eof2);
  set hierarchy end=eof2;
  do i=1 to 8;
    if h.check(key: l[i])=0 then l[i]='XXX'||l[i];
  end;
  output;
end;

stop;
run;

Please note that the result does not match your "expected outcome" because it does meet your requirement "Wherever A70 is found in the dataset e.g. in observation 1, replace it with XXXA70".

 

Edit: Replaced find with check, which should be a bit faster.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You can also use arrays:

data want;
  set hierachy;
  array l{20};
  first=0;
  do i=20 to 1 step -1;
    if first=0 and l{i} ne "" then do;
      first=1;
      l{i}=cats("XXX",l{i});
    end;
    else if l{i} in ("A70","A80") then l{i}=cats("XXX",l{i});
  end;
run;

Do note that as per @Kurt_Bremser, a long structure with few columns and more obs would be a better data structure for this and other programming.

Kurt_Bremser
Super User

I have now played around with your data a little, and implemented my suggestion:

data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;

/* transpose */
proc transpose
  data=hierarchy
  out=trans (where=(col1 ne ''))
;
by custid;
var l:;
run;

/* create a format */
data cntlin;
set trans (rename=(col1=start)) end=eof;
by custid;
if last.custid;
label = 'XXX' !! start;
type = 'C';
fmtname = 'convert';
keep fmtname type start label;
run;

proc sort data=cntlin nodupkey;
by start;
run;

proc format library=work cntlin=cntlin;
run;

/* create wanted dataset in long format */
data want;
set trans;
col1 = put(col1,convert.);
run;

/* re-transpose to wide format */
proc transpose data=want out=want1;
by custid;
var col1;
id _name_;
run;

Note that the code is quite simple, and makes no assumptions apart from the variable names starting with 'L'. Otherwise, it is completely data-driven.

FreelanceReinh
Jade | Level 19

@Kurt_Bremser: Just a minor remark: I think variable L8 gets lost in the PROC TRANSPOSE step because it has only missing values in the test data. As a result, dataset WANT1 doesn't contain it.

Kurt_Bremser
Super User

Correct. That can be fixed by adding a final step:

data final;
set
  hierarchy (obs=0)
  want1
;
run;

But I view the dropping of unused columns as a feature 😉

SuryaKiran
Meteorite | Level 14

If you want to give a try here is another way,

data Hierarchy;
 INFILE DATALINES DLM='' missover;
 length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
 input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
 cards;
 1 A20 A30 A40 A50 A60 A70 A80
 2 A40 A70 A80 A90
 3 A70 A80 A90 A100
 4 A30 A60 A70
 ;
 run;
data test(keep=L_);
format L_ $50.;
set Hierarchy;
L_=scan(strip(CATX(' ',of L:)),-1);
run;

proc sql noprint;
select distinct quote(strip(L_)) into: L_ separated by ","
from test;
quit;
%put &L_;

data want;
set Hierarchy;
array All_L{*} L:;
do i=1 to dim(All_L);
if strip(All_L(i)) in (&L_) then All_L(i)=cats('XXX',Strip(All_L(i)));
end;
run;
Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

Sorry too late, was sleeping. 

Assuming i understood your requirement:

 

data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;
data want;
do n=1 by 1 until(l);
	set Hierarchy end=l;
	array t(*) l8-l1;
	array j(100) $ _temporary_;
	j(n)=coalescec(of t(*));
end;
	l=0;
do until(l);
	set Hierarchy end=l;
	do _n_=1 to dim(t);
	if not missing(t(_n_)) and t(_n_) in j then t(_n_)=cats('XXX',t(_n_));
	end;
	output;
end;
keep custid l:;
run;

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4010 views
  • 5 likes
  • 6 in conversation