Solved: Re: Looping through observations and variables

frupaul · Posted 08-21-2018 06:47 AM

Hi Everyone,

I have the below data set (forgive me for the arrow, it will make sense later)

This can be created with the below code:

data Hierarchy;
 INFILE DATALINES DLM='' missover;
 length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
 input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
 cards;
 1 A20 A30 A40 A50 A60 A70 A80
 2 A40 A70 A80 A90
 3 A70 A80 A90 A100
 4 A30 A60 A70
 ;

TASKS:
My tasks are:

1. First, grab the last variable value of each row and prefix it with a 'XXX' as shown in the diagram. So for observation 1, L7 will be XXXA80.

2. Then (using row one as an example), loop through every other row and where the value A80 (last value of row one) is found, replace it with XXXA80.

Another example of step 2: For observation 4, the last value is A70 (which will be replaced with XXXA70 as in step 1). Wherever A70 is found in the dataset e.g. in observation 1, replace it with XXXA70.

EXPECTED OUTCOME:

Thanks,

Kurt_Bremser · Posted 08-21-2018 08:29 AM

I have now played around with your data a little, and implemented my suggestion:

data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;

/* transpose */
proc transpose
  data=hierarchy
  out=trans (where=(col1 ne ''))
;
by custid;
var l:;
run;

/* create a format */
data cntlin;
set trans (rename=(col1=start)) end=eof;
by custid;
if last.custid;
label = 'XXX' !! start;
type = 'C';
fmtname = 'convert';
keep fmtname type start label;
run;

proc sort data=cntlin nodupkey;
by start;
run;

proc format library=work cntlin=cntlin;
run;

/* create wanted dataset in long format */
data want;
set trans;
col1 = put(col1,convert.);
run;

/* re-transpose to wide format */
proc transpose data=want out=want1;
by custid;
var col1;
id _name_;
run;

Note that the code is quite simple, and makes no assumptions apart from the variable names starting with 'L'. Otherwise, it is completely data-driven.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 08-21-2018 06:56 AM

transpose into a long format.
create a second dataset where only the last for every ID is kept
create a format from that (that prepends the XXX), where all other values remain unchanged
apply that format to the dataset
if needed, transpose back to wide format

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

FreelanceReinh · Posted 08-21-2018 07:21 AM

Hi @frupaul,

Try this:

data want(drop=a i);
length a $20;
dcl hash h();
h.definekey('a');
h.definedone();
a=' ';

do until(eof1);
  set hierarchy end=eof1;
  array l[8];
  do i=8 to 1 by -1 while(l[i]=' ');
  end;
  if i then h.ref(key: l[i], data: l[i]);
end;

do until(eof2);
  set hierarchy end=eof2;
  do i=1 to 8;
    if h.check(key: l[i])=0 then l[i]='XXX'||l[i];
  end;
  output;
end;

stop;
run;

Please note that the result does not match your "expected outcome" because it does meet your requirement "Wherever A70 is found in the dataset e.g. in observation 1, replace it with XXXA70".

Edit: Replaced find with check, which should be a bit faster.

RW9 · Posted 08-21-2018 08:18 AM

You can also use arrays:

data want;
  set hierachy;
  array l{20};
  first=0;
  do i=20 to 1 step -1;
    if first=0 and l{i} ne "" then do;
      first=1;
      l{i}=cats("XXX",l{i});
    end;
    else if l{i} in ("A70","A80") then l{i}=cats("XXX",l{i});
  end;
run;

Do note that as per @Kurt_Bremser, a long structure with few columns and more obs would be a better data structure for this and other programming.

Kurt_Bremser · Posted 08-21-2018 08:29 AM

I have now played around with your data a little, and implemented my suggestion:

data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;

/* transpose */
proc transpose
  data=hierarchy
  out=trans (where=(col1 ne ''))
;
by custid;
var l:;
run;

/* create a format */
data cntlin;
set trans (rename=(col1=start)) end=eof;
by custid;
if last.custid;
label = 'XXX' !! start;
type = 'C';
fmtname = 'convert';
keep fmtname type start label;
run;

proc sort data=cntlin nodupkey;
by start;
run;

proc format library=work cntlin=cntlin;
run;

/* create wanted dataset in long format */
data want;
set trans;
col1 = put(col1,convert.);
run;

/* re-transpose to wide format */
proc transpose data=want out=want1;
by custid;
var col1;
id _name_;
run;

Note that the code is quite simple, and makes no assumptions apart from the variable names starting with 'L'. Otherwise, it is completely data-driven.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

FreelanceReinh · Posted 08-21-2018 08:48 AM

@Kurt_Bremser: Just a minor remark: I think variable L8 gets lost in the PROC TRANSPOSE step because it has only missing values in the test data. As a result, dataset WANT1 doesn't contain it.

Kurt_Bremser · Posted 08-21-2018 08:53 AM

Correct. That can be fixed by adding a final step:

data final;
set
  hierarchy (obs=0)
  want1
;
run;

But I view the dropping of unused columns as a feature 😉

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SuryaKiran · Posted 08-21-2018 08:53 AM

If you want to give a try here is another way,

data Hierarchy;
 INFILE DATALINES DLM='' missover;
 length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
 input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
 cards;
 1 A20 A30 A40 A50 A60 A70 A80
 2 A40 A70 A80 A90
 3 A70 A80 A90 A100
 4 A30 A60 A70
 ;
 run;
data test(keep=L_);
format L_ $50.;
set Hierarchy;
L_=scan(strip(CATX(' ',of L:)),-1);
run;

proc sql noprint;
select distinct quote(strip(L_)) into: L_ separated by ","
from test;
quit;
%put &L_;

data want;
set Hierarchy;
array All_L{*} L:;
do i=1 to dim(All_L);
if strip(All_L(i)) in (&L_) then All_L(i)=cats('XXX',Strip(All_L(i)));
end;
run;

Thanks,
Suryakiran

novinosrin · Posted 08-21-2018 10:30 AM

Sorry too late, was sleeping.

Assuming i understood your requirement:

data Hierarchy;
infile datalines dlm='' truncover;
length CustID $10 L1 $20 L2 $20 L3 $20 L4 $20 L5 $20 L6 $20 L7 $20 L8 $20;
input CustID  $ L1 $ L2 $ L3 $ L4 $ L5 $ L6 $  L7 $ L8 $;
cards;
1 A20 A30 A40 A50 A60 A70 A80
2 A40 A70 A80 A90
3 A70 A80 A90 A100
4 A30 A60 A70
;
run;
data want;
do n=1 by 1 until(l);
	set Hierarchy end=l;
	array t(*) l8-l1;
	array j(100) $ _temporary_;
	j(n)=coalescec(of t(*));
end;
	l=0;
do until(l);
	set Hierarchy end=l;
	do _n_=1 to dim(t);
	if not missing(t(_n_)) and t(_n_) in j then t(_n_)=cats('XXX',t(_n_));
	end;
	output;
end;
keep custid l:;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!