@KS99 wrote:
Dear mkeintz
I have an additional question.
I ran your codes, and it worked well.
Except one thing: I discovered that in my end-results all variables starting with _(number) disappeared!
Maybe should I change the variable names? Or, is there is cure for this?
Many thanks,
KS Choi -,
If you take a close look at my code, you would see that all the variable names beginning with _ were intentionally dropped, for two reasons:
As other responders have pointed out, this is hardly a lagging problem.
Here are my 2 cents. Feel free to ask 🙂
data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
data want(drop = k);
if _N_ = 1 then do;
dcl hash h(multidata : "Y");
h.definekey('k');
h.definedata('k', '_i_');
h.definedone();
do until (z);
set have end = z;
array v var1-var3;
do over v;
if v then do;
k = v;
h.add();
end;
end;
end;
end;
set have;
call missing(of var:);
do while (h.do_over(key : _N_) = 0);
v = k;
end;
run;
Result:
ANALYS var1 var2 var3 1 . . . 2 . . . 3 . 3 . 4 . . . 5 5 . . 6 . . . 7 . 7 7 8 . . . 9 . 9 9 10 . . 10
The code is reasonably fast for the size of your original data as well:
data have;
do analysis = 1 to 5000;
array v var1 - var5000;
do over v;
v = ifn(rand('uniform') < .2, ceil(rand('uniform') * 5000), .);
end;
output;
end;
run;
data want(drop = k);
if _N_ = 1 then do;
array v var1-var5000;
dcl hash h(multidata : "Y");
h.definekey('k');
h.definedata('k', '_i_');
h.definedone();
do until (z);
set have end = z;
do over v;
if v then do;
k = v;
h.add();
end;
end;
end;
end;
set have;
call missing(of var:);
do while (h.do_over(key : _N_) = 0);
v = k;
end;
run;
Log:
NOTE: There were 5000 observations read from the data set WORK.HAVE. NOTE: There were 5000 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 5000 observations and 5001 variables. NOTE: DATA statement used (Total process time): real time 3.03 seconds cpu time 3.00 seconds
Thank you for your elaborate codes!
But my variables are over 4000, and they all have different names.
I will keep your codes for the future reference, though
KS -,
Maybe I am head full of SQL .
data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
proc sql;
create table want as
select *
from (select analys from have) as a left join
(select var1 from have) as b on a.analys=b.var1 left join
(select var2 from have) as c on a.analys=c.var2 left join
(select var3 from have) as d on a.analys=d.var3 ;
quit;
@Ksharp wrote:
Maybe I am head full of SQL .
... proc sql; create table want as select * from (select analys from have) as a left join (select var1 from have) as b on a.analys=b.var1 left join (select var2 from have) as c on a.analys=c.var2 left join (select var3 from have) as d on a.analys=d.var3 ; quit;
Neat, I think the OP has 4,590 variables. That would be a lot of left joins.
Sounds like you want an adjacency matrix. It probably would help to first normalize your data.
data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
proc transpose data=have out=tall;
by analys;
var var1-var3 ;
run;
Then your output looks like a simple report.
proc report data=tall;
column col1 analys,col1=xx ;
define col1 / group ' ';
define analys / across ' ';
define xx / max ' ';
run;
Results
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.