@KS99 wrote:
Dear mkeintz
I have an additional question.
I ran your codes, and it worked well.
Except one thing: I discovered that in my end-results all variables starting with _(number) disappeared!
Maybe should I change the variable names? Or, is there is cure for this?
Many thanks,
KS Choi -,
If you take a close look at my code, you would see that all the variable names beginning with _ were intentionally dropped, for two reasons:
As other responders have pointed out, this is hardly a lagging problem.
Here are my 2 cents. Feel free to ask 🙂
data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
data want(drop = k);
if _N_ = 1 then do;
dcl hash h(multidata : "Y");
h.definekey('k');
h.definedata('k', '_i_');
h.definedone();
do until (z);
set have end = z;
array v var1-var3;
do over v;
if v then do;
k = v;
h.add();
end;
end;
end;
end;
set have;
call missing(of var:);
do while (h.do_over(key : _N_) = 0);
v = k;
end;
run;
Result:
ANALYS var1 var2 var3 1 . . . 2 . . . 3 . 3 . 4 . . . 5 5 . . 6 . . . 7 . 7 7 8 . . . 9 . 9 9 10 . . 10
The code is reasonably fast for the size of your original data as well:
data have;
do analysis = 1 to 5000;
array v var1 - var5000;
do over v;
v = ifn(rand('uniform') < .2, ceil(rand('uniform') * 5000), .);
end;
output;
end;
run;
data want(drop = k);
if _N_ = 1 then do;
array v var1-var5000;
dcl hash h(multidata : "Y");
h.definekey('k');
h.definedata('k', '_i_');
h.definedone();
do until (z);
set have end = z;
do over v;
if v then do;
k = v;
h.add();
end;
end;
end;
end;
set have;
call missing(of var:);
do while (h.do_over(key : _N_) = 0);
v = k;
end;
run;
Log:
NOTE: There were 5000 observations read from the data set WORK.HAVE. NOTE: There were 5000 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 5000 observations and 5001 variables. NOTE: DATA statement used (Total process time): real time 3.03 seconds cpu time 3.00 seconds
Thank you for your elaborate codes!
But my variables are over 4000, and they all have different names.
I will keep your codes for the future reference, though
KS -,
Maybe I am head full of SQL .
data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
proc sql;
create table want as
select *
from (select analys from have) as a left join
(select var1 from have) as b on a.analys=b.var1 left join
(select var2 from have) as c on a.analys=c.var2 left join
(select var3 from have) as d on a.analys=d.var3 ;
quit;
@Ksharp wrote:
Maybe I am head full of SQL .
... proc sql; create table want as select * from (select analys from have) as a left join (select var1 from have) as b on a.analys=b.var1 left join (select var2 from have) as c on a.analys=c.var2 left join (select var3 from have) as d on a.analys=d.var3 ; quit;
Neat, I think the OP has 4,590 variables. That would be a lot of left joins.
Sounds like you want an adjacency matrix. It probably would help to first normalize your data.
data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
proc transpose data=have out=tall;
by analys;
var var1-var3 ;
run;
Then your output looks like a simple report.
proc report data=tall;
column col1 analys,col1=xx ;
define col1 / group ' ';
define analys / across ' ';
define xx / max ' ';
run;
Results
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.