Solved: How to lag values as many rows as designated, iteratively over all var... - Page 2

mkeintz · Posted 01-26-2021 02:30 PM

@KS99 wrote:
Dear mkeintz

I have an additional question.

I ran your codes, and it worked well.
Except one thing: I discovered that in my end-results all variables starting with _(number) disappeared!
Maybe should I change the variable names? Or, is there is cure for this?

Many thanks,
KS Choi -,

If you take a close look at my code, you would see that all the variable names beginning with _ were intentionally dropped, for two reasons:

The array _anls is defined as _temporary_. If you look in the sas documentation, you will see that arrays so define are not kept in the output dataset. So those elements would never be kept.

But the array _tmps is not _temporary_, so see point 2 below:
The data statement has a "drop=v rc _:);" dataset name parameter. Please look up what the drop= parameter does. And notice the expression _: tells sas to refer to all variables whose names begin with a _.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

KS99 · Posted 01-27-2021 08:00 PM

Thank you mkeintz!
I change that part of codes to this: (drop=v rc _tmps:).
Now it cleanly works!

Have a good day !

Sincerely,
KS -,

KS99 · Posted 01-22-2021 04:44 PM

To answer your quick question:
The names of the variables look like these:

_00036020 Num 8 BEST.
_00036110 Num 8 BEST.
_00037520 Num 8 BEST.
_00108410 Num 8 BEST.
_00154710 Num 8 BEST.
_00212110 Num 8 BEST.
SSBSQXJ0 Num 8 BEST.
SSBXBZB0 Num 8 BEST.
SSBYL6RK Num 8 BEST.
SSBYPC1T Num 8 BEST.
SSBZ0Y0T Num 8 BEST.
Y2106R11 Num 8 BEST.
Y2573F10 Num 8 BEST.
.......

They are all numeric variables.

PeterClemmensen · Posted 01-22-2021 05:11 AM

As other responders have pointed out, this is hardly a lagging problem.

Here are my 2 cents. Feel free to ask 🙂

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;

data want(drop = k);
   
   if _N_ = 1 then do;

      dcl hash h(multidata : "Y");
      h.definekey('k');
      h.definedata('k', '_i_');
      h.definedone();

      do until (z);
         set have end = z;
         array v var1-var3;
         do over v;
            if v then do;
               k = v;
               h.add();
            end;
         end;
      end;
   end;

   set have;
   call missing(of var:);

   do while (h.do_over(key : _N_) = 0);
      v = k;
   end;
run;

Result:

ANALYS var1 var2 var3 
1      .    .    . 
2      .    .    . 
3      .    3    . 
4      .    .    . 
5      5    .    . 
6      .    .    . 
7      .    7    7 
8      .    .    . 
9      .    9    9 
10     .    .    10

The code is reasonably fast for the size of your original data as well:

data have;
   do analysis = 1 to 5000;
      array v var1 - var5000;
      do over v;
          v = ifn(rand('uniform') < .2, ceil(rand('uniform') * 5000), .);
      end;
      output;
   end;
run;

data want(drop = k);
   
   if _N_ = 1 then do;
      array v var1-var5000;

      dcl hash h(multidata : "Y");
      h.definekey('k');
      h.definedata('k', '_i_');
      h.definedone();

      do until (z);
         set have end = z;
         do over v;
            if v then do;
               k = v;
               h.add();
            end;
         end;
      end;
   end;

   set have;
   call missing(of var:);

   do while (h.do_over(key : _N_) = 0);
      v = k;
   end;
run;

Log:

NOTE: There were 5000 observations read from the data set WORK.HAVE.
NOTE: There were 5000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5000 observations and 5001 variables.
NOTE: DATA statement used (Total process time):
      real time           3.03 seconds
      cpu time            3.00 seconds

The DATA to DATA Step Macro
Blog: SASnrd

KS99 · Posted 01-22-2021 04:48 PM

Thank you for your elaborate codes!

But my variables are over 4000, and they all have different names.

I will keep your codes for the future reference, though

KS -,

Ksharp · Posted 01-22-2021 07:24 AM

Maybe I am head full of SQL .

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
proc sql;
create table want as
select *
from (select analys from have) as a left join
     (select var1 from have) as b   on a.analys=b.var1 left join
	 (select var2 from have) as c   on a.analys=c.var2 left join 
     (select var3 from have) as d   on a.analys=d.var3 ;
quit;

mkeintz · Posted 01-22-2021 02:03 PM

@Ksharp wrote:

Maybe I am head full of SQL .

... 
proc sql;
create table want as
select *
from (select analys from have) as a left join
     (select var1 from have) as b   on a.analys=b.var1 left join
	 (select var2 from have) as c   on a.analys=c.var2 left join 
     (select var3 from have) as d   on a.analys=d.var3 ;
quit;

@Ksharp

Neat, I think the OP has 4,590 variables. That would be a lot of left joins.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 01-23-2021 05:40 AM

Make a macro ? or Using MERGE instead ?

Tom · Posted 01-22-2021 08:39 AM

Sounds like you want an adjacency matrix. It probably would help to first normalize your data.

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;

proc transpose data=have out=tall;
  by analys;
  var var1-var3 ;
run;

Then your output looks like a simple report.

proc report data=tall;
  column col1 analys,col1=xx ;
  define col1 / group ' ';
  define analys / across ' ';
  define xx / max ' ';
run;

Results

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Re: How to lag values as many rows as designated, iteratively over all variables?

Registration is open

Registration is open

SAS Training: Just a Click Away