BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mkeintz
PROC Star

@KS99 wrote:
Dear mkeintz

I have an additional question.

I ran your codes, and it worked well.
Except one thing: I discovered that in my end-results all variables starting with _(number) disappeared!
Maybe should I change the variable names? Or, is there is cure for this?

Many thanks,
KS Choi -,

If you take a close look at my code, you would see that all the variable names beginning with _ were intentionally dropped, for two reasons:

  1. The array _anls is defined as _temporary_.  If you look in the sas documentation, you will see that arrays so define are not kept in the output dataset.  So those elements would never be kept.

    But the array _tmps is not _temporary_, so see point 2 below:

  2. The data statement has a "drop=v rc _:);" dataset name parameter.  Please look up what the drop= parameter does.  And notice the expression _: tells sas to refer to all variables whose names begin with a _.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
KS99
Obsidian | Level 7
Thank you mkeintz!
I change that part of codes to this: (drop=v rc _tmps:).
Now it cleanly works!

Have a good day !

Sincerely,
KS -,
KS99
Obsidian | Level 7
To answer your quick question:
The names of the variables look like these:

_00036020 Num 8 BEST.
_00036110 Num 8 BEST.
_00037520 Num 8 BEST.
_00108410 Num 8 BEST.
_00154710 Num 8 BEST.
_00212110 Num 8 BEST.
SSBSQXJ0 Num 8 BEST.
SSBXBZB0 Num 8 BEST.
SSBYL6RK Num 8 BEST.
SSBYPC1T Num 8 BEST.
SSBZ0Y0T Num 8 BEST.
Y2106R11 Num 8 BEST.
Y2573F10 Num 8 BEST.
.......

They are all numeric variables.


PeterClemmensen
Tourmaline | Level 20

As other responders have pointed out, this is hardly a lagging problem. 

 

Here are my 2 cents. Feel free to ask 🙂

 

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;

data want(drop = k);
   
   if _N_ = 1 then do;

      dcl hash h(multidata : "Y");
      h.definekey('k');
      h.definedata('k', '_i_');
      h.definedone();

      do until (z);
         set have end = z;
         array v var1-var3;
         do over v;
            if v then do;
               k = v;
               h.add();
            end;
         end;
      end;
   end;

   set have;
   call missing(of var:);

   do while (h.do_over(key : _N_) = 0);
      v = k;
   end;
run;

 

Result:

 

ANALYS var1 var2 var3 
1      .    .    . 
2      .    .    . 
3      .    3    . 
4      .    .    . 
5      5    .    . 
6      .    .    . 
7      .    7    7 
8      .    .    . 
9      .    9    9 
10     .    .    10 

 

The code is reasonably fast for the size of your original data as well:

 

data have;
   do analysis = 1 to 5000;
      array v var1 - var5000;
      do over v;
          v = ifn(rand('uniform') < .2, ceil(rand('uniform') * 5000), .);
      end;
      output;
   end;
run;

data want(drop = k);
   
   if _N_ = 1 then do;
      array v var1-var5000;

      dcl hash h(multidata : "Y");
      h.definekey('k');
      h.definedata('k', '_i_');
      h.definedone();

      do until (z);
         set have end = z;
         do over v;
            if v then do;
               k = v;
               h.add();
            end;
         end;
      end;
   end;

   set have;
   call missing(of var:);

   do while (h.do_over(key : _N_) = 0);
      v = k;
   end;
run;

 

Log:

 

NOTE: There were 5000 observations read from the data set WORK.HAVE.
NOTE: There were 5000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5000 observations and 5001 variables.
NOTE: DATA statement used (Total process time):
      real time           3.03 seconds
      cpu time            3.00 seconds

 

KS99
Obsidian | Level 7

Thank you for your elaborate codes! 

 

But my variables are over 4000, and they all have different names. 

I will keep your codes for the future reference, though

 

KS -, 

Ksharp
Super User

Maybe I am head full of SQL .

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
proc sql;
create table want as
select *
from (select analys from have) as a left join
     (select var1 from have) as b   on a.analys=b.var1 left join
	 (select var2 from have) as c   on a.analys=c.var2 left join 
     (select var3 from have) as d   on a.analys=d.var3 ;
quit;
mkeintz
PROC Star

@Ksharp wrote:

Maybe I am head full of SQL .

... 
proc sql;
create table want as
select *
from (select analys from have) as a left join
     (select var1 from have) as b   on a.analys=b.var1 left join
	 (select var2 from have) as c   on a.analys=c.var2 left join 
     (select var3 from have) as d   on a.analys=d.var3 ;
quit;

 

@Ksharp

 

Neat, I think the OP has 4,590 variables.  That would be a lot of left joins. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
Make a macro ? or Using MERGE instead ?
Tom
Super User Tom
Super User

Sounds like you want an adjacency matrix.  It probably would help to first normalize your data.

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;

proc transpose data=have out=tall;
  by analys;
  var var1-var3 ;
run;

Then your output looks like a simple report.  

proc report data=tall;
  column col1 analys,col1=xx ;
  define col1 / group ' ';
  define analys / across ' ';
  define xx / max ' ';
run;

Results

image.png

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 23 replies
  • 2723 views
  • 6 likes
  • 8 in conversation