BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mkeintz
PROC Star

@KS99 wrote:
Dear mkeintz

I have an additional question.

I ran your codes, and it worked well.
Except one thing: I discovered that in my end-results all variables starting with _(number) disappeared!
Maybe should I change the variable names? Or, is there is cure for this?

Many thanks,
KS Choi -,

If you take a close look at my code, you would see that all the variable names beginning with _ were intentionally dropped, for two reasons:

  1. The array _anls is defined as _temporary_.  If you look in the sas documentation, you will see that arrays so define are not kept in the output dataset.  So those elements would never be kept.

    But the array _tmps is not _temporary_, so see point 2 below:

  2. The data statement has a "drop=v rc _:);" dataset name parameter.  Please look up what the drop= parameter does.  And notice the expression _: tells sas to refer to all variables whose names begin with a _.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
KS99
Obsidian | Level 7
Thank you mkeintz!
I change that part of codes to this: (drop=v rc _tmps:).
Now it cleanly works!

Have a good day !

Sincerely,
KS -,
KS99
Obsidian | Level 7
To answer your quick question:
The names of the variables look like these:

_00036020 Num 8 BEST.
_00036110 Num 8 BEST.
_00037520 Num 8 BEST.
_00108410 Num 8 BEST.
_00154710 Num 8 BEST.
_00212110 Num 8 BEST.
SSBSQXJ0 Num 8 BEST.
SSBXBZB0 Num 8 BEST.
SSBYL6RK Num 8 BEST.
SSBYPC1T Num 8 BEST.
SSBZ0Y0T Num 8 BEST.
Y2106R11 Num 8 BEST.
Y2573F10 Num 8 BEST.
.......

They are all numeric variables.


PeterClemmensen
Tourmaline | Level 20

As other responders have pointed out, this is hardly a lagging problem. 

 

Here are my 2 cents. Feel free to ask 🙂

 

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;

data want(drop = k);
   
   if _N_ = 1 then do;

      dcl hash h(multidata : "Y");
      h.definekey('k');
      h.definedata('k', '_i_');
      h.definedone();

      do until (z);
         set have end = z;
         array v var1-var3;
         do over v;
            if v then do;
               k = v;
               h.add();
            end;
         end;
      end;
   end;

   set have;
   call missing(of var:);

   do while (h.do_over(key : _N_) = 0);
      v = k;
   end;
run;

 

Result:

 

ANALYS var1 var2 var3 
1      .    .    . 
2      .    .    . 
3      .    3    . 
4      .    .    . 
5      5    .    . 
6      .    .    . 
7      .    7    7 
8      .    .    . 
9      .    9    9 
10     .    .    10 

 

The code is reasonably fast for the size of your original data as well:

 

data have;
   do analysis = 1 to 5000;
      array v var1 - var5000;
      do over v;
          v = ifn(rand('uniform') < .2, ceil(rand('uniform') * 5000), .);
      end;
      output;
   end;
run;

data want(drop = k);
   
   if _N_ = 1 then do;
      array v var1-var5000;

      dcl hash h(multidata : "Y");
      h.definekey('k');
      h.definedata('k', '_i_');
      h.definedone();

      do until (z);
         set have end = z;
         do over v;
            if v then do;
               k = v;
               h.add();
            end;
         end;
      end;
   end;

   set have;
   call missing(of var:);

   do while (h.do_over(key : _N_) = 0);
      v = k;
   end;
run;

 

Log:

 

NOTE: There were 5000 observations read from the data set WORK.HAVE.
NOTE: There were 5000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5000 observations and 5001 variables.
NOTE: DATA statement used (Total process time):
      real time           3.03 seconds
      cpu time            3.00 seconds

 

KS99
Obsidian | Level 7

Thank you for your elaborate codes! 

 

But my variables are over 4000, and they all have different names. 

I will keep your codes for the future reference, though

 

KS -, 

Ksharp
Super User

Maybe I am head full of SQL .

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;
proc sql;
create table want as
select *
from (select analys from have) as a left join
     (select var1 from have) as b   on a.analys=b.var1 left join
	 (select var2 from have) as c   on a.analys=c.var2 left join 
     (select var3 from have) as d   on a.analys=d.var3 ;
quit;
mkeintz
PROC Star

@Ksharp wrote:

Maybe I am head full of SQL .

... 
proc sql;
create table want as
select *
from (select analys from have) as a left join
     (select var1 from have) as b   on a.analys=b.var1 left join
	 (select var2 from have) as c   on a.analys=c.var2 left join 
     (select var3 from have) as d   on a.analys=d.var3 ;
quit;

 

@Ksharp

 

Neat, I think the OP has 4,590 variables.  That would be a lot of left joins. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
Make a macro ? or Using MERGE instead ?
Tom
Super User Tom
Super User

Sounds like you want an adjacency matrix.  It probably would help to first normalize your data.

data have;
input ANALYS var1 var2 var3 ;
CARDS;
1 5 3 7
2 . 7 9
3 . 9 10
4 . . .
5 . . .
6 . . .
7 . . .
8 . . .
9 . . .
10 . . .
;;;;

proc transpose data=have out=tall;
  by analys;
  var var1-var3 ;
run;

Then your output looks like a simple report.  

proc report data=tall;
  column col1 analys,col1=xx ;
  define col1 / group ' ';
  define analys / across ' ';
  define xx / max ' ';
run;

Results

image.png

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1251 views
  • 6 likes
  • 8 in conversation