DATA Step, Macro, Functions and more

Do same basic steps on 200 variables, in a particular dataset

Accepted Solution Solved
Reply
Regular Contributor
Posts: 223
Accepted Solution

Do same basic steps on 200 variables, in a particular dataset

Oligolas was kind to provide the following code, handling multiple datasets in a library, and doing the same procedure(s) on each dataset:

 

data _NULL_;
   set sashelp.vtable;
   where libname eq upcase('YourLib');
   call execute('
      PROC SORT data='||strip(libname)||'.'||strip(memname)||' out=tmp;
         BY variable_rank variable_continuous;
      RUN;

      DATA yourPermLib.'||strip(memname)||'_divisions;
         SET tmp;
         BY variable_rank;
         if last.variable_rank;
      RUN;

      PROC DATASETS lib=work noprint;
         delete tmp;
      RUN;QUIT;');
run;

Works wonderfully.

 

Here's now a follow-up question, if I may:

 

The above code deals with one variable pair -- variable_continuous, and it's rank counterpart, variable_rank.

 

The follow-up issue is that there are 200 such variable pairs in my dataset.  EACH of these continuous, with rank counterpart.

 

The above code results in the following partial output dataset:

 

  •   i_20901 i_20902 i_20903 i_20904 i_20905 i_20906 rg7_20901 rg7_20902 rg7_20903 rg7_20904 rg7_20905 rg7_20906
      5.2998 0.8653 9.7555 -0.4079     4 6 4 1    
      15.686 0.4401 6.3543 0.0869 4.3429 -0.1955 6 4 3 3 0 0
      1.7026 -0.169 11.5141 -0.7457 6.1116 0.0722 1 2 5 0 1 4
      0.6265 0.0777 2.2246 -0.1288 7.4253 -0.2075 0 3 0 2 2 0
      5.6289 0.9559 2.0766 -0.3934 9.0273 -0.3257 4 6 0 1 3 0
      3.1862 -0.2157 4.8757 -0.0159 11.6997 -0.0549 2 2 2 3 4 2
      7.7334 -0.1715 9.3654 -0.1501 14.9105 -0.0166 5 2 4 2 5 2
      2.3106 0.3048 6.3218 -0.6706 25.3866 0.011 2 4 3 0 6 3

 

The above dataset creation run focused on continuous variable i_20905, and the corresponding rank variable rg7_20905.

 

I'll need to delete all the other variables, using SAS 'keep'.  The result would be as follows:

 

  •   i_20905 rg7_20905
         
      4.3429 0
      6.1116 1
      7.4253 2
      9.0273 3
      11.6997 4
      14.9105 5
      25.3866 6

 

Each and every variable pair would require it's own unique output file such as that above.

 

All the files would then be combined using SAS Match-Merging, using rank as the common column.

 

That would be the final result desired.  At least for now.

 

So, what question am I presently asking?

 

Simply, Using Oligolas' code above, is there a way to handle all the variables in one go?

 

If not, I will simply copy-paste x200, changing the specifics on variable for each of the 200 runs.  Perhaps that would be the best way anyway.  As at least I will be able to understand what's happening.

 

Thanks!

 

Nicholas Kormanik

 

 


Accepted Solutions
Solution
‎09-14-2017 05:34 AM
Frequent Contributor
Posts: 129

Re: Do same basic steps on 200 variables, in a particular dataset

Posted in reply to ChrisBrooks

Hi,

 

you're the architect and if I understood your requirements right you could do this:

data test;
input i_20901
i_20902
i_20903
i_20904
i_20905
i_20906
rg7_20901
rg7_20902
rg7_20903
rg7_20904
rg7_20905   
rg7_20906;
cards;
5.2998 0.8653 9.7555 -0.4079 0 0 4 6 4 1 0 0    
15.686 0.4401 6.3543 0.0869 4.3429 -0.1955 6 4 3 3 0 0 
1.7026 -0.169 11.5141 -0.7457 6.1116 0.0722 1 2 5 0 1 4 
0.6265 0.0777 2.2246 -0.1288 7.4253 -0.2075 0 3 0 2 2 0 
5.6289 0.9559 2.0766 -0.3934 9.0273 -0.3257 4 6 0 1 3 0 
3.1862 -0.2157 4.8757 -0.0159 11.6997 -0.0549 2 2 2 3 4 2 
7.7334 -0.1715 9.3654 -0.1501 14.9105 -0.0166 5 2 4 2 5 2 
2.3106 0.3048 6.3218 -0.6706 25.3866 0.011 2 4 3 0 6 3
;
run;

DATA vcolumn;
   length pairID $20;
   set sashelp.vcolumn(keep=libname memname name); 
   where libname eq 'WORK' and memname eq 'TEST';
   pairID=scan(name,2,'_');
RUN;

PROC SORT data=vcolumn; 
   by libname memname pairID descending name; 
RUN;

DATA vcolumn;
   length byVar $40;
   set vcolumn;
   retain byVar;
   by libname memname pairID descending name; 
   if first.pairID then byVar=strip(name);
   else do;
      byVar=strip(byVar)||' '||strip(name);
      output;
   end;
RUN;

DATA _NULL_;
   set vcolumn;
   call execute('
      PROC SORT data='||strip(libname)||'.'||strip(memname)||' out=tmp;
         BY '||strip(byVar)||';
      RUN;

      DATA WORK.'||strip(memname)||'_'||strip(pairID)||'_divisions;
         SET tmp;
         BY '||strip(byVar)||';
         if last.'||strip(scan(byVar,1))||';
      RUN;

      PROC DATASETS lib=work noprint;
         delete tmp;
      RUN;QUIT;');
RUN;

But I'm not sure if creating 100 files x 6 pair variables = 600 datasets is the thing you really want.

________________________

- Cheers -

View solution in original post


All Replies
Super User
Super User
Posts: 7,955

Re: Do same basic steps on 200 variables, in a particular dataset

Posted in reply to NicholasKormanik

Yes, there is a way to do it, and there are many topics on here which you can find using the search, in fact I have probably posted such many times.  Call execute accepts a string - any string - and inserts that into the compiler after the datastep has finished running, therefore any Base SAS code you can write can be put into a call execute, much like you could put it into a macro, the benefit being that you use the datastep as a loop rather than handling that with macro code.  So write your code once so that it works, then identify the pieces which will change, then put that in a call execute.

However, that being said, as in my other post you are really fighting the SAS system.  It is really not a good idea to split data up into little bits.  What will you do after you have copied the code 200 times or put it all in a macro, everything from that point on then requires the same process, not to mention that running each bit will take longer and use more resources.  Three examples for you of how to work with the data as is:

arrays: you can use arrays to refer to groups of variables

shorthand: the other method is shorthand, so to sum up all i_ variables sum(of i_Smiley Happy - how complicated would that be if it was all in different datasets?

Normalisiation: having the data go down the page rather than across e.g:

PARAM  VALUE

i_20901  xyz

...

You can then refer to observations rather than splitting that out into a new dataset - i.e. where clausing.

Super Contributor
Posts: 440

Re: Do same basic steps on 200 variables, in a particular dataset

I'd like to echo the comments of @RW9 - I've seen this many, many times and I'd bet a pound to a penny that this data originally came from spreadsheets. This "short and wide" layout works really well for copying formulae in, say, Excel but doesn't work so well in other environments for the reasons given. If you have the data "tall and thin" as suggested you'll find your programs are more robust (less changing of code), easier to work with (much less typing and reading lines of lines of repeated variable names and simpler coding using By Groups) and they'll actually run faster as well. You should definitley consider converting this data to "long and thin" with Proc Transpose.

Solution
‎09-14-2017 05:34 AM
Frequent Contributor
Posts: 129

Re: Do same basic steps on 200 variables, in a particular dataset

Posted in reply to ChrisBrooks

Hi,

 

you're the architect and if I understood your requirements right you could do this:

data test;
input i_20901
i_20902
i_20903
i_20904
i_20905
i_20906
rg7_20901
rg7_20902
rg7_20903
rg7_20904
rg7_20905   
rg7_20906;
cards;
5.2998 0.8653 9.7555 -0.4079 0 0 4 6 4 1 0 0    
15.686 0.4401 6.3543 0.0869 4.3429 -0.1955 6 4 3 3 0 0 
1.7026 -0.169 11.5141 -0.7457 6.1116 0.0722 1 2 5 0 1 4 
0.6265 0.0777 2.2246 -0.1288 7.4253 -0.2075 0 3 0 2 2 0 
5.6289 0.9559 2.0766 -0.3934 9.0273 -0.3257 4 6 0 1 3 0 
3.1862 -0.2157 4.8757 -0.0159 11.6997 -0.0549 2 2 2 3 4 2 
7.7334 -0.1715 9.3654 -0.1501 14.9105 -0.0166 5 2 4 2 5 2 
2.3106 0.3048 6.3218 -0.6706 25.3866 0.011 2 4 3 0 6 3
;
run;

DATA vcolumn;
   length pairID $20;
   set sashelp.vcolumn(keep=libname memname name); 
   where libname eq 'WORK' and memname eq 'TEST';
   pairID=scan(name,2,'_');
RUN;

PROC SORT data=vcolumn; 
   by libname memname pairID descending name; 
RUN;

DATA vcolumn;
   length byVar $40;
   set vcolumn;
   retain byVar;
   by libname memname pairID descending name; 
   if first.pairID then byVar=strip(name);
   else do;
      byVar=strip(byVar)||' '||strip(name);
      output;
   end;
RUN;

DATA _NULL_;
   set vcolumn;
   call execute('
      PROC SORT data='||strip(libname)||'.'||strip(memname)||' out=tmp;
         BY '||strip(byVar)||';
      RUN;

      DATA WORK.'||strip(memname)||'_'||strip(pairID)||'_divisions;
         SET tmp;
         BY '||strip(byVar)||';
         if last.'||strip(scan(byVar,1))||';
      RUN;

      PROC DATASETS lib=work noprint;
         delete tmp;
      RUN;QUIT;');
RUN;

But I'm not sure if creating 100 files x 6 pair variables = 600 datasets is the thing you really want.

________________________

- Cheers -

Super Contributor
Posts: 440

Re: Do same basic steps on 200 variables, in a particular dataset

I've seen systems like that - they're a support nightmare.....

Regular Contributor
Posts: 223

Re: Do same basic steps on 200 variables, in a particular dataset

Not really sure what you did here, Oligolas, but you're still the winner.

 

Thanks very much!

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 117 views
  • 5 likes
  • 4 in conversation