Solved: Do same basic steps on 200 variables, in a particular dataset

NKormanik · Posted 09-13-2017 02:10 AM

Oligolas was kind to provide the following code, handling multiple datasets in a library, and doing the same procedure(s) on each dataset:

data _NULL_;
   set sashelp.vtable;
   where libname eq upcase('YourLib');
   call execute('
      PROC SORT data='||strip(libname)||'.'||strip(memname)||' out=tmp;
         BY variable_rank variable_continuous;
      RUN;

      DATA yourPermLib.'||strip(memname)||'_divisions;
         SET tmp;
         BY variable_rank;
         if last.variable_rank;
      RUN;

      PROC DATASETS lib=work noprint;
         delete tmp;
      RUN;QUIT;');
run;

Works wonderfully.

Here's now a follow-up question, if I may:

The above code deals with one variable pair -- variable_continuous, and it's rank counterpart, variable_rank.

The follow-up issue is that there are 200 such variable pairs in my dataset. EACH of these continuous, with rank counterpart.

The above code results in the following partial output dataset:

i_20901	i_20902	i_20903	i_20904	i_20905	i_20906	rg7_20901	rg7_20902	rg7_20903	rg7_20904	rg7_20905	rg7_20906
5.2998	0.8653	9.7555	-0.4079			4	6	4	1
15.686	0.4401	6.3543	0.0869	4.3429	-0.1955	6	4	3	3	0	0
1.7026	-0.169	11.5141	-0.7457	6.1116	0.0722	1	2	5	0	1	4
0.6265	0.0777	2.2246	-0.1288	7.4253	-0.2075	0	3	0	2	2	0
5.6289	0.9559	2.0766	-0.3934	9.0273	-0.3257	4	6	0	1	3	0
3.1862	-0.2157	4.8757	-0.0159	11.6997	-0.0549	2	2	2	3	4	2
7.7334	-0.1715	9.3654	-0.1501	14.9105	-0.0166	5	2	4	2	5	2
2.3106	0.3048	6.3218	-0.6706	25.3866	0.011	2	4	3	0	6	3

The above dataset creation run focused on continuous variable i_20905, and the corresponding rank variable rg7_20905.

I'll need to delete all the other variables, using SAS 'keep'. The result would be as follows:

i_20905 rg7_20905

4.3429 0

6.1116 1

7.4253 2

9.0273 3

11.6997 4

14.9105 5

25.3866 6

Each and every variable pair would require it's own unique output file such as that above.

All the files would then be combined using SAS Match-Merging, using rank as the common column.

That would be the final result desired. At least for now.

So, what question am I presently asking?

Simply, Using Oligolas' code above, is there a way to handle all the variables in one go?

If not, I will simply copy-paste x200, changing the specifics on variable for each of the 200 runs. Perhaps that would be the best way anyway. As at least I will be able to understand what's happening.

Thanks!

Nicholas Kormanik

Oligolas · Posted 09-13-2017 12:41 PM

Hi,

you're the architect and if I understood your requirements right you could do this:

data test;
input i_20901
i_20902
i_20903
i_20904
i_20905
i_20906
rg7_20901
rg7_20902
rg7_20903
rg7_20904
rg7_20905   
rg7_20906;
cards;
5.2998 0.8653 9.7555 -0.4079 0 0 4 6 4 1 0 0    
15.686 0.4401 6.3543 0.0869 4.3429 -0.1955 6 4 3 3 0 0 
1.7026 -0.169 11.5141 -0.7457 6.1116 0.0722 1 2 5 0 1 4 
0.6265 0.0777 2.2246 -0.1288 7.4253 -0.2075 0 3 0 2 2 0 
5.6289 0.9559 2.0766 -0.3934 9.0273 -0.3257 4 6 0 1 3 0 
3.1862 -0.2157 4.8757 -0.0159 11.6997 -0.0549 2 2 2 3 4 2 
7.7334 -0.1715 9.3654 -0.1501 14.9105 -0.0166 5 2 4 2 5 2 
2.3106 0.3048 6.3218 -0.6706 25.3866 0.011 2 4 3 0 6 3
;
run;

DATA vcolumn;
   length pairID $20;
   set sashelp.vcolumn(keep=libname memname name); 
   where libname eq 'WORK' and memname eq 'TEST';
   pairID=scan(name,2,'_');
RUN;

PROC SORT data=vcolumn; 
   by libname memname pairID descending name; 
RUN;

DATA vcolumn;
   length byVar $40;
   set vcolumn;
   retain byVar;
   by libname memname pairID descending name; 
   if first.pairID then byVar=strip(name);
   else do;
      byVar=strip(byVar)||' '||strip(name);
      output;
   end;
RUN;

DATA _NULL_;
   set vcolumn;
   call execute('
      PROC SORT data='||strip(libname)||'.'||strip(memname)||' out=tmp;
         BY '||strip(byVar)||';
      RUN;

      DATA WORK.'||strip(memname)||'_'||strip(pairID)||'_divisions;
         SET tmp;
         BY '||strip(byVar)||';
         if last.'||strip(scan(byVar,1))||';
      RUN;

      PROC DATASETS lib=work noprint;
         delete tmp;
      RUN;QUIT;');
RUN;

But I'm not sure if creating 100 files x 6 pair variables = 600 datasets is the thing you really want.

________________________
- Cheers -

View solution in original post

RW9 · Posted 09-13-2017 03:43 AM

Yes, there is a way to do it, and there are many topics on here which you can find using the search, in fact I have probably posted such many times. Call execute accepts a string - any string - and inserts that into the compiler after the datastep has finished running, therefore any Base SAS code you can write can be put into a call execute, much like you could put it into a macro, the benefit being that you use the datastep as a loop rather than handling that with macro code. So write your code once so that it works, then identify the pieces which will change, then put that in a call execute.

However, that being said, as in my other post you are really fighting the SAS system. It is really not a good idea to split data up into little bits. What will you do after you have copied the code 200 times or put it all in a macro, everything from that point on then requires the same process, not to mention that running each bit will take longer and use more resources. Three examples for you of how to work with the data as is:

arrays: you can use arrays to refer to groups of variables

shorthand: the other method is shorthand, so to sum up all i_ variables sum(of i_:) - how complicated would that be if it was all in different datasets?

Normalisiation: having the data go down the page rather than across e.g:

PARAM VALUE

i_20901 xyz

...

You can then refer to observations rather than splitting that out into a new dataset - i.e. where clausing.

ChrisBrooks · Posted 09-13-2017 04:07 AM

I'd like to echo the comments of @RW9 - I've seen this many, many times and I'd bet a pound to a penny that this data originally came from spreadsheets. This "short and wide" layout works really well for copying formulae in, say, Excel but doesn't work so well in other environments for the reasons given. If you have the data "tall and thin" as suggested you'll find your programs are more robust (less changing of code), easier to work with (much less typing and reading lines of lines of repeated variable names and simpler coding using By Groups) and they'll actually run faster as well. You should definitley consider converting this data to "long and thin" with Proc Transpose.

Oligolas · Posted 09-13-2017 12:41 PM

Hi,

you're the architect and if I understood your requirements right you could do this:

data test;
input i_20901
i_20902
i_20903
i_20904
i_20905
i_20906
rg7_20901
rg7_20902
rg7_20903
rg7_20904
rg7_20905   
rg7_20906;
cards;
5.2998 0.8653 9.7555 -0.4079 0 0 4 6 4 1 0 0    
15.686 0.4401 6.3543 0.0869 4.3429 -0.1955 6 4 3 3 0 0 
1.7026 -0.169 11.5141 -0.7457 6.1116 0.0722 1 2 5 0 1 4 
0.6265 0.0777 2.2246 -0.1288 7.4253 -0.2075 0 3 0 2 2 0 
5.6289 0.9559 2.0766 -0.3934 9.0273 -0.3257 4 6 0 1 3 0 
3.1862 -0.2157 4.8757 -0.0159 11.6997 -0.0549 2 2 2 3 4 2 
7.7334 -0.1715 9.3654 -0.1501 14.9105 -0.0166 5 2 4 2 5 2 
2.3106 0.3048 6.3218 -0.6706 25.3866 0.011 2 4 3 0 6 3
;
run;

DATA vcolumn;
   length pairID $20;
   set sashelp.vcolumn(keep=libname memname name); 
   where libname eq 'WORK' and memname eq 'TEST';
   pairID=scan(name,2,'_');
RUN;

PROC SORT data=vcolumn; 
   by libname memname pairID descending name; 
RUN;

DATA vcolumn;
   length byVar $40;
   set vcolumn;
   retain byVar;
   by libname memname pairID descending name; 
   if first.pairID then byVar=strip(name);
   else do;
      byVar=strip(byVar)||' '||strip(name);
      output;
   end;
RUN;

DATA _NULL_;
   set vcolumn;
   call execute('
      PROC SORT data='||strip(libname)||'.'||strip(memname)||' out=tmp;
         BY '||strip(byVar)||';
      RUN;

      DATA WORK.'||strip(memname)||'_'||strip(pairID)||'_divisions;
         SET tmp;
         BY '||strip(byVar)||';
         if last.'||strip(scan(byVar,1))||';
      RUN;

      PROC DATASETS lib=work noprint;
         delete tmp;
      RUN;QUIT;');
RUN;

But I'm not sure if creating 100 files x 6 pair variables = 600 datasets is the thing you really want.

________________________
- Cheers -

ChrisBrooks · Posted 09-13-2017 01:37 PM

I've seen systems like that - they're a support nightmare.....

NKormanik · Posted 09-14-2017 05:34 AM

Not really sure what you did here, Oligolas, but you're still the winner.

Thanks very much!

Do same basic steps on 200 variables, in a particular dataset

Re: Do same basic steps on 200 variables, in a particular dataset

Re: Do same basic steps on 200 variables, in a particular dataset

Re: Do same basic steps on 200 variables, in a particular dataset

Re: Do same basic steps on 200 variables, in a particular dataset

Re: Do same basic steps on 200 variables, in a particular dataset

Re: Do same basic steps on 200 variables, in a particular dataset

Classroom Training Available!