Datastep Help

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Datastep Help

Hi,

I'm stuck creating the appropriate Datastep that will perform a calculation on all possible pair-wise combinations of a numeric variable that pertains to an ID variable, as stratified by the combination of two character variables. The number of the numeric variable values varies across the charcter variable combinations and all character variable values have unique nomenclature (as opposed to the general case below).

The general case:

VAR1     VAR2      ID     Y

A               A          A     y1

A               A          B     y2

A               A          C     y3

Desired output:

VAR1     VAR2     VAR3

A               A          y1+y2

A               A          y1+y3

A               A          y2+y3

Any help is much appreciated.


Accepted Solutions
Solution
‎05-20-2014 11:58 AM
Super User
Posts: 19,835

Re: Datastep Help

I don't understand how UNIQUEID comes into play.

Can you provide a more detailed example, if one of the above suggestions doesn't meet your requirements.

Also, an inefficient method is a cross join and then filter out the results you don't want.

View solution in original post


All Replies
Regular Contributor
Posts: 217

Re: Datastep Help

Hi Anderrh,

Have you tried the lag function?  I am assuming that y1, y2, and y3 are numeric values.  I am also assuming that VAR1 and VAR2 are always the same and the there are always three rows of equal values in VAR1 and VAR2.  


proc sql; drop table work.have; quit;
data work.have;
input VAR1 $ VAR2 $ ID $ Y $;
datalines;
A A A y1
A A B y2
A A C y3
;
RUN;

PROC SORT DATA=HAVE OUT=HAVE2(DROP=ID);
BY VAR1 VAR2 Y;
RUN;

/** ASSUMING THAT Y IS A NUMERIC VALUE **/

DATA WANT;
    SET HAVE2;
BY VAR1 VAR2;
IF LAST.VAR1 AND LAST.VAR2 THEN DO;
        VAR3 = Y + LAG1(Y);  /*(Y3 + Y2)*/
        OUTPUT;
        VAR3 = Y + LAG2(Y);  /*(Y3 + Y1)*/
        OUTPUT;
        VAR3 = LAG2(Y) + LAG3(Y);  /* Y2 + Y1 */
        OUTPUT;
END;
RETURN;
DROP Y;
END;

New Contributor
Posts: 3

Re: Datastep Help

Thanks for the reply.

Y is a numeric variable but, unfortunately, varies in terms of sample size among the various combinations of VAR1 and VAR2, which are not the same.

There are 87 unique values of VAR1, each of which contains anywhere from 2 to 50 unique values of VAR2, each of which contains anywhere from 3 to 50 unique values of ID, each of which contains a unique Y. I'm trying to permutate through all possible pair-wise combinations of Y values among the VAR1*VAR2 combinations.  

Regular Contributor
Posts: 217

Re: Datastep Help

Dear ANDERRH,

The code below should group all your var1 and var2 combinations.  The process of adding the group pairs is not that simple. The principle of my previous code is the same.  For group AudiSedan add 13Y + 12Y then 13Y + 11Y then ...... 12Y + 11Y then............2Y + 1Y;  (13+12+11+10....+1 loops for the group AudiSedan).


proc sql;
  create table var3 as
  select distinct
       strip(make)||strip(type) as group,
       a.make as var1,
       a.type as var2,
       a.invoice as y format=8.,
       count(calculated group) as groupcount
from sashelp.cars as a
group by 1
order by 1,4
;
quit;

                                                          var3                            13:35 Monday, May 19, 2014   2

                            Obs       group       var1     var2             y    groupcount

                              1    AcuraSUV       Acura    SUV          33337         1
                              2    AcuraSedan     Acura    Sedan        21761         5
                              3    AcuraSedan     Acura    Sedan        24647         5
                              4    AcuraSedan     Acura    Sedan        30299         5
                              5    AcuraSedan     Acura    Sedan        39014         5
                              6    AcuraSedan     Acura    Sedan        41100         5
                              7    AcuraSports    Acura    Sports       79978         1
                              8    AudiSedan      Audi     Sedan        23508        13
                              9    AudiSedan      Audi     Sedan        28846        13

Solution
‎05-20-2014 11:58 AM
Super User
Posts: 19,835

Re: Datastep Help

I don't understand how UNIQUEID comes into play.

Can you provide a more detailed example, if one of the above suggestions doesn't meet your requirements.

Also, an inefficient method is a cross join and then filter out the results you don't want.

New Contributor
Posts: 3

Re: Datastep Help

Thanks to all - I actually figured out the cross-join method using the entire cartesian product, filtering out those combinations across the VAR1 VAR2 strata. As noted, it's not the most efficient but it works. Thanks again.

Super User
Posts: 10,041

Re: Datastep Help

Are you doing Survive Analysis ?

proc sort data=sashelp.class(keep= age weight) out=have; by age weight;run;
data want;
 array x{9999} _temporary_;
 n=0; call missing(of x{*});
 do until(last.age);
 set have;
  by age;
  n+1;x{n}=weight;
 end;
 do i=1 to n-1;
  do j=i+1 to n;
   sum=sum(x{i},x{j});
   output;
  end;
 end;
 keep age sum;
run;


Xia Keshan

Regular Contributor
Posts: 217

Re: Datastep Help

Dear Anderrh;

I used sashelp.cars to add the current value to a prior value cascading through all pair combinations.  I tested using 16 rows of the same VAR1 and VAR2 equalities.  I built an array to hold all the values associated to the VAR1 and VAR2 unique combinations; then I used do loops to add different array variable pairs.

Attachment
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 274 views
  • 0 likes
  • 4 in conversation