DATA Step, Macro, Functions and more

Selecting data based on Index

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Selecting data based on Index

[ Edited ]

I have two tables, t1 is filled with all purchase records and is laid out like so:

 

CustomerID|ProductID|PurchaseAge|TimeSincePurchase|

A00001    |0001     |23         |7                |

A00001    |0002     |29         |1                |

A00002    |0001     |65         |13               |

 

t2 is in the layout of:  

 

ProductID|PurchaseAge|FactorAtTime0|FactorAtTime1|...|FactorAtTime120|

0001     |18         |     ...     |     ...     |...|      ...      |

0001     |19         |     ...     |     ...     |...|      ...      |

...      |...        |     ...     |     ...     |...|      ...      |

0002     |18         |     ...     |     ...     |...|      ...      |

0002     |19         |     ...     |     ...     |...|      ...      |

...      |...        |     ...     |     ...     |...|      ...      |

 

I'm attempting to write a query that would left join on the ProductID and PurchaseAge characteristics and then pull in the factor from the "TimeSincePurchase + 3"-th column of t2 . 

 

Is there a method to call columns by their ordinal position within the table as opposed to the name of the column?

If so, can a variable be used in place of a numerical constant?

 

EDIT: As per the below comments, it appears you would have to use a data step rather than sql in order to dynamically pick which column to pull from. The solution I ended up going with was transforming the second table like below then joining on ProductID, PurchaseAge, and TimeSincePurchase.

 

ProductID|PurchaseAge|TimeSincePurchase|Factor|

0001     |18         |       0         |  ..  |

0001     |18         |       1         |  ..  |

0001     |18         |       2         |  ..  |

0001     |19         |     ...         |  ..  |

...      |...        |     ...         |  ..  |

0002     |18         |     ...         |  ..  |

0002     |19         |     ...         |  ..  |

...      |...        |     ...         |  ..  |

 


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 23,683

Re: Selecting data based on Index

Posted in reply to ProcLunchbreak
Use VVALUEX().

time=timeSincePurchase+3;
varname = catt('FactorAtTime', put(time, 8. -l));
want = vvaluex(varname); *you may need to convert this to a number;

View solution in original post


All Replies
PROC Star
Posts: 1,269

Re: Selecting data based on Index

Posted in reply to ProcLunchbreak

So when TimeSincePurchase=7 you want to pull the factor from t2 from the variable FactorAtTime10 in the record for productID=0001, PurchaseAge=23, correct?

 

Can you show us an example of what your desired output looks like?

New Contributor
Posts: 3

Re: Selecting data based on Index

When TimeSincePurchase=7 I want to pull the factor from the 10th column in t2, FactorAtTime7.

For Example:

 

t1 record 

 

CustomerID|ProductID|PurchaseAge|TimeSincePurchase|

A00001    |0001     |23         |7                |

 

A00001    |0002     |29         |1                |

A00002    |0001     |65         |13               |

 

and t2 record 

 

ProductID|PurchaseAge|FactorAtTime0|FactorAtTime1|...|FactorAtTime7|

 

0001     |21         |     1.3     |     0.2     |...|      0.9    |

0001     |22         |     0.6     |     5.1     |...|      4.7    |

0001     |23         |     1.7     |     2.3     |...|      6.8    |

 

would create

 

CustomerID|ProductID|PurchaseAge|TimeSincePurchase|CurrentFactor

A00001    |0001     |23         |7                |6.8

A00001    |0002     |29         |1                |

A00002    |0001     |65         |13               |

Super User
Posts: 23,683

Re: Selecting data based on Index

[ Edited ]
Posted in reply to ProcLunchbreak

An array is another option but note that you start at 0, so make sure it's +3 that you actually want which would return FactorAtTime 11 since SAS indexes from 1, not 0. 

 

array factorT(*) factorAtTime0 - factorAtTime120;

Value = factort(timesincePurchase-1);

 

PROC Star
Posts: 1,269

Re: Selecting data based on Index

[ Edited ]
Posted in reply to ProcLunchbreak

If you simply want to pull the factor from the variable FactorAtTime7 when TimeSincePurchase=7 in t1, then pulling this value by its relative position in the data set is not the way to go.

 

Use the VVALUEX Funtion as suggested by @Reeza, but do not set time=timeSincePurchase+3.

Super User
Posts: 23,683

Re: Selecting data based on Index


@draycut wrote:

If you simply want to pull the factor from the variable FactorAtTime7 when TimeSincePurchase=7 in t1, then pulling this value by its relative position in the data set is not the way to go.

 

Use the VVALUEX Funtion as suggested by @Reeza, but do not set time=timeSincePurchase+3.


Good catch, thanks Smiley Happy

 

@ProcLunchbreak use Arrays or VVALUEX. Both are illustrated, you can play around to ensure you have the right index set up. 

 

 

Solution
3 weeks ago
Super User
Posts: 23,683

Re: Selecting data based on Index

Posted in reply to ProcLunchbreak
Use VVALUEX().

time=timeSincePurchase+3;
varname = catt('FactorAtTime', put(time, 8. -l));
want = vvaluex(varname); *you may need to convert this to a number;
New Contributor
Posts: 3

Re: Selecting data based on Index

I'm getting a syntax error immediately after VVALUEX with the following code:

 

PROC SQL;
     SELECT
     t1.CustomerID,
     t1.ProductID,
     t1.PurchaseAge,
     t1.TimeSincePurchase,
     (t2.VVALUEX(CATT('FactorAtTime', PUT(t1.TimeSincePurchase, 8., -l)))) AS CurrentFactor
     FROM WORK.PurchaseRecords t1
     LEFT JOIN WORK.ProductFactors t2 ON (t1.ProductID = t2.ProductID) AND (t1.PurchaseAge =
t2.PurchaseAge)
;QUIT;

 

Is this is something I should be trying in a datastep rather than SQL? Smiley Sad

 

Super User
Posts: 23,683

Re: Selecting data based on Index

Posted in reply to ProcLunchbreak

@ProcLunchbreak wrote:

 

Is this is something I should be trying in a datastep rather than SQL? Smiley Sad

 


Yes, AFAIK SQL doesn't have a dynamic method of referencing columns without knowing it ahead of time.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 121 views
  • 2 likes
  • 3 in conversation