- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello, beginner posting for the 1st time,
I want to create a new column that assigns weighted values to a large dataset using a vlookup table in SAS, rather than write hundreds of if-then statements. For example, I have LARGE dataset and each record has a unique groupid that specifies the stratified group it belongs to. (I have hundreds of groups and weighted values)
Vlookup table example:
How can I create a new column where the weighted value is returned based off the string? (Example: If Unique ID=221 then Weight=17.49)
Desired output:
I could use excel for this but I want to use SAS and improve my programming skills. Would array be the best function? What are other ways to do this in SAS? Thank you in advance!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's a good paper that illustrates quite a few methods.
https://support.sas.com/resources/papers/proceedings15/2219-2015.pdf
- Join (MERGE or SQL)
- Format
- Arrays
- Key Indexing
- Hash Table
For the start, Joins, Formats and Arrays can get you 99% of the way there. I've used those three almost exclusively for my entire SAS programming career.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The BEST way depends on a lot on things like the size of two datasets and the types of the variables being created.
The most natural SAS way is to use a FORMAT to translate a code into a string. If you need to create a numeric variable, like your WEIGHT example, then you will need to use an INFORMAT instead. But an INFORMAT needs a character value as the input.
Another natural way is to just to combine the two datasets. Use a JOIN with PROC SQL. Or MERGE using a data step. Both of those require sorting by the key variable (but the SQL method will do the sorting for you) so if the datasets are large and not already sorted by the key variable that can be a performance issue.
A newer method in SAS is a HASH object. But that requires that the lookup table is small enough to fit into memory. it does not require the other (potentially much larger dataset) to be sorted.
data group_wt;
input group_id $ weight ;
cards;
111 42.38
121 14.32
;
data have;
input id $ group_id $ date :date.;
format date date9.;
cards;
1 121 01JAN2022
2 100 01FEB2022
3 111 01MAR2023
;
data want;
set have;
if _n_=1 then do;
if 0 then set group_wt(keep=group_id weight);
declare hash vlookup(dataset:"group_wt(keep=group_id weight");
_error_ = vlookup.definekey("group_id");
_error_ = vlookup.definedata(all:'yes');
_error_ = vlookup.definedone();
end;
if vlookup.find() then call missing(of weight);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much! This is helpful.