@hashman Thank you for such a helpful post! I found the sasfile function particularly helpful. Could you please explain a little more what "retain p 1" and "point=p" mean in your code?
sasfile fill load ;
data want ;
set have ;
retain p 1 ;
if name = "Smith" then set fill point = p ;
run ; sasfile fill close ;
... set fill point = p ;
means "read observation # P". In this case, the FILL data set has but one record we want to read every time the name is Smith, so we need P=1. RETAIN is used to populate P with 1 for the duration of the step without the need to assign P=1 when name is Smith. Note that we cannot code just point=1 since the option cannot accept a literal; it only accepts a variable. BTW, this variable (in this case, P) is auto-dropped.
Now when set fill point = p is executed, the values of the variables in FILL overwrite the values of the corresponding variables in the PDV, quod erat faciendum. SASFILE isn't a function but a statement placing FILL into memory just to make SET FILL execute faster than reading from disk.
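To see the mechanics end to end, here is a minimal self-contained sketch. The contents of HAVE and FILL (the variables x and y and their values) are made up purely for illustration; only the step itself mirrors the code above.

```sas
/* Hypothetical sample data: HAVE holds the rows to patch */
data have ;
  input name $ x y ;
  cards ;
Smith 1 2
Jones 3 4
Smith 5 6
;
run ;

/* Hypothetical one-row data set with the replacement values */
data fill ;
  x = 100 ;
  y = 200 ;
run ;

sasfile fill load ;    /* keep FILL in memory while it is being read */

data want ;
  set have ;           /* sequential read; the step ends when HAVE is exhausted */
  retain p 1 ;         /* P gets its initial value 1 once and keeps it */
  /* direct read of observation #P of FILL; its x and y overwrite those in the PDV */
  if name = "Smith" then set fill point = p ;
run ;

sasfile fill close ;   /* release the memory */
```

With this made-up data, both Smith rows in WANT should end up with x=100 and y=200, while the Jones row keeps its original values.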
Kind regards
Paul D.
@hashman Thank you for your explanation! You also mentioned in the post that we can fill with more than single values. Let's say I want to create winsorized variables from a set of variables, at their 1st and 99th percentiles. I've written each variable's 1st and 99th percentiles to a data set with the OUTPUT statement. If I want to use the sasfile statement to fill in values from that output file, should I first transpose it to long format to retain the values? I was using %macro and call execute for winsorizing, but I also want to try the sasfile statement. What's your suggestion?
proc means data=mydata p1 p99;
var ch_score1-ch_score100;
output out =sum p1=lower1-lower100 p99=upper1-upper100;
run;
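For what it's worth, one possible way to use the one-row SUM output as is, without transposing, is sketched below. It is only a sketch under assumptions: the output name WINSORIZED and the index variable _i_ are hypothetical, the scores are overwritten in place rather than copied to new variables, and the single row of bounds is read once with a conditional SET rather than through SASFILE.

```sas
data winsorized ;
  /* read the single row of bounds once; variables read by SET are retained */
  if _n_ = 1 then set sum (keep = lower1-lower100 upper1-upper100) ;
  set mydata ;
  array s  [*] ch_score1-ch_score100 ;
  array lo [*] lower1-lower100 ;
  array hi [*] upper1-upper100 ;
  do _i_ = 1 to dim (s) ;
    /* clamp each non-missing score to its [P1, P99] interval */
    if not missing (s[_i_]) then s[_i_] = min (max (s[_i_], lo[_i_]), hi[_i_]) ;
  end ;
  drop _i_ lower1-lower100 upper1-upper100 ;
run ;
```

Since the bounds sit in a single observation that is read only once, loading that data set into memory would probably gain little here.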
There's a ballot entry for this.
I'd be curious to know how many ballot entries have been implemented.
I have no clue how many, either absolutely or percentage-wise. As to this one, I ain't holding my breath for two reasons: (a) don't think it's high on the list of their priorities and (b) they might deem that there's enough existing functionality that can be used to do it if need be. And, truth be told, they may be right about the latter. Furthermore, if one would want to set v[*] to a number of predetermined different values en masse (not just one) - for example, listed in a SAS data set, - it's easy to do with either point= or a hash, as already shown in this thread.
Kind regards
Paul D.
1. My guess is: epsilon, asymptotically close to zero.
2. As for the suggestion itself, a new feature is always unneeded as it can be worked around. The latest most popular language additions are arguably the cat functions, which can easily be replicated using || and trim() and more.
Yet they contribute to clearer, better code. "Not required" can probably be said of all ballot entries.
My guess is that the new functions were added because SAS Institute developers thought they'd be useful for their jobs developing new products. Who knows?
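To illustrate the point about the cat functions, here is a small sketch comparing the older idiom with CATX. The variable names and values are invented for the example.

```sas
data _null_ ;
  length first last $ 12 full_old full_new $ 25 ;
  first = "Paul" ;
  last  = "Smith" ;
  /* older idiom: strip the blanks by hand, then concatenate with || */
  full_old = trim (left (first)) || " " || trim (left (last)) ;
  /* CATX removes leading and trailing blanks and inserts the delimiter */
  full_new = catx (" ", first, last) ;
  put full_old= / full_new= ;
run ;
```

Both assignments produce the same string; the CATX version simply says it in one call.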
@ChrisNZ :
"2. As for the suggestion itself, a new feature is always unneeded as it can be worked around. The latest most popular language additions are arguably the cat functions, which can easily be replicated using || and trim() and more.
Yet they contribute to clearer, better code."
Precisely. Plus the kitties add so much functionality to very frequent necessities that emulating it with workarounds would make code unpalatable. They're a typical example of astutely satisfying a rather urgent demand brought about by common usage by offering a number of canned solutions. The same can be said of the SQL INTO clause and many other things.
Of course I'd more than welcome a fast canned function or call routine that would stick a given value, rather than only a standard missing value, into v[*]. However, compared to the kitties and such, I don't see how much value (aside from the speed) it would add versus how badly it's needed.
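For comparison, the contrast being discussed looks roughly like this. The array size and the value 7 are arbitrary choices for the sketch.

```sas
data _null_ ;
  array v [5] v1-v5 ;
  /* existing canned routine: sets every element to missing in one call */
  call missing (of v[*]) ;
  /* no canned counterpart for a non-missing value, so loop explicitly */
  do i = 1 to dim (v) ;
    v[i] = 7 ;
  end ;
  put (v1-v5) (=) ;
run ;
```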
Kind regards
Paul D.
And now that you've seen all the different options you might also understand that plain and simple code without any "smarts" is sometimes not a bad option because it's easy to understand and maintain.
Your code sample doesn't show the "if/then" logic you mention, but in case you're looking for a way to recode data based on the values of another variable, below is some code that uses an informat for this instead of separate if/then blocks.
data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
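proc format;
  /* informat mapping names to recode values; unlisted names map to missing */
  invalue recode
    'Smith'   = 10
    'Mitchel' = 20
    other     = .
  ;
run;
data test;
  set scores;
  /* recode only when the name has an entry in the informat */
  if input(name,recode.) ne . then
    do;
      Score1=input(name,recode.);
      Score2=input(name,recode.);
      Score3=input(name,recode.);
    end;
run;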
proc print data=test;
run;