About FriedEgg

FriedEgg · 11-08-2011

Hi, is there a way to compute an average under a certain condition? For example, say I have a modified version of sashelp.class with an additional record where the individual has a weight of zero. I want to count this individual in his category, but exclude him from the calculated mean. Here is example code:

proc format;
  value myfmt low-99.999 = 'Light'
              100-high   = 'Heavy';
quit;

data class;
  input (name sex) ($) age height weight;
  output;
  do until(done);
    set sashelp.class end=done;
    output;
  end;
cards;
FriedEgg M 99 100 0
;
run;

proc report data=class;
  column sex weight=status (Sum Mean),weight;
  define sex / group;
  define status / across format=myfmt. preloadfmt;
  define weight / format=comma8.2;
  compute after;
    line ' ';
    line 'Total Class Weight: ' weight.sum comma8.2;
  endcomp;
run;

Current output:

          weight          Sum       Mean
sex    Heavy   Light   weight     weight
F          3       6   811.00      90.11
M          6       5 1,089.50      99.05

Total Class Weight: 1,900.50

Desired output:

          Weight          Sum       Mean
Sex    Heavy   Light   Weight     Weight
F          3       6   811.00      90.11
M          6       5 1,089.50     108.95

Total Class Weight: 1,900.50

FriedEgg · 11-08-2011

I have this running okay with wget. Not able to test with proc http at the moment. wget submission line: wget --post-file=post_data http://www.dgsweb.state.pa.us/Debarment_list/default.asp contents of post_data __EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE=%2FwEPDwUKMTQ4NjEyMDk4NQ9kFgICAw9kFgYCAQ9kFgoCAQ8QZBAVAwpDaG9vc2UgT25lBkFjdGl2ZQZMb2cgSW4VAwpDaG9vc2UgT25lBkFjdGl2ZQZMb2cgSW4UKwMDZ2dnFgECAWQCAw8QZGQWAWZkAgsPEA8WAh4HVmlzaWJsZWhkZBYAZAINDw8WBB4EVGV4dAUiQ2hvb3NlIG9uZSB0byBzcGVjaWZ5IHNlYXJjaCBhcmVhLh8AaGRkAhUPEGRkFgECAWQCAw9kFgYCCQ8QZA8WRGYCAQICAgMCBAIFAgYCBwIIAgkCCgILAgwCDQIOAg8CEAIRAhICEwIUAhUCFgIXAhgCGQIaAhsCHAIdAh4CHwIgAiECIgIjAiQCJQImAicCKAIpAioCKwIsAi0CLgIvAjACMQIyAjMCNAI1AjYCNwI4AjkCOgI7AjwCPQI%2BAj8CQAJBAkICQxZEEGVlZxAFBUFnaW5nBQIxMGcQBQtBZ3JpY3VsdHVyZQUBNGcQBQdCYW5raW5nBQE1ZxAFG0NlbnRlciBmb3IgSnV2ZW5pbGUgSnVzdGljZQUEODEwN2cQBRhDaXZpbCBTZXJ2aWNlIENvbW1pc3Npb24FAjMyZxAFC0NvcnJlY3Rpb25zBQIxMWcQBQREQ0VEBQIyNGcQBQREQ05SBQIzOGcQBSZEZXBhcnRtZW50IG9mIEVudmlyb25tZW50YWwgUHJvdGVjdGlvbgUCMzVnEAUeRGVwYXJ0bWVudCBvZiBMYWJvciAmIEluZHVzdHJ5BQIxMmcQBQlFZHVjYXRpb24FAjE2ZxAFG0Vudmlyb25tZW50YWwgSGVhcmluZyBCb2FyZAUCMzdnEAUYRmlzaCBhbmQgQm9hdCBDb21taXNzaW9uBQIyMmcQBQ9HYW1lIENvbW1pc3Npb24FAjIzZxAFEEdlbmVyYWwgU2VydmljZXMFAjE1ZxAFBkhlYWx0aAUBN2cQBQlJbnN1cmFuY2UFATlnEAUUTGlxdW9yIENvbnRyb2wgQm9hcmQFAjI2ZxAFHU1pbGl0YXJ5IGFuZCBWZXRlcmFucyBBZmZhaXJzBQIxM2cQBRRNaWxrIE1hcmtldGluZyBCb2FyZAUCMjdnEAUYT2ZmaWNlIG9mIEFkbWluaXN0cmF0aW9uBQQ4MTAxZxAFGU9mZmljZSBvZiBHZW5lcmFsIENvdW5zZWwFBDgxMDJnEAUbT2ZmaWNlIG9mIEluc3BlY3RvciBHZW5lcmFsBQQ4MTAzZxAFFE9mZmljZSBvZiB0aGUgQnVkZ2V0BQQ4MTA0ZxAFFk9mZmljZSBvZiB0aGUgR292ZXJub3IFATFnEAUdUEEgSHVtYW4gUmVsYXRpb25zIENvbW1pc3Npb24FBDgxMDVnEAUUUEEgUHVibGljIFRWIE5ldHdvcmsFAjM0ZxAFBFBDQ0QFBDgxMDZnEAUEUEVNQQUCMzFnEAUtUGVubnN5bHZhbmlhIEhpc3RvcmljYWwgYW5kIE11c2V1bSBDb21taXNzaW9uBQIzMGcQBShQZW5uc3lsdmFuaWEgTXVuaWNpcGFsIFJldGlyZW1lbnQgU3lzdGVtBQI3MWcQBRlQZW5uc3lsdmFuaWEgU3RhdGUgUG9saWNlBQIyMGcQBQhQRU5OVkVTVAUCMzNnEAUUUHJvYmF0aW9uIGFuZCBQYXJvbGUFAjI1ZxAFKVB1YmxpYyBT
Y2hvb2wgRW1wbG95ZWVzIFJldGlyZW1lbnQgU3lzdGVtBQI3MmcQBQ5QdWJsaWMgV2VsZmFyZQUCMjFnEAUDUFVDBQIxN2cQBQdSZXZlbnVlBQIxOGcQBRVTZWN1cml0aWVzIENvbW1pc3Npb24FATZnEAUFU3RhdGUFAjE5ZxAFIVN0YXRlIEVtcGxveWVlcyBSZXRpcmVtZW50IFN5c3RlbQUCNzBnEAUXU3RhdGUgRXRoaWNzIENvbW1pc3Npb24FAjQwZxAFHFN0YXRlIFRheCBFcXVhbGl6YXRpb24gQm9hcmQFAjM2ZxAFDlRyYW5zcG9ydGF0aW9uBQE4ZxAFBlNlbmF0ZQUCNDFnEAUYSG91c2Ugb2YgUmVwcmVzZW50YXRpdmVzBQI0MmcQBQ9BdWRpdG9yIEdlbmVyYWwFATJnEAUIVHJlYXN1cnkFATNnEAUQQXR0b3JuZXkgR2VuZXJhbAUCMTRnEAUVTHQuIEdvdmVybm9yJ3MgT2ZmaWNlBQIyOGcQBQVQSEVBQQUCMzlnEAUmTGVnaXNsYXRpdmUgQWdlbmN5IG9mIEdlbmVyYWwgQXNzZW1ibHkFAjQ1ZxAFGkxlZ2lzbGF0aXZlIFNlcnZpY2UgQWdlbmN5BQI0NmcQBSZMZWdpc2xhdGl2ZSBCdWRnZXQgJiBGaW5hbmNlIENvbW1pdHRlZQUCNDdnEAUWSnVkaWNpYWwgQ29uZHVjdCBCb2FyZAUCNTFnEAUSQ29tbW9ud2VhbHRoIENvdXJ0BQI1OGcQBQ1TdXByZW1lIENvdXJ0BQI2MGcQBSdJbmRlcGVuZGVudCBSZWd1bGF0b3J5IFJldmlldyBDb21taXR0ZWUFAjYzZxAFF1BBIEdhbWluZyBDb250cm9sIEJvYXJkBQI2NWcQBSNQaGlsZGVscGhpYSBSZWdpb25hbCBQb3J0IEF1dGhvcml0eQUCODhnEAUgU3RhdGUgU3lzdGVtIG9mIEhpZ2hlciBFZHVjYXRpb24FAjkwZxAFE1R1cm5waWtlIENvbW1pc3Npb24FAjkxZxAFIFB1YmxpYyBTY2hvb2wgQnVpbGRpbmcgQXV0aG9yaXR5BQI5OGcQBS9Hb3Zlcm5vcidzIE9mZmljZSBmb3IgTWFuYWdlbWVudCAmIFByb2R1Y3Rpdml0eQUEODEwOGcQBRNDb3VuY2lsIG9uIHRoZSBBcnRzBQQ4MTA5ZxAFHkNhcGl0b2wgUHJlc2VydmF0aW9uIENvbW1pdHRlZQUCOTlnEAUJUEEgQ291cnRzBQI1M2cWAWZkAhUPEGRkFgFmZAIdDxBkZBYBZmQCBQ9kFgQCAQ8QZGQWAWZkAgUPPCsADQBkGAIFEWdyZERlYmFybWVudExpc3QyD2dkBRBncmREZWJhcm1lbnRMaXN0DzwrAAoBCAIBZDtIKQzuOUvkzIcQ22nPMsJDDYX4&__EVENTVALIDATION=%2FwEWEQLH2ujdCgLRoIiFCALf68jbBQL4oL9tApnvpbIHAvmXzo0MAun45GMC9vjkYwKT9%2FrLBwL%2B6YyoCgLCi9reAwKtkuWiCgKXjq%2BFCAKIjq%2BFCAKY4YXrBALC34jvAwL6o4PRA5xSrzJgIdRAZfigZJrvkhhVoNXL&ddlSelect=Active&ddlSearchBy=0&txtContractor=&btnPrint=Printable+Version&rblPaging=0&hidEdit=False&hidLogin= This is the post data you are using below: 
__EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE=%2FwEPDwUKMTQ4NjEyMDk4NQ9kFgICAw9kFgYCAQ9kFgoCAQ8QZBAVAwpDaG9vc2UgT25lBkFjdGl2ZQZMb2cgSW4VAwpDaG9vc2UgT25lBkFjdGl2ZQZMb2cgSW4UKwMDZ2dnFgECAWQCAw8QZGQWAWZkAgsPEA8WAh4HVmlzaWJsZWhkZBYAZAINDw8WBB4EVGV4dAUiQ2hvb3NlIG9uZSB0byBzcGVjaWZ5IHNlYXJjaCBhcmVhLh8AaGRkAhUPEGRkFgECAWQCAw9kFgYCCQ8QZA8WRGYCAQICAgMCBAIFAgYCBwIIAgkCCgILAgwCDQIOAg8CEAIRAhICEwIUAhUCFgIXAhgCGQIaAhsCHAIdAh4CHwIgAiECIgIjAiQCJQImAicCKAIpAioCKwIsAi0CLgIvAjACMQIyAjMCNAI1AjYCNwI4AjkCOgI7AjwCPQI%2BAj8CQAJBAkICQxZEEGVlZxAFBUFnaW5nBQIxMGcQBQtBZ3JpY3VsdHVyZQUBNGcQBQdCYW5raW5nBQE1ZxAFG0NlbnRlciBmb3IgSnV2ZW5pbGUgSnVzdGljZQUEODEwN2cQBRhDaXZpbCBTZXJ2aWNlIENvbW1pc3Npb24FAjMyZxAFC0NvcnJlY3Rpb25zBQIxMWcQBQREQ0VEBQIyNGcQBQREQ05SBQIzOGcQBSZEZXBhcnRtZW50IG9mIEVudmlyb25tZW50YWwgUHJvdGVjdGlvbgUCMzVnEAUeRGVwYXJ0bWVudCBvZiBMYWJvciAmIEluZHVzdHJ5BQIxMmcQBQlFZHVjYXRpb24FAjE2ZxAFG0Vudmlyb25tZW50YWwgSGVhcmluZyBCb2FyZAUCMzdnEAUYRmlzaCBhbmQgQm9hdCBDb21taXNzaW9uBQIyMmcQBQ9HYW1lIENvbW1pc3Npb24FAjIzZxAFEEdlbmVyYWwgU2VydmljZXMFAjE1ZxAFBkhlYWx0aAUBN2cQBQlJbnN1cmFuY2UFATlnEAUUTGlxdW9yIENvbnRyb2wgQm9hcmQFAjI2ZxAFHU1pbGl0YXJ5IGFuZCBWZXRlcmFucyBBZmZhaXJzBQIxM2cQBRRNaWxrIE1hcmtldGluZyBCb2FyZAUCMjdnEAUYT2ZmaWNlIG9mIEFkbWluaXN0cmF0aW9uBQQ4MTAxZxAFGU9mZmljZSBvZiBHZW5lcmFsIENvdW5zZWwFBDgxMDJnEAUbT2ZmaWNlIG9mIEluc3BlY3RvciBHZW5lcmFsBQQ4MTAzZxAFFE9mZmljZSBvZiB0aGUgQnVkZ2V0BQQ4MTA0ZxAFFk9mZmljZSBvZiB0aGUgR292ZXJub3IFATFnEAUdUEEgSHVtYW4gUmVsYXRpb25zIENvbW1pc3Npb24FBDgxMDVnEAUUUEEgUHVibGljIFRWIE5ldHdvcmsFAjM0ZxAFBFBDQ0QFBDgxMDZnEAUEUEVNQQUCMzFnEAUtUGVubnN5bHZhbmlhIEhpc3RvcmljYWwgYW5kIE11c2V1bSBDb21taXNzaW9uBQIzMGcQBShQZW5uc3lsdmFuaWEgTXVuaWNpcGFsIFJldGlyZW1lbnQgU3lzdGVtBQI3MWcQBRlQZW5uc3lsdmFuaWEgU3RhdGUgUG9saWNlBQIyMGcQBQhQRU5OVkVTVAUCMzNnEAUUUHJvYmF0aW9uIGFuZCBQYXJvbGUFAjI1ZxAFKVB1YmxpYyBTY2hvb2wgRW1wbG95ZWVzIFJldGlyZW1lbnQgU3lzdGVtBQI3MmcQBQ5QdWJsaWMgV2VsZmFyZQUCMjFnEAUDUFVDBQIxN2cQBQdSZXZlbnVlBQIxOGcQBRVTZWN1cml0aWVzIENvbW1pc3Npb24FATZnEAUFU3RhdGUFAjE5ZxAFIVN0YXRlIEVtcGxveWVlcyBSZXRpcmVtZW50IFN
5c3RlbQUCNzBnEAUXU3RhdGUgRXRoaWNzIENvbW1pc3Npb24FAjQwZxAFHFN0YXRlIFRheCBFcXVhbGl6YXRpb24gQm9hcmQFAjM2ZxAFDlRyYW5zcG9ydGF0aW9uBQE4ZxAFBlNlbmF0ZQUCNDFnEAUYSG91c2Ugb2YgUmVwcmVzZW50YXRpdmVzBQI0MmcQBQ9BdWRpdG9yIEdlbmVyYWwFATJnEAUIVHJlYXN1cnkFATNnEAUQQXR0b3JuZXkgR2VuZXJhbAUCMTRnEAUVTHQuIEdvdmVybm9yJ3MgT2ZmaWNlBQIyOGcQBQVQSEVBQQUCMzlnEAUmTGVnaXNsYXRpdmUgQWdlbmN5IG9mIEdlbmVyYWwgQXNzZW1ibHkFAjQ1ZxAFGkxlZ2lzbGF0aXZlIFNlcnZpY2UgQWdlbmN5BQI0NmcQBSZMZWdpc2xhdGl2ZSBCdWRnZXQgJiBGaW5hbmNlIENvbW1pdHRlZQUCNDdnEAUWSnVkaWNpYWwgQ29uZHVjdCBCb2FyZAUCNTFnEAUSQ29tbW9ud2VhbHRoIENvdXJ0BQI1OGcQBQ1TdXByZW1lIENvdXJ0BQI2MGcQBSdJbmRlcGVuZGVudCBSZWd1bGF0b3J5IFJldmlldyBDb21taXR0ZWUFAjYzZxAFF1BBIEdhbWluZyBDb250cm9sIEJvYXJkBQI2NWcQBSNQaGlsZGVscGhpYSBSZWdpb25hbCBQb3J0IEF1dGhvcml0eQUCODhnEAUgU3RhdGUgU3lzdGVtIG9mIEhpZ2hlciBFZHVjYXRpb24FAjkwZxAFE1R1cm5waWtlIENvbW1pc3Npb24FAjkxZxAFIFB1YmxpYyBTY2hvb2wgQnVpbGRpbmcgQXV0aG9yaXR5BQI5OGcQBS9Hb3Zlcm5vcidzIE9mZmljZSBmb3IgTWFuYWdlbWVudCAmIFByb2R1Y3Rpdml0eQUEODEwOGcQBRNDb3VuY2lsIG9uIHRoZSBBcnRzBQQ4MTA5ZxAFHkNhcGl0b2wgUHJlc2VydmF0aW9uIENvbW1pdHRlZQUCOTlnEAUJUEEgQ291cnRzBQI1M2cWAWZkAhUPEGRkFgFmZAIdDxBkZBYBZmQCBQ9kFgQCAQ8QZGQWAWZkAgUPPCsADQBkGAIFEWdyZERlYmFybWVudExpc3QyD2dkBRBncmREZWJhcm1lbnRMaXN0DzwrAAoBCAIBZDtIKQzuOUvkzIcQ22nPMsJDDYX4&__EVENTVALIDATION=%2FwEWEQLH2ujdCgLRoIiFCALf68jbBQL4oL9tApnvpbIHAvmXzo0MAun45GMC9vjkYwKT9%2FrLBwL%2B6YyoCgLCi9reAwKtkuWiCgKXjq%2BFCAKIjq%2BFCAKY4YXrBALC34jvAwL6o4PRA5xSrzJgIdRAZfigZJrvkhhVoNXL&ddlSelect=Active&ddlSearchBy=0&txtContractor=&btnSubmit=Search&rblPaging=0&hidEdit=False&hidLogin= Your post data works with wget as well. The differences between our versions is that I am retrieving the 'printable' version of the data from the site instead of the regular view, it should be easier to work with later... Make sure that the post file you are creating contains all of the data you are telling it to. 
There may be restrictions on line length with the datalines statement that you are exceeding with this rather long post, or it may get truncated on the write to the file without an appropriate lrecl option on your file statement.
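As a rough illustration of the lrecl point, here is a minimal sketch that writes one long record in pieces and then checks its length. The fileref name and the field1/field2/field3 values are made up for the example; they stand in for the real post data, which would be written the same way.

```sas
/* Hypothetical fileref and field values; only the lrecl= and @ usage matter */
filename postdata temp;

data _null_;
  /* raise the record length so one long line is not split or truncated */
  file postdata lrecl=32767;
  /* a trailing @ holds the output line, so the next PUT continues it */
  put 'field1=value1' @;
  put '&field2=value2' @;
  put '&field3=value3';
run;

/* read the file back and report the length of the single record */
data _null_;
  infile postdata lrecl=32767 length=len;
  input;
  put 'NOTE: record length=' len;
run;
```

If the reported length is shorter than the post string you intended to write, the data was cut off somewhere before it reached the file.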

FriedEgg · 11-08-2011

data have;
  input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15.;
  _dllr=abs(_dllr);
cards;
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid
ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance
;
run;

proc sort data=have;
  by _all_;
run;

/* store the name of the last variable in HAVE in &nvar */
data _null_;
  set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE')) end=done;
  if done then call symputx('nvar',name);
run;

/* keep only the rows that are unique across all variables */
data want;
  set have;
  by _all_;
  if first.&nvar and last.&nvar then output;
run;

FriedEgg · 11-07-2011

There is probably a more efficient way to do this, but here is one way... Use proc sort with the dupout option and nodupkey, then merge the dupout file back to the original data to remove the blanks.

data foo;
  *this contains my data;
run;

proc sort data=foo out=bar dupout=drops nodupkey;
  by id trans_id;
run;

data bar;
  merge drops(in=b) bar(in=a);
  by id trans_id;
  if a and not b;
run;

FriedEgg · 11-04-2011

Yes, see my example. I use this:

Collin County, TX~Hispanic_Num~100
Collin County, TX~Hispanic_Den~1000
Collin County, TX~Uninsured_Pop_Num~500
Collin County, TX~Uninsured_Pop_Den~15000
Plano, TX~Hispanic_Num~200
Plano, TX~Hispanic_Den~10000

And it works...

FriedEgg · 11-03-2011

Basically the same as Art's method:

data foo;
  infile cards missover;
  input name $ age string $20.;
  if missing(string) then output;
  else do i=1 to countw(string);
    new_string=scan(string,i);
    output;
  end;
  drop i;
cards;
John 28 A,BB
Jack 25 M,BM,M4m,44,65
Jill 26
;
run;

FriedEgg · 11-03-2011

I happened to be looking at the seminars for the next SAS Global Forum, and guess what one of the pre-conference statistical tutorials is... Introduction to Bayesian Analysis Using SAS Software. Humorous coincidence.

FriedEgg · 11-03-2011

Sorry, just remove those two statements, they are not necessary anyway.

FriedEgg · 11-03-2011

Thanks, Peter, for your follow-ups here. I agree that the issue seems to be that the temp file will not work for this situation and that a properly defined non-temp filename declaration would be better suited. Out of curiosity, OP, can you run something like this so I can see the log? Temp files should exist for the duration of the session; however, maybe it has something to do with the ods close?

%macro sendmail;
  %if &g_fileexist=1 %then %do;
    filename eml email 'someone@work.com'
      subject='File attachment test'
      attach=("&g_fname" content_type='text/xml' name='attact_test' extension='xml');
    data _null_;
      file eml;
      put 'This is a test...';
    run;
  %end;
%mend;

%global g_fname g_fileexist;
%global fexist xpath xpathexist; /* %local is only valid inside a macro */

filename f01 temp;
ods listing close;
ods tagsets.excelxp file=f01 style=meadow;

proc print data=sashelp.class;
run;

/* state of the temp file before the ODS destination is closed */
%let fexist=%sysfunc(fexist(f01));
data _null_;
  set sashelp.vextfl;
  where fileref='F01';
  call symputx('xpath',xpath);
run;
%let g_fname=%sysfunc(pathname(f01));
%let g_fileexist=%sysfunc(fileexist(&g_fname));
%let xpathexist=%sysfunc(fileexist(&xpath));
%put NOTE: BEFORE ODS CLOSE;
%put NOTE: FEXIST=&fexist;
%put NOTE: FNAME=&g_fname XPATH=&xpath;
%put NOTE: FILEEXIST=&g_fileexist XPATHEXIST=&xpathexist;
%sendmail

ods tagsets.excelxp close;

/* same checks after the ODS destination is closed */
%let fexist=%sysfunc(fexist(f01));
data _null_;
  set sashelp.vextfl;
  where fileref='F01';
  call symputx('xpath',xpath);
run;
%let g_fname=%sysfunc(pathname(f01));
%let g_fileexist=%sysfunc(fileexist(&g_fname));
%let xpathexist=%sysfunc(fileexist(&xpath));
%put NOTE: AFTER ODS CLOSE;
%put NOTE: FEXIST=&fexist;
%put NOTE: FNAME=&g_fname XPATH=&xpath;
%put NOTE: FILEEXIST=&g_fileexist XPATHEXIST=&xpathexist;
%sendmail

FriedEgg · 11-02-2011

It has been many years since I worked with SAS for z/OS, so my skills are a bit rusty on that system's peculiarities. Could you send more information from the log, and the value that &f01 resolved to?

FriedEgg · 11-02-2011

I believe the attach option does not accept a fileref, only a full file path or a SAS catalog. You can still do this with a temp file; you just need to add a step to collect the path of the temp file. You can collect this from the sashelp.vextfl table:

proc sql noprint;
  select xpath into :f01 from sashelp.vextfl where fileref='F01';
quit;

filename myemail email from=... to=... subject="..." attach="&f01";

FriedEgg · 11-02-2011

In my experience with very large data, the performance of proc transpose breaks down and becomes highly inefficient, in which case a data step with array subscript processing becomes increasingly more efficient; this is also more sensitive to the number of variables than to the number of rows. With the number of variables you are dealing with, I have not experienced issues with transpose performance, and for smaller sets the simplicity of the call usually outweighs the small performance difference I see on my systems.
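To make the two approaches concrete, here is a small sketch of the same wide-to-long reshape done both ways. The data set WIDE and its variables id and m1-m3 are invented for the illustration; this is not the OP's data, and it says nothing about timings on any particular system.

```sas
/* Hypothetical wide data: one row per id, measures m1-m3 */
data wide;
  input id m1 m2 m3;
cards;
1 10 20 30
2 40 50 60
;
run;

/* PROC TRANSPOSE: one short call */
proc transpose data=wide out=long_tr(rename=(col1=value)) name=measure;
  by id;
  var m1-m3;
run;

/* Data step with array subscript processing: more code, one pass */
data long_ds;
  set wide;
  array m{*} m1-m3;
  length measure $32;
  do i = 1 to dim(m);
    measure = vname(m{i});   /* name of the current array element */
    value   = m{i};
    output;
  end;
  keep id measure value;
run;
```

Both steps should give one row per id and measure; the trade-off is the brevity of the proc call against the control the array loop gives you.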

FriedEgg · 11-01-2011

It's not pretty:

%let parm=(40*4 3*(3*7 8 9 10));
%let parm2=%sysfunc(translate(%sysfunc(prxchange(%str(s/\b[0-9]{1,}\b(?!\*)/1/),-1,&parm)),+,%str( )));

Result:

(40*1+3*(3*1+1+1+1))

s/\b[0-9]{1,}\b(?!\*)/1/ is a fairly complex regular expression:

\b[0-9]{1,}\b finds a word that is made up of 1 or more digits 0-9 (\b is a word boundary, the position between a \w and a \W character)
(?!\*) is a negative lookahead: do not match when the next character is a literal * (a lookahead is used so that the * itself is not consumed by the match and replaced)
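If the nested %sysfunc calls are hard to follow, the same two steps can be traced in a data step. This is only a sketch for inspection; the variable names ones and expr are made up here, and strip() is added because a data step character variable is padded with blanks that translate would otherwise turn into plus signs.

```sas
data _null_;
  length parm ones expr $ 200;
  parm = '(40*4 3*(3*7 8 9 10))';

  /* step 1: replace every whole number that is not followed by * with 1 */
  ones = prxchange('s/\b[0-9]{1,}\b(?!\*)/1/', -1, parm);

  /* step 2: turn the blanks between terms into plus signs */
  expr = translate(strip(ones), '+', ' ');

  put ones= / expr=;
run;
```

The log should show the intermediate string with blanks still in place, followed by the final expression given in the post.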

FriedEgg · 11-01-2011

The article by Norvig does depart slightly from truly calculating the Bayesian probability. Instead it implements a sort of logical replacement: combine the probability of the correction (the shortest edit distance) with the frequency of the corrected word in our dictionary (big.txt). The best candidate is the one where the corrected word has the shortest edit distance and the highest frequency. This is definitely not calculating the probability, but it follows the logic of what the formula accomplishes, or so Peter argues. He also goes over a vast array of issues this approach has in properly identifying corrections. In a different article, whose source I can no longer remember, I read that Google uses over 10 trillion 4-word strings in its dictionary to aid in identifying spelling corrections (because surrounding words help with the correction).

Here is an example of the issue with this method. I mean to spell 'THEY' but accidentally type THAY:

%let word=thay;
filename big '/nas/sasbox/users/mkastin/big.txt';

/* split big.txt into one upper-cased word per row */
data big;
  length word $48;
  infile big lrecl=1024 truncover;
  input @;
  _infile_=compbl(prxchange('s/[^A-Z]/ /i',-1,_infile_));
  if not missing(_infile_) then do i=1 to countw(_infile_,' ');
    word=upcase(scan(_infile_,i,' '));
    if word ne '' then output;
  end;
  drop i;
run;

/* word frequencies */
proc freq data=big;
  tables word /list out=wfreq(drop=percent) noprint;
run;

/* candidate corrections: dictionary words within edit distance 2 */
data corrections;
  if 0 then set wfreq;
  declare hash wf(hashexp:10,dataset:'wfreq');
  declare hiter wfi('wf');
  wf.definekey('word');
  wf.definedata(all:'Y');
  wf.definedone();
  orig_word=upcase("&word");
  do while(wfi.next()=0);
    clev=complev(orig_word,word);
    if clev<=2 then output;
  end;
  keep orig_word word count clev;
  stop;
run;

/* pick the most frequent word among those with the smallest edit distance */
proc sql noprint;
  select min(clev) into :min_clev from corrections;
  select max(count) into :max_count from corrections where clev=&min_clev;
quit;

proc sql;
  select distinct 'Did you mean: ' || strip(word)
  from corrections
  where clev=&min_clev and count=&max_count;
quit;

Did you mean: THAT

No, I meant 'THEY'... However, if you look at the data (here are my choices with the shortest edit distance, 1):

WORD   COUNT   orig_word   clev
HAY       42   THAY           1
THAW       2   THAY           1
THA        1   THAY           1
THAT   12423   THAY           1
THAN    1199   THAY           1
TRAY       8   THAY           1
THY       47   THAY           1
THEY    3932   THAY           1

FriedEgg · 11-01-2011

Too bad the data step does not have a dynamic array feature like proc fcmp (and even there it only works for numeric arrays anyway).
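For anyone who has not seen the proc fcmp feature, here is a minimal sketch of it. The function name sumsq and the library work.funcs.demo are invented for this example; the pattern (a one-element array declared with nosymbols, then resized with call dynamic_array) follows the usual documented usage, but treat it as an illustration rather than tested production code.

```sas
proc fcmp outlib=work.funcs.demo;
  /* hypothetical function: sum of squares 1..n using a run-time sized array */
  function sumsq(n);
    array sq[1] / nosymbols;      /* placeholder size */
    call dynamic_array(sq, n);    /* resize to n elements at run time */
    total = 0;
    do i = 1 to n;
      sq[i] = i * i;
      total = total + sq[i];
    end;
    return (total);
  endsub;
run;

options cmplib=work.funcs;

data _null_;
  s = sumsq(4);
  put s=;
run;
```

The array size comes from the argument n, which is the part a plain data step array cannot do, since its dimensions are fixed at compile time.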
