I ran a slightly modified version of KSharp's code and got exceptional performance. I made a minor adjustment to make generating a large sample easier (all numeric fields pan1-pan4 instead of character fields). I generate sample data for 100,000 rows containing 4 variables with values from 1-999,999, and I optionally sort the columns and/or rows to compare performance. There should not be a huge difference, since hash object lookups should not depend on order, but there may be some improvement from fewer large reassignments. First off, I must say the performance of this is extremely good on my system; I cannot see this taking 8-10 minutes for a set of 80,000 records as mentioned.

options fullstimer;

/* Generate some data for a larger test */
data have;
   call streaminit(12345);
   array pan[4];
   do i = 1 to 10**4;
      do j = 1 to dim(pan);
         pan[j] = abs(mod(int(rand('cauchy')*10**4), 10**5));
      end;
      output;
   end;
   drop i j;
run;
/* End */

/* Start optional sortings for testing */
data have;
   set have;
   array pan pan1-pan4;
   call sortn(of pan[*]);
run;

proc sort data=have;
   by pan1 pan2 pan3 pan4;
run;
/* End */

/* Start Assign Linkage Key */
data want(keep=pan1-pan4 lkey);
   declare hash ha(hashexp: 20);
   declare hiter hi('ha');
   ha.definekey('count');
   ha.definedata('count','pan1','pan2','pan3','pan4');
   ha.definedone();

   declare hash _ha(hashexp: 20);
   _ha.definekey('key');
   _ha.definedata('_lkey');
   _ha.definedone();

   /* Load every observation into hash ha, keyed by a row counter */
   do until(last);
      set have end=last;
      count + 1;
      ha.add();
   end;

   array h{4} pan1-pan4;

   /* Each outer pass seeds a new linkage key, then repeatedly sweeps the
      rows remaining in ha until no more of them share a pan value with
      the current group; matched rows are output and removed from ha */
   _rc = hi.first();
   do while(_rc eq 0);
      lkey + 1;
      _lkey = lkey;
      do i = 1 to 4;
         if not missing(h{i}) then do;
            key = h{i};
            _ha.replace();
         end;
      end;
      do until(x = 1);
         x = 1;
         rc = hi.first();
         do while(rc = 0);
            found = 0;
            do j = 1 to 4;
               key = h{j};
               rcc = _ha.check();
               if rcc = 0 then found = 1;
            end;
            if found then do;
               do k = 1 to 4;
                  if not missing(h{k}) then do;
                     key = h{k};
                     _ha.replace();
                  end;
               end;
               output;
               x = 0;
               _count = count;
            end;
            rc = hi.next();
            /* remove after advancing the iterator so its position stays valid */
            if found then rx = ha.remove(key: _count);
         end;
      end;
      _rc = hi.first();
   end;
run;
/* End */

Results (average of 3 runs each; the time is for the final linkage step only):

No sorting: real time 6.25 seconds
SORTN only: real time 4.35 seconds (the downside of SORTN is that the original column positions are lost, which matters if pan1, pan2, etc. are meaningful in themselves)
SORTN and PROC SORT: real time 4.42 seconds
PROC SORT only: real time 4.38 seconds

So clearly, at least with my sample data, performance is stellar and can be improved further by preparing the data with a sort. I would still recommend removing records where all the pan values are blank, although for my test this was not necessary (a rough sketch of such a filter is below). My tests each generated a little over 2,000 unique linkage keys at the end of the process, and my spot checks all looked good.

This is a fantastic thread. I too am still learning a great deal about hash objects; unfortunately, in my work so far there have not been many useful opportunities to implement them for real gains.
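
In case it helps anyone, here is a minimal sketch of the blank-removal filter mentioned above. It is not part of KSharp's code; it just assumes the input is the same HAVE table with numeric pan1-pan4:

/* Drop rows where all four pan values are missing */
data have;
   set have;
   if n(of pan1-pan4) > 0;   /* keep rows with at least one non-missing pan */
run;

And a rough way to spot-check the output, assuming WANT still carries numeric pan1-pan4 plus the lkey variable from the linkage step (again, just a sketch to adapt as needed):

/* How many linkage keys were assigned */
proc sql;
   select count(distinct lkey) as n_keys
   from want;
quit;

/* Reshape to one pan value per row, then confirm that no pan value
   was linked to more than one lkey (this query should return no rows) */
data check;
   set want;
   array h{4} pan1-pan4;
   do i = 1 to 4;
      if not missing(h{i}) then do;
         pan = h{i};
         output;
      end;
   end;
   keep pan lkey;
run;

proc sql;
   select pan, count(distinct lkey) as n_keys
   from check
   group by pan
   having calculated n_keys > 1;
quit;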