About billfish

billfish · ‎10-15-2020

It may be a SAS OPTION that is not turned on, or the way SAS EG (version 7.1) is set up. In the code below, FILENAME = INX works fine. However, at the line FILE_NAME = INX, INX has become an 8-character string, so the SCAN function is not the issue. I have no idea why INX became an 8-character string; I am still awaiting a response from the SAS Administrator.

CODE (START)

    FILENAME import_a "&SASDRV./Study*.txt";

    DATA F_LIST;
      LENGTH FILE_NAME $100 first_var $250;
      FORMAT first_var $CHAR250.;
      INFORMAT first_var $CHAR250.;
      INFILE import_a FILENAME = INX FIRSTOBS = 2 LRECL = 800
             ENCODING = "LATIN1" DLM = '09'x MISSOVER DSD;
      FILE_NAME = INX;
      /* FILE_NAME = SCAN(INX,-1,'/\'); */
      INPUT first_var : $CHAR250.;
    RUN;

CODE (END)

My solution is to point the destination folder in the Copy Files task not at WORK/XXX but at my own libname on the SAS server. I then run an x command (UNIX) to list all text files in my libname. I then modified a code snippet from SAS.Communities.com

https://communities.sas.com/t5/SAS-Data-Management/how-to-get-list-of-files-available-in-Directory-including-sub/td-p/430369

and I am good.
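One possible explanation for the 8-character value, offered as an assumption rather than a diagnosis of this particular setup: a variable named in the FILENAME= option that has not been declared beforehand is created with a default length of 8, so long paths get truncated. A minimal sketch of the usual workaround follows. It declares INX with LENGTH before the INFILE statement and copies it only after INPUT has run. The length of 256 is an arbitrary choice, and the FIRSTOBS= and ENCODING= options from the post are left out for brevity.

```sas
/* Sketch only: &SASDRV. is the macro variable from the post and must already be defined */
FILENAME import_a "&SASDRV./Study*.txt";

DATA F_LIST;
  /* Declare INX before INFILE so it is not created with the default length of 8 */
  LENGTH INX $256 FILE_NAME $100 first_var $250;
  INFILE import_a FILENAME = INX LRECL = 800 DLM = '09'x MISSOVER DSD;
  INPUT first_var : $CHAR250.;
  /* INX is filled in when INPUT reads a record, so copy it afterwards */
  FILE_NAME = SCAN(INX, -1, '/\');
RUN;
```

Because the FILENAME= variable is not written to the output data set, copying it into FILE_NAME is what keeps the file name on each row.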

PGStats · ‎11-14-2015

To randomly select k distinct numbers out of n, use rancomb:

    data test;
      array x{60};
      retain x (1:60);
      retain seed (-1);
      do i = 1 to 10;
        call rancomb(seed, 6, of x{*});
        put x1-x6;
        output;
      end;
      keep x1-x6;
    run;

imcbczm · ‎11-10-2015

Sorry, not really what I want.

Reeza · ‎11-09-2015

You could also do a proc means by acct_num and pymt_date. Then calculate 2 metrics, total payment and number of payments per date.
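A minimal sketch of that idea, under stated assumptions: acct_num and pymt_date come from the post, while the data set name payments, the amount column pymt_amt, and the sample rows are hypothetical. CLASS is used here instead of BY so that the input does not need to be sorted first.

```sas
/* Hypothetical input: one row per payment */
data payments;
  input acct_num pymt_date :date9. pymt_amt;
  format pymt_date date9.;
  datalines;
101 01JAN2015 50
101 01JAN2015 25
101 15JAN2015 40
202 03JAN2015 10
;
run;

/* One output row per account and date: total paid and number of payments */
proc means data=payments noprint nway;
  class acct_num pymt_date;
  var pymt_amt;
  output out=pymt_summary(drop=_type_ _freq_)
         sum=total_pymt n=num_pymts;
run;
```

With the sample rows above, account 101 on 01JAN2015 would get a total of 75 across 2 payments. The NWAY option keeps only the rows for the full acct_num by pymt_date combination.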

Filipvdr · ‎08-06-2015

Thanks Tom! This works! Did not know that function.

billfish · ‎07-28-2015

A solution amongst others.

    /*******************************/
    /**** random sample dataset ****/
    /*******************************/
    data t_have(keep=id);
      do i = 1 to 10;
        id = 10 + ceil(3*ranuni(7));
        output;
      end;
    run;

    /*****************************/
    /**** a possible solution ****/
    /*****************************/
    data t_want(keep=id rnk);
      length id rnk 8;
      if _N_=1 then do;
        declare hash ha(multidata:'N');
        ha.definekey('zId');
        ha.definedata('zId','zRnk');
        ha.definedone();
      end;
      nxt=0;                      /* highest rank handed out so far */
      do until(aDone);
        set t_have end=aDone;
        zId=id;
        rc=ha.find();
        if (rc ne 0) then do;     /* id not seen yet: store the next rank */
          nxt+1;
          zRnk=nxt;
          ha.add();
        end;
        rnk=zRnk;                 /* new or previously stored rank for this id */
        output;
      end;
    run;

t_want becomes:

    ====================
    id    rnk
    11    1
    13    2
    13    2
    13    2
    12    3
    13    2
    13    2
    13    2
    12    3
    12    3
    ====================

Hope this helps.

Ksharp · ‎07-29-2015

Dear Charlotte,

There are eight hours between you and me. Nothing for me this summer, just reading, learning... I'd like to visit England and see the big bell, but I have no money. I hope you can visit China too.

Now, back to the question. I think @slchen's code could be changed like this:

    if (index(_type1,strip(type1))>0 or (missing(_type1) and missing(type1))) and
       (index(_type2,strip(type2))>0 or (missing(_type2) and missing(type2))) and
       (index(_type3,strip(type3))>0 or (missing(_type3) and missing(type3)))
    then output;

Here is my code:

    data A;
      input ID $ / Type1 $ / type2 $ / type3 $;
    cards;
    ABC01
    25
    N
    .
    ABC01
    25
    N
    A1
    ABC01
    11
    Y
    A5
    ABC01
    55
    k
    T1
    JKL03
    39
    N
    A5
    JKL03
    41
    Y
    A5
    JKL03
    40
    N
    T1
    JKL03
    39
    Y
    A1
    ;
    run;

    data B;
      input ID $ / Type1 $20. / type2 $20. / type3 $20.;
    cards;
    ABC01
    25,11,35,45
    N,Y
    T1,A1,A5
    JKL03
    39,40
    N
    T1,A1
    ;
    run;

    data want;
      merge A B(rename=(type1-type3=_t1-_t3));
      by id;
      array x{*} $ type:;
      array y{*} $ _t:;
      matched=1;
      do i=1 to dim(x);
        /* a value matches if it appears as a word in the corresponding list */
        if not findw(y{i},strip(x{i})) then matched=0;
        /* two missing values also count as a match */
        if missing(x{i}) and missing(y{i}) then matched=1;
        if matched=0 then leave;
      end;
      if matched then output;
      drop i matched _t:;
    run;

asimraja · ‎07-10-2015

Thank you, Billfish!

billfish · ‎06-18-2015

A pedestrian view of the question.

The function (F) in question has 26 parameters, so a clear view is complicated. F has poles at x=0 and x=-1, so the roots of F will not include 0 and -1.

First, assign some values to the 26 parameters. Second, find at least 1 root of F within either the (0 to 1) or the (0 to -1) range.

Assumptions about the parameters:
1- p13 and p26 are greater than 1, with p26 > p13.
2- besides p13 and p26, the other parameters are positive and less than 1.

I am using a crude and pedestrian method to find the roots.

    /************************************************/
    /**** sample random sets of parameters p(26) ****/
    /************************************************/
    data t_a(keep=i p:);
      array p(26);
      do i = 1 to 10;
        do j = 1 to 26;
          if (j=13) then p(j) = ceil(30*ranuni(5));
          else if (j=26) then p(j) = p(13) + ceil(20*ranuni(7));
          else p(j) = ceil(10000*ranuni(11))/10000;
        end;
        output;
      end;
    run;

    /************************************************/
    /**** for each set of parameters find a root ****/
    /************************************************/
    data t_b(keep=i p: aZero aChng aRoot bSign);
      array p(26);
      set t_a;
      aSign=0; aZero=0; aFunc=0; aChng=0; aSteps=1000; bSign=1;
      do j=0 to aSteps;
        aZero=bSign*j*(1/aSteps);
        aFlag=aFunc;   /* value from the previous step */
        /* note: aFunc+... is a sum statement, so aFunc carries over from step to step; */
        /* it is reset to 0 only once per observation, in the line above the loop       */
        do k=1 to 14;
          if (k=1) then aFunc+(p(13)-p(26))*aZero*((1+aZero)**12);
          else if (k=14) then aFunc+(p(25)*(p(12)-aZero));
          else aFunc+(p(11+k)*(p(k-1)-aZero)*aZero*((1+aZero)**(13-k)));
        end;
        aFunc=round(aFunc,0.00001);
        if (j=0) then aSign=sign(aFunc);
        else if not(sign(aFunc)=aSign) then do;
          aChng=1;
          /* linear interpolation between the previous and the current step */
          aRoot=aZero-bSign*(1/aSteps)*(abs(aFunc)/(abs(aFunc)+abs(aFlag)));
          aRoot=round(aRoot,0.000001);
          leave;
        end;
      end;
      output;
    run;

    proc print data=t_b;
      var i bSign aZero aRoot aChng;
      format aRoot 9.6;
    run;

    /********************/
    /*** some results ***/
    /********************/

When looking at the (0 to 1) range (bSign=1), one finds at least 1 root (aRoot). aChng=1 -> the function (aFunc) changed signs.

     i   bSign   aZero    aRoot       aChng
    ===================================
     1   1       0.059    0.058985    1
     2   1       0.431    0.430937    1
     3   1       0.024    0.023931    1
     4   1       0.007    0.006165    1
     5   1       0.024    0.023457    1
     6   1       0.012    0.011116    1
     7   1       0.014    0.01384     1
     8   1       0.127    0.126725    1
     9   1       0.108    0.107417    1
    10   1       0.025    0.024228    1

When looking at the (0 to -1) range (bSign=-1), not all sets of p(26) have a root in that range. There is no root (aRoot=.) when there is no change in sign (aChng=0).

     i   bSign   aZero     aRoot        aChng
    ======================================
     1   -1      -1        .            0
     2   -1      -0.057    -0.056783    1
     3   -1      -0.144    -0.143403    1
     4   -1      -1        .            0
     5   -1      -0.256    -0.255269    1
     6   -1      -0.056    -0.055367    1
     7   -1      -1        .            0
     8   -1      -0.416    -0.415036    1
     9   -1      -0.042    -0.041868    1
    10   -1      -0.279    -0.278255    1

Reeza · ‎06-02-2015

If the solution above works for you please mark this (and other) questions as answered.

gergely_batho · ‎04-11-2015

Hi,

Why is it important to handle the removal of "fewer than two matches" in the same data step? I'd like to understand how much this problem is focused on performance (memory, CPU).

The problem is that, in the current code, when I discover a match, I immediately output it. If it later turns out to be the only match, I cannot revoke it. So instead of outputting it, you should store it in memory (in a hash object, in an array, or, if it is really just 1 observation, in a temporary variable). Then you output it once it turns out that there are enough matches. (Otherwise you just clear the hash, or empty the array or variable.)

Unfortunately, approach 1 (and also 2 and 3) is a bit more coding work. But I rather adapted 's idea and code. It is much easier to adapt to your needs: for example, one additional line excludes ids with fewer than 2 matches. I also think this program uses a better heuristic (if you want to minimize the sum of distances): it is "greedier" than my program, because it starts with the smallest overall distance.

    PROC SQL;
      create table t_c as
      select a.id1, b.id2, a.score1, b.score2,
             (a.score1-b.score2) as difx,
             abs(a.score1-b.score2) as difa
      from data1 a, data2 b
      where (-0.01 <= (a.score1-b.score2) <= 0.01)
      group by id1
      having count(*)>=2 /* exclude ids with fewer than 2 matches */
      order by difa;
    QUIT;

    data t_want(keep=id1 id2 difa difx score1 score2);
      retain one 1;
      length numCon1 numCon2 8;
      if _N_=1 then do;
        /* suminc keeps a counter per ID: how many times it was used */
        declare hash h1(multidata:'N', suminc:'one');
        h1.definekey('id1');
        h1.definedone();
        /* same counter for the IDs of the second data set */
        declare hash h2(multidata:'N', suminc:'one');
        h2.definekey('id2');
        h2.definedone();
      end;
      do until(0);
        set t_c;
        h1.ref(); h1.sum(sum:numCon1);
        h2.ref(); h2.sum(sum:numCon2);
        if (numCon1<=2 and numCon2<=1) then do;
          output;
        end;
      end;
    run;

Rick_SAS · ‎06-10-2015

You want to use the inverse CDF method for generating random numbers. Sometimes you can use the QUANTILE function to help solve the inverse problem (see the example of the folded normal distribution), but other times you need to solve for the root of the cumulative distribution: F(x) = u, where u ~U(0,1). In SAS/IML, you can use the FROOT function to solve for numerical roots. If your CDF is given explicitly by a formula or by empirical quantiles, you can use linear interpolation. See this blog post. http://blogs.sas.com/content/iml/2014/06/18/distribution-from-quantiles.html
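As an illustration of the inverse CDF method when the CDF can be inverted by hand, here is a minimal sketch. The density f(x) = 2x on [0,1] is an assumed example of a linearly increasing density, not necessarily the one from the original question. Its CDF is F(x) = x**2, so solving F(x) = u gives x = sqrt(u). The seed and the sample size are arbitrary.

```sas
data sample(keep=x);
  call streaminit(12345);      /* arbitrary seed, for reproducibility */
  do i = 1 to 1000;
    u = rand("Uniform");       /* u ~ U(0,1) */
    x = sqrt(u);               /* solve F(x) = x**2 = u for x */
    output;
  end;
run;

/* Sanity check: the theoretical mean of this density is 2/3 */
proc means data=sample mean min max;
  var x;
run;
```

When no closed-form inverse exists, the same loop applies, with the sqrt call replaced by a numerical root finder or by interpolation in a table of quantiles.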

billfish · ‎04-02-2015

You can try:

    proc sql;
      select pat_id, count(distinct date) as visits
      from patients
      group by pat_id;
    quit;

billfish · ‎04-09-2015

Although the question is deemed answered, I want to put my 2 cents into the discussion. I do not have access to SAS/OR, so I cannot evaluate the chosen solution.

There are 2 offered solutions:
1) by xia keshan, and
2) SubGraphsMacro.sas (by PGStats)

In the following I propose a different solution. First, the input dataset t_a (here I will characterize it as 100000 (150000), that is, 100000 rows with member values drawn from below 150000):

    /***********************/
    /**** input dataset ****/
    /***********************/
    data t_a;
      do _N_ = 1 to 100000;
        a1 = int(150000*ranuni(3));
        a2 = int(150000*ranuni(5));
        output;
      end;
    run;

Now some benchmarks for the 3 solutions: 1) xia, 2) current, 3) SubGraphsMacro.

    rows (value range)        xia           current          SubGraphsMacro
    100000 (150000)           54.00 sec     2.00 sec         7:00 min
    200000 (300000)           1:50 min      4.06 sec         25:57 min
    400000 (600000)           3:35 min      9.00 sec
    1000000 (1500000)         9:15 min      13.50 sec
    2000000 (3000000)         19:19 min     34.00 sec
    4000000 (6000000)         37:45 min     1:03 min
    8000000 (12000000)        106:58 min    2:25 min
    16000000 (24000000)       (not done)    (memory crash)

Observations:
1) SubGraphsMacro is by far the slowest, about 10 times slower than xia's solution.
2) The current solution is about 30 times faster than xia's solution.
3) If enough RAM is available, the current solution can find the clusters for a 130M-record dataset in less than 1 hour.

The main ingredient of the proposed solution is the "symmetrization" of the dataset:

    /*****************************/
    /*** t_b = symmetrized t_a ***/
    /*****************************/
    data t_b(keep=a1 a2);
      set t_a(rename=(a1=b1 a2=b2));
      if b1=b2 then do;
        a1=b1; a2=b2; output;
      end;
      else do;
        a1=b1; a2=b2; output;
        a1=b2; a2=b1; output;
      end;
    run;

    /************************************/
    /***** proposed hash solution 1 *****/
    /************************************/
    data _null_;
      length yMbr Mbr SetNo a1 a2 Set_No 8;
      if _N_=1 then do;
        /* ha: all (symmetrized) pairs, keyed by the first member */
        declare hash ha(dataset:'t_b',multidata:'Y');
        ha.definekey ('a1');
        ha.definedata('a1','a2');
        declare hiter aIter('ha');
        ha.definedone();
        call missing(a1,a2);
        /* hx: every distinct member */
        declare hash hx(multidata:'N');
        hx.definekey ('xMbr');
        hx.definedata('xMbr');
        declare hiter xIter('hx');
        hx.definedone();
        /* hy: members of the cluster currently being built */
        declare hash hy(multidata:'N');
        hy.definekey ('yMbr');
        hy.definedata('yMbr');
        declare hiter yIter('hy');
        hy.definedone();
        /* hz: final result, member -> cluster number */
        declare hash hz(multidata:'N');
        hz.definekey ('Mbr');
        hz.definedata('Mbr','SetNo');
        hz.definedone();
      end;
      do until (aDone);
        set t_b end=aDone;
        xMbr=a1; hx.ref();
        xMbr=a2; hx.ref();
      end;

      /***************************/
      /*** start of clustering ***/
      /***************************/
      Set_No=0; aSum=0;
      k1=xIter.first();
      do while (k1=0);
        aSum+1;
        a1=xMbr; Mbr=xMbr;
        k2=ha.find(); j2=hz.find(); z1=hy.num_items;
        if (k2=0) and (not(j2=0)) and (z1=0) then do;
          Set_No+1; zChange=1;
          yMbr=a1; hy.ref();
          yMbr=a2; hy.ref();
          do until (zChange=0);
            zChange=0;
            k3=yIter.first();
            do while(k3=0);
              a1=yMbr;
              k4=ha.find();
              if (k4=0) then do; d1=ha.remove(key:a1); end;
              do while (k4=0);
                yMbr=a2; zAdd=hy.add(); zChange+(zAdd=0);
                k4=ha.find_next();
                if (k4=0) then do; d1=ha.remove(key:a1); end;
              end;
              k3=yIter.next();
            end;
          end;
          k4=yIter.first();
          do while (k4=0);
            Mbr=yMbr; SetNo=Set_No; hz.ref();
            k4=yIter.next();
          end;
          z1=hy.num_items; hy.clear(); z1=hy.num_items;
        end;
        k1=xIter.next();
      end;
      hz.output(dataset:'t_c');
    run;

DangIT · ‎03-19-2015

Thank you. This is a great approach I will look into it with more detail.

Date Last Visited	‎10-15-2020 03:39 PM

Re: copy files task: make list of filenames of imported files.

Re: Create 6 numbers random

Re: How to create repeated observations of two variables for each valu...

Re: Latest transaction with multiple entries

Re: Problem with quoted macrovariable

Re: Giving numbers based on column values- no need to remove any dupli...

Re: look up solution help for row to column values or column to row ma...

Re: Unable to pass an SQL query to Oracle property

Re: Could you please suggest a program to find the zeros to these two ...

Re: rounding value to next integer

Re: Matching by scores

Re: How to I generate random numbers using an increasing linear probab...

Re: how to count something using proc sql

Re: Create clusters from pairs

Re: Dynamic ranges and lookup logic