You have raised some queries in your comment on using hash objects.

[1] Loading both data sets into hash tables: A hash table holding HUGE, with its key and non-key variables, is likely to run out of memory, and the program then aborts in the middle. The first part of the program:

   data want;
      if _n_ = 1 then do;
         if 0 then set SMALL;
         declare hash h(dataset:'SMALL');
         h.definekey('K1', 'K2');
         h.definedone();
      end;

As SMALL is very small compared to HUGE, I loaded only the keys (K1 and K2) into the hash table. Internally, when the data part of the hash table is left out, each hash entry consumes twice the combined length of K1 and K2, because the keys are duplicated as the default data part. We can save some memory by defining a smaller data part with

   h.definedata('K1');

before the h.definedone() statement. The memory consumed per entry is then the combined length of K1, K2, and K1, a nominal reduction. (A small probe that measures the per-item size directly is sketched at the very end of this reply.)

[2] Sophisticated strategies to increase performance: This is a good question when the code is meant for production jobs. SAS has more than one way to solve any problem. I recommend trying out all feasible SAS solutions, noting both time and memory; by time I mean both REAL and CPU time. Then choose the best method.

Use of HASHEXP: With the hash-object solution, the HASHEXP option can be tuned. By default it is 8, and values from 0 to 20 are allowed. The right value for a given data set has to be found by trial, which means any value other than 8 must be verified empirically rather than used blindly.

Keep only K1 and K2 of HUGE: The I/O time to read the entire record of HUGE can be reduced, but further processing is then warranted, as noted below. The statement

   set HUGE(keep = K1 K2);

brings only K1 and K2 into the Program Data Vector (PDV). The record number (_N_) of HUGE can be saved for each matched record, and a following data step can use the POINT= option with MODIFY to mark those records as deleted (a minimal sketch of this two-step route appears after the revised program in [4]). Alternatively,

   set HUGE;
   flag = 0;
   if h.find() = 0 then flag = 1;
   run;

identifies the records to delete (flag = 1).

[3] System does not have enough RAM: This comes down to what goes into the hash table. HUGE does not strain RAM, because only one of its records is brought into the PDV at a time, so consider the SMALL data set. If K1 and K2 are both numeric, each hash entry of the first version of the code takes 32 bytes (8 + 8 for the keys, doubled because the keys also serve as the default data part), plus the extra 16 bytes that SAS requires. Introducing the data part as observed above saves only 8 bytes per entry. If both keys are character, each key's length is rounded up to a multiple of 8 bytes. Suppose K1 is 10 bytes long and K2 is 3: K1 then occupies 16 bytes and K2 occupies 8, so an entry amounts to 24 + 24 bytes plus the extra 16 bytes that SAS requires. When everything else fails, one possibility is to split the SMALL data set into as many parts as will each fit in the hash table; the HUGE data set then has to be processed as many times as there are parts, as sketched just below.
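To make that splitting idea concrete, here is a minimal sketch. It is not from the program above: the macro name FLAG_IN_PARTS, its NPARTS parameter, and the MOD-based slicing rule are all illustrative, and every pass rewrites HUGE, so the I/O cost grows with the number of parts.

   %macro flag_in_parts(nparts=4);
      %do p = 1 %to &nparts;
         data HUGE;
            if _n_ = 1 then do;
               declare hash h();
               h.definekey('K1', 'K2');
               h.definedone();
               /* load only the current slice of SMALL into the hash table */
               do i = 1 by 1 until(last);
                  set SMALL(keep = K1 K2) end = last;
                  if mod(i, &nparts) + 1 = &p then rc = h.add();
               end;
            end;
            set HUGE;
            if &p = 1 then flag = 0;          /* initialize FLAG on the first pass */
            if h.find() = 0 then flag = 1;    /* accumulate matches across passes  */
            drop i rc;
         run;
      %end;
   %mend flag_in_parts;

   %flag_in_parts(nparts=4);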
[4] Revised program that flags the records with 1 for deletion and 0 for inclusion:

data want;
   if _n_ = 1 then do;
      declare hash h(HASHEXP:8);            /* the default bucket count; tune per [2] */
      h.definekey('K1', 'K2');
      h.definedata('flag');                 /* data part; FLAG is missing at load time */
      h.definedone();
      /* load the keys of SMALL into the hash table */
      do until(last);
         set SMALL end = last;
         rc = h.add();                      /* capture RC so duplicate keys in SMALL */
      end;                                  /* do not write errors to the log        */
   end;
   set HUGE;
   flag = 0;
   /* a successful FIND first overwrites FLAG with the stored missing value, */
   /* so it is set to 1 afterwards; unmatched records keep flag = 0          */
   if h.find() = 0 then flag = 1;
   drop rc;
run;

The WANT data set is HUGE with the additional variable FLAG. For further processing, the WHERE= data set option can be used on the SET statement, as in:

data NEED;
   set WANT(where=(flag = 0));
   ... other processing ...
run;
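For the two-step deletion route mentioned in [2], here is a minimal sketch. The data set name MATCHED and the variable RECID are illustrative, not from the original post; CUROBS= captures the current observation number on the SET statement, and MODIFY with POINT= plus REMOVE then marks the matched observations of HUGE as deleted in place:

   /* Step 1: collect observation numbers of HUGE records whose keys occur in SMALL */
   data MATCHED(keep = recid);
      if _n_ = 1 then do;
         declare hash h(dataset:'SMALL(keep = K1 K2)');
         h.definekey('K1', 'K2');
         h.definedone();
      end;
      set HUGE(keep = K1 K2) curobs = recid;
      if h.find() = 0;                 /* subsetting IF: keep matched records only */
   run;

   /* Step 2: mark those observations of HUGE as deleted, without rewriting it */
   data HUGE;
      set MATCHED;
      modify HUGE point = recid;
      remove;
   run;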
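Finally, on the memory arithmetic in [1] and [3]: rather than trusting the byte counts, they can be measured. Below is a small probe, assuming SAS 9.2 or later where the hash object exposes the ITEM_SIZE attribute; the variable names SIZE1 and SIZE2 are illustrative, and the HASHEXP:16 on the second table is only there as an example of the tuning discussed in [2].

   data _null_;
      if 0 then set SMALL;                   /* put K1 and K2 into the PDV */

      /* default: with no DEFINEDATA, the keys are duplicated as the data part */
      declare hash h1(dataset:'SMALL');
      h1.definekey('K1', 'K2');
      h1.definedone();

      /* trimmed data part, plus a larger bucket count to benchmark */
      declare hash h2(dataset:'SMALL', hashexp:16);
      h2.definekey('K1', 'K2');
      h2.definedata('K1');
      h2.definedone();

      size1 = h1.item_size;
      size2 = h2.item_size;
      put 'Bytes per item, default data part: ' size1;
      put 'Bytes per item, trimmed data part: ' size2;
   run;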