Hi, I have data as example following I try to merging with hash.
But I can not count exactly.
Also I try split as below table as group of ndcID but only get only once value for every group of ndcID.
id sum max count linkid ndcID
1 | 10 | 278 | 1 (should 4) | 1 | AA |
2 | 10 | 181 | 2 (should 2) | 1 | AB |
3 | 20 | 15 | 1 (should 2) | 2 | AA |
4 | 14 | 28 | 2 (should 2) | 2 | AB |
5 | 30 | 28 | 3 (should 2) | 2 | AC |
options validvarname=any;
data ex2;
length link_id 8;
input link_id study_entry_date :date9.;
format study_entry_date yymmddn8.;
datalines;
1 05JAN2012
2 10FEB2012
run;
data claim;
length link_id 8;
input claim_id ndc_id $ link_id rx_dos_dt :date9. days_supply;
format rx_dos_dt yymmddn8.;
datalines;
1 AA 1 05JAN2014 2
2 AA 1 08JAN2014 3
3 AB 1 14JAN2014 5
4 AD 1 05JAN2013 12
5 AA 2 05JAN2013 10
6 AB 2 10FEB2015 7
7 AC 2 27FEB2015 15
8 AC 3 27JAN2015 15
9 AC 4 27FEB2015 15
10 AA 1 05APR2014 2
11 AA 1 08JAN2015 3
12 AB 1 14JUL2014 5
13 AD 1 25JAN2013 12
14 AA 2 20JAN2013 10
15 AB 2 10MAR2015 7
16 AC 2 27MAR2015 15
17 AC 3 27JUN2015 15
18 AC 4 27AUG2015 15
run;
data ndc;
input ndc_id $;
datalines;
AA
AB
AC
run;
proc sort data=claim;
by link_id ndc_id;
run;
data ndcmerge(keep = link_id ndc_id sum max count);
length sum 8 max 8 count 8;
dcl hash bene_hash(dataset:'ex2', ordered:'a');
bene_hash.definekey('link_id');
bene_hash.definedone();
dcl hash ndc_hash(dataset:'ndc', ordered:'a');
ndc_hash.definekey('ndc_id');
ndc_hash.definedone();
* define dynamic hash object;
declare hash a_groups_data;
declare hash groups (ordered:'a');
groups.defineKey('link_id', 'ndc_id');
groups.defineData('link_id', 'ndc_id', 'a_groups_data');
groups.defineDone();
do count = 0 by 0 until (last.link_id);
do sum = 0 by 0 until (last.ndc_id);
set claim end=last_obs;
by link_id ndc_id;
if bene_hash.find() = 0 and ndc_hash.find() = 0 then do;
prevVal = ifn(first.link_id or first.ndc_id, . , lag(rx_dos_dt));
format prevVal yymmddn8.;
diff = rx_dos_dt - prevVal;
diffPre = ifn(first.link_id or first.ndc_id, . , lag(diff));
if diff > diffPre then
max = diff;
else
max = diffPre;
sum ++ days_supply;
if last.link_id or last.ndc_id then
count = count + 1;
if last.ndc_id then do;
output;
if groups.find() ne 0 then do;
put 'Not Found group of NDC then ADD';
a_groups_data = _new_ hash(ordered:'a');
a_groups_data.defineKey('_n_');
a_groups_data.defineData('link_id', 'ndc_id', 'sum', 'max', 'count');
a_groups_data.defineDone();
groups.add();
end;
else if groups.find() eq 0 then do;;
put 'Found group of NDC';
end;
a_groups_data.add();
end;
end;
end;
end;
dcl hiter groups_iterator ('groups');
do while (groups_iterator.next() = 0);
tablename = cats('group_eq_',ndc_id);
a_groups_data.output (dataset:tablename);
end;
run;
This may do want you want:
data NDCMERGE(keep = LINK_ID NDC_ID SUM MAX COUNT);
retain MAX;
dcl hash LINK_HASH(dataset:'EX2');
LINK_HASH.definekey('LINK_ID');
LINK_HASH.definedone();
dcl hash NDC_HASH(dataset:'NDC');
NDC_HASH.definekey('NDC_ID');
NDC_HASH.definedone();
set CLAIM end=LAST_OBS;
by LINK_ID NDC_ID;
PREVVAL=lag(RX_DOS_DT);
if LINK_HASH.check()=0 and NDC_HASH.check()=0;
if first.NDC_ID then call missing(SUM, COUNT, PREVVAL, MAX);
COUNT + 1;
SUM + DAYS_SUPPLY;
MAX = max(MAX, RX_DOS_DT-PREVVAL);
if last.NDC_ID then output;
run;
MAX | link_id | ndc_id | SUM | COUNT |
---|---|---|---|---|
278 | 1 | AA | 10 | 4 |
181 | 1 | AB | 10 | 2 |
15 | 2 | AA | 20 | 2 |
28 | 2 | AB | 14 | 2 |
28 | 2 | AC | 30 | 2 |
Please show the desired outcome.
I want get the result of COUNT as following:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
1 | 10 | 278 | 4 | 1 | AA |
2 | 10 | 181 | 2 | 1 | AB |
3 | 20 | 15 | 2 | 2 | AA |
4 | 14 | 28 | 2 | 2 | AB |
5 | 30 | 28 | 2 | 2 | AC |
But the result COUNT incorrect as program return:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
1 | 10 | 278 | 1 | 1 | AA |
2 | 10 | 181 | 2 | 1 | AB |
3 | 20 | 15 | 1 | 2 | AA |
4 | 14 | 28 | 2 | 2 | AB |
5 | 30 | 28 | 3 | 2 | AC |
Also I want split above table to three group AA, AB, and AC as:
GROUP_AA:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
1 | 10 | 278 | 1 | 1 | AA |
3 | 20 | 15 | 1 | 2 | AA |
GROUP_AB:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
2 | 10 | 181 | 2 | 1 | AB |
4 | 14 | 28 | 2 | 2 | AB |
GROUP_AC:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
5 | 30 | 28 | 3 | 2 | AC |
But program return only once row for every three group as:
GROUP_AA:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
1 | 10 | 278 | 1 | 1 | AA |
GROUP_AB:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
2 | 10 | 181 | 2 | 1 | AB |
GROUP_AC:
ID | SUM | MAX DURATION | COUNT | LINK_ID | NDC_ID |
5 | 30 | 28 | 3 | 2 | AC |
Also using the lag() function inside an if clause is a good recipe for unexpected results.
This may do want you want:
data NDCMERGE(keep = LINK_ID NDC_ID SUM MAX COUNT);
retain MAX;
dcl hash LINK_HASH(dataset:'EX2');
LINK_HASH.definekey('LINK_ID');
LINK_HASH.definedone();
dcl hash NDC_HASH(dataset:'NDC');
NDC_HASH.definekey('NDC_ID');
NDC_HASH.definedone();
set CLAIM end=LAST_OBS;
by LINK_ID NDC_ID;
PREVVAL=lag(RX_DOS_DT);
if LINK_HASH.check()=0 and NDC_HASH.check()=0;
if first.NDC_ID then call missing(SUM, COUNT, PREVVAL, MAX);
COUNT + 1;
SUM + DAYS_SUPPLY;
MAX = max(MAX, RX_DOS_DT-PREVVAL);
if last.NDC_ID then output;
run;
MAX | link_id | ndc_id | SUM | COUNT |
---|---|---|---|---|
278 | 1 | AA | 10 | 4 |
181 | 1 | AB | 10 | 2 |
15 | 2 | AA | 20 | 2 |
28 | 2 | AB | 14 | 2 |
28 | 2 | AC | 30 | 2 |
Thanks, this is right and concise code.
Do you know how to use hash to split this table to three group?
I know we can use SQL to split it but just to learn about hash.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.