Bharath_aavas has opened a thread with the subject "create common group id for
multiple link id's".
He solved the problem using 8 steps, answered his question
himself and closed the thread.
In reply to me
he wanted to solve the problem using Arrays.
He said later:
" I have inserted my SAS Code and tested and it is working in
all scenarios. Yet to automate for N number of level in macro
this is working until 6 relation id's."
Since his thread is marked as "Answered", my
Array solution offered to him seems to have not been
seen by others. For the benefit of him
and others, I am posting the array solution using this thread.
This is his Data Set:
data have; input p1 p2; cards; 9528 9534 9534 9056 8675 7890 8675 87677 7002 7001 7001 7005 7005 7009 5422 22333 22333 78878 78878 21121 21121 787999 787999 3422 1 2 1 3 1 4 1 11 5 7 6 7 8 7 ; run;
This is the expected answer I arrived.
group_id Loan1 Loan2 Loan3 Loan4 Loan5 Loan6 1 9528 9534 9056 . . . 2 8675 7890 87677 . . . 3 7002 7001 7005 7009 . . 4 5422 22333 78878 21121 787999 3422 5 1 2 3 4 11 . 6 5 7 6 8 . .
The correct specification will be to get the list of
distinct numbers from both P1 and P2 subject to
having a linkage between elements of both columns.
Such elements are placed into one group.
9528 9534 9056 form one group as other numbers are far apart.
The search then continues for distinct-linked numbers
for another group.
Initially we don't know how many groups are there and how many
distinct numbers will fall into a group.
Approach:
We need to decide when a group begins and when it ends.
Externally we found the number of observations of the Data Set.
Two temporary arrays are created with size equal to the
number of observations. They are named as PID[ ] and LID[ ].
Elements from P1 are copied to PID and from P2 to LID.
A character variable (LIST) with an initial length 100 is
used. It must be increased to larger size when needed. One
can use a length of 32767 initially but for this example it
is a waste of memory.
Elements PID[1] and LID[1], if they are different, are placed
in the LIST. Next, the end of group (or beginning of group)
is decided by the following 2 conditions:
[1] PID[i] is compared with either of LID[i-1] or of PID[i-1].
and
[2] LID[i] is compared with either of PID[i-1] or of LID[i-1].
Their values are negated to get COND1 and COND2. If they
take 1 each, then PID[i] and LID[i] are no longer linked
to previous elements. It is the beginning of next group.
If one of them is zero, then the same group continues.
Here are two conditions:
cond1 = ^(pid[i] = lid[i-1] or pid[i] = pid[i-1]);
cond2 = ^(lid[i] = pid[i-1] or lid[i] = lid[i-1]);
Also we count the number of elements placed in the LIST
and the maximum number of elements is found. This Maximum
number will be used in the next Data Step to size an array.
Here is the Data Step which gets the LIST of distinct
elements for each group in TEMP Data Set:
data _null_; call symputx('size', num); if 0 then set have nobs = num; stop; run; data temp; array pid[&size] _temporary_; array lid[&size] _temporary_; length list $100; if _n_ = 1 then do i = 1 by 1 until(eof); set have end = eof; pid[i] = p1; lid[i] = p2; end; if pid[1] = lid[1] then list = catx(' ',list, pid[1]); else list = catx(' ',list, pid[1], lid[1]); do i = 2 to dim(pid); cond1 = ^(pid[i] = lid[i-1] or pid[i] = pid[i-1]); cond2 = ^(lid[i] = pid[i-1] or lid[i] = lid[i-1]); if cond1 * cond2 then do; count = countw(list); output; list = ' '; end; if cond1 then list = catx(' ',list, pid[i]); if cond2 then list = catx(' ',list, lid[i]); Max = max(Max, count); if i = dim(pid) then do; Max = max(Max,countw(list)); output; call symputx('Max', Max); stop; end; end; keep list; run;
The following Data Step copies the elements from the LIST,
converting them into numbers, into an array L[ ] sized to
the Maximum found in the previous Step.
data want; retain group_id 0; array L Loan1 - Loan&Max; set temp; group_id + 1; n = countw(list); do i = 1 to n; L[i] = input(scan(list,i), best.); end; keep group_id Loan:; run;
Everything is dynamically done except the length of LIST.
DataSP
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.