BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
marneni02
Fluorite | Level 6

Hello Experts,

 

Hope all are doing fine. Please apologize to bother you with my question.

I am trying to assign the initial ID to the other ID rows based on Prior ID column. Can you please help on this to figure the way to do it.

 

I have


data have;
input ID $ p_id $ ;
datalines;
006 .
007 006
008 .
009 .
010 007
011 010
;
run;

Capture.PNG

 

 

I want

Capture.PNG

 

Here, I am trying to create the 'NEED' column with first ID number value based on P_ID. If you see row#2 has P_ID as '006' , since I have record with ID = '006' so I need to get NEED = '006' for that record. Row # 5 has P_ID = '007' which has inital P_ID is '006', so I want '006' in NEED column for rows # 5. Likewise, For Row # 6, based on all previous P_ID values, we should get NEED = '006' for that column. Can you please help on this.

 

Thanks all and Have a great day.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If there are no circular references, then this task can be simplified:

 

data want;
  set have;
  if _n_=1 then do;
    declare hash h (dataset:'have (where=(not missing (p_id)))');
     h.definekey('id');
     h.definedata('p_id');
     h.definedone();
  end;
  if not missing(p_id) then do until(h.find()^=0);
    id=p_id;
  end;
  need=p_id;
  set have;
run;
    

The h.find() method for the hash object h is a way to step from ID to P_ID, then to the P_ID of the P_ID, etc., until no further P_ID can be found.  The only problem is that the loop using the h.find() method, keeps modifying  P_ID and ID until the find failure, losing the original ID and P_ID..  But this is enough to get the NEED value.

 

Then just re-read the original observation (via the second SET statement) to restore the original ID and P_ID values.  The point is that the two SET statement are a way to read each observation twice, once before establishing NEED, and once after.  

 

 

Additional material below:  

 

PS: Here's what the program would look like if you add a circularity check.  It establishes an _orig_id=id  and adds a test for "id=_orig_id" in the UNTIL clause:  

 

data want (drop=_:);
  set have;
  if _n_=1 then do;
    declare hash h (dataset:'have (where=(not missing (p_id)))');
     h.definekey('id');
     h.definedata('p_id');
     h.definedone();
  end;

  _orig_id=id;     /* For circularity check */

  if not missing(p_id) then do until(h.find()^=0 or id=_orig_id);
    id=p_id;
  end;
  need=p_id;
  set have;
run;

However, this code is NOT robust against having a given P_ID show up twice, when at least one instance is in a circular reference.    Although multiple instances of a given P_ID in which none of them occur is a circle does not present a problem.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

So you want to reference the original parent ID? And I suppose you have more than one original parent ID?

marneni02
Fluorite | Level 6

Yes, that is correct

Patrick
Opal | Level 21

@marneni02 Something like below should do the job.

data have;
  input ID $ p_id $;
  datalines;
006 .
007 006
008 .
009 .
010 007
011 010
012 008
013 012
014 015
015 014
;

proc sql;
  create view hierarchy as
    select
      id as _child,
      p_id as _parent
    from have
    where p_id is not missing
    ;
quit;

%let max_depth=99;
data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set hierarchy;
      dcl hash h1(dataset:'hierarchy');
      h1.defineKey('_child');
      h1.defineData('_child','_parent');
      h1.defineDone();
    end;
  call missing(_child,_parent);

  set have;

  if not missing(p_id) then
    do; 
      _parent=p_id;
      do _i=1 to &max_depth until(h1.find(key:_parent) ne 0);
      end;
      need=_parent;
      /* investigate data if _i > &max_depth */
      if _i>&max_depth then circular_reference_flg=1;
    end;
run;

Patrick_0-1627091756087.png

 

marneni02
Fluorite | Level 6

Thank you so much, this is really helpful and it works to my situation.

 

Thank you so much again.

mkeintz
PROC Star

If there are no circular references, then this task can be simplified:

 

data want;
  set have;
  if _n_=1 then do;
    declare hash h (dataset:'have (where=(not missing (p_id)))');
     h.definekey('id');
     h.definedata('p_id');
     h.definedone();
  end;
  if not missing(p_id) then do until(h.find()^=0);
    id=p_id;
  end;
  need=p_id;
  set have;
run;
    

The h.find() method for the hash object h is a way to step from ID to P_ID, then to the P_ID of the P_ID, etc., until no further P_ID can be found.  The only problem is that the loop using the h.find() method, keeps modifying  P_ID and ID until the find failure, losing the original ID and P_ID..  But this is enough to get the NEED value.

 

Then just re-read the original observation (via the second SET statement) to restore the original ID and P_ID values.  The point is that the two SET statement are a way to read each observation twice, once before establishing NEED, and once after.  

 

 

Additional material below:  

 

PS: Here's what the program would look like if you add a circularity check.  It establishes an _orig_id=id  and adds a test for "id=_orig_id" in the UNTIL clause:  

 

data want (drop=_:);
  set have;
  if _n_=1 then do;
    declare hash h (dataset:'have (where=(not missing (p_id)))');
     h.definekey('id');
     h.definedata('p_id');
     h.definedone();
  end;

  _orig_id=id;     /* For circularity check */

  if not missing(p_id) then do until(h.find()^=0 or id=_orig_id);
    id=p_id;
  end;
  need=p_id;
  set have;
run;

However, this code is NOT robust against having a given P_ID show up twice, when at least one instance is in a circular reference.    Although multiple instances of a given P_ID in which none of them occur is a circle does not present a problem.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
marneni02
Fluorite | Level 6

Thank you so much and this is really simple and straight to the point. I really appreciate your help.

 

Thanks again for your valuable on this

Patrick
Opal | Level 21

@mkeintz This lookup will in case of a match overwrite p_id in the base table with p_id from the hash. This could lead to a wrong outcome in case of DQ issues in the base table.

mkeintz
PROC Star

@Patrick wrote:

@mkeintz This lookup will in case of a match overwrite p_id in the base table with p_id from the hash. This could lead to a wrong outcome in case of DQ issues in the base table.


Yes.  I believe that issue is addressed by the second SET statement, that follows the calculation of NEED.

 

"DQ issues".  Is that "Data Quality"?  I qualified my code offering as not suitable for data having circular references (although a check for circularity could be easily put in).  Is there something else I missed (won't be the first time)?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 665 views
  • 3 likes
  • 4 in conversation