BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
suvi107
Calcite | Level 5

Hi There,

Faced with a problem where I need to convert a dataset to another form of dataset.

Current dataset structure is below.

Child    parent   Level
xxx                    1

yyy        xxx       2

zzz        yyy       3

...

...

...

max upto six levels

What I need is a structure like

Child(where level =1)   parent1  parent2  parent3 .... parent6

xxx                             yyy       zzz

Regards,

Suvi

1 ACCEPTED SOLUTION

Accepted Solutions
suvi107
Calcite | Level 5

Converted it to a more sensible dynamic Macro.

%Macro Tstruct_Flat();

/*****Check what is the lowest level LCD present in the structure*****/

proc sql;
  Select max(level) into :max_level from work.lcd_raw ;
quit;


/*****Collect the highest level LCD's in the table****/
proc sql;
  Create table Tstruct_temp as select Child,level from work.lcd_raw where level=1 ;
quit;

%do i  = 2 %to &max_level;

  %let j = %eval(&i. - 1); /*Macro do loop variables to be used inside the loop*/
  %let k = %eval(&i. - 2); /*Macro do loop variables to be used inside the loop*/
  %let l = %eval(&i. ); /*Macro do loop variables to be used inside the loop*/

  %if %eval(&i.*1) = 2 %then %do;
   proc sql;
   Create table Tstruct_temp as select a.*, b.child as parent&j. , b.level as level&l. from Tstruct_temp as a left join work.lcd_raw as b on a.child = b.parent and b.level = &i. ;
   quit;
  %end;

  %if %eval(&i.*1) gt 2 %then %do;
   proc sql;
   Create table Tstruct_temp as select a.*, b.child as parent&j. , b.level as level&l. from Tstruct_temp as a left join work.lcd_raw as b on a.parent&k. = b.parent and b.level = &i. ;
   quit;
  %end;
%end;
%mend;

Thanks,

Suvi

View solution in original post

13 REPLIES 13
Scott_Mitchell
Quartz | Level 8

Is this what you are looking to do?

DATA HAVE;

INFILE DATALINES DLM=",";

INPUT CHILD $ PARENT $ LEVEL;

DATALINES;

XXX,YYY,1

XXX,ZZZ,2

YYY,XXX,2

ZZZ,YYY,3

;

RUN;

PROC TRANSPOSE DATA=HAVE OUT=WANT PREFIX=PARENT;

ID LEVEL;

BY CHILD;

VAR PARENT;

RUN;

Vince28_Statcan
Quartz | Level 8

I don't think the data set has a record for each level of parent for any given child. You would have to somehow preprocess the data to create a record with a level1 child and his level 3/4/5/6 parents to appropriately use a transpose/by combination. At least so it felt with the data sample

LinusH
Tourmaline | Level 20

Not necessary if there is no need for values on each level, just switch to left joins in the SQL example.

Data never sleeps
LinusH
Tourmaline | Level 20

This is a quite typical task: extracting generic hierarchical data and transform it to a dimensional table.

Wouldn't it be nice if SAS could make a PROC and/or a DI Studio transformation that could deal with this task in a generic way?

Data never sleeps
Vince28_Statcan
Quartz | Level 8

Best trick is to ask your hierachical data provider to send you directly the xml files and do a custom maping yourself. At least given the fact that most hierachical data is at one point or another stored as xml.

It is very easy to self learn the basics of xml mapping

LinusH
Tourmaline | Level 20

XML?

No... It's typical in many enterprise data models to store hierarchical data in generic parent-child tables.

The benefit is that the data model does not need to define the maximum no of levels that your hierarchies can consist.

The drawback, at least when using SAS, it generates quite complex code to unwind it to "flat" dimensional tables, used in a star schema for instance.

So in my perspective, there are no "hierarchical data providers", it's a part of the DW architecture. 

Data never sleeps
Tom
Super User Tom
Super User

If you have SAS/OR licensed you might want to look at PROC NETDRAW.  Might not help with this but I have used in the past for processing this type of linked network data.

LinusH
Tourmaline | Level 20

I have access to OR via SAS Alliance license, but OR is not a big hit in most DW sites... 😉

Took a glance at it, and it looks like it does a lot of graphical stuff. If will look into it deeper when I have the time.

Thanx for the tip anyway...

Data never sleeps
Vince28_Statcan
Quartz | Level 8

Not necessarily the simplest approach but here's how I'd go about it.

data want;

     if _n_=1 then do;

     declare hash hl2(dataset: "have(where=(level=2) rename=(child=nextchild))");

     hl2.defineKey('parent');

     hl2.defineData('nextchild');

     hl2.defineDone();

     declare hash hl3(dataset: "have(where=(level=3) rename=(child=nextchild))");

     hl3.defineKey('parent');

     hl3.defineData('nextchild');

     hl3.defineDone();

     declare hash hl4(dataset: "have(where=(level=4) rename=(child=nextchild))");

     hl4.defineKey('parent');

     hl4.defineData('nextchild');

     hl4.defineDone();

     declare hash hl5(dataset: "have(where=(level=5) rename=(child=nextchild))");

     hl5.defineKey('parent');

     hl5.defineData('nextchild');

     hl5.defineDone();

     declare hash hl6(dataset: "have(where=(level=6) rename=(child=nextchild))");

     hl6.defineKey('parent');

     hl6.defineData('nextchild');

     hl6.defineDone();

     end;

     set have(where=(level=1));

     rc=hl2.find(key: child);

     if rc=0 then do;

          parent1=nextchild;

          rc=hl3.find(key: nextchild);

          if rc=0 then do;

               parent2=nextchild;

               rc=hl4.find(key: nextchild);

               if rc=0 then do;

                    parent3=nextchild;

                    rc=hl5.find(key: nextchild);

                    if rc=0 then do;

                         parent4=nextchild;

                         rc=hl6.find(key: nextchild);

                         if rc=0 then parent5=nextchild;

                    end;

                end;

          end;

     end;

     drop nextchild parent level rc;

run;

Based on your data example, I assumed there were 6 levels but since level1 doesn't have a parent, that there should only be parent1-parent5 and not parent-parent6 variables to fill. If there are 7 levels or if level1 can have both a parent and a child it can be worked around.

The issue with the above approach is that if any "key" is present multiple times (like if xxx has parents yyy and zzz) then the hash table, defined as is won't support it and will retain only one of the two (I can't remember if it retains first key or replaces). It can be worked around using multidata option but then the looping becomes a tad bit hectic.

Point being, conceptually this does a merge whilst renaming variables appropriately of the following subsets of your original data:

have(where(level=1))

have(where(level=2))

...

have(where(level=6))

Using this successive approach with the above data subsets and renaming the variable should also allow you to provide the appropriate results without messing with multidata hash tables like below:

proc sql;

     select t1.child, t2.parent1, t3.parent2, t4.parent3, t5.parent4, t6.parent5

     from

     have(where=(level=1) drop=parent) as t1

     left join

     have(where=(level=2) rename=(parent=child child=parent1))

     on t1.child=t2.child

     left join

     have(where=(level=3) rename=(parent=parent1 child=parent2)) as t3

     on t2.parent1=t3.parent1

     left join

     have(where=(level=4) rename=(parent=parent2 child=parent3)) as t4

     on t3.parent2=t4.parent2

     left join

     have(where=(level=5) rename=(parent=parent3 child=parent4)) as t5

     on t4.parent3=t5.parent3

     left join

     have(where=(level=6) rename=(parent=parent4 child=parent5)) as t6

     on t5.parent4=t6.parent4

;

quit;

Note this is all untested its more of a logical guideline on how you can tackle the problem.

Vince

*Edit indeed left join seemed more appropriate for the OPs objectives so I edited the sql example

DBailey
Lapis Lazuli | Level 10

I might point out semantically that your example is backwards.  XXX is the parent of YYY, not the child.

But..to finish the question...Here's a more direct, brute force method:

proc sql;

create table work.want (

     child char(3),

     parent1 char(3),

     parent2 char(3),

     parent3 char(3),

     parent4 char(3),

     parent5 char(3),

     parent6 char(3));

insert into work.want (child)

select child from have where level=1;

update work.want t1

set parent1=(select parent from have where child=t1.child);

update work.want t1

set parent2=(select parent from have where child=t1.parent1);

update work.want t1

set parent3=(select parent from have where child=t1.parent2);

update work.want t1

set parent4=(select parent from have where child=t1.parent3);

update work.want t1

set parent5=(select parent from have where child=t1.parent4);

update work.want t1

set parent6=(select parent from have where child=t1.parent5);

quit;

Astounding
PROC Star

This should get you most of the way there.  You will want to rename the variables, however.

data have;

   set have;

   if level=1 then grouping + 1;

run;

proc transpose data=have out=want prefix=Level_;

   by grouping;

   var child;

   id level;

run;

It does assume that the proper order exists in your hierarchical file.

Good luck.

suvi107
Calcite | Level 5

Thanks everybody for help.

I did manage to find the solution. Its not an optimzed way but it works.

proc sql;

create table work.want (

);


insert into work.want (child)
select child from have where level=1;

Create table want as select a.*, b.child as parent1 from want as a left join have as b on a.child = b.parent and b.level = 2 ;
Create table want as select a.*, b.child as parent2 from want as a left join have as b on a.parent1 = b.parent and b.level = 3;
Create table want as select a.*, b.child as parent3 from want as a left join have as b on a.parent2 = b.parent and b.level = 4;
Create table want as select a.*, b.child as parent4 from want as a left join have as b on a.parent3 = b.parent and b.level = 5;
Create table want as select a.*, b.child as parent5 from want as a left join have as b on a.parent4 = b.parent and b.level = 6;

quit;

Regards,

Suvi

suvi107
Calcite | Level 5

Converted it to a more sensible dynamic Macro.

%Macro Tstruct_Flat();

/*****Check what is the lowest level LCD present in the structure*****/

proc sql;
  Select max(level) into :max_level from work.lcd_raw ;
quit;


/*****Collect the highest level LCD's in the table****/
proc sql;
  Create table Tstruct_temp as select Child,level from work.lcd_raw where level=1 ;
quit;

%do i  = 2 %to &max_level;

  %let j = %eval(&i. - 1); /*Macro do loop variables to be used inside the loop*/
  %let k = %eval(&i. - 2); /*Macro do loop variables to be used inside the loop*/
  %let l = %eval(&i. ); /*Macro do loop variables to be used inside the loop*/

  %if %eval(&i.*1) = 2 %then %do;
   proc sql;
   Create table Tstruct_temp as select a.*, b.child as parent&j. , b.level as level&l. from Tstruct_temp as a left join work.lcd_raw as b on a.child = b.parent and b.level = &i. ;
   quit;
  %end;

  %if %eval(&i.*1) gt 2 %then %do;
   proc sql;
   Create table Tstruct_temp as select a.*, b.child as parent&j. , b.level as level&l. from Tstruct_temp as a left join work.lcd_raw as b on a.parent&k. = b.parent and b.level = &i. ;
   quit;
  %end;
%end;
%mend;

Thanks,

Suvi

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 4113 views
  • 8 likes
  • 7 in conversation