Multiple Rule for by group enumeration for creating keys

teelov · Posted 09-18-2019 02:43 PM

Hello folks. i have some data which needs to be split into 12 or so different groups, there is no key and the order the data is in is important.

the data has a number of groups and those groups have singular and / or nested groups within that. each group will be split out as the data is (again like some previous posts) in a hierarchical format. so each "GROUP" then has its own format which then all needs to be joined up on one line.

the difference is to the other issues i've faced is this has many nested groups..

the best way i can write out the hierarchy groupings is

1. Transaction [TRANS]
1.1. Meter Point [MTPNT]
1.1.1. Asset [ASSET]
1.1.1.1. Meter [METER]
1.1.1.2. Converter [CONVE]
1.1.1.3. Register Details [REGST]
1.1.1.3.1. Reading [READG]
1.1.1.4. Market Participant [MKPRT]
1.1.1.5. Name [NAME]
1.1.1.5.1. Address [ADDRS]
1.1.1.5.2. Contact Mechanism [CONTM]
1.2. Appointment [APPNT]
1.3. Name [NAME]
1.3.1. Address [ADDRS]
1.3.2. Contact Mechanism [CONTM]
1.4. Market Participant [MKPRT]

a previous method i have used before was retain and by groups, but that was only for one nested group and i feel i'm getting lost when trying it over so many groups, as once they are split (this is needed for labeling the variables per split) i need a key to join the sub groups back to the parent then, join that back to the main group.

cavaet: there are 40k extracts with 10k rows each, and not all of the files have all the groups and subgroups each file.

data sample:

"TRANS","JOB1000287648","","","EXCHG","","","","COMLT","","D",,"OAMI","501928",,
"MTPNT","",2143024201,"","","","","",1.022640,,
"ASSET","","REMVE","METER","","PP","METLG210","LPG",2014,"L9999999999M","01","","RE"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"50384"
"ASSET","","INSTL","METER","","PP","METLG210","LPG",2014,"L23453466756X","01","","LI"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"00566"
"APPNT","",20160213,,"","080000",""

I'm trying to achieve to get a primary key for each group and sub group and sub sub group for when i join it back to TRANS. I need help in the thought process of how to get these key, my initial thoughts a first. by group processing, but with so many nested groups, i need it to be unique to join back together as the sample above would be 2 rows 2 ASSET subgroups under 1 TRANS parent grouping.

anyone who has dealt with energy and flow data before would run a mile, hahaha!

Thanks Everyone for reading,

Matt

SAS 9.4M2 Grid Server on REHL

teelov · Posted 09-18-2019 03:35 PM

have seen the following post

https://communities.sas.com/t5/SAS-Programming/Enumeration-Varible-3-By-Groups/m-p/547771#M151839

maybe i need combination of grouping the the "first." declaration

ballardw · Posted 09-18-2019 04:51 PM

@teelov wrote:

Hello folks. i have some data which needs to be split into 12 or so different groups, there is no key and the order the data is in is important.

the data has a number of groups and those groups have singular and / or nested groups within that. each group will be split out as the data is (again like some previous posts) in a hierarchical format. so each "GROUP" then has its own format which then all needs to be joined up on one line.

the difference is to the other issues i've faced is this has many nested groups..

the best way i can write out the hierarchy groupings is

1. Transaction [TRANS]
1.1. Meter Point [MTPNT]
1.1.1. Asset [ASSET]
1.1.1.1. Meter [METER]
1.1.1.2. Converter [CONVE]
1.1.1.3. Register Details [REGST]
1.1.1.3.1. Reading [READG]
1.1.1.4. Market Participant [MKPRT]
1.1.1.5. Name [NAME]
1.1.1.5.1. Address [ADDRS]
1.1.1.5.2. Contact Mechanism [CONTM]
1.2. Appointment [APPNT]
1.3. Name [NAME]
1.3.1. Address [ADDRS]
1.3.2. Contact Mechanism [CONTM]
1.4. Market Participant [MKPRT]

a previous method i have used before was retain and by groups, but that was only for one nested group and i feel i'm getting lost when trying it over so many groups, as once they are split (this is needed for labeling the variables per split) i need a key to join the sub groups back to the parent then, join that back to the main group.

cavaet: there are 40k extracts with 10k rows each, and not all of the files have all the groups and subgroups each file.

data sample:
"TRANS","JOB1000287648","","","EXCHG","","","","COMLT","","D",,"OAMI","501928",,
"MTPNT","",2143024201,"","","","","",1.022640,,
"ASSET","","REMVE","METER","","PP","METLG210","LPG",2014,"L9999999999M","01","","RE"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"50384"
"ASSET","","INSTL","METER","","PP","METLG210","LPG",2014,"L23453466756X","01","","LI"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"00566"
"APPNT","",20160213,,"","080000",""
I'm trying to achieve to get a primary key for each group and sub group and sub sub group for when i join it back to TRANS. I need help in the thought process of how to get these key, my initial thoughts a first. by group processing, but with so many nested groups, i need it to be unique to join back together as the sample above would be 2 rows 2 ASSET subgroups under 1 TRANS parent grouping.

anyone who has dealt with energy and flow data before would run a mile, hahaha!

Thanks Everyone for reading,

Matt

SAS 9.4M2 Grid Server on REHL

The first thing you will need to work on is the values ADDRS, NAME, MKPRT and CONTM. The implication from your example data is that is a value in the data but you have the same text value assigned to two different levels of your hierarchy. How, if all that appears in the data, are we to tell which value the encountered ADDRS, NAME, MKPRT or CONTM should have or represent???

There are so many possible meanings of "split" that I would really need a very concrete example of what you mean. And possibly a lot of detail describing the rules.

Much less what it might mean a "for when i join it back to TRANS".

IF the value you show as the value in the data, such as ADDRS were unique in only appearing with a single meaning then a custom informat that assigned valued would be likely a valid approach and for creating summaries and reports possibly a Multilabel format would be appropriate.

Something like this is an example:

proc format library=work;
invalue heir  (upcase)
'TRANS' = 10000 
'MTPNT' = 11000 
'ASSET' = 11100 
'METER' = 11110 
'CONVE' = 11120 
'REGST' = 11130 
'READG' = 11131 
'MKPRT' = 11140 
'NAME' = 11150 
'ADDRS' = 11151 
'CONTM' = 11152 
'APPNT' = 12000 
/*'NAME' = 13000 */
/*'ADDRS' = 13100 */
/*'CONTM' = 13200 */
/*'MKPRT' = 14000 */
;
value heir
10000 = 'Transaction' 
11000 = 'Meter Point' 
11100 = 'Asset' 
11110 = 'Meter' 
11120 = 'Converter' 
11130 = 'Register Details' 
11131 = 'Reading' 
11140 = 'Market Participant' 
11150 = 'Name' 
11151 = 'Address' 
11152 = 'Contact Mechanism' 
12000 = 'Appointment' 
/*13000 = 'Name' */
/*13100 = 'Address' */
/*13200 = 'Contact Mechanism' */
/*14000 = 'Market Participant'*/
;

data example;
   infile datalines dlm=',' truncover;
   input value :heir.;
datalines;
TRANS,JOB1000287648,,,EXCHG,,,,COMLT,,D,,OAMI,501928,,
MTPNT,,2143024201,,,,,,1.022640,,
ASSET,,REMVE,METER,,PP,METLG210,LPG,2014,L9999999999M,01,,RE
METER,,U,ET,6.0000,,,,,,
REGST,,METER,5,SCMH,1.000
READG,20160213,,50384
ASSET,,INSTL,METER,,PP,METLG210,LPG,2014,L23453466756X,01,,LI
METER,,U,ET,6.0000,,,,,,
REGST,,METER,5,SCMH,1.000
READG,20160213,,00566
APPNT,,20160213,,,080000,
;


proc print data=example;
  format value heir.;
run;

Note that the repeated values for the hierarchy are commented out because that will not work for an invalue (informat) statement that attempts to assign two different values to the read string.

teelov · Posted 09-19-2019 04:13 AM

Thank you for the help so far.

the values in my explanation above 1.1 and 1.1.1.3 and so on is me trying to explain how the data hierarchy work, the numbering is just visual.

each keyword you see has its own field specification, in total there are 120 possible fields spit across all the groupings. i need to split out the groups and treat them like small tables, but need to create a key to bring them back together in one group. the data above is a 1 transaction where 2 meters were involved.

so i need help with generating the numeric counters for each "depth" of each level

i'm imagining something like this at the end of the master dataset before i create the subsets for each group.

i did see the below example work by @PeterClemmensenthis work on 3 fields, but in a there were 3 variables to work with - https://communities.sas.com/t5/SAS-Programming/Enumeration-Varible-3-By-Groups/m-p/547771#M151839

data want;
   set have;
   by PID notsorted RID notsorted LID notsorted;
   if first.PID | first.RID | first.LID then do;
      CountVariable+1;
      RestartVariable+1;
   end;
   if first.PID then RestartVariable=1;
run;

so if i forget the rest of the data for now and just think of the pattern, i'm trying to create counters for each level based on the parent child rules above numbered.

this is not correct, but i'm trying to illustrate what i'm trying to achieve

	_n_	TRANS_itr	TRANS_KEY	MTPNT_itr	MTPNT_KEY
TRANS	1	1	1	0	0
MTPNT	2	2	1	1	1
ASSET	3	3	1	2	1
METER	4	4	1	3	1
READG	5	5	1	4	1
MTPNT	6	6	1	1	2
ASSET	7	7	1	2	2
METER	8	8	1	3	2
READG	9	9	1	4	2
APPNT	10	10	1	5	2
TRANS	11	1	2	6	2
MTPNT	12	2	2	1	3
ASSET	13	3	2	2	3
METER	14	4	2	3	3
READG	15	5	2	4	3
MTPNT	16	6	2	1	4
ASSET	17	7	2	2	4
METER	18	8	2	3	4
READG	19	9	2	4	4
APPNT	20	10	2	5	4

teelov · Posted 09-21-2019 01:40 PM

looks like i'm going to have to create two counters for each group i think

teelov · Posted 09-23-2019 08:38 AM

this was my attempt, very basic to start with to see if it is possible in thoery, but none of the keys match up to the groupings

data TRANS;
	set mpancreate_a;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then 
		do;
			if DataItmGrp = "TRANS" then 
				TRANSKey+1;
		end;
run;

data TRANS;
	set TRANS;
	TRANSKey2 + 1;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "TRANS" then
				TRANSKEY2=1;
		end;

	
run;

data MTPNT;
	set TRANS;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "MTPNT" then
				MTPNTKEY+1;
		end;
run;

data MTPNT;
	set MTPNT;
	by  MTPNTKEY NOTSORTED;

	if first.MTPNTKEY  and DataItmGrp = "MTPNT" then
		MTPNTKEY2=0;
	MTPNTKEY2+1;
run;

data ASSET;
	set MTPNT;

	IF MTPNTKEY = 0 THEN
		MTPNTKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "ASSET" then
				ASSETKEY+1;
		end;
run;

data ASSET;
	set ASSET;
	by  ASSETKEY NOTSORTED;

	if first.ASSETKEY  and DataItmGrp = "ASSET" then
		ASSETKEY2=0;
	ASSETKEY2+1;

	IF ASSETKEY =0 THEN
		ASSETKEY2=0;
run;

data METER;
	set ASSET;

	IF ASSETKEY = 0 THEN
		ASSETKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "METER" then
				METERKEY+1;
		end;
run;

data METER;
	set METER;
	by  ASSETKEY NOTSORTED;

	if first.ASSETKEY  and DataItmGrp = "METER" then
		METERKEY2=0;
	METERKEY2+1;

	IF METERKEY =0 THEN
		METERKEY2=0;
run;

data CONVE;
	set METER;

	IF METERKEY = 0 THEN
		METERKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "CONVE" then
				CONVEKEY+1;
		end;
run;

data METER;
	set METER;
	by  ASSETKEY NOTSORTED;

	if first.ASSETKEY  and DataItmGrp = "CONVE" then
		CONVEKEY2=0;
	CONVEKEY2+1;

	IF CONVEKEY =0 THEN
		CONVEKEY2=0;
run;


data REGST;
	set METER;

IF METERKEY = 0 THEN
		METERKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "REGST" then
				REGSTGKEY+1;
		end;
run;


data PRE_SPLIT;
	set REGST;
	by  REGSTGKEY NOTSORTED;

	if first.REGSTGKEY  and DataItmGrp = "REGST" then
		REGSTGKEY2=0;
	REGSTGKEY2+1;

	IF REGSTGKEY =0 THEN
		REGSTGKEY2=0;
run;

Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Re: Multiple Rule for by group enumeration for creating keys

Registration is open

SAS Training: Just a Click Away