BookmarkSubscribeRSS Feed
teelov
Quartz | Level 8

Hello folks. i have some data which needs to be split into 12 or so different groups, there is no key and the order the data is in is important.

 

the data has a number of groups and those groups have singular and / or nested groups within that. each group will be split out as the data is (again like some previous posts) in a hierarchical format. so each "GROUP" then has its own format which then all needs to be joined up on one line.

 

the difference is to the other issues i've faced is this has many nested groups..

 

the best way i can write out the hierarchy groupings is

 

1. Transaction [TRANS]
   1.1. Meter Point [MTPNT]
      1.1.1. Asset [ASSET]
         1.1.1.1. Meter [METER]
         1.1.1.2. Converter [CONVE]
         1.1.1.3. Register Details [REGST]
            1.1.1.3.1. Reading [READG]
         1.1.1.4. Market Participant [MKPRT]
         1.1.1.5. Name [NAME]
            1.1.1.5.1. Address [ADDRS]
            1.1.1.5.2. Contact Mechanism [CONTM]
   1.2. Appointment [APPNT]
   1.3. Name [NAME]
      1.3.1. Address [ADDRS]
      1.3.2. Contact Mechanism [CONTM]
   1.4. Market Participant [MKPRT]

 

a previous method i have used before was retain and by groups, but that was only for one nested group and i feel i'm getting lost when trying it over so many groups, as once they are split (this is needed for labeling the variables per split) i need a key to join the sub groups back to the parent then, join that back to the main group. 

 

cavaet: there are 40k extracts with 10k rows each, and not all of the files have all the groups and subgroups each file.

 

data sample:

 

 

"TRANS","JOB1000287648","","","EXCHG","","","","COMLT","","D",,"OAMI","501928",,
"MTPNT","",2143024201,"","","","","",1.022640,,
"ASSET","","REMVE","METER","","PP","METLG210","LPG",2014,"L9999999999M","01","","RE"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"50384"
"ASSET","","INSTL","METER","","PP","METLG210","LPG",2014,"L23453466756X","01","","LI"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"00566"
"APPNT","",20160213,,"","080000",""

I'm trying to achieve to get a primary key for each group and sub group and sub sub group for when i join it back to TRANS. I need help in the thought process of how to get these key, my initial thoughts a first. by group processing, but with so many nested groups, i need it to be unique to join back together as the sample above would be 2 rows 2 ASSET subgroups under 1 TRANS parent grouping.

 

anyone who has dealt with energy and flow data before would run a mile, hahaha!

 

Thanks Everyone for reading,

Matt

 

SAS 9.4M2 Grid Server on REHL

5 REPLIES 5
teelov
Quartz | Level 8

have seen the following post

 

https://communities.sas.com/t5/SAS-Programming/Enumeration-Varible-3-By-Groups/m-p/547771#M151839

 

maybe i need combination of grouping the the "first." declaration 

 

ballardw
Super User

@teelov wrote:

Hello folks. i have some data which needs to be split into 12 or so different groups, there is no key and the order the data is in is important.

 

the data has a number of groups and those groups have singular and / or nested groups within that. each group will be split out as the data is (again like some previous posts) in a hierarchical format. so each "GROUP" then has its own format which then all needs to be joined up on one line.

 

the difference is to the other issues i've faced is this has many nested groups..

 

the best way i can write out the hierarchy groupings is

 

1. Transaction [TRANS]
   1.1. Meter Point [MTPNT]
      1.1.1. Asset [ASSET]
         1.1.1.1. Meter [METER]
         1.1.1.2. Converter [CONVE]
         1.1.1.3. Register Details [REGST]
            1.1.1.3.1. Reading [READG]
         1.1.1.4. Market Participant [MKPRT]
         1.1.1.5. Name [NAME]
            1.1.1.5.1. Address [ADDRS]
            1.1.1.5.2. Contact Mechanism [CONTM]
   1.2. Appointment [APPNT]
   1.3. Name [NAME]
      1.3.1. Address [ADDRS]
      1.3.2. Contact Mechanism [CONTM]
   1.4. Market Participant [MKPRT]

 

a previous method i have used before was retain and by groups, but that was only for one nested group and i feel i'm getting lost when trying it over so many groups, as once they are split (this is needed for labeling the variables per split) i need a key to join the sub groups back to the parent then, join that back to the main group. 

 

cavaet: there are 40k extracts with 10k rows each, and not all of the files have all the groups and subgroups each file.

 

data sample:

 

 

"TRANS","JOB1000287648","","","EXCHG","","","","COMLT","","D",,"OAMI","501928",,
"MTPNT","",2143024201,"","","","","",1.022640,,
"ASSET","","REMVE","METER","","PP","METLG210","LPG",2014,"L9999999999M","01","","RE"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"50384"
"ASSET","","INSTL","METER","","PP","METLG210","LPG",2014,"L23453466756X","01","","LI"
"METER","","U","ET",6.0000,"","",,"",,
"REGST","","METER",5,"SCMH",1.000
"READG",20160213,,"00566"
"APPNT","",20160213,,"","080000",""

I'm trying to achieve to get a primary key for each group and sub group and sub sub group for when i join it back to TRANS. I need help in the thought process of how to get these key, my initial thoughts a first. by group processing, but with so many nested groups, i need it to be unique to join back together as the sample above would be 2 rows 2 ASSET subgroups under 1 TRANS parent grouping.

 

anyone who has dealt with energy and flow data before would run a mile, hahaha!

 

Thanks Everyone for reading,

Matt

 

SAS 9.4M2 Grid Server on REHL


The first thing you will need to work on is the values ADDRS, NAME, MKPRT and CONTM. The implication from your example data is that is a value in the data but you have the same text value assigned to two different levels of your hierarchy. How, if all that appears in the data, are we to tell which value the encountered ADDRS, NAME, MKPRT or CONTM should have or represent???

 

There are so many possible meanings of "split" that I would really need a very concrete example of what you mean. And possibly a lot of detail describing the rules.

Much less what it might mean a "for when i join it back to TRANS".

IF the value you show as the value in the data, such as ADDRS were unique in only appearing with a single meaning then a custom informat that assigned valued would be likely a valid approach and for creating summaries and reports possibly a Multilabel format would be appropriate.

Something like this is an example:

proc format library=work;
invalue heir  (upcase)
'TRANS' = 10000 
'MTPNT' = 11000 
'ASSET' = 11100 
'METER' = 11110 
'CONVE' = 11120 
'REGST' = 11130 
'READG' = 11131 
'MKPRT' = 11140 
'NAME' = 11150 
'ADDRS' = 11151 
'CONTM' = 11152 
'APPNT' = 12000 
/*'NAME' = 13000 */
/*'ADDRS' = 13100 */
/*'CONTM' = 13200 */
/*'MKPRT' = 14000 */
;
value heir
10000 = 'Transaction' 
11000 = 'Meter Point' 
11100 = 'Asset' 
11110 = 'Meter' 
11120 = 'Converter' 
11130 = 'Register Details' 
11131 = 'Reading' 
11140 = 'Market Participant' 
11150 = 'Name' 
11151 = 'Address' 
11152 = 'Contact Mechanism' 
12000 = 'Appointment' 
/*13000 = 'Name' */
/*13100 = 'Address' */
/*13200 = 'Contact Mechanism' */
/*14000 = 'Market Participant'*/
;

data example;
   infile datalines dlm=',' truncover;
   input value :heir.;
datalines;
TRANS,JOB1000287648,,,EXCHG,,,,COMLT,,D,,OAMI,501928,,
MTPNT,,2143024201,,,,,,1.022640,,
ASSET,,REMVE,METER,,PP,METLG210,LPG,2014,L9999999999M,01,,RE
METER,,U,ET,6.0000,,,,,,
REGST,,METER,5,SCMH,1.000
READG,20160213,,50384
ASSET,,INSTL,METER,,PP,METLG210,LPG,2014,L23453466756X,01,,LI
METER,,U,ET,6.0000,,,,,,
REGST,,METER,5,SCMH,1.000
READG,20160213,,00566
APPNT,,20160213,,,080000,
;


proc print data=example;
  format value heir.;
run;

Note that the repeated values for the hierarchy are commented out because that will not work for an invalue (informat) statement that attempts to assign two different values to the read string.

teelov
Quartz | Level 8

Thank you for the help so far.

 

the values in my explanation above 1.1 and 1.1.1.3 and so on is me trying to explain how the data hierarchy work, the numbering is just visual.

 

each keyword you see has its own field specification, in total there are 120 possible fields spit across all the groupings. i need to split out the groups and treat them like small tables, but need to create a key to bring them back together in one group. the data above is a 1 transaction where 2 meters were involved.

 

so i need help with generating the numeric counters for each "depth" of each level

 

i'm imagining something like this at the end of the master dataset before i create the subsets for each group.

i did see the below example work by @PeterClemmensenthis work on 3 fields, but in a there were 3 variables to work with - https://communities.sas.com/t5/SAS-Programming/Enumeration-Varible-3-By-Groups/m-p/547771#M151839

 

data want;
   set have;
   by PID notsorted RID notsorted LID notsorted;
   if first.PID | first.RID | first.LID then do;
      CountVariable+1;
      RestartVariable+1;
   end;
   if first.PID then RestartVariable=1;
run;

so if i forget the rest of the data for now and just think of the pattern, i'm trying to create counters for each level based on the parent child rules above numbered. 

 

this is not correct, but i'm trying to illustrate what i'm trying to achieve

 

 _n_TRANS_itrTRANS_KEYMTPNT_itrMTPNT_KEY
TRANS11100
MTPNT22111
ASSET   33121
METER   44131
READG   55141
MTPNT66112
ASSET   77122
METER   88132
READG   99142
APPNT   1010152
TRANS111262
MTPNT122213
ASSET   133223
METER   144233
READG   155243
MTPNT166214
ASSET   177224
METER   188234
READG   199244
APPNT   2010254

 

 

 

 

 

 

teelov
Quartz | Level 8

looks like i'm going to have to create two counters for each  group i think

teelov
Quartz | Level 8

this was my attempt, very basic to start with to see if it is possible in thoery, but none of the keys match up to the groupings

 

data TRANS;
	set mpancreate_a;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then 
		do;
			if DataItmGrp = "TRANS" then 
				TRANSKey+1;
		end;
run;

data TRANS;
	set TRANS;
	TRANSKey2 + 1;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "TRANS" then
				TRANSKEY2=1;
		end;

	
run;

data MTPNT;
	set TRANS;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "MTPNT" then
				MTPNTKEY+1;
		end;
run;

data MTPNT;
	set MTPNT;
	by  MTPNTKEY NOTSORTED;

	if first.MTPNTKEY  and DataItmGrp = "MTPNT" then
		MTPNTKEY2=0;
	MTPNTKEY2+1;
run;

data ASSET;
	set MTPNT;

	IF MTPNTKEY = 0 THEN
		MTPNTKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "ASSET" then
				ASSETKEY+1;
		end;
run;

data ASSET;
	set ASSET;
	by  ASSETKEY NOTSORTED;

	if first.ASSETKEY  and DataItmGrp = "ASSET" then
		ASSETKEY2=0;
	ASSETKEY2+1;

	IF ASSETKEY =0 THEN
		ASSETKEY2=0;
run;

data METER;
	set ASSET;

	IF ASSETKEY = 0 THEN
		ASSETKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "METER" then
				METERKEY+1;
		end;
run;

data METER;
	set METER;
	by  ASSETKEY NOTSORTED;

	if first.ASSETKEY  and DataItmGrp = "METER" then
		METERKEY2=0;
	METERKEY2+1;

	IF METERKEY =0 THEN
		METERKEY2=0;
run;

data CONVE;
	set METER;

	IF METERKEY = 0 THEN
		METERKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "CONVE" then
				CONVEKEY+1;
		end;
run;

data METER;
	set METER;
	by  ASSETKEY NOTSORTED;

	if first.ASSETKEY  and DataItmGrp = "CONVE" then
		CONVEKEY2=0;
	CONVEKEY2+1;

	IF CONVEKEY =0 THEN
		CONVEKEY2=0;
run;


data REGST;
	set METER;

IF METERKEY = 0 THEN
		METERKEY2=0;
	by DataItmGrp NOTSORTED;

	if first.DataItmGrp then
		do;
			if DataItmGrp = "REGST" then
				REGSTGKEY+1;
		end;
run;


data PRE_SPLIT;
	set REGST;
	by  REGSTGKEY NOTSORTED;

	if first.REGSTGKEY  and DataItmGrp = "REGST" then
		REGSTGKEY2=0;
	REGSTGKEY2+1;

	IF REGSTGKEY =0 THEN
		REGSTGKEY2=0;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1548 views
  • 0 likes
  • 2 in conversation