Hello folks. i have some data which needs to be split into 12 or so different groups, there is no key and the order the data is in is important.
the data has a number of groups and those groups have singular and / or nested groups within that. each group will be split out as the data is (again like some previous posts) in a hierarchical format. so each "GROUP" then has its own format which then all needs to be joined up on one line.
the difference is to the other issues i've faced is this has many nested groups..
the best way i can write out the hierarchy groupings is
1. Transaction [TRANS]
1.1. Meter Point [MTPNT]
1.1.1. Asset [ASSET]
1.1.1.1. Meter [METER]
1.1.1.2. Converter [CONVE]
1.1.1.3. Register Details [REGST]
1.1.1.3.1. Reading [READG]
1.1.1.4. Market Participant [MKPRT]
1.1.1.5. Name [NAME]
1.1.1.5.1. Address [ADDRS]
1.1.1.5.2. Contact Mechanism [CONTM]
1.2. Appointment [APPNT]
1.3. Name [NAME]
1.3.1. Address [ADDRS]
1.3.2. Contact Mechanism [CONTM]
1.4. Market Participant [MKPRT]
a previous method i have used before was retain and by groups, but that was only for one nested group and i feel i'm getting lost when trying it over so many groups, as once they are split (this is needed for labeling the variables per split) i need a key to join the sub groups back to the parent then, join that back to the main group.
cavaet: there are 40k extracts with 10k rows each, and not all of the files have all the groups and subgroups each file.
data sample:
"TRANS","JOB1000287648","","","EXCHG","","","","COMLT","","D",,"OAMI","501928",, "MTPNT","",2143024201,"","","","","",1.022640,, "ASSET","","REMVE","METER","","PP","METLG210","LPG",2014,"L9999999999M","01","","RE" "METER","","U","ET",6.0000,"","",,"",, "REGST","","METER",5,"SCMH",1.000 "READG",20160213,,"50384" "ASSET","","INSTL","METER","","PP","METLG210","LPG",2014,"L23453466756X","01","","LI" "METER","","U","ET",6.0000,"","",,"",, "REGST","","METER",5,"SCMH",1.000 "READG",20160213,,"00566" "APPNT","",20160213,,"","080000",""
I'm trying to achieve to get a primary key for each group and sub group and sub sub group for when i join it back to TRANS. I need help in the thought process of how to get these key, my initial thoughts a first. by group processing, but with so many nested groups, i need it to be unique to join back together as the sample above would be 2 rows 2 ASSET subgroups under 1 TRANS parent grouping.
anyone who has dealt with energy and flow data before would run a mile, hahaha!
Thanks Everyone for reading,
Matt
SAS 9.4M2 Grid Server on REHL
have seen the following post
https://communities.sas.com/t5/SAS-Programming/Enumeration-Varible-3-By-Groups/m-p/547771#M151839
maybe i need combination of grouping the the "first." declaration
@teelov wrote:
Hello folks. i have some data which needs to be split into 12 or so different groups, there is no key and the order the data is in is important.
the data has a number of groups and those groups have singular and / or nested groups within that. each group will be split out as the data is (again like some previous posts) in a hierarchical format. so each "GROUP" then has its own format which then all needs to be joined up on one line.
the difference is to the other issues i've faced is this has many nested groups..
the best way i can write out the hierarchy groupings is
1. Transaction [TRANS]
1.1. Meter Point [MTPNT]
1.1.1. Asset [ASSET]
1.1.1.1. Meter [METER]
1.1.1.2. Converter [CONVE]
1.1.1.3. Register Details [REGST]
1.1.1.3.1. Reading [READG]
1.1.1.4. Market Participant [MKPRT]
1.1.1.5. Name [NAME]
1.1.1.5.1. Address [ADDRS]
1.1.1.5.2. Contact Mechanism [CONTM]
1.2. Appointment [APPNT]
1.3. Name [NAME]
1.3.1. Address [ADDRS]
1.3.2. Contact Mechanism [CONTM]
1.4. Market Participant [MKPRT]
a previous method i have used before was retain and by groups, but that was only for one nested group and i feel i'm getting lost when trying it over so many groups, as once they are split (this is needed for labeling the variables per split) i need a key to join the sub groups back to the parent then, join that back to the main group.
cavaet: there are 40k extracts with 10k rows each, and not all of the files have all the groups and subgroups each file.
data sample:
"TRANS","JOB1000287648","","","EXCHG","","","","COMLT","","D",,"OAMI","501928",, "MTPNT","",2143024201,"","","","","",1.022640,, "ASSET","","REMVE","METER","","PP","METLG210","LPG",2014,"L9999999999M","01","","RE" "METER","","U","ET",6.0000,"","",,"",, "REGST","","METER",5,"SCMH",1.000 "READG",20160213,,"50384" "ASSET","","INSTL","METER","","PP","METLG210","LPG",2014,"L23453466756X","01","","LI" "METER","","U","ET",6.0000,"","",,"",, "REGST","","METER",5,"SCMH",1.000 "READG",20160213,,"00566" "APPNT","",20160213,,"","080000",""I'm trying to achieve to get a primary key for each group and sub group and sub sub group for when i join it back to TRANS. I need help in the thought process of how to get these key, my initial thoughts a first. by group processing, but with so many nested groups, i need it to be unique to join back together as the sample above would be 2 rows 2 ASSET subgroups under 1 TRANS parent grouping.
anyone who has dealt with energy and flow data before would run a mile, hahaha!
Thanks Everyone for reading,
Matt
SAS 9.4M2 Grid Server on REHL
The first thing you will need to work on is the values ADDRS, NAME, MKPRT and CONTM. The implication from your example data is that is a value in the data but you have the same text value assigned to two different levels of your hierarchy. How, if all that appears in the data, are we to tell which value the encountered ADDRS, NAME, MKPRT or CONTM should have or represent???
There are so many possible meanings of "split" that I would really need a very concrete example of what you mean. And possibly a lot of detail describing the rules.
Much less what it might mean a "for when i join it back to TRANS".
IF the value you show as the value in the data, such as ADDRS were unique in only appearing with a single meaning then a custom informat that assigned valued would be likely a valid approach and for creating summaries and reports possibly a Multilabel format would be appropriate.
Something like this is an example:
proc format library=work; invalue heir (upcase) 'TRANS' = 10000 'MTPNT' = 11000 'ASSET' = 11100 'METER' = 11110 'CONVE' = 11120 'REGST' = 11130 'READG' = 11131 'MKPRT' = 11140 'NAME' = 11150 'ADDRS' = 11151 'CONTM' = 11152 'APPNT' = 12000 /*'NAME' = 13000 */ /*'ADDRS' = 13100 */ /*'CONTM' = 13200 */ /*'MKPRT' = 14000 */ ; value heir 10000 = 'Transaction' 11000 = 'Meter Point' 11100 = 'Asset' 11110 = 'Meter' 11120 = 'Converter' 11130 = 'Register Details' 11131 = 'Reading' 11140 = 'Market Participant' 11150 = 'Name' 11151 = 'Address' 11152 = 'Contact Mechanism' 12000 = 'Appointment' /*13000 = 'Name' */ /*13100 = 'Address' */ /*13200 = 'Contact Mechanism' */ /*14000 = 'Market Participant'*/ ; data example; infile datalines dlm=',' truncover; input value :heir.; datalines; TRANS,JOB1000287648,,,EXCHG,,,,COMLT,,D,,OAMI,501928,, MTPNT,,2143024201,,,,,,1.022640,, ASSET,,REMVE,METER,,PP,METLG210,LPG,2014,L9999999999M,01,,RE METER,,U,ET,6.0000,,,,,, REGST,,METER,5,SCMH,1.000 READG,20160213,,50384 ASSET,,INSTL,METER,,PP,METLG210,LPG,2014,L23453466756X,01,,LI METER,,U,ET,6.0000,,,,,, REGST,,METER,5,SCMH,1.000 READG,20160213,,00566 APPNT,,20160213,,,080000, ; proc print data=example; format value heir.; run;
Note that the repeated values for the hierarchy are commented out because that will not work for an invalue (informat) statement that attempts to assign two different values to the read string.
Thank you for the help so far.
the values in my explanation above 1.1 and 1.1.1.3 and so on is me trying to explain how the data hierarchy work, the numbering is just visual.
each keyword you see has its own field specification, in total there are 120 possible fields spit across all the groupings. i need to split out the groups and treat them like small tables, but need to create a key to bring them back together in one group. the data above is a 1 transaction where 2 meters were involved.
so i need help with generating the numeric counters for each "depth" of each level
i'm imagining something like this at the end of the master dataset before i create the subsets for each group.
i did see the below example work by @PeterClemmensenthis work on 3 fields, but in a there were 3 variables to work with - https://communities.sas.com/t5/SAS-Programming/Enumeration-Varible-3-By-Groups/m-p/547771#M151839
data want;
set have;
by PID notsorted RID notsorted LID notsorted;
if first.PID | first.RID | first.LID then do;
CountVariable+1;
RestartVariable+1;
end;
if first.PID then RestartVariable=1;
run;
so if i forget the rest of the data for now and just think of the pattern, i'm trying to create counters for each level based on the parent child rules above numbered.
this is not correct, but i'm trying to illustrate what i'm trying to achieve
_n_ | TRANS_itr | TRANS_KEY | MTPNT_itr | MTPNT_KEY | |
TRANS | 1 | 1 | 1 | 0 | 0 |
MTPNT | 2 | 2 | 1 | 1 | 1 |
ASSET | 3 | 3 | 1 | 2 | 1 |
METER | 4 | 4 | 1 | 3 | 1 |
READG | 5 | 5 | 1 | 4 | 1 |
MTPNT | 6 | 6 | 1 | 1 | 2 |
ASSET | 7 | 7 | 1 | 2 | 2 |
METER | 8 | 8 | 1 | 3 | 2 |
READG | 9 | 9 | 1 | 4 | 2 |
APPNT | 10 | 10 | 1 | 5 | 2 |
TRANS | 11 | 1 | 2 | 6 | 2 |
MTPNT | 12 | 2 | 2 | 1 | 3 |
ASSET | 13 | 3 | 2 | 2 | 3 |
METER | 14 | 4 | 2 | 3 | 3 |
READG | 15 | 5 | 2 | 4 | 3 |
MTPNT | 16 | 6 | 2 | 1 | 4 |
ASSET | 17 | 7 | 2 | 2 | 4 |
METER | 18 | 8 | 2 | 3 | 4 |
READG | 19 | 9 | 2 | 4 | 4 |
APPNT | 20 | 10 | 2 | 5 | 4 |
looks like i'm going to have to create two counters for each group i think
this was my attempt, very basic to start with to see if it is possible in thoery, but none of the keys match up to the groupings
data TRANS;
set mpancreate_a;
by DataItmGrp NOTSORTED;
if first.DataItmGrp then
do;
if DataItmGrp = "TRANS" then
TRANSKey+1;
end;
run;
data TRANS;
set TRANS;
TRANSKey2 + 1;
by DataItmGrp NOTSORTED;
if first.DataItmGrp then
do;
if DataItmGrp = "TRANS" then
TRANSKEY2=1;
end;
run;
data MTPNT;
set TRANS;
by DataItmGrp NOTSORTED;
if first.DataItmGrp then
do;
if DataItmGrp = "MTPNT" then
MTPNTKEY+1;
end;
run;
data MTPNT;
set MTPNT;
by MTPNTKEY NOTSORTED;
if first.MTPNTKEY and DataItmGrp = "MTPNT" then
MTPNTKEY2=0;
MTPNTKEY2+1;
run;
data ASSET;
set MTPNT;
IF MTPNTKEY = 0 THEN
MTPNTKEY2=0;
by DataItmGrp NOTSORTED;
if first.DataItmGrp then
do;
if DataItmGrp = "ASSET" then
ASSETKEY+1;
end;
run;
data ASSET;
set ASSET;
by ASSETKEY NOTSORTED;
if first.ASSETKEY and DataItmGrp = "ASSET" then
ASSETKEY2=0;
ASSETKEY2+1;
IF ASSETKEY =0 THEN
ASSETKEY2=0;
run;
data METER;
set ASSET;
IF ASSETKEY = 0 THEN
ASSETKEY2=0;
by DataItmGrp NOTSORTED;
if first.DataItmGrp then
do;
if DataItmGrp = "METER" then
METERKEY+1;
end;
run;
data METER;
set METER;
by ASSETKEY NOTSORTED;
if first.ASSETKEY and DataItmGrp = "METER" then
METERKEY2=0;
METERKEY2+1;
IF METERKEY =0 THEN
METERKEY2=0;
run;
data CONVE;
set METER;
IF METERKEY = 0 THEN
METERKEY2=0;
by DataItmGrp NOTSORTED;
if first.DataItmGrp then
do;
if DataItmGrp = "CONVE" then
CONVEKEY+1;
end;
run;
data METER;
set METER;
by ASSETKEY NOTSORTED;
if first.ASSETKEY and DataItmGrp = "CONVE" then
CONVEKEY2=0;
CONVEKEY2+1;
IF CONVEKEY =0 THEN
CONVEKEY2=0;
run;
data REGST;
set METER;
IF METERKEY = 0 THEN
METERKEY2=0;
by DataItmGrp NOTSORTED;
if first.DataItmGrp then
do;
if DataItmGrp = "REGST" then
REGSTGKEY+1;
end;
run;
data PRE_SPLIT;
set REGST;
by REGSTGKEY NOTSORTED;
if first.REGSTGKEY and DataItmGrp = "REGST" then
REGSTGKEY2=0;
REGSTGKEY2+1;
IF REGSTGKEY =0 THEN
REGSTGKEY2=0;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.