Desktop productivity for business analysts and programmers

How to write better loops for grouping and sub grouping the data

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 87
Accepted Solution

How to write better loops for grouping and sub grouping the data

 

First please consider the following list of WC variable...

 

 

1101    1201    1301    1401    1501    1601
1102    1202    1302    1402    1502    1602
1103    1203    1303    1403    1503    1603
1104    1204    1304    1404    1504    1604
1105    1205    1305    1405    1505    1605
1106    1206    1306    1406    1506    1606
1107    1207    1307    1407    1507    1607
1108    1208    1308    1408    1508    1608
1109    1209    1309    1409    1509    1609
1110    1210    1310    1410    1510    1610
1111    1211    1311    1411    1511    1611
1112    1212    1312    1412    1512    1612
1113    1213    1313    1413    1513    1613
1114    1214    1314    1414    1514    1614
1115    1215    1315    1415    1515    1615
1116    1216    1316    1416    1516    1616
1117    1217    1317    1417    1517    1617
1118    1218    1318    1418    1518    1618
1119                    
1120                    

 

 

Now understand what I am trying to achieve. I have a very long dataset where in i have two variables are shown in the below table...

 

 

WC     ASN  
1101    0       
1101    1       
1101    2       
1101    3       
1101    4       
1101    20      
1101    21      
1101    22      
1101    23      
1101    24      
1101    25      
1101    26      
1101    27      
1101    28      
1101    45      
1101    46      
1101    47      
1101    48      
1201    4       
1201    5       
1201    6       
1201    7       
1201    8       
1201    16      
1201    17      
1201    18      
1201    19      
1201    20      
1201    28      
1201    29      
1201    30      
1201    31      
1201    32      
1201    41      
1201    42      
1201    43      
1201    44      
1202    4       
1202    5       
1202    6       
1202    7       
1202    8       
1202    16      
1202    17      
1202    18      
1202    19      
1202    20      
1202    29      
1202    30      
1202    31      
1202    32      
1202    40      
1202    41      
1202    42      
1202    43      
1202    44      

 

What I want to do is to add two more columns Group and SubGroup such that I get the final table as shown below:

 

WC  ASN Group   SubGroup
1101    0   1   1
1101    1   1   1
1101    2   1   1
1101    3   1   1
1101    4   1   1
1101    20  1   1
1101    21  1   1
1101    22  1   1
1101    23  1   1
1101    24  1   1
1101    25  1   1
1101    26  1   1
1101    27  1   1
1101    28  1   1
1101    45  1   1
1101    46  1   1
1101    47  1   1
1101    48  1   1
1201    4   1   2
1201    5   1   2
1201    6   1   2
1201    7   1   2
1201    8   1   2
1201    16  1   2
1201    17  1   2
1201    18  1   2
1201    19  1   2
1201    20  1   2
1201    28  1   2
1201    29  1   2
1201    30  1   2
1201    31  1   2
1201    32  1   2
1201    41  1   2
1201    42  1   2
1201    43  1   2
1201    44  1   2
1301    8   1   3
1301    9   1   3
1301    10  1   3
1301    11  1   3
1301    12  1   3
1301    13  1   3
1301    14  1   3
1301    15  1   3
1301    16  1   3
1301    32  1   3
1301    33  1   3
1301    34  1   3
1301    35  1   3
1301    36  1   3
1301    37  1   3
1301    38  1   3
1301    39  1   3
1301    40  1   3
1401    8   1   4
1401    9   1   4
1401    10  1   4
1401    11  1   4
1401    12  1   4
1401    13  1   4
1401    14  1   4
1401    15  1   4
1401    16  1   4
1401    33  1   4
1401    34  1   4
1401    35  1   4
1401    36  1   4
1401    37  1   4
1401    38  1   4
1401    39  1   4
1401    40  1   4
1501    4   1   5
1501    5   1   5
1501    6   1   5
1501    7   1   5
1501    8   1   5
1501    16  1   5
1501    17  1   5
1501    18  1   5
1501    19  1   5
1501    20  1   5
1501    29  1   5
1501    30  1   5
1501    31  1   5
1501    32  1   5
1501    40  1   5
1501    41  1   5
1501    42  1   5
1501    43  1   5
1501    44  1   5
1601    0   1   6
1601    1   1   6
1601    2   1   6
1601    3   1   6
1601    4   1   6
1601    20  1   6
1601    21  1   6
1601    22  1   6
1601    23  1   6
1601    24  1   6
1601    25  1   6
1601    26  1   6
1601    27  1   6
1601    28  1   6
1601    44  1   6
1601    45  1   6
1601    46  1   6
1601    47  1   6
1601    48  1   6

 

I was trying something like this...

 

 select;
    when (WC = 1101)  group = 1 subgroup=1;
    when (WC = 1201)  group = 1 subgroup=2;
    when (WC = 1301)  group = 1 subgroup=3;
    when (WC = 1401)  group = 1 subgroup=4;
    when (WC = 1501)  group = 1 subgroup=5;
    when (WC = 1601)  group = 1 subgroup=6;
    when (WC = 1102)  group = 2 subgroup=1;
    when (WC = 1202)  group = 2 subgroup=2;
    when (WC = 1302)  group = 2 subgroup=3;
    when (WC = 1402)  group = 2 subgroup=4;
    .
    .
    .
    when (WC = 1617)  group = 18 subgroup=5;
    when (WC = 1618)  group = 18 subgroup=6;
    when (WC = 1119)  group = 1 subgroup=1;
    otherwise group = 20 subgroup=1;
 end;

 

 

This is seriously long, tedious and confusing. I am sure there are better and short ways of doing it.

 

Please help.


Accepted Solutions
Solution
‎11-10-2016 04:36 AM
Grand Advisor
Posts: 9,576

Re: How to write better loops for grouping and sub grouping the data

data wc;
infile cards truncover;
input v1-v6;
group+1;
cards;
1101    1201    1301    1401    1501    1601
1102    1202    1302    1402    1502    1602
1103    1203    1303    1403    1503    1603
1104    1204    1304    1404    1504    1604
1105    1205    1305    1405    1505    1605
1106    1206    1306    1406    1506    1606
1107    1207    1307    1407    1507    1607
1108    1208    1308    1408    1508    1608
1109    1209    1309    1409    1509    1609
1110    1210    1310    1410    1510    1610
1111    1211    1311    1411    1511    1611
1112    1212    1312    1412    1512    1612
1113    1213    1313    1413    1513    1613
1114    1214    1314    1414    1514    1614
1115    1215    1315    1415    1515    1615
1116    1216    1316    1416    1516    1616
1117    1217    1317    1417    1517    1617
1118    1218    1318    1418    1518    1618
1119                    
1120    
;
run;   
data key;
 set wc;
 array x{*} v1-v6;
 do subgroup=1 to dim(x);
  if not missing(x{subgroup}) then do;wc=x{subgroup};output;end;
 end;  
 drop v1-v6;  
run;     
data have;
input WC  ASN ;
cards;
1101    0       
1101    1       
1101    2       
1101    3       
1101    4       
1101    20      
1101    21      
1101    22      
1101    23      
1101    24      
1101    25      
1101    26      
1101    27      
1101    28      
1101    45      
1101    46      
1101    47      
1101    48      
1201    4       
1201    5       
1201    6       
1201    7       
1201    8       
1201    16      
1201    17      
1201    18      
1201    19      
1201    20      
1201    28      
1201    29      
1201    30      
1201    31      
1201    32      
1201    41      
1201    42      
1201    43      
1201    44      
1202    4       
1202    5       
1202    6       
1202    7       
1202    8       
1202    16      
1202    17      
1202    18      
1202    19      
1202    20      
1202    29      
1202    30      
1202    31      
1202    32      
1202    40      
1202    41      
1202    42      
1202    43      
1202    44      
;
run;   
data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('wc');
  h.definedata('group','subgroup');
  h.definedone();
 end; 
call missing(of _all_);
set have;
rc=h.find();
run;

View solution in original post


All Replies
Esteemed Advisor
Esteemed Advisor
Posts: 7,211

Re: How to write better loops for grouping and sub grouping the data

[ Edited ]

Well for your grouping that is easy;

data want;
  set have;
  retain group subgroup;
  by wc;
  if _n_=1 then do;
    group=0;
    subgroup=1;
  end;
  if first.wc then group=sum(group,1);
  if lag(asn) > asn then subgroup=sum(subgroup,1);
run;

Note, code not tested if you want working code post test data in the form of a datastep.

Frequent Contributor
Posts: 87

Re: How to write better loops for grouping and sub grouping the data

Hi Thanks for help. This is the data table...


WC ASN
1101 0
1101 1
1101 2
1101 3
1101 4
1101 20
1101 21
1101 22
1101 23
1101 24
1101 25
1101 26
1101 27
1101 28
1101 45
1101 46
1101 47
1101 48
1201 4
1201 5
1201 6
1201 7
1201 8
1201 16
1201 17
1201 18
1201 19
1201 20
1201 28
1201 29
1201 30
1201 31
1201 32
1201 41
1201 42
1201 43
1201 44
1202 4
1202 5
1202 6
1202 7
1202 8
1202 16
1202 17
1202 18
1202 19
1202 20
1202 29
1202 30
1202 31
1202 32
1202 40
1202 41
1202 42
1202 43
1202 44
Esteemed Advisor
Esteemed Advisor
Posts: 7,211

Re: How to write better loops for grouping and sub grouping the data

I have posted example code above in my post.  When I ask for test data in the form of a datastep it means a datastep which will recreate the data exactly as you have it.  Take a look at:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Does the code provided not work?

Grand Advisor
Posts: 10,210

Re: How to write better loops for grouping and sub grouping the data

Is group supposed to be numeric or character? Is the rule that group should be the last character of WC or numeric value of the last 2 characters of WC? Is WC character or numeric?

If you want the last two characters of WC as a numeric value then something like

 

group = input(substr(wc,3),best4.);

may work.

if you want the last character:

group = substr(wc,4);

 

These both assume WC has a length of 4 and is character. If WC is numeric then

group = mod(wc,100); is perhaps what you want.

 

Subgroup could be either

subgroup = substr(wc,2,1);

or

Subgroup = mod(floor(wc/100),10);

Solution
‎11-10-2016 04:36 AM
Grand Advisor
Posts: 9,576

Re: How to write better loops for grouping and sub grouping the data

data wc;
infile cards truncover;
input v1-v6;
group+1;
cards;
1101    1201    1301    1401    1501    1601
1102    1202    1302    1402    1502    1602
1103    1203    1303    1403    1503    1603
1104    1204    1304    1404    1504    1604
1105    1205    1305    1405    1505    1605
1106    1206    1306    1406    1506    1606
1107    1207    1307    1407    1507    1607
1108    1208    1308    1408    1508    1608
1109    1209    1309    1409    1509    1609
1110    1210    1310    1410    1510    1610
1111    1211    1311    1411    1511    1611
1112    1212    1312    1412    1512    1612
1113    1213    1313    1413    1513    1613
1114    1214    1314    1414    1514    1614
1115    1215    1315    1415    1515    1615
1116    1216    1316    1416    1516    1616
1117    1217    1317    1417    1517    1617
1118    1218    1318    1418    1518    1618
1119                    
1120    
;
run;   
data key;
 set wc;
 array x{*} v1-v6;
 do subgroup=1 to dim(x);
  if not missing(x{subgroup}) then do;wc=x{subgroup};output;end;
 end;  
 drop v1-v6;  
run;     
data have;
input WC  ASN ;
cards;
1101    0       
1101    1       
1101    2       
1101    3       
1101    4       
1101    20      
1101    21      
1101    22      
1101    23      
1101    24      
1101    25      
1101    26      
1101    27      
1101    28      
1101    45      
1101    46      
1101    47      
1101    48      
1201    4       
1201    5       
1201    6       
1201    7       
1201    8       
1201    16      
1201    17      
1201    18      
1201    19      
1201    20      
1201    28      
1201    29      
1201    30      
1201    31      
1201    32      
1201    41      
1201    42      
1201    43      
1201    44      
1202    4       
1202    5       
1202    6       
1202    7       
1202    8       
1202    16      
1202    17      
1202    18      
1202    19      
1202    20      
1202    29      
1202    30      
1202    31      
1202    32      
1202    40      
1202    41      
1202    42      
1202    43      
1202    44      
;
run;   
data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('wc');
  h.definedata('group','subgroup');
  h.definedone();
 end; 
call missing(of _all_);
set have;
rc=h.find();
run;
Frequent Contributor
Posts: 87

Re: How to write better loops for grouping and sub grouping the data

Hi KSharp,

Thanks for your efforts and hard work. I was running your code on my SAS EG here. Your code gave perfect results for WC and KEY but failed to generate results for HAVE and WANT.

 

In HAVE I got 1018 rows of black results:

WC ASN

.    .
.    .
.    .
.    .
.    .
.    .
.    .

 

In WANT i got 1018 rows of results (all blank except for 'rc'):

 

group      subgroup   WC    ASN   rc

.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038
.    .    .    .    160038

 

 

 

What could be the reason?

 

 

Frequent Contributor
Posts: 87

Re: How to write better loops for grouping and sub grouping the data

[ Edited ]

Ahh !! I got it.. it is formatting of data...


7101 0
7101 1
7101 2
7101 3
7101 4
7101 20
7101 21
7101 22

Grand Advisor
Posts: 10,210

Re: How to write better loops for grouping and sub grouping the data

There are instuctions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... on how to create data step code to post to the form so we can actually duplicate your data. When we look at a list of values like:

 

1107  23

 

We cannot tell whether YOUR values in your data set are numeric or character. So we make guesses. And the resulting posted solution code may not work of our guess is incorrect.

 

That is why I asked earlier about character or numeric values. Different functions are usually needed/available though SAs can sometimes to an on the fly conversion that works.

Esteemed Advisor
Posts: 5,198

Re: How to write better loops for grouping and sub grouping the data

Avoid coding for this type of data relationships.
To me it seem you need a reference table which could lookup using either SQL join or SAS formats.
Data never sleeps
Frequent Contributor
Posts: 87

Re: How to write better loops for grouping and sub grouping the data

Agree.. I will get an idea from suggestion posted here and then I will recode using SQL that what i was thinking. Thanks LinusH
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 227 views
  • 4 likes
  • 5 in conversation