How to create a variable column when repeated groups are unlabled?

Accepted Solution Solved
Reply
Contributor
Posts: 52
Accepted Solution

How to create a variable column when repeated groups are unlabled?

[ Edited ]

I have a dataset that was generated by an old FORTRAN model.  The output are time steps, diameter groups, and some variables calculated for a model that goes with those groups.

 

There are 16 lines giving model outputs for 16 diameter classes.  The first 16 lines is species A for time X, the second 16 lines for species B for time X, the third 16 lines for species C for time X, the fourth 16 lines the combined total of the three species.

Then the pattern repeats.  

 

Any suggestions for automating the labeling?  I have a few dozen of these to go through and the data sets for each time X are 40K lines each.  Attached is one set of data. and below is what I need it to look like with a new species column.  I think with this model output it's always 16 lines per species or total.

Species time group numtrees BA STMV SVTS
A 0 1.00-4.99 112 0.1 0 0
A 0 5.00-9.99 56 0.2 0 0
A 0 10.00-14.99 49 0.6 0 0
A 0 15.00-19.99 41 0.9 4.74 0
A 0 20.00-24.99 30 1.1 7.49 0
A 0 25.00-29.99 27 1.5 12.26 0
A 0 30.00-34.99 18 1.4 12.72 8.46
A 0 35.00-39.99 8 0.9 8.06 6.35
A 0 40.00-44.99 10 1.4 14.42 11.87
A 0 45.00-49.99 7 1.3 13.28 10.76
A 0 50.00-54.99 5 1.1 11.62 9.13
A 0 55.00-59.99 1 0.3 3.04 2.26
A 0 60.00-64.99 0 0 0 0
A 0 65.00-69.99 0 0 0 0
A 0 70.00-74.99 0 0 0 0
A 0 75.00+ 0 0 0 0
B 0 1.00-4.99 46 0.1 0 0
B 0 5.00-9.99 21 0.1 0 0
B 0 10.00-14.99 17 0.2 0 0
B 0 15.00-19.99 17 0.4 2.09 0
B 0 20.00-24.99 8 0.3 2 0
B 0 25.00-29.99 7 0.4 2.88 0
B 0 30.00-34.99 6 0.5 4.46 3.08
B 0 35.00-39.99 5 0.5 5.05 3.99
B 0 40.00-44.99 2 0.3 2.78 2.28
B 0 45.00-49.99 1 0.2 1.9 1.54
B 0 50.00-54.99 1 0.2 2.18 1.73
B 0 55.00-59.99 0 0 0 0
B 0 60.00-64.99 0 0 0 0
B 0 65.00-69.99 0 0 0 0
B 0 70.00-74.99 0 0 0 0
B 0 75.00+ 0 0 0 0
C 0 1.00-4.99 42 0.1 0 0
C 0 5.00-9.99 23 0.1 0 0
C 0 10.00-14.99 14 0.2 0 0
C 0 15.00-19.99 12 0.3 1.51 0
C 0 20.00-24.99 17 0.6 4.16 0
C 0 25.00-29.99 6 0.3 2.73 0
C 0 30.00-34.99 6 0.5 4.24 2.82
C 0 35.00-39.99 7 0.7 6.63 5.15
C 0 40.00-44.99 3 0.4 4.01 3.29
C 0 45.00-49.99 2 0.3 3.35 2.74
C 0 50.00-54.99 2 0.4 4.27 3.4
C 0 55.00-59.99 3 0.7 8 6.13
C 0 60.00-64.99 0 0 0 0
C 0 65.00-69.99 0 0 0 0
C 0 70.00-74.99 0 0 0 0
C 0 75.00+ 0 0 0 0
T 0 1.00-4.99 200 0.3 0 0
T 0 5.00-9.99 100 0.4 0 0
T 0 10.00-14.99 80 0.9 0 0
T 0 15.00-19.99 70 1.5 8.35 0
T 0 20.00-24.99 55 2 13.64 0
T 0 25.00-29.99 40 2.2 17.88 0
T 0 30.00-34.99 30 2.4 21.43 14.36
T 0 35.00-39.99 20 2.1 19.74 15.49
T 0 40.00-44.99 15 2.1 21.21 17.44
T 0 45.00-49.99 10 1.8 18.53 15.04
T 0 50.00-54.99 8 1.7 18.06 14.27
T 0 55.00-59.99 4 1 11.04 8.39
T 0 60.00-64.99 0 0 0 0
T 0 65.00-69.99 0 0 0 0
T 0 70.00-74.99 0 0 0 0
T 0 75.00+ 0 0 0 0
A 5 1.00-4.99 1063 0.2 0 0
A 5 5.00-9.99 116 0.4 0 0
A 5 10.00-14.99 46 0.5 0 0
A 5 15.00-19.99 41 1 5.66 0
A 5 20.00-24.99 36 1.4 9.83 0
A 5 25.00-29.99 23 1.4 10.92 0
A 5 30.00-34.99 20 1.6 14.67 10
A 5 35.00-39.99 13 1.4 13.6 10.8
A 5 40.00-44.99 7 1 10.46 8.6
A 5 45.00-49.99 6 1.1 10.99 8.93
A 5 50.00-54.99 7 1.5 15.47 12.27
A 5 55.00-59.99 3 0.8 8.32 6.32
A 5 60.00-64.99 1 0.3 3.19 2.35
A 5 65.00-69.99 0 0 0 0
A 5 70.00-74.99 0 0 0 0
A 5 75.00+ 0 0 0 0
B 5 1.00-4.99 392 0.1 0 0
B 5 5.00-9.99 48 0.2 0 0
B 5 10.00-14.99 15 0.2 0 0
             
etc.            




Attachment

Accepted Solutions
Solution
‎05-25-2018 11:32 AM
PROC Star
Posts: 1,836

Re: How to create a variable column when repeated groups are unlabled?

@iiibbb Are you after something as simple as this? 

data want;
do species='A','B','C','T';
 do _n_=1 to 16;
 set tmp1.b;
 output;
 end;
end;
run;

View solution in original post


All Replies
Super User
Posts: 23,776

Re: How to create a variable column when repeated groups are unlabled?

What do you mean by 'labeling'?

Its not clear what you want as output. 

 

If you're trying to create groups, then using a combination of Species and Time in the BY statement would work.

 

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 


@iiibbb wrote:

I have a dataset that was generated by an old FORTRAN model.  The output are time steps, diameter groups, and some variables calculated for a model that goes with those groups.

 

There are 16 lines giving model outputs for 16 diameter classes.  The first 16 lines is species A for time X, the second 16 lines for species B for time X, the third 16 lines for species C for time X, the fourth 16 lines the combined total of the three species.

Then the pattern repeats.  

 

Any suggestions for automating the labeling?  I have a few dozen of these to go through and the data sets for each time X are 40K lines each.  Attached is one set of data. and below is what I need it to look like with a new species column.  I think with this model output it's always 16 lines per species or total.

Species time group numtrees BA STMV SVTS
A 0 1.00-4.99 112 0.1 0 0
A 0 5.00-9.99 56 0.2 0 0
A 0 10.00-14.99 49 0.6 0 0
A 0 15.00-19.99 41 0.9 4.74 0
A 0 20.00-24.99 30 1.1 7.49 0
A 0 25.00-29.99 27 1.5 12.26 0
A 0 30.00-34.99 18 1.4 12.72 8.46
A 0 35.00-39.99 8 0.9 8.06 6.35
A 0 40.00-44.99 10 1.4 14.42 11.87
A 0 45.00-49.99 7 1.3 13.28 10.76
A 0 50.00-54.99 5 1.1 11.62 9.13
A 0 55.00-59.99 1 0.3 3.04 2.26
A 0 60.00-64.99 0 0 0 0
A 0 65.00-69.99 0 0 0 0
A 0 70.00-74.99 0 0 0 0
A 0 75.00+ 0 0 0 0
B 0 1.00-4.99 46 0.1 0 0
B 0 5.00-9.99 21 0.1 0 0
B 0 10.00-14.99 17 0.2 0 0
B 0 15.00-19.99 17 0.4 2.09 0
B 0 20.00-24.99 8 0.3 2 0
B 0 25.00-29.99 7 0.4 2.88 0
B 0 30.00-34.99 6 0.5 4.46 3.08
B 0 35.00-39.99 5 0.5 5.05 3.99
B 0 40.00-44.99 2 0.3 2.78 2.28
B 0 45.00-49.99 1 0.2 1.9 1.54
B 0 50.00-54.99 1 0.2 2.18 1.73
B 0 55.00-59.99 0 0 0 0
B 0 60.00-64.99 0 0 0 0
B 0 65.00-69.99 0 0 0 0
B 0 70.00-74.99 0 0 0 0
B 0 75.00+ 0 0 0 0
C 0 1.00-4.99 42 0.1 0 0
C 0 5.00-9.99 23 0.1 0 0
C 0 10.00-14.99 14 0.2 0 0
C 0 15.00-19.99 12 0.3 1.51 0
C 0 20.00-24.99 17 0.6 4.16 0
C 0 25.00-29.99 6 0.3 2.73 0
C 0 30.00-34.99 6 0.5 4.24 2.82
C 0 35.00-39.99 7 0.7 6.63 5.15
C 0 40.00-44.99 3 0.4 4.01 3.29
C 0 45.00-49.99 2 0.3 3.35 2.74
C 0 50.00-54.99 2 0.4 4.27 3.4
C 0 55.00-59.99 3 0.7 8 6.13
C 0 60.00-64.99 0 0 0 0
C 0 65.00-69.99 0 0 0 0
C 0 70.00-74.99 0 0 0 0
C 0 75.00+ 0 0 0 0
T 0 1.00-4.99 200 0.3 0 0
T 0 5.00-9.99 100 0.4 0 0
T 0 10.00-14.99 80 0.9 0 0
T 0 15.00-19.99 70 1.5 8.35 0
T 0 20.00-24.99 55 2 13.64 0
T 0 25.00-29.99 40 2.2 17.88 0
T 0 30.00-34.99 30 2.4 21.43 14.36
T 0 35.00-39.99 20 2.1 19.74 15.49
T 0 40.00-44.99 15 2.1 21.21 17.44
T 0 45.00-49.99 10 1.8 18.53 15.04
T 0 50.00-54.99 8 1.7 18.06 14.27
T 0 55.00-59.99 4 1 11.04 8.39
T 0 60.00-64.99 0 0 0 0
T 0 65.00-69.99 0 0 0 0
T 0 70.00-74.99 0 0 0 0
T 0 75.00+ 0 0 0 0
A 5 1.00-4.99 1063 0.2 0 0
A 5 5.00-9.99 116 0.4 0 0
A 5 10.00-14.99 46 0.5 0 0
A 5 15.00-19.99 41 1 5.66 0
A 5 20.00-24.99 36 1.4 9.83 0
A 5 25.00-29.99 23 1.4 10.92 0
A 5 30.00-34.99 20 1.6 14.67 10
A 5 35.00-39.99 13 1.4 13.6 10.8
A 5 40.00-44.99 7 1 10.46 8.6
A 5 45.00-49.99 6 1.1 10.99 8.93
A 5 50.00-54.99 7 1.5 15.47 12.27
A 5 55.00-59.99 3 0.8 8.32 6.32
A 5 60.00-64.99 1 0.3 3.19 2.35
A 5 65.00-69.99 0 0 0 0
A 5 70.00-74.99 0 0 0 0
A 5 75.00+ 0 0 0 0
B 5 1.00-4.99 392 0.1 0 0
B 5 5.00-9.99 48 0.2 0 0
B 5 10.00-14.99 15 0.2 0 0
             
etc.            





 

Contributor
Posts: 52

Re: How to create a variable column when repeated groups are unlabled?

[ Edited ]

Hi, thanks

 

If you look at the sas data file B I attached before, you'll see it lacks the "Species" column.  I need this column to work with the data.  Each set of 16 lines belongs to one of three species, or is the total.  The pattern repeats lines 1-16 are species A.  Lines 17-32 are species B.  Lines 33-48 are species C, lines 49 through 64 are the totals.  Then the pattern repeats lines 65-80 are species A again, lines 81-96 species B... etc until line 40+K.  I have a few dozen files that I need to create this column for.

 

The table in my message text is an example of what I want it to look like compared to the attached dataset...

I have an inelegant solution at this point, but I'd rather do it the right way.

If that file is hard to open, it looks like this

time group numtrees BA STMV SVTS
0 1.00-4.99 112 0.1 0 0
0 5.00-9.99 56 0.2 0 0
0 10.00-14.99 49 0.6 0 0
0 15.00-19.99 41 0.9 4.74 0
0 20.00-24.99 30 1.1 7.49 0
0 25.00-29.99 27 1.5 12.26 0
0 30.00-34.99 18 1.4 12.72 8.46
0 35.00-39.99 8 0.9 8.06 6.35
0 40.00-44.99 10 1.4 14.42 11.87
0 45.00-49.99 7 1.3 13.28 10.76
0 50.00-54.99 5 1.1 11.62 9.13
0 55.00-59.99 1 0.3 3.04 2.26
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
0 1.00-4.99 46 0.1 0 0
0 5.00-9.99 21 0.1 0 0
0 10.00-14.99 17 0.2 0 0
0 15.00-19.99 17 0.4 2.09 0
0 20.00-24.99 8 0.3 2 0
0 25.00-29.99 7 0.4 2.88 0
0 30.00-34.99 6 0.5 4.46 3.08
0 35.00-39.99 5 0.5 5.05 3.99
0 40.00-44.99 2 0.3 2.78 2.28
0 45.00-49.99 1 0.2 1.9 1.54
0 50.00-54.99 1 0.2 2.18 1.73
0 55.00-59.99 0 0 0 0
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
0 1.00-4.99 42 0.1 0 0
0 5.00-9.99 23 0.1 0 0
0 10.00-14.99 14 0.2 0 0
0 15.00-19.99 12 0.3 1.51 0
0 20.00-24.99 17 0.6 4.16 0
0 25.00-29.99 6 0.3 2.73 0
0 30.00-34.99 6 0.5 4.24 2.82
0 35.00-39.99 7 0.7 6.63 5.15
0 40.00-44.99 3 0.4 4.01 3.29
0 45.00-49.99 2 0.3 3.35 2.74
0 50.00-54.99 2 0.4 4.27 3.4
0 55.00-59.99 3 0.7 8 6.13
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
0 1.00-4.99 200 0.3 0 0
0 5.00-9.99 100 0.4 0 0
0 10.00-14.99 80 0.9 0 0
0 15.00-19.99 70 1.5 8.35 0
0 20.00-24.99 55 2 13.64 0
0 25.00-29.99 40 2.2 17.88 0
0 30.00-34.99 30 2.4 21.43 14.36
0 35.00-39.99 20 2.1 19.74 15.49
0 40.00-44.99 15 2.1 21.21 17.44
0 45.00-49.99 10 1.8 18.53 15.04
0 50.00-54.99 8 1.7 18.06 14.27
0 55.00-59.99 4 1 11.04 8.39
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
5 1.00-4.99 1063 0.2 0 0
5 5.00-9.99 116 0.4 0 0
5 10.00-14.99 46 0.5 0 0
5 15.00-19.99 41 1 5.66 0
5 20.00-24.99 36 1.4 9.83 0
5 25.00-29.99 23 1.4 10.92 0
5 30.00-34.99 20 1.6 14.67 10
5 35.00-39.99 13 1.4 13.6 10.8
5 40.00-44.99 7 1 10.46 8.6
5 45.00-49.99 6 1.1 10.99 8.93
5 50.00-54.99 7 1.5 15.47 12.27
5 55.00-59.99 3 0.8 8.32 6.32
5 60.00-64.99 1 0.3 3.19 2.35
5 65.00-69.99 0 0 0 0
5 70.00-74.99 0 0 0 0
5 75.00+ 0 0 0 0
5 1.00-4.99 392 0.1 0 0
5 5.00-9.99 48 0.2 0 0
5 10.00-14.99 15 0.2 0 0
           
etc          
Super User
Posts: 23,776

Re: How to create a variable column when repeated groups are unlabled?

Well, if the only rule you have is every 16, use the MOD() function to identify each group, then assign the groups to the species.

 

data want;
set sashelp.cars;

if mod(_n_, 16)=1 then count+1;

keep model make mpg_city count;
run;

 

Contributor
Posts: 52

Re: How to create a variable column when repeated groups are unlabled?

Okay.

This is partway there.

The problem is I have a few dozen files of differing lengths 40,000 or more lines.  So labeling each group is still 2500 assignments I have to make to my species.

So it seems that am still roughly in the same place as labeling each diameter group manually because groups 1,5,9,13,17 etc are species A, 2,6,10,14,18 etc are species B, 3,7,11,15,19 etc are species C, and 4,8,12,16,20 etc. are total.

So it solves the grouping problem, but not the naming problem.

Super User
Posts: 23,776

Re: How to create a variable column when repeated groups are unlabled?

Use a double mod then, if you know you only have 4 groups. Do the same logic to group the others into 1 to 4. 

And then you're only writing about 4 If/Then or Format statements.

Solution
‎05-25-2018 11:32 AM
PROC Star
Posts: 1,836

Re: How to create a variable column when repeated groups are unlabled?

@iiibbb Are you after something as simple as this? 

data want;
do species='A','B','C','T';
 do _n_=1 to 16;
 set tmp1.b;
 output;
 end;
end;
run;
Contributor
Posts: 52

Re: How to create a variable column when repeated groups are unlabled?

[ Edited ]
Posted in reply to novinosrin

Wonderful.

 

I am not skilled with do statements... this did it.

Thanks

PROC Star
Posts: 1,836

Re: How to create a variable column when repeated groups are unlabled?

@iiibbb You are very welcome. That's what the community is for. All of us work as a team for a common good i.e knowledge sharing

Contributor
Posts: 52

Re: How to create a variable column when repeated groups are unlabled?

Posted in reply to novinosrin

What's funny is that I successfully filtered some of the formatting peculiarities imputing the data only to be stymied by this.

Is there a way I could make this more generic... like if there were missing values?  Could I write the code where the count might vary from group to group?

Not my current problem, but I get data with missing observations all the time.

 

Thanks

PROC Star
Posts: 1,836

Re: How to create a variable column when repeated groups are unlabled?

[ Edited ]

I'm afraid if the count varies you would have to let us know where to exit the loop. For example, if the count is 7, and species is B, the code would need to tweaked to exit with some kind of  OR condition. 

 

 

EDIT: If you are working in a production environment, having to filter after making it a two step process is still safer than meddling to get it in one go. Of course on a forum, it's fun for us to solve regardless

Contributor
Posts: 52

Re: How to create a variable column when repeated groups are unlabled?

[ Edited ]
Posted in reply to novinosrin

Yeah... mostly just fun to think about.  In this forum it's always better to have actual data to go with the question.

I am not a heavy user of SAS and what problems I get are usually one-offs so I don't make as good of use of things like do loops because I can often do something in EXCEL first by hand in a few minutes rather than have to code and debug code for unexpected results.  This problem I was given though, manual was going to be very tedious and I couldn't insist the people giving it to us have it perfect beforehand because they're doing us the favor.

Thanks again!

Super User
Posts: 13,583

Re: How to create a variable column when repeated groups are unlabled?


@iiibbb wrote:

What's funny is that I successfully filtered some of the formatting peculiarities imputing the data only to be stymied by this.

Is there a way I could make this more generic... like if there were missing values?  Could I write the code where the count might vary from group to group?

Not my current problem, but I get data with missing observations all the time.

 

Thanks


I think this question would warrant a new thread. Reference this one by copy and paste the URL. Provide a different start data and the desired result, best would be as data steps. Only paste as many variables and records needed to demonstrate the actual issue.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 224 views
  • 8 likes
  • 4 in conversation