iiibbb
Quartz | Level 8

I have a dataset that was generated by an old FORTRAN model.  The output consists of time steps, diameter groups, and some variables the model calculates for those groups.

 

There are 16 lines giving model outputs for 16 diameter classes.  The first 16 lines are species A for time X, the second 16 lines are species B for time X, the third 16 lines are species C for time X, and the fourth 16 lines are the combined total of the three species.

Then the pattern repeats.  

 

Any suggestions for automating the labeling?  I have a few dozen of these to go through, and the data sets are 40K lines each.  Attached is one set of data, and below is what I need it to look like, with a new species column.  I think with this model output it's always 16 lines per species or total.

Species time group numtrees BA STMV SVTS
A 0 1.00-4.99 112 0.1 0 0
A 0 5.00-9.99 56 0.2 0 0
A 0 10.00-14.99 49 0.6 0 0
A 0 15.00-19.99 41 0.9 4.74 0
A 0 20.00-24.99 30 1.1 7.49 0
A 0 25.00-29.99 27 1.5 12.26 0
A 0 30.00-34.99 18 1.4 12.72 8.46
A 0 35.00-39.99 8 0.9 8.06 6.35
A 0 40.00-44.99 10 1.4 14.42 11.87
A 0 45.00-49.99 7 1.3 13.28 10.76
A 0 50.00-54.99 5 1.1 11.62 9.13
A 0 55.00-59.99 1 0.3 3.04 2.26
A 0 60.00-64.99 0 0 0 0
A 0 65.00-69.99 0 0 0 0
A 0 70.00-74.99 0 0 0 0
A 0 75.00+ 0 0 0 0
B 0 1.00-4.99 46 0.1 0 0
B 0 5.00-9.99 21 0.1 0 0
B 0 10.00-14.99 17 0.2 0 0
B 0 15.00-19.99 17 0.4 2.09 0
B 0 20.00-24.99 8 0.3 2 0
B 0 25.00-29.99 7 0.4 2.88 0
B 0 30.00-34.99 6 0.5 4.46 3.08
B 0 35.00-39.99 5 0.5 5.05 3.99
B 0 40.00-44.99 2 0.3 2.78 2.28
B 0 45.00-49.99 1 0.2 1.9 1.54
B 0 50.00-54.99 1 0.2 2.18 1.73
B 0 55.00-59.99 0 0 0 0
B 0 60.00-64.99 0 0 0 0
B 0 65.00-69.99 0 0 0 0
B 0 70.00-74.99 0 0 0 0
B 0 75.00+ 0 0 0 0
C 0 1.00-4.99 42 0.1 0 0
C 0 5.00-9.99 23 0.1 0 0
C 0 10.00-14.99 14 0.2 0 0
C 0 15.00-19.99 12 0.3 1.51 0
C 0 20.00-24.99 17 0.6 4.16 0
C 0 25.00-29.99 6 0.3 2.73 0
C 0 30.00-34.99 6 0.5 4.24 2.82
C 0 35.00-39.99 7 0.7 6.63 5.15
C 0 40.00-44.99 3 0.4 4.01 3.29
C 0 45.00-49.99 2 0.3 3.35 2.74
C 0 50.00-54.99 2 0.4 4.27 3.4
C 0 55.00-59.99 3 0.7 8 6.13
C 0 60.00-64.99 0 0 0 0
C 0 65.00-69.99 0 0 0 0
C 0 70.00-74.99 0 0 0 0
C 0 75.00+ 0 0 0 0
T 0 1.00-4.99 200 0.3 0 0
T 0 5.00-9.99 100 0.4 0 0
T 0 10.00-14.99 80 0.9 0 0
T 0 15.00-19.99 70 1.5 8.35 0
T 0 20.00-24.99 55 2 13.64 0
T 0 25.00-29.99 40 2.2 17.88 0
T 0 30.00-34.99 30 2.4 21.43 14.36
T 0 35.00-39.99 20 2.1 19.74 15.49
T 0 40.00-44.99 15 2.1 21.21 17.44
T 0 45.00-49.99 10 1.8 18.53 15.04
T 0 50.00-54.99 8 1.7 18.06 14.27
T 0 55.00-59.99 4 1 11.04 8.39
T 0 60.00-64.99 0 0 0 0
T 0 65.00-69.99 0 0 0 0
T 0 70.00-74.99 0 0 0 0
T 0 75.00+ 0 0 0 0
A 5 1.00-4.99 1063 0.2 0 0
A 5 5.00-9.99 116 0.4 0 0
A 5 10.00-14.99 46 0.5 0 0
A 5 15.00-19.99 41 1 5.66 0
A 5 20.00-24.99 36 1.4 9.83 0
A 5 25.00-29.99 23 1.4 10.92 0
A 5 30.00-34.99 20 1.6 14.67 10
A 5 35.00-39.99 13 1.4 13.6 10.8
A 5 40.00-44.99 7 1 10.46 8.6
A 5 45.00-49.99 6 1.1 10.99 8.93
A 5 50.00-54.99 7 1.5 15.47 12.27
A 5 55.00-59.99 3 0.8 8.32 6.32
A 5 60.00-64.99 1 0.3 3.19 2.35
A 5 65.00-69.99 0 0 0 0
A 5 70.00-74.99 0 0 0 0
A 5 75.00+ 0 0 0 0
B 5 1.00-4.99 392 0.1 0 0
B 5 5.00-9.99 48 0.2 0 0
B 5 10.00-14.99 15 0.2 0 0
             
etc.            




1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@iiibbb Are you after something as simple as this? 

data want;
    /* cycle the species label: A, B, C, then T for the total */
    do species='A','B','C','T';
        /* read the next 16 diameter-class rows under that label */
        do _n_=1 to 16;
            set tmp1.b;
            output;
        end;
    end;
run;


12 REPLIES
Reeza
Super User

What do you mean by 'labeling'?

It's not clear what you want as output.

 

If you're trying to create groups, then using a combination of Species and Time in the BY statement would work.

 

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
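
A minimal sketch of the enumeration-by-groups idea from that link, assuming a data set (called have here, a hypothetical name) that already carries Species and Time columns and is sorted by them:

data numbered;
    set have;
    by species time;
    /* restart the counter at the top of each Species*Time group */
    if first.time then seq = 0;
    seq + 1;
run;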

 


 

iiibbb
Quartz | Level 8

Hi, thanks

 

If you look at the SAS data file B I attached before, you'll see it lacks the "Species" column.  I need this column to work with the data.  Each set of 16 lines belongs to one of three species, or is the total.  The pattern repeats: lines 1-16 are species A, lines 17-32 are species B, lines 33-48 are species C, and lines 49-64 are the totals.  Then the pattern repeats: lines 65-80 are species A again, lines 81-96 species B, and so on until line 40K+.  I have a few dozen files that I need to create this column for.

 

The table in my message text is an example of what I want it to look like compared to the attached dataset...

I have an inelegant solution at this point, but I'd rather do it the right way.

If that file is hard to open, it looks like this

time group numtrees BA STMV SVTS
0 1.00-4.99 112 0.1 0 0
0 5.00-9.99 56 0.2 0 0
0 10.00-14.99 49 0.6 0 0
0 15.00-19.99 41 0.9 4.74 0
0 20.00-24.99 30 1.1 7.49 0
0 25.00-29.99 27 1.5 12.26 0
0 30.00-34.99 18 1.4 12.72 8.46
0 35.00-39.99 8 0.9 8.06 6.35
0 40.00-44.99 10 1.4 14.42 11.87
0 45.00-49.99 7 1.3 13.28 10.76
0 50.00-54.99 5 1.1 11.62 9.13
0 55.00-59.99 1 0.3 3.04 2.26
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
0 1.00-4.99 46 0.1 0 0
0 5.00-9.99 21 0.1 0 0
0 10.00-14.99 17 0.2 0 0
0 15.00-19.99 17 0.4 2.09 0
0 20.00-24.99 8 0.3 2 0
0 25.00-29.99 7 0.4 2.88 0
0 30.00-34.99 6 0.5 4.46 3.08
0 35.00-39.99 5 0.5 5.05 3.99
0 40.00-44.99 2 0.3 2.78 2.28
0 45.00-49.99 1 0.2 1.9 1.54
0 50.00-54.99 1 0.2 2.18 1.73
0 55.00-59.99 0 0 0 0
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
0 1.00-4.99 42 0.1 0 0
0 5.00-9.99 23 0.1 0 0
0 10.00-14.99 14 0.2 0 0
0 15.00-19.99 12 0.3 1.51 0
0 20.00-24.99 17 0.6 4.16 0
0 25.00-29.99 6 0.3 2.73 0
0 30.00-34.99 6 0.5 4.24 2.82
0 35.00-39.99 7 0.7 6.63 5.15
0 40.00-44.99 3 0.4 4.01 3.29
0 45.00-49.99 2 0.3 3.35 2.74
0 50.00-54.99 2 0.4 4.27 3.4
0 55.00-59.99 3 0.7 8 6.13
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
0 1.00-4.99 200 0.3 0 0
0 5.00-9.99 100 0.4 0 0
0 10.00-14.99 80 0.9 0 0
0 15.00-19.99 70 1.5 8.35 0
0 20.00-24.99 55 2 13.64 0
0 25.00-29.99 40 2.2 17.88 0
0 30.00-34.99 30 2.4 21.43 14.36
0 35.00-39.99 20 2.1 19.74 15.49
0 40.00-44.99 15 2.1 21.21 17.44
0 45.00-49.99 10 1.8 18.53 15.04
0 50.00-54.99 8 1.7 18.06 14.27
0 55.00-59.99 4 1 11.04 8.39
0 60.00-64.99 0 0 0 0
0 65.00-69.99 0 0 0 0
0 70.00-74.99 0 0 0 0
0 75.00+ 0 0 0 0
5 1.00-4.99 1063 0.2 0 0
5 5.00-9.99 116 0.4 0 0
5 10.00-14.99 46 0.5 0 0
5 15.00-19.99 41 1 5.66 0
5 20.00-24.99 36 1.4 9.83 0
5 25.00-29.99 23 1.4 10.92 0
5 30.00-34.99 20 1.6 14.67 10
5 35.00-39.99 13 1.4 13.6 10.8
5 40.00-44.99 7 1 10.46 8.6
5 45.00-49.99 6 1.1 10.99 8.93
5 50.00-54.99 7 1.5 15.47 12.27
5 55.00-59.99 3 0.8 8.32 6.32
5 60.00-64.99 1 0.3 3.19 2.35
5 65.00-69.99 0 0 0 0
5 70.00-74.99 0 0 0 0
5 75.00+ 0 0 0 0
5 1.00-4.99 392 0.1 0 0
5 5.00-9.99 48 0.2 0 0
5 10.00-14.99 15 0.2 0 0
           
etc          
Reeza
Super User

Well, if the only rule you have is every 16, use the MOD() function to identify each group, then assign the groups to the species.

 

data want;
    set sashelp.cars;
    /* _n_ counts data step iterations (rows read), so this starts a new group number every 16 rows */
    if mod(_n_, 16) = 1 then count + 1;
    keep model make mpg_city count;
run;

 

iiibbb
Quartz | Level 8

Okay.

This is partway there.

The problem is that I have a few dozen files of differing lengths, 40,000 or more lines each.  So labeling each group is still about 2,500 assignments I have to make to my species.

So it seems that I am still roughly in the same place as labeling each diameter group manually, because groups 1, 5, 9, 13, 17, etc. are species A; groups 2, 6, 10, 14, 18, etc. are species B; groups 3, 7, 11, 15, 19, etc. are species C; and groups 4, 8, 12, 16, 20, etc. are the total.

So it solves the grouping problem, but not the naming problem.

Reeza
Super User

Use a double MOD then, if you know you only have 4 groups.  Apply the same logic to map the group numbers into 1 to 4.

Then you're only writing about four IF/THEN or FORMAT statements.
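
A minimal sketch of one way to read that double-MOD idea, assuming the unlabeled rows sit in tmp1.b (the data set name used later in this thread) and that the blocks cycle A, B, C, T as described above:

data want;
    length species $1;
    set tmp1.b;
    /* start a new block number every 16 rows */
    if mod(_n_, 16) = 1 then block + 1;
    /* cycle the block numbers 1-4 */
    cycle = mod(block - 1, 4) + 1;
    if      cycle = 1 then species = 'A';
    else if cycle = 2 then species = 'B';
    else if cycle = 3 then species = 'C';
    else                   species = 'T';
    drop block cycle;
run;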

novinosrin
Tourmaline | Level 20

@iiibbb Are you after something as simple as this? 

data want;
    /* cycle the species label: A, B, C, then T for the total */
    do species='A','B','C','T';
        /* read the next 16 diameter-class rows under that label */
        do _n_=1 to 16;
            set tmp1.b;
            output;
        end;
    end;
run;
iiibbb
Quartz | Level 8

Wonderful.

 

I am not skilled with DO statements... this did it.

Thanks

novinosrin
Tourmaline | Level 20

@iiibbb You are very welcome. That's what the community is for. All of us work as a team for a common good, i.e., knowledge sharing.

iiibbb
Quartz | Level 8

What's funny is that I successfully filtered out some of the formatting peculiarities when inputting the data, only to be stymied by this.

Is there a way I could make this more generic... like if there were missing values?  Could I write the code so that the count might vary from group to group?

Not my current problem, but I get data with missing observations all the time.

 

Thanks

novinosrin
Tourmaline | Level 20

I'm afraid that if the count varies you would have to let us know where to exit the loop. For example, if the count is 7 and the species is B, the code would need to be tweaked to exit with some kind of OR condition.

 

 

EDIT: If you are working in a production environment, filtering afterwards in a two-step process is still safer than meddling to get it all in one go. Of course, on a forum it's fun for us to solve it regardless.
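
One possible way to cope with variable-length blocks, sketched under the assumption (not confirmed in this thread) that every species block starts over at the 1.00-4.99 diameter class and that the class column is named group:

data want;
    length species $1;
    set tmp1.b;
    /* a new block begins whenever the diameter class resets to the first one */
    if group = '1.00-4.99' then block + 1;
    /* blocks still cycle A, B, C, T within each time step */
    species = substr('ABCT', mod(block - 1, 4) + 1, 1);
    drop block;
run;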

iiibbb
Quartz | Level 8

Yeah... mostly just fun to think about.  In this forum it's always better to have actual data to go with the question.

I am not a heavy user of SAS, and the problems I get are usually one-offs, so I don't make as good use of things like DO loops; I can often do something in Excel by hand in a few minutes rather than write and debug code for unexpected results.  For this problem I was given, though, doing it manually was going to be very tedious, and I couldn't insist that the people giving it to us have it perfect beforehand, because they're doing us the favor.

Thanks again!

ballardw
Super User

@iiibbb wrote:

What's funny is that I successfully filtered out some of the formatting peculiarities when inputting the data, only to be stymied by this.

Is there a way I could make this more generic... like if there were missing values?  Could I write the code so that the count might vary from group to group?

Not my current problem, but I get data with missing observations all the time.

 

Thanks


I think this question would warrant a new thread. Reference this one by copying and pasting the URL. Provide the different starting data and the desired result, best as data steps. Only paste as many variables and records as are needed to demonstrate the actual issue.
