I'd like to reproduce image below showing how soil fertility relate to BF categories based on fertilization durations over time.
Have is the sample of my data. I'm totally confused!!! I feel like an idiot anyway. Try not to judge me please.
Question1: First off, I thought creating categorical variable 'bfcat' shown in my array code below would help create categorical variable for 'BF categories based on fertilization durations' that splits lines in the image above. Please see Chart legend, e.g., BF ever(yes,no) et.c. But creating variable 'bfcat' doesn't make sense, right? Any alternative suggestions as to how I should organize my data to come up with similar chart shown above please?
Question2:
Array code gave error:
ERROR: Array subscript out of range at line 223 column 4.
What am I doing wrong here?
data have;
input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever;
datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 0 0 0 0 0 0 0 0 1
3 1 1 0 0 0 0 0 0 0 0 0 0 1
4 1 0 0 0 0 0 0 0 0 0 0 0 1
5 0 0 0 0 0 0 0 0 0 0 0 0 0
6 1 1 1 1 1 1 1 0 0 0 0 0 1
;
data test; set have;
array x (12) BF_1-BF_12;
do i=1 to 12;
if x(1-12)=1 then bfcat=121; else bfcat=120;
if x(1-6)=1 then bfcat=61; else bfcat=60;
if x(1-3)=1 then bfcat=31; else bfcat=30;
if x(1)=1 then bfcat=11; else bfcat=10;
end;
if bf_ever=1 then bfcat=1; else bfcat=0;
run;
Hi,
Well, lets just look at your array step for the moment and fix that. You don't do ranges as such like that, do you mean if all of elements 1-12 are 1 then bfcat=121 then something like:
data have; input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever; datalines; 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 0 0 0 0 0 0 0 0 1 3 1 1 0 0 0 0 0 0 0 0 0 0 1 4 1 0 0 0 0 0 0 0 0 0 0 0 1 5 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 1 1 1 1 1 1 0 0 0 0 0 1 ; run; data test; set have; if min(of bf1--bf12)=1 and max(of bf1--bf12)=1 then bfcat=121; else if min(of bf1--bf6)=1 and max(of bf1--bf6)=1 then bfcat=61; else if ever=1 then bfcat=1; run;
However you will need to explain your logic further as you have else statements of your if's as well. This:
if x(1-12)=1 then bfcat=121; else bfcat=120;
Means that bfcat will either be 121 or 120 - but this will be overwritten by the next if, and the last if with the ever will always overide everything else. Thats why I made it else's in the given code. Also note you don't need the array or loop.
Chart shows data by BF12M+(Yes) vs BF12M+(No), BF3M+(Yes) vs BF3M+(No), BF1M+(Yes) vs BF1M+(No), BF ever(Yes) vs BF ever+(No). That's where my confusion came from with "if-else-then" logic. I like your strategy, but I'm wondering how I would come up with data produced from @AhmedAl_Attar 's sample data.
Q1: Please see attached image below. I'm awfully sorry for providing you with a screenshot for an idea of how my real data looks like. My SAS on laptop had expired.
Q2: What does your categories, such as "BF 12M+ (Yes)" mean?
- "BF 12M+ (Yes)" means that given individual had received BF treatment for longer than 12 months vs "BF 12M+ (No)" means "else, then less than 12 months".
- Yes, 1==Yes, 0==No.
- While 12M+ means 12 months or More? Yes, you're right.
- Are your bf1 -- bf12 represent months? Yes, they're number of months that treatment BF continued.
- As shown in the data screen shot, id=1 invidual volunteered for measurements at his/her age of 1,2,3 and 4 and known to have BF treatment for more than 12 months. Thus also takes value 1 for "ever" variable.
Individual id=2 volunteered for only one measurement at age category of 1 and known to have BF treatment for only 2 months. Individual id=3 had never received a BF treatment thus took "0" for "ever" variable and 0 for across all BF dummy variables. Same logic applies for id=4 who has 3 measurements but treatment continued for only 1 month.
Thank you very much Ahmed!
You should really type it out or provide data in a text format, otherwise to work with it, we have to.
I feel like you're skipping a whole bunch of steps and logic here in what needs to happen. Please take some time to detail your data structure. It sounds like your BF groups/lines you want to draw are not pre-calculated, so that's likely the step you should start with. You can worry about format for a chart when you reach that point.
Hi Cruise,
Based on your sample data, here is what I was to come up with,
data want (Keep=id year category values);
SET work.have(rename=(measurment_years=year));
length category $13;
if min(of bf1--bf12)=1 and max(of bf1--bf12)=1 then
category='BF 12M+ (Yes)';
else
do;
if min(of bf1--bf12)=0 and max(of bf1--bf12)=0 then
category='BF 12M+ (No)';
else
do;
if min(of bf1--bf6)=1 and max(of bf1--bf6)=1 then
category='BF 6M+ (Yes)';
else
do;
if min(of bf1--bf6)=0 and max(of bf1--bf6)=0 then
category='BF 6M+ (No)';
else
do;
if min(of bf1--bf3)=1 and max(of bf1--bf3)=1 then
category='BF 3M+ (Yes)';
else
do;
if min(of bf1--bf3)=0 and max(of bf1--bf3)=0 then
category='BF 3M+ (No)';
else if (bf1=1) then
category='BF 4WK+ (Yes)';
else
category='BF 4WK+ (No)';
end; /* End of 3M+ check */
end;
end; /* End of 6M+ check */
end;
end; /* End of 12M+ check */
output;
if (ever=1) then
category='BF Ever (Yes)';
else
category='BF Ever (No)';
output;
run;
As you can see, I used the evaluation logic suggested by @RW9, and I assumed your categories are mutually exclusive.
Note: I just added an excel version of your data image as an attachment.
Thanks,
Ahmed
This is good ! The guy who created the data is not here anymore. However, I have just learned that variable 'category' is inclusive which is meaning:
BF_1: individual received treatment for more than one month. BF_3 means that individual received treatment for more than 3 months instead equal to 3 months., so forth so on. So below simple array is correct even though I still need complementary categories of (No) for each category created exactly by the manner you demonstrated in your last do loop. Please note that the logic underlying "if EVER=1 and x(i)=0 then bf='BF 3M (Yes)'; " is true given internal data log note.
Now question becomes:
If I modify your code to ...min(of bf3--bf5) instead min(of bf1--bf5) do we still need 'else' logic there? Sorry, if it's only confusing.
Would you mind to show me how your do loop would be modified given 'category' was inclusive as I described above?
array x (12) BF_1-BF_12;
do i=1 to 12;
if EVER=1 and x(i)=0 then cat='BF 3M (Yes)';
end;
if EVER =0 then cat='BF Ever (No)';
if BF_1=1 or BF_2=1 then cat='BF 3M (Yes)';
if BF_3=1 or BF_4=1 or BF_5=1 then cat='BF 6M (Yes)';
if BF_6=1 or BF_7=1 or BF_8=1 then cat='BF 9M (Yes)';
if BF_9=1 or BF_10=1 or BF_11=1 then cat='BF 12M (Yes)';
if BF_12=1 then cat='BF 12M+ (Yes)';
where 'cat' in above code means:
BF Ever (No) vs (Yes)=never vs ever received treatment
BF 3M (Yes)=received BF treatment for 3 months
BF 6M (Yes)=received BF treatment for 6 months
BF 9M (Yes)=received BF treatment for 9 months
BF 12M (Yes)=received BF treatment for 12 months
BF 12M+ (Yes)=received BF treatment beyond 12 months
Thank you very much for your inputs!! I truly appreciate it. I now got the logic behind your do loop and I'd like to use this approach for final data preparation for a chart. May I have a question please?
The resulting data 'want' is correct except row 11 and 13 has wrong values for a variable 'category'. What went wrong in modifying your do loop? I will summarize data for a chart after 'category' variable is created. Thanks zillions!
data have;
input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever denom obs years;
datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2002
1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2003
2 1 1 1 1 1 1 1 1 1 1 1 0 1 3 0 2002
3 1 1 1 1 1 1 1 1 0 0 0 0 1 4 1 2003
4 1 1 1 1 1 0 0 0 0 0 0 0 1 5 1 2004
5 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004
6 1 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004
;
data want;
SET have;
length category $13;
if min(of bf1--bf12)=1 and max(of bf1--bf12)=1 then
category='BF 12M+ (Yes)';
else
do;
if min(of bf1--bf12)=0 and max(of bf1--bf12)=0 then
category='BF 12M+ (No)';
else
do;
if min(of bf1--bf11)=1 and max(of bf1--bf11)=1 then
category='BF 12M (Yes)';
else
do;
if min(of bf1--bf11)=0 and max(of bf1--bf11)=0 then
category='BF 12M (No)';
else
do;
if min(of bf1--bf8)=1 and max(of bf1--bf8)=1 then
category='BF 9M (Yes)';
else
do;
if min(of bf1--bf8)=0 and max(of bf1--bf8)=0 then
category='BF 9M (No)';
else
do;
if min(of bf1--bf5)=1 and max(of bf1--bf5)=1 then
category='BF 6M (Yes)';
else
do;
if min(of bf1--bf5)=0 and max(of bf1--bf5)=0 then
category='BF 6M (No)';
else
do;
if min(of bf1--bf2)=1 and max(of bf1--bf2)=1 then
category='BF 3M (Yes)';
else
do;
if min(of bf1--bf2)=0 and max(of bf1--bf2)=0 then
category='BF 3M (No)';
else
do;
if min(of bf1--bf12)=0 and max(of bf1--bf12)=0 and ever=1 then
category='BF 1Mo (Yes)';
else category='BF 1Mo (No)';
end;
end;
end;
end;
end;
end;
end;
end;
end;
end;
output;
if (ever=1) then
category='BF Ever (Yes)';
else
category='BF Ever (No)';
output;
run;
Hi Cruise,
What were you expecting the category value for row 11 & 13 in your output (Want) data set?
From looking at the input rows and your conditions (ever=1)
5 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004 --> [Row 12] BF Ever (No), [Row 11] BF 12M+ (No)
6 1 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004 --> [Row 14] BF Ever (No), [Row 13] BF 1Mo (No)
Ahmed
Transpose your data to a wide format.
It should look like, with Year, where's year in your data?
ID Year Category Value
1 2013 BF1 1
1 2013 BF2 1
1 2013 BF3 1
...
1 2014 BF1 1
1 2014 BF2 1
2 2013 BF1 1
...
Then your code will be something like:
proc sgplot data=have;
series x=year y=value / group=category;
run;
@Cruise wrote:
I'd like to reproduce image below showing how soil fertility relate to BF categories based on fertilization durations over time.
Have is the sample of my data. I'm totally confused!!! I feel like an idiot anyway. Try not to judge me please.
Question1: First off, I thought creating categorical variable 'bfcat' shown in my array code below would help create categorical variable for 'BF categories based on fertilization durations' that splits lines in the image above. Please see Chart legend, e.g., BF ever(yes,no) et.c. But creating variable 'bfcat' doesn't make sense, right? Any alternative suggestions as to how I should organize my data to come up with similar chart shown above please?
Question2:
Array code gave error:
ERROR: Array subscript out of range at line 223 column 4.
What am I doing wrong here?
data have; input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever; datalines; 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 0 0 0 0 0 0 0 0 1 3 1 1 0 0 0 0 0 0 0 0 0 0 1 4 1 0 0 0 0 0 0 0 0 0 0 0 1 5 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 1 1 1 1 1 1 0 0 0 0 0 1 ;
data test; set have; array x (12) BF_1-BF_12; do i=1 to 12; if x(1-12)=1 then bfcat=121; else bfcat=120; if x(1-6)=1 then bfcat=61; else bfcat=60; if x(1-3)=1 then bfcat=31; else bfcat=30; if x(1)=1 then bfcat=11; else bfcat=10; end; if bf_ever=1 then bfcat=1; else bfcat=0; run;
I have about 5 million rows. Does transpose work for large dataset like this? SAS crashed with below code. I still don't understand the benefit of transposing, can you please explain a little more?
proc transpose data=realdata;
out=temp
name=transposed_col;
VAR BF_1-BF_12;
by id survey_yr;
run;
Transposing would mean that you can use the GROUP option in my sample SGPLOT. If you don't then you need to have a series statement for each line, which can be cumbersome.
5 million rows * 10 variables would be a 50 million row data set. I'm assuming that at some point you actually summarize the data to graph it...showing 5 million data points on any graph would be problematic.
Hi,
As @Reeza had mentioned, your data should be converted to have three columns (year, Category, value) and 48 rows (2003 - 2013 by 2 yrs increments)
This how I generated my Sample data to "reproduce" your chart
data want(KEEP=year cat val);
length year 4 cat $12 val 4;
array bfs {8} $12 bf1 - bf8 ('BF Ever (Yes)' 'BF Ever (No)' 'BF 4WK+ (Yes)' 'BF 4WK+ (No)' 'BF 3M+ (Yes)' 'BF 3M+ (No)' 'BF 12M+ (Yes)' 'BF 12M+ (No)');
do year=2003 to 2013 by 2;
do i=1 to dim(bfs);
cat = bfs[i];
val = rand('uniform') * 10; /* Used Random numbers */
output;
end;
end;
run;
For Custom line styling and coloring, you could use SAS/GRAPH SYMBOL, LEGEND, and AXIS statements prior to Proc Gplot, or use other SAS charting procedures.
Note: I used the Line Plot Wizard in SAS Enterprise Guide, to generate my chart.
Hope this helps.
Ahmed
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.