Solved: Data preparation for a line chart and using array index in different r...

Cruise · Posted 01-03-2018 11:20 AM

I'd like to reproduce image below showing how soil fertility relate to BF categories based on fertilization durations over time.

Have is the sample of my data. I'm totally confused!!! I feel like an idiot anyway. Try not to judge me please.

Question1: First off, I thought creating categorical variable 'bfcat' shown in my array code below would help create categorical variable for 'BF categories based on fertilization durations' that splits lines in the image above. Please see Chart legend, e.g., BF ever(yes,no) et.c. But creating variable 'bfcat' doesn't make sense, right? Any alternative suggestions as to how I should organize my data to come up with similar chart shown above please?

Question2:

Array code gave error:

ERROR: Array subscript out of range at line 223 column 4.

What am I doing wrong here?

data have; 
input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever; 
datalines;
1	1	1	1	1	1	1	1	1	1	1	1	1	1
2	1	1	1	1	0	0	0	0	0	0	0	0	1
3	1	1	0	0	0	0	0	0	0	0	0	0	1
4	1	0	0	0	0	0	0	0	0	0	0	0	1
5	0	0	0	0	0	0	0	0	0	0	0	0	0
6	1	1	1	1	1	1	1	0	0	0	0	0	1
;


data test; set have;
array x (12) BF_1-BF_12;
do i=1 to 12;
if x(1-12)=1 then bfcat=121; else bfcat=120;
if x(1-6)=1 then bfcat=61; else bfcat=60;
if x(1-3)=1 then bfcat=31; else bfcat=30;
if x(1)=1 then bfcat=11; else bfcat=10;
end;
if bf_ever=1 then bfcat=1; else bfcat=0;
run;

AhmedAl_Attar · Posted 01-04-2018 03:48 PM

Hi,
I'm sorry, but I'm a little bit confused by your reply, and I don't fully understand your requirements!
It seems you have changed the logic around from your original questions and how to reproduce the line plot/chart.

You have been provided with various coding techniques and approaches already, and now it's your turn to experiment and learn more about how to customize them to meet your requirements.

If you want to get SAS on your laptop again, try to download SAS University Edition (https://www.sas.com/en_us/software/university-edition/download-software.html).

Ahmed

View solution in original post

RW9 · Posted 01-03-2018 11:42 AM

Hi,

Well, lets just look at your array step for the moment and fix that. You don't do ranges as such like that, do you mean if all of elements 1-12 are 1 then bfcat=121 then something like:

data have; 
  input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever; 
datalines;
1   1   1   1   1   1   1   1   1   1   1   1   1   1
2   1   1   1   1   0   0   0   0   0   0   0   0   1
3   1   1   0   0   0   0   0   0   0   0   0   0   1
4   1   0   0   0   0   0   0   0   0   0   0   0   1
5   0   0   0   0   0   0   0   0   0   0   0   0   0
6   1   1   1   1   1   1   1   0   0   0   0   0   1
;
run;

data test; 
  set have;
  if min(of bf1--bf12)=1 and max(of bf1--bf12)=1 then bfcat=121;
  else if min(of bf1--bf6)=1 and max(of bf1--bf6)=1 then bfcat=61;
  else if ever=1 then bfcat=1;
run;

However you will need to explain your logic further as you have else statements of your if's as well. This:

if x(1-12)=1 then bfcat=121; else bfcat=120;

Means that bfcat will either be 121 or 120 - but this will be overwritten by the next if, and the last if with the ever will always overide everything else. Thats why I made it else's in the given code. Also note you don't need the array or loop.

Cruise · Posted 01-03-2018 01:33 PM

@RW9

Chart shows data by BF12M+(Yes) vs BF12M+(No), BF3M+(Yes) vs BF3M+(No), BF1M+(Yes) vs BF1M+(No), BF ever(Yes) vs BF ever+(No). That's where my confusion came from with "if-else-then" logic. I like your strategy, but I'm wondering how I would come up with data produced from @AhmedAl_Attar 's sample data.

AhmedAl_Attar · Posted 01-03-2018 04:26 PM

@Cruise,
To provide you with way to re-produce my sample data Structure, please answer these two questions:
Q1: What does your 5 Million records data set look like, in terms of structure?

Q2: What does your categories, such as "BF 12M+ (Yes)" mean? Are these calculation with 1==Yes, 0==No? while 12M+ means 12 months or More? Are your bf1 -- bf12 represent months?

Thanks,
Ahmed

Cruise · Posted 01-03-2018 09:17 PM

@AhmedAl_Attar

Q1: Please see attached image below. I'm awfully sorry for providing you with a screenshot for an idea of how my real data looks like. My SAS on laptop had expired.

Q2: What does your categories, such as "BF 12M+ (Yes)" mean?
- "BF 12M+ (Yes)" means that given individual had received BF treatment for longer than 12 months vs "BF 12M+ (No)" means "else, then less than 12 months".
- Yes, 1==Yes, 0==No.
- While 12M+ means 12 months or More? Yes, you're right.
- Are your bf1 -- bf12 represent months? Yes, they're number of months that treatment BF continued.
- As shown in the data screen shot, id=1 invidual volunteered for measurements at his/her age of 1,2,3 and 4 and known to have BF treatment for more than 12 months. Thus also takes value 1 for "ever" variable.
Individual id=2 volunteered for only one measurement at age category of 1 and known to have BF treatment for only 2 months. Individual id=3 had never received a BF treatment thus took "0" for "ever" variable and 0 for across all BF dummy variables. Same logic applies for id=4 who has 3 measurements but treatment continued for only 1 month.

Thank you very much Ahmed!

Reeza · Posted 01-04-2018 10:54 AM

You should really type it out or provide data in a text format, otherwise to work with it, we have to.

I feel like you're skipping a whole bunch of steps and logic here in what needs to happen. Please take some time to detail your data structure. It sounds like your BF groups/lines you want to draw are not pre-calculated, so that's likely the step you should start with. You can worry about format for a chart when you reach that point.

AhmedAl_Attar · Posted 01-04-2018 10:54 AM

Hi Cruise,

Based on your sample data, here is what I was to come up with,

data want (Keep=id year category values);

    SET work.have(rename=(measurment_years=year));
    length category $13;

     if min(of bf1--bf12)=1 and max(of bf1--bf12)=1 then 
        category='BF 12M+ (Yes)';
    else 
    do;
        if min(of bf1--bf12)=0 and max(of bf1--bf12)=0 then 
            category='BF 12M+ (No)';
        else
        do;
            if min(of bf1--bf6)=1 and max(of bf1--bf6)=1 then 
                category='BF 6M+ (Yes)';
            else 
            do;
                if min(of bf1--bf6)=0 and max(of bf1--bf6)=0 then 
                    category='BF 6M+ (No)';
                else
                do;
                    if min(of bf1--bf3)=1 and max(of bf1--bf3)=1 then 
                        category='BF 3M+ (Yes)';
                    else 
                    do;
                        if min(of bf1--bf3)=0 and max(of bf1--bf3)=0 then 
                            category='BF 3M+ (No)';
                        else if (bf1=1) then 
                            category='BF 4WK+ (Yes)';
                         else
                            category='BF 4WK+ (No)';
                    end; /* End of 3M+ check */
                end;
            end; /* End of 6M+ check */
        end;
    end; /* End of 12M+ check */
    output;
    if (ever=1) then 
        category='BF Ever (Yes)';
     else
        category='BF Ever (No)';
    output;
run;

As you can see, I used the evaluation logic suggested by @RW9, and I assumed your categories are mutually exclusive.

Note: I just added an excel version of your data image as an attachment.

Thanks,

Ahmed

Cruise · Posted 01-04-2018 12:48 PM

@AhmedAl_Attar

This is good ! The guy who created the data is not here anymore. However, I have just learned that variable 'category' is inclusive which is meaning:

BF_1: individual received treatment for more than one month. BF_3 means that individual received treatment for more than 3 months instead equal to 3 months., so forth so on. So below simple array is correct even though I still need complementary categories of (No) for each category created exactly by the manner you demonstrated in your last do loop. Please note that the logic underlying "if EVER=1 and x(i)=0 then bf='BF 3M (Yes)'; " is true given internal data log note.

Now question becomes:

If I modify your code to ...min(of bf3--bf5) instead min(of bf1--bf5) do we still need 'else' logic there? Sorry, if it's only confusing.

Would you mind to show me how your do loop would be modified given 'category' was inclusive as I described above?

array x (12) BF_1-BF_12;
do i=1 to 12;
if EVER=1 and x(i)=0 then cat='BF 3M (Yes)'; 
end;
if EVER =0 then cat='BF Ever (No)'; 
if BF_1=1 or BF_2=1 then cat='BF 3M (Yes)';  
if BF_3=1 or BF_4=1 or BF_5=1 then cat='BF 6M (Yes)';  
if BF_6=1 or BF_7=1 or BF_8=1 then cat='BF 9M (Yes)'; 
if BF_9=1 or BF_10=1 or BF_11=1 then cat='BF 12M (Yes)'; 
if BF_12=1 then cat='BF 12M+ (Yes)';

where 'cat' in above code means: 
BF Ever (No) vs (Yes)=never vs ever received treatment 
BF 3M (Yes)=received BF treatment for 3 months 
BF 6M (Yes)=received BF treatment for 6 months 
BF 9M (Yes)=received BF treatment for 9 months 
BF 12M (Yes)=received BF treatment for 12 months 
BF 12M+ (Yes)=received BF treatment beyond 12 months

AhmedAl_Attar · Posted 01-04-2018 03:48 PM

Hi,
I'm sorry, but I'm a little bit confused by your reply, and I don't fully understand your requirements!
It seems you have changed the logic around from your original questions and how to reproduce the line plot/chart.

You have been provided with various coding techniques and approaches already, and now it's your turn to experiment and learn more about how to customize them to meet your requirements.

If you want to get SAS on your laptop again, try to download SAS University Edition (https://www.sas.com/en_us/software/university-edition/download-software.html).

Ahmed

Cruise · Posted 01-04-2018 05:04 PM

@AhmedAl_Attar

Thank you very much for your inputs!! I truly appreciate it. I now got the logic behind your do loop and I'd like to use this approach for final data preparation for a chart. May I have a question please?

The resulting data 'want' is correct except row 11 and 13 has wrong values for a variable 'category'. What went wrong in modifying your do loop? I will summarize data for a chart after 'category' variable is created. Thanks zillions!

data have;
input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever denom obs years;
datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2002
1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2003
2 1 1 1 1 1 1 1 1 1 1 1 0 1 3 0 2002
3 1 1 1 1 1 1 1 1 0 0 0 0 1 4 1 2003
4 1 1 1 1 1 0 0 0 0 0 0 0 1 5 1 2004
5 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004
6 1 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004
;

data want;
    SET have;
    length category $13;
     if min(of bf1--bf12)=1 and max(of bf1--bf12)=1 then 
        category='BF 12M+ (Yes)';
    else 
    do;
        if min(of bf1--bf12)=0 and max(of bf1--bf12)=0 then 
            category='BF 12M+ (No)';
        else
        do;
            if min(of bf1--bf11)=1 and max(of bf1--bf11)=1 then 
                category='BF 12M (Yes)';
            else 
            do;
                if min(of bf1--bf11)=0 and max(of bf1--bf11)=0 then 
                    category='BF 12M (No)';
                else
                do;
                    if min(of bf1--bf8)=1 and max(of bf1--bf8)=1 then 
                        category='BF 9M (Yes)';
                    else 
                    do;
                        if min(of bf1--bf8)=0 and max(of bf1--bf8)=0 then 
                            category='BF 9M (No)';
	else
        do;
            if min(of bf1--bf5)=1 and max(of bf1--bf5)=1 then 
                category='BF 6M (Yes)';
            else 
            do;
                if min(of bf1--bf5)=0 and max(of bf1--bf5)=0 then 
                    category='BF 6M (No)';
	else
        do;
            if min(of bf1--bf2)=1 and max(of bf1--bf2)=1 then 
                category='BF 3M (Yes)';
            else 
            	do;
                if min(of bf1--bf2)=0 and max(of bf1--bf2)=0 then 
                    category='BF 3M (No)';
				else 
					do;
					if min(of bf1--bf12)=0 and max(of bf1--bf12)=0 and ever=1 then 
                            category='BF 1Mo (Yes)';
                         else category='BF 1Mo (No)';
                    end; 
                end;
            end; 
        end;
    end; 
end; 
end; 
end;
end;
end;

    output;
    if (ever=1) then 
        category='BF Ever (Yes)';
     else
        category='BF Ever (No)';
    output;
run;

AhmedAl_Attar · Posted 01-05-2018 08:29 AM

Hi Cruise,

What were you expecting the category value for row 11 & 13 in your output (Want) data set?

From looking at the input rows and your conditions (ever=1)

5 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004 --> [Row 12] BF Ever (No), [Row 11] BF 12M+ (No)

6 1 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2004 --> [Row 14] BF Ever (No), [Row 13] BF 1Mo (No)

Ahmed

Reeza · Posted 01-03-2018 11:51 AM

Transpose your data to a wide format.

It should look like, with Year, where's year in your data?

ID Year Category Value

1 2013 BF1 1

1 2013 BF2 1

1 2013 BF3 1

...

1 2014 BF1 1

1 2014 BF2 1

2 2013 BF1 1

...

Then your code will be something like:

proc sgplot data=have;
series x=year y=value / group=category;
run;

@Cruise wrote:

I'd like to reproduce image below showing how soil fertility relate to BF categories based on fertilization durations over time.

Have is the sample of my data. I'm totally confused!!! I feel like an idiot anyway. Try not to judge me please.

Question1: First off, I thought creating categorical variable 'bfcat' shown in my array code below would help create categorical variable for 'BF categories based on fertilization durations' that splits lines in the image above. Please see Chart legend, e.g., BF ever(yes,no) et.c. But creating variable 'bfcat' doesn't make sense, right? Any alternative suggestions as to how I should organize my data to come up with similar chart shown above please?

Question2:

Array code gave error:

ERROR: Array subscript out of range at line 223 column 4.

What am I doing wrong here?
data have; 
input id bf1 bf2 bf3 bf4 bf5 bf6 bf7 bf8 bf9 bf10 bf11 bf12 ever; 
datalines;
1	1	1	1	1	1	1	1	1	1	1	1	1	1
2	1	1	1	1	0	0	0	0	0	0	0	0	1
3	1	1	0	0	0	0	0	0	0	0	0	0	1
4	1	0	0	0	0	0	0	0	0	0	0	0	1
5	0	0	0	0	0	0	0	0	0	0	0	0	0
6	1	1	1	1	1	1	1	0	0	0	0	0	1
;
data test; set have;
array x (12) BF_1-BF_12;
do i=1 to 12;
if x(1-12)=1 then bfcat=121; else bfcat=120;
if x(1-6)=1 then bfcat=61; else bfcat=60;
if x(1-3)=1 then bfcat=31; else bfcat=30;
if x(1)=1 then bfcat=11; else bfcat=10;
end;
if bf_ever=1 then bfcat=1; else bfcat=0;
run;

Cruise · Posted 01-03-2018 01:28 PM

@Reeza

I have about 5 million rows. Does transpose work for large dataset like this? SAS crashed with below code. I still don't understand the benefit of transposing, can you please explain a little more?

proc transpose data=realdata;
out=temp
name=transposed_col;
VAR BF_1-BF_12;
by id survey_yr;
run;

Reeza · Posted 01-03-2018 02:01 PM

Transposing would mean that you can use the GROUP option in my sample SGPLOT. If you don't then you need to have a series statement for each line, which can be cumbersome.

5 million rows * 10 variables would be a 50 million row data set. I'm assuming that at some point you actually summarize the data to graph it...showing 5 million data points on any graph would be problematic.

AhmedAl_Attar · Posted 01-03-2018 12:14 PM

Hi,

As @Reeza had mentioned, your data should be converted to have three columns (year, Category, value) and 48 rows (2003 - 2013 by 2 yrs increments)

This how I generated my Sample data to "reproduce" your chart

data want(KEEP=year cat val);
    length year 4 cat $12 val 4; 
    array bfs {8} $12 bf1 - bf8 ('BF Ever (Yes)' 'BF Ever (No)' 'BF 4WK+ (Yes)' 'BF 4WK+ (No)' 'BF 3M+ (Yes)' 'BF 3M+ (No)' 'BF 12M+ (Yes)' 'BF 12M+ (No)');
    do year=2003 to 2013 by 2;
        do i=1 to dim(bfs);
            cat = bfs[i];
            val = rand('uniform') * 10; /* Used Random numbers */
            output;
        end;
    end;
run;

For Custom line styling and coloring, you could use SAS/GRAPH SYMBOL, LEGEND, and AXIS statements prior to Proc Gplot, or use other SAS charting procedures.

Note: I used the Line Plot Wizard in SAS Enterprise Guide, to generate my chart.

Hope this helps.

Ahmed

Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Re: Data preparation for a line chart and using array index in different range

Registration is open

SAS Training: Just a Click Away