Hi all,
I have three very related questions that are simple but I can not figure out. I annotated the code.
I do realize my data structure is wonky and I intend to create a year column instead of having these variables but I thought it would be easier to do these transformations this way (Correct me if I am wrong) since it was how I was given the data.
libname NLSY 'U:\My SAS Files\NLSY97\NLSY97_Height_Weight';
run;
data Fix;
set NLSY.nlsy;
array missing _numeric_;
do over missing;
if missing < 0 then missing=.;
end;
run;
Data BMI;
Set Fix;
If Height_F_1997 > 0 then Height_In_1997 = Height_F_1997*12+Height_In_1997; /*Why wouldnt "if not missing" work instead of <0?*/
If Height_F_1998 > 0 then Height_In_1998 = Height_F_1998*12+Height_In_1998;
If Height_F_1999 > 0 then Height_In_1999 = Height_F_1999*12+Height_In_1999;
If Height_F_2000 > 0 then Height_In_2000 = Height_F_2000*12+Height_In_2000;
/*How do you do this all the way through 2011 without copy and pasting?*/
run;
data BMI;
set Fix ;
BMI_97 = (Weight_1997*703)/(Height_In_1997)**2;
BMI_98 = (Weight_1998*703)/(Height_In_1998)**2;
BMI_99 = (Weight_1999*703)/(Height_In_1999)**2;
BMI_00 = (Weight_2000*703)/(Height_In_2000)**2;
/*Exact same question*/
run;
Thank you in advance!
Data BMI;
Set Fix;
array j(*)Height_F_1997-Height_F_2011;
array k(*)Height_In_1997-Height_In_2011;
do i=1 to dim(j);
if j(i)>0 then j(k)=j(i)*12+j(i);
end;
drop i;
run;
if you can follow the above, should get you closer
Data (such as year) never belongs in a variable name.
Data (such as year) belongs should be a value of a variable.
So, a more simple and easier to work with arrangement of the data would have a variable named YEAR, and a variable named Height_F and a variable named Height_IN and then the entire coding is
If Height_F > 0 then Height_IN = Height_F*12 + Height_IN;
So a restructuring of your data would seem to be in order, except that involves coding as well.
data new;
set old;
year=1997;
height_f=height_f_1997;
height_in=height_in_1997;
output;
year=1998;
/* etc. */
output;
run;
I leave it up to you which one is simpler.
All of the above can be done via arrays as well.
Even so, I recommend the restructuring as it will make everything you do with this data simpler in the long run.
Data BMI;
Set Fix;
array j(*)Height_F_1997-Height_F_2011;
array k(*)Height_In_1997-Height_In_2011;
do i=1 to dim(j);
if j(i)>0 then j(k)=j(i)*12+j(i);
end;
drop i;
run;
if you can follow the above, should get you closer
Yes my sincere apologies for typo. My bad. I need to get my eyes checked or is it too much caffeine shakes I am having due to addiction. Good catch and well done!
No! Do not be sorry at all. I am just learning and wanted to be sure. Coffee is my lifeblood.
A follow-up question:
For the second set of arrays, how to initialize BMI as an array?
For instance, I have this:
Data BMI2;
Set BMI;
array Weight(*)Weight_1997-Weight_2011;
array Height(*)Height_In_1997-Height_In_2011;
do i=1 to dim(Weight);
if Weight(i) >0 and Height(i) >0 then BMI=((Weight(i)*703)/((Height(i))**2),i);
end;
drop i;
run;
I cannot get the BMI to calculate for each value of weight and height, but can get it to work for only one.
Add a third array
array b(*) bmi1997-bmi2011;
and I(index) is anyway the same when you run the loop
so basically you have 15 elements in each array and the index variable I increments from 1 to 15 in the loop and array reference uses the index accordingly
Another follow up question since you seem to be rather good with arrays.
I wanted to just as Paige suggested and pull out the year with an array rather than plug and chug. I have nothing against the long way, but I am always looking for the quickest route.
How would you do that? I have something like this as of now:
Data BMI3;
Set BMI2;
array Weight(*)Weight_1997-Weight_2011;
array Height(*)Height_In_1997-Height_In_2011;
array BMI(*) BMI_1997-BMI_2011;
do Year=1 to dim(Weight);
Year = substr(BMI,5,4);
Weight = Weight[Year];
Height = Height[Year];
BMI = BMI[Year];
output;
end;
run;
But i am getting illegal reference errors.
Would help me test on my machine, if you could help me/us by providing sample data of what you have and your required output.
Otherwise, I am gonna just assume what your data may contain and we would keep going back and forth.
If you could provide us a good representative sample of what you have and the required output, I will give you the solution in one shot. 🙂
PS Please do not hesitate ever if you need any help in SAS, somebody will offer the solution regardless of complexity. The SAS family here(you, me and all of us) are part of amazing group.
I have a set that looks like this:
ID Height_In_1997 Weight_1997 Gender DOB_M DOB_Y Race Height_In_1998 Weight_1998 Height_In_1999 Weight-1999 ... BMI_1997 BMI_1998
1 60 150 2 2 1980 2 61 155 ....
I did it the long way like Paige suggested so that
Data BMI_Year;
Set BMI2;
Year = 1997;
Height = Height_In_1997;
Weight = Weight_1997;
BMI = BMI_1997;
output;
Year = 1998;
Height = Height_In_1998;
Weight = Weight_1998;
BMI = BMI_1998;
output;
Year = 1999;
Height = Height_In_1999;
Weight = Weight_1999;
BMI = BMI_1999;
output;
Year = 2000;
Height = Height_In_2000;
Weight = Weight_2000;
BMI = BMI_2000;
output;
Year = 2001;
Height = Height_In_2001;
Weight = Weight_2001;
BMI = BMI_2001;
output;
Year = 2002;
Height = Height_In_2002;
Weight = Weight_2002;
BMI = BMI_2002;
output;
Year = 2003;
Height = Height_In_2003;
Weight = Weight_2003;
BMI = BMI_2003;
output;
Year = 2004;
Height = Height_In_2004;
Weight = Weight_2004;
BMI = BMI_2004;
output;
Year = 2005;
Height = Height_In_2005;
Weight = Weight_2005;
BMI = BMI_2005;
output;
Year = 2006;
Height = Height_In_2006;
Weight = Weight_2006;
BMI = BMI_2006;
output;
Year = 2007;
Height = Height_In_2007;
Weight = Weight_2007;
BMI = BMI_2007;
output;
Year = 2008;
Height = Height_In_2008;
Weight = Weight_2008;
BMI = BMI_2008;
output;
Year = 2009;
Height = Height_In_2009;
Weight = Weight_2009;
BMI = BMI_2009;
output;
Year = 2010;
Height = Height_In_2010;
Weight = Weight_2010;
BMI = BMI_2010;
output;
Year = 2011;
Height = Height_In_2011;
Weight = Weight_2011;
BMI = BMI_2011;
output;
Year = 2013;
Height = Height_In_2011;
Weight = Weight_2013;
if height >0 and weight> 0 then BMI =((Weight_2013*703)/((Height_In_2011)**2));
output;
Year = 2015;
Height = Height_In_2011;
Weight = Weight_2015;
if height >0 and weight> 0 then BMI =((Weight_2015*703)/((Height_In_2011)**2));
output;
Drop BMI_1997-BMI_2011 Height_In_1997-Height_In_2011 Weight_1997-Weight_2015;
Label Gender = "1= Male, 2=Female"
DOB_M = "Date of Birth Month"
DOB_Y = "Date of Birth Year"
Race = 1= "Black", 2= "Hispanic", 3= "Mixed Race (Non-Hispanic)" 4= "Non-Black / Non-Hispanic"
Year= "Year the Survey took place"
Height = "Height in Inches"
Weight = "Weight in pounds";
Run;
But I was wondering how you would do that same thing in an array?
Thank you for helping and being so kind! I can't speak for everyone, but its often humbling having to post on this forum. However, it has helped me learn rather quickly the tips and tricks.
Best,
Joe
The sequence seem to jump from 2011 to 2013 and then 2015. in the below
output;
Year = 2011;
Height = Height_In_2011;
Weight = Weight_2011;
BMI = BMI_2011;
output;
Year = 2013;
Height = Height_In_2011;
Weight = Weight_2013;
if height >0 and weight> 0 then BMI =((Weight_2013*703)/((Height_In_2011)**2));
output;
Year = 2015;
Height = Height_In_2011;
Weight = Weight_2015;
Are you sure or is that a typo?
if it is sequential from 1997-2013. it;s easy like
do year =1997 to 2013
It did in fact jump. This is a public data set but they have no data for years 2012 and 2014. They also only have weight for 2013 and 2015. Thus, I made an assumption the height would be the same after those years. Long story short, it is correct!
Is it really as simple as that?
Year = 1997-2015;
Height = Height_In_1997-Height_In_2015;
Weight = Weight_1997-Weight_2015;
BMI = BMI_1997-BMI_2015;
output;
or should i just go to 2011 and manually do the 2013 and 2015?
Is it really as simple as that?
Year = 1997-2015;
Height = Height_In_1997-Height_In_2015;
Weight = Weight_1997-Weight_2015;
BMI = BMI_1997-BMI_2015;
output;
No, let me review and make some adjustments to the array and loop and respond back
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.