BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
joebacon
Pyrite | Level 9

Hi all,

 

I have three very related questions that are simple but I can not figure out. I annotated the code.


I do realize my data structure is wonky and I intend to create a year column instead of having these variables but I thought it would be easier to do these transformations this way (Correct me if I am wrong) since it was how I was given the data.

 

libname NLSY 'U:\My SAS Files\NLSY97\NLSY97_Height_Weight';
run;

data Fix;
set NLSY.nlsy;
array missing _numeric_;
do over missing;
if missing < 0 then missing=.;
end;
run;

Data BMI;
Set Fix;
If Height_F_1997 > 0 then Height_In_1997  = Height_F_1997*12+Height_In_1997; /*Why wouldnt "if not missing" work instead of <0?*/
If Height_F_1998 > 0 then Height_In_1998  = Height_F_1998*12+Height_In_1998;
If Height_F_1999 > 0 then Height_In_1999  = Height_F_1999*12+Height_In_1999;
If Height_F_2000 > 0 then Height_In_2000  = Height_F_2000*12+Height_In_2000;
/*How do you do this all the way through 2011 without copy and pasting?*/
run;


data BMI;
set Fix ;
BMI_97 = (Weight_1997*703)/(Height_In_1997)**2;
BMI_98 = (Weight_1998*703)/(Height_In_1998)**2;
BMI_99 = (Weight_1999*703)/(Height_In_1999)**2;
BMI_00 = (Weight_2000*703)/(Height_In_2000)**2;
/*Exact same question*/
run;

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
Data BMI;
Set Fix;
array j(*)Height_F_1997-Height_F_2011;
array k(*)Height_In_1997-Height_In_2011;
do i=1 to dim(j);
if j(i)>0 then j(k)=j(i)*12+j(i);
end;
drop i;
run;

if you can follow the above, should get you closer

View solution in original post

13 REPLIES 13
PaigeMiller
Diamond | Level 26

Data (such as year) never belongs in a variable name.

 

Data (such as year) belongs should be a value of a variable.

 

So, a more simple and easier to work with arrangement of the data would have a variable named YEAR, and a variable named Height_F and a variable named Height_IN and then the entire coding is

 

If Height_F > 0 then Height_IN = Height_F*12 + Height_IN;

So a restructuring of your data would seem to be in order, except that involves coding as well. 

 

data new;
    set old; 
    year=1997; 
    height_f=height_f_1997;
    height_in=height_in_1997;
    output;
    year=1998;
    /* etc. */
    output;
run;

I leave it up to you which one is simpler.

 

All of the above can be done via arrays as well.

 

Even so, I recommend the restructuring as it will make everything you do with this data simpler in the long run.

--
Paige Miller
joebacon
Pyrite | Level 9
Thank you Paige!

I am going to pull out the year and then do the arrays because it will be much simpler!
novinosrin
Tourmaline | Level 20
Data BMI;
Set Fix;
array j(*)Height_F_1997-Height_F_2011;
array k(*)Height_In_1997-Height_In_2011;
do i=1 to dim(j);
if j(i)>0 then j(k)=j(i)*12+j(i);
end;
drop i;
run;

if you can follow the above, should get you closer

joebacon
Pyrite | Level 9
I follow this for the most part, but why would i reference array k in the "then" statement?

Would it not be if j(i)>0 then k(i)=j(i)*12+k(i);


Either way, thank you! This is exactly what I was looking for.
novinosrin
Tourmaline | Level 20

Yes my sincere apologies for typo. My bad. I need to get my eyes checked or is it too much caffeine shakes I am having due to addiction. Good catch and well done!

joebacon
Pyrite | Level 9

No! Do not be sorry at all. I am just learning and wanted to be sure. Coffee is my lifeblood. 

 

A follow-up question:

 

For the second set of arrays, how to initialize BMI as an array?

 

For instance, I have this:

 

Data BMI2;
Set BMI;
array Weight(*)Weight_1997-Weight_2011;
array Height(*)Height_In_1997-Height_In_2011;
do i=1 to dim(Weight);
if Weight(i) >0 and Height(i) >0 then BMI=((Weight(i)*703)/((Height(i))**2),i);
end;
drop i;
run;

I cannot get the BMI to calculate for each value of weight and height, but can get it to work for only one.

novinosrin
Tourmaline | Level 20

Add a third array

 

array b(*) bmi1997-bmi2011;

 

and I(index) is anyway the same when you run the loop

 

so basically you have 15 elements in each array and the index variable I increments from 1 to 15 in the loop and array reference uses the index accordingly

joebacon
Pyrite | Level 9

Another follow up question since you seem to be rather good with arrays. 

 

I wanted to just as Paige suggested and pull out the year with an array rather than plug and chug. I have nothing against the long way, but I am always looking for the quickest route.

 

How would you do that? I have something like this as of now:

 

Data BMI3;
Set BMI2;
array Weight(*)Weight_1997-Weight_2011;
array Height(*)Height_In_1997-Height_In_2011;
array BMI(*) BMI_1997-BMI_2011;
do Year=1 to dim(Weight);
Year = substr(BMI,5,4);
Weight = Weight[Year];
Height = Height[Year];
BMI = BMI[Year];
output;
end;
run;

But i am getting illegal reference errors.

novinosrin
Tourmaline | Level 20

Would help me test on my machine, if you could help me/us by providing sample data of what you have and your required output.

Otherwise, I am gonna just assume what your data may contain and we would keep going back and forth.

 

If you could provide us a good representative sample of what you have and the required output, I will give you the solution in one shot. 🙂

 

PS Please do not hesitate ever if you need any help in SAS, somebody will offer the solution regardless of complexity. The SAS family here(you, me and all of us) are part of amazing group. 

joebacon
Pyrite | Level 9

I have a set that looks like this:

 

ID Height_In_1997 Weight_1997 Gender DOB_M DOB_Y Race Height_In_1998 Weight_1998 Height_In_1999 Weight-1999 ... BMI_1997 BMI_1998

 

1 60 150 2 2 1980 2 61 155 ....

 

I did it the long way like Paige suggested so that 

Data BMI_Year;
	Set BMI2;
	Year = 1997;
	Height = Height_In_1997;
	Weight = Weight_1997;
	BMI = BMI_1997;
	output;
	Year = 1998;
	Height = Height_In_1998;
	Weight = Weight_1998;
	BMI = BMI_1998;
	output;
	Year = 1999;
	Height = Height_In_1999;
	Weight = Weight_1999;
	BMI = BMI_1999;
	output;
	Year = 2000;
	Height = Height_In_2000;
	Weight = Weight_2000;
	BMI = BMI_2000;
	output;
	Year = 2001;
	Height = Height_In_2001;
	Weight = Weight_2001;
	BMI = BMI_2001;
	output;
	Year = 2002;
	Height = Height_In_2002;
	Weight = Weight_2002;
	BMI = BMI_2002;
	output;
	Year = 2003;
	Height = Height_In_2003;
	Weight = Weight_2003;
	BMI = BMI_2003;
	output;
	Year = 2004;
	Height = Height_In_2004;
	Weight = Weight_2004;
	BMI = BMI_2004;
	output;
	Year = 2005;
	Height = Height_In_2005;
	Weight = Weight_2005;
	BMI = BMI_2005;
	output;
	Year = 2006;
	Height = Height_In_2006;
	Weight = Weight_2006;
	BMI = BMI_2006;
	output;
	Year = 2007;
	Height = Height_In_2007;
	Weight = Weight_2007;
	BMI = BMI_2007;
	output;
	Year = 2008;
	Height = Height_In_2008;
	Weight = Weight_2008;
	BMI = BMI_2008;
	output;
	Year = 2009;
	Height = Height_In_2009;
	Weight = Weight_2009;
	BMI = BMI_2009;
	output;
	Year = 2010;
	Height = Height_In_2010;
	Weight = Weight_2010;
	BMI = BMI_2010;
	output;
	Year = 2011;
	Height = Height_In_2011;
	Weight = Weight_2011;
	BMI = BMI_2011;
	output;
	Year = 2013;
	Height = Height_In_2011;
	Weight = Weight_2013;
	if height >0 and weight> 0 then BMI =((Weight_2013*703)/((Height_In_2011)**2));
	output;
	Year = 2015;
	Height = Height_In_2011;
	Weight = Weight_2015;
	if height >0 and weight> 0 then BMI =((Weight_2015*703)/((Height_In_2011)**2));
	output;
	Drop BMI_1997-BMI_2011 Height_In_1997-Height_In_2011 Weight_1997-Weight_2015;
	Label Gender = "1= Male, 2=Female"
		DOB_M = "Date of Birth Month"
		DOB_Y = "Date of Birth Year"
		Race = 1= "Black", 2= "Hispanic", 3= "Mixed Race (Non-Hispanic)" 4= "Non-Black / Non-Hispanic"
		Year= "Year the Survey took place"
		Height = "Height in Inches"
		Weight = "Weight in pounds";
Run;

But  I was wondering how you would do that same thing in an array?

 

Thank you for helping and being so kind! I can't speak for everyone, but its often humbling having to post on this forum. However, it has helped me learn rather quickly the tips and tricks.

 

Best,

Joe 

 

novinosrin
Tourmaline | Level 20

The sequence seem to jump from 2011 to 2013 and then 2015. in the below

 

 

output;
	Year = 2011;
	Height = Height_In_2011;
	Weight = Weight_2011;
	BMI = BMI_2011;
	output;
	Year = 2013;
	Height = Height_In_2011;
	Weight = Weight_2013;
	if height >0 and weight> 0 then BMI =((Weight_2013*703)/((Height_In_2011)**2));
	output;
	Year = 2015;
	Height = Height_In_2011;
	Weight = Weight_2015;

Are you sure or is that a typo?

 

if it is sequential from 1997-2013. it;s easy like

do year =1997 to 2013

 

joebacon
Pyrite | Level 9

It did in fact jump. This is a public data set but they have no data for years 2012 and 2014. They also only have weight for 2013 and 2015. Thus, I made an assumption the height would be the same after those years. Long story short, it is correct!

 

Is it really as simple as that? 

 

Year = 1997-2015;
	Height = Height_In_1997-Height_In_2015;
	Weight = Weight_1997-Weight_2015;
	BMI = BMI_1997-BMI_2015;
	output;

or should i just go to 2011 and manually do the 2013 and 2015?

novinosrin
Tourmaline | Level 20

Is it really as simple as that? 

 

Year = 1997-2015;
	Height = Height_In_1997-Height_In_2015;
	Weight = Weight_1997-Weight_2015;
	BMI = BMI_1997-BMI_2015;
	output;

No, let me review and make some adjustments to the array and loop and respond back 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 1174 views
  • 4 likes
  • 3 in conversation