All blood samples were drawn in 1990. However, during data entry the order of the blood samples was scrambled, so blood sample A may not correspond to the first blood sample taken on a subject; it could be the first, second, or third. The same ordering concern applies to blood samples B and C. In addition, some of the months and days of blood sampling were not written on the forms. At data entry, a missing month was coded as either -1 or 13 and a missing day as either -1 or 32. Be sure to write your code to account for either possibility.
The team of investigators for this project has made the following decisions regarding the missing values. Any missing days should be set to 15 and any missing months set to 6. Any analyses that follow are to be done on this data set. Be sure to implement the SAS syntax as indicated for each question. For example, use SAS arrays and loops if the item states that these must be used.
A) Using this saved SAS data set, create a new, temporary SAS data set and perform the following:
1) use SAS arrays and looping to create a SAS date variable for each of the three blood samples and to address the missing data in accordance with the decisions of the investigators. Use arrays and a loop to recode the missing values for day and month;
2) use a SAS function to create a new variable for the highest, i.e., maximum, blood lead value for each child;
3) use SAS arrays and looping to identify the date on which this highest value was obtained and create a new variable for the date of the highest blood lead value;
Code I have used
data temp2_lead_f2022;
set temp_lead_f2022;
array x {3} daybld_a daybld_b daybld_c;
array y {3} mthbld_a mthbld_b mthbld_c;
array dates {3} date1_a date2_b date3_c;
array maxleaddt {3} pblev_a pblev_b pblev_c;
do i = 1 to 3;
dates{i} = mdy( y{i},x{i},1990);
end;
do k= 1 to 3;
if maxleaddt{k} = maxlead then dates{k}= max_date;
end;
drop i k;
format date1_a date2_b date3_c dob mmddyy8. ;
maxlead= max(of pblev_a pblev_b pblev_c);
run;
I am trying to solve item 3 but cannot get the syntax right.
- Order of operations: maxlead is calculated AFTER the loop that tries to identify the date, so the loop compares against a value that does not exist yet.
- When finding the maximum, array notation is not used; all three variables are listed individually.
- Item 1 of the assignment, dealing with missing values, is not answered in the code shown.
- If there are ties for the maximum value, the date of the last tied sample is selected.
- I would recommend descriptive array names; x and y make the code harder to follow.
- In the loop that identifies the date of the maximum, the array element is changed when it is the variable max_date that should be modified.
- Add comments to the code to state what is happening in each section.
data temp2_lead_f2022;
  set temp_lead_f2022;
  array x {3} daybld_a daybld_b daybld_c;
  array y {3} mthbld_a mthbld_b mthbld_c;
  array dates {3} date1_a date2_b date3_c;
  array maxleaddt {3} pblev_a pblev_b pblev_c;
  do i = 1 to 3;
    dates{i} = mdy( y{i},x{i},1990);
  end;
  maxlead= max(of pblev_a pblev_b pblev_c);
  do k= 1 to 3;
    if maxleaddt{k} = maxlead then dates{k}= max_date;
  end;
  drop i k;
  format date1_a date2_b date3_c dob mmddyy8. ;
run;
I've highlighted problematic parts of the code and you can fix it from here.
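For readers following along, here is one way the points in the list could fit together. It is a minimal sketch, not the expected assignment answer: it assumes date1_a, date2_b and date3_c already hold valid SAS dates in the input, and the array names lead and bld_date and the output name lead_max_sketch are hypothetical.

```sas
/* Sketch only. Assumes date1_a, date2_b, date3_c already hold SAS dates. */
/* Array names (lead, bld_date) and the output data set name are made up. */
data lead_max_sketch;
  set temp_lead_f2022;
  array lead     {3} pblev_a pblev_b pblev_c;
  array bld_date {3} date1_a date2_b date3_c;

  /* 1. Compute the maximum first; OF lead{*} passes every array element */
  maxlead = max(of lead{*});

  /* 2. Then find the sample that produced it and copy that sample's date */
  do k = 1 to 3;
    if not missing(maxlead) and lead{k} = maxlead then do;
      max_date = bld_date{k};
      leave;  /* stop at the first match instead of keeping the last tie */
    end;
  end;

  drop k;
  format max_date mmddyy10.;
run;
```

Without the LEAVE statement the loop keeps going and a later tied sample overwrites max_date, which is the tie behavior noted in the list. Because the sample order was scrambled at data entry, neither "first" nor "last" here means chronologically first or last; which tie rule is appropriate is a decision for the investigators.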
data temp_lead_f2022;
set in.lead_f2022;
run;
data temp_lead_f2022;
set temp_lead_f2022;
array a {3} daybld_a daybld_b daybld_c;
array b {3} mthbld_a mthbld_b mthbld_c;
do i=1 to 3;
if a{i} = -1 then a{i} = 15 ;
else if a{i} = 13 then a{i} = 15;
end;
do i=1 to 3;
if b{i} = -1 then b{i} = 6;
else if b{i} = 32 then b{i} = 6;
end;
run;
data temp2_lead_f2022;
set temp_lead_f2022;
array x {3} daybld_a daybld_b daybld_c;
array y {3} mthbld_a mthbld_b mthbld_c;
array dates {3} date1_a date2_b date3_c;
array maxleaddt {3} pblev_a pblev_b pblev_c;
do i = 1 to 3;
dates{i} = mdy( y{i},x{i},1990);
end;
maxlead= max(of pblev_a pblev_b pblev_c);
do k= 1 to 3;
if maxleaddt{k} = maxlead then dates{k}= max_date;
end;
drop i k;
format date1_a date2_b date3_c dob mmddyy8. ;
run;
ERROR Message
NOTE: Variable max_date is uninitialized.
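Separately from the max_date note, the recode step in the code above tests the day array for 13 and the month array for 32, which looks swapped relative to the assignment text (13 marks a missing month, 32 a missing day). A minimal sketch of the recode with each sentinel matched to its own variable follows; the array names bld_day, bld_mth and bld_date are hypothetical, and building the dates in the same loop is just one possible arrangement.

```sas
/* Sketch only. Data set and variable names come from the post above;    */
/* the array names bld_day, bld_mth and bld_date are hypothetical.       */
data temp_lead_f2022;
  set in.lead_f2022;
  array bld_day  {3} daybld_a daybld_b daybld_c;
  array bld_mth  {3} mthbld_a mthbld_b mthbld_c;
  array bld_date {3} date1_a date2_b date3_c;

  do i = 1 to 3;
    /* day: -1 or 32 means "not recorded", so use the agreed default 15 */
    if bld_day{i} in (-1, 32) then bld_day{i} = 15;
    /* month: -1 or 13 means "not recorded", so use the agreed default 6 */
    if bld_mth{i} in (-1, 13) then bld_mth{i} = 6;
    /* MDY takes month, day, year in that order */
    bld_date{i} = mdy(bld_mth{i}, bld_day{i}, 1990);
  end;

  drop i;
  format date1_a date2_b date3_c mmddyy10.;
run;
```

The IN operator covers both codes for each variable in a single test, so the "either possibility" requirement is handled without a chain of ELSE IF statements.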
I've bolded the statement in my initial response.
Commenting your code will help you find the issues.
I know there is something wrong with how I try to create the max_date variable,
but I am not able to figure it out.
dates{k}= max_date;
What does that statement do?
It should give the date on which the highest blood lead level was collected, but I don't know how to define it properly.
It's backwards. When you find the sample with the maximum value, you are modifying the array element, not max_date, so max_date is never assigned a value.
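The direction of the assignment can be seen in isolation with a toy DATA step. The values below are made up for illustration and are not the assignment data; d1-d3 and k are hypothetical names.

```sas
/* Toy example with made-up values, not the assignment data */
data _null_;
  array dates {3} d1-d3 (100 200 300);
  k = 2;

  /* Reversed form from the thread: it would overwrite dates{2} with   */
  /* max_date, which has never been assigned and is therefore missing. */
  /* dates{k} = max_date; */

  /* Intended form: the variable on the left receives the array element */
  max_date = dates{k};
  put max_date=;  /* writes max_date=200 to the log */
run;
```

In a SAS assignment statement the variable on the left of the equals sign is the one that receives the value, which is why the reversed form leaves max_date uninitialized.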
Oh, I understand now, thanks.