Hi all I am bumping into a coding issue that has me stumped. I have 5 years of data for a children under 5 and I need to find for each year the number that turned specific ages who breastfed for specific time periods in a year. For example, I need to calculate all of the children who turned 1+ weeks old in the year who breastfed for more than one week, children who turned 2+ weeks old in the year who breastfed for more than two weeks, etc.
I have child DOB, date last asked about breastfeeding and whether they ever breastfed. I have sorted out how ages at the beginning and end of the time period and age at time of breastfeeding question but I keep ending up with too many kids in each age group, probably because I am pulling in everyone who was 1+weeks, 2+weeks, etc. old, rather than just those who turned that age during the time period. I am stumped on how to get only the kids that turned that age in the year, especially since kids pass from one age group to the next through out the year. So your 1 week olds become 2 weeks, become 3 weeks....up to 18 months.
Any help would be greatly appreciated! A portion of my code is below
Thanks,
Erica
data BFDuration;
set perm.dataset;
IF AGEMOS<60 AND
AGEMOS NE .;
%LET YR=YEAR;
Startperiod = MDY(1,1,&yr);
Endperiod = MDY(12,31,&yr);
*****Child birthday******;
child_dobmo=
substr(child_dob,1,2);
child_dobdy=
substr(child_dob,3,2);
child_dobyr=
substr(child_dob,5,4);
ChildDOB = MDY(child_dobmo,
child_dobdy, child_dobyr);
*********Last breastfeeding response*****
lstbrstfeddt_mo=
substr(lstbrstfed_dt,1,2);
lstbrstfeddt_dy=
substr(lstbrstfed_dt,3,2);
lstbrstfeddt_yr=
substr(lstbrstfed_dt,5,4);
DOBreastfed =
MDY(lstbrstfeddt_mo ,lstbrstfeddt_dy ,lstbrstfeddt_yr);
*****Age at last breastfeeding response in weeks and months****
BFAGEwk=INTCK('WEEK',CHILDDOB,DOBREASTFED);
BFAGEmo=INTCK('MONTH',CHILDDOB,DOBREASTFED);
****Currently or ever breastfed*****
if cur_brstfed='1' or
ever_brstfed='1' then brstfed='1';
if cur_brstfed='2' and
ever_brstfed='2' then brstfed='2';
if cur_brstfed~='' and
ever_brstfed='' then brstfed=cur_brstfed;
if cur_brstfed='' and
ever_brstfed~='' then brstfed=ever_brstfed;
*****Age at beginning and end of time period*****
if childdob<= MDY(1,1,&yr)
then DO;
AGESTARTwk=intck('WEEK',childdob,Startperiod);
AGESTARTmo=intck('MONTH',childdob,Startperiod);
AGEENDwk=intck('WEEK',childdob,Endperiod);
AGEENDmo=intck('MONTH',childdob,Endperiod)
END;
if childdob> MDY(1,1,&yr)then DO;
AGEENDwk=intck('WEEK',childdob,Endperiod);
AGEENDmo=intck('MONTH',childdob,Endperiod);
END;
IF AGESTARTwk<=1
AND AGESTARTwk NE . AND brstfed=1 THEN DO;
IF BFAGEwk>=1 THEN BF1WK=1;
IF BFAGEwk<=1 THEN BF1WK=0;
END;
IF AGEENDwk>1 AND
AGEENDWK NE . AND brstfed=1 THEN DO;
WEEK1=INTCK('WEEK',CHILDDOB,DOBreastfed);
IF BFAGEwk>=1 THEN BF1WK=1;
IF BFAGEwk<=1 THEN BF1WK=0;
END;
IF brstfed=2 THEN DO;
BF1WK=0;
END;
Generally if it isn't a value to do arithmetic with I leave them as character. Exceptions are sometimes Yes/no codes that I'll make 1/0 with custom informats.
The date variables need to be SAS dates valued or you'll get nowhere with the other code.
Here's part of my code to read a PedNSS submission file
Length
state $ 2 substate $ 1 clinic $ 9 zip $ 5 sod 3 rcdtyp $ 1
dov 8 idchild $ 30 dob 8 sex 3 race $ 6 wicrace $ 3 contrace $ 2
hhsize 3 income 8 migrant $ 1 wic $ 1 stamps $ 1 medicaid $ 1
tanf $ 1 bwteng 8 bwtmtr 8 hgteng 3 hgtmtr 8 wgteng 8 wgtmtr 8
doght 8 hemogl 3 hemato 3 dohemo 8 cbrstfd $ 1 evrbrstfd $ 1
lgthbrst 3 dobrst 8 suppfeed 3 tvview $ 1 hhsmoke $ 1
;
INPUT
state 1-2 substate 3 clinic 4-12 zip 13-17 sod 18
rcdtyp 19 dov mmddyy8. idchild 28-57 @88 dob mmddyy8. sex 96
race 97-102 wicrace 103-105 contrace 106-107 hhsize 108-109
income 110-113 migrant 114 wic 115 stamps 116 medicaid 117
tanf 118 bwteng 119-122 bwtmtr 123-126 hgteng 127-129
hgtmtr 130-133 wgteng 134-137 wgtmtr 138-141 doght ?? mmddyy8.
hemogl 150-152 hemato 153-155 dohemo ?? mmddyy8. cbrstfd 164
evrbrstfd 165 lgthbrst 166-167 dobrst ?? mmddyy8. suppfeed 176-177
tvview 178 hhsmoke 179 ;
/* currently the documented data stops at column 179. Columns 180-209 */
/* are reserved for future CDC use. Columns 210+ are for state added */
/* data. We do have some thing showing up in columns 210 and 211 but */
/* it is not documented. */
LABEL
state = "State"
substate = "Substate"
clinic = "Clinic/School"
zip = "Zip Code"
sod = "Source of Data"
rcdtyp = "Record Type"
dov = "Date of Visit"
idchild = "Child’s Alphanumeric ID"
dob = "Date of Birth"
sex = "Sex"
race = "Race/Ethnicity (6 level)"
hisp = "Ethnicity (1=Hispanic,0=Non-hispanic)"
wicrace = "WIC Race/Ethnicity"
contrace = "Contributor-Specific Race/Ethnicity"
hhsize = "Household Size"
income = "Monthly Household Income"
migrant = "Migrant Status"
wic = "WIC"
stamps = "Food Stamps"
medicaid = "Medicaid"
tanf = "TANF"
bwteng = "Birthweight English (LBOz)"
bwtmtr = "Birthweight Metric (gm)"
hgteng = "Height English (InIn 1/8ths)"
hgtmtr = "Height Metric (mm)"
wgteng = "Weight English (LB fourths)"
wgtmtr = "Weight Metric (kg implied decimal)"
doght = "Date of Height/Weight Measure"
hemogl = "Hemoglobin (g/Dl impled decimal)"
hemato = "Hematocrit (% implied tenth)"
dohemo = "Date of Hemoglobin/Hematocrit Measure"
cbrstfd = "Currently Breastfed"
evrbrstfd = "Ever Breastfed"
lgthbrst = "Length of Time Breastfed (if ever and stopped) weeks 99=none/unk"
dobrst = "Date of Most Recent Breastfeeding"
suppfeed = "Introduction to Supplementary Feeding, age in weeks"
tvview = "TV Viewing"
hhsmoke = "Household Smoking"
distr = "WIC District"
;
format dov dob doght dohemo dobrst mmddyy10.;
### NOTE TO CASUAL READERS: The code below was developed as part of my work and vastly exceeds the amount of effort I would put into a routine response for help on this forum. I just happen to work with similar data and recognized the project. If you are interested you might search the CDC.GOV website for PedNSS and PNSS for details of some of the scope involved.
I suspect you are trying to duplicate the code behavior from PedNSS (if that looks familiar).
I'm going to paste some of my code that uses some similar variables. I think the names are sufficiently close you can modify the code below.
this label block describes the variables, may look somewhat familiar
label
dobrst = "Date of breastfeeding information"
x_bfcnt = "Ever BF -numerator"
x_bftot = "Ever BF -denominator"
x_bf1wcnt = "BF 1+ Weeks -numerator"
x_bf1wktot = "BF 1+ Weeks -denominator"
x_bf2wcnt = "BF 2+ Weeks -numerator"
x_bf2wktot = "BF 2+ Weeks -denominator"
x_bf3wcnt = "BF 3+ Weeks -numerator"
x_bf3wktot = "BF 3+ Weeks -denominator"
x_bf4wcnt = "BF 4+ Weeks -numerator"
x_bf4wktot = "BF 4+ Weeks -denominator"
x_bf6wcnt = "BF 6+ Weeks -numerator"
x_bf6wktot = "BF 6+ Weeks -denominator"
x_bf2mcnt = "BF 2+ Months -numerator"
x_bf2mtot = "BF 2+ Months -denominator"
x_bf3mcnt = "BF 3+ Months -numerator"
x_bf3mtot = "BF 3+ Months -denominator"
x_bf4mcnt = "BF 4+ Months -numerator"
x_bf4mtot = "BF 4+ Months -denominator"
x_bf5mcnt = "BF 5+ Months -numerator"
x_bf5mtot = "BF 5+ Months -denominator"
x_bf6mcnt = "BF 6+ Months -numerator"
x_bf6mtot = "BF 6+ Months -denominator"
x_bf9mcnt = "BF 9+ Months -numerator"
x_bf9mtot = "BF 9+ Months -denominator"
x_bf12mcnt = "BF 12+ Months -numerator"
x_bf12mtot = "BF 12+ Months -denominator"
x_bf18mcnt = "BF 18+ Months -numerator"
x_bf18mtot = "BF 18+ Months -denominator"
x_bfe3mcnt = "Exclusively BF 3+ Months -numerator"
x_bfe3mtot = "Exclusively BF 3+ Months -denominator"
x_bfe6mcnt = "Exclusively BF 6+ Months -numerator"
x_bfe6mtot = "Exclusively BF 6+ Months -denominator"
dov ="Date of Visit, SAS date value"
dob ="Date of Birth, SAS date value"
;
/*I create a variable that combines the current and ever breast fed variables so I simplify some comparisons*/
/* there may be some "error" adjustments that were peculiar to my data source, if you don't have the issues
I did the code should be dormant*/
length tstrng $ 2;
tstrng = cats(cbrstfd,evrbrstfd);
if tstrng = '11' then do;
if 0 le lgthbrst le 96 then cbrstfd='2';
else evrbrstfd='9';
end;
/* errors in supplemental feed, often larger than period breast feed
Also value of 77 should ONLY be when cbrstfd=1
*/
/* if tstrng in ( '22') and lgthbrst ne 99 then do;*/
/* cbrstfd='9'; */
/* evrbrstfd='1';*/
/* lgthbrst=99;*/
/* tstrng = cats(cbrstfd,evrbrstfd);*/
/* end;*/
if tstrng in ( '22') then do;
lgthbrst = 99 ;
suppfeed = 99;
end;
tstrng = cats(cbrstfd,evrbrstfd);*reset the tststrng after correction;
x_bftot = (year(dob)=year(dov) and tstrng in('19','91','21','22'));
if x_bftot = 1 then x_bfcnt = ( cbrstfd='1' or evrbrstfd='1');
array day [15] _temporary_ (7 14 21 28 42 61 92 122 153 183 274 366 548 92 183);
array bftot x_bf1wktot x_bf2wktot x_bf3wktot x_bf4wktot x_bf6wktot
x_bf2mtot x_bf3mtot x_bf4mtot x_bf5mtot x_bf6mtot
x_bf9mtot x_bf12mtot x_bf18mtot x_bfe3mtot x_bfe6mtot
;
array bfcnt x_bf1wcnt x_bf2wcnt x_bf3wcnt x_bf4wcnt x_bf6wcnt
x_bf2mcnt x_bf3mcnt x_bf4mcnt x_bf5mcnt x_bf6mcnt
x_bf9mcnt x_bf12mcnt x_bf18mcnt x_bfe3mcnt x_bfe6mcnt
;
days = dov-dob; *a temporary variable used in some comparisons;
do _i_= 1 to 15;
/* start denominator assignment, default to NO/ 0 */
bftot[_i_]=0 ;
if year(DOB + day[_i_]) = year(dov) then do;
/* tstdate is a temporary variable that contains value based on the parameter
in the DAY array added to DOB to test if the result is within the time frame
of valid data for the year (date of visit)
*/
tstdate = DOB + day[_i_];
format tstdate mmddyy10.;
/* if birth date plus increment is later than date of visit then not valid to consider*/
if tstdate > DOV then bftot[_i_]=0;
/* if age at visit more than 24 months (732 days)
then over date range of interest*/
else if dov > (dob+732) then bftot[_i_]=0;
/* set valid denominator only when a valid breastfeeding info may be present*/
else if tstrng in ('99','91') then bftot[_i_]=0;
else bftot[_i_] = 1;
/* the exclusive BF have an different tests to pass*/
if _i_ in (14,15) then do;
bftot[_i_]=0;
if tstrng in('21','19') and days ge day[_i_] then bftot[_i_]=1;
end;
end;
else bftot[_i_] = 0;
/* end denominator assignment and start numerator*/
/* ONLY ASSIGN NUMERATOR WHERE THE DENOMINATOR IS 1!!! Likely we have
another hidden issue about weeks here and original data collection
*/
ltdays = (lgthbrst < 99)*lgthbrst*7;
IF bftot[_i_]=1 then do;
bfcnt[_i_]=0;
/* If 1) currently breast feeding and time is long enough then counts
OR if ever bf but stopped and weeks BF info is long enough then assign
*/
wk2days = (lgthbrst < 99)*lgthbrst*7;
/* here's the special week to days adjustment*/
select ( lgthbrst );
when (13) wk2days= 92;
when (17) wk2days=122;
when (21) wk2days=153;
when (26) wk2days=183;
when (39) wk2days=274;
when (52) wk2days=366;
when (78) wk2days=548;
otherwise; *do nothing the value previous calculated should be sufficient;
end;
if cbrstfd='1' or (evrbrstfd='1' and (wk2days ge day[_i_]))
then bfcnt[_i_]=1;
if _i_ in (14,15) then do;
/* if supplement foods were introduced before the minimum date of the exclusive feeding period
then NOT exclusive. Note on Intro Supp Feeding coding: 31 = 31 or more weeks
77 = currently/only breastfed, 99 missing or not applicable (supposedly also
never breast fed BUT also seems to be automatically assigned if age is 24 months
or older. Since 31 weeks is greater than 6 months, which is the max value we
worry about then the calc should be okay
*/
supdays= (suppfeed< 99)*(suppfeed*7);
/* similar adjustment for days before suplementary feeding*/
if suppfeed=13 then supdays= 92;
else if suppfeed=26 then supdays=183;
if bfcnt[_i_]=1 and (supdays lt day[_i_]) then bfcnt[_i_]=0;
end; *exculsive numerator;
end;*numerator assignment;
end;
/* I'm investigating using the number of defined numerators as the selection criteria for keeping records*/
NumeratorCount = N(of bfcnt
/* remove the temporary variables, uncomment for debugging*/
/* drop days tstrng tstdate ltdays wk2days supdays*/
/* ;*/
Hi BallardW,
You got it in one! It is PedNSS and, looking at your code, I probably would not have figured this out on my own....or at least not any time soon, so thank you. Just for my own understanding, what you are doing is adding days in your array to confirm that someone is in the denominator and, if they are, classifying them as breastfed or not breastfed? And once they stop being breastfed they drop out of the numerator, into the denominator and continue along until the age out (in my case at age 5)?
Thank you again!
Erica
Yes, that's what I'm doing. And it did take more than a couple hours to work this out. I spent some time going through the code repository that CDC had developed for PedNSS processing and then tested my submission files against copies of CDC supplied "master" single record files. I think I had a few differences from my raw submissions but they were due to record exclusions.
In our case there isn't any actual breastfeeding information is records for children seen at some age around 24 months. So the extracts had all 9, not reported data, at that point.
This code does calculate for each visit so a further filter may be needed to "final" status for a period. With 1/0 coding that's easy with Proc Summary looking for the MAX value of the numerator and denominator values.
If I need to know the BF history of a child at age 5 I merge in data from the time with the
What I am running into is that there is no "Date of Visit" variable in this extract, or at least no date that is intuitive. Unfortunately our WIC program does not have a good handle on the data. All they know is that what they gave me is what they gave to CDC and they want me to make the same table. We do have initial visit date (which may actually be date of visit) and I suspect that when the quarterly files were merged only one visit was kept which is making this analysis more painful than necessary.
Do you think appending the quarters for the 5 years (under the assumption that the kids have at least one record for each year) and then transposing "Initial visit" into Initial Visit1--Initial VisitN , then using the last visit that is within the year of reporting as a DOV in your code might work? As my extract stands, that is the only date that is not tied to a specific biometric measure (e.g. breastfeeding, height, weight)
If the file was created in the same format as a PedNSS submission file then the date should be in columns 20-27 and read it with an MMDDYY8. format. Also if this is PedNSS data there should be a column for date of most recent breastfeeding response which should substitute for DOV if different than the DOV (not often in my data).
Initial Visit Date sounds like a PNSS data extract with mothers information which would not have sufficient information to create the time interval breast feeding variables such as introduction of supplementary feeding and length of time breastfed and should only be one record per report year.
OK, it looks like what my import statement (some legacy code from an epi from long, long ago) calls intvistdt is what you call DOV since it is positions 20-27 (see below). Why on earth everything was done as characters I have no clue, but I am going to fix it when I am past the "DO THIS NOW!" crunch,
Thank you so much for your help because I was seriously flying blind!
data pednss_2_2012;
infile 'G:\.....\XAZ_PedNSS_2012_Q02_20120925.dat' LRECL=400;
input substate $3. clinic $9. zipcode $5. source_data $1. record_type $1. intvistdt $8. infantid $60. child_dob $8. gender $1. race $6. wic_race $3. contr_race $2.
hholdsize $2. hholdmoincome $4. migrant $1. wic $1. foodstp $1. medicaid $1. tanf $1. bwtenglish $4. bwtmetric $4. htinch $3. htmetric $4. wtenglish $4. wtmetric $4.
htwtmeas_dt $8. hgbin $3. hmct $3. hgbhmctmeas_dt $8. cur_brstfed $1. ever_brstfed $1. brstfed_length $2. lstbrstfed_dt $8. intro_suppl $2. tv $1. hshld_smokn $1.;
run;
Generally if it isn't a value to do arithmetic with I leave them as character. Exceptions are sometimes Yes/no codes that I'll make 1/0 with custom informats.
The date variables need to be SAS dates valued or you'll get nowhere with the other code.
Here's part of my code to read a PedNSS submission file
Length
state $ 2 substate $ 1 clinic $ 9 zip $ 5 sod 3 rcdtyp $ 1
dov 8 idchild $ 30 dob 8 sex 3 race $ 6 wicrace $ 3 contrace $ 2
hhsize 3 income 8 migrant $ 1 wic $ 1 stamps $ 1 medicaid $ 1
tanf $ 1 bwteng 8 bwtmtr 8 hgteng 3 hgtmtr 8 wgteng 8 wgtmtr 8
doght 8 hemogl 3 hemato 3 dohemo 8 cbrstfd $ 1 evrbrstfd $ 1
lgthbrst 3 dobrst 8 suppfeed 3 tvview $ 1 hhsmoke $ 1
;
INPUT
state 1-2 substate 3 clinic 4-12 zip 13-17 sod 18
rcdtyp 19 dov mmddyy8. idchild 28-57 @88 dob mmddyy8. sex 96
race 97-102 wicrace 103-105 contrace 106-107 hhsize 108-109
income 110-113 migrant 114 wic 115 stamps 116 medicaid 117
tanf 118 bwteng 119-122 bwtmtr 123-126 hgteng 127-129
hgtmtr 130-133 wgteng 134-137 wgtmtr 138-141 doght ?? mmddyy8.
hemogl 150-152 hemato 153-155 dohemo ?? mmddyy8. cbrstfd 164
evrbrstfd 165 lgthbrst 166-167 dobrst ?? mmddyy8. suppfeed 176-177
tvview 178 hhsmoke 179 ;
/* currently the documented data stops at column 179. Columns 180-209 */
/* are reserved for future CDC use. Columns 210+ are for state added */
/* data. We do have some thing showing up in columns 210 and 211 but */
/* it is not documented. */
LABEL
state = "State"
substate = "Substate"
clinic = "Clinic/School"
zip = "Zip Code"
sod = "Source of Data"
rcdtyp = "Record Type"
dov = "Date of Visit"
idchild = "Child’s Alphanumeric ID"
dob = "Date of Birth"
sex = "Sex"
race = "Race/Ethnicity (6 level)"
hisp = "Ethnicity (1=Hispanic,0=Non-hispanic)"
wicrace = "WIC Race/Ethnicity"
contrace = "Contributor-Specific Race/Ethnicity"
hhsize = "Household Size"
income = "Monthly Household Income"
migrant = "Migrant Status"
wic = "WIC"
stamps = "Food Stamps"
medicaid = "Medicaid"
tanf = "TANF"
bwteng = "Birthweight English (LBOz)"
bwtmtr = "Birthweight Metric (gm)"
hgteng = "Height English (InIn 1/8ths)"
hgtmtr = "Height Metric (mm)"
wgteng = "Weight English (LB fourths)"
wgtmtr = "Weight Metric (kg implied decimal)"
doght = "Date of Height/Weight Measure"
hemogl = "Hemoglobin (g/Dl impled decimal)"
hemato = "Hematocrit (% implied tenth)"
dohemo = "Date of Hemoglobin/Hematocrit Measure"
cbrstfd = "Currently Breastfed"
evrbrstfd = "Ever Breastfed"
lgthbrst = "Length of Time Breastfed (if ever and stopped) weeks 99=none/unk"
dobrst = "Date of Most Recent Breastfeeding"
suppfeed = "Introduction to Supplementary Feeding, age in weeks"
tvview = "TV Viewing"
hhsmoke = "Household Smoking"
distr = "WIC District"
;
format dov dob doght dohemo dobrst mmddyy10.;
Thank you again! I have been substringing the characters into separate mm, dd, and yy variables and then turning them into SAS dates like this:
lstbrstfeddt_mo= substr(lstbrstfed_dt,1,2);
lstbrstfeddt_dy= substr(lstbrstfed_dt,3,2);
lstbrstfeddt_yr= substr(lstbrstfed_dt,5,4);
dobrstfd = MDY(lstbrstfeddt_mo ,lstbrstfeddt_dy ,lstbrstfeddt_yr);
which is a pretty silly, high effort way to do it, now that I think about it. I was being lazy about changing the import code and created more work for myself, I do believe. It made sense at the time, I suppose, but that can be said of most bad decisions
If that's the worst extra work you ever cause yourself then be happy.
When I inherited some of this I reduced some of the code from 8 or 9 datasets and about 800 lines of code to a single data step with roughly 200 lines, and many of those lines were labels to document what the variables were supposed to be. And code exactly like you posted was part of it.
Learn Proc Format, invalue and cntlin and cntlout options and there are quite a few tasks that get much easier for reading data.
Hi Ballard,
Thank you again for all of your help. I was able to use your code to get withing 0.X% of CDC's numbers and get the totals within +/- 5-10 people. The tweak to my data was that current breasteeding and ever breastfed did not have missing coded as 9, they were just blank, so I recoded the missing. Worked like a charm!
Thanks for everything,
Erica
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.