BookmarkSubscribeRSS Feed
PFM151632
Obsidian | Level 7

i need to merge data files that contains dietary interview- individual foods data (4 files DR1IFF_G, DR1IFF_H, DR2IFF_G and DR2IFF_H converted into sas files) for which SEQN is not a unique identifier. NHANES indicates in its tutorial that analysts would need to transform the detailed food-level file into a person-level file (with one record for each person) before merging the file with other files such as demographic and other data files by using SEQN as the unique identifier.

 

How do you transform the detailed food level file into a person-level file? 

 

Thank you.

 

7 REPLIES 7
ballardw
Super User

@PFM151632 wrote:

i need to merge data files that contains dietary interview- individual foods data (4 files DR1IFF_G, DR1IFF_H, DR2IFF_G and DR2IFF_H converted into sas files) for which SEQN is not a unique identifier. NHANES indicates in its tutorial that analysts would need to transform the detailed food-level file into a person-level file (with one record for each person) before merging the file with other files such as demographic and other data files by using SEQN as the unique identifier.

 

How do you transform the detailed food level file into a person-level file? 

 

Thank you.

 


Transformed to "person-level" basically means that "something" whether it is one or more variables or other information is placed into new variables that hold the "repeated" type of information.

The exact approaches depend on the data.

 

Here is a simple example where a made up "measure" variable name becomes the names of the new variables holding a value when reduced to a "person-level":

data have;
   input id $ measurename $ value;
datalines;
abc  height 123
abc  weight  55
abc  shoesize 8
pdq  height 105
pdq  weight  44
pdq  shoesize 9
;
proc sort data=have;
   by id;
run;
proc transpose  data=have
   out=tranposed;
   by id;
   var value;
   id measurename;
run;

Proc Transpose is only one tool to do this but is designed for making long data sets wide or wide ones long based on the values. There are caveats though. When transposing the Var variables, if more then one is involved, have to be of the same type OR you use two passes and then merge the results on the identification variables values.

 

Above is a "nice" set where we have a variable, measurename, that can be used to create the names of the new variables and make sense. Sometimes that is not the case and you have to provide more coding to get a specific desired structure.

 

You would have to show specific examples of how the data set(s) look and make some decisions about how the result when merged together should be.

 

I know just enough to know that NHANES data sets can have a lot of variables. So depending on which variables you need for your analysis you may be spending lots of time getting the new variables you need.

Before merging the results from your 4 food data sets you will need to ensure that the common named variables, should be your identification variable(s) are of the same type and length in all the sets AND the demographic data you intend to add. Otherwise if the types are not the same you can't merge the data and if the variables are of different lengths there is a chance that you can lose data because of truncated values.

 

This a is pretty generic response to a fairly generic question. Examples of the data sets to work with would be needed to provide targeted responses.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Since your data involves merging data you should likely subset the tables to at most 5 individuals to avoid creating bigger than needed data sets. And only include the variables you need unless you need every variable.

 

An example of what you expect the final result to be would ideally be provided.

 

PFM151632
Obsidian | Level 7

@ballardw thank you for your swift response.

I am a new user so it is still very difficult for me to answer your questions

I want to merge the diet datasets with demographics to be able to categorize my population into healthy diet and unhealthy diet and from there also conduct analysis on the impact of covariables. what I expect is that after merging demographics and diet files I could run the below macro for this population categorization.

Which variables I should select for it? shouldn't it be those in the macro? 

 

%macro HEI2015 (indat=,kcal=,vtotalleg=,vdrkgrleg=,f_total=,fwholefrt=,g_whole=,d_total=,

  pfallprotleg=,pfseaplantleg=,monopoly=,satfat=,sodium=,g_refined=,add_sugars=,outdat=);

 

data &outdat (drop=FARMIN FARMAX SODMAX SODMIN RGMIN RGMAX SFATMIN SFATMAX ADDSUGMIN ADDSUGMAX);

  set &indat;

 

  IF &kcal > 0 then VEGDEN=&vtotalleg/(&kcal/1000);

  HEI2015C1_TOTALVEG=5*(VEGDEN/1.1);

  IF HEI2015C1_TOTALVEG > 5 THEN HEI2015C1_TOTALVEG=5;

  IF VEGDEN=0 THEN HEI2015C1_TOTALVEG=0;

 

  IF &kcal > 0 then GRBNDEN=&vdrkgrleg/(&kcal/1000);

  HEI2015C2_GREEN_AND_BEAN=5*(GRBNDEN/0.2);

  IF HEI2015C2_GREEN_AND_BEAN > 5 THEN HEI2015C2_GREEN_AND_BEAN=5;

  IF GRBNDEN=0 THEN HEI2015C2_GREEN_AND_BEAN=0;

 

  IF &kcal > 0 then FRTDEN=&f_total/(&kcal/1000);

  HEI2015C3_TOTALFRUIT=5*(FRTDEN/0.8);

  IF HEI2015C3_TOTALFRUIT > 5 THEN HEI2015C3_TOTALFRUIT=5;

  IF FRTDEN=0 THEN HEI2015C3_TOTALFRUIT=0;     

 

  IF &kcal > 0 then WHFRDEN=&fwholefrt/(&kcal/1000);

  HEI2015C4_WHOLEFRUIT=5*(WHFRDEN/0.4);

  IF HEI2015C4_WHOLEFRUIT > 5 THEN HEI2015C4_WHOLEFRUIT=5;

  IF WHFRDEN=0 THEN HEI2015C4_WHOLEFRUIT=0;    

 

  IF &kcal > 0 then WGRNDEN=&g_whole/(&kcal/1000);

  HEI2015C5_WHOLEGRAIN=10*(WGRNDEN/1.5);

  IF HEI2015C5_WHOLEGRAIN > 10 THEN HEI2015C5_WHOLEGRAIN=10;

  IF WGRNDEN=0 THEN HEI2015C5_WHOLEGRAIN=0;

 

  IF &kcal > 0 then DAIRYDEN=&d_total/(&kcal/1000);

  HEI2015C6_TOTALDAIRY=10*(DAIRYDEN/1.3);

  IF HEI2015C6_TOTALDAIRY > 10 THEN HEI2015C6_TOTALDAIRY=10;

  IF DAIRYDEN=0 THEN HEI2015C6_TOTALDAIRY=0;

 

  IF &kcal > 0 then PROTDEN=&pfallprotleg/(&kcal/1000);

  HEI2015C7_TOTPROT=5*(PROTDEN/2.5);

  IF HEI2015C7_TOTPROT > 5 THEN HEI2015C7_TOTPROT=5;

  IF PROTDEN=0 THEN HEI2015C7_TOTPROT=0;

 

  IF &kcal > 0 then SEAPLDEN=&pfseaplantleg/(&kcal/1000);

  HEI2015C8_SEAPLANT_PROT=5*(SEAPLDEN/0.8);

  IF HEI2015C8_SEAPLANT_PROT > 5 THEN HEI2015C8_SEAPLANT_PROT=5;

  IF SEAPLDEN=0 THEN HEI2015C8_SEAPLANT_PROT=0;

 

  IF &satfat > 0 THEN FARATIO=&monopoly/&satfat;

  FARMIN=1.2;

  FARMAX=2.5;

  if &satfat=0 and &monopoly=0 then HEI2015C9_FATTYACID=0;

    else if &satfat=0 and &monopoly > 0 then HEI2015C9_FATTYACID=10;

    else if FARATIO >= FARMAX THEN HEI2015C9_FATTYACID=10;

    else if FARATIO <= FARMIN THEN HEI2015C9_FATTYACID=0;

    else HEI2015C9_FATTYACID=10* ( (FARATIO-FARMIN) / (FARMAX-FARMIN) );

 

  IF &kcal > 0 then SODDEN=&sodium/&kcal;

  SODMIN=1.1;

  SODMAX=2.0;

  IF SODDEN <= SODMIN THEN HEI2015C10_SODIUM=10;

    ELSE IF SODDEN >= SODMAX THEN HEI2015C10_SODIUM=0;

    ELSE HEI2015C10_SODIUM=10 - (10 * (SODDEN-SODMIN) / (SODMAX-SODMIN) );

 

  IF &kcal > 0 then RGDEN=&g_refined/(&kcal/1000);

  RGMIN=1.8;

  RGMAX=4.3;

  IF RGDEN <= RGMIN THEN HEI2015C11_REFINEDGRAIN=10;

    ELSE IF RGDEN >= RGMAX THEN HEI2015C11_REFINEDGRAIN=0;

    ELSE HEI2015C11_REFINEDGRAIN=10 - ( 10* (RGDEN-RGMIN) / (RGMAX-RGMIN) );

 

  IF &kcal > 0 then SFAT_PERC=100*(&satfat*9/&kcal);

  SFATMIN=8;

  SFATMAX=16;

  IF SFAT_PERC >= SFATMAX THEN HEI2015C12_SFAT=0;

    ELSE IF SFAT_PERC <= SFATMIN THEN HEI2015C12_SFAT=10;

    ELSE HEI2015C12_SFAT= 10 - ( 10* (SFAT_PERC-SFATMIN) / (SFATMAX-SFATMIN) );

 

  IF &kcal > 0 then ADDSUG_PERC=100*(&add_sugars*16/&kcal);

  ADDSUGMIN=6.5;

  ADDSUGMAX=26;

  IF ADDSUG_PERC >= ADDSUGMAX THEN HEI2015C13_ADDSUG=0;

    ELSE IF ADDSUG_PERC <= ADDSUGMIN THEN HEI2015C13_ADDSUG=10;

    ELSE HEI2015C13_ADDSUG= 10 - ( 10* (ADDSUG_PERC-ADDSUGMIN) / (ADDSUGMAX-ADDSUGMIN) );

 

 

IF &kcal=0 THEN DO;

  HEI2015C1_TOTALVEG=0; HEI2015C2_GREEN_AND_BEAN=0; HEI2015C3_TOTALFRUIT=0; HEI2015C4_WHOLEFRUIT=0; HEI2015C5_WHOLEGRAIN=0; HEI2015C6_TOTALDAIRY=0;

  HEI2015C7_TOTPROT=0;  HEI2015C8_SEAPLANT_PROT=0; HEI2015C9_FATTYACID=0; HEI2015C10_SODIUM=0; HEI2015C11_REFINEDGRAIN=0; HEI2015C12_SFAT=0; HEI2015C13_ADDSUG=0;

  END;

 

/**Calculate HEI-2015 total score**/

/*total HEI-2015 score is the sum of 13 HEI component scores*/

 

HEI2015_TOTAL_SCORE = HEI2015C1_TOTALVEG + HEI2015C2_GREEN_AND_BEAN + HEI2015C3_TOTALFRUIT + HEI2015C4_WHOLEFRUIT + HEI2015C5_WHOLEGRAIN + HEI2015C6_TOTALDAIRY +

  HEI2015C7_TOTPROT + HEI2015C8_SEAPLANT_PROT + HEI2015C9_FATTYACID + HEI2015C10_SODIUM + HEI2015C11_REFINEDGRAIN + HEI2015C12_SFAT + HEI2015C13_ADDSUG;

 

 

LABEL HEI2015_TOTAL_SCORE='TOTAL HEI-2015 SCORE'

      HEI2015C1_TOTALVEG='HEI-2015 COMPONENT 1 TOTAL VEGETABLES'

      HEI2015C2_GREEN_AND_BEAN='HEI-2015 COMPONENT 2 GREENS AND BEANS'

      HEI2015C3_TOTALFRUIT='HEI-2015 COMPONENT 3 TOTAL FRUIT'

      HEI2015C4_WHOLEFRUIT='HEI-2015 COMPONENT 4 WHOLE FRUIT'

      HEI2015C5_WHOLEGRAIN='HEI-2015 COMPONENT 5 WHOLE GRAINS'

      HEI2015C6_TOTALDAIRY='HEI-2015 COMPONENT 6 DAIRY'

      HEI2015C7_TOTPROT='HEI-2015 COMPONENT 7 TOTAL PROTEIN FOODS'

 

      HEI2015C8_SEAPLANT_PROT='HEI-2015 COMPONENT 8 SEAFOOD AND PLANT PROTEIN'

      HEI2015C9_FATTYACID='HEI-2015 COMPONENT 9 FATTY ACID RATIO'

      HEI2015C10_SODIUM='HEI-2015 COMPONENT 10 SODIUM'

      HEI2015C11_REFINEDGRAIN='HEI-2015 COMPONENT 11 REFINED GRAINS'

      HEI2015C12_SFAT='HEI-2015 COMPONENT 12 SAT FAT'

      HEI2015C13_ADDSUG='HEI-2015 COMPONENT 13 ADDED SUGAR'

      VEGDEN='DENSITY OF TOTAL VEGETABLES PER 1000 KCAL'

      GRBNDEN='DENSITY OF DARK GREEN VEG AND BEANS PER 1000 KCAL'

      FRTDEN='DENSITY OF TOTAL FRUIT PER 1000 KCAL'

      WHFRDEN='DENSITY OF WHOLE FRUIT PER 1000 KCAL'

      WGRNDEN='DENSITY OF WHOLE GRAIN PER 1000 KCAL'

      DAIRYDEN='DENSITY OF DAIRY PER 1000 KCAL'

      PROTDEN='DENSITY OF TOTAL PROTEIN PER 1000 KCAL'

      SEAPLDEN='DENSITY OF SEAFOOD AND PLANT PROTEIN PER 1000 KCAL'

      FARATIO='FATTY ACID RATIO'

      SODDEN='DENSITY OF SODIUM PER 1000 KCAL'

      RGDEN='DENSITY OF REFINED GRAINS PER 1000 KCAL'

      SFAT_PERC='PERCENT OF CALORIES FROM SAT FAT'

      ADDSUG_PERC='PERCENT OF CALORIES FROM ADDED SUGAR'

      ;

 

run;

 

 

%mend HEI2015;

 

 

 

 

/*  END OF THE HEI2015 MACRO                                       */

/*******************************************************************/

 

 

 

 

 

ballardw
Super User

The macro only tells me the names of some of the variables you need to include/ create.

Consider this snippet of that code:

  IF &kcal > 0 then VEGDEN=&vtotalleg/(&kcal/1000);
  HEI2015C1_TOTALVEG=5*(VEGDEN/1.1);
  IF HEI2015C1_TOTALVEG > 5 THEN HEI2015C1_TOTALVEG=5;
  IF VEGDEN=0 THEN HEI2015C1_TOTALVEG=0;

This means that your data set must include a varaible named VEGDEN, which should be numeric.

You are expected to provide single variable names that must be in the data referenced by the macro parameters &kcal and &vtotalleg. Whoever provided that macro code should provide details of those macro variables. Each one appears to be required for the macro call but could represent different variables.

 

Again, your choice of variables otherwise. But without starting data sets there is no way for me to know the next steps.

 

Since this smells of CDC written code the macro appears to only be expected to work for the 2015 data as well. So is that what you are using? Other years may not have the required variables. I would suggest hunting around on the NHANES web site to see if one of the SAS code files actually has the code to make a person-level file. Just don't expect the name to be obvious.

 

Basically every variable that appears on the right of an = sign that is not created in the code must be in your data.

Any variable used in an "IF" that is not created in the data step also needs to be in your data set.

PFM151632
Obsidian | Level 7

I found it - Resolved 

mkeintz
PROC Star

@PFM151632 wrote:

I found it - resolved


Then mark your topic as solved.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PFM151632
Obsidian | Level 7

@mkeintz I do not find the way to report it as solved

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1037 views
  • 2 likes
  • 3 in conversation