Re: How to deal with the dataset with affiliated subjects or participa...

CynthiaWei · Posted 10-11-2019 07:53 PM

Hi SAS Experts,

I am having a project to mange a dataset with affiliated subjects, which means there is a lady who is a mother from one family (subject=1) and her daughter (subject=1.1) and her son (subject=1.2) etc.

I need to create some variables describing every mother-kid situation, such as whether they are all female, whether they all have fiber everyday, and what is the absolute difference of age.

The dataset I have is this:

family	ID	female	Has fiber every day	age	Mother	Kids
4	4	1	1	25	1	0
4	4.1	0	1	29	0	1
4	4.2	1	0	34	0	1
4	4.3	1	1	45	0	1
5	5	1	1	42	1	0
5	5.1	1	1	53	0	1
6	6	1	0	23	1	0
6	6.1	0	0	29	0	1
6	6.2	0	1	36	0	1
7	7	1	1	48	1	0
7	7.1	1	0	23	0	1
8	8	1	0	53	1	0
8	8.1	0	0	54	0	1
8	8.2	1	1	37	0	1
8	8.3	1	1	41	0	1
8	8.4	0	1	26	0	1

I need to calculate: (I highlight the want part in yellow)

1. Both the mother and her daughter or son have fiber everyday: yes or no

2. Both they are female, which mean both mother and her daughter participated in the study: yes or no

3. Count of both have fiber by mother: for example family 4 has 2

4. Count of both are female by mother: for example family 8 has 2

5. Total count of both have fiber in the study: there are 3 mother-kid cases having fiber everyday

6. Total count of both female in the study: it is 6

7. Absolute difference of age between the mother and her daughter or son

8. Number of kids participated in the study by mother

9. Mean of absolute difference of age across all mother-kid cases

The calculation shouldn't be very difficult, but the display of the current dataset stop me from doing those calculation. Can SAS do some calculation by family number?

This is very challenging for me to use SAS.

Thank you very much for any help!

with regards,

hashman · Posted 10-12-2019 12:23 AM

@CynthiaWei:

"This is very challenging for me to use SAS"

It's not at all as difficult as you may think - it's nothing but an acquired skill ;).

data have ;                                                                                                                                                                                                                                                     
  input family id female hfed age mother kids ;                                                                                                                                                                                                                 
  cards ;                                                                                                                                                                                                                                                       
4  4    1  1  25  1  0                                                                                                                                                                                                                                          
4  4.1  0  1  29  0  1                                                                                                                                                                                                                                          
4  4.2  1  0  34  0  1                                                                                                                                                                                                                                          
4  4.3  1  1  45  0  1                                                                                                                                                                                                                                          
5  5    1  1  42  1  0                                                                                                                                                                                                                                          
5  5.1  1  1  53  0  1                                                                                                                                                                                                                                          
6  6    1  0  23  1  0                                                                                                                                                                                                                                          
6  6.1  0  0  29  0  1                                                                                                                                                                                                                                          
6  6.2  0  1  36  0  1                                                                                                                                                                                                                                          
7  7    1  1  48  1  0                                                                                                                                                                                                                                          
7  7.1  1  0  23  0  1                                                                                                                                                                                                                                          
8  8    1  0  53  1  0                                                                                                                                                                                                                                          
8  8.1  0  0  54  0  1                                                                                                                                                                                                                                          
8  8.2  1  1  37  0  1                                                                                                                                                                                                                                          
8  8.3  1  1  41  0  1                                                                                                                                                                                                                                          
8  8.4  0  1  26  0  1                                                                                                                                                                                                                                          
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data detail (drop = cnt: tot: mean: _:)                                                                                                                                                                                                                         
     group  (keep = family cnt:)                                                                                                                                                                                                                                
     total  (keep = family tot: mean:)                                                                                                                                                                                                                          
  ;                                                                                                                                                                                                                                                             
  do _n_ = 1 by 1 until (last.family) ;                                                                                                                                                                                                                         
    set have end = z ;                                                                                                                                                                                                                                          
    by family ;                                                                                                                                                                                                                                                 
    if _n_ = 1 then do ;                                                                                                                                                                                                                                        
      _m_hfed = hfed ;                                                                                                                                                                                                                                          
      _m_age  = age ;                                                                                                                                                                                                                                           
    end ;                                                                                                                                                                                                                                                       
    else do ;                                                                                                                                                                                                                                                   
      both_hfed     = _m_hfed and hfed ;                                                                                                                                                                                                                        
      both_fem      = female ;                                                                                                                                                                                                                                  
      abs_age_diff  = abs (age - _m_age) ;                                                                                                                                                                                                                      
      cnt_both_hfed + both_hfed ;                                                                                                                                                                                                                               
      cnt_both_fem  + both_fem ;                                                                                                                                                                                                                                
      cnt_kids      + kids ;                                                                                                                                                                                                                                    
      _aads         + abs_age_diff ;                                                                                                                                                                                                                            
      _n_aads       + 1 ;                                                                                                                                                                                                                                       
    end ;                                                                                                                                                                                                                                                       
    output detail ;                                                                                                                                                                                                                                             
  end ;                                                                                                                                                                                                                                                         
  output group ;                                                                                                                                                                                                                                                
  tot_both_hfed + cnt_both_hfed ;                                                                                                                                                                                                                               
  tot_both_fem  + cnt_both_fem ;                                                                                                                                                                                                                                
  call missing (of cnt:) ;                                                                                                                                                                                                                                      
  if z ;                                                                                                                                                                                                                                                        
  mean_abs_age_diff = divide (_aads, _n_aads) ;                                                                                                                                                                                                                 
  output total ;                                                                                                                                                                                                                                                
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want ;                                                                                                                                                                                                                                                     
  merge detail group ;                                                                                                                                                                                                                                          
  by family ;                                                                                                                                                                                                                                                   
  if _n_ = 1 then set total ;                                                                                                                                                                                                                                   
run ;

Kind regards

Paul D.

CynthiaWei · Posted 10-12-2019 01:15 AM

Hi Paul,

I really appreciate your prompt response a lot! I can't believe you provide me with the whole syntax for my question. It is awesome! Thank you very much again!

Right now, I just realized that I don't have family variable yet. I use the following code to create them but it looks it doesn't work, that is the family variable is not generated by using the following code, could you please help me correct it?

data want;
set have;
family=substr(id,1,1);
run;

Is there something wrong?

Thank you very much!

Kind regards,

hashman · Posted 10-12-2019 01:29 AM

@CynthiaWei:

If your ID is a numeric variable (which I suspect it is), using SUBSTR causes an implicit type conversion of ID to a character variable using the BEST12. format, right-justifying the result from which you are trying to extract the first byte using SUBSTR (ID,1,1). But because of the right justification, that byte is blank, hence the problem. Just use:

  family = int (ID) ;

and you will be all right.

Kind regards

Paul D.

CynthiaWei · Posted 10-12-2019 04:32 AM

Thank you Paul again!

I still have some questions.

1. Both female and hfed are categorical variables, why you only created _m_hfed for fhed?

2. what does _aads mean? Does it mean the total of age absolute difference? I can use any name for it, right? It is not SAS default name, right?

3. What does _n_aads mean? Why there is +1. Is it SAS default name? So, for other continuous variables, I just need to change the variables name based on my dataset, right?

4. Why both_hfed =_m_hfed and hfed, but both_fem=female?

both_hfed     = _m_hfed and hfed ;                                                                                                                                                                                                                        
      both_fem      = female ;

Thank you so much for your great instruction!

Kind regards,

Cynthia

hashman · Posted 10-12-2019 03:24 PM

@CynthiaWei:

The code under _N_=1 (i.e. the first record in the family - which is the mother's record) is there to create baseline variables to be used in each child record (i.e. not _N_=1) to determine what you need. HFED for the mother can be either 1 or 0, so to determine _both_hfed we need to know which one. Hence we need _M_HFED created at the mother's record (_N_=1); likewise, we need the baseline mother's age _M_AGE to subtract it from AGE in each child's record. But FEMALE on the mother's record is always 1, so we don't need to create something like _M_FEM at _N_=1 and then test it like (IF _M_FEM AND FEMALE) on the child records: If on a child's record, FEMALE=1, it auto-means that both the mother and the child are females. Of course, if your data are such that on the _N_=1 records, FEMALE=0 (i.e. the mother is a male, which I'd find a bit odd), then include this logic in the program. Lastly, a provision must be made to ensure that the _M_* variables keep their values created at _N_=1 when we read the child record and then reset to missing values before the next family BY group. This is achieved by using the DoW loop, i.e. the DO UNTIL (LAST.X) construct with the SET inside it: When the records in the group are being read, program control (PC) never reaches the top of the DATA step, and thus the value of any variable is kept intact unless overwritten, while before each next family BY group, PC goes back to the top of the step, where the unretained variables are auto-reset to nulls.
Yes. You can use any name you like for _AADS (and likewise for any other variable in the step except for those coming from the input).
_N_AADS just count the total number of child records for the purpose of averaging the absolute age difference. On _N_=1 (mother) the count is inactive because its execution is under ELSE on the IF _N_=1 ... ELSE ... construct. _N_AADS+1 (and the other with the + in this program) is called "the SUM statement" in SAS" and means: Set _N_AADS=0 at compile time, add 1 every time the statement executes - i.e. do the same as _N_AADS=sum(_N_AADS,1), and never auto-reset _N_AADS to a missing value at the top of the step. Note that the CNT_* variables calculated in the same manner are group values and so they are reset to nulls after each BY group using CALL MISSING, as for each new group we need to start calculating them from scratch. But _AADS and _N_AADS aren't group but total count variables, which is why they're not included in the MISSING routine call - we need them to keep accumulating across all child records in all family BY groups.
See #1.

Kind regards

Paul D.

CynthiaWei · Posted 10-15-2019 03:38 PM

Hi Paul,

I appreciate your great instruction a lot! The code made my analyses job
moving forward significantly. Thank you so much again.

Some questions associating with this:
1. What if I want to do both positive matching and negative matching too,
for example, in this project I want to know both mother and kid have fiber
everyday. What if I want to know neither mother and kid have fiber
everyday. What code should I use?
2. If I want to add Where statement or having statement, where should I add
them?
3. When calculating mean absolute difference of age, we created _n_aads as
the denominator. What if there are missing values in continuous variables
for which I want to calculate mean absolute difference too besides age,
such as sedentary time, daily exercise hours, or sleep hours, what new
variables should I create to deal with any possible missing values in the
above variables to get actual mean absolute difference?

This is really a great learning opportunity from you to extend my project.
I am sorry with my following questions.

Please take your time to respond them!

Thank you so much for again!!!!!!

Kind regards,

hashman · Posted 10-15-2019 08:23 PM

@CynthiaWei:

1. Add the lines:

      neither_hfed = not (_m_hfed or hfed) ;
      ...
      cnt_neither_hfed + neither_hfed ;
    ...
    tot_neither_hfed + cnt_neither_hfed ;

2. You can add a WHERE statement either right after the SET statement or as a HAVE data set option WHERE=. The Having statement is not available in the DATA step - it's part of the SQL syntax.

3. Make the SUM statements in question conditional. E.g., schematically:

  if not missing (<some variable>) then <aggregate variable> + <expression> ;

Kind regards

Paul D.

CynthiaWei · Posted 10-18-2019 05:22 PM

I really appreciate your instruction and help!

Kind regards,

Cynthia

How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

Re: How to deal with the dataset with affiliated subjects or participants

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away