Solved: Re: Creating a new variable based on the lowest value of an existing v...

righcoastmike · Posted 11-30-2018 10:18 AM

Hi All,

I have this longitudinal dataset with multiple observations per studyid:

data have;
input studyID FSA $3. Exposure Age ;
cards;
1 B2Y 384 17
1 B2Y 384 17
1 B2Y 384 18
2 BgT 1000 15
3 M6D 400 14
3 M6D 400  15
3 M6D 400 16
run;

What I want to do is create a new variable called "Age_start", which holds the lowest value of Age for each individual in the study, so the "want" dataset would look like this:

data want;
input studyID FSA $3. Exposure Age age_start ;
cards;
1 B2Y 384 17 17
1 B2Y 384 17 17
1 B2Y 384 18 17
2 BgT 1000 15 15
3 M6D 400 14 14
3 M6D 400  15 14
3 M6D 400 16 14
run;

As per usual, any thoughts would be hugely appreciated. I've been playing with it for a couple of hours using proc sort and then running variations of a data step that looks like

data want;
set have;
age_start = first.age;
run;

but all I get are zeros.

Any thoughts are much appreciated.

Thank you.

Rightcoast

kiranv_ · Posted 11-30-2018 10:28 AM

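data want;
  set have;
  by studyid;
  /* Keep age_start across observations instead of resetting it to missing. */
  retain age_start;
  /* On the first observation of each studyID, capture its age. */
  if first.studyID then age_start = age;
run;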
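
The step above takes the age from the first row of each studyID, so it returns the lowest age only when the rows are already in ascending age order within each studyID (as they are in the sample data). A minimal sketch that sorts first, assuming the data might not arrive in that order; the data set name have_sorted is made up for illustration:

```sas
/* Sort so that the lowest age comes first within each studyID. */
proc sort data=have out=have_sorted;
  by studyID Age;
run;

data want;
  set have_sorted;
  by studyID;
  retain age_start;
  if first.studyID then age_start = Age;
run;
```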

novinosrin · Posted 11-30-2018 10:29 AM

data have;
input studyID FSA $3. Exposure Age ;
cards;
1 B2Y 384 17
1 B2Y 384 17
1 B2Y 384 18
2 BgT 1000 15
3 M6D 400 14
3 M6D 400  15
3 M6D 400 16
run;


proc sql;
create table want as
select *, min(age) as age_start 
from have
group by studyid
order by studyid;
quit;

Tom · Posted 11-30-2018 10:31 AM

FIRST. and LAST. variables are binary-valued numeric variables that SAS generates to indicate whether the current observation is the first or last in the current group, where the group is defined by the values of the BY variables up to and including the one you listed.

Looks like you have found a feature (A.K.A. bug) in SAS. The reference to FIRST.AGE should have generated an error since not only is AGE not in your list of BY variables but you don't even have any BY statement. Instead it looks like SAS just treated it as always FALSE. Hence the constant value of zero.
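
To see what the FIRST. and LAST. flags contain when a BY statement is present, here is a small sketch. The names have_sorted, flags, first_id, last_id and first_age_flag are invented for illustration. Because FIRST. and LAST. variables are not written to the output data set, the step copies them into ordinary variables.

```sas
/* BY-group processing expects the data in BY-variable order. */
proc sort data=have out=have_sorted;
  by studyID Age;
run;

data flags;
  set have_sorted;
  by studyID Age;
  first_id       = first.studyID;  /* 1 on the first row of each studyID */
  last_id        = last.studyID;   /* 1 on the last row of each studyID */
  first_age_flag = first.Age;      /* 1 on the first row of each studyID/Age combination */
run;

proc print data=flags noobs;
run;
```

For studyID 1 in the sample data (ages 17, 17, 18), first_id should come out as 1, 0, 0 and first_age_flag as 1, 0, 1, which shows that these variables are flags, not copies of the data values.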

Easy to do in SQL code since SAS will automatically remerge summary statistics for you.

proc sql;
  create table want as select *,min(age) as first_age from have group by studyid;
quit;
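
For comparison, the remerge that PROC SQL does automatically can be written out by hand: summarize to one row per STUDYID, then merge that row back onto the detail data. This is only a sketch of the equivalent steps, and the data set names min_age and have_sorted are invented for illustration.

```sas
/* One row per studyID holding the minimum age. */
proc means data=have noprint nway;
  class studyID;
  var Age;
  output out=min_age (drop=_type_ _freq_) min=first_age;
run;

/* MERGE with BY needs both inputs in studyID order. */
proc sort data=have out=have_sorted;
  by studyID;
run;

/* Attach the group minimum to every observation of that studyID. */
data want;
  merge have_sorted min_age;
  by studyID;
run;
```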

Or if the data is sorted (either by AGE or by DATE and your AGE variables are correct) within STUDYID then you can use real BY variable processing in a data step.

data want;
  set have;
  by studyid ;
  if first.studyid then first_age=age;
  retain first_age;
run;

Now if you have some missing values of AGE then there is a possibility that FIRST_AGE will be missing. You could add this statement so that the first non-missing value is taken.

first_age=coalesce(first_age,age);

But the observations that come before that first non-missing AGE will still have a missing FIRST_AGE.
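
A small sketch of where the COALESCE statement would sit in the step, using a made-up data set have_miss in which studyID 1 starts with a missing age:

```sas
/* Hypothetical test data: the first row of studyID 1 has a missing Age. */
data have_miss;
  input studyID Age;
cards;
1 .
1 17
1 18
2 15
;
run;

data want_miss;
  set have_miss;
  by studyID;
  retain first_age;
  if first.studyID then first_age = Age;
  /* While first_age is still missing, fall back to the current Age. */
  first_age = coalesce(first_age, Age);
run;
```

With this input, FIRST_AGE should be missing on the first row of studyID 1 and 17 on the next two rows, which is the leftover missing value described above.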

You could also use what is called a DOW loop, which places the SET/MERGE statement inside a DO loop. This case needs two DO loops: one to find the MIN and another to actually write the observations.

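data want;
  /* First pass: read one studyid group and track its minimum age.
     MIN() ignores missing values, and first_age starts each group as missing. */
  do until (last.studyid);
    set have;
    by studyid;
    first_age=min(first_age,age);
  end;
  /* Second pass: re-read the same group and write each observation,
     now carrying the group's minimum in first_age. */
  do until (last.studyid);
    set have;
    by studyid;
    output;
  end;
run;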

righcoastmike · Posted 11-30-2018 11:18 AM

Hi Tom,

Thanks so much for your thoughtful response. Thankfully I don't have any missing data, but I'll file it away because I'm sure it will come up in the future. The clarification on the FIRST./LAST. variables was especially helpful because I thought they were a product of proc sort generally, not of the BY statement. I'm sure that will help my programming going forward.

As always I'm super humbled and impressed by all the people willing to help out on here.

Thanks so much, hopefully one day I can get skilled enough to pay it forward and help out someone else.

Rightcoast
