Hi,
I have a dataset at the person level, with an ID variable and several month/year variables.
ID 200101 200102 200103............through 201505
1 1 . 1
2 . . 1
3 . 1 1
Values of the month/year variables can either be 1 or missing.
What I need is to create a new variable that tells me the date at which someone had their first value of 1. So for ID 1, that would be 200101, for 2, it would be 200103, etc.
I'm hoping for a solution that doesn't involve typing out everything by hand as I have over 100 month/year variables.
Thanks!
Hi, made some assumptions as to how your data looks. Solution also creates the variable as a real date. If you don't care about that and just want the variable name, use the second data step.
data x;
input id d200101-d200103;
datalines;
1 1 . 1
2 . . 1
3 . 1 1
;
data y;
format first1 yymon7.;
set x;
array d(*) d: ;
first1 = input(substr(vname(d(whichn( 1,of d: ))),2,6),yymmn6.);
run;
Obs date id d200101 d200102 d200103
1 2001JAN 1 1 . 1
2 2001MAR 2 . . 1
3 2001FEB 3 . 1 1
Alternative, FIRST1 as a character variable (variable name with the first occurrence of a 1) ...
data y;
set x;
array d(*) d: ;
first1 = vname(d(whichn( 1,of d: )));
run;
data have;
input id d200101-d200103;
datalines;
1 1 . 1
2 . . 1
3 . 1 1
;
data want;
set have;
array a{*} d200101-d200103;
do _n_=1 to dim(a);
if a{_n_}=1 then do;
date_first=vname(a{_n_});
leave;
end;
end;
run;
proc print;
run;
As I always comment on these types of things, it is not a good idea to have data in your header row. The name of a variable is an identifier and not data. At worst, I would recommend changing the columns headers to have the same prefex and a suffix incrementor:
VAR1, VAR2
This way you can use VAR:, for example. Far simpler coding. However as you have data in your column names then I would suggest you normlise your data:
ID YEAR RESULT
1 2012 0
1 2013 1
...
This will make your life far easier, for instnace your question on how to find the first 1 record:
proc sql;
select ID,min(YEAR) as MIN_YEAR
from HAVE
group by ID
having RESULT=1;
quit;
You see the data is there to be processed on.
Here's another solutions:
data have;
input ID '200101'n '200102'n '200103'n;
cards;
1 1 . 1
2 . . 1
3 . 1 1
;
proc transpose data=have out=tran(rename=_NAME_ = DATE);by id;
proc sort data=tran;by id descending col1 date;
data prep;
set tran;
by id descending col1 date;
if first.id then first = date;
run;
proc sql;
create table want as
select distinct a.*, b.First
from have a left join
prep b on
a.id = b.id
where not missing(b.first)
order by id;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.