Hi,
I'm not sure if this is the right community to post in, but this is my issue. I have a dataset with millions of rows set up like this:
Record Number | DOS |
---|---|
1 | 1/1/2006 |
2 | 6/5/2007 |
1 | 1/24/2009 |
3 | 7/1/2011 |
4 | 5/22/2012 |
3 | 8/14/2010 |
What I am trying to make is a data set that looks like this (with a first DOS and a binary for if they had a date of service in each year):
Record Number | First DOS | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
---|---|---|---|---|---|---|---|---|---|
1 | 1/1/2006 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 6/5/2007 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 8/14/2010 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
4 | 5/22/2012 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
I have no idea how to get from A to B. I'm very new to SAS and any help would be much appreciated!
Thank you
I think this might be passed to the underlying database to do the work:
proc sql;
create table want as
select
Record_Number,
Min(dos) as First_DOS,
max(case when dos>='01jan2006:0:0'dt and dos < '01Jan2007:0:0'dt then 1 else 0 end) as DOS_2006,
max(case when dos>='01jan2007:0:0'dt and dos < '01Jan2008:0:0'dt then 1 else 0 end) as DOS_2007
from
have
group by record_number;
quit;
Hi,
Well you have two parts to your request there. Firstly getting a transposed list of years, the second part being getting the first DOS date. See the steps below:
data have;
record_num=1; dos='01JAN2006'd; output;
record_num=2; dos='06MAY2006'd; output;
record_num=1; dos='24JAN2009'd; output;
run;
data inter; /* Assign a text year */
set have;
length year $4.;
year=put(year(dos),4.);
flag=1;
run;
proc sort data=inter nodupkey; /* Get rid of duplicates as we only need one per rec-num */
by record_num year;
run;
proc transpose data=inter out=t_want; /* Transpose the data to get yearly values */
by record_num;
var flag;
id year;
idlabel year;
run;
proc sql; /* Merge on the minimum date */
create table WANT (drop=_name_) as
select A.*,
B.MIN_DOS format=date9.
from T_WANT A
left join (select distinct RECORD_NUM,MIN(DOS)as MIN_DOS from HAVE group by RECORD_NUM) B
on A.RECORD_NUM=B.RECORD_NUM;
quit;
Thanks- I can't do the first step though. The data already exists through an ODBC connection and is 65+ million rows, so I'm not sure how to go through and reformat the dates. I tried the second step without doing the first, just in case because it is set up like 25JAN2014 (I misspoke before) but also has 0:00:0000:00 included in the date, so I'm not sure how to complete the first step.
Look up the DATEPART function.
data have;
input RecordNumber DOS :mmddyy10.;
format DOS mmddyy10.;
datalines;
1 1/1/2006
2 6/5/2007
1 1/24/2009
3 7/1/2011
4 5/22/2012
3 8/14/2010
;
data want(drop=i);
set have;
array yr{*} y2006-y2013;
do i=2006 to 2013;
if year(dos)=i then yr{i-2005}=1;
else yr{i-2005}=0;
end;
run;
My suggestion is very similar to stat's:
data have;
informat record_number 8.;
informat dos anydtdtm.;
infile cards dlm=',';
input record_number dos;
cards;
1,01JAN2006 14:30:08
2,05JUN2007 14:30:08
1,24JAN2009 14:30:08
3,01JUL2011 14:30:08
4,22MAY2012 14:30:08
3,22AUG2010 14:30:08
;
proc sort data=have out=want;
by record_number descending dos;
run;
data want;
format First_DOS mmddyy10.;
set want (rename=(dos=First_DOS));
by record_number;
array years(2006:2012) _2006-_2012;
retain years;
if first.record_number then do _n_=2006 to 2012;
years(_n_)=0;
end;
years(year(datepart(First_DOS)))=1;
First_DOS=datepart(First_DOS);
if last.record_number then output;;
run;
I think this might be passed to the underlying database to do the work:
proc sql;
create table want as
select
Record_Number,
Min(dos) as First_DOS,
max(case when dos>='01jan2006:0:0'dt and dos < '01Jan2007:0:0'dt then 1 else 0 end) as DOS_2006,
max(case when dos>='01jan2007:0:0'dt and dos < '01Jan2008:0:0'dt then 1 else 0 end) as DOS_2007
from
have
group by record_number;
quit;
Thank you SO much! This worked amazingly well - I seriously appreciate it so much.
Got it to work! Thank you everyone for your help- I'd be lost otherwise! I really appreciate all your input!
data have; input RecordNumber DOS :mmddyy10.; format DOS mmddyy10.; datalines; 1 1/1/2006 2 6/5/2007 1 1/24/2009 3 7/1/2011 4 5/22/2012 3 8/14/2010 ; run; proc sort data=have; by RecordNumber DOS;run; data have; set have; by RecordNumber; retain first_dos var 1; if first.RecordNumber then first_dos=DOS; year=year(DOS); format first_dos date9.; run; proc transpose data=have out=temp(drop=_name_) prefix=_; by RecordNumber first_dos; id year; var var; run; proc stdize data=temp out=want missing=0 reponly;run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.