Hi,
I'm not sure if this is the right community to post in, but this is my issue. I have a dataset with millions of rows set up like this:
Record Number | DOS |
---|---|
1 | 1/1/2006 |
2 | 6/5/2007 |
1 | 1/24/2009 |
3 | 7/1/2011 |
4 | 5/22/2012 |
3 | 8/14/2010 |
What I am trying to make is a data set that looks like this (with a first DOS and a binary for if they had a date of service in each year):
Record Number | First DOS | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
---|---|---|---|---|---|---|---|---|---|
1 | 1/1/2006 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 6/5/2007 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 8/14/2010 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
4 | 5/22/2012 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
I have no idea how to get from A to B. I'm very new to SAS and any help would be much appreciated!
Thank you
I think this might be passed to the underlying database to do the work:
proc sql;
create table want as
select
Record_Number,
Min(dos) as First_DOS,
max(case when dos>='01jan2006:0:0'dt and dos < '01Jan2007:0:0'dt then 1 else 0 end) as DOS_2006,
max(case when dos>='01jan2007:0:0'dt and dos < '01Jan2008:0:0'dt then 1 else 0 end) as DOS_2007
from
have
group by record_number;
quit;
Hi,
Well you have two parts to your request there. Firstly getting a transposed list of years, the second part being getting the first DOS date. See the steps below:
data have;
record_num=1; dos='01JAN2006'd; output;
record_num=2; dos='06MAY2006'd; output;
record_num=1; dos='24JAN2009'd; output;
run;
data inter; /* Assign a text year */
set have;
length year $4.;
year=put(year(dos),4.);
flag=1;
run;
proc sort data=inter nodupkey; /* Get rid of duplicates as we only need one per rec-num */
by record_num year;
run;
proc transpose data=inter out=t_want; /* Transpose the data to get yearly values */
by record_num;
var flag;
id year;
idlabel year;
run;
proc sql; /* Merge on the minimum date */
create table WANT (drop=_name_) as
select A.*,
B.MIN_DOS format=date9.
from T_WANT A
left join (select distinct RECORD_NUM,MIN(DOS)as MIN_DOS from HAVE group by RECORD_NUM) B
on A.RECORD_NUM=B.RECORD_NUM;
quit;
Thanks- I can't do the first step though. The data already exists through an ODBC connection and is 65+ million rows, so I'm not sure how to go through and reformat the dates. I tried the second step without doing the first, just in case because it is set up like 25JAN2014 (I misspoke before) but also has 0:00:0000:00 included in the date, so I'm not sure how to complete the first step.
Look up the DATEPART function.
data have;
input RecordNumber DOS :mmddyy10.;
format DOS mmddyy10.;
datalines;
1 1/1/2006
2 6/5/2007
1 1/24/2009
3 7/1/2011
4 5/22/2012
3 8/14/2010
;
data want(drop=i);
set have;
array yr{*} y2006-y2013;
do i=2006 to 2013;
if year(dos)=i then yr{i-2005}=1;
else yr{i-2005}=0;
end;
run;
My suggestion is very similar to stat's:
data have;
informat record_number 8.;
informat dos anydtdtm.;
infile cards dlm=',';
input record_number dos;
cards;
1,01JAN2006 14:30:08
2,05JUN2007 14:30:08
1,24JAN2009 14:30:08
3,01JUL2011 14:30:08
4,22MAY2012 14:30:08
3,22AUG2010 14:30:08
;
proc sort data=have out=want;
by record_number descending dos;
run;
data want;
format First_DOS mmddyy10.;
set want (rename=(dos=First_DOS));
by record_number;
array years(2006:2012) _2006-_2012;
retain years;
if first.record_number then do _n_=2006 to 2012;
years(_n_)=0;
end;
years(year(datepart(First_DOS)))=1;
First_DOS=datepart(First_DOS);
if last.record_number then output;;
run;
I think this might be passed to the underlying database to do the work:
proc sql;
create table want as
select
Record_Number,
Min(dos) as First_DOS,
max(case when dos>='01jan2006:0:0'dt and dos < '01Jan2007:0:0'dt then 1 else 0 end) as DOS_2006,
max(case when dos>='01jan2007:0:0'dt and dos < '01Jan2008:0:0'dt then 1 else 0 end) as DOS_2007
from
have
group by record_number;
quit;
Thank you SO much! This worked amazingly well - I seriously appreciate it so much.
Got it to work! Thank you everyone for your help- I'd be lost otherwise! I really appreciate all your input!
data have; input RecordNumber DOS :mmddyy10.; format DOS mmddyy10.; datalines; 1 1/1/2006 2 6/5/2007 1 1/24/2009 3 7/1/2011 4 5/22/2012 3 8/14/2010 ; run; proc sort data=have; by RecordNumber DOS;run; data have; set have; by RecordNumber; retain first_dos var 1; if first.RecordNumber then first_dos=DOS; year=year(DOS); format first_dos date9.; run; proc transpose data=have out=temp(drop=_name_) prefix=_; by RecordNumber first_dos; id year; var var; run; proc stdize data=temp out=want missing=0 reponly;run;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.