Hello guys
I have an interesting query. I have a dataset with certain columns named as start date, end date, type, item, For instance the toy data set looks as:
start _date end_date type category
1/1/2011 5/1/2011 A 1945
16/1/2011 30/1/2011 B 1945
2/2/2011 3/2/2011 c 1945
4/2/2011 5/2/2011 A 1945
16/1/2011 30/1/2011 B 1945
17/1/2011 25/1/2011 B 1945
Now basically this information I have for entire year up to December. Now I want to extract the information from the data set in vertical manner so that my output dataset looks like the following:
date Type
1/1/2011 A 1945
2/1/2011 A 1945
3/1/2011 A 1945
4/1/2011 A 1945
5/1/2011 A 1945
so on
16/1/2011 B 1945
17/1/2011 B 1945
18/1/2011 B 1945
19/1/2011 B 1945
so on up to
30/1/2011 B 1945
2/2/2011 C
3/2/2011 C
4/2/2011 A
5/2/2011 A
So essentially the tasks are two fold:
Firstly the information given in range(start date :end date) should appear row wise with each date corresponding to one row.
Secondly, corresponding to input data set, I want to get the value of type on that particular day. For instance if for the range 16/1/2011-30/1/2011 type has value B then in the output data set, each row corresponding to one single date should as entry B
I want to automate the above process as the input data set is enormous, so basically if one can help me with any code/macro which would automatically accomplish the above task, that would be highly appreciated.
Thanks
Ravinder
This should be straightforward, and won't involve macro language. Assuming that your date variables are actual SAS dates, you could try:
data want;
set have;
do date = start_date to end_date;
output;
end;
keep date type category;
format date ddmmyy10.;
run;
Good luck.
Thanks For your response. That worked perfectly. Now I have one more tricky task. Once we have the data in following format
Date type category
1/2/2011 A 1945
ate Type
1/1/2011 A 1945
2/1/2011 A 1945
3/1/2011 A 1945
4/1/2011 A 1945
5/1/2011 A 1945
so on
16/1/2011 B 1945
17/1/2011 B 1945
18/1/2011 B 1945
19/1/2011 B 1945
now for each type we need to create separate variable such as 1 occurs where it is present and zero occurs where it is absent...
date Category
1/1/2011 1945 TypeA TypeB
2/1/2011 1945 1 0
3/1/2011 1945 1 0
4/1/2011 1945 1 0
so on...
16/1/2011 1945 0 1
17/1/2011 1945 0 1
So basically for each type column value we need to create new binary columns having one for that row in which that particular value occurs.
Thanks
Ravinder
Also due to size of the dataset we cannot adhere to use of IF/else conditional processing. We need to automate in some other way.
Just add two statements into Astounding's code .
data want;
set have;
do date = start_date to end_date;
typeA=(type='A');
typeB=(type='B');
output;
end;
keep date type category;
format date ddmmyy10.;
run;
Ksharp
But in actual data set, there are about 150 type values(that too very long descriptions, type is just toy name to explain here), I cannot include typeA=(type='A') for each of them. It would be very tedious.
Thanks
Ravi
Will this list of names change in the future?
no for now it is static only...150 different type or descriptor variables
Try this array() approach, it is a variation base on Astounding and Ksharp's suggestion.
data have;
infile cards truncover;
input (start_date end_date) (:ddmmyy10.) (type category) (:$);
format start_date end_date ddmmyy10.;
cards;
1/1/2011 5/1/2011 A 1945
16/1/2011 30/1/2011 B 1945
2/2/2011 3/2/2011 c 1945
4/2/2011 5/2/2011 A 1945
16/1/2011 30/1/2011 B 1945
17/1/2011 25/1/2011 B 1945
;
proc sql noprint;
select distinct cats('type','_', type) into :type separated by ' ' from have;
select cats(count(distinct type)) into :ct from have;
quit;
data want;
set have;
array tp(&ct) &type.;
do date = start_date to end_date;
do i=1 to dim(tp);
tp(i)=(scan(vname(tp(i)),-1,'_')=type);
end;
output;
end;
keep date type: category;
format date ddmmyy10.;
run;
Haikuo
Yeah. HaiKuo have already give you answer.
Here is my version:
data have; infile cards truncover; input (start_date end_date) (:ddmmyy10.) (type category) (:$); format start_date end_date ddmmyy10.; cards; 1/1/2011 5/1/2011 A 1945 16/1/2011 30/1/2011 B 1945 2/2/2011 3/2/2011 c 1945 4/2/2011 5/2/2011 A 1945 16/1/2011 30/1/2011 B 1945 17/1/2011 25/1/2011 B 1945 ; run; proc sql noprint; select distinct cats('type',type,'=(type="',type,'") ;') into :type separated by ' ' from have; quit; data want; set have; do date = start_date to end_date; &type output; end; format date ddmmyy10.; run;
Ksharp
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.