Re: Interesting Query

Ravinder · Posted 04-26-2013 10:07 AM

Hello guys

I have an interesting query. I have a dataset with certain columns named as start date, end date, type, item, For instance the toy data set looks as:

start _date end_date type category

1/1/2011 5/1/2011 A 1945

16/1/2011 30/1/2011 B 1945

2/2/2011 3/2/2011 c 1945

4/2/2011 5/2/2011 A 1945

16/1/2011 30/1/2011 B 1945

17/1/2011 25/1/2011 B 1945

Now basically this information I have for entire year up to December. Now I want to extract the information from the data set in vertical manner so that my output dataset looks like the following:

date Type

1/1/2011 A 1945

2/1/2011 A 1945

3/1/2011 A 1945

4/1/2011 A 1945

5/1/2011 A 1945

so on

16/1/2011 B 1945

17/1/2011 B 1945

18/1/2011 B 1945

19/1/2011 B 1945

so on up to

30/1/2011 B 1945

2/2/2011 C

3/2/2011 C

4/2/2011 A

5/2/2011 A

So essentially the tasks are two fold:

Firstly the information given in range(start date :end date) should appear row wise with each date corresponding to one row.

Secondly, corresponding to input data set, I want to get the value of type on that particular day. For instance if for the range 16/1/2011-30/1/2011 type has value B then in the output data set, each row corresponding to one single date should as entry B

I want to automate the above process as the input data set is enormous, so basically if one can help me with any code/macro which would automatically accomplish the above task, that would be highly appreciated.

Thanks

Ravinder

Astounding · Posted 04-26-2013 11:27 AM

This should be straightforward, and won't involve macro language. Assuming that your date variables are actual SAS dates, you could try:

data want;

set have;

do date = start_date to end_date;

output;

end;

keep date type category;

format date ddmmyy10.;

run;

Good luck.

Ravinder · Posted 05-02-2013 04:03 AM

Thanks For your response. That worked perfectly. Now I have one more tricky task. Once we have the data in following format

Date type category

1/2/2011 A 1945

ate Type

1/1/2011 A 1945

2/1/2011 A 1945

3/1/2011 A 1945

4/1/2011 A 1945

5/1/2011 A 1945

so on

16/1/2011 B 1945

17/1/2011 B 1945

18/1/2011 B 1945

19/1/2011 B 1945

now for each type we need to create separate variable such as 1 occurs where it is present and zero occurs where it is absent...

date Category

1/1/2011 1945 TypeA TypeB

2/1/2011 1945 1 0

3/1/2011 1945 1 0

4/1/2011 1945 1 0

so on...

16/1/2011 1945 0 1

17/1/2011 1945 0 1

So basically for each type column value we need to create new binary columns having one for that row in which that particular value occurs.

Thanks

Ravinder

Ravinder · Posted 05-02-2013 04:44 AM

Also due to size of the dataset we cannot adhere to use of IF/else conditional processing. We need to automate in some other way.

Ksharp · Posted 05-02-2013 08:32 AM

Just add two statements into Astounding's code .

data want;

set have;

do date = start_date to end_date;

typeA=(type='A');

typeB=(type='B');

output;

end;

keep date type category;

format date ddmmyy10.;

run;

Ksharp

Ravinder · Posted 05-02-2013 08:54 AM

But in actual data set, there are about 150 type values(that too very long descriptions, type is just toy name to explain here), I cannot include typeA=(type='A') for each of them. It would be very tedious.

Thanks

Ravi

ballardw · Posted 05-02-2013 10:27 AM

Will this list of names change in the future?

Ravinder · Posted 05-02-2013 10:40 AM

no for now it is static only...150 different type or descriptor variables

Haikuo · Posted 05-02-2013 11:01 AM

Try this array() approach, it is a variation base on Astounding and Ksharp's suggestion.

data have;

infile cards truncover;

input (start_date end_date) (:ddmmyy10.) (type category) (:$);

format start_date end_date ddmmyy10.;

cards;

1/1/2011 5/1/2011 A 1945

16/1/2011 30/1/2011 B 1945

2/2/2011 3/2/2011 c 1945

4/2/2011 5/2/2011 A 1945

16/1/2011 30/1/2011 B 1945

17/1/2011 25/1/2011 B 1945

;

proc sql noprint;

select distinct cats('type','_', type) into :type separated by ' ' from have;

select cats(count(distinct type)) into :ct from have;

quit;

data want;

set have;

array tp(&ct) &type.;

do date = start_date to end_date;

do i=1 to dim(tp);

tp(i)=(scan(vname(tp(i)),-1,'_')=type);

end;

output;

end;

keep date type: category;

format date ddmmyy10.;

run;

Haikuo

Ksharp · Posted 05-02-2013 11:20 AM

Yeah. HaiKuo have already give you answer.

Here is my version:

data have;
infile cards truncover;
input (start_date end_date) (:ddmmyy10.) (type category) (:$);
format start_date end_date ddmmyy10.;
cards;
1/1/2011 5/1/2011 A 1945
16/1/2011 30/1/2011 B 1945
2/2/2011 3/2/2011 c 1945
4/2/2011 5/2/2011 A 1945
16/1/2011 30/1/2011 B 1945
17/1/2011 25/1/2011 B 1945
;
run;
proc sql noprint;

  select distinct cats('type',type,'=(type="',type,'") ;') into :type separated by ' ' from have;
  quit;

  data want;
  set have;
  do date = start_date to end_date;
      &type 
      output;
  end;
  format date ddmmyy10.;
run;

Ksharp

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!