BookmarkSubscribeRSS Feed
Ravinder
Calcite | Level 5

Hello guys

I have an interesting query. I have a dataset with certain columns named as start date, end date, type, item, For instance the toy data set looks as:

start _date    end_date          type   category

1/1/2011       5/1/2011          A       1945

16/1/2011      30/1/2011        B      1945

2/2/2011        3/2/2011         c        1945

4/2/2011       5/2/2011          A       1945

16/1/2011      30/1/2011        B        1945

17/1/2011        25/1/2011       B        1945

Now basically this information I have for entire year up to December. Now I want to extract the information from the data set in vertical manner so that my output dataset looks like the following:

date                           Type

1/1/2011                     A             1945

2/1/2011                     A            1945

3/1/2011                     A            1945

4/1/2011                     A            1945

5/1/2011                     A            1945

so on

16/1/2011                    B               1945

17/1/2011                    B               1945

18/1/2011                    B               1945

19/1/2011                    B                1945

so on up to

30/1/2011                    B                1945

2/2/2011                       C

3/2/2011                       C

4/2/2011                       A

5/2/2011                       A

So essentially the tasks are two fold:

Firstly the information given in  range(start date :end date) should appear row wise with each date corresponding to one row.

Secondly, corresponding to input data set, I want to get the value of type on that particular day. For instance if for the range 16/1/2011-30/1/2011 type has value B then in the output data set, each row corresponding to one single date should as entry B

I want to automate the above process as the input data set is enormous, so basically if one can help me with any code/macro which would automatically accomplish the above task, that would be highly appreciated.

Thanks

Ravinder

9 REPLIES 9
Astounding
PROC Star

This should be straightforward, and won't involve macro language.  Assuming that your date variables are actual SAS dates, you could try:

data want;

  set have;

  do date = start_date to end_date;

      output;

  end;

  keep date type category;

  format date ddmmyy10.;

run;

Good luck.

Ravinder
Calcite | Level 5

Thanks For your response. That worked perfectly. Now I have one more tricky task. Once we have the data in following format

Date type category

1/2/2011 A  1945

ate                           Type

1/1/2011                     A             1945

2/1/2011                     A            1945

3/1/2011                     A            1945

4/1/2011                     A            1945

5/1/2011                     A            1945

so on

16/1/2011                    B               1945

17/1/2011                    B               1945

18/1/2011                    B               1945

19/1/2011                    B                1945

now for each type we need to create separate variable such as 1 occurs where it is present and zero occurs where it is absent...

date                                  Category              

1/1/2011                                1945   TypeA     TypeB

2/1/2011                                1945    1            0

3/1/2011                                1945     1            0

4/1/2011                                1945    1             0

so on...

16/1/2011                                1945   0    1

17/1/2011                                1945   0     1

So basically for each type column value we need to create new  binary columns having one for that row in which that particular value occurs.

Thanks

Ravinder

Ravinder
Calcite | Level 5

Also due to size of the dataset we cannot adhere to use of IF/else conditional processing. We need to automate in some other way.

Ksharp
Super User

Just add two statements into Astounding's code .

data want;

  set have;

  do date = start_date to end_date;

   typeA=(type='A');

typeB=(type='B');

      output;

  end;

  keep date type category;

  format date ddmmyy10.;

run;

Ksharp

Ravinder
Calcite | Level 5

But in actual data set, there are about 150 type values(that too very long descriptions, type is just toy name to explain here), I cannot include typeA=(type='A') for each of them. It would be very tedious.

Thanks

Ravi

ballardw
Super User

Will this list of names change in the future?

Ravinder
Calcite | Level 5

no for now it is static only...150 different type or descriptor variables

Haikuo
Onyx | Level 15

Try this array() approach, it is a  variation base on  Astounding and Ksharp's suggestion.

data have;

infile cards truncover;

input (start_date end_date) (:ddmmyy10.) (type category) (:$);

format start_date end_date ddmmyy10.;

cards;

1/1/2011 5/1/2011 A 1945

16/1/2011 30/1/2011 B 1945

2/2/2011 3/2/2011 c 1945

4/2/2011 5/2/2011 A 1945

16/1/2011 30/1/2011 B 1945

17/1/2011 25/1/2011 B 1945

;

proc sql noprint;

  select distinct cats('type','_', type) into :type separated by ' ' from have;

  select cats(count(distinct type)) into :ct from have;

  quit;

data want;

  set have;

  array tp(&ct) &type.;

  do date = start_date to end_date;

  do i=1 to dim(tp);

tp(i)=(scan(vname(tp(i)),-1,'_')=type);

end;

  output;

  end;

  keep date type: category;

  format date ddmmyy10.;

run;

Haikuo

Ksharp
Super User

Yeah. HaiKuo have already give you answer.

Here is my version:

data have;
infile cards truncover;
input (start_date end_date) (:ddmmyy10.) (type category) (:$);
format start_date end_date ddmmyy10.;
cards;
1/1/2011 5/1/2011 A 1945
16/1/2011 30/1/2011 B 1945
2/2/2011 3/2/2011 c 1945
4/2/2011 5/2/2011 A 1945
16/1/2011 30/1/2011 B 1945
17/1/2011 25/1/2011 B 1945
;
run;
proc sql noprint;

  select distinct cats('type',type,'=(type="',type,'") ;') into :type separated by ' ' from have;
  quit;

  data want;
  set have;
  do date = start_date to end_date;
      &type 
      output;
  end;
  format date ddmmyy10.;
run;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1190 views
  • 1 like
  • 5 in conversation