BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MarcBoh
Obsidian | Level 7

Hi,

I want to create dummies in SAS. My plan is, everytime AnnouncementDate ne . I want to make a variable called event=2 and the next one also (event=2) as well as the five previous observations 1 And all this flagged if the Previous 5 observations and the day after the announcement date are not empty. and in a by class (by stock for example). If this now switches to the next stock, it should start over and not take the previous observations into account.

 

This is my current code for an example dataset that I got help with below. But at the proc format step it says: ERROR: Start is greater than end. I tried removing the $ or sort by start. nothing works...

 

Thankful for any help!

data have;
input date announcementdate event_want conflict_want $;
datalines;
1115 . . .
2115 . . .
3115 . . .
4115 . . .
5115 . . .
6115 . 1 .
7115 . 1 .
8115 . 1 .
9115 . 1 .
10115 . 1 .
11115 1115 2 .
12115 . 2 .
13115 . . .
14115 . 1 .
16115 . 1 .
17115 . 1 .
18115 . 1 .
19115 . 1 .
20115 20115 2 .
21115 . 1 Y
21115 . 1 Y
22115 22115 2 Y
23115 . 2 Y
24115 . . .
25115 . . .
;;;;
run;

data for_fmt;
  set have nobs=nobs;
  length fmtname $16;
  fmtname = 'eventf';

  if _n_=1 then do;   *HLO='o' gives the 'other' value;
    start=.;
    end  =.;
    label = ' ';
    hlo='o';
    output;
  end;
  hlo=' ';

  if not missing(announcementdate) then do;
    start = max(1,_n_-5);
    end   = max(1,_n_-1);
    label='1';
    output;
    start = _n_;
    end   = min(_n_+1,nobs);
    label='2';
    output;
  end;
run;  *now we have a dataset of the ranges - but they overlap;

data for_fmt_conflicts;

  merge for_fmt(in=a) for_fmt(in=_in_next firstobs=2 keep=start rename=start=next_start); *look-ahead merge;
  retain new_start conflict_marker;
  if hlo='o' then do;    *again the 'other' row;
    output;
    fmtname='conflictf';
    output;
  end;
  else if end gt next_start and (_in_next) then do;   *if the next row will be a conflict, adjust the end back one and indicate where next should start;
    end = end-1;
    new_start = end+1;
    conflict_marker=1;
    output;
  end;
  else if conflict_marker=1 then do;  *if this is a conflict row type 1;
    start = new_start;
    output;
    fmtname='conflictf';
    label  ='Y';
    output;
    conflict_marker=2;
  end;
  else if conflict_marker=2 then do;  *if this is the 2s for a conflict row;
    output;
    fmtname='conflictf';
    label  ='Y';
    output;
    conflict_marker=0;
  end;
  else output;
run;
proc sort data=for_fmt_conflicts;  *must sort to get fmtnames grouped together;
  by fmtname start;
run;
proc format cntlin=for_fmt_conflicts;  *import formats;
quit;

data want;  *now apply formats;
  set have;
  event = put(_n_,eventf.);
  conflict = put(_n_,conflictf.);
run;
1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please note the text:

"the data you posted does not illustrate this"

 

I am not sitting at your desk looking at your data, the data you posted does not illustrate the problem.

 

What is it your actually trying to do here?  It looks like you creating a format based on a range of observation numbers.  That doesn't seem like a good idea - logical position in a dataset is not fixed (sorts for instance will change that).  

If you want to flag certain records, then retain the flag variables and do your logic as you go through the dataset.  You seem to be doing this anyways in the dataset which creates the formats dataset, just remove the formats part and do that directly on your data?

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

That code you provide works fine for me.  You must have other problems, for instance maybe data further on does not have the required rows or values.  Run it and see what is in the dataset for_fmt_conflicts, you can filter it for start > end.

MarcBoh
Obsidian | Level 7

It has several where start>end

 

Where does that happen? I don't see the issue....

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I don't know what the question is.  The error is quite clear however, you cannot have a format where the start of the range is after the end of the range, hence you would need to fix the dataset which goes into the format procedure - the data you posted does not illustrate this.

MarcBoh
Obsidian | Level 7

The question is, where in the code the start value gets bigger than the end. My logic seems to be wrong here.

 

If I deduct 2 from the new_start value and make it start, there are a lot of overlapping ranges.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please note the text:

"the data you posted does not illustrate this"

 

I am not sitting at your desk looking at your data, the data you posted does not illustrate the problem.

 

What is it your actually trying to do here?  It looks like you creating a format based on a range of observation numbers.  That doesn't seem like a good idea - logical position in a dataset is not fixed (sorts for instance will change that).  

If you want to flag certain records, then retain the flag variables and do your logic as you go through the dataset.  You seem to be doing this anyways in the dataset which creates the formats dataset, just remove the formats part and do that directly on your data?

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3056 views
  • 0 likes
  • 2 in conversation