Hi,
I want to create dummies in SAS. My plan is, everytime AnnouncementDate ne . I want to make a variable called event=2 and the next one also (event=2) as well as the five previous observations 1 And all this flagged if the Previous 5 observations and the day after the announcement date are not empty. and in a by class (by stock for example). If this now switches to the next stock, it should start over and not take the previous observations into account.
This is my current code for an example dataset that I got help with below. But at the proc format step it says: ERROR: Start is greater than end. I tried removing the $ or sort by start. nothing works...
Thankful for any help!
data have; input date announcementdate event_want conflict_want $; datalines; 1115 . . . 2115 . . . 3115 . . . 4115 . . . 5115 . . . 6115 . 1 . 7115 . 1 . 8115 . 1 . 9115 . 1 . 10115 . 1 . 11115 1115 2 . 12115 . 2 . 13115 . . . 14115 . 1 . 16115 . 1 . 17115 . 1 . 18115 . 1 . 19115 . 1 . 20115 20115 2 . 21115 . 1 Y 21115 . 1 Y 22115 22115 2 Y 23115 . 2 Y 24115 . . . 25115 . . . ;;;; run; data for_fmt; set have nobs=nobs; length fmtname $16; fmtname = 'eventf'; if _n_=1 then do; *HLO='o' gives the 'other' value; start=.; end =.; label = ' '; hlo='o'; output; end; hlo=' '; if not missing(announcementdate) then do; start = max(1,_n_-5); end = max(1,_n_-1); label='1'; output; start = _n_; end = min(_n_+1,nobs); label='2'; output; end; run; *now we have a dataset of the ranges - but they overlap; data for_fmt_conflicts; merge for_fmt(in=a) for_fmt(in=_in_next firstobs=2 keep=start rename=start=next_start); *look-ahead merge; retain new_start conflict_marker; if hlo='o' then do; *again the 'other' row; output; fmtname='conflictf'; output; end; else if end gt next_start and (_in_next) then do; *if the next row will be a conflict, adjust the end back one and indicate where next should start; end = end-1; new_start = end+1; conflict_marker=1; output; end; else if conflict_marker=1 then do; *if this is a conflict row type 1; start = new_start; output; fmtname='conflictf'; label ='Y'; output; conflict_marker=2; end; else if conflict_marker=2 then do; *if this is the 2s for a conflict row; output; fmtname='conflictf'; label ='Y'; output; conflict_marker=0; end; else output; run; proc sort data=for_fmt_conflicts; *must sort to get fmtnames grouped together; by fmtname start; run; proc format cntlin=for_fmt_conflicts; *import formats; quit; data want; *now apply formats; set have; event = put(_n_,eventf.); conflict = put(_n_,conflictf.); run;
Please note the text:
"the data you posted does not illustrate this"
I am not sitting at your desk looking at your data, the data you posted does not illustrate the problem.
What is it your actually trying to do here? It looks like you creating a format based on a range of observation numbers. That doesn't seem like a good idea - logical position in a dataset is not fixed (sorts for instance will change that).
If you want to flag certain records, then retain the flag variables and do your logic as you go through the dataset. You seem to be doing this anyways in the dataset which creates the formats dataset, just remove the formats part and do that directly on your data?
That code you provide works fine for me. You must have other problems, for instance maybe data further on does not have the required rows or values. Run it and see what is in the dataset for_fmt_conflicts, you can filter it for start > end.
It has several where start>end
Where does that happen? I don't see the issue....
I don't know what the question is. The error is quite clear however, you cannot have a format where the start of the range is after the end of the range, hence you would need to fix the dataset which goes into the format procedure - the data you posted does not illustrate this.
The question is, where in the code the start value gets bigger than the end. My logic seems to be wrong here.
If I deduct 2 from the new_start value and make it start, there are a lot of overlapping ranges.
Please note the text:
"the data you posted does not illustrate this"
I am not sitting at your desk looking at your data, the data you posted does not illustrate the problem.
What is it your actually trying to do here? It looks like you creating a format based on a range of observation numbers. That doesn't seem like a good idea - logical position in a dataset is not fixed (sorts for instance will change that).
If you want to flag certain records, then retain the flag variables and do your logic as you go through the dataset. You seem to be doing this anyways in the dataset which creates the formats dataset, just remove the formats part and do that directly on your data?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.