- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello every body.
I use this code for formulating volume of trades. I need sorting my data by date (TRD_EVENT_DT var.) and then use the formula for calcuating the volume of trades of each stock (based on TRD_STCK_CD var. (text var.)).
/*Sort by TRD_STCK_CD and temporal variables.*/
PROC SORT DATA=Sampledata87_RT OUT=Sampledata87_SumVol;
BY TRD_EVENT_DT;
RUN;
/*Sum VOLUME until the last of each TRD_STCK_CD is reached.*/
DATA Sampledata87_SumVolSo;
SET Sampledata87_SumVol;
BY TRD_EVENT_DT TRD_EVENT_DT;
FORMAT TRD_STCK_CD $5.;
INFORMAT TRD_STCK_CD $5.;
RETAIN tmp_volume_sum;
tmp_volume_sum + VOLUME;
IF last.TRD_STCK_CD THEN DO;
VOLUME_SUM = tmp_volume_sum;
CALL missing(tmp_volume_sum);
END;
DROP tmp_:;
RUN;
But the result is:
NOTE: Variable last.TRD_STCK_CD is uninitialized.
NOTE: There were 42948 observations read from the data set WORK.SAMPLEDATA87_SUMVOL.
NOTE: The data set WORK.SAMPLEDATA87_SUMVOLSO has 42948 observations and 15 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
Thanks for your help.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@RW9 He doesn't understand the code because he didn't write it.
@aminkarimid Please review how BY groups work. If you want to use FIRST/LAST on a variable that variable MUST be in your BY statement.
Your BY Statement is what allows you to uniquely identify groups.
You've changed the code in the solution, removing BY variables. Note that you should also update your comments to reflect your new algorithm. I believe the code on SO is correct and answers your question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So what is the question? You appear to have said you have some code and presented the code, not asked a question. Do look at the guidance also when posting a question, give example test data in the form of a datastep, and show what you would like the output to look like. Then explain the logic between the two. Also, please avoid coding all in captials it makes your code so much harder to read.
Finally your topic doesn't appear to have anything to do with the post?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ah, now you provide further info. Does this line:
BY TRD_EVENT_DT TRD_EVENT_DT;
Contain the variable mentioned in the note:
NOTE: Variable last.TRD_STCK_CD is uninitialized.
This is why you have the problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, you must bring your dataset int the order necessary for by-group processing, so your by variables will form the groups in succession as you need them. If you have a kind of order where you need to get a by-group for a variable that is not in a strict ascending or descending sequence, you can use the notsorted option on this variable in the by statement.
For detailed help, you will need to supply example data (in a data step), and an example of the expected result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@RW9 He doesn't understand the code because he didn't write it.
@aminkarimid Please review how BY groups work. If you want to use FIRST/LAST on a variable that variable MUST be in your BY statement.
Your BY Statement is what allows you to uniquely identify groups.
You've changed the code in the solution, removing BY variables. Note that you should also update your comments to reflect your new algorithm. I believe the code on SO is correct and answers your question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Let me show you an example:
Here is a first data set:
TRD_EVENT_DT TRD_EVENT_TM TRD_STCK_CD TRD_PR TRD_TUROVR TRD_EVENT_ROUFOR VOLUME
3/24/2008 13:13:44 ALBZ1 1567 10 13:00 15670
3/24/2008 12:20:38 AZAB1 683 10000 12:30 6830000
3/24/2008 13:13:44 AZAB1 695 10 13:00 6950
3/24/2008 9:14:57 BALI1 850 9260 9:00 7871000
3/24/2008 9:15:06 BALI1 850 2000 9:30 1700000
3/24/2008 9:15:14 BALI1 850 10000 9:30 8500000
3/24/2008 9:15:24 BALI1 850 6000 9:30 5100000
3/24/2008 9:29:27 BALI1 850 10000 9:30 8500000
3/24/2008 12:28:00 BALIl 850 10000 12:30 8500000
3/24/2008 12:28:07 BALIl 850 10000 12:30 8500000
3/24/2008 13:13:44 BALI1 865 10 13:00 8650
4/24/2008 12:28:01 ALBZ1 1537 10000 12:30 15370000
4/24/2008 13:13:44 ALBZ1 1567 10 13:00 15670
4/24/2008 12:20:38 AZAB1 683 10000 12:30 6830000
4/24/2008 13:13:44 AZAB1 695 10 13:00 6950
4/24/2008 9:14:57 BALI1 850 9260 9:00 7871000
4/24/2008 9:15:06 BALI1 850 2000 9:30 1700000
4/24/2008 9:15:14 BALI1 850 10000 9:30 8500000
4/24/2008 9:15:24 BALI1 850 6000 9:30 5100000
4/24/2008 9:29:27 BALI1 850 10000 9:30 8500000
4/24/2008 12:28:00 BALIl 850 10000 12:30 8500000
4/24/2008 12:28:07 BALIl 850 10000 12:30 8500000
4/24/2008 13:13:44 BALI1 865 10 13:00 8650
And I need this table:
TRD_EVENT_DT TRD_EVENT_TM TRD_STCK_CD TRD_PR TRD_TUROVR TRD_EVENT_ROUFOR VOLUME VOLUME_SUM
3/24/2008 13:13:44 ALBZ1 1567 10 13:00 15670 15385670
3/24/2008 12:20:38 AZAB1 683 10000 12:30 6830000
3/24/2008 13:13:44 AZAB1 695 10 13:00 6950 6836950
3/24/2008 9:14:57 BALI1 850 9260 9:00 7871000
3/24/2008 9:15:06 BALI1 850 2000 9:30 1700000
3/24/2008 9:15:14 BALI1 850 10000 9:30 8500000
3/24/2008 9:15:24 BALI1 850 6000 9:30 5100000
3/24/2008 9:29:27 BALI1 850 10000 9:30 8500000
3/24/2008 12:28:00 BALIl 850 10000 12:30 8500000
3/24/2008 12:28:07 BALIl 850 10000 12:30 8500000
3/24/2008 13:13:44 BALI1 865 10 13:00 8650 48679650
4/24/2008 12:28:01 ALBZ1 1537 10000 12:30 15370000
4/24/2008 13:13:44 ALBZ1 1567 10 13:00 15670 15385670
4/24/2008 12:20:38 AZAB1 683 10000 12:30 6830000
4/24/2008 13:13:44 AZAB1 695 10 13:00 6950 6836950
4/24/2008 9:14:57 BALI1 850 9260 9:00 7871000
4/24/2008 9:15:06 BALI1 850 2000 9:30 1700000
4/24/2008 9:15:14 BALI1 850 10000 9:30 8500000
4/24/2008 9:15:24 BALI1 850 6000 9:30 5100000
4/24/2008 9:29:27 BALI1 850 10000 9:30 8500000
4/24/2008 12:28:00 BALIl 850 10000 12:30 8500000
4/24/2008 12:28:07 BALIl 850 10000 12:30 8500000
4/24/2008 13:13:44 BALI1 865 10 13:00 8650 48679650
Please pay attention to last col. It has been generated by summing volumes that have just same TRD_STCK_CD var. So each TRD_STCK_CD obs. has just one Volume_Sum data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DATA Sampledata87_SumVolSo;
SET Sampledata87_SumVol;
BY TRD_EVENT_DT
TRD_STCK_CD notsorted;
FORMAT TRD_STCK_CD $5.;
INFORMAT TRD_STCK_CD $5.;
RETAIN tmp_volume_sum;
tmp_volume_sum + VOLUME;
IF last.TRD_STCK_CD THEN DO;
VOLUME_SUM = tmp_volume_sum;
CALL missing(tmp_volume_sum);
END;
DROP tmp_:;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@aminkarimid wrote:
The solution is:
DATA Sampledata87_SumVolSo;
SET Sampledata87_SumVol;
BY TRD_EVENT_DT
TRD_STCK_CD notsorted;
FORMAT TRD_STCK_CD $5.;
INFORMAT TRD_STCK_CD $5.;
RETAIN tmp_volume_sum;
tmp_volume_sum + VOLUME;
IF last.TRD_STCK_CD THEN DO;
VOLUME_SUM = tmp_volume_sum;
CALL missing(tmp_volume_sum);
END;
DROP tmp_:;
RUN;
Please use the code boxes for posting code such as the forum {i} menu icon opens. The main message windows reformat code by removing leading spaces.
And also use indenting to make code more legible. Doing so consistently and with a good pattern will make debugging and understanding code much easier.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think that's correct. The BY statement should be reversed, especially since for your next questions. And don't use NOTSORTED, it has its uses, but not your case. Using it here will likely return incorrect results.