Hi,
I am trying to transfer table 1 to table 2.
Table 1
obs | ID | edate | Rating | Outlook | CreditWatch |
1 | 1 | 02/20/2020 | positive outlook | ||
2 | 1 | 02/20/2020 | B- | ||
3 | 1 | 04/08/2020 | developing watch | ||
4 | 1 | 04/09/2020 | CCC+ | ||
5 | 2 | 03/28/2018 | negative outlook | ||
6 | 2 | 03/28/2018 | B | ||
7 | 2 | 02/24/2020 | negative watch | ||
8 | 2 | 02/24/2020 | stable outlook | ||
9 | 2 | 02/24/2020 | B- |
Table 2
obs | ID | edate | Rating | outlookorwatch |
1 | 1 | 02/20/2020 | B- | positive outlook |
2 | 1 | 04/08/2020 | B- | developing watch |
3 | 1 | 04/09/2020 | CCC+ | stable outlook |
4 | 2 | 03/28/2018 | B | negative outlook |
5 | 2 | 02/24/2020 | B- | wrong |
1. for each ID group, if the rows have the same 'edate' in table 1, then convert them to the same row in table 2.
For example,
1)the observation 1,2 in Table 1 are both ID '1' and have the same 'edate', then the value in 'Rating' in Table 1 becomes the value in 'Rating' in Table 2.
2) the value in 'outlook' variable in 'positive outlook' and the non-value exist in 'CreditWatch' variable, then file ''positive outlook' in variable 'outlookorwatch' in Table 2.
2. The value in ‘Rating’ variable does not change until the next record in ‘rating’ variable happens.
For example,
the rating is 'B-' in obs 2 in Table 2, because no new 'rating' record happens in 04/08/2020.
3. if there is no record of ‘CreditWatch’ or ‘Outlook’, the value in ‘outlookorwatch’ is ‘stable outlook’.
For example,
the outlookorwatch is ‘stable outlook’ in obs 3 in Table 2, because no record of ‘CreditWatch’ or ‘Outlook’ happen in 04/09/2020。
4. if ‘CreditWatch’ record and ‘Outlook’ happen at the same time, I would like to set ‘outlookorwatch’ as 'wrong'.
For example,
Record in variable ‘CreditWatch’ and ‘Outlook’ happens at the same date (i.e., 02/24/2020) in obs 8,9 of table 1, then the value in ‘outlookorwatch’ variable in table 2 is 'wrong'.
Could you please give me some suggestion about this ?
Thanks in advance.
data table1; infile cards dsd dlm=","; input ID $ edate :mmddyy10. Rating $ Outlook :$50. CreditWatch :$50. ; format edate mmddyy10. ; cards; 1,02/20/2020,,positive outlook, 1,02/20/2020,B-,, 1,04/08/2020,,,developing watch 1,04/09/2020,CCC+,, 2,03/28/2018,,negative outlook, 2,03/28/2018,B,, 2,02/24/2020,,,negative watch 2,02/24/2020,,stable outlook, 2,02/24/2020,B-,, ;;;; run;
Like this?
data table1;
infile cards dsd dlm=",";
input
ID $
EDATE :mmddyy10.
RATING $
OUTLOOK :$50.
CREDITWATCH :$50.
;
format
EDATE mmddyy10.
;
cards;
1,02/20/2020,,positive outlook,
1,02/20/2020,B-,,
1,04/08/2020,,,developing watch
1,04/09/2020,CCC+,,
2,03/28/2018,,negative outlook,
2,03/28/2018,B,,
2,02/24/2020,,,negative watch
2,02/24/2020,,stable outlook,
2,02/24/2020,B-,,
run;
data WANT;
merge TABLE1 ;
by ID EDATE;
length KEEP_RATING KEEP_OUTLOOK KEEP_CRWATCH $16;
retain KEEP_RATING KEEP_OUTLOOK KEEP_CRWATCH;
if first.ID then call missing(KEEP_RATING);
if first.EDATE then call missing(KEEP_OUTLOOK, KEEP_CRWATCH);
KEEP_RATING =coalescec(RATING ,KEEP_RATING );
KEEP_OUTLOOK=coalescec(OUTLOOK ,KEEP_OUTLOOK);
KEEP_CRWATCH=coalescec(CREDITWATCH,KEEP_CRWATCH);
if last.EDATE then do;
RATING = KEEP_RATING;
OUTLOOKORWATCH = ifc( KEEP_OUTLOOK^=' ' and KEEP_CRWATCH^=' ', 'wrong'
, ifc( KEEP_OUTLOOK =' ' and KEEP_CRWATCH =' ', 'stable outlook'
, coalescec(KEEP_OUTLOOK ,KEEP_CRWATCH)));
output;
end;
keep ID EDATE RATING OUTLOOKORWATCH;
run;
proc print; run;
Obs | ID | edate | Rating | OUTLOOKORWATCH |
---|---|---|---|---|
1 | 1 | 02/20/2020 | B- | positive outlook |
2 | 1 | 04/08/2020 | B- | developing watch |
3 | 1 | 04/09/2020 | CCC+ | stable outlook |
4 | 2 | 03/28/2018 | B | negative outlook |
5 | 2 | 02/24/2020 | B- | wrong |
Note that there are tabs in your code. They mess up the data. Do not do that.
This is a complicated problem and it looks like this is the second time you have posted it. I suspect few will want to write it for you.
It can be solved using this type of code:
proc sort data=table1; by ID edate; run; data results; format ID $1. edate mmddyy10. Rating $2. outlookorwatch $50.; retain Rating outlookorwatch; set table1(rename=(Rating = Rating_in)); by ID edate; if first.ID then outlookorwatch = "stable outlook"; if first.edate then ... Rating = coalescec(Rating_in,Rating); outlookorwatch = coalescec(Outlook,CreditWatch); if last.ID then output; run;
This is a very complicated request. Yipes. I have a solution. Here's the code, and then I'll discuss it below the code:
data table1;
infile cards dsd dlm=",";
input
ID $
edate :mmddyy10.
Rating $
Outlook :$50.
CreditWatch :$50.
;
format
edate mmddyy10.
;
cards;
1,02/20/2020,,positive outlook,
1,02/20/2020,B-,,
1,04/08/2020,,,developing watch
1,04/09/2020,CCC+,,
2,03/28/2018,,negative outlook,
2,03/28/2018,B,,
2,02/24/2020,,,negative watch
2,02/24/2020,,stable outlook,
2,02/24/2020,B-,,
;;;;
run;
PROC SORT DATA=Table1;
By ID eDATE;
Run;
DATA Table2;
LENGTH ID $8
Prior_ID $8
edate 8
prior_edate 8
Rating $8
Prior_Rating $8
Outlook $50
CreditWatch $50
OutlookOrWatch $100
Prior_OutlookOrWatch $100
Type $8
Prior_Type $8
;
DROP Outlook
CreditWatch
prior_edate
Prior_Rating
Prior_ID
Prior_OutlookOrWatch
Type
Prior_Type
;
RETAIN Prior_ID ' '
prior_edate 0
prior_rating ' '
OutlookOrWatch ' '
Prior_OutlookOrWatch ' '
Prior_Type ' '
;
FORMAT Prior_edate MMDDYYS10.;
SET Table1;
BY ID Edate;
IF NOT MISSING(Outlook) THEN
DO;
OutlookOrWatch = Outlook;
Type = 'RATE';
END;
ELSE
IF NOT MISSING(CreditWatch) THEN
DO;
OutlookOrWatch = CreditWatch;
Type = 'RATE';
END;
ELSE
DO;
Type = 'NON-RATE';
END;
IF MISSING(Rating) AND
ID = Prior_ID THEN
Rating = Prior_Rating;
IF Prior_OutlookOrWatch ^= OutlookOrWatch AND
NOT MISSING(OutlookOrWatch) AND
NOT MISSING(Rating) THEN
DO;
LINK Output_Routine;
END;
ELSE
IF Rating ^= Prior_Rating AND
NOT MISSING(Rating) THEN
DO;
LINK Output_Routine;
END;
ELSE
IF NOT MISSING(OutlookOrWatch) AND
NOT MISSING(Rating) THEN
DO;
LINK Output_Routine;
END;
IF LAST.ID THEN
DO;
CALL MISSING(prior_ID, Prior_Rating, prior_edate, OutlookOrWatch, Prior_OutlookOrWatch, Prior_Type);
END;
ELSE
DO;
Prior_ID = ID;
Prior_edate = edate;
Prior_Rating = Rating;
Prior_OutlookOrWatch = OutlookOrWatch;
Prior_Type = Type;
END;
******;
RETURN;
******;
**************;
Output_Routine:
**************;
IF MISSING(OutlookOrWatch) THEN
DO;
OutlookOrWatch = 'stable outlook';
END;
ELSE
IF Type = 'RATE' AND
Prior_Type = 'RATE' AND
edate = prior_edate THEN
DO;
OutlookOrWatch = 'wrong';
END;
OUTPUT;
CALL MISSING(OutlookOrWatch);
******;
RETURN;
******;
RUN;
Here are the results, which I put into Table2:
Now, these results look slightly different than yours. Here's what's going on:
When record 7 comes in from the source data, SAS doesn't know that record 7 is an error yet. SAS can't know that until it reads record 8. At the point of record 7, SAS has a rating and an outlook/watch value, so the proper response is to write out "B - Negative Watch."
When record 8 comes in from the source data, now SAS knows an error has occurred and writes out "B - Wrong".
There's nothing wrong with record 9 per your specifications, but SAS doesn't have a outlook/watch for record 9, so SAS defaults to "stable outlook".
This code that I have written meets your specifications except that on record 7 where SAS doesn't know yet what is in record 8.
Now, you can get SAS to know about record 8 by:
1. Deferred output processing. Save each record and do not write until the following record has been read and evaluated in light of the prior record.
2. Read ahead processing. Write code to do a direct read on the next record in the Table1 SAS data set and evaluate the current record in light of the values from the read ahead.
Whether you do deferred output processing or read ahead processing, you're asking for a fairly complicated process. If you really want that, go for it, but I think I've written enough code for this thread already. 😊 The code I have written will identify "wrong" data. I would argue that a program that identifies "wrong" data is sufficient and that we shouldn't waste time writing code to identify "wrong" data in a fancier way. Instead, when "wrong" data is identified, the data should be corrected and the program re-run.
Hopefully we can consider this the solution to your problem. If however you really need for some reason the final output to be just exactly as you stated in your 2nd table, I can give you some tips. For one, @LeonidBatkhan wrote an excellent blog post on read ahead type processing. I don't recommend that you use read ahead processing in this case (just correct the data and re-run!), but the post itself is very worthwhile. Link: https://blogs.sas.com/content/sgf/2017/11/01/hopping-for-the-best-calculations-across-sas-dataset-ob...
Jim
Like this?
data table1;
infile cards dsd dlm=",";
input
ID $
EDATE :mmddyy10.
RATING $
OUTLOOK :$50.
CREDITWATCH :$50.
;
format
EDATE mmddyy10.
;
cards;
1,02/20/2020,,positive outlook,
1,02/20/2020,B-,,
1,04/08/2020,,,developing watch
1,04/09/2020,CCC+,,
2,03/28/2018,,negative outlook,
2,03/28/2018,B,,
2,02/24/2020,,,negative watch
2,02/24/2020,,stable outlook,
2,02/24/2020,B-,,
run;
data WANT;
merge TABLE1 ;
by ID EDATE;
length KEEP_RATING KEEP_OUTLOOK KEEP_CRWATCH $16;
retain KEEP_RATING KEEP_OUTLOOK KEEP_CRWATCH;
if first.ID then call missing(KEEP_RATING);
if first.EDATE then call missing(KEEP_OUTLOOK, KEEP_CRWATCH);
KEEP_RATING =coalescec(RATING ,KEEP_RATING );
KEEP_OUTLOOK=coalescec(OUTLOOK ,KEEP_OUTLOOK);
KEEP_CRWATCH=coalescec(CREDITWATCH,KEEP_CRWATCH);
if last.EDATE then do;
RATING = KEEP_RATING;
OUTLOOKORWATCH = ifc( KEEP_OUTLOOK^=' ' and KEEP_CRWATCH^=' ', 'wrong'
, ifc( KEEP_OUTLOOK =' ' and KEEP_CRWATCH =' ', 'stable outlook'
, coalescec(KEEP_OUTLOOK ,KEEP_CRWATCH)));
output;
end;
keep ID EDATE RATING OUTLOOKORWATCH;
run;
proc print; run;
Obs | ID | edate | Rating | OUTLOOKORWATCH |
---|---|---|---|---|
1 | 1 | 02/20/2020 | B- | positive outlook |
2 | 1 | 04/08/2020 | B- | developing watch |
3 | 1 | 04/09/2020 | CCC+ | stable outlook |
4 | 2 | 03/28/2018 | B | negative outlook |
5 | 2 | 02/24/2020 | B- | wrong |
Note that there are tabs in your code. They mess up the data. Do not do that.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.